[Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-10 Thread Jan Kiszka
This enables acceleration for MMIO-based TPR registers accesses of
32-bit Windows guest systems. It is mostly useful with KVM enabled,
either on older Intel CPUs (without flexpriority feature, can also be
manually disabled for testing) or any current AMD processor.

The approach introduced here is derived from the original version of
qemu-kvm. It was refactored, documented, and extended by support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, so was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

Signed-off-by: Jan Kiszka 
---
 Makefile.target|3 +-
 hw/apic.c  |  126 -
 hw/apic_common.c   |   64 +-
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c  |   32 +++
 hw/kvmvapic.c  |  774 
 6 files changed, 1012 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #defineMSI_ADDR_DEST_ID_MASK   0x000
 
+#define SYNC_FROM_VAPIC 0x1
+#define SYNC_TO_VAPIC   0x2
+#define SYNC_ISR_IRR_TO_VAPIC   0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
 return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+int i;
+for (i = 7; i >= 0; i--) {
+if (tab[i] != 0) {
+return i * 32 + fls_bit(tab[i]);
+}
+}
+return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+VAPICState vapic_state;
+size_t length;
+off_t start;
+int vector;
+
+if (!s->vapic_paddr) {
+return;
+}
+if (sync_type & SYNC_FROM_VAPIC) {
+cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+   sizeof(vapic_state), 0);
+s->tpr = vapic_state.tpr;
+}
+if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+start = offsetof(VAPICState, isr);
+length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+if (sync_type & SYNC_TO_VAPIC) {
+assert(qemu_cpu_is_self(s->cpu_env));
+
+vapic_state.tpr = s->tpr;
+vapic_state.enabled = 1;
+start = 0;
+length = sizeof(VAPICState);
+}
+
+vector = get_highest_priority_int(s->isr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.isr = vector & 0xf0;
+
+vapic_state.zero = 0;
+
+vector = get_highest_priority_int(s->irr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.irr = vector & 0xff;
+
+cpu_physical_memory_write_rom(s->vapic_paddr + start,
+  ((void *)&vapic_state) + start, length);
+}
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
 uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t 
val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-s->tpr = (val & 0x0f) << 4;
-apic_update_irq(s);
+/* Updates from cr8 are ignored while the VAPIC is active */
+if (!s->vapic_paddr) {
+s->tpr = val << 4;
+apic_update_irq(s);
+}
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-int i;
-   

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-11 Thread Blue Swirl
On Fri, Feb 10, 2012 at 18:31, Jan Kiszka  wrote:
> This enables acceleration for MMIO-based TPR registers accesses of
> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> either on older Intel CPUs (without flexpriority feature, can also be
> manually disabled for testing) or any current AMD processor.
>
> The approach introduced here is derived from the original version of
> qemu-kvm. It was refactored, documented, and extended by support for
> user space APIC emulation, both with and without KVM acceleration. The
> VMState format was kept compatible, so was the ABI to the option ROM
> that implements the guest-side para-virtualized driver service. This
> enables seamless migration from qemu-kvm to upstream or, one day,
> between KVM and TCG mode.
>
> The basic concept goes like this:
>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>   irqchip) a vmcall hypercall is registered
>  - VAPIC option ROM is loaded into guest
>  - option ROM activates TPR MMIO access reporting via port 0x7e
>  - TPR accesses are trapped and patched in the guest to call into option
>   ROM instead, VAPIC support is enabled
>  - option ROM TPR helpers track state in memory and invoke hypercall to
>   poll for pending IRQs if required
>
> Signed-off-by: Jan Kiszka 

I must say that I find the approach horrible, patching guests and ROMs
and looking up Windows internals. Taking the same approach to extreme,
we could for example patch Xen guest to become a KVM guest. Not that I
object merging.

> ---
>  Makefile.target    |    3 +-
>  hw/apic.c          |  126 -
>  hw/apic_common.c   |   64 +-
>  hw/apic_internal.h |   27 ++
>  hw/kvm/apic.c      |   32 +++
>  hw/kvmvapic.c      |  774 
> 
>  6 files changed, 1012 insertions(+), 14 deletions(-)
>  create mode 100644 hw/kvmvapic.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 68481a3..ec7eff8 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -230,7 +230,8 @@ obj-y += device-hotplug.o
>
>  # Hardware support
>  obj-i386-y += mc146818rtc.o pc.o
> -obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
> +obj-i386-y += apic_common.o apic.o kvmvapic.o
> +obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
>  obj-i386-y += vmport.o
>  obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> diff --git a/hw/apic.c b/hw/apic.c
> index 086c544..2ebf3ca 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -35,6 +35,10 @@
>  #define MSI_ADDR_DEST_ID_SHIFT         12
>  #define        MSI_ADDR_DEST_ID_MASK           0x000
>
> +#define SYNC_FROM_VAPIC                 0x1
> +#define SYNC_TO_VAPIC                   0x2
> +#define SYNC_ISR_IRR_TO_VAPIC           0x4

Enum, please.

> +
>  static APICCommonState *local_apics[MAX_APICS + 1];
>
>  static void apic_set_irq(APICCommonState *s, int vector_num, int 
> trigger_mode);
> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>     return !!(tab[i] & mask);
>  }
>
> +/* return -1 if no bit is set */
> +static int get_highest_priority_int(uint32_t *tab)
> +{
> +    int i;
> +    for (i = 7; i >= 0; i--) {
> +        if (tab[i] != 0) {
> +            return i * 32 + fls_bit(tab[i]);
> +        }
> +    }
> +    return -1;
> +}
> +
> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
> +{
> +    VAPICState vapic_state;
> +    size_t length;
> +    off_t start;
> +    int vector;
> +
> +    if (!s->vapic_paddr) {
> +        return;
> +    }
> +    if (sync_type & SYNC_FROM_VAPIC) {
> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
> +                               sizeof(vapic_state), 0);
> +        s->tpr = vapic_state.tpr;
> +    }
> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
> +        start = offsetof(VAPICState, isr);
> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
> +
> +        if (sync_type & SYNC_TO_VAPIC) {
> +            assert(qemu_cpu_is_self(s->cpu_env));
> +
> +            vapic_state.tpr = s->tpr;
> +            vapic_state.enabled = 1;
> +            start = 0;
> +            length = sizeof(VAPICState);
> +        }
> +
> +        vector = get_highest_priority_int(s->isr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.isr = vector & 0xf0;
> +
> +        vapic_state.zero = 0;
> +
> +        vector = get_highest_priority_int(s->irr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.irr = vector & 0xff;
> +
> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
> +                                      ((void *)&vapic_state) + start, 
> length);

This assumes that the vapic_state structure matches guest what guest
expect without conversion. Is this true for i386 on x86_64? I didn't
check the structure in question.

> +    }
> +}
> +
> +static void

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Jan Kiszka
On 2012-02-11 16:25, Blue Swirl wrote:
> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka  wrote:
>> This enables acceleration for MMIO-based TPR registers accesses of
>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>> either on older Intel CPUs (without flexpriority feature, can also be
>> manually disabled for testing) or any current AMD processor.
>>
>> The approach introduced here is derived from the original version of
>> qemu-kvm. It was refactored, documented, and extended by support for
>> user space APIC emulation, both with and without KVM acceleration. The
>> VMState format was kept compatible, so was the ABI to the option ROM
>> that implements the guest-side para-virtualized driver service. This
>> enables seamless migration from qemu-kvm to upstream or, one day,
>> between KVM and TCG mode.
>>
>> The basic concept goes like this:
>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>   irqchip) a vmcall hypercall is registered
>>  - VAPIC option ROM is loaded into guest
>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>  - TPR accesses are trapped and patched in the guest to call into option
>>   ROM instead, VAPIC support is enabled
>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>   poll for pending IRQs if required
>>
>> Signed-off-by: Jan Kiszka 
> 
> I must say that I find the approach horrible, patching guests and ROMs
> and looking up Windows internals. Taking the same approach to extreme,
> we could for example patch Xen guest to become a KVM guest. Not that I
> object merging.

Yes, this is horrible. But there is no real better way in the absence of
hardware assisted virtualization of the TPR. I think MS is recommending
this patching approach as well.

>> diff --git a/hw/apic.c b/hw/apic.c
>> index 086c544..2ebf3ca 100644
>> --- a/hw/apic.c
>> +++ b/hw/apic.c
>> @@ -35,6 +35,10 @@
>>  #define MSI_ADDR_DEST_ID_SHIFT 12
>>  #defineMSI_ADDR_DEST_ID_MASK   0x000
>>
>> +#define SYNC_FROM_VAPIC 0x1
>> +#define SYNC_TO_VAPIC   0x2
>> +#define SYNC_ISR_IRR_TO_VAPIC   0x4
> 
> Enum, please.

OK.

> 
>> +
>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>
>>  static void apic_set_irq(APICCommonState *s, int vector_num, int 
>> trigger_mode);
>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>> return !!(tab[i] & mask);
>>  }
>>
>> +/* return -1 if no bit is set */
>> +static int get_highest_priority_int(uint32_t *tab)
>> +{
>> +int i;
>> +for (i = 7; i >= 0; i--) {
>> +if (tab[i] != 0) {
>> +return i * 32 + fls_bit(tab[i]);
>> +}
>> +}
>> +return -1;
>> +}
>> +
>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>> +{
>> +VAPICState vapic_state;
>> +size_t length;
>> +off_t start;
>> +int vector;
>> +
>> +if (!s->vapic_paddr) {
>> +return;
>> +}
>> +if (sync_type & SYNC_FROM_VAPIC) {
>> +cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>> +   sizeof(vapic_state), 0);
>> +s->tpr = vapic_state.tpr;
>> +}
>> +if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>> +start = offsetof(VAPICState, isr);
>> +length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>> +
>> +if (sync_type & SYNC_TO_VAPIC) {
>> +assert(qemu_cpu_is_self(s->cpu_env));
>> +
>> +vapic_state.tpr = s->tpr;
>> +vapic_state.enabled = 1;
>> +start = 0;
>> +length = sizeof(VAPICState);
>> +}
>> +
>> +vector = get_highest_priority_int(s->isr);
>> +if (vector < 0) {
>> +vector = 0;
>> +}
>> +vapic_state.isr = vector & 0xf0;
>> +
>> +vapic_state.zero = 0;
>> +
>> +vector = get_highest_priority_int(s->irr);
>> +if (vector < 0) {
>> +vector = 0;
>> +}
>> +vapic_state.irr = vector & 0xff;
>> +
>> +cpu_physical_memory_write_rom(s->vapic_paddr + start,
>> +  ((void *)&vapic_state) + start, 
>> length);
> 
> This assumes that the vapic_state structure matches guest what guest
> expect without conversion. Is this true for i386 on x86_64? I didn't
> check the structure in question.

Yes, the structure in question is a packed one, stable on both guest and
host side (the guest side is 32-bit only anyway).

>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>> index 588531b..1977da7 100644
>> --- a/hw/apic_common.c
>> +++ b/hw/apic_common.c
>> @@ -20,8 +20,10 @@
>>  #include "apic.h"
>>  #include "apic_internal.h"
>>  #include "trace.h"
>> +#include "kvm.h"
>>
>>  static int apic_irq_delivered;
>> +bool apic_report_tpr_access;
> 
> This should go to APICCommonState.

Nope, it is a global state, also checked in a place where the APIC is
set up, thus have no

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Blue Swirl
On Mon, Feb 13, 2012 at 10:16, Jan Kiszka  wrote:
> On 2012-02-11 16:25, Blue Swirl wrote:
>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka  wrote:
>>> This enables acceleration for MMIO-based TPR registers accesses of
>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>> either on older Intel CPUs (without flexpriority feature, can also be
>>> manually disabled for testing) or any current AMD processor.
>>>
>>> The approach introduced here is derived from the original version of
>>> qemu-kvm. It was refactored, documented, and extended by support for
>>> user space APIC emulation, both with and without KVM acceleration. The
>>> VMState format was kept compatible, so was the ABI to the option ROM
>>> that implements the guest-side para-virtualized driver service. This
>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>> between KVM and TCG mode.
>>>
>>> The basic concept goes like this:
>>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>   irqchip) a vmcall hypercall is registered
>>>  - VAPIC option ROM is loaded into guest
>>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>>  - TPR accesses are trapped and patched in the guest to call into option
>>>   ROM instead, VAPIC support is enabled
>>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>>   poll for pending IRQs if required
>>>
>>> Signed-off-by: Jan Kiszka 
>>
>> I must say that I find the approach horrible, patching guests and ROMs
>> and looking up Windows internals. Taking the same approach to extreme,
>> we could for example patch Xen guest to become a KVM guest. Not that I
>> object merging.
>
> Yes, this is horrible. But there is no real better way in the absence of
> hardware assisted virtualization of the TPR. I think MS is recommending
> this patching approach as well.

Maybe instead of routing via ROM and the hypercall, the TPR accesses
could be handled directly with guest invisible breakpoints (like GDB
breakpoints, but for QEMU internal use), much like other
instrumentation could be handled.

>>> diff --git a/hw/apic.c b/hw/apic.c
>>> index 086c544..2ebf3ca 100644
>>> --- a/hw/apic.c
>>> +++ b/hw/apic.c
>>> @@ -35,6 +35,10 @@
>>>  #define MSI_ADDR_DEST_ID_SHIFT         12
>>>  #define        MSI_ADDR_DEST_ID_MASK           0x000
>>>
>>> +#define SYNC_FROM_VAPIC                 0x1
>>> +#define SYNC_TO_VAPIC                   0x2
>>> +#define SYNC_ISR_IRR_TO_VAPIC           0x4
>>
>> Enum, please.
>
> OK.
>
>>
>>> +
>>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>>
>>>  static void apic_set_irq(APICCommonState *s, int vector_num, int 
>>> trigger_mode);
>>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>>>     return !!(tab[i] & mask);
>>>  }
>>>
>>> +/* return -1 if no bit is set */
>>> +static int get_highest_priority_int(uint32_t *tab)
>>> +{
>>> +    int i;
>>> +    for (i = 7; i >= 0; i--) {
>>> +        if (tab[i] != 0) {
>>> +            return i * 32 + fls_bit(tab[i]);
>>> +        }
>>> +    }
>>> +    return -1;
>>> +}
>>> +
>>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>>> +{
>>> +    VAPICState vapic_state;
>>> +    size_t length;
>>> +    off_t start;
>>> +    int vector;
>>> +
>>> +    if (!s->vapic_paddr) {
>>> +        return;
>>> +    }
>>> +    if (sync_type & SYNC_FROM_VAPIC) {
>>> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>>> +                               sizeof(vapic_state), 0);
>>> +        s->tpr = vapic_state.tpr;
>>> +    }
>>> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>>> +        start = offsetof(VAPICState, isr);
>>> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>>> +
>>> +        if (sync_type & SYNC_TO_VAPIC) {
>>> +            assert(qemu_cpu_is_self(s->cpu_env));
>>> +
>>> +            vapic_state.tpr = s->tpr;
>>> +            vapic_state.enabled = 1;
>>> +            start = 0;
>>> +            length = sizeof(VAPICState);
>>> +        }
>>> +
>>> +        vector = get_highest_priority_int(s->isr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.isr = vector & 0xf0;
>>> +
>>> +        vapic_state.zero = 0;
>>> +
>>> +        vector = get_highest_priority_int(s->irr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.irr = vector & 0xff;
>>> +
>>> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
>>> +                                      ((void *)&vapic_state) + start, 
>>> length);
>>
>> This assumes that the vapic_state structure matches guest what guest
>> expect without conversion. Is this true for i386 on x86_64? I didn't
>> check the structure in question.
>
> Yes, the structure in question is a packed one, stable on both guest and
> host side (the guest side is 32-bit only anyway).
>
>>> diff --git a/hw/apic_common.c b/hw/apic_commo

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Gleb Natapov
On Mon, Feb 13, 2012 at 06:50:08PM +, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka  wrote:
> > On 2012-02-11 16:25, Blue Swirl wrote:
> >> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka  wrote:
> >>> This enables acceleration for MMIO-based TPR registers accesses of
> >>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> >>> either on older Intel CPUs (without flexpriority feature, can also be
> >>> manually disabled for testing) or any current AMD processor.
> >>>
> >>> The approach introduced here is derived from the original version of
> >>> qemu-kvm. It was refactored, documented, and extended by support for
> >>> user space APIC emulation, both with and without KVM acceleration. The
> >>> VMState format was kept compatible, so was the ABI to the option ROM
> >>> that implements the guest-side para-virtualized driver service. This
> >>> enables seamless migration from qemu-kvm to upstream or, one day,
> >>> between KVM and TCG mode.
> >>>
> >>> The basic concept goes like this:
> >>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
> >>>   irqchip) a vmcall hypercall is registered
> >>>  - VAPIC option ROM is loaded into guest
> >>>  - option ROM activates TPR MMIO access reporting via port 0x7e
> >>>  - TPR accesses are trapped and patched in the guest to call into option
> >>>   ROM instead, VAPIC support is enabled
> >>>  - option ROM TPR helpers track state in memory and invoke hypercall to
> >>>   poll for pending IRQs if required
> >>>
> >>> Signed-off-by: Jan Kiszka 
> >>
> >> I must say that I find the approach horrible, patching guests and ROMs
> >> and looking up Windows internals. Taking the same approach to extreme,
> >> we could for example patch Xen guest to become a KVM guest. Not that I
> >> object merging.
> >
> > Yes, this is horrible. But there is no real better way in the absence of
> > hardware assisted virtualization of the TPR. I think MS is recommending
> > this patching approach as well.
> 
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.
> 
Hypercall is rarely called. The idea behind patching is to not
have exit on each TPR update. Breakpoint will cause exit making the
whole exercise pointless.

--
Gleb.



Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Jan Kiszka
On 2012-02-13 19:50, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka  wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka  wrote:
 This enables acceleration for MMIO-based TPR registers accesses of
 32-bit Windows guest systems. It is mostly useful with KVM enabled,
 either on older Intel CPUs (without flexpriority feature, can also be
 manually disabled for testing) or any current AMD processor.

 The approach introduced here is derived from the original version of
 qemu-kvm. It was refactored, documented, and extended by support for
 user space APIC emulation, both with and without KVM acceleration. The
 VMState format was kept compatible, so was the ABI to the option ROM
 that implements the guest-side para-virtualized driver service. This
 enables seamless migration from qemu-kvm to upstream or, one day,
 between KVM and TCG mode.

 The basic concept goes like this:
  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
  - VAPIC option ROM is loaded into guest
  - option ROM activates TPR MMIO access reporting via port 0x7e
  - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
  - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

 Signed-off-by: Jan Kiszka 
>>>
>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch Xen guest to become a KVM guest. Not that I
>>> object merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
> 
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.

Gleb answered it already.

 @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
  {
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 +static DeviceState *vapic;
 static int apic_no;

 if (apic_no >= MAX_APICS) {
 @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
 info = APIC_COMMON_GET_CLASS(s);
 info->init(s);

 -sysbus_init_mmio(&s->busdev, &s->io_memory);
 +sysbus_init_mmio(dev, &s->io_memory);
 +
 +if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
 +vapic = sysbus_create_simple("kvmvapic", -1, NULL);
 +}
 +s->vapic = vapic;
 +if (apic_report_tpr_access && info->enable_tpr_reporting) {
>>>
>>> I think you should not rely on apic_report_tpr_access being in sane
>>> condition during class init.
>>
>> It is mandatory, e.g. for CPU hotplug, as reporting needs to be
>> consistent accross all VCPUs. Therefore it is a static global, set to
>> false initially. However, you are right, we lack proper clearing of  the
>> access report feature on reset, not only in this variable.
> 
> I'd also set it to false initially.

It's a global variable, thus initialized to false by definition.

 +
 +#define VAPIC_CPU_SHIFT 7
 +
 +#define ROM_BLOCK_SIZE  512
 +#define ROM_BLOCK_MASK  (~(ROM_BLOCK_SIZE - 1))
 +
 +typedef struct VAPICHandlers {
 +uint32_t set_tpr;
 +uint32_t set_tpr_eax;
 +uint32_t get_tpr[8];
 +uint32_t get_tpr_stack;
 +} QEMU_PACKED VAPICHandlers;
 +
 +typedef struct GuestROMState {
 +char signature[8];
 +uint32_t vaddr;
>>>
>>> This does not look 64 bit clean.
>>
>> It's packed.
> 
> I meant "virtual address could be 64 bits on a 64 bit host", not
> structure packing.

This is for 32-bit guests only. 64-bit Windows doesn't access the TPR
via MMIO, thus is not activating the VAPIC.

 +uint32_t state;
 +uint32_t rom_state_paddr;
 +uint32_t rom_state_vaddr;
 +uint32_t vapic_paddr;
 +uint32_t real_tpr_addr;
 +GuestROMState rom_state;
 +size_t rom_size;
 +} VAPICROMState;
 +
 +#define TPR_INSTR_IS_WRITE  0x1
 +#define TPR_INSTR_ABS_MODRM 0x2
 +#define TPR_INSTR_MATCH_MODRM_REG   0x4
 +
 +typedef struct TPRInstruction {
 +uint8_t opcode;
 +uint8_t modrm_reg;
 +unsigned int flags;
 +size_t length;
 +off_t addr_offset;
 +} TPRInstruction;
>>>
>>> Also here the order is pessimized.
>>
>> Don't see the gain here, though.
> 
> There are two bytes' hole between modrm_reg and flags, maybe also 4
> bytes between len

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Gleb Natapov
On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
> >> Unfortunately, this is only an internal structure, not officially
> >> documented by MS. However, all supported OS versions a legacy by now, no
> >> longer changing its structure.
> > 
> > This and a note about the supported OS versions could be added as comment.
> 
> OK.
> 
> For the folks that developed it in qemu-kvm: This targets Windows XP,
> Vista and Server 2003, all 32-bit, right?
> 
Not Vista. Not sure about Server 2003.

--
Gleb.



Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka
On 2012-02-14 08:54, Gleb Natapov wrote:
> On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
 Unfortunately, this is only an internal structure, not officially
 documented by MS. However, all supported OS versions a legacy by now, no
 longer changing its structure.
>>>
>>> This and a note about the supported OS versions could be added as comment.
>>
>> OK.
>>
>> For the folks that developed it in qemu-kvm: This targets Windows XP,
>> Vista and Server 2003, all 32-bit, right?
>>
> Not Vista. Not sure about Server 2003.

I think I saw some 2003 reference in the qemu-kvm git logs.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Gleb Natapov
On Tue, Feb 14, 2012 at 09:55:46AM +0100, Jan Kiszka wrote:
> On 2012-02-14 08:54, Gleb Natapov wrote:
> > On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
>  Unfortunately, this is only an internal structure, not officially
>  documented by MS. However, all supported OS versions a legacy by now, no
>  longer changing its structure.
> >>>
> >>> This and a note about the supported OS versions could be added as comment.
> >>
> >> OK.
> >>
> >> For the folks that developed it in qemu-kvm: This targets Windows XP,
> >> Vista and Server 2003, all 32-bit, right?
> >>
> > Not Vista. Not sure about Server 2003.
> 
> I think I saw some 2003 reference in the qemu-kvm git logs.
> 
Very likely. AFAIK it uses the same kernel as XP.

--
Gleb.