RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-26 Thread Han, Weidong
Su, Disheng wrote:
> Amit Shah wrote:
>> This patch has been contributed to by the following people:
>> 
>> From: Or Sagi <[EMAIL PROTECTED]>
>> From: Nir Peleg <[EMAIL PROTECTED]>
>> From: Amit Shah <[EMAIL PROTECTED]>
>> From: Ben-Ami Yassour <[EMAIL PROTECTED]>
>> From: Weidong Han <[EMAIL PROTECTED]>
>> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>> 
>> With this patch, we can assign a device on the host machine to a
>> guest. 
>> 
>> A new command-line option, -pcidevice is added.
>> To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use
>> this: 
>> 
>> -pcidevice host=04:08.0
>> 
>> * The host driver for the device, if any, is to be removed before
>> assigning the device (else device assignment will fail).
>> 
>> * A device that shares IRQ with another host device cannot currently
>> be assigned. 
>> 
>> * The RAW_IO capability is needed for this to work
>> 
>> This works only with the in-kernel irqchip method; to use the
>> userspace irqchip, a kernel module (irqhook) and some extra changes
>> are needed. 
>> 
>> Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
>> ---
>>  qemu/Makefile.target|1 +
>>  qemu/hw/device-assignment.c |  619
>>  +++
>>  qemu/hw/device-assignment.h |   98 +++ qemu/hw/pc.c
>>  |6 + qemu/hw/pci.c   |7 +
>>  qemu/vl.c   |   18 ++
>>  6 files changed, 749 insertions(+), 0 deletions(-)
>>  create mode 100644 qemu/hw/device-assignment.c
>>  create mode 100644 qemu/hw/device-assignment.h
>> 
>> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
>> index d9bdeca..05a1d84 100644
>> --- a/qemu/Makefile.target
>> +++ b/qemu/Makefile.target
>> @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW)
>>  dma.o OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o
>>  pc.o OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
>>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o +OBJS+=
>>  device-assignment.o ifeq ($(USE_KVM_PIT), 1)
>>  OBJS+= i8254-kvm.o
>>  endif
>> diff --git a/qemu/hw/device-assignment.c
>> b/qemu/hw/device-assignment.c new file mode 100644 index
>> 000..5ba21a0 --- /dev/null
>> +++ b/qemu/hw/device-assignment.c
>> @@ -0,0 +1,619 @@
>> +/*
>> + * Copyright (c) 2007, Neocleus Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it + * under the terms and conditions of the GNU General
>> Public License, + * version 2, as published by the Free Software
>> Foundation. + * + * This program is distributed in the hope it will
>> be useful, but WITHOUT + * ANY WARRANTY; without even the implied
>> warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.
>> See the GNU General Public License for + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> along with + * this program; if not, write to the Free Software
>> Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA
>> 02111-1307 USA. + * + *
>> + *  Assign a PCI device from the host to a guest VM. + *
>> + *  Adapted for KVM by Qumranet.
>> + *
>> + *  Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED])
>> + *  Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED])
>> + *  Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED])
>> + *  Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED]) +
>> */ +#include 
>> +#include 
>> +#include "qemu-kvm.h"
>> +#include "hw.h"
>> +#include "pc.h"
>> +#include "sysemu.h"
>> +#include "console.h"
>> +#include 
>> +#include "device-assignment.h"
>> +
>> +/* From linux/ioport.h */
>> +#define IORESOURCE_IO   0x0100  /* Resource type */
>> +#define IORESOURCE_MEM  0x0200
>> +#define IORESOURCE_IRQ  0x0400
>> +#define IORESOURCE_DMA  0x0800
>> +#define IORESOURCE_PREFETCH 0x1000  /* No side effects */ +
>> +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
>> +
>> +#ifdef DEVICE_ASSIGNMENT_DEBUG
>> +#define DEBUG(fmt, args...)   \
>> +do {  \
>> +  fprintf(stderr, "%s: " fmt, __func__ , ## args);\ +}
>> while (0) +#else
>> +#define DEBUG(fmt, args...) do { } while(0)
>> +#endif
>> +
>> +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
>> +   uint32_t value) +{
>> +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
>> +uint32_t r_pio = (unsigned long)r_access->r_virtbase
>> ++ (addr - r_access->e_physbase);
>> +
>> +DEBUG(stderr, "%s: r_pio=%08x e_physbase=%08x"
>> +  " r_virtbase=%08lx value=%08x\n",
>> +  __func__, r_pio, (int)r_access->e_physbase,
>> +  (unsigned long)r_access->r_virtbase, value); +   
>> outb(value, r_pio); +}
>> +
>> +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
>> +   uint32_t value) +{
>> +AssignedDevRe

Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Eric W. Biederman
Eduardo Habkost <[EMAIL PROTECTED]> writes:

> On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote:
>> Eric W. Biederman wrote:
> 
> Is it possible to disable vmx mode before we enable interrrupts in the
> kdump kernel?
>
>   
 You need IPIs to disable vmx on smp.
 
>>>
>>> Thank you.  Reading your description and taking a quick look at
>>> the code in hardware disable it does not appear that there is
>>> anything needed (other than restricting ourselves it running
>>> uniprocessor in the kdump kernel) that needs to happen.
>>>
>>> Certainly it would be nice to have kvm disabled in hardware,
>>> but if you are proposing using the existing hardware disable
>>> I must say that the cure looks much worse than the disease.
>>>   
>>
>> Certainly you don't want to issue IPIs when kdump()ing.  It's not  
>> unlikely that the other cpus have interrupts permanently disabled.
>>
>> (we can use NMI IPIs, but that will likely be messy)
>
> NMI IPIs are already used on x86 native_machine_crash_shutdown(), so
> it wouldn't get more messy that it is currently. We just need to add
> another bit of code to the code that already runs on an NMI handler.

Yes.  And handling of those NMIs is best effort.  Nothing fails if
they don't actually run.

> My question is: is a notifier chain too much complexity for a sensible
> piece of code like that? If so, a compile-time hook on that point
> would be safer, but then it wouldn't work when KVM is compiled as a
> out-of-tree module.

Well we could fairly easily have a non-modular function that does.
if (vmx_present && vmx_enabled) {
   turn_off_vmx();
}

Which at first skim looks like it is all of about 10-20 machine
instructions.

There are a few real places where we need code on the kdump
path because there it is not possible to do the work any
other way.  However we need to think long and hard about
that because placing the code anywhere besides in a broken
and failing kernel is going to be easier to maintain and
more reliable.

I oppose an atomic notifier because it makes the review
essentially impossible.  If any module can come in and register
a notifier we can't know what code is running on that code
path and we can't be certain the code is safe in an abnormal
case to run on that code path.

Right now we only need to support vmx on the kdump path because
of what appears to be a hardware design bug.  Enabling vmx
apparently disables standard functions like an INIT IPI.  Things
like this do happen but they should be rare.

> Good point. My problem was a hang when booting the kdump kernel, but it
> may also cause problems later, when the kdump kernel reboots.

What was the cause of the hang when booting the kdump kernel?

Eric

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-26 Thread Su, Disheng
Amit Shah wrote:
> This patch has been contributed to by the following people:
>
> From: Or Sagi <[EMAIL PROTECTED]>
> From: Nir Peleg <[EMAIL PROTECTED]>
> From: Amit Shah <[EMAIL PROTECTED]>
> From: Ben-Ami Yassour <[EMAIL PROTECTED]>
> From: Weidong Han <[EMAIL PROTECTED]>
> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>
> With this patch, we can assign a device on the host machine to a
> guest.
>
> A new command-line option, -pcidevice is added.
> To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use this:
>
> -pcidevice host=04:08.0
>
> * The host driver for the device, if any, is to be removed before
> assigning the device (else device assignment will fail).
>
> * A device that shares IRQ with another host device cannot currently
> be assigned.
>
> * The RAW_IO capability is needed for this to work
>
> This works only with the in-kernel irqchip method; to use the
> userspace irqchip, a kernel module (irqhook) and some extra changes
> are needed.
>
> Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
> ---
>  qemu/Makefile.target|1 +
>  qemu/hw/device-assignment.c |  619
>  +++
>  qemu/hw/device-assignment.h |   98 +++ qemu/hw/pc.c
>  |6 + qemu/hw/pci.c   |7 +
>  qemu/vl.c   |   18 ++
>  6 files changed, 749 insertions(+), 0 deletions(-)
>  create mode 100644 qemu/hw/device-assignment.c
>  create mode 100644 qemu/hw/device-assignment.h
>
> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> index d9bdeca..05a1d84 100644
> --- a/qemu/Makefile.target
> +++ b/qemu/Makefile.target
> @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
>  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
>  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> +OBJS+= device-assignment.o
>  ifeq ($(USE_KVM_PIT), 1)
>  OBJS+= i8254-kvm.o
>  endif
> diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
> new file mode 100644
> index 000..5ba21a0
> --- /dev/null
> +++ b/qemu/hw/device-assignment.c
> @@ -0,0 +1,619 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> modify it + * under the terms and conditions of the GNU General
> Public License, + * version 2, as published by the Free Software
> Foundation. + *
> + * This program is distributed in the hope it will be useful, but
> WITHOUT + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> General Public License for + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with + * this program; if not, write to the Free Software
> Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA
> 02111-1307 USA. + *
> + *
> + *  Assign a PCI device from the host to a guest VM.
> + *
> + *  Adapted for KVM by Qumranet.
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED])
> + *  Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED])
> + *  Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED])
> + *  Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED])
> + */
> +#include 
> +#include 
> +#include "qemu-kvm.h"
> +#include "hw.h"
> +#include "pc.h"
> +#include "sysemu.h"
> +#include "console.h"
> +#include 
> +#include "device-assignment.h"
> +
> +/* From linux/ioport.h */
> +#define IORESOURCE_IO   0x0100  /* Resource type */
> +#define IORESOURCE_MEM  0x0200
> +#define IORESOURCE_IRQ  0x0400
> +#define IORESOURCE_DMA  0x0800
> +#define IORESOURCE_PREFETCH 0x1000  /* No side effects */
> +
> +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
> +
> +#ifdef DEVICE_ASSIGNMENT_DEBUG
> +#define DEBUG(fmt, args...)   \
> +do {  \
> +  fprintf(stderr, "%s: " fmt, __func__ , ## args);\
> +} while (0)
> +#else
> +#define DEBUG(fmt, args...) do { } while(0)
> +#endif
> +
> +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
> +   uint32_t value)
> +{
> +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +uint32_t r_pio = (unsigned long)r_access->r_virtbase
> ++ (addr - r_access->e_physbase);
> +
> +DEBUG(stderr, "%s: r_pio=%08x e_physbase=%08x"
> +  " r_virtbase=%08lx value=%08x\n",
> +  __func__, r_pio, (int)r_access->e_physbase,
> +  (unsigned long)r_access->r_virtbase, value);
> +outb(value, r_pio);
> +}
> +
> +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
> +   uint32_t value)
> +{
> +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +uint32_t r_pio = (unsigned long)r_access->r_virtbase
> ++ (addr - r_access

Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Eric W. Biederman
Avi Kivity <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>>> so reboots don't
>>> work.  It also assigns some memory to the cpu; if the new kernel isn't aware
> of
>>> it,
>>
>> Not a problem for a kdump kernel, as it lives out of a reserved region
>> of memory.
>>
>
> But it is a problem for general kexec.

Agreed.  We certainly want to cleanly disable vmx on a normal
kexec reboot.

 Is it possible to disable vmx mode before we enable interrrupts in the
 kdump kernel?


>>> You need IPIs to disable vmx on smp.
>>>
>>
>> Thank you.  Reading your description and taking a quick look at
>> the code in hardware disable it does not appear that there is
>> anything needed (other than restricting ourselves it running
>> uniprocessor in the kdump kernel) that needs to happen.
>>
>> Certainly it would be nice to have kvm disabled in hardware,
>> but if you are proposing using the existing hardware disable
>> I must say that the cure looks much worse than the disease.
>>
>
> Certainly you don't want to issue IPIs when kdump()ing.  It's not unlikely 
> that
> the other cpus have interrupts permanently disabled.
>
> (we can use NMI IPIs, but that will likely be messy)
>
>> It looks like the disable function is all of about 20 assembly
>> instructions so I would not have a problem if he had a
>> little inline function we could call that test to see if
>> vmx is enabled and disable it in the case of kexec on panic.
>>
>> The normal polite shutdown.  That just looks like asking for trouble.
>>
>
> But what happens when the kdump kernel reboots?  If it is uniprocessor, it 
> will
> never have a chance to disable vmx on other cpus.  Using acpi reset (now
> default) works around this on some machines, but not all.

Mostly likely any reboot path will trigger software to toggle the
reset line on the board.  At least that has been my experience, and
the reason we don't see many problems when we fail to properly
shutdown devices before reboot.  If vmx persists across that it would
seem to be very broken by design.

In any case if reboot fails after a kdump that is a minor thing.  Someone
can always push the power button or the equivalent.

My objections are:  on_each_cpu called from after a panic looks like
a good way to loose control of the box and never return.  Adding
a notifier looks like a good way too add functionality onto the
kdump path that never gets reviewed for robustness after a kernel
crash and thus a good way to remove the usefulness of the kexec
on panic kernel path.

Eric
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Eduardo Habkost
On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote:
> Eric W. Biederman wrote:

 Is it possible to disable vmx mode before we enable interrrupts in the
 kdump kernel?

   
>>> You need IPIs to disable vmx on smp.
>>> 
>>
>> Thank you.  Reading your description and taking a quick look at
>> the code in hardware disable it does not appear that there is
>> anything needed (other than restricting ourselves it running
>> uniprocessor in the kdump kernel) that needs to happen.
>>
>> Certainly it would be nice to have kvm disabled in hardware,
>> but if you are proposing using the existing hardware disable
>> I must say that the cure looks much worse than the disease.
>>   
>
> Certainly you don't want to issue IPIs when kdump()ing.  It's not  
> unlikely that the other cpus have interrupts permanently disabled.
>
> (we can use NMI IPIs, but that will likely be messy)

NMI IPIs are already used on x86 native_machine_crash_shutdown(), so
it wouldn't get more messy that it is currently. We just need to add
another bit of code to the code that already runs on an NMI handler.

My question is: is a notifier chain too much complexity for a sensible
piece of code like that? If so, a compile-time hook on that point
would be safer, but then it wouldn't work when KVM is compiled as a
out-of-tree module.


>
>> It looks like the disable function is all of about 20 assembly
>> instructions so I would not have a problem if he had a
>> little inline function we could call that test to see if
>> vmx is enabled and disable it in the case of kexec on panic.
>>
>> The normal polite shutdown.  That just looks like asking for trouble.
>>   
>
> But what happens when the kdump kernel reboots?  If it is uniprocessor,  
> it will never have a chance to disable vmx on other cpus.  Using acpi  
> reset (now default) works around this on some machines, but not all.

Good point. My problem was a hang when booting the kdump kernel, but it
may also cause problems later, when the kdump kernel reboots.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2198658 ] USB Passthrough is broken in the latest code

2008-10-26 Thread SourceForge.net
Bugs item #2198658, was opened at 2008-10-26 20:06
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2198658&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jamie Kirkpatrick (gringostarr)
Assigned to: Nobody/Anonymous (nobody)
Summary: USB Passthrough is broken in the latest code

Initial Comment:
I found that in the latest source I got the error message: "Could not add USB 
device ${device_id}" when trying to use my printer with an ubuntu guest.  This 
previously worked until I updated, built and installed the latest kvm-userspace 
source last week to the latest Git source.

After an hour or so digging around on the web I found out that this seems to be 
a known issue with the current qemu source, which was recently integrated into 
the Git trunk.  A patch that fixed the issue against the KVM-77 sources can be 
found here: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/156085.

My host is Debian Lenny running a 2.6.27 rc kernel..

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2198658&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: external module: test for EXT_CONFIG_KVM_TRACE instead of CONFIG_KVM_TRACE

2008-10-26 Thread Avi Kivity

Eduardo Habkost wrote:

This patch complements 800e7a37ca9ace991a6a36a0046551ec30ec3b9c
(kvm: external module: adjust KVM_TRACE support to account for host kernel).

The config variable defined on config.kbuild was renamed to
EXT_CONFIG_KVM_TRACE, but the check for the configure setting on
kernel/Makefile was not changed to the new name, making the build system
try to compile kvm_trace.c without -DEXT_CONFIG_KVM_TRACE=y, if
building using --with-kvm-trace.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] kvm: bios: Update e820 table for EPT real mode pagetable

2008-10-26 Thread Avi Kivity

Sheng Yang wrote:

On Friday 17 October 2008 15:17:52 Sheng Yang wrote:
  

I remembered I had sent this long long ago, but happened to find it missing
in upstream...

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>



Avi?
  


Thanks for persisting; applied.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Avi Kivity

Eric W. Biederman wrote:

so reboots don't
work.  It also assigns some memory to the cpu; if the new kernel isn't aware of
it, 



Not a problem for a kdump kernel, as it lives out of a reserved region
of memory.
  


But it is a problem for general kexec.


Is it possible to disable vmx mode before we enable interrrupts in the
kdump kernel?

  

You need IPIs to disable vmx on smp.



Thank you.  Reading your description and taking a quick look at
the code in hardware disable it does not appear that there is
anything needed (other than restricting ourselves it running
uniprocessor in the kdump kernel) that needs to happen.

Certainly it would be nice to have kvm disabled in hardware,
but if you are proposing using the existing hardware disable
I must say that the cure looks much worse than the disease.
  


Certainly you don't want to issue IPIs when kdump()ing.  It's not 
unlikely that the other cpus have interrupts permanently disabled.


(we can use NMI IPIs, but that will likely be messy)


It looks like the disable function is all of about 20 assembly
instructions so I would not have a problem if he had a
little inline function we could call that test to see if
vmx is enabled and disable it in the case of kexec on panic.

The normal polite shutdown.  That just looks like asking for trouble.
  


But what happens when the kdump kernel reboots?  If it is uniprocessor, 
it will never have a chance to disable vmx on other cpus.  Using acpi 
reset (now default) works around this on some machines, but not all.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] Add S3 state to DSDT. Handle resume event in the BIOS.

2008-10-26 Thread Avi Kivity

Gleb Natapov wrote:

Signed-off-by: Gleb Natapov <[EMAIL PROTECTED]>
---

 bios/acpi-dsdt.dsl |   33 ++-
 bios/rombios.c |   35 +
 bios/rombios32.c   |   74 
 3 files changed, 130 insertions(+), 12 deletions(-)

diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index 577b3fe..02f53a1 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -65,6 +65,32 @@ DefinitionBlock (
gen_processor(14, E)
 }
 
+/*

+ * S3 (suspend-to-ram), S4 (suspend-to-disc) and S5 (power-off) type codes:
+ * must match piix4 emulation.
+ */
+Name (\_S3, Package (0x04)
+{
+0x01,  /* PM1a_CNT.SLP_TYP */
+0x01,  /* PM1b_CNT.SLP_TYP */
+Zero,  /* reserved */
+Zero   /* reserved */
+})
+Name (\_S4, Package (0x04)
+{
+Zero,  /* PM1a_CNT.SLP_TYP */
+Zero,  /* PM1b_CNT.SLP_TYP */
+Zero,  /* reserved */
+Zero   /* reserved */
+})
+Name (\_S5, Package (0x04)
+{
+Zero,  /* PM1a_CNT.SLP_TYP */
+Zero,  /* PM1b_CNT.SLP_TYP */
+Zero,  /* reserved */
+Zero   /* reserved */
+})
+
 Scope (\)
 {
 /* Debug Output */
@@ -626,13 +652,6 @@ DefinitionBlock (
 }
 }
 
-/* S5 = power off state */

-Name (_S5, Package (4) {
-0x00, // PM1a_CNT.SLP_TYP
-0x00, // PM2a_CNT.SLP_TYP
-0x00, // reserved
-0x00, // reserved
-})
 Scope (\_GPE)
 {
  

Any reason for the code movement?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: Fix kvm_free_physmem memory leak.

2008-10-26 Thread Avi Kivity

François Diakhate wrote:

Make sure that kvm_free_physmem actually frees memory
when a memory slot is not user allocated.

@@ -4195,7 +4195,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
memslot->userspace_addr = userspace_addr;
spin_unlock(&kvm->mmu_lock);
} else {
-   if (!old.user_alloc && old.rmap) {
+   if (!old.user_alloc && old.rmap && current->mm) {
int ret;

  


What's the purpose of this?


diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a87f45e..b0d7435 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -639,8 +639,17 @@ void kvm_free_physmem(struct kvm *kvm)
 {
int i;

-   for (i = 0; i < kvm->nmemslots; ++i)
+   for (i = 0; i < kvm->nmemslots; ++i) {
+   struct kvm_userspace_memory_region mem = {
+   .slot = i,
+   .guest_phys_addr = kvm->memslots[i].base_gfn << 
PAGE_SHIFT,
+   .memory_size = 0,
+   .flags = 0,
+   };
+
+   kvm_set_memory_region(kvm, &mem, kvm->memslots[i].user_alloc);
kvm_free_physmem_slot(&kvm->memslots[i], NULL);
+   }
 }
  


Better to fix kvm_free_physmem_slot() if it doesn't handle 
!user_allocated memory properly.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Eric W. Biederman
Avi Kivity <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Why do we need to disable vmx mode before booting a normal linux kernel?
>>
>
> vmx mode blocks INIT (even on the host; not just on the guests) 
*blink* broken hardware design there.

> so reboots don't
> work.  It also assigns some memory to the cpu; if the new kernel isn't aware 
> of
> it, 

Not a problem for a kdump kernel, as it lives out of a reserved region
of memory.

> the cpu and the kernel would both think it belongs to them.  Finally, if vmx
> mode is enabled, you can't start kvm on the new kernel.

This isn't especially interesting in the crash dump scenario.

>> Is it possible to disable vmx mode before we enable interrrupts in the
>> kdump kernel?
>>
>
> You need IPIs to disable vmx on smp.

Thank you.  Reading your description and taking a quick look at
the code in hardware disable it does not appear that there is
anything needed (other than restricting ourselves it running
uniprocessor in the kdump kernel) that needs to happen.

Certainly it would be nice to have kvm disabled in hardware,
but if you are proposing using the existing hardware disable
I must say that the cure looks much worse than the disease.

It looks like the disable function is all of about 20 assembly
instructions so I would not have a problem if he had a
little inline function we could call that test to see if
vmx is enabled and disable it in the case of kexec on panic.

The normal polite shutdown.  That just looks like asking for trouble.

Eric
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND][PATCH] KVM: VMX: Report VNMI emulation

2008-10-26 Thread Avi Kivity

Jan Kiszka wrote:

In case we ever have to debug possibly NMI-related issues of the guest,
it may help to correlate them with the VNMI emulation for older VMX
CPUs.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |4 
 1 file changed, 4 insertions(+)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3671,6 +3671,10 @@ static int __init vmx_init(void)
 
 	ept_sync_global();
 
+	if (!cpu_has_virtual_nmis())

+   printk(KERN_WARNING
+  "kvm: emulating NMI window via interrupt window\n");
+

  


This is best done via /proc/cpuinfo... (vnmi or vmx_vnmi) what's the 
status on those patches?



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] virtio_net: Fix leaked netdev->refcnt

2008-10-26 Thread Avi Kivity

Mark McLoughlin wrote:

If after receiving some packets and refilling the queue with buffers,
we detect that more packets are available then we re-schedule the
queue and process them.

This re-scheduling - i.e. calling __netif_rx_schedule() - causes a
netdev reference to be taken.

Once we've finally run out of buffers to process, we return zero and
net_rx_action() drops the reference taken by the original call to
_netif_rx_schedule() in e.g. skb_recv_done().

The reference taken by re-scheduling is always leaked, leading to:

  waiting for eth0 to become free. Usage count = 132568

Fix by immediately dropping the extra reference taken.
  


Applied all three, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvmtrace: Allow user to clean up stale trace setup

2008-10-26 Thread Avi Kivity

Eduardo Habkost wrote:

If kvmtrace crashes or gets killed while collecting trace data, stale
trace setup information may be enabled on the kernel, and currently
there is no way to force the previous trace setup to be overwritten
or disabled.

The following two patches against kvm-userspace will add a new
command-line option to kvmtrace (-f), that will allow the user to forcibly
clean up an existing trace setup before enabling trace.
  


The long term plan is to drop kvmtrace in favor of the more generic 
frameworks (which provide a common trace transport, IIRC).  Can you 
check if this has been merged (or if enough of it is in mainline)?


This is exactly the sort of issues that are best handled by a generic 
implementation.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] qemu: piix: Introduce functions to get pin number from irq and vice versa

2008-10-26 Thread Avi Kivity

Amit Shah wrote:
 
+int piix3_get_pin(int pic_irq)

+{
+int i;
+for (i = 0; i < 4; i++)
+if (piix3_dev->config[0x60+i] == pic_irq)
+return i;
+return -1;
+}
  


What happens if two pci interrupts are routed to one irq line?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices

2008-10-26 Thread Avi Kivity

Amit Shah wrote:
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT

+int kvm_assign_pci_device(kvm_context_t kvm,
+ struct kvm_assigned_pci_dev *assigned_dev)
+{
+   return ioctl(kvm->vm_fd, KVM_ASSIGN_PCI_DEVICE, assigned_dev);
  


Convert -1s to -errno, to avoid problems with errno being overwritten later.


+}
+
+int kvm_assign_irq(kvm_context_t kvm,
+  struct kvm_assigned_irq *assigned_irq)
+{
+   return ioctl(kvm->vm_fd, KVM_ASSIGN_IRQ, assigned_irq);
+}
+#endif
  



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-26 Thread Avi Kivity

Anthony Liguori wrote:


I don't think having a kvm_enabled() check here is very useful.  I 
think device-assignment.c should be conditional on USE_KVM, and the 
only kvm_enabled() check should be when creating the initial device 
assignment.  Practically speaking, QEMU is never going to support 
device assignment outside of the context of KVM because I strongly 
doubt anything like irqhook will make it upstream.


Userspace interrupt handlers are actually possible with MSI; we should 
see if uio is open to adding support for that.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Avi Kivity

Eric W. Biederman wrote:

Why do we need to disable vmx mode before booting a normal linux kernel?
  


vmx mode blocks INIT (even on the host; not just on the guests) so 
reboots don't work.  It also assigns some memory to the cpu; if the new 
kernel isn't aware of it, the cpu and the kernel would both think it 
belongs to them.  Finally, if vmx mode is enabled, you can't start kvm 
on the new kernel.



Is it possible to disable vmx mode before we enable interrrupts in the
kdump kernel?
  


You need IPIs to disable vmx on smp.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-26 Thread Avi Kivity

Eduardo Habkost wrote:

Considering they touch both KVM and kexec, which tree would be best way
to get them in?

  


kvm.git will be happy to hold these patches, provided they are acked by 
the relevant maintainer.



(Avi: the patches were sent only to kexec and kvm mailing lists,
initially. If it's better to submit them to your address also so it gets
on your queue, please let me know)
  


Please do copy me.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/3] KVM: MMU: prepopulate the shadow on invlpg

2008-10-26 Thread Avi Kivity

Marcelo Tosatti wrote:

If the guest executes invlpg on a non present entry, peek into th
pagetable and attempt to prepopulate the shadow entry.

Also stop dirty fault updates from interfering with the fork dete

For RHEL3 and 32-bit Vista guests the success rate is high. With Win2003
there is a 1:2 success/fail ratio, but even then compilation benchmarks
are slightly faster. 2000 and XP rarely execute invlpg on non present
entries.

2% improvement on RHEL3/AIM7.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2365,7 +2365,8 @@ static void kvm_mmu_access_page(struct k
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,

-  const u8 *new, int bytes)
+  const u8 *new, int bytes,
+  bool speculative)
  


kvm_mmu_pte_write()s are always speculative.  Maybe this is misnamed?

Perhaps 'guest_initiated' (with the opposite meaning).


@@ -222,7 +223,7 @@ walk:
if (ret)
goto walk;
pte |= PT_DIRTY_MASK;
-   kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
+   kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte), 1);
  


This is definitely not a speculative write.  But it is !guest_initiated.


@@ -467,10 +468,18 @@ static int FNAME(shadow_invlpg_entry)(st
  struct kvm_vcpu *vcpu, u64 addr,
  u64 *sptep, int level)
 {
+   struct shadow_walker *sw =
+   container_of(_sw, struct shadow_walker, walker);
 
 	if (level == PT_PAGE_TABLE_LEVEL) {

-   if (is_shadow_present_pte(*sptep))
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
  


blank line after declarations.  Or move to head of function.


+   sw->pte_gpa = (sp->gfn << PAGE_SHIFT);
+   sw->pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
+
+   if (is_shadow_present_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
+   sw->pte_gpa = -1;
  


Why?  The pte could have heen replaced (for example, a write access to a 
cow page).



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/3] KVM: MMU: skip global pgtables on sync due to cr3 switch

2008-10-26 Thread Avi Kivity

Marcelo Tosatti wrote:

Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
important for Linux 32-bit guests with PAE, where the kmap page is
marked as global.

  


Patch is good, but won't apply without the first.


 {
u64 spte;
int ret = 0;
u64 mt_mask = shadow_mt_mask;
+   struct kvm_mmu_page *sp = page_header(__pa(shadow_pte));
+
+   if (!global && sp->global) {
+   sp->global = 0;
  


A slight deficiency in this approach is that a page can't transition 
from !global to global.  I don't think this is frequent, so we don't 
need to deal with it.  But a more accurate approach is to keep a count 
of non-global present mappings, and to activate the global logic when 
this count is nonzero.




--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/3] KVM: MMU: collapse remote TLB flushes on root sync

2008-10-26 Thread Avi Kivity

Marcelo Tosatti wrote:

Instead of flushing remote TLB's at every page resync, do an initial
pass to write protect the sptes, collapsing the flushes on a single
remote TLB invalidation.

kernbench is 2.3% faster on 4-way guest. Improvements have been seen
with other loads such as AIM7.

Avi: feel free to change this if you dislike the style (I do, but can't
think of anything nicer).

 static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
struct sync_walker walker = {
-   .walker = { .entry = mmu_sync_fn, },
+   .walker = { .entry = mmu_wprotect_fn,
+   .clear_unsync = false, },
.vcpu = vcpu,
+   .write_protected = false
};
 
+	/* collapse the TLB flushes as an optimization */

+   mmu_unsync_walk(sp, &walker.walker);
+   if (walker.write_protected)
+   kvm_flush_remote_tlbs(vcpu->kvm);
+
+   walker.walker.entry = mmu_sync_fn;
+   walker.walker.clear_unsync = true;
+
while (mmu_unsync_walk(sp, &walker.walker))
cond_resched_lock(&vcpu->kvm->mmu_lock);
  


We're always doing two passes here, which is a bit sad.  How about 
having a single pass which:


- collects unsync pages into an array
- exits on no more unsync pages or max array size reached

Then, iterate over the array:

- write protect all pages
- flush tlb
- sync pages

Loop until the root is synced.

If the number of pages to sync is typically small, and the array is 
sized to be larger than this, then we only walk the pagetables once.


btw, our walkers are a bit awkward (though still better than what we had 
before).  If we rewrite them into for_each style iterators, the code 
could become cleaner and shorter.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html