date:20121017

PCI device not properly reset after VFIO

2012-10-17 Thread Hannes Reinecke


Hi Alex,

I've been playing around with VFIO and megasas (of course).
What I did now was switching between VFIO and 'normal' operation, ie 
emulated access.


megasas is happily running under VFIO, but when I do an emergency 
stop like killing the Qemu session the PCI device is not properly reset.

IE when I load 'megaraid_sas' after unbinding the vfio_pci module
the driver cannot initialize the card and waits forever for the 
firmware state to change.


I need to do a proper pci reset via
echo 1 > /sys/bus/pci/device//reset
to get it into a working state again.

Looking at vfio_pci_disable() pci reset is called before the config 
state and BARs are restored.
Seeing that vfio_pci_enable() calls pci reset right at the start, 
too, before modifying anything I do wonder whether the pci reset is 
at the correct location for disable.


I would have expected to call pci reset in vfio_pci_disable() 
_after_ we have restored the configuration, to ensure a sane state 
after reset.

And, as experience show, we do need to call it there.

So what is the rationale for the pci reset?
Can we move it to the end of vfio_pci_disable() or do we need to 
call pci reset twice?


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost-blk: Add vhost-blk support v2

2012-10-17 Thread Michael S. Tsirkin

On Thu, Oct 18, 2012 at 02:50:56PM +1030, Rusty Russell wrote:
> Asias He  writes:
> >>> +#define BLK_HDR  0
> >> 
> >> What's this for, exactly? Please add a comment.
> >
> > The block headr is in the first and separate buffer.
> 
> Please don't assume this!  We're trying to fix all the assumptions in
> qemu at the moment.
> 
> vhost_net handles this correctly, taking bytes off the descriptor chain
> as required.
> 
> Thanks,
> Rusty.

BTW are we agreed on the spec update that makes cmd 32 bytes?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: How to do fast accesses to LAPIC TPR under kvm?

2012-10-17 Thread Jan Kiszka

On 2012-10-17 21:24, Stefan Fritsch wrote:
> Hi,
> 
> OpenBSD/i386 seems to be one of the few operating systems that still 
> uses the LAPIC taks priority register for interrupt handling. On AMD 

Yeah, only very special OSes do this... ;)

> CPUs and on older Intel CPUs without the flexpriority feature, this 
> causes a huge performance impact on kvm. I have seen slowdown by a 
> factor of 10.
> 
> Is there a way to use the TPR under kvm without the slowdown? There 
> are some MSRs inherited from Hyper-V, but using these does not make 
> that much difference. I think this is because they still cause an 
> vmexit for every TPR access. I expect the the same is true for x2apic 
> emulation, isn't it?

Didn't study the HyperV interface yet, but the trick is indeed to avoid
as many vmexits as possible, specifically when lowering the TPR value
has no effect as no interrupts are pending.

> 
> There is also the kvmvapic, but kvm does not expose a sane interface 
> to it and only uses it for Windows XP specific binary patching.

The kvmvapic is not a classic paravirtual interface in that it does not
really require guest OS awareness. But it requires the guest to accept
being patched. That's the case for certain Windows versions. Also, the
option ROMs, including our kvmvapic "ROM", have to be mapped at fixed,
accessible addresses to allow jumping to it from a patched TPR instructions.

Therefore, we limited the patching to known OS versions, avoiding to
mess around with other, untested OSes. However, it may be possible to
accept OpenBSD as well by adjusting the tests in kvmvapic and possibly
adjusting some other details.

> 
> Another possibility is TPR access via CR8 on AMD, but the missing 
> cr8_legacy CPUID bit and this discussion [1] make me believe that this 
> is not supported under kvm, at least in 32bit mode. Could this be 
> easily fixed? If yes, would it solve the performance problems, i.e. 
> offer performance comparable to Intel's flexpriority feature?

Everything that unconditionally traps, and so do CR8 accesses, does not
help.

> 
> OpenBSD seems to be reluctant to stop using the TPR. In fact, in a 
> recent discussion, there has been a suggestion that OpenBSD should 
> switch to using TPR also on OpenBSD/amd64 to solve some problems with 
> boot interrupts. How do you expect this would affect performance under 
> kvm (if using CR8)?
> 
> Or do you have any other suggestions? One could also modify kvm to 
> expose a real interface to kvmvapic, e.g. allow the guest OS to 
> provide the virtual address of the option rom and the offset of the 
> CPU number in the %fs segment, instead of using hard coded values for 
> Windows XP.

Of course, though all we need is a stable address in fact. See
vapic_write() for the existing PV interface (between option ROM and
hypervisor so far). We can extend it as long as it is compatible with
the existing one.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost-blk: Add vhost-blk support v2

2012-10-17 Thread Rusty Russell

Asias He  writes:
>>> +#define BLK_HDR0
>> 
>> What's this for, exactly? Please add a comment.
>
> The block headr is in the first and separate buffer.

Please don't assume this!  We're trying to fix all the assumptions in
qemu at the moment.

vhost_net handles this correctly, taking bytes off the descriptor chain
as required.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-17 Thread Zhang Yanfei

于 2012年10月17日 18:16, Avi Kivity 写道:
> On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
>> 于 2012年10月15日 23:43, Avi Kivity 写道:
>>> On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
 Currently, kdump just makes all the logical processors leave VMX operation 
 by
 executing VMXOFF instruction, so any VMCSs active on the logical 
 processors may
 be corrupted. But, sometimes, we need the VMCSs to debug guest images 
 contained
 in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs 
 before
 executing the VMXOFF instruction.
>>>
>>> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>>>
>>
>> I tried some tests, for example, I made copies for every vmcs, and in the 
>> kdump
>> path, I backed up all the loaded vmcs into the copies before vmxoff.
>> After generating the vmcore, I retrieve the vmcss and their copies, and 
>> compare them,
>> no differences.
>>
>> Another test is using VMCLEAR to clear all the loaded vmcs before VMXOFF,
>> and compare the vmcss and their copies, there are indeed differences between 
>> the
>> vmcs and its copy.
>>
>> I know the tests may be not so convincing, for example, I used memcpy to 
>> back up
>> the vmcss and it is an ordinary memory operation. But to ensure the 
>> non-corruption
>> of the vmcss in the vmcore, I think we should VMCLEAR the vmcss before 
>> VMXOFF just
>> as the Intel spec says.
> 
> Sorry, I was unclear -- I was referring to the spec, I wasn't sure
> whether VMXOFF is defined to flush VMCSes or whether it just invalidates
> on-chip caches so that it won't flush them out in the future, corrupting
> memory.  We don't want to depend on actual behaviour as it may change
> with future version.
> 
> Copying some Intel folk, maybe they can clarify it.
> 

Yes, the Intel spec says "may be" about the VMCS-corruption thing. From
chapter 24.10.1 in Intel® 64 and IA-32 Architectures Software Developer’s
Manual Volume 3C:System Programming Guide, Part 3, there is the description:

"If a logical processor leaves VMX operation, any VMCSs active on that logical
processor may be corrupted (see below). To prevent such corruption of a VMCS 
that
may be used either after a return to VMX operation or on another logical 
processor,
software should VMCLEAR that VMCS before executing the VMXOFF instruction or
removing power from the processor (e.g., as part of a transition to the S3 and 
S4
power states)."

Our purpose is to make sure the VMCSs in the vmcore are updated and 
non-corrupted. So
according to the description above, no matter whether VMXOFF is defined to flush
VMCSs or whether it just invalidates on-chip caches, we'd better VMCLEAR the
VMCSs before executing the VMXOFF.

Thanks
Zhang Yanfei
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH] KVM: ia64: remove unused variable in kvm_release_vm_pages()

2012-10-17 Thread Zhang, Xiantao

Acked-by: Xiantao Zhang

> -Original Message-
> From: Wei Yongjun [mailto:weiyj...@gmail.com]
> Sent: Wednesday, October 17, 2012 11:04 PM
> To: a...@redhat.com; mtosa...@redhat.com; Zhang, Xiantao; Luck, Tony; Yu,
> Fenghua
> Cc: yongjun_...@trendmicro.com.cn; kvm@vger.kernel.org; kvm-
> i...@vger.kernel.org; linux-i...@vger.kernel.org
> Subject: [PATCH] KVM: ia64: remove unused variable in
> kvm_release_vm_pages()
> 
> From: Wei Yongjun 
> 
> The variable base_gfn is initialized but never used otherwise, so remove the
> unused variable.
> 
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
> 
> Signed-off-by: Wei Yongjun 
> ---
>  arch/ia64/kvm/kvm-ia64.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index
> 8b3a9c0..c71acd7 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1362,11 +1362,9 @@ static void kvm_release_vm_pages(struct kvm
> *kvm)
>   struct kvm_memslots *slots;
>   struct kvm_memory_slot *memslot;
>   int j;
> - unsigned long base_gfn;
> 
>   slots = kvm_memslots(kvm);
>   kvm_for_each_memslot(memslot, slots) {
> - base_gfn = memslot->base_gfn;
>   for (j = 0; j < memslot->npages; j++) {
>   if (memslot->rmap[j])
>   put_page((struct page *)memslot->rmap[j]);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Benjamin Herrenschmidt

On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
> 
> With the XICS, there are two types of irqchip: a source controller and
> a presentation controller.  There is one presentation controller per
> vcpu and typically one source controller per PCI host bridge (a source
> controller can manage multiple sources).  The "buid" above is
> basically an identifier for a source controller.
> 
> So with the above, it would be quite easy to add new types and
> arguments for them. 

The only possible issue is that afiak, the ioctl number depends on the
structure size, no ? If it does, then we should add some padding to the
union to leave room for new types.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Paul Mackerras

On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf  wrote:
> > On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> >>
> >> *** warning: this RFC patch series is only compile-tested ***
> >>
> >> We need a way to specify the address at which we expect VMs to access
> >> the interrupt controller (both the emulated distributor and the hardware
> >> interface supporting virtualization).  User space should decide on this
> >> address as user space decides on an emulated board and loads a device
> >> tree describing these details directly to the guest.
> >>
> >> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
> >> ioctl with a a highly device specific set of parameters, we try
> >> something slightly more generic, that should fit well with how user
> >> space (read QEMU) first builds the individual devices and later sets up
> >> the emulated platform.
> >
> >
> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've taken over Ben's patches in this area and I'm currently working
on getting them ready for submission.  So far we only have XICS
emulation, and it is accessed through hypercalls, so there are no
addresses in the create-iochip ioctl argument yet.

What we have so far is a new ioctl:

#define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xac, struct kvm_irqchip_args)

where kvm_irqchip_args is defined in an arch header and currently
looks like this:

/* for KVM_CAP_SPAPR_XICS */
#define __KVM_HAVE_IRQCHIP_ARGS
struct kvm_irqchip_args {
#define KVM_IRQCHIP_TYPE_ICP0   /* XICS: ICP (presentation controller) 
*/
#define KVM_IRQCHIP_TYPE_ICS1   /* XICS: ICS (source controller) */
__u32 type;
union {
/* XICS ICP arguments. This needs to be called once before
 * creating any VCPU to initialize the main kernel XICS data
 * structures.
 */
struct {
#define KVM_ICP_FLAG_NOREALMODE 0x0001 /* Disable real mode ICP */
__u32 flags;
} icp;

/* XICS ICS arguments. You can call this for every BUID you
 * want to make available.
 *
 * The BUID is 12 bits, the interrupt number within a BUID
 * is up to 12 bits as well. The resulting interrupt numbers
 * exposed to the guest are BUID || IRQ which is 24 bit
 *
 * BUID cannot be 0.
 */
struct {
__u32 flags;
__u16 buid;
__u16 nr_irqs;
} ics;
};
};

With the XICS, there are two types of irqchip: a source controller and
a presentation controller.  There is one presentation controller per
vcpu and typically one source controller per PCI host bridge (a source
controller can manage multiple sources).  The "buid" above is
basically an identifier for a source controller.

So with the above, it would be quite easy to add new types and
arguments for them.

Thoughts?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Auld, Will

OK, agreed it is not pretty. 

Thanks,

Will

> -Original Message-
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Wednesday, October 17, 2012 7:09 AM
> To: Avi Kivity
> Cc: Auld, Will; Will Auld; kvm@vger.kernel.org; Zhang, Xiantao; Liu,
> Jinsong
> Subject: Re: [PATCH] Added call parameter to track whether invocation
> originated with guest or elsewhere
> 
> On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
> > On 10/17/2012 04:10 AM, Will Auld wrote:
> > > Signed-off-by: Will Auld 
> > > ---
> > >
> > > Resending to full list
> > >
> > > Marcelo,
> > >
> > > This patch is what I believe you ask for as foundational for later
> > > patches to address IA32_TSC_ADJUST.
> > >
> >
> > Please write a changelog to reflect the motivation.
> >
> > All those bool parameters scattered all over the place aren't very
> > pretty.  Usually we solve this with helpers that embed the parameter
> > name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
> > functions for this to work here.
> >
> > Marcelo, any ideas?
> 
> Its easier to read
> 
> kvm_x86_ops->kvm_set_msr()
> kvm_x86_ops->kvm_set_msr_host()
> 
> then
> 
> kvm_x86_ops->kvm_set_msr(,false)
> kvm_x86_ops->kvm_set_msr(,true)
> 
> So you're right.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Benjamin Herrenschmidt

On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:

> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking
over my interrupt patches.

We initially changes IRQ_CREATE_IRQCHIP to take an argument but that was
causing an x86 ABI breakage (ioctl number changing). So we'll probably
be creating a new one.

>From there, nothing fancy really, just an ioctl with an IRQ chip type at
the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the
horrors related to mapping interrupts that have tendrils all the way
into virtio-pci etc... in kemu that don't apply to use (well mostly) and
the interaction with in-kernel generated interrupts to avoid going
through qemu for vhost ec...

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Christoffer Dall

On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf  wrote:
> On 10/14/2012 02:04 AM, Christoffer Dall wrote:
>>
>> *** warning: this RFC patch series is only compile-tested ***
>>
>> We need a way to specify the address at which we expect VMs to access
>> the interrupt controller (both the emulated distributor and the hardware
>> interface supporting virtualization).  User space should decide on this
>> address as user space decides on an emulated board and loads a device
>> tree describing these details directly to the guest.
>>
>> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
>> ioctl with a a highly device specific set of parameters, we try
>> something slightly more generic, that should fit well with how user
>> space (read QEMU) first builds the individual devices and later sets up
>> the emulated platform.
>
>
> Have you talked to Ben about this one? He wanted to design a new, more
> flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> room for cooperation here?
>
I have not - Ben, what do you have in mind?

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Christoffer Dall

On Wed, Oct 17, 2012 at 4:31 PM, Peter Maydell  wrote:
> On 17 October 2012 21:23, Christoffer Dall
>  wrote:
>> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell  
>> wrote:
 +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the 
 CPU is
 +initally run.
>>>
>>> "initially".
>>
>> thanks a bunch for those, and sorry about the sloppyness.
>
> No problem. Also just noticed "platform" there :-)
>
I'll spell check the diff just to be sure. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Alexander Graf


On 10/14/2012 02:04 AM, Christoffer Dall wrote:

*** warning: this RFC patch series is only compile-tested ***

We need a way to specify the address at which we expect VMs to access
the interrupt controller (both the emulated distributor and the hardware
interface supporting virtualization).  User space should decide on this
address as user space decides on an emulated board and loads a device
tree describing these details directly to the guest.

Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
ioctl with a a highly device specific set of parameters, we try
something slightly more generic, that should fit well with how user
space (read QEMU) first builds the individual devices and later sets up
the emulated platform.


Have you talked to Ben about this one? He wanted to design a new, more 
flexible irqchip API that would work for XICS & MPIC. Maybe there's some 
room for cooperation here?



Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Peter Maydell

On 17 October 2012 21:23, Christoffer Dall
 wrote:
> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell  
> wrote:
>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU 
>>> is
>>> +initally run.
>>
>> "initially".
>
> thanks a bunch for those, and sorry about the sloppyness.

No problem. Also just noticed "platform" there :-)

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2012-10-17 Thread Peter Maydell

On 14 October 2012 01:04, Christoffer Dall
 wrote:
> On ARM (and possibly other architectures) some bits are specific to the
> model being emulated for the guest and user space needs a way to tell
> the kernel about those bits.  An example is mmio device base addresses,
> where KVM must know the base address for a given device to properly
> emulate mmio accesses within a certain address range or directly map a
> device with virtualiation extensions into the guest address space.
>
> We try to make this API slightly more generic than for our specific use,
> but so far only the VGIC uses this feature.
>
> Signed-off-by: Christoffer Dall 
> ---
>  Documentation/virtual/kvm/api.txt |   30 ++
>  arch/arm/include/asm/kvm.h|   13 +
>  arch/arm/include/asm/kvm_mmu.h|1 +
>  arch/arm/include/asm/kvm_vgic.h   |6 ++
>  arch/arm/kvm/arm.c|   31 ++-
>  arch/arm/kvm/vgic.c   |   34 +++---
>  include/linux/kvm.h   |8 
>  7 files changed, 119 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 26e953d..30ddcac 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2118,6 +2118,36 @@ for the emulated platofrm (see 
> KVM_SET_DEVICE_ADDRESS), but before the CPU is
>  initally run.
>
>
> +4.80 KVM_SET_DEVICE_ADDRESS
> +
> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: struct kvm_device_address (in)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENODEV: The device id is unknwown

"unknown"

> +  ENXIO:  Device not supported in configuration

"in this configuration" ? (I'm guessing this is for "you tried to
map a GIC when this CPU doesn't have a GIC" and similar errors?)

> +  E2BIG:  Address outside of guest physical address space

I would say "outside" rather than "outside of" here.

> +
> +struct kvm_device_address {
> +   __u32 id;
> +   __u64 addr;
> +};
> +
> +Specify a device address in the guest's physical address space where guests
> +can access emulated or directly exposed devices, which the host kernel needs
> +to know about. The id field is an architecture specific identifier for a
> +specific device.
> +
> +ARM divides the id field into two parts, a device ID and an address type id

We should be consistent about whether ID is capitalised or not.

> +specific to the individual device.
> +
> +  bits:  | 31...16 | 15...0 |
> +  field: | device id   |  addr type id  |

This doesn't say whether userspace is allowed to make this ioctl
multiple times for the same device. This could be any of:
 * undefined behaviour
 * second call fails with some errno
 * second call overrides first one

It also doesn't say that you're supposed to call this after CREATE
and before INIT of the irqchip. (Nor does it say what happens if
you call it at some other time.)

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Christoffer Dall

On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell  wrote:
> On 14 October 2012 01:04, Christoffer Dall
>  wrote:
>> Used to initialize the in-kernel interrupt controller. On ARM we need to
>> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
>> physicall address space so the guest can access the virtual cpu
>> interface. This must be done after the IRQ chips is create and after a
>> base address has been provided for the emulated platform (patch is
>> following), but before the CPU is initally run.
>
> I've now written the code for that patch but don't have access to a machine
> with the ARM cross compile setup to build it until tomorrow.
>
>>
>> Signed-off-by: Christoffer Dall 
>> ---
>>  Documentation/virtual/kvm/api.txt |   16 
>>  arch/arm/kvm/arm.c|1 +
>>  include/linux/kvm.h   |3 +++
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 25eacc6..26e953d 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are 
>> supported for the
>>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>>
>>
>> +4.79 KVM_INIT_IRQCHIP
>> +
>> +Capability: KVM_CAP_INIT_IRQCHIP
>> +Architectures: arm
>> +Type: vm ioctl
>> +Parameters: none
>> +Returns: 0 on success, -1 on error
>> +
>> +Initialize the in-kernel interrupt controller. On ARM we need to map the
>> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall
>
> Should that "Hyp" be deleted?

yup

>
> "physical"
>
>> +address space so the guest can access the virtual cpu interface. This must 
>> be
>> +done after the IRQ chips is create and after a base address has been 
>> provided
>
> "chip". "created".
>
>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU 
>> is
>> +initally run.
>
> "initially".

thanks a bunch for those, and sorry about the sloppyness.

>
> (all these typos are also in your commit message)
>

yeah, you caught my -ECUTANDPASTE there ;)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Peter Maydell

On 14 October 2012 01:04, Christoffer Dall
 wrote:
> Used to initialize the in-kernel interrupt controller. On ARM we need to
> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
> physicall address space so the guest can access the virtual cpu
> interface. This must be done after the IRQ chips is create and after a
> base address has been provided for the emulated platform (patch is
> following), but before the CPU is initally run.

I've now written the code for that patch but don't have access to a machine
with the ARM cross compile setup to build it until tomorrow.

>
> Signed-off-by: Christoffer Dall 
> ---
>  Documentation/virtual/kvm/api.txt |   16 
>  arch/arm/kvm/arm.c|1 +
>  include/linux/kvm.h   |3 +++
>  3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 25eacc6..26e953d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are 
> supported for the
>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>
>
> +4.79 KVM_INIT_IRQCHIP
> +
> +Capability: KVM_CAP_INIT_IRQCHIP
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +Initialize the in-kernel interrupt controller. On ARM we need to map the
> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall

Should that "Hyp" be deleted?

"physical"

> +address space so the guest can access the virtual cpu interface. This must be
> +done after the IRQ chips is create and after a base address has been provided

"chip". "created".

> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
> +initally run.

"initially".

(all these typos are also in your commit message)

> +
> +
>  5. The kvm_run structure
>  
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f8c377b..85c76e4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> switch (ext) {
>  #ifdef CONFIG_KVM_ARM_VGIC
> case KVM_CAP_IRQCHIP:
> +   case KVM_CAP_INIT_IRQCHIP:
> r = vgic_present;
> break;
>  #endif
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 8091b1d..90ee023 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
>  #ifdef __KVM_HAVE_READONLY_MEM
>  #define KVM_CAP_READONLY_MEM 81
>  #endif
> +#define KVM_CAP_INIT_IRQCHIP 82
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_GET_SMMU_INFO_IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
>  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
>  #define KVM_PPC_ALLOCATE_HTAB_IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_INIT_IRQCHIP */
> +#define KVM_INIT_IRQCHIP _IO(KVMIO,   0xa8)
>
>  /*
>   * ioctls for vcpu fds
> --
> 1.7.9.5
>


-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

How to do fast accesses to LAPIC TPR under kvm?

2012-10-17 Thread Stefan Fritsch

Hi,

OpenBSD/i386 seems to be one of the few operating systems that still 
uses the LAPIC taks priority register for interrupt handling. On AMD 
CPUs and on older Intel CPUs without the flexpriority feature, this 
causes a huge performance impact on kvm. I have seen slowdown by a 
factor of 10.

Is there a way to use the TPR under kvm without the slowdown? There 
are some MSRs inherited from Hyper-V, but using these does not make 
that much difference. I think this is because they still cause an 
vmexit for every TPR access. I expect the the same is true for x2apic 
emulation, isn't it?

There is also the kvmvapic, but kvm does not expose a sane interface 
to it and only uses it for Windows XP specific binary patching.

Another possibility is TPR access via CR8 on AMD, but the missing 
cr8_legacy CPUID bit and this discussion [1] make me believe that this 
is not supported under kvm, at least in 32bit mode. Could this be 
easily fixed? If yes, would it solve the performance problems, i.e. 
offer performance comparable to Intel's flexpriority feature?

OpenBSD seems to be reluctant to stop using the TPR. In fact, in a 
recent discussion, there has been a suggestion that OpenBSD should 
switch to using TPR also on OpenBSD/amd64 to solve some problems with 
boot interrupts. How do you expect this would affect performance under 
kvm (if using CR8)?

Or do you have any other suggestions? One could also modify kvm to 
expose a real interface to kvmvapic, e.g. allow the guest OS to 
provide the virtual address of the option rom and the offset of the 
CPU number in the %fs segment, instead of using hard coded values for 
Windows XP.

Cheers,
Stefan

[1] http://www.mail-archive.com/kvm@vger.kernel.org/msg30627.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Brian Jackson

On Wednesday, October 17, 2012 10:45:14 AM Guido Winkelmann wrote:
> Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
> > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
> > > The commandline, as generated by libvirtd, looks like this:
> > > 
> > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
> > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid
> > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev
> > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,s
> > > erv e
> > > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> > > -no-reboot -no- shutdown -device
> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> > > file=/data/migratetest2_system,if=none,id=drive-virtio-
> > > disk0,format=qcow2,cache=none -device virtio-blk-
> > > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-
> > > disk0,bootindex=1 -drive
> > > file=/data/migratetest2_data-1,if=none,id=drive-
> > > virtio-disk1,format=qcow2,cache=none -device virtio-blk-
> > > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk
> > > 1 - netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device
> > > virtio-net-
> > > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3
> > > -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming
> > > tcp:0.0.0.0:49153 -device
> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 
> > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0.
> > Have you tried other formats or different qemu/kvm versions?
> 
> I tried the same thing with a raw image file instead of qcow2, and the
> problem still happens. From the /var/log/messages of the guest:
> 
> Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP
> server - Server is unavailable
> Oct 17 17:10:39 localhost kernel: [  126.800075] eth0: no IPv6 routers
> present Oct 17 17:10:52 localhost kernel: [  140.335783] Clocksource tsc
> unstable (delta = -70265501 ns)
> Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765
> Oct 17 17:12:04 localhost kernel: [  212.070584] Buffer I/O error on device
> vda1, logical block 1858766
> Oct 17 17:12:04 localhost kernel: [  212.070587] Buffer I/O error on device
> vda1, logical block 1858767
> Oct 17 17:12:04 localhost kernel: [  212.070589] Buffer I/O error on device
> vda1, logical block 1858768
> Oct 17 17:12:04 localhost kernel: [  212.070592] Buffer I/O error on device
> vda1, logical block 1858769
> Oct 17 17:12:04 localhost kernel: [  212.070595] Buffer I/O error on device
> vda1, logical block 1858770
> Oct 17 17:12:04 localhost kernel: [  212.070597] Buffer I/O error on device
> vda1, logical block 1858771
> Oct 17 17:12:04 localhost kernel: [  212.070600] Buffer I/O error on device
> vda1, logical block 1858772
> Oct 17 17:12:04 localhost kernel: [  212.070602] Buffer I/O error on device
> vda1, logical block 1858773
> Oct 17 17:12:04 localhost kernel: [  212.070605] Buffer I/O error on device
> vda1, logical block 1858774
> Oct 17 17:12:04 localhost kernel: [  212.070607] Buffer I/O error on device
> vda1, logical block 1858775
> Oct 17 17:12:04 localhost kernel: [  212.070610] Buffer I/O error on device
> vda1, logical block 1858776
> Oct 17 17:12:04 localhost kernel: [  212.070612] Buffer I/O error on device
> vda1, logical block 1858777
> Oct 17 17:12:04 localhost kernel: [  212.070615] Buffer I/O error on device
> vda1, logical block 1858778
> Oct 17 17:12:04 localhost kernel: [  212.070617] Buffer I/O error on device
> vda1, logical block 1858779
> 
> (I was writing a large file at the time, to make sure I actually catch I/O
> errors as they happen)


What about newer versions of qemu/kvm? But of course if those work, your next 
task is going to be git bisect it or file a bug with your distro that is using 
an ancient version of qemu/kvm.


> 
>   Guido
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Brian Jackson

On Wednesday, October 17, 2012 06:54:00 AM Guido Winkelmann wrote:
> Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
> > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
> [...]
> 
> > > The commandline, as generated by libvirtd, looks like this:
> > > 
> > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
> > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid
> > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev
> > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,s
> > > erv e
> > > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> > > -no-reboot -no- shutdown -device
> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> > > file=/data/migratetest2_system,if=none,id=drive-virtio-
> > > disk0,format=qcow2,cache=none -device virtio-blk-
> > > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-
> > > disk0,bootindex=1 -drive
> > > file=/data/migratetest2_data-1,if=none,id=drive-
> > > virtio-disk1,format=qcow2,cache=none -device virtio-blk-
> > > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk
> > > 1 - netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device
> > > virtio-net-
> > > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3
> > > -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming
> > > tcp:0.0.0.0:49153 -device
> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 
> > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0.
> > Have you tried other formats or different qemu/kvm versions?
> 
> Are you sure about that? Because I'm fairly certain I have been using live
> migration since at least 0.14, if not 0.13, and I have always been using
> qcow2 as the image format for the disks...
> 
> I can still try with other image formats, though.


Yes, see the release notes for 1.0. It may have worked by chance before that, 
but it wasn't guaranteed to work. There was no blacklisting feature then like 
there is now to stop it.


> 
>   Guido
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Alexander Graf


On 10/17/2012 04:50 PM, Avi Kivity wrote:

On 10/16/2012 04:49 PM, Alexander Graf wrote:


If there is a lot of prioritization and/or queuing logic, then yes.  But
what about MSI?  Doesn't that have a direct path?

Nope. Well, yes, in a certain special case where the MPIC pushes the
interrupt vector on interrupt delivery into a special register. But not
for the "normal" case.

Ok.  The patches are fine then, but would be good to add the PIO check.


Yup, will do as a separate patch.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm-unit test behavior

2012-10-17 Thread Avi Kivity

On 10/17/2012 06:08 PM, Conny Seidel wrote:
> Hi,
> 
> 
> we are seeing something strange when running the KVM unit-tests on
> recent KVM and "older" CPUs (K8 Family).
> 


A patch was just applied fixing this; it will be merged upstream in a
few days.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kvm-unit test behavior

2012-10-17 Thread Conny Seidel

Hi,


we are seeing something strange when running the KVM unit-tests on
recent KVM and "older" CPUs (K8 Family).

[ cut here ]
WARNING: at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1325 
kvm_release_pfn_clean+0x5b/0x60 [kvm]()
Hardware name: WARTHOG
Modules linked in: tun nfsv4 auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc 
bridge stp llc ipv6 amd8111e mii powernow_k8 freq_table kvm_amd kvm serio_raw 
pcspkr k8temp amd64_edac_mod edac_core edac_mce_amd i2c_amd756 amd_rng 
i2c_amd8111 sg shpchp ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom sata_sil 
ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm i2c_algo_bit 
i2c_core dm_mirror dm_region_hash dm_log dm_mod
Pid: 2084, comm: qemu-kvm Not tainted 3.6.0.20121010_ecefbd9-1.el6.osrc.x86_64 
#1
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] kvm_release_pfn_clean+0x5b/0x60 [kvm]
 [] paging64_fetch+0x1eb/0x370 [kvm]
 [] ? __gfn_to_pfn+0x6f/0x80 [kvm]
 [] ? gfn_to_pfn_async+0x1a/0x20 [kvm]
 [] ? try_async_pf+0x4b/0x1f0 [kvm]
 [] paging64_page_fault+0x293/0x2d0 [kvm]
 [] ? kfree+0x2c/0x120
 [] kvm_mmu_page_fault+0x27/0xd0 [kvm]
 [] pf_interception+0xa4/0x170 [kvm_amd]
 [] handle_exit+0x146/0x2d0 [kvm_amd]
 [] ? kvm_get_cr8+0x1d/0x30 [kvm]
 [] ? svm_vcpu_run+0x425/0x530 [kvm_amd]
 [] vcpu_enter_guest+0x39c/0x6b0 [kvm]
 [] __vcpu_run+0x1e8/0x320 [kvm]
 [] kvm_arch_vcpu_ioctl_run+0x9a/0x1f0 [kvm]
 [] kvm_vcpu_ioctl+0x4a8/0x590 [kvm]
 [] do_vfs_ioctl+0x8c/0x340
 [] sys_ioctl+0xa1/0xb0
 [] ? __audit_syscall_exit+0x3d6/0x430
 [] system_call_fastpath+0x16/0x1b
---[ end trace bc3b9055849b3814 ]---

The failing tests are svm and svm-disable, which seem to loop forever
once started.

Begin logfile:
 enabling apic
 enabling apic
 paging enabled
 cr0 = 80010011
 cr3 = 7fff000
 cr4 = 20
 null: PASS
 vmrun: PASS
 vmrun intercept check: PASS
 cr3 read intercept: PASS
 enabling apic
 enabling apic
 paging enabled
 cr0 = 80010011
 cr3 = 7fff000
 cr4 = 20
 null: PASS
 vmrun: PASS
 vmrun intercept check: PASS
 cr3 read intercept: PASS
  # goes on until the test is killed.

Anyone seen this behavior?

--
Kind regards.

Conny Seidel

##
# Email : conny.sei...@amd.comGnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach  #
# General Managers: Alberto Bozzo#
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#   HRB Nr. 43632#
##


signature.asc
Description: PGP signature

Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Guido Winkelmann

Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
> On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
> > The commandline, as generated by libvirtd, looks like this:
> > 
> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
> > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid
> > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,serv
> > e
> > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> > -no-reboot -no- shutdown -device
> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> > file=/data/migratetest2_system,if=none,id=drive-virtio-
> > disk0,format=qcow2,cache=none -device virtio-blk-
> > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-
> > disk0,bootindex=1 -drive file=/data/migratetest2_data-1,if=none,id=drive-
> > virtio-disk1,format=qcow2,cache=none -device virtio-blk-
> > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -
> > netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
> > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 -vnc
> > 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 
> I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have
> you tried other formats or different qemu/kvm versions?

I tried the same thing with a raw image file instead of qcow2, and the problem 
still happens. From the /var/log/messages of the guest:

Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP server - 
Server is unavailable
Oct 17 17:10:39 localhost kernel: [  126.800075] eth0: no IPv6 routers present
Oct 17 17:10:52 localhost kernel: [  140.335783] Clocksource tsc unstable 
(delta = -70265501 ns)
Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765
Oct 17 17:12:04 localhost kernel: [  212.070584] Buffer I/O error on device 
vda1, logical block 1858766
Oct 17 17:12:04 localhost kernel: [  212.070587] Buffer I/O error on device 
vda1, logical block 1858767
Oct 17 17:12:04 localhost kernel: [  212.070589] Buffer I/O error on device 
vda1, logical block 1858768
Oct 17 17:12:04 localhost kernel: [  212.070592] Buffer I/O error on device 
vda1, logical block 1858769
Oct 17 17:12:04 localhost kernel: [  212.070595] Buffer I/O error on device 
vda1, logical block 1858770
Oct 17 17:12:04 localhost kernel: [  212.070597] Buffer I/O error on device 
vda1, logical block 1858771
Oct 17 17:12:04 localhost kernel: [  212.070600] Buffer I/O error on device 
vda1, logical block 1858772
Oct 17 17:12:04 localhost kernel: [  212.070602] Buffer I/O error on device 
vda1, logical block 1858773
Oct 17 17:12:04 localhost kernel: [  212.070605] Buffer I/O error on device 
vda1, logical block 1858774
Oct 17 17:12:04 localhost kernel: [  212.070607] Buffer I/O error on device 
vda1, logical block 1858775
Oct 17 17:12:04 localhost kernel: [  212.070610] Buffer I/O error on device 
vda1, logical block 1858776
Oct 17 17:12:04 localhost kernel: [  212.070612] Buffer I/O error on device 
vda1, logical block 1858777
Oct 17 17:12:04 localhost kernel: [  212.070615] Buffer I/O error on device 
vda1, logical block 1858778
Oct 17 17:12:04 localhost kernel: [  212.070617] Buffer I/O error on device 
vda1, logical block 1858779

(I was writing a large file at the time, to make sure I actually catch I/O 
errors as they happen)

Guido
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-17 Thread Michael Wolf

On Wed, 2012-10-17 at 21:14 +0400, Glauber Costa wrote:
> On 10/17/2012 06:23 AM, Michael Wolf wrote:
> > In the case of where you have a system that is running in a
> > capped or overcommitted environment the user may see steal time
> > being reported in accounting tools such as top or vmstat.  This can
> > cause confusion for the end user.  To ease the confusion this patch set
> > adds the idea of consigned (expected steal) time.  The host will separate
> > the consigned time from the steal time.  The consignment limit passed to the
> > host will be the amount of steal time expected within a fixed period of
> > time.  Any other steal time accruing during that period will show as the
> > traditional steal time.
> > 
> > TODO:
> > * Change native_clock to take params and not return a value
> > * Change update_rq_clock_task
> > 
> > Changes from V1:
> > * Removed the steal time allowed percentage from the guest
> > * Moved the separation of consigned (expected steal) and steal time to the
> >   host.
> > * No longer include a sysctl interface.
> > 
> 
> You are showing this in the guest somewhere, but tools like top will
> still not show it. So for quite a while, it achieves nothing.
> 
> Of course this is a barrier that any new statistic has to go through. So
> while annoying, this is per-se ultimately not a blocker.
> 
> What I still fail to see, is how this is useful information to be shown
> in the guest. Honestly, if I'm in a guest VM or container, any time
> during which I am not running is time I lost. It doesn't matter if this
> was expected or not. This still seems to me as a host-side problem, to
> be solved entirely by tooling.
> 

What tools like top and vmstat will show is altered.  When I put time in
the consign bucket it does not show up in steal.  So now as long as the
system is performing as expected the user will see 100% and 0% steal.  I
added the consign field to /proc/stat so that all time accrued in the
period is accounted for and also for debugging purposes.  The user wont
care about consign and will not see it.  

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv4 2/2] kvm: deliver msi interrupts from irq handler

2012-10-17 Thread Michael S. Tsirkin

We can deliver certain interrupts, notably MSI,
from atomic context.  Use kvm_set_irq_inatomic,
to implement an irq handler for msi.

This reduces the pressure on scheduler in case
where host and guest irq share a host cpu.

Signed-off-by: Michael S. Tsirkin 
---
 virt/kvm/assigned-dev.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 23a41a9..3642239 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -105,6 +105,15 @@ static irqreturn_t kvm_assigned_dev_thread_intx(int irq, 
void *dev_id)
 }
 
 #ifdef __KVM_HAVE_MSI
+static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+  assigned_dev->irq_source_id,
+  assigned_dev->guest_irq, 1);
+   return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -117,6 +126,23 @@ static irqreturn_t kvm_assigned_dev_thread_msi(int irq, 
void *dev_id)
 #endif
 
 #ifdef __KVM_HAVE_MSIX
+static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int index = find_index_from_host_irq(assigned_dev, irq);
+   u32 vector;
+   int ret = 0;
+
+   if (index >= 0) {
+   vector = assigned_dev->guest_msix_entries[index].vector;
+   ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+  assigned_dev->irq_source_id,
+  vector, 1);
+   }
+
+   return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msix(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -334,11 +360,6 @@ static int assigned_device_enable_host_intx(struct kvm 
*kvm,
 }
 
 #ifdef __KVM_HAVE_MSI
-static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
-{
-   return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msi(struct kvm *kvm,
   struct kvm_assigned_dev_kernel *dev)
 {
@@ -363,11 +384,6 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 #endif
 
 #ifdef __KVM_HAVE_MSIX
-static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
-{
-   return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msix(struct kvm *kvm,
struct kvm_assigned_dev_kernel *dev)
 {
-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv4 1/2] kvm: add kvm_set_irq_inatomic

2012-10-17 Thread Michael S. Tsirkin

Add an API to inject IRQ from atomic context.
Return EWOULDBLOCK if impossible (e.g. for multicast).
Only MSI is supported ATM.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/irq_comm.c  | 83 +---
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 93bfc9f..e165c09 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -677,6 +677,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int 
level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 2eb58af..656fa45 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -102,6 +102,23 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
return r;
 }
 
+static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+  struct kvm_lapic_irq *irq)
+{
+   trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+
+   irq->dest_id = (e->msi.address_lo &
+   MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+   irq->vector = (e->msi.data &
+   MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+   irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
+   irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
+   irq->delivery_mode = e->msi.data & 0x700;
+   irq->level = 1;
+   irq->shorthand = 0;
+   /* TODO Deal with RH bit of MSI message address */
+}
+
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level)
 {
@@ -110,22 +127,26 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
if (!level)
return -1;
 
-   trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+   kvm_set_msi_irq(e, &irq);
 
-   irq.dest_id = (e->msi.address_lo &
-   MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-   irq.vector = (e->msi.data &
-   MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-   irq.dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
-   irq.trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-   irq.delivery_mode = e->msi.data & 0x700;
-   irq.level = 1;
-   irq.shorthand = 0;
-
-   /* TODO Deal with RH bit of MSI message address */
return kvm_irq_delivery_to_apic(kvm, NULL, &irq);
 }
 
+
+static int kvm_set_msi_inatomic(struct kvm_kernel_irq_routing_entry *e,
+struct kvm *kvm)
+{
+   struct kvm_lapic_irq irq;
+   int r;
+
+   kvm_set_msi_irq(e, &irq);
+
+   if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r))
+   return r;
+   else
+   return -EWOULDBLOCK;
+}
+
 int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
 {
struct kvm_kernel_irq_routing_entry route;
@@ -178,6 +199,44 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level)
return ret;
 }
 
+/*
+ * Deliver an IRQ in an atomic context if we can, or return a failure,
+ * user can retry in a process context.
+ * Return value:
+ *  -EWOULDBLOCK - Can't deliver in atomic context: retry in a process context.
+ *  Other values - No need to retry.
+ */
+int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int 
level)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   int ret = -EINVAL;
+   struct kvm_irq_routing_table *irq_rt;
+   struct hlist_node *n;
+
+   trace_kvm_set_irq(irq, level, irq_source_id);
+
+   /*
+* Injection into either PIC or IOAPIC might need to scan all CPUs,
+* which would need to be retried from thread context;  when same GSI
+* is connected to both PIC and IOAPIC, we'd have to report a
+* partial failure here.
+* Since there's no easy way to do this, we only support injecting MSI
+* which is limited to 1:1 GSI mapping.
+*/
+   rcu_read_lock();
+   irq_rt = rcu_dereference(kvm->irq_routing);
+   if (irq < irq_rt->nr_rt_entries)
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
+   if (likely(e->type == KVM_IRQ_ROUTING_MSI))
+   ret = kvm_set_msi_inatomic(e, kvm);
+   else
+   ret = -EWOULDBLOCK;
+   break;
+   }
+   rcu_read_unlock();
+   re

[PATCHv4 0/2] kvm: direct msix injection

2012-10-17 Thread Michael S. Tsirkin

We can deliver certain interrupts, notably MSIX,
from atomic context.
Here's an untested patch to do this (compiled only).

Changes from v2:
Don't inject broadcast interrupts directly
Changes from v1:
Tried to address comments from v1, except unifying
with kvm_set_irq: passing flags to it looks too ugly.
Added a comment.

Jan, you said you can test this?


Michael S. Tsirkin (2):
  kvm: add kvm_set_irq_inatomic
  kvm: deliver msi interrupts from irq handler

 include/linux/kvm_host.h |  1 +
 virt/kvm/assigned-dev.c  | 36 +++--
 virt/kvm/irq_comm.c  | 83 +---
 3 files changed, 98 insertions(+), 22 deletions(-)

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Avi Kivity

On 10/16/2012 04:49 PM, Alexander Graf wrote:

>> If there is a lot of prioritization and/or queuing logic, then yes.  But
>> what about MSI?  Doesn't that have a direct path?
> 
> Nope. Well, yes, in a certain special case where the MPIC pushes the
> interrupt vector on interrupt delivery into a special register. But not
> for the "normal" case.

Ok.  The patches are fine then, but would be good to add the PIO check.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/6] KVM: MMU: remove mmu_is_invalid

2012-10-17 Thread Avi Kivity

On 10/16/2012 02:08 PM, Xiao Guangrong wrote:
> Remove mmu_is_invalid and use is_invalid_pfn instead


Applied 2-5 to next; 6 depends on 1, so will wait until it is merged
upstream.



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Avi Kivity

On 10/17/2012 04:09 PM, Marcelo Tosatti wrote:
> On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
>> On 10/17/2012 04:10 AM, Will Auld wrote:
>> > Signed-off-by: Will Auld 
>> > ---
>> > 
>> > Resending to full list
>> > 
>> > Marcelo,
>> > 
>> > This patch is what I believe you ask for as foundational for later
>> > patches to address IA32_TSC_ADJUST. 
>> > 
>> 
>> Please write a changelog to reflect the motivation.
>> 
>> All those bool parameters scattered all over the place aren't very
>> pretty.  Usually we solve this with helpers that embed the parameter
>> name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
>> functions for this to work here.
>> 
>> Marcelo, any ideas?
> 
> Its easier to read
> 
> kvm_x86_ops->kvm_set_msr()
> kvm_x86_ops->kvm_set_msr_host()
> 
> then
> 
> kvm_x86_ops->kvm_set_msr(,false)
> kvm_x86_ops->kvm_set_msr(,true)
> 
> So you're right.

Yes, but we have a million functions for setting MSRs.

Maybe

struct msr {
bool host_requested;
u32 index;
u64 data;
};

and change all the APIs to use that.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches

2012-10-17 Thread David Howells

Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
the patch program from deleting it when it creates it.

Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
files to use the generic instead.

Should this perhaps instead be a #warning or #error that the facility is
unsupported on this arch?

Signed-off-by: David Howells 
cc: Arnd Bergmann 
cc: Avi Kivity 
cc: Marcelo Tosatti 
cc: kvm@vger.kernel.org
---

 arch/ia64/include/uapi/asm/Kbuild |2 ++
 arch/ia64/include/uapi/asm/kvm_para.h |0 
 arch/s390/include/uapi/asm/Kbuild |2 ++
 arch/s390/include/uapi/asm/kvm_para.h |0 
 include/uapi/asm-generic/kvm_para.h   |4 
 5 files changed, 8 insertions(+)
 delete mode 100644 arch/ia64/include/uapi/asm/kvm_para.h
 delete mode 100644 arch/s390/include/uapi/asm/kvm_para.h

diff --git a/arch/ia64/include/uapi/asm/Kbuild 
b/arch/ia64/include/uapi/asm/Kbuild
index 30cafac..1b3f5eb 100644
--- a/arch/ia64/include/uapi/asm/Kbuild
+++ b/arch/ia64/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += break.h
diff --git a/arch/ia64/include/uapi/asm/kvm_para.h 
b/arch/ia64/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/arch/s390/include/uapi/asm/Kbuild 
b/arch/s390/include/uapi/asm/Kbuild
index 7bf68ff..59b67ed 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += byteorder.h
diff --git a/arch/s390/include/uapi/asm/kvm_para.h 
b/arch/s390/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/include/uapi/asm-generic/kvm_para.h 
b/include/uapi/asm-generic/kvm_para.h
index e69de29..486f0af 100644
--- a/include/uapi/asm-generic/kvm_para.h
+++ b/include/uapi/asm-generic/kvm_para.h
@@ -0,0 +1,4 @@
+/*
+ * There isn't anything here, but the file must not be empty or patch
+ * will delete it.
+ */

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Marcelo Tosatti

On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
> On 10/17/2012 04:10 AM, Will Auld wrote:
> > Signed-off-by: Will Auld 
> > ---
> > 
> > Resending to full list
> > 
> > Marcelo,
> > 
> > This patch is what I believe you ask for as foundational for later
> > patches to address IA32_TSC_ADJUST. 
> > 
> 
> Please write a changelog to reflect the motivation.
> 
> All those bool parameters scattered all over the place aren't very
> pretty.  Usually we solve this with helpers that embed the parameter
> name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
> functions for this to work here.
> 
> Marcelo, any ideas?

Its easier to read

kvm_x86_ops->kvm_set_msr()
kvm_x86_ops->kvm_set_msr_host()

then

kvm_x86_ops->kvm_set_msr(,false)
kvm_x86_ops->kvm_set_msr(,true)

So you're right.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches

2012-10-17 Thread Arnd Bergmann

On Wednesday 17 October 2012, David Howells wrote:
> Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
> the patch program from deleting it when it creates it.
> 
> Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
> files to use the generic instead.
> 
> Should this perhaps instead be a #warning or #error that the facility is
> unsupported on this arch?

Just an empty file is fine by me, but an #error also sounds reasonable if
we want users to be able to write autoconf tests for it.

> Signed-off-by: David Howells 
> cc: Arnd Bergmann 
> cc: Avi Kivity 
> cc: Marcelo Tosatti 
> cc: kvm@vger.kernel.org

Acked-by: Arnd Bergmann 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches

2012-10-17 Thread David Howells

Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
the patch program from deleting it when it creates it.

Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
files to use the generic instead.

Should this perhaps instead be a #warning or #error that the facility is
unsupported on this arch?

Signed-off-by: David Howells 
cc: Arnd Bergmann 
cc: Avi Kivity 
cc: Marcelo Tosatti 
cc: kvm@vger.kernel.org
---

 arch/ia64/include/uapi/asm/Kbuild |2 ++
 arch/ia64/include/uapi/asm/kvm_para.h |0 
 arch/s390/include/uapi/asm/Kbuild |2 ++
 arch/s390/include/uapi/asm/kvm_para.h |0 
 include/uapi/asm-generic/kvm_para.h   |4 
 5 files changed, 8 insertions(+)
 delete mode 100644 arch/ia64/include/uapi/asm/kvm_para.h
 delete mode 100644 arch/s390/include/uapi/asm/kvm_para.h

diff --git a/arch/ia64/include/uapi/asm/Kbuild 
b/arch/ia64/include/uapi/asm/Kbuild
index 30cafac..1b3f5eb 100644
--- a/arch/ia64/include/uapi/asm/Kbuild
+++ b/arch/ia64/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += break.h
diff --git a/arch/ia64/include/uapi/asm/kvm_para.h 
b/arch/ia64/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/arch/s390/include/uapi/asm/Kbuild 
b/arch/s390/include/uapi/asm/Kbuild
index 7bf68ff..59b67ed 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += byteorder.h
diff --git a/arch/s390/include/uapi/asm/kvm_para.h 
b/arch/s390/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/include/uapi/asm-generic/kvm_para.h 
b/include/uapi/asm-generic/kvm_para.h
index e69de29..486f0af 100644
--- a/include/uapi/asm-generic/kvm_para.h
+++ b/include/uapi/asm-generic/kvm_para.h
@@ -0,0 +1,4 @@
+/*
+ * There isn't anything here, but the file must not be empty or patch
+ * will delete it.
+ */

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 1/6] KVM: MMU: fix release noslot pfn

2012-10-17 Thread Avi Kivity

On 10/16/2012 02:07 PM, Xiao Guangrong wrote:
> We can not directly call kvm_release_pfn_clean to release the pfn
> since we can meet noslot pfn which is used to cache mmio info into
> spte

Applied to master for 3.7, 3.6, thanks.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Guido Winkelmann

Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
> On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
[...]
> > The commandline, as generated by libvirtd, looks like this:
> > 
> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
> > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid
> > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,serv
> > e
> > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> > -no-reboot -no- shutdown -device
> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> > file=/data/migratetest2_system,if=none,id=drive-virtio-
> > disk0,format=qcow2,cache=none -device virtio-blk-
> > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-
> > disk0,bootindex=1 -drive file=/data/migratetest2_data-1,if=none,id=drive-
> > virtio-disk1,format=qcow2,cache=none -device virtio-blk-
> > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -
> > netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
> > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 -vnc
> > 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 
> I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have
> you tried other formats or different qemu/kvm versions?

Are you sure about that? Because I'm fairly certain I have been using live 
migration since at least 0.14, if not 0.13, and I have always been using qcow2 
as the image format for the disks...

I can still try with other image formats, though.

Guido
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM on NFS

2012-10-17 Thread Avi Kivity

On 10/17/2012 01:04 PM, Andrew Holway wrote:
> 
> 
>> O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
>> anything on the host.  You might experiment with switching to the
>> deadline scheduler in the guest.
> 
> Ill give it a go. Any ideas how I should be tuning my NFS?

Not really.  The defaults should work well enough.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM on NFS

2012-10-17 Thread Andrew Holway



> O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
> anything on the host.  You might experiment with switching to the
> deadline scheduler in the guest.

Ill give it a go. Any ideas how I should be tuning my NFS?

> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM on NFS

2012-10-17 Thread Avi Kivity

On 10/17/2012 11:20 AM, Andrew Holway wrote:
> Hello,
> 
> I am testing KVM on an Oracle NFS box that I have.
> 
> Does the list have any advice on best practice? I remember reading that there 
> is stuff you can do with I/O schedulers and stuff to make it more efficient.
> 
> My VMs will primarily be running mysql databases. I am currently using 
> o_direct.
> 

O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
anything on the host.  You might experiment with switching to the
deadline scheduler in the guest.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Avi Kivity

On 10/17/2012 04:10 AM, Will Auld wrote:
> Signed-off-by: Will Auld 
> ---
> 
> Resending to full list
> 
> Marcelo,
> 
> This patch is what I believe you ask for as foundational for later
> patches to address IA32_TSC_ADJUST. 
> 

Please write a changelog to reflect the motivation.

All those bool parameters scattered all over the place aren't very
pretty.  Usually we solve this with helpers that embed the parameter
name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
functions for this to work here.

Marcelo, any ideas?

> Thanks,
> 
> Will
> 
>  arch/x86/include/asm/kvm_host.h |  8 
>  arch/x86/kvm/svm.c  | 18 ++
>  arch/x86/kvm/vmx.c  | 18 ++
>  arch/x86/kvm/x86.c  | 18 ++
>  arch/x86/kvm/x86.h  |  2 +-
>  5 files changed, 35 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..c06f0d1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -621,7 +621,7 @@ struct kvm_x86_ops {
>   void (*set_guest_debug)(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg);
>   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
> - int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
> + int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool 
> guest_initiated);
>   u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
>   void (*get_segment)(struct kvm_vcpu *vcpu,
>   struct kvm_segment *var, int seg);
> @@ -684,7 +684,7 @@ struct kvm_x86_ops {
>   bool (*has_wbinvd_exit)(void);
>  
>   void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
> scale);
> - void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> + void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset, bool 
> guest_initiated);
>  
>   u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
>   u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu);
> @@ -772,7 +772,7 @@ static inline int emulate_instruction(struct kvm_vcpu 
> *vcpu,
>  
>  void kvm_enable_efer_bits(u64);
>  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
> -int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
> +int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool 
> guest_initiated);
>  
>  struct x86_emulate_ctxt;
>  
> @@ -799,7 +799,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, 
> int *l);
>  int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
>  
>  int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
> -int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
> +int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool 
> guest_initiated);
>  
>  unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
>  void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index baead95..424be27 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1012,7 +1012,8 @@ static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 
> user_tsc_khz, bool scale)
>   svm->tsc_ratio = ratio;
>  }
>  
> -static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset, 
> + bool guest_initiated)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
>   u64 g_tsc_offset = 0;
> @@ -1255,7 +1256,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm 
> *kvm, unsigned int id)
>   svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
>   svm->asid_generation = 0;
>   init_vmcb(svm);
> - kvm_write_tsc(&svm->vcpu, 0);
> + kvm_write_tsc(&svm->vcpu, 0, false /*Not Guest Initiated*/);
>  
>   err = fx_init(&svm->vcpu);
>   if (err)
> @@ -3147,13 +3148,14 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 
> data)
>   return 0;
>  }
>  
> -static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
> +static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data, 
> + bool guest_initiated)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
>  
>   switch (ecx) {
>   case MSR_IA32_TSC:
> - kvm_write_tsc(vcpu, data);
> + kvm_write_tsc(vcpu, data, guest_initiated);
>   break;
>   case MSR_STAR:
>   svm->vmcb->save.star = data;
> @@ -3208,12 +3210,12 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, 
> unsigned ecx, u64 data)
>   vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", 
> ecx, data);
>   break;
>   default:
> - return kvm_set_msr_common(vcpu, ecx, data);
> + return kvm_set_msr_common(vcpu, ecx, data, guest_initiated);
>   }
>   return 0

Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity

On 10/16/2012 11:52 PM, Paul Mackerras wrote:
> On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote:
>> On 10/16/2012 01:58 PM, Paul Mackerras wrote:
>> > On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote:
>> >> Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry
>> >> to change).
>> > 
>> > No.
>> 
>> This forces userspace to dedicate a thread for the HPT.
> 
> Why? Reads never block in any case.

Ok.  This parallels KVM_GET_DIRTY_LOG.

>> 
>> I meant the internal data structure that holds HPT entries.
> 
> Oh, that's just an array, and userspace already knows how big it is.
> 
>> I guess I don't understand the index.  Do we expect changes to be in
>> contiguous ranges?  And invalid entries to be contiguous as well?  That
>> doesn't fit with how hash tables work.  Does the index represent the
>> position of the entry within the table, or something else?
> 
> The index is just the position in the array.  Typically, in each group
> of 8 it will tend to be the low-numbered ones that are valid, since
> creating an entry usually uses the first empty slot.  So I expect that
> on the first pass, most of the records will represent 8 HPTEs.  On
> subsequent passes, probably most records will represent a single HPTE.

So it's a form of RLE compression.  Ok.

>> 
>> 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
>> it warrant a live migration protocol?
> 
> The qemu people I talked to seemed to think so.
> 
>> > Because it is a hash table, updates tend to be scattered throughout
>> > the whole table, which is another reason why per-page dirty tracking
>> > and updates would be pretty inefficient.
>> 
>> This suggests a stream format that includes the index in every entry.
> 
> That would amount to dropping the n_valid and n_invalid fields from
> the current header format.  That would be less efficient for the
> initial pass (assuming we achieve an average n_valid of at least 2 on
> the initial pass), and probably less efficient for the incremental
> updates, since a newly-invalidated entry would have to be represented
> as 16 zero bytes rather than just an 8-byte header with n_valid=0 and
> n_invalid=1.  I'm assuming here that the initial pass would omit
> invalid entries.

I agree.  But let's have some measurements to make sure.

> 
>> > 
>> > As for the change rate, it depends on the application of course, but
>> > basically every time the guest changes a PTE in its Linux page tables
>> > we do the corresponding change to the corresponding HPT entry, so the
>> > rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
>> > exec, etc. have a high rate of HPT updates.
>> 
>> If the rate is high enough, then there's no point in a live update.
> 
> True, but doesn't that argument apply to memory pages as well?

In some cases it does.  The question is what happens in practice.  If
you migrate a kernel build, how many entries are sent in the guest
stopped phase?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity

On 10/16/2012 10:03 PM, Anthony Liguori wrote:
>>
>> This forces userspace to dedicate a thread for the HPT.
> 
> If no changes are available, does read return a size > 0?  I don't think
> it's necessary to support polling.  The kernel should always be able to
> respond to userspace here.  The only catch is whether to return !0 read
> sizes when there are no changes.
> 
> At any case, I can't see why a dedicated thread is needed.  QEMU is
> going to poll HPT based on how fast we can send data over the wire.

That means spinning if we can send the data faster than we dirty it.
But we do that anyway for memory.



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Secure migration of LVM based guests over WAN

2012-10-17 Thread Lukas Laukamp


Am 16.10.2012 12:10, schrieb Avi Kivity:

On 10/16/2012 11:48 AM, Lukas Laukamp wrote:

Am 16.10.2012 11:40, schrieb Avi Kivity:

On 10/16/2012 11:12 AM, Lukas Laukamp wrote:

Hey all,

I have a question about a solution for migrate LVM based guests directly
over the network.

So the situation: Two KVM hosts with libvirt, multiple LVM based guests
Want to do: Migrate a LVM based guest directly to the other host over an
secure connection

I know that migration is possible when the VM disks are stored on an
NFS, GFS2 filer/cluster etc.

So would it be possible to do an offline migration directly with netcat
or something like that?


If all you need is offline, you can use scp to copy each volume to the
destination volume.  Make sure the guests are shut down when you do that.

It is also possible to do a live migration, but unless the destination
and source are in the same IP subnet, the guests are going to lose
connectivity.



Hello Avi,

so can I simply copy an logical volume to the path of the volume group
with scp?

Yes.  Best to enable compression to avoid sending zero blocks.


For the live migration theme, it would be no problem when the guests
looses connectivity, how could be done a live migration?


See the -b option to the migrate command.



I will read a little bit about the live migration theme.

Best Regards
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-17 Thread Avi Kivity

On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
> 于 2012年10月15日 23:43, Avi Kivity 写道:
>> On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
>>> Currently, kdump just makes all the logical processors leave VMX operation 
>>> by
>>> executing VMXOFF instruction, so any VMCSs active on the logical processors 
>>> may
>>> be corrupted. But, sometimes, we need the VMCSs to debug guest images 
>>> contained
>>> in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs 
>>> before
>>> executing the VMXOFF instruction.
>> 
>> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>> 
> 
> I tried some tests, for example, I made copies for every vmcs, and in the 
> kdump
> path, I backed up all the loaded vmcs into the copies before vmxoff.
> After generating the vmcore, I retrieve the vmcss and their copies, and 
> compare them,
> no differences.
> 
> Another test is using VMCLEAR to clear all the loaded vmcs before VMXOFF,
> and compare the vmcss and their copies, there are indeed differences between 
> the
> vmcs and its copy.
> 
> I know the tests may be not so convincing, for example, I used memcpy to back 
> up
> the vmcss and it is an ordinary memory operation. But to ensure the 
> non-corruption
> of the vmcss in the vmcore, I think we should VMCLEAR the vmcss before VMXOFF 
> just
> as the Intel spec says.

Sorry, I was unclear -- I was referring to the spec, I wasn't sure
whether VMXOFF is defined to flush VMCSes or whether it just invalidates
on-chip caches so that it won't flush them out in the future, corrupting
memory.  We don't want to depend on actual behaviour as it may change
with future version.

Copying some Intel folk, maybe they can clarify it.

> 
>>>
>>> The patch set provides an alternative way to clear VMCSs related to guests
>>> on all cpus when host is doing kdump.
>>>
>> 
>> I'm not sure the sysctl is really necessary.  The only reason to turn if
>> off is if the corruption is so severe that the loaded vmcs list itself
>> causes a crash.  I think it should be rare enough that we can do it
>> unconditionally.
>> 
> 
> You mean not using sysctl and just let VMCLEAR-VMCSS be a default behaviour? 
> If so,
> I agree with you.

Yes, that's what I meant.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Patch]KVM: enabling per domain PLE

2012-10-17 Thread Avi Kivity

On 10/17/2012 10:02 AM, Hu, Xuekun wrote:
>> 
>> The problem with this is that it requires an administrator to understand the
>> workload, not only of the guest, but also of other guests on the machine.
>> With low overcommit, a high PLE window reduces unneeded exits, but with
>> high overcommit we need those exits to reduce spinning.
>> 
>> In addition, most kvm hosts don't have an administrator.  They are controlled
>> by a management system, which means we'll need some algorithm in
>> userspace to control the PLE window.  Taking the two together, we need a
>> dynamic (for changing workloads) algorithm.
>> 
>> There are threads discussing this dynamic algorithm, we are making slow
>> progress because it's such a difficult problem, but I think this is much more
>> useful than anything requiring user intervention.
> 
> Avi, agreed that dynamic adaptive ple should be the best solution. However
> currently it is a difficult problem like you said. Our solution just gives 
> user
> a choice who know how to set the two PLE values. So the solution is a 
> compromise
> solution, which should be better than nothing, for now? :-)

Let's see how the PLE thread works out.  Yes the patches give the user
control, but we need to make sure the user knows how to control it (in
fact your patch doesn't even update the documentation).  Just throwing
out a new ioctl, even if it is documented, doesn't mean that userspace
will begin to use it, or that users will exploit it.

Do you have a specific use case in mind?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v3 06/19] Implement "-dimm" command line option

2012-10-17 Thread Avi Kivity

On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
>> 
>> I don't think so, but probably there's a limit of DIMMs that real
>> controllers have, something like 8 max.
> 
> In the case of i440fx specifically, do you mean that we should model the DRB
> (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
> 
> The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row
> maps 1-1 to a DimmDevice for this discussion) and only supports up to 2GB of
> memory afaict (bit 31 and above is ignored).
> 
> I 'd rather not model this part of the i440fx - having only 8 DIMMs seems too
> restrictive. The rest of the patchset supports up to 255 DIMMs so it would be 
> a
> waste imho to model an old pc memory controller that only supports 8 DIMMs.
> 
> There was also an old discussion about i440fx modeling here:
> https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
> the general direction was that i440fx is too old and we don't want to 
> precisely
> emulate the DRB registers, since they lack flexibility.
> 
> Possible solutions:
> 
> 1) is there a newer and more flexible chipset that we could model?

Look for q35 on this list.

> 
> 2) model and document 
 ^--- the critical bit

> a generic (non-existent) i440fx that would support more
> and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
> similar to the i440fx DRB registers, the registers would take up a lot of 
> space.
> In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
> many 8MB chunks are contained in DIMMs 0...i. So, the register values are
> cumulative (and total described memory cannot exceed 256x8MB = 2GB)

Our i440fx has already been extended by support for pci and cpu hotplug,
and I see no reason not to extend it for memory.  We can allocate extra
mmio space for registers if needed.  Usually I'm against this sort of
thing, but in this case we don't have much choice.

> 
> We could for example model: 
> - an 8-bit non-cumulative register for each DIMM, denoting how many
> 128MB chunks it contains. This allowes 32GB for each DIMM, and with 255 DIMMs 
> we
> describe a bit less than 8TB. These registers require 255 bytes.
> - a 16-bit cumulative register for each DIMM again for 128MB chunks. This 
> allows
> us to describe 8TB of memory (but the registers take up double the space, 
> because
> they describe cumulative memory amounts)

There is no reason to save space.  Why not have two 64-bit registers per
DIMM, one describing the size and the other the base address, both in
bytes?  Use a few low order bits for control.

> 
> 3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
> is not done (at least for i440fx, other machines could). This is the least 
> precise
> in terms of emulation. On the other hand, if we are not really trying to 
> emulate
> the real (too restrictive) hardware, does it matter?

We could emulate base memory using the chipset, and extra memory using
the scheme above.  This allows guests that are tied to the chipset to
work, and guests that have more awareness (seabios) to use the extra
features.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM ept flush

2012-10-17 Thread Avi Kivity

On 10/16/2012 08:50 PM, Rohan Sharma wrote:
> Thanks for the reply.
> I have one more question.
> If I do munmap of the RAM allocated in qemu,
> will the changes be reflected in KVM Ept.

Yes.  Those changes will be reflected.  See
kvm_mmu_notifier_invalidate_page(), and related.


> I guess there is some mmu notifier which ensures that entries of EPT
> are synced with the host entries.
> 
> On Tue, Oct 16, 2012 at 8:27 PM, Avi Kivity  wrote:
>> On 10/16/2012 01:57 PM, Rohan Sharma wrote:
>>> Is there a way to flush ept entries in qemu-kvm.
>>
>> No.
>>
>>
>> --
>> error compiling committee.c: too many arguments to function
> 


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM on NFS

2012-10-17 Thread Andrew Holway

Hello,

I am testing KVM on an Oracle NFS box that I have.

Does the list have any advice on best practice? I remember reading that there 
is stuff you can do with I/O schedulers and stuff to make it more efficient.

My VMs will primarily be running mysql databases. I am currently using o_direct.

Thanks,

Andrew



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v3 06/19] Implement "-dimm" command line option

2012-10-17 Thread Vasilis Liaskovitis

On Sat, Oct 13, 2012 at 08:57:19AM +, Blue Swirl wrote:
> On Tue, Oct 9, 2012 at 5:04 PM, Vasilis Liaskovitis
>  wrote:
> >>
snip
> >> Maybe even the dimmbus device shouldn't exist by itself after all, or
> >> it should be pretty much invisible to users. On real HW, the memory
> >> controller or south bridge handles the memory. For i440fx, it's part
> >> of the same chipset. So I think we should just add qdev properties to
> >> i440fx to specify the sizes, nodes etc. Then i440fx should create the
> >> dimmbus device unconditionally using the properties. The default
> >> properties should create a sane configuration, otherwise -global
> >> i440fx.dimm_size=512M etc. could be used. Then the bus would be
> >> populated as before or with device_add.
> >
> > hmm the problem with using only i440fx properties, is that size/nodes look
> > dimm specific to me, not chipset-memcontroller specific. Unless we only 
> > allow
> > uniform size dimms. Is it possible to have a dynamic list of sizes/nodes 
> > pairs as
> > properties of a qdev device?
> 
> I don't think so, but probably there's a limit of DIMMs that real
> controllers have, something like 8 max.

In the case of i440fx specifically, do you mean that we should model the DRB
(Dram row boundary registers in section 3.2.19 of the i440fx spec) ?

The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row
maps 1-1 to a DimmDevice for this discussion) and only supports up to 2GB of
memory afaict (bit 31 and above is ignored).

I 'd rather not model this part of the i440fx - having only 8 DIMMs seems too
restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a
waste imho to model an old pc memory controller that only supports 8 DIMMs.

There was also an old discussion about i440fx modeling here:
https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
the general direction was that i440fx is too old and we don't want to precisely
emulate the DRB registers, since they lack flexibility.

Possible solutions:

1) is there a newer and more flexible chipset that we could model?

2) model and document a generic (non-existent) i440fx that would support more
and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
similar to the i440fx DRB registers, the registers would take up a lot of space.
In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
many 8MB chunks are contained in DIMMs 0...i. So, the register values are
cumulative (and total described memory cannot exceed 256x8MB = 2GB)

We could for example model: 
- an 8-bit non-cumulative register for each DIMM, denoting how many
128MB chunks it contains. This allowes 32GB for each DIMM, and with 255 DIMMs we
describe a bit less than 8TB. These registers require 255 bytes.
- a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows
us to describe 8TB of memory (but the registers take up double the space, 
because
they describe cumulative memory amounts)

3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
is not done (at least for i440fx, other machines could). This is the least 
precise
in terms of emulation. On the other hand, if we are not really trying to emulate
the real (too restrictive) hardware, does it matter?

thanks,

- Vasilis
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Fengguang Wu

On Wed, Oct 17, 2012 at 03:04:49PM +0800, Xiao Guangrong wrote:
> On 10/17/2012 02:43 PM, Fengguang Wu wrote:
> > On Wed, Oct 17, 2012 at 02:26:22PM +0800, Xiao Guangrong wrote:
> >> On 09/14/2012 01:57 PM, Xiao Guangrong wrote:
> >>> On 09/12/2012 04:15 PM, Avi Kivity wrote:
>  On 09/12/2012 07:40 AM, Fengguang Wu wrote:
> > Hi,
> >
> > 3 of my test boxes running v3.5 kernel become unaccessible and I find
> > two of them kept emitting this dmesg:
> >
> > vmx_handle_exit: unexpected, valid vectoring info (0x8b0e) and exit 
> > reason is 0x31
> >
> > The other one has froze and the above lines are the last dmesg.
> > Any ideas?
> 
>  First, that printk should be rate-limited.
> 
>  Second, we should add EXIT_REASON_EPT_MISCONFIG (0x31) to 
> 
>   if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
>   (exit_reason != EXIT_REASON_EXCEPTION_NMI &&
>   exit_reason != EXIT_REASON_EPT_VIOLATION &&
>   exit_reason != EXIT_REASON_TASK_SWITCH))
>   printk(KERN_WARNING "%s: unexpected, valid vectoring info "
>  "(0x%x) and exit reason is 0x%x\n",
>  __func__, vectoring_info, exit_reason);
> 
>  since it's easily caused by the guest.
> >>>
> >>> Yes, i will do these.
> >>>
> 
>  Third, it's really unexpected.  It seems the guest was attempting to 
>  deliver a page fault exception (0x0e) but encountered an mmio page 
>  during delivery (in the IDT, TSS, stack, or page tables).  Is this 
>  reproducible?  If so it's easy to patch kvm to halt in that case and 
>  allow examining the guest via qemu.
> 
> >>>
> >>> Have no idea yet why the box was frozen under this case, will try to 
> >>> write a test case,
> >>> hope it can help me to find the reason out.
> >>>
> >>
> >> Still did not know why linux kernel triggered it. I have posted
> >> a patchset to report an internal error for this case, hoping
> >> Fengguang can reproduce it after the patchset and Qemu's dump
> >> can help us to find the reason out.
> >>
> >> I will keep working on it.
> > 
> > Thanks! Shall I run some patched kernel, or just 3.6.0?
> 
> The patchset is under review. Can be found at:
> https://lkml.org/lkml/2012/10/17/31

Thanks, I'll try it.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-17 Thread Glauber Costa

On 10/17/2012 06:23 AM, Michael Wolf wrote:
> In the case of where you have a system that is running in a
> capped or overcommitted environment the user may see steal time
> being reported in accounting tools such as top or vmstat.  This can
> cause confusion for the end user.  To ease the confusion this patch set
> adds the idea of consigned (expected steal) time.  The host will separate
> the consigned time from the steal time.  The consignment limit passed to the
> host will be the amount of steal time expected within a fixed period of
> time.  Any other steal time accruing during that period will show as the
> traditional steal time.
> 
> TODO:
> * Change native_clock to take params and not return a value
> * Change update_rq_clock_task
> 
> Changes from V1:
> * Removed the steal time allowed percentage from the guest
> * Moved the separation of consigned (expected steal) and steal time to the
>   host.
> * No longer include a sysctl interface.
> 

You are showing this in the guest somewhere, but tools like top will
still not show it. So for quite a while, it achieves nothing.

Of course this is a barrier that any new statistic has to go through. So
while annoying, this is per-se ultimately not a blocker.

What I still fail to see, is how this is useful information to be shown
in the guest. Honestly, if I'm in a guest VM or container, any time
during which I am not running is time I lost. It doesn't matter if this
was expected or not. This still seems to me as a host-side problem, to
be solved entirely by tooling.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [Patch]KVM: enabling per domain PLE

2012-10-17 Thread Hu, Xuekun

> 
> The problem with this is that it requires an administrator to understand the
> workload, not only of the guest, but also of other guests on the machine.
> With low overcommit, a high PLE window reduces unneeded exits, but with
> high overcommit we need those exits to reduce spinning.
> 
> In addition, most kvm hosts don't have an administrator.  They are controlled
> by a management system, which means we'll need some algorithm in
> userspace to control the PLE window.  Taking the two together, we need a
> dynamic (for changing workloads) algorithm.
> 
> There are threads discussing this dynamic algorithm, we are making slow
> progress because it's such a difficult problem, but I think this is much more
> useful than anything requiring user intervention.

Avi, agreed that dynamic adaptive ple should be the best solution. However
currently it is a difficult problem like you said. Our solution just gives user
a choice who know how to set the two PLE values. So the solution is a compromise
solution, which should be better than nothing, for now? :-)

Your comments? 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM_MAX_VCPUS

2012-10-17 Thread Gleb Natapov

On Wed, Oct 17, 2012 at 02:57:15AM +, Wei, Bing (WeiBing, MCXS-SH) wrote:
> For pCPU/core and VCPUS/logical cpu mapping, It should be 8 multiple. 254 is 
> reasonable. Or something I miss?
> 
I am not sure what do you mean. Can you clarify?

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Vinod, Chegu
> Sent: Sunday, October 14, 2012 9:43 PM
> To: Gleb Natapov
> Cc: Sasha Levin; KVM
> Subject: Re: KVM_MAX_VCPUS
> 
> On 10/14/2012 2:08 AM, Gleb Natapov wrote:
> > On Sat, Oct 13, 2012 at 10:32:13PM -0400, Sasha Levin wrote:
> >> On 10/13/2012 06:29 PM, Chegu Vinod wrote:
> >>> Hello,
> >>>
> >>> Wanted to get a clarification about KVM_MAX_VCPUS(currently set to 254)
> >>>   in kvm_host.h file. The kvm_vcpu *vcpus array is sized based on 
> >>> KVM_MAX_VCPUS.
> >>> (i.e. a max of 254 elements in the array).
> >>>   
> >>> An 8bit APIC id should allow for 256 ID's. Reserving one for Broadcast 
> >>> should
> >>> leave 255 ID's.  Is there one more ID reserved for some other purpose ? 
> >>> (hence
> >>> leading to KVM_MAX_VCPUS being set to 254 and not 255).
> >> Another ID goes to the IO-APIC.
> >>
> > This is not really needed on KVM. We can enlarge KVM_MAX_VCPUS to 255.
> 
> Thanks for clarification!  ( We did suspect the IO-APIC...but weren't 
> quite sure).
> 
> Vinod
> >
> > --
> > Gleb.
> > .
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Xiao Guangrong

On 10/17/2012 02:43 PM, Fengguang Wu wrote:
> On Wed, Oct 17, 2012 at 02:26:22PM +0800, Xiao Guangrong wrote:
>> On 09/14/2012 01:57 PM, Xiao Guangrong wrote:
>>> On 09/12/2012 04:15 PM, Avi Kivity wrote:
 On 09/12/2012 07:40 AM, Fengguang Wu wrote:
> Hi,
>
> 3 of my test boxes running v3.5 kernel become unaccessible and I find
> two of them kept emitting this dmesg:
>
> vmx_handle_exit: unexpected, valid vectoring info (0x8b0e) and exit 
> reason is 0x31
>
> The other one has froze and the above lines are the last dmesg.
> Any ideas?

 First, that printk should be rate-limited.

 Second, we should add EXIT_REASON_EPT_MISCONFIG (0x31) to 

if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
(exit_reason != EXIT_REASON_EXCEPTION_NMI &&
exit_reason != EXIT_REASON_EPT_VIOLATION &&
exit_reason != EXIT_REASON_TASK_SWITCH))
printk(KERN_WARNING "%s: unexpected, valid vectoring info "
   "(0x%x) and exit reason is 0x%x\n",
   __func__, vectoring_info, exit_reason);

 since it's easily caused by the guest.
>>>
>>> Yes, i will do these.
>>>

 Third, it's really unexpected.  It seems the guest was attempting to 
 deliver a page fault exception (0x0e) but encountered an mmio page during 
 delivery (in the IDT, TSS, stack, or page tables).  Is this reproducible?  
 If so it's easy to patch kvm to halt in that case and allow examining the 
 guest via qemu.

>>>
>>> Have no idea yet why the box was frozen under this case, will try to write 
>>> a test case,
>>> hope it can help me to find the reason out.
>>>
>>
>> Still did not know why linux kernel triggered it. I have posted
>> a patchset to report an internal error for this case, hoping
>> Fengguang can reproduce it after the patchset and Qemu's dump
>> can help us to find the reason out.
>>
>> I will keep working on it.
> 
> Thanks! Shall I run some patched kernel, or just 3.6.0?

The patchset is under review. Can be found at:
https://lkml.org/lkml/2012/10/17/31

> 
> Another problem I sometimes run into is, dmesg no longer works in the
> test boxes that run lots of KVMs. It aborts with an error message:
> 
> dmesg: klogctl failed: Bad address

Interesting, will fight for it. :)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

55 matches

Mail list logo