date:20100317

[PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Sheng Yang

Otherwise VM entry fails when ept=0 is used on processors that support
unrestricted guest.

Signed-off-by: Sheng Yang 
---

Please apply this to 2.6.32 stable. Thanks!

Patch already in the upstream, commit:
046d87103addc117f0d397196e85189722d4d7de

 arch/x86/kvm/vmx.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 80367c5..1092e8a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2316,8 +2316,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 		if (vmx->vpid == 0)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
-		if (!enable_ept)
+		if (!enable_ept) {
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+			enable_unrestricted_guest = 0;
+		}
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
-- 
1.5.4.5
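
(Illustration, not part of the posted patch.) The dependency the patch enforces can be looked at in isolation: unrestricted guest relies on EPT, so turning EPT off has to turn unrestricted guest off as well. Below is a minimal standalone sketch of that masking logic. The helper name secondary_exec_control(), its parameters, and the bit values are assumptions made for this example; only the SECONDARY_EXEC_* flag names are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values are illustrative assumptions; the kernel's vmx.h is authoritative. */
#define SECONDARY_EXEC_ENABLE_EPT          0x00000002u
#define SECONDARY_EXEC_ENABLE_VPID         0x00000020u
#define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x00000080u

/*
 * Hypothetical helper: start from the controls the CPU supports and mask
 * out the ones that must not be used with the current settings.
 */
static uint32_t secondary_exec_control(uint32_t supported, bool enable_ept,
                                       bool enable_unrestricted_guest,
                                       bool have_vpid)
{
    uint32_t exec_control = supported;

    if (!have_vpid)
        exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
    if (!enable_ept) {
        exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
        /* The step the patch adds: no EPT means no unrestricted guest. */
        enable_unrestricted_guest = false;
    }
    if (!enable_unrestricted_guest)
        exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;

    return exec_control;
}
```

With ept=0 and all three features supported, only the VPID bit survives in this sketch, which is the combination the commit message says is needed to avoid the VM entry failure.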

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Sheng Yang

On Thursday 18 March 2010 13:51:41 Alexander Graf wrote:
> On 18.03.2010, at 02:50, Sheng Yang wrote:
> > On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
> >> Marcelo Tosatti wrote:
> >>> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
> >>>> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
> >>>> guest supported processors.
> >>>>
> >>>> Signed-off-by: Sheng Yang
> >>>
> >>> Applied, thanks.
> >>
> >> So without this patch kvm breaks with ept=0? Sounds like a stable
> >> candidate to me.
> >
> > Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had
> > already fixed this issue. So it should be fine.
> 
> Are you sure? I don't see the patch in 2.6.32-stable git.

Yes, you are right. It is not in 2.6.32-stable...

I will post a patch for stable.

Thanks

-- 
regards
Yang, Sheng

Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Alexander Graf


On 18.03.2010, at 02:50, Sheng Yang wrote:

> On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
>> Marcelo Tosatti wrote:
>>> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
>>>> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
>>>> guest supported processors.
>>>>
>>>> Signed-off-by: Sheng Yang
>>> 
>>> Applied, thanks.
>> 
>> So without this patch kvm breaks with ept=0? Sounds like a stable
>> candidate to me.
> 
> Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had
> already fixed this issue. So it should be fine.

Are you sure? I don't see the patch in 2.6.32-stable git.


Alex

Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Michael Tokarev

Dieter Ries wrote:
> On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
[]
>> Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
>> running in parallel on that machine?
> 
> Definitely. I have virtualbox installed, but haven't used it in months.
> The others I don't use at all, so they are not installed either.

Dieter, we've talked with you on IRC yesterday...

Can you take a look at what's in the startup script sequence on
your machine, and what modules are loaded which may be related?
What I'm trying to say: I don't know how virtualbox works, but it
may come with a kernel module or a bootup script that touches SVM
settings.

/mjt

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang

On Thursday 18 March 2010 13:22:28 Sheng Yang wrote:
> On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
> > On 03/17/2010 03:19 PM, Sheng Yang wrote:
> > > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> > >> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> > >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> > >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > >>>>>> Right, but there is a scope between kvm_guest_enter and really
> > >>>>>> running in guest os, where a perf event might overflow. Anyway,
> > >>>>>> the scope is very narrow, I will change it to use flag PF_VCPU.
> > >>>>>
> > >>>>> There is also a window between setting the flag and calling 'int
> > >>>>> $2' where an NMI might happen and be accounted incorrectly.
> > >>>>>
> > >>>>> Perhaps separate the 'int $2' into a direct call into perf and
> > >>>>> another call for the rest of NMI handling.  I don't see how it
> > >>>>> would work on svm though - AFAICT the NMI is held whereas vmx
> > >>>>> swallows it.
> > >>>>>
> > >>>>> I guess NMIs will be disabled until the next IRET so it isn't racy,
> > >>>>> just tricky.
> > >>>>
> > >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> > >>>> context isn't reentrant till a IRET. YangSheng would like to double
> > >>>> check it.
> > >>>
> > >>> After more check, I think VMX won't remained NMI block state for
> > >>> host. That's means, if NMI happened and processor is in VMX non-root
> > >>> mode, it would only result in VMExit, with a reason indicate that
> > >>> it's due to NMI happened, but no more state change in the host.
> > >>>
> > >>> So in that meaning, there _is_ a window between VMExit and KVM handle
> > >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
> > >>> handling code because "int $2" don't have effect to block following
> > >>> NMI.
> > >>>
> > >>> And if the NMI sequence is not important(I think so), then we need to
> > >>> generate a real NMI in current vmexit-after code. Seems let APIC send
> > >>> a NMI IPI to itself is a good idea.
> > >>>
> > >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > >>> replace "int $2". Something unexpected is happening...
> > >>
> > >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> > >> supposed to be able to.
> > >
> > > Um? Why?
> > >
> > > Especially kernel is already using it to deliver NMI.
> >
> > That's the only defined case, and it is defined because the vector field
> > is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending
> > on your version).
> >
> > 8.5.1 / 8.6.1
> >
> > '100 (NMI) Delivers an NMI interrupt to the target processor or
> > processors.  The vector information is ignored'
> >
> > 8.5.2  Valid Interrupt Vectors
> >
> > 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
> > 255) as valid interrupts.'
> >
> > 8.8.4 Interrupt Acceptance for Fixed Interrupts
> >
> > '...; vectors 0 through 15 are reserved by the APIC (see also: Section
> > 8.5.2, "Valid Interrupt Vectors")'
> >
> > So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
> > vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.
> 
> As you pointed out, NMI is not a "Fixed interrupt". If we want to send an NMI,
>  it would need a specific delivery mode rather than a vector number.
> 
> And if you look at the code, if we specify NMI_VECTOR, the delivery mode would
>  be set to NMI.
> 
> So what's wrong here?

OK, I think I understand your point now. You meant that these vectors can't
be written into the vector field directly, right? But NMI is an exception
because of DM_NMI. Is that your point? I think we agree on this.
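
(Illustration only.) The point both sides converge on, that an NMI is selected by delivery mode and not by the vector field, can be sketched as follows. The helper name prepare_icr_low() is hypothetical, and the constant values are assumed to mirror the usual Linux x86 definitions; treat them as illustrative, not authoritative.

```c
#include <stdint.h>

/* Assumed values, modelled on the Linux x86 APIC definitions. */
#define NMI_VECTOR     0x02u
#define APIC_DM_FIXED  0x00000u
#define APIC_DM_NMI    0x00400u  /* delivery mode 100b in ICR bits 10:8 */

/*
 * Hypothetical helper: build the delivery-mode and vector part of the
 * low ICR word for a given vector number.
 */
static uint32_t prepare_icr_low(uint32_t vector)
{
    uint32_t icr = 0;

    if (vector == NMI_VECTOR)
        icr |= APIC_DM_NMI;             /* vector field is left at zero and ignored */
    else
        icr |= APIC_DM_FIXED | vector;  /* fixed delivery carries the vector */

    return icr;
}
```

In this sketch vector 2 never reaches the vector field at all, so the restriction on low vector numbers for fixed interrupts is not involved when NMI_VECTOR is requested.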

-- 
regards
Yang, Sheng

RE: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Huang, Zhiteng

Hi Avi, Ingo,

I've been following this long thread since the very first email.

I'm a performance engineer whose job is to tune workloads that run on top of KVM
(and Xen previously).  As a performance engineer, I desperately want a
tool that can monitor the host and guests at the same time.  Think about >100
guests, a mix of Linux and Windows, running together on a single system: being
able to know what's happening is critical for performance analysis.  Actually I
am the person who asked Yanmin to add the CPU utilization breakdown (into
host_usr, host_krn, guest_usr, guest_krn) so that I can monitor dozens of
running guests.  I haven't made this patch work on my system yet but I _do_
think this patch is a very good start.

And finally, monitoring guests from the host is useful for users too
(administrators and performance guys like me).  I really appreciate your work
and would love to provide feedback from my point of view if needed.

Regards,

HUANG, Zhiteng

Intel SSG/SSD/SPA/PRC Scalability Lab

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Avi Kivity
Sent: Wednesday, March 17, 2010 11:55 AM
To: Frank Ch. Eigler
Cc: Anthony Liguori; Ingo Molnar; Zhang, Yanmin; Peter Zijlstra; Sheng Yang; linux-ker...@vger.kernel.org; kvm@vger.kernel.org; Marcelo Tosatti; Joerg Roedel; Jes Sorensen; Gleb Natapov; Zachary Amsden; ziteng.hu...@intel.com
Subject: Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote:
> Hi -
>
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
>
>> [...]
>> The only way to really address this is to change the interaction.
>> Instead of running perf externally to qemu, we should support a perf
>> command in the qemu monitor that can then tie directly to the perf
>> tooling.  That gives us the best possible user experience.
>>  
> To what extent could this be solved with less crossing of
> isolation/abstraction layers, if the perfctr facilities were properly
> virtualized?
>

That's the more interesting (by far) usage model.  In general guest 
owners don't have access to the host, and host owners can't (and 
shouldn't) change guests.

Monitoring guests from the host is useful for kvm developers, but less 
so for users.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang

On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
> On 03/17/2010 03:19 PM, Sheng Yang wrote:
> > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> >> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >>>>>> Right, but there is a scope between kvm_guest_enter and really
> >>>>>> running in guest os, where a perf event might overflow. Anyway, the
> >>>>>> scope is very narrow, I will change it to use flag PF_VCPU.
> >>>>>
> >>>>> There is also a window between setting the flag and calling 'int $2'
> >>>>> where an NMI might happen and be accounted incorrectly.
> >>>>>
> >>>>> Perhaps separate the 'int $2' into a direct call into perf and
> >>>>> another call for the rest of NMI handling.  I don't see how it would
> >>>>> work on svm though - AFAICT the NMI is held whereas vmx swallows it.
> >>>>>
> >>>>> I guess NMIs will be disabled until the next IRET so it isn't racy,
> >>>>> just tricky.
> >>>>
> >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >>>> context isn't reentrant till a IRET. YangSheng would like to double
> >>>> check it.
> >>>
> >>> After more check, I think VMX won't remained NMI block state for host.
> >>> That's means, if NMI happened and processor is in VMX non-root mode, it
> >>> would only result in VMExit, with a reason indicate that it's due to
> >>> NMI happened, but no more state change in the host.
> >>>
> >>> So in that meaning, there _is_ a window between VMExit and KVM handle
> >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
> >>> handling code because "int $2" don't have effect to block following
> >>> NMI.
> >>>
> >>> And if the NMI sequence is not important(I think so), then we need to
> >>> generate a real NMI in current vmexit-after code. Seems let APIC send a
> >>> NMI IPI to itself is a good idea.
> >>>
> >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> >>> replace "int $2". Something unexpected is happening...
> >>
> >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> >> supposed to be able to.
> >
> > Um? Why?
> >
> > Especially kernel is already using it to deliver NMI.
> 
> That's the only defined case, and it is defined because the vector field
> is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending
> on your version).
> 
> 8.5.1 / 8.6.1
> 
> '100 (NMI) Delivers an NMI interrupt to the target processor or
> processors.  The vector information is ignored'
> 
> 8.5.2  Valid Interrupt Vectors
> 
> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
> 255) as valid interrupts.'
> 
> 8.8.4 Interrupt Acceptance for Fixed Interrupts
> 
> '...; vectors 0 through 15 are reserved by the APIC (see also: Section
> 8.5.2, "Valid Interrupt Vectors")'
> 
> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.

As you pointed out, NMI is not a "Fixed interrupt". If we want to send an NMI,
it would need a specific delivery mode rather than a vector number.

And if you look at the code, if we specify NMI_VECTOR, the delivery mode would
be set to NMI.

So what's wrong here?

-- 
regards
Yang, Sheng

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zachary Amsden


On 03/17/2010 03:19 PM, Sheng Yang wrote:
> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>> Right, but there is a scope between kvm_guest_enter and really running
>>>>>> in guest os, where a perf event might overflow. Anyway, the scope is
>>>>>> very narrow, I will change it to use flag PF_VCPU.
>>>>>
>>>>> There is also a window between setting the flag and calling 'int $2'
>>>>> where an NMI might happen and be accounted incorrectly.
>>>>>
>>>>> Perhaps separate the 'int $2' into a direct call into perf and another
>>>>> call for the rest of NMI handling.  I don't see how it would work on
>>>>> svm though - AFAICT the NMI is held whereas vmx swallows it.
>>>>>
>>>>> I guess NMIs will be disabled until the next IRET so it isn't racy,
>>>>> just tricky.
>>>>
>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>> context isn't reentrant till a IRET. YangSheng would like to double
>>>> check it.
>>>
>>> After more check, I think VMX won't remained NMI block state for host.
>>> That's means, if NMI happened and processor is in VMX non-root mode, it
>>> would only result in VMExit, with a reason indicate that it's due to NMI
>>> happened, but no more state change in the host.
>>>
>>> So in that meaning, there _is_ a window between VMExit and KVM handle the
>>> NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
>>> code because "int $2" don't have effect to block following NMI.
>>>
>>> And if the NMI sequence is not important(I think so), then we need to
>>> generate a real NMI in current vmexit-after code. Seems let APIC send a
>>> NMI IPI to itself is a good idea.
>>>
>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>> replace "int $2". Something unexpected is happening...
>>
>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
>> supposed to be able to.
>
> Um? Why?
>
> Especially kernel is already using it to deliver NMI.

That's the only defined case, and it is defined because the vector field 
is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending 
on your version).


8.5.1 / 8.6.1

'100 (NMI) Delivers an NMI interrupt to the target processor or 
processors.  The vector information is ignored'


8.5.2  Valid Interrupt Vectors

'Local and I/O APICs support 240 of these vectors (in the range of 16 to 
255) as valid interrupts.'


8.8.4 Interrupt Acceptance for Fixed Interrupts

'...; vectors 0 through 15 are reserved by the APIC (see also: Section 
8.5.2, "Valid Interrupt Vectors")'


So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but 
vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.


Zach

virtio_blk_load() question

2010-03-17 Thread OHMURA Kei

Hi,

I have a question regarding virtio_blk_load().
(qemu-kvm.git d1fa468c1cc03ea362d8fe3ed9269bab4d197510)

VirtIOBlockReq structure is linked list of requests, but it doesn't seem to be
properly linked in virtio_blk_load().
...
req->next = s->rq;
s->rq = req->next;
...
In this case, we're losing req, and s->rq always points to the same entry.
If I'm understanding correctly, s->rq is NULL initially,
and it would stay NULL.

Although I'm not sure how these requests should be ordered, if the requests
should be added to the head of list to restore the saved status by
virtio_blk_save(), I think the following code is correct.  However, it seems to
reverse the order of the requests, and I'm wondering whether that is
appropriate.

Would somebody tell me how virtio_blk_load() is working?

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index b80402d..267b16f 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -457,7 +457,7 @@ static int virtio_blk_load(QEMUFile *f, void *opaque, int version_id)
         VirtIOBlockReq *req = virtio_blk_alloc_request(s);
         qemu_get_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem));
         req->next = s->rq;
-        s->rq = req->next;
+        s->rq = req;
     }
 
     return 0;
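
(Illustration, not part of the posted patch.) The two assignments can be compared on a stripped-down list. This is a minimal sketch: struct req is a stand-in for VirtIOBlockReq, and the helper names push_buggy() and push_fixed() are made up for this example.

```c
#include <stddef.h>

/* Stand-in for VirtIOBlockReq; only the link field matters here. */
struct req {
    int id;
    struct req *next;
};

/* The assignment as quoted in the question: the new request is dropped. */
static struct req *push_buggy(struct req *head, struct req *r)
{
    r->next = head;
    head = r->next;   /* head keeps its old value, so r is never linked in */
    return head;
}

/* The proposed change: the new request becomes the head of the list. */
static struct req *push_fixed(struct req *head, struct req *r)
{
    r->next = head;
    head = r;
    return head;
}
```

Starting from an empty list, push_buggy() leaves the head NULL no matter how many requests are loaded. Loading request 1 and then request 2 with push_fixed() gives the list 2 -> 1, which is the reversed order the question asks about.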

KVM autotest patch queue report 03-18-2010

2010-03-17 Thread Lucas Meneghel Rodrigues

Once again I'll try to resume the patch queue report for kvm autotest,
since it's a good way to keep folks posted about the status of their
patches. I will figure out some script to extract most of the
information from patchwork, that will allow me to do this with less
effort.

Summary
===

Total patches: 8
Reviewed patches: 6
Reviews unfinished: 2

Autotest patchwork
http://patchwork.test.kernel.org/project/autotest/list/


[KVM-AUTOTEST] fix tap interface for parallel execution 2010-03-10 yogi lmr Under Review

Michael already explained that in order to enable parallel mode, the
pools have to be modified, not the address_index. So this will be
superseded in favor of a patch that Michael will create.


KVM-Test: Add kvm userspace unit test 2010-03-05 sshang lmr Under Review

This patch does not depend on guest OSes or on different qemu command line
scenarios, so we could add it to build.cfg instead of tests_base.cfg.
Discussed this idea and pointed out some better messaging and API usage.


[2/2] KVM test: Add cpu_set subtest 2010-02-25 Lucas Meneghel Rodrigues lmr Under Review

This patch will stay on the queue until the feature tested gets in a
better shape on KVM upstream


KVM test: Add support for ipv6 addresses 2010-02-24 Lucas Meneghel Rodrigues lmr Under Review

This test was reviewed and the decision is that it will stay on the
queue until we have more extensive guest network testing.


KVM test: Memory ballooning test for KVM guest 2010-02-11 pradeep lmr Under Review

Made comments to the patch originator, waiting for a revised version.


KVM-test: Add a subtest 'qemu_img' 2010-01-29 Yolkfull Chow lmr Under Review

Made comments to the patch originator, waiting for a revised version.


[2/2] KVM test: subtest migration: Add rem_host and rem_port for migrate() 2009-12-08 Yolkfull Chow lmr Under Review
[1/2,-,V3] Add a server-side test - kvm_migration 2009-12-08 Yolkfull Chow lmr Under Review

This patchset still needs full review, but remote migration is something
that we want to take slowly, so we can have a first version integrated
upstream with a good round of testing. Right now I am not sure if the
approach on this patchset is the right way of approaching the problem.


Re: [Autotest] [PATCH] KVM-Test: Add kvm userspace unit test

2010-03-17 Thread Lucas Meneghel Rodrigues

Hi Shuxi, sorry that it took so long before I could get back to you on this one.

The general idea is just fine, but there is one gotcha that will need
more thought: this depends on having the KVM source code available for
testing (i.e., it depends on the build test *and* the build mode has to
involve source code, such as git builds; things like koji install will
not work). By default we do not make the tests depend directly on
build, so we have to figure out a way to have this integrated without
breaking things for users who are not interested in running the build
test.

Today I was reviewing the qemu-img functional test, and it occurred to
me that we can make all the tests that do not depend on guests or on
different qemu command line options dependent on the build test. This
way we'd have the separation that we need, without breaking anything
for users who do not care about build and other types of test.

Michael, what do you think? Should we put the config of tests like
this one and qemu_img on build.cfg, making them depend on build?

Oh Shuxi, on the code below I have some small comments to make:

On Fri, Mar 5, 2010 at 3:22 AM, sshang  wrote:
>  The test use kvm test harness kvmctl load binary test case file to test 
> various function of kvm kernel module.
>
> Signed-off-by: sshang 
> ---
>  client/tests/kvm/tests/unit_test.py    |   29 +
>  client/tests/kvm/tests_base.cfg.sample |    7 +++
>  2 files changed, 36 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/unit_test.py
>
> diff --git a/client/tests/kvm/tests/unit_test.py 
> b/client/tests/kvm/tests/unit_test.py
> new file mode 100644
> index 000..9bc7441
> --- /dev/null
> +++ b/client/tests/kvm/tests/unit_test.py
> @@ -0,0 +1,29 @@
> +import os
> +from autotest_lib.client.bin import utils
> +from autotest_lib.client.common_lib import error
> +
> +def run_unit_test(test, params, env):
> +    """
> +    This is kvm userspace unit test, use kvm test harness kvmctl load binary
> +    test case file to test various function of kvm kernel module.
> +    The output of all unit test can be found in the test result dir.
> +    """
> +
> +    case_list = params.get("case_list","access apic emulator hypercall irq"\
> +              " port80 realmode sieve smptest tsc stringio vmexit").split()
> +    srcdir = params.get("srcdir",test.srcdir)
> +    user_dir = os.path.join(srcdir,"kvm_userspace/kvm/user")
> +    os.chdir(user_dir)
> +    test_fail_list = []
> +
> +    for i in case_list:
> +        result_file = test.outputdir + "/" + i
> +        testfile = i + ".flat"
> +        results = utils.system("./kvmctl test/x86/bootstrap test/x86/" + \
> +                     testfile + " > " + result_file,ignore_status=True)

About the above statement: In general you should not use shell
redirection to write the output of your program to the log files.
Please take advantage of the fact that utils.run allows you to connect
the stdout and stderr pipes to the result file. Also, utils.run returns
a CmdResult object, which has a number of useful properties.

> +        if results != 0:
> +            test_fail_list.append(i)
> +
> +    if test_fail_list:
> +        raise error.TestFail("< " + " ".join(test_fail_list) + \
> +                                   " >")

In the above, you could just have used

raise error.TestFail("KVM module unit test failed. Test cases failed: %s" %
                     test_fail_list)

IMHO it's easier to understand.

> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 040d0c3..0918c26 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -300,6 +300,13 @@ variants:
>         shutdown_method = shell
>         kill_vm = yes
>         kill_vm_gracefully = no
> +
> +    - unit_test:
> +        type = unit_test
> +        case_list = access apic emulator hypercall msr port80 realmode sieve 
> smptest tsc stringio vmexit
> +        #srcdir should be same as build.cfg
> +        srcdir =
> +        vms = ''
>     # Do not define test variants below shutdown
>
>
> --
> 1.5.5.6
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>

-- 
Lucas

KVM Shared memory ivshmem enquiry

2010-03-17 Thread Neville Clark

Cam and others,

I have been trying to enable shared memory in KVM but I am not clear on
the correct procedures and requirements. I am new to KVM, kernel
building and git, so I am on a very steep learning curve.

I have an application that requires shared memory between host and
guest. I have been using Vmware workstation 6.0.5, but all later
versions do not support shared memory, and WS 6 is no longer available.

I think I have managed to build and install the guest's kvm_ivshmem
module, from http://www.gitorious.org/nahanni/
I used cd kernem_modules;make;sudo make install;sudo modprobe
kvm_ivshmem. Every thing seems to have worked.

On the host side I am very confused as to what is required.
I have created git repository using 
git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

I have been able to patch, build and install, but the result does NOT
run.
I have checked out qemu-kvm-0.11.0 and built and installed it, but then
I get a version mismatch. (This was unpatched, as the patch does not
apply to this version.)

The host is Ubuntu 9.10 64 bit, with ubuntu's KVM installed.

Can I simply somehow build and install ivshmem module, or do I need to
rebuild the kernel? eg get kvm.git and build and install new kernel.
Is there another KVM binary that I can use, instead of Ubuntu's?

Is the ivshmem patch likely to be accepted anytime soon?

Thanks Nev






Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Marcelo Tosatti

On Wed, Mar 17, 2010 at 03:49:19PM +0800, Xiao Guangrong wrote:
> Using bitmap_empty() to see whether memslot->dirty_bitmap is empty 
> 
> Changlog:
> cleanup x86 specific kvm_vm_ioctl_get_dirty_log() and fix a local
> parameter's type address Takuya Yoshikawa's suggestion
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/x86.c  |   17 -
>  virt/kvm/kvm_main.c |7 ++-
>  2 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bcf52d1..e6cbbd4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2644,22 +2644,17 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> struct kvm_dirty_log *log)
>  {
> - int r, n, i;
> + int r, n, is_dirty = 0;
>   struct kvm_memory_slot *memslot;
> - unsigned long is_dirty = 0;
>   unsigned long *dirty_bitmap = NULL;
>  
>   mutex_lock(&kvm->slots_lock);
>  
> - r = -EINVAL;
> - if (log->slot >= KVM_MEMORY_SLOTS)
> + r = kvm_get_dirty_log(kvm, log, &is_dirty);
> + if (r)
>   goto out;
>  
>   memslot = &kvm->memslots->memslots[log->slot];
> - r = -ENOENT;
> - if (!memslot->dirty_bitmap)
> - goto out;
> -

It's different because the user copy must be done after the
SRCU assignment.

> index bcd08b8..b08a7de 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -767,9 +767,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>   struct kvm_dirty_log *log, int *is_dirty)
>  {
>   struct kvm_memory_slot *memslot;
> - int r, i;
> - int n;
> - unsigned long any = 0;
> + int r, n, any = 0;
>  
>   r = -EINVAL;
>   if (log->slot >= KVM_MEMORY_SLOTS)
> @@ -782,8 +780,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  
>   n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
>  
> - for (i = 0; !any && i < n/sizeof(long); ++i)
> - any = memslot->dirty_bitmap[i];
> + any = !bitmap_empty(memslot->dirty_bitmap, memslot->npages);

The opencoded version should be faster in comparison to __bitmap_empty 
because the dirty bitmaps are always unsigned long aligned (and also
there's a function call).
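
(Illustration only.) The open-coded scan that the patch removes relies on the bitmap being stored as whole unsigned longs, so it can compare a word at a time and stop at the first non-zero word. A minimal standalone sketch of that idea follows; the helper name any_dirty() and the local BITS_PER_LONG definition are assumptions made for this example, not kernel code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: return 1 if any word of the bitmap is non-zero.
 * The bitmap is assumed to be allocated in whole unsigned longs, so
 * npages is rounded up to a full word and no partial-word masking is done.
 */
static int any_dirty(const unsigned long *bitmap, size_t npages)
{
    size_t nwords = (npages + BITS_PER_LONG - 1) / BITS_PER_LONG;
    unsigned long any = 0;
    size_t i;

    /* Stop at the first non-zero word. */
    for (i = 0; !any && i < nwords; ++i)
        any = bitmap[i];

    return any != 0;
}
```

Note that this sketch looks at every bit of the last word, including bits past npages, so it only gives the right answer if those padding bits are never set.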


Re: [PATCH v2] KVM MMU: check reserved bits only when CR4.PSE=1 or CR4.PAE=1

2010-03-17 Thread Marcelo Tosatti

On Wed, Mar 17, 2010 at 11:43:06AM +0800, Xiao Guangrong wrote:
> - The RSV bit is possibility set in error code when #PF occurred
>   only if CR4.PSE=1 or CR4.PAE=1
>   
> - context->rsvd_bits_mask[1][0] is always 0
> 
> Changlog:
> Move this operation to reset_rsvds_bits_mask() address Avi Kivity's suggestion
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   12 +---
>  1 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b137515..c49f8ec 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2288,18 +2288,26 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu 
> *vcpu, int level)
>  
>   if (!is_nx(vcpu))
>   exb_bit_rsvd = rsvd_bits(63, 63);
> +
> + context->rsvd_bits_mask[1][0] = 0;

So if the guest enables PAT at the PTE level you completely disable reserved
bit checking? You should only disable checking for [1][1] if !PSE.


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zhang, Yanmin

On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> > * Zhang, Yanmin  wrote:
> > 
> > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > > From: Zhang, Yanmin
> > > > > >
> > > > > > > Based on the discussion in KVM community, I worked out the patch to
> > > > > > > support perf to collect guest os statistics from host side. This patch
> > > > > > > is implemented with Ingo, Peter and some other guys' kind help. Yang
> > > > > > > Sheng pointed out a critical bug and provided good suggestions with
> > > > > > > other guys. I really appreciate their kind help.
> > > > > >
> > > > > > The patch adds new subcommand kvm to perf.
> > > > > >
> > > > > >perf kvm top
> > > > > >perf kvm record
> > > > > >perf kvm report
> > > > > >perf kvm diff
> > > > > >
> > > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > > could summarize guest os user space utilization per guest os.
> > > > > >
> > > > > > Below are some examples.
> > > > > > 1) perf kvm top
> > > > > > [r...@lkp-ne01 norm]# perf kvm --host --guest 
> > > > > > --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > > >
> > > > > >
> > > > > 
> > > > Thanks for your kind comments.
> > > > 
> > > > > Excellent, support for guest kernel != host kernel is critical (I can't
> > > > > remember the last time I ran same kernels).
> > > > > 
> > > > > How would we support multiple guests with different kernels?
> > > > With the patch, 'perf kvm report --sort pid' could show
> > > > summary statistics for all guest os instances. Then, use
> > > > parameter --pid of 'perf kvm record' to collect single problematic
> > > > instance data.
> > > Sorry. I found currently --pid isn't process but a thread (main thread).
> > > 
> > > Ingo,
> > > 
> > > Is it possible to support a new parameter or extend --inherit, so 'perf
> > > record' and 'perf top' could collect data on all threads of a process when
> > > the process is running?
> > > 
> > > If not, I need add a new ugly parameter which is similar to --pid to filter
> > > out process data in userspace.
> > 
> > Yeah. For maximum utility i'd suggest to extend --pid to include this, and
> > introduce --tid for the previous, limited-to-a-single-task functionality.
> > 
> > Most users would expect --pid to work like a 'late attach' - i.e. to work like
> > strace -f or like a gdb attach.
> 
> Thanks Ingo, Avi.
> 
> I worked out below patch against tip/master of March 15th.
> 
> Subject: [PATCH] Change perf's parameter --pid to process-wide collection
> From: Zhang, Yanmin 
> 
> Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
> thread id. Now, --pid means perf collects the statistics of all threads of
> the process, while --tid means perf just collect the statistics of that thread.
> 
> BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures
> attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
> and no forks, 'perf stat -p' doesn't collect any data. In addition, the
> while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact
> on running workload. I added a sleep(1) in the loop.
> 
> Signed-off-by: Zhang Yanmin 
Ingo,

Sorry, the patch has bugs. I need to do a better job and will work out two
separate patches, one for each of the two issues.

Yanmin
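
As an illustration of the --pid versus --tid distinction discussed in this thread, the following Python sketch expands a process ID into its thread IDs. It relies on Linux exposing one directory per thread under /proc/<pid>/task; the function names and the injectable listdir parameter are hypothetical, and this is not how perf itself is implemented.

```python
import os


def list_thread_ids(pid, listdir=os.listdir):
    """Return the sorted thread IDs listed under /proc/<pid>/task."""
    task_dir = "/proc/%d/task" % pid
    return sorted(int(name) for name in listdir(task_dir) if name.isdigit())


def select_targets(pid=None, tid=None, listdir=os.listdir):
    """Process-wide targets for pid, a single thread for tid (hypothetical helper)."""
    if pid is not None:
        return list_thread_ids(pid, listdir)
    if tid is not None:
        return [tid]
    return []
```

Threads created after the listing is taken would be missed by this sketch, which is why a real tool would also need some form of inherit or late-attach handling as mentioned above.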



Re: [PATCH] KVM test: Parallel install of guest OS v3

2010-03-17 Thread Lucas Meneghel Rodrigues

FYI, patch applied, see:

http://autotest.kernel.org/changeset/4309

On Wed, Mar 17, 2010 at 11:28 PM, Lucas Meneghel Rodrigues wrote:
> From: yogi 
>
> The patch enables multiple guest OS installs to run in parallel.
> Four more options were added to tests_base.cfg.sample: the port redirection
> entry "guest_port_unattend_shell", which lets the host communicate with
> the guest during installation, and "pxe_dir", "pxe_image" and
> "pxe_initrd", which specify the locations of the kernel and initrd.
> For parallel installation to work in unattended mode, the floppy
> image and PXE boot path also have to be unique for each guest.
>
> All the relevant unattended post-install steps for guests were
> changed; they are now server based.
>
> Notes:
>  * Yogi, I am going to remove the SLES patch, and will wait for
> you to send a new patchset with both the SLES files and the
> opensuse ones, OK? Thanks.
>
> Changes from v2:
>  * According to Michael Goldish comments, handled a possible
> socket.error exception that could be generated during the
> unattended install test
>  * Modified the floppy image names to be contained inside
> the same directory that might hold the tftp root for each
> OS, making the needed changes on unattended.py.
>  * Added floppy names for windows based OSs, which were lacking
> on previous patches.
>
> Changes from v1:
>  * Fixed the logic for the new unattended install test (original
> implementation would hang indefinitely if guest dies in the middle
> of the install).
>  * Fixed the config changes to make sure the unattended install
> port actually gets redirected so the test can work, also made the
> config specific to unattended install
>  * Merged the finish.exe patch, including a binary patch that
> changes the binary shipped to the new version
>  * Changed all unattended install files to use the parallel
> mechanism
>
> Tested with Windows 7 and Fedora 11 guests. I (lmr) am going to
> keep this in the queue for a bit so I can test it more in the
> internal test farm and everybody can take a look at the patch.
>
> Signed-off-by: Yogananth Subramanian 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  client/tests/kvm/deps/finish.cpp                   |  111 +++-
>  client/tests/kvm/deps/finish.exe                   |  Bin 26913 -> 26926 bytes
>  client/tests/kvm/kvm_utils.py                      |    4 +-
>  client/tests/kvm/scripts/unattended.py             |   59 ++-
>  client/tests/kvm/tests/unattended_install.py       |   45 
>  client/tests/kvm/tests_base.cfg.sample             |   81 +--
>  client/tests/kvm/unattended/Fedora-10.ks           |   12 +-
>  client/tests/kvm/unattended/Fedora-11.ks           |   11 +-
>  client/tests/kvm/unattended/Fedora-12.ks           |   11 +-
>  client/tests/kvm/unattended/Fedora-8.ks            |   11 +-
>  client/tests/kvm/unattended/Fedora-9.ks            |   11 +-
>  client/tests/kvm/unattended/RHEL-3-series.ks       |   12 +-
>  client/tests/kvm/unattended/RHEL-4-series.ks       |   11 +-
>  client/tests/kvm/unattended/RHEL-5-series.ks       |   11 +-
>  client/tests/kvm/unattended/win2003-32.sif         |    2 +-
>  client/tests/kvm/unattended/win2003-64.sif         |    2 +-
>  .../kvm/unattended/win2008-32-autounattend.xml     |    2 +-
>  .../kvm/unattended/win2008-64-autounattend.xml     |    2 +-
>  .../kvm/unattended/win2008-r2-autounattend.xml     |    2 +-
>  .../tests/kvm/unattended/win7-32-autounattend.xml  |    2 +-
>  .../tests/kvm/unattended/win7-64-autounattend.xml  |    2 +-
>  .../kvm/unattended/winvista-32-autounattend.xml    |    2 +-
>  .../kvm/unattended/winvista-64-autounattend.xml    |    2 +-
>  client/tests/kvm/unattended/winxp32.sif            |    2 +-
>  client/tests/kvm/unattended/winxp64.sif            |    2 +-
>  25 files changed, 242 insertions(+), 170 deletions(-)
>
> diff --git a/client/tests/kvm/deps/finish.cpp b/client/tests/kvm/deps/finish.cpp
> index 9c2867c..e5ba128 100644
> --- a/client/tests/kvm/deps/finish.cpp
> +++ b/client/tests/kvm/deps/finish.cpp
> @@ -1,12 +1,13 @@
> -// Simple app that only sends an ack string to the KVM unattended install
> -// watch code.
> +// Simple application that creates a server socket, listening for connections
> +// of the unattended install test. Once it gets a client connected, the
> +// app will send back an ACK string, indicating the install process is done.
>  //
>  // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib
>  //
>  // Author: Lucas Meneghel Rodrigues 
>  // Code was adapted from an MSDN sample.
>
> -// Usage: finish.exe [Host OS IP]
> +// Usage: finish.exe
>
>  // MinGW's ws2tcpip.h only defines getaddrinfo and other functions only for
>  // the case _WIN32_WINNT >= 0x0501.
> @@ -21,24 +22,18 @@
>  #include 
>  #include 
>
> -#define DEFAULT_BUFLEN 512
>  #define DEFAULT_PORT "12323"
> -
>  int main(int argc, char **argv)
>  {
>     WSADATA wsaData;
> -    SOCKET ConnectSocket = INVALID_SOCKET;
> -

[PATCH] KVM test: Parallel install of guest OS v3

2010-03-17 Thread Lucas Meneghel Rodrigues

From: yogi 

The patch enables multiple guest OS installs to run in parallel.
Four more options were added to tests_base.cfg.sample: the port redirection
entry "guest_port_unattend_shell", which lets the host communicate with
the guest during installation, and "pxe_dir", "pxe_image" and
"pxe_initrd", which specify the locations of the kernel and initrd.
For parallel installation to work in unattended mode, the floppy
image and PXE boot path also have to be unique for each guest.

All the relevant unattended post-install steps for guests were
changed; they are now server based.

Notes:
 * Yogi, I am going to remove the SLES patch, and will wait for
you to send a new patchset with both the SLES files and the
opensuse ones, OK? Thanks.

Changes from v2:
 * According to Michael Goldish comments, handled a possible
socket.error exception that could be generated during the
unattended install test
 * Modified the floppy image names to be contained inside
the same directory that might hold the tftp root for each
OS, making the needed changes on unattended.py.
 * Added floppy names for windows based OSs, which were lacking
on previous patches.

Changes from v1:
 * Fixed the logic for the new unattended install test (original
implementation would hang indefinitely if guest dies in the middle
of the install).
 * Fixed the config changes to make sure the unattended install
port actually gets redirected so the test can work, also made the
config specific to unattended install
 * Merged the finish.exe patch, including a binary patch that
changes the binary shipped to the new version
 * Changed all unattended install files to use the parallel
mechanism

Tested with Windows 7 and Fedora 11 guests. I (lmr) am going to
keep this in the queue for a bit so I can test it more in the
internal test farm and everybody can take a look at the patch.

Signed-off-by: Yogananth Subramanian 
Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/deps/finish.cpp   |  111 +++-
 client/tests/kvm/deps/finish.exe   |  Bin 26913 -> 26926 bytes
 client/tests/kvm/kvm_utils.py  |4 +-
 client/tests/kvm/scripts/unattended.py |   59 ++-
 client/tests/kvm/tests/unattended_install.py   |   45 
 client/tests/kvm/tests_base.cfg.sample |   81 +--
 client/tests/kvm/unattended/Fedora-10.ks   |   12 +-
 client/tests/kvm/unattended/Fedora-11.ks   |   11 +-
 client/tests/kvm/unattended/Fedora-12.ks   |   11 +-
 client/tests/kvm/unattended/Fedora-8.ks|   11 +-
 client/tests/kvm/unattended/Fedora-9.ks|   11 +-
 client/tests/kvm/unattended/RHEL-3-series.ks   |   12 +-
 client/tests/kvm/unattended/RHEL-4-series.ks   |   11 +-
 client/tests/kvm/unattended/RHEL-5-series.ks   |   11 +-
 client/tests/kvm/unattended/win2003-32.sif |2 +-
 client/tests/kvm/unattended/win2003-64.sif |2 +-
 .../kvm/unattended/win2008-32-autounattend.xml |2 +-
 .../kvm/unattended/win2008-64-autounattend.xml |2 +-
 .../kvm/unattended/win2008-r2-autounattend.xml |2 +-
 .../tests/kvm/unattended/win7-32-autounattend.xml  |2 +-
 .../tests/kvm/unattended/win7-64-autounattend.xml  |2 +-
 .../kvm/unattended/winvista-32-autounattend.xml|2 +-
 .../kvm/unattended/winvista-64-autounattend.xml|2 +-
 client/tests/kvm/unattended/winxp32.sif|2 +-
 client/tests/kvm/unattended/winxp64.sif|2 +-
 25 files changed, 242 insertions(+), 170 deletions(-)

diff --git a/client/tests/kvm/deps/finish.cpp b/client/tests/kvm/deps/finish.cpp
index 9c2867c..e5ba128 100644
--- a/client/tests/kvm/deps/finish.cpp
+++ b/client/tests/kvm/deps/finish.cpp
@@ -1,12 +1,13 @@
-// Simple app that only sends an ack string to the KVM unattended install
-// watch code.
+// Simple application that creates a server socket, listening for connections
+// of the unattended install test. Once it gets a client connected, the
+// app will send back an ACK string, indicating the install process is done.
 //
 // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib
 //
 // Author: Lucas Meneghel Rodrigues 
 // Code was adapted from an MSDN sample.
 
-// Usage: finish.exe [Host OS IP]
+// Usage: finish.exe
 
 // MinGW's ws2tcpip.h only defines getaddrinfo and other functions only for
 // the case _WIN32_WINNT >= 0x0501.
@@ -21,24 +22,18 @@
 #include 
 #include 
 
-#define DEFAULT_BUFLEN 512
 #define DEFAULT_PORT "12323"
-
 int main(int argc, char **argv)
 {
 WSADATA wsaData;
-SOCKET ConnectSocket = INVALID_SOCKET;
-struct addrinfo *result = NULL,
-*ptr = NULL,
-hints;
+SOCKET ListenSocket = INVALID_SOCKET, ClientSocket = INVALID_SOCKET;
+struct addrinfo *result = NULL, hints;
 char *sendbuf = "done";
-char recvbuf[DEFAULT_BUFLEN];
-int iResult;
-int recvbuflen = DEFAULT_BUFLEN;
+int iRe
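
The diff above is cut off in the archive. The handshake its new header comment describes (the guest listens, the test connects, the guest answers with an ACK string) can be sketched in Python as follows. The "done" string and port 12323 come from the diff; the function names, the loopback address and the timeout are assumptions made for illustration and do not reproduce finish.cpp or unattended_install.py.

```python
import socket
import threading

ACK = b"done"          # same string as sendbuf in the diff
DEFAULT_PORT = 12323   # DEFAULT_PORT in the diff


def serve_ack_once(listener):
    """Accept one connection on a listening socket and send the ACK string."""
    conn, _addr = listener.accept()
    try:
        conn.sendall(ACK)
    finally:
        conn.close()


def start_ack_server(host="127.0.0.1", port=0):
    """Start a one-shot ACK server in a thread (port 0 picks a free port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(1)
    thread = threading.Thread(target=serve_ack_once, args=(listener,), daemon=True)
    thread.start()
    return listener, thread


def wait_for_ack(host, port, timeout=5.0):
    """Connect to the guest-side server; return True once the ACK arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            while len(data) < len(ACK):
                chunk = sock.recv(len(ACK) - len(data))
                if not chunk:
                    break
                data += chunk
            return data == ACK
    except OSError:
        # Covers refused connections and timeouts, so the caller can retry
        # instead of hanging if the guest dies mid-install.
        return False
```

Having the guest act as the server is what lets several installs run at once: each guest gets its own redirected host port, and the host polls each one independently.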

Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Sheng Yang

On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
> >> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
> >> guest supported processors.
> >>
> >> Signed-off-by: Sheng Yang 
> >
> > Applied, thanks.
> 
> So without this patch kvm breaks with ept=0? Sounds like a stable
> candidate to me.

It seems the unrestricted guest code isn't in v2.6.31-stable, and v2.6.32
already has this fix. So it should be fine.
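
The logic of the patch under discussion can be sketched in Python as below. The bit values are my understanding of the secondary processor-based VM-execution control encoding and should be treated as assumptions, as should the function name; in the kernel, enable_unrestricted_guest is a module-wide flag rather than a local argument.

```python
# Assumed bit positions in the secondary VM-execution controls.
SECONDARY_EXEC_ENABLE_EPT = 0x00000002
SECONDARY_EXEC_ENABLE_VPID = 0x00000020
SECONDARY_EXEC_UNRESTRICTED_GUEST = 0x00000080


def secondary_exec_control(supported, enable_ept, enable_unrestricted_guest, vpid):
    """Mask out controls that must not be set for this configuration."""
    exec_control = supported
    if vpid == 0:
        exec_control &= ~SECONDARY_EXEC_ENABLE_VPID
    if not enable_ept:
        exec_control &= ~SECONDARY_EXEC_ENABLE_EPT
        # The fix: unrestricted guest depends on EPT, so drop it as well.
        enable_unrestricted_guest = False
    if not enable_unrestricted_guest:
        exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST
    return exec_control
```

Without the extra assignment, ept=0 would leave the unrestricted guest bit set while EPT is cleared, which is the combination the patch description says fails VM entry.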

-- 
regards
Yang, Sheng

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang

On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> > On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> >> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> >>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >>>> Right, but there is a scope between kvm_guest_enter and really running
> >>>> in guest os, where a perf event might overflow. Anyway, the scope is
> >>>> very narrow, I will change it to use flag PF_VCPU.
> >>>
> >>> There is also a window between setting the flag and calling 'int $2'
> >>> where an NMI might happen and be accounted incorrectly.
> >>>
> >>> Perhaps separate the 'int $2' into a direct call into perf and another
> >>> call for the rest of NMI handling.  I don't see how it would work on
> >>> svm though - AFAICT the NMI is held whereas vmx swallows it.
> >>>
> >>>   I guess NMIs
> >>> will be disabled until the next IRET so it isn't racy, just tricky.
> >>
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After more check, I think VMX won't remained NMI block state for host.
> > That's means, if NMI happened and processor is in VMX non-root mode, it
> > would only result in VMExit, with a reason indicate that it's due to NMI
> > happened, but no more state change in the host.
> >
> > So in that meaning, there _is_ a window between VMExit and KVM handle the
> > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> > code because "int $2" don't have effect to block following NMI.
> >
> > And if the NMI sequence is not important(I think so), then we need to
> > generate a real NMI in current vmexit-after code. Seems let APIC send a
> > NMI IPI to itself is a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
> 
> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> supposed to be able to.

Um? Why?

Especially since the kernel is already using it to deliver NMIs.

-- 
regards
Yang, Sheng

Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Alexander Graf


On 17.03.2010, at 23:40, Dieter Ries wrote:

> On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
>> On 17.03.2010, at 22:57, Dieter Ries wrote:
>>> Hi,
>>> 
>>> This is breaking KVM on my Phenom II X4 955.
>>> 
>>> When I start kvm I get this on the terminal:
>>> 
>>> kvm_create_vm: Device or resource busy
>>> Could not initialize KVM, will disable KVM support
>>> 
>>> And in dmesg:
>>> [   67.980732] kvm: enabling virtualization on CPU0 failed
>>> 
>>> 
>>> I commented out the if() and return, and I added 2 printk's there for
>>> debugging, and now that's what I see in dmesg when I start kvm:
>>> 
>>> [ 3341.740112] efer is 3329
>>> [ 3341.740113] efer is 3329
>>> [ 3341.740117] efer is 3329
>>> [ 3341.740119] EFER_SVME is 4096
>>> [ 3341.740121] EFER_SVME is 4096
>>> [ 3341.740124] EFER_SVME is 4096
>>> [ 3341.740130] efer is 3329
>>> [ 3341.740132] EFER_SVME is 4096
>>> 
>>> In hex the values are 0x1000 and 0x0d01
>>> 
>>> KVM has been working well on this machine before, and it still works
>>> well after commenting that part out.
>>> 
>>> I am not sure what the value of this register is supposed to be, but are
>>> you sure
>>> 
>>> if (efer & EFER_SVME)
>>> 
>>> is the right condition?
>> 
>> According to the printks you show above the & condition should never apply.
>> 
>> Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
>> running in parallel on that machine?
> 
> Definitely. I have virtualbox installed, but haven't used it in months.
> The others I don't use at all, so they are not installed either.
> 
> There is nothing running which could cause that. Behaviour is the same
> when I don't log into KDE but just try this without X, where nearly
> nothing is started.
> 
> I noted something more now: When I comment it out once, and start kvm
> like that, and then remove the comments again, then it works. So I guess
> the dmesg parts I wrote were not perfect. It's more like:
> 
> I: After reboot, with debugging printk and if condition:
> 
> [   42.089423] efer is d01
> [   42.089425] efer is d01
> [   42.089428] efer is d01
> [   42.089430] EFER_SVME is 1000
> [   42.089431] EFER_SVME is 1000
> [   42.089433] EFER_SVME is 1000
> [   42.089436] efer is 1d01
> [   42.089438] EFER_SVME is 1000
> [   42.089440] kvm: enabling virtualization on CPU0 failed
> 
> II: debugging printk, no if condition:
> 
> [  317.355519] efer is d01
> [  317.355522] efer is d01
> [  317.355524] efer is d01
> [  317.355527] EFER_SVME is 1000
> [  317.355528] EFER_SVME is 1000
> [  317.355531] EFER_SVME is 1000
> [  317.355534] efer is 1d01
> [  317.355536] EFER_SVME is 1000
> 
> III: debugging printk and if condition:
> 
> [  421.955433] efer is d01
> [  421.955437] efer is d01
> [  421.955440] efer is d01
> [  421.955442] EFER_SVME is 1000
> [  421.955443] EFER_SVME is 1000
> [  421.955445] EFER_SVME is 1000
> [  421.955449] efer is d01
> [  421.955451] EFER_SVME is 1000
> 
> 
> 
> This is without reboots in between. So before I use the commented-out
> version for the first time, it doesn't work; the second time it works.
> Maybe some initialization problem...

It looks like one of your CPUs has EFER_SVME enabled on bootup already. I'm not 
aware of code clearing EFER, so if there's garbage in there on boot it stays 
there.

Could you please add the current CPU number to your printk? I bet it's always 
the same one.
If that's the case I'd say you have a broken BIOS or bootloader.
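
To make the logged values easier to read, here is a small Python sketch that decodes EFER by bit name and picks out the CPUs that already have SVME set. The bit positions are the architectural ones as I understand them; the function names and the per-CPU sample values are hypothetical, since the logs in this thread do not yet include CPU numbers.

```python
# Architectural EFER bit positions used here (assumed, not from the patch).
EFER_BITS = {
    0: "SCE",
    8: "LME",
    10: "LMA",
    11: "NX",
    12: "SVME",
}


def decode_efer(value):
    """Return the names of the known EFER bits set in value, lowest bit first."""
    return [name for bit, name in sorted(EFER_BITS.items()) if value & (1 << bit)]


def cpus_with_svme(efer_by_cpu):
    """Return the CPU numbers whose EFER already has SVME set."""
    return sorted(cpu for cpu, efer in efer_by_cpu.items() if efer & (1 << 12))


# Hypothetical per-CPU readings matching the values seen in the logs.
sample = {0: 0xD01, 1: 0xD01, 2: 0xD01, 3: 0x1D01}
```

If the same CPU shows up every time after a cold boot, that points at firmware or the bootloader leaving SVME set rather than at the check itself.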


Alex


Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Zachary Amsden


On 03/17/2010 12:17 PM, Thomas Løcke wrote:
> On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden wrote:
>> What's your host CPU load get up to.  You only have a single core?
>
> Dual core.
>
> If I only run a single Windows VM, the host load is pretty low. Sure
> it goes up a bit when for example copying a file, but it's nothing
> serious. It's not getting hammered in any way.
>
>> Including -rtc-td-hack ?
>
> Yup, tried that as per suggested by one of the #kvm users. Didn't fix
> it. But come to think of it, I didn't change any of the other options.
> Should I have dropped -localtime and/or -tdf options? I will try again
> tomorrow.

-rtc localtime

is required for Windows to get the proper RTC time, and -tdf should have
no effect on Windows guests.

You might try

-rtc localtime,clock=host,driftfix=slew

>> As always, make sure you are running the latest and greatest modules, those
>> matter even more than the kernel, and check for any warning messages in
>> dmesg and qemu output.
>
> But don't the latest kvm modules come with the kernel? So if I compile
> a new kernel, the kvm modules should be updated too, yes?
>
> I will try the latest qemu-kvm.

I use git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git and track a 2.6
kernel branch directly so I always have latest module source regardless
of host kernel.


Zach

Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Dieter Ries

On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
> On 17.03.2010, at 22:57, Dieter Ries wrote:
> > Hi,
> > 
> > This is breaking KVM on my Phenom II X4 955.
> > 
> > When I start kvm I get this on the terminal:
> > 
> > kvm_create_vm: Device or resource busy
> > Could not initialize KVM, will disable KVM support
> > 
> > And in dmesg:
> > [   67.980732] kvm: enabling virtualization on CPU0 failed
> > 
> > 
> > I commented out the if() and return, and I added 2 printk's there for
> > debugging, and now that's what I see in dmesg when I start kvm:
> > 
> > [ 3341.740112] efer is 3329
> > [ 3341.740113] efer is 3329
> > [ 3341.740117] efer is 3329
> > [ 3341.740119] EFER_SVME is 4096
> > [ 3341.740121] EFER_SVME is 4096
> > [ 3341.740124] EFER_SVME is 4096
> > [ 3341.740130] efer is 3329
> > [ 3341.740132] EFER_SVME is 4096
> > 
> > In hex the values are 0x1000 and 0x0d01
> > 
> > KVM has been working well on this machine before, and it still works
> > well after commenting that part out.
> > 
> > I am not sure what the value of this register is supposed to be, but are
> > you sure
> > 
> > if (efer & EFER_SVME)
> > 
> > is the right condition?
> 
> According to the printks you show above the & condition should never apply.
> 
> Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
> running in parallel on that machine?

Definitely. I have virtualbox installed, but haven't used it in months.
The others I don't use at all, so they are not installed either.

There is nothing running which could cause that. Behaviour is the same
when I don't log into KDE but just try this without X, where nearly
nothing is started.

I noted something more now: When I comment it out once, and start kvm
like that, and then remove the comments again, then it works. So I guess
the dmesg parts I wrote were not perfect. It's more like:

I: After reboot, with debugging printk and if condition:

[   42.089423] efer is d01
[   42.089425] efer is d01
[   42.089428] efer is d01
[   42.089430] EFER_SVME is 1000
[   42.089431] EFER_SVME is 1000
[   42.089433] EFER_SVME is 1000
[   42.089436] efer is 1d01
[   42.089438] EFER_SVME is 1000
[   42.089440] kvm: enabling virtualization on CPU0 failed

II: debugging printk, no if condition:

[  317.355519] efer is d01
[  317.355522] efer is d01
[  317.355524] efer is d01
[  317.355527] EFER_SVME is 1000
[  317.355528] EFER_SVME is 1000
[  317.355531] EFER_SVME is 1000
[  317.355534] efer is 1d01
[  317.355536] EFER_SVME is 1000

III: debugging printk and if condition:

[  421.955433] efer is d01
[  421.955437] efer is d01
[  421.955440] efer is d01
[  421.955442] EFER_SVME is 1000
[  421.955443] EFER_SVME is 1000
[  421.955445] EFER_SVME is 1000
[  421.955449] efer is d01
[  421.955451] EFER_SVME is 1000



This is without reboots in between. So before I use the commented-out
version for the first time, it doesn't work; the second time it works.
Maybe some initialization problem...

> Alex

cu
Dieter

Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Thomas Løcke

On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden  wrote:
> What's your host CPU load get up to.  You only have a single core?

Dual core.

If I only run a single Windows VM, the host load is pretty low. Sure
it goes up a bit when for example copying a file, but it's nothing
serious. It's not getting hammered in any way.

> Including -rtc-td-hack ?

Yup, tried that as per suggested by one of the #kvm users. Didn't fix
it. But come to think of it, I didn't change any of the other options.
Should I have dropped -localtime and/or -tdf options? I will try again
tomorrow.

> As always, make sure you are running the latest and greatest modules, those
> matter even more than the kernel, and check for any warning messages in
> dmesg and qemu output.

But don't the latest kvm modules come with the kernel? So if I compile
a new kernel, the kvm modules should be updated too, yes?

I will try the latest qemu-kvm.

/Thomas

Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Dieter Ries

Am 16.11.2009 13:19, schrieb Avi Kivity:
> From: Alexander Graf 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f54c4f9..59fe4d5 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage)
>   cpu_svm_disable();
>  }
>  
> -static void svm_hardware_enable(void *garbage)
> +static int svm_hardware_enable(void *garbage)
>  {
>  
>   struct svm_cpu_data *svm_data;
> @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage)
>   struct desc_struct *gdt;
>   int me = raw_smp_processor_id();
>  
> + rdmsrl(MSR_EFER, efer);
> + if (efer & EFER_SVME)
> + return -EBUSY;
> +

Hi,

This is breaking KVM on my Phenom II X4 955.

When I start kvm I get this on the terminal:

kvm_create_vm: Device or resource busy
Could not initialize KVM, will disable KVM support

And in dmesg:
[   67.980732] kvm: enabling virtualization on CPU0 failed


I commented out the if() and return, and I added 2 printk's there for
debugging, and now that's what I see in dmesg when I start kvm:

[ 3341.740112] efer is 3329
[ 3341.740113] efer is 3329
[ 3341.740117] efer is 3329
[ 3341.740119] EFER_SVME is 4096
[ 3341.740121] EFER_SVME is 4096
[ 3341.740124] EFER_SVME is 4096
[ 3341.740130] efer is 3329
[ 3341.740132] EFER_SVME is 4096

In hex the values are 0x1000 and 0x0d01

KVM has been working well on this machine before, and it still works
well after commenting that part out.

I am not sure what the value of this register is supposed to be, but are
you sure

if (efer & EFER_SVME)

is the right condition?
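
A quick way to check the condition against the logged numbers is sketched below in Python. The two decimal values come from the dmesg output above; the helper name svme_already_set is illustrative and only mirrors the `efer & EFER_SVME` test from the patch.

```python
EFER_SVME = 4096        # printed as "EFER_SVME is 4096", i.e. bit 12
efer_from_log = 3329    # printed as "efer is 3329"


def svme_already_set(efer):
    """Mirror of the patch's test: true when the SVME bit is set in efer."""
    return (efer & EFER_SVME) != 0


print(hex(efer_from_log), hex(EFER_SVME), svme_already_set(efer_from_log))
```

For the value 3329 the test is false, so the -EBUSY path would only be taken by a read that returned a different value, one with bit 12 set.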



cu
Dieter

Re: [PATCH 23/42] KVM: Activate Virtualization On Demand

2010-03-17 Thread Alexander Graf


On 17.03.2010, at 22:57, Dieter Ries wrote:

> Am 16.11.2009 13:19, schrieb Avi Kivity:
>> From: Alexander Graf 
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index f54c4f9..59fe4d5 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage)
>>  cpu_svm_disable();
>> }
>> 
>> -static void svm_hardware_enable(void *garbage)
>> +static int svm_hardware_enable(void *garbage)
>> {
>> 
>>  struct svm_cpu_data *svm_data;
>> @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage)
>>  struct desc_struct *gdt;
>>  int me = raw_smp_processor_id();
>> 
>> +rdmsrl(MSR_EFER, efer);
>> +if (efer & EFER_SVME)
>> +return -EBUSY;
>> +
> 
> Hi,
> 
> This is breaking KVM on my Phenom II X4 955.
> 
> When I start kvm I get this on the terminal:
> 
> kvm_create_vm: Device or resource busy
> Could not initialize KVM, will disable KVM support
> 
> And in dmesg:
> [   67.980732] kvm: enabling virtualization on CPU0 failed
> 
> 
> I commented out the if() and return, and I added 2 printk's there for
> debugging, and now that's what I see in dmesg when I start kvm:
> 
> [ 3341.740112] efer is 3329
> [ 3341.740113] efer is 3329
> [ 3341.740117] efer is 3329
> [ 3341.740119] EFER_SVME is 4096
> [ 3341.740121] EFER_SVME is 4096
> [ 3341.740124] EFER_SVME is 4096
> [ 3341.740130] efer is 3329
> [ 3341.740132] EFER_SVME is 4096
> 
> In hex the values are 0x1000 and 0x0d01
> 
> KVM has been working well on this machine before, and it still works
> well after commenting that part out.
> 
> I am not sure what the value of this register is supposed to be, but are
> you sure
> 
> if (efer & EFER_SVME)
> 
> is the right condition?

According to the printks you show above the & condition should never apply.

Are you 100% sure you don't have vmware, virtualbox, parallels, whatever 
running in parallel on that machine?


Alex

Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Alexander Graf


On 17.03.2010, at 22:42, Eduardo Habkost wrote:

> On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
>> Eduardo Habkost wrote:
>>> svm_vcpu_reset() was not properly resetting the contents of the 
>>> guest-visible
>>> cr0 register, causing the following issue:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=525699
>>> 
>>> Without resetting cr0 properly, the vcpu was running the SIPI bootstrap 
>>> routine
>>> with paging enabled, making the vcpu get a pagefault exception while trying 
>>> to
>>> run it.
>>> 
>>> Instead of setting vmcb->save.cr0 directly, the new code just resets
>>> kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
>>> vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
>>> 
>>> kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
>>> kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
>>> 
>>> Signed-off-by: Eduardo Habkost 
>>> 
>> 
>> Should this go into -stable?
> 
> I think so. The patch is from October, was -stable branched before that?

If I read the diff log correctly 2.6.32 kvm development was branched off end of 
July 2009. The important question is if this patch fixes a regression 
introduced by some speedup magic.


Alex

Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Eduardo Habkost

On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
> Eduardo Habkost wrote:
> > svm_vcpu_reset() was not properly resetting the contents of the
> > guest-visible cr0 register, causing the following issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=525699
> >
> > Without resetting cr0 properly, the vcpu was running the SIPI bootstrap
> > routine with paging enabled, making the vcpu get a pagefault exception
> > while trying to run it.
> >
> > Instead of setting vmcb->save.cr0 directly, the new code just resets
> > kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
> > vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
> >
> > kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
> > kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
> >
> > Signed-off-by: Eduardo Habkost 
> >   
> 
> Should this go into -stable?

I think so. The patch is from October, was -stable branched before that?
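
To make the bit list in the patch description concrete, here is a rough sketch of the two cr0 values involved. It is not the kernel code: `shadow_hw_cr0` and `CR0_RESET_VALUE` are hypothetical names, and the only thing taken from the description is that PG and WP get set and CD and NW get cleared in the value the hardware runs with, while the guest-visible cr0 keeps its reset contents.

```c
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_ET (UINT32_C(1) << 4)
#define X86_CR0_WP (UINT32_C(1) << 16)
#define X86_CR0_NW (UINT32_C(1) << 29)
#define X86_CR0_CD (UINT32_C(1) << 30)
#define X86_CR0_PG (UINT32_C(1) << 31)

/* CR0 after reset: caches disabled, paging off (0x60000010). */
#define CR0_RESET_VALUE (X86_CR0_CD | X86_CR0_NW | X86_CR0_ET)

/*
 * Hypothetical helper: derive the cr0 the hardware runs with from the
 * guest-visible cr0, following the (PG, WP, !CD, !NW) list above.
 */
static uint32_t shadow_hw_cr0(uint32_t guest_cr0)
{
    uint32_t hw = guest_cr0;

    hw |= X86_CR0_PG | X86_CR0_WP;     /* paging is always on underneath */
    hw &= ~(X86_CR0_CD | X86_CR0_NW);  /* keep host caching enabled */
    return hw;
}
```

The point of the split is that the guest-visible value must still read as "paging off" after reset, even though the value derived from it has PG set.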

-- 
Eduardo

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zachary Amsden


On 03/16/2010 11:28 PM, Sheng Yang wrote:
> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>> Right, but there is a scope between kvm_guest_enter and really running
>>>> in guest os, where a perf event might overflow. Anyway, the scope is
>>>> very narrow, I will change it to use flag PF_VCPU.
>>>
>>> There is also a window between setting the flag and calling 'int $2'
>>> where an NMI might happen and be accounted incorrectly.
>>>
>>> Perhaps separate the 'int $2' into a direct call into perf and another
>>> call for the rest of NMI handling.  I don't see how it would work on svm
>>> though - AFAICT the NMI is held whereas vmx swallows it.
>>>
>>>   I guess NMIs
>>> will be disabled until the next IRET so it isn't racy, just tricky.
>>
>> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
>> isn't reentrant till a IRET. YangSheng would like to double check it.
>
> After more check, I think VMX won't remained NMI block state for host. That's
> means, if NMI happened and processor is in VMX non-root mode, it would only
> result in VMExit, with a reason indicate that it's due to NMI happened, but no
> more state change in the host.
>
> So in that meaning, there _is_ a window between VMExit and KVM handle the NMI.
> Moreover, I think we _can't_ stop the re-entrance of NMI handling code because
> "int $2" don't have effect to block following NMI.
>
> And if the NMI sequence is not important(I think so), then we need to generate
> a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to
> itself is a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
> "int $2". Something unexpected is happening...


You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't 
supposed to be able to.


Zach

Re: qemu-kvm crashes with Assertion ... failed.

2010-03-17 Thread André Weidemann


On 17.03.2010 19:22, Marcelo Tosatti wrote:
> On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:
>> Hi,
>> I cloned the qemu-kvm git repository today with "git clone
>> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
>> qemu-kvm-2010-03-14", ran configure and compiled it and did a "make
>> install". Everything went fine without warnings or errors.
>> For configure output take a look here: http://pastebin.com/BL4DYCRY
>>
>> Here is my Server Hardware:
>> Asus P5Q Mainbaord
>> Intel Q9300
>> 8GB RAM
>> RAID5 with mdadm consisting of 4x 1TB disks
>> The volume /dev/storage/Windows7test mentioned below is on this RAID5.
>>
>> I ran my virtual machine with the following command:
>>
>> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
>> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
>> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
>> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
>> tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name
>> Windows7test,process=Windows7test -drive
>> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native
>
> Andre,
>
> Can you try qemu-kvm-0.12.3 ?


I did the following:
git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2010-03-17

cd qemu-kvm-2010-03-17
git checkout -b test qemu-kvm-0.12.3
./configure
make -j6 && make install

I started the VM again exactly as I did the last time and it crashed 
again with the same error message.
"qemu-system-x86_64: 
/usr/local/src/qemu-kvm-2010-03-17/hw/ide/internal.h:507: 
bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed."


 André

Re: -enable-kvm - can it be a required option?

2010-03-17 Thread Anthony Liguori


On 03/17/2010 03:18 PM, Michael Tokarev wrote:
> What I mean is: if asked to enable kvm but kvm
> can't be initialized for some reason (lack of
> virt extensions on the cpu, permission denied
> and so on), can we stop with a fatal error
> instead of continuing in emulated mode?


What I've been thinking, is that we should make kvm enablement a -cpu 
option.  Something like:


(1) -cpu host,accel=kvm
(2) -cpu host,accel=tcg
(3) -cpu host,accel=kvm:tcg

(1) would be KVM only, (2) would be TCG only, (3) would be KVM falling 
back to TCG.


What's nice about this approach, is that we already pull CPU model 
definitions from a global config file which means that you could tweak 
this parameter to your liking.
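
A minimal sketch of how a fallback list such as "kvm:tcg" could be resolved, assuming a colon-separated list that is tried from left to right. Everything here is hypothetical: `pick_accel` and `probe_no_kvm` are illustration-only names, not qemu functions, and the accel= suboption itself is only the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Callback that reports whether a named accelerator can be initialized. */
typedef bool (*accel_probe_fn)(const char *name);

/*
 * Walk a colon-separated list ("kvm:tcg") left to right and copy the first
 * accelerator that probes successfully into out.
 * Returns 0 on success, -1 if nothing in the list could be initialized.
 */
static int pick_accel(const char *spec, accel_probe_fn probe,
                      char *out, size_t outlen)
{
    assert(spec && probe && out && outlen > 0);

    while (*spec) {
        size_t len = strcspn(spec, ":");

        if (len > 0 && len < outlen) {
            memcpy(out, spec, len);
            out[len] = '\0';
            if (probe(out))
                return 0;
        }
        spec += len;
        if (*spec == ':')
            spec++;
    }
    out[0] = '\0';
    return -1;   /* caller should exit with a fatal error here */
}

/* Stand-in probe for a host without usable KVM: only TCG initializes. */
static bool probe_no_kvm(const char *name)
{
    return strcmp(name, "tcg") == 0;
}
```

With that probe, "kvm:tcg" resolves to tcg, while a plain "kvm" yields -1, which is the "stop with a fatal error" behaviour asked for in the original mail.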


Regards,

Anthony Liguori


-enable-kvm - can it be a required option?

2010-03-17 Thread Michael Tokarev

What I mean is: if asked to enable kvm but kvm
can't be initialized for some reason (lack of
virt extensions on the cpu, permission denied
and so on), can we stop with a fatal error
instead of continuing in emulated mode?

Or maybe with another option, like -require-kvm?

I understand that -enable-kvm is now in upstream
qemu too, and _there_ it means something different,
that is, it enables something that is disabled by
default.  But even with that, if user asks for
something and that something isn't available, it
seems like a good idea to stop here instead of
producing a warning and continuing...

This is especially true for kvm where -enable-kvm
is the default anyway.

I see more and more people are using this option
now in a hope that kvm will actually stop when
no virt extensions are available.  It was my
first reaction too, "wow, now I can force it to
require kvm extensions instead of running 1000
times slower!".  So this has something to think
about, it looks like... ;)

Thanks!

/mjt

SIGSEGV with -smp 17+, and error handling around...

2010-03-17 Thread Michael Tokarev

When run with -smp 17 or greater, kvm
fails like this:

$ kvm -smp 17
kvm_create_vcpu: Invalid argument
kvm_setup_mce FAILED: Invalid argument
KVM_SET_LAPIC failed
Segmentation fault
$ _

In qemu-kvm.c, the kvm_create_vcpu() routine
(which is used in a vcpu thread to set up
vcpu) is declared as void, i.e, no error
return.  And the code that calls it blindly
assumes that it will never fail...

But the first error message above is from kernel,
which - apparently - refuses to create 17th vCPU.
Hence we've a vcpu thread which is empty/dummy and
not even fully initialized... so it fails later
in the game.

This all looks quite... raw, not polished ;)

Can we somehow handle the (several possible) errors
in that (and other) places, and how we ever can act
on them?  Abort?  Warn the user and reduce the number
of vcpus accordingly (seems wrong, esp. if it were
some first vcpus or in the middle which failed)...
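
To illustrate the "abort" option, here is a rough sketch of what propagating the error could look like. None of this is the qemu-kvm code: all function names are made up, `fake_kernel_create_vcpu` merely stands in for the kernel refusing the request, and the limit of 16 is only inferred from the failure at -smp 17.

```c
#include <errno.h>

/* Assumed limit, inferred from "-smp 17" being the first failing value. */
#define SKETCH_MAX_VCPUS 16

/* Stand-in for the kernel call that refuses to create the 17th vcpu. */
static int fake_kernel_create_vcpu(int index)
{
    return index < SKETCH_MAX_VCPUS ? 0 : -EINVAL;
}

/* Returns 0 or a negative errno instead of being declared void. */
static int create_vcpu_checked(int index)
{
    int r = fake_kernel_create_vcpu(index);

    if (r < 0)
        return r;   /* skip MCE/LAPIC setup for a vcpu that does not exist */

    /* further per-vcpu setup would go here */
    return 0;
}

/* Create all vcpus; stop at the first failure so the caller can abort. */
static int create_all_vcpus(int count)
{
    for (int i = 0; i < count; i++) {
        int r = create_vcpu_checked(i);

        if (r < 0)
            return r;
    }
    return 0;
}
```

The caller can then print one clear message and exit instead of running into the later KVM_SET_LAPIC failure and the segfault.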

Thanks!

/mjt

Re: guest kernel debugging through serial port

2010-03-17 Thread Neo Jia

Here is what I asked before. The reason I want to assign a real
serial port to the guest is that debugging through the network
becomes really slow.

Thanks,
Neo

On Thu, Mar 11, 2010 at 2:44 AM, Neo Jia  wrote:
> hi,
>
> I have followed the windows guest debugging procedure from
> http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging. And
> it works when I start two guests and bind tcp port to guest serial
> port, but it is really slow.
>
> And if I use -serial /dev/ttyS1 for the guest debugging target, I
> can't talk to it from my dev machine that has connected to ttyS1 with
> target machine (host).
>
> Is this a known problem?
>
> Thanks,
> Neo
>
> --
> I would remember that if researchers were not ambitious
> probably today we haven't the technology we are using!
>



-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!

[ kvm-Bugs-2972152 ] guest crash when -cpu kvm64

2010-03-17 Thread SourceForge.net

Bugs item #2972152, was opened at 2010-03-17 14:43
Message generated for change (Tracker Item Submitted) made by high33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2972152&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: libkvm
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: hugohiggins (high33)
Assigned to: Nobody/Anonymous (nobody)
Summary: guest crash when -cpu kvm64

Initial Comment:
When using -cpu kvm64 guest crashes when X starts up.  
dmesg on hypervisor says:
[6149047.906364] kvm: 29020: cpu0 unhandled rdmsr: 0xc0010112

Guest boots OK without -cpu parameter

cpu: dual opteron 2435 (12 cores total)
ram: 32gig 
host dist: ubuntu 9.04
host kernel: 2.6.28-16-generic #55-Ubuntu SMP
guest dist: xubuntu-9.10-amd64

# /usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 \
-name "ubu64 localhost:69" -M pc \
-m 2048 -boot d -vga std \
-net nic,macaddr=BA:DD:C0:FF:EE:F6,model=virtio -net vde \
-drive file=/dev/sdp,if=scsi,boot=on \
-cpu kvm64 \
-cdrom iso/xubuntu-9.10-desktop-amd64.iso -k en-us -localtime -sdl \
-vnc localhost:69 -daemonize -usbdevice tablet



--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2972152&group_id=180599

Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Zachary Amsden


On 03/17/2010 09:22 AM, Thomas Løcke wrote:
> Hey all,
>
> I'm working on moving from a mixture of physical servers and
> virtualized servers running on Virtualbox, to a pure KVM setup. But
> I'm having some problems with my Windows XP guests in my test-setup.
>
> This is the host I'm testing on:
>
> CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
> RAM: 8GB
> 2x320GB WD SATA disks (one for host OS and one for KVM guest images)
> 2x1GBs Intel nics (bonded)
>
> Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
> 2.6.29.6-generic, 2.6.33 and 2.6.33.1
>
> qemu-kvm is 0.12.3

qemu's been changing a lot, might be best to build from the actual git 
repository, which is 0.12.50 now.

> My Linux guests work like a charm. When they boot up I do a single
> "ntpdate -b europe.pool.ntp.org" and after that the time stays in near
> perfect sync with the host, with no ntpd running on the guests. My
> Windows XP guests on the other hand drift backwards in time,
> especially when there's load on the guest, for example when I'm
> copying a large file from my samba server to the Windows XP guest. The
> guest can easily lose 10 minutes while copying a 600MB file. Or if I
> start a few browsers and point them at some horrible flash heavy sites
> and just let them sit there, then the VM also starts losing a lot of
> time real fast.

What does your host CPU load get up to?  You only have a single core?

> This is the commandline I use to start the Windows XP guests:
>
> qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
> -smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
> -net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm
>
> I use the same commandline for my Linux guests, except the nic is virtio.
>
> I'm at my wits end. I've tried the -tdf option with no success. I've
> tried setting various -rtc options with no success.

Including -rtc-td-hack ?

> Could it be I'm missing some key-component in the kernel? Or is there
> perhaps some dev version of qemu-kvm I could/should try?
>
> According to some of the #kvm residents, this should "just work" (tm),
> but I simply cannot make it happen.
>
> Any and all advice is more than welcome.

As always, make sure you are running the latest and greatest modules, 
those matter even more than the kernel, and check for any warning 
messages in dmesg and qemu output.


Zach

qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot

2010-03-17 Thread Thomas Løcke

Hey all,

I'm working on moving from a mixture of physical servers and
virtualized servers running on Virtualbox, to a pure KVM setup. But
I'm having some problems with my Windows XP guests in my test-setup.

This is the host I'm testing on:

CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
RAM: 8GB
2x320GB WD SATA disks (one for host OS and one for KVM guest images)
2x1GBs Intel nics (bonded)

Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
2.6.29.6-generic, 2.6.33 and 2.6.33.1

qemu-kvm is 0.12.3

My Linux guests work like a charm. When they boot up I do a single
"ntpdate -b europe.pool.ntp.org" and after that the time stays in near
perfect sync with the host, with no ntpd running on the guests. My
Windows XP guests on the other hand drift backwards in time,
especially when there's load on the guest, for example when I'm
copying a large file from my samba server to the Windows XP guest. The
guest can easily lose 10 minutes while copying a 600MB file. Or if I
start a few browsers and point them at some horrible flash heavy sites
and just let them sit there, then the VM also starts losing a lot of
time real fast.

This is the commandline I use to start the Windows XP guests:

qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
-smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
-net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm

I use the same commandline for my Linux guests, except the nic is virtio.

I'm at my wits end. I've tried the -tdf option with no success. I've
tried setting various -rtc options with no success.

Could it be I'm missing some key-component in the kernel? Or is there
perhaps some dev version of qemu-kvm I could/should try?

According to some of the #kvm residents, this should "just work" (tm),
but I simply cannot make it happen.

Any and all advice is more than welcome.

:o)
/Thomas

Re: [PATCH] KVM: Cleanup: change to use bool return values

2010-03-17 Thread Marcelo Tosatti

On Mon, Mar 15, 2010 at 05:29:09PM +0800, Gui Jianfeng wrote:
> Make use of bool as return values, and remove some useless
> bool value converting. Thanks Avi to point this out.
> 
> Signed-off-by: Gui Jianfeng 

Applied, thanks.


Re: [PATCH rework] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling

2010-03-17 Thread Marcelo Tosatti

On Mon, Mar 15, 2010 at 10:13:30PM +0900, Takuya Yoshikawa wrote:
> kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> mmio ring page and dev even after it has freed them.
> 
> Also, if this function fails, though it might be rare, it seems to be
> suggesting the system's serious state: so we'd better stop the works
> following the kvm_creat_vm().
> 
> This patch clears these problems.
> 
>   We move the coalesced mmio's initialization out of kvm_create_vm().
>   This seems to be natural because it includes a registration which
>   can be done only when vm is successfully created.
> 
> Signed-off-by: Takuya Yoshikawa 
> ---
>  virt/kvm/coalesced_mmio.c |2 ++
>  virt/kvm/kvm_main.c   |   12 
>  2 files changed, 10 insertions(+), 4 deletions(-)

Applied, thanks.


Re: Broken loadvm ?

2010-03-17 Thread Marcelo Tosatti

On Tue, Mar 16, 2010 at 05:25:13PM +0200, Alpár Török  wrote:
> PS:  It just occurred to me , that  it does  indeed freeze and cause a
> 100% CPU usage. At least i can say for sure that network, serial line,
> keyboard, nor mouse work. If  loadvm is loaded from the command line.
> If loaded from the monitor, everything seams to work, except the
> mouse.  After a -loadvm from the command line, repeating the command
> from the monitor doesn't unfreeze it.
> 
> i am really stuck with this. Any help is greatly appreciated, as
> downgrading is not an option.

Upgrade to qemu-kvm-0.12.3?

Re: [PATCH] add "xchg ax, reg" emulator test

2010-03-17 Thread Marcelo Tosatti

On Tue, Mar 16, 2010 at 02:42:52PM +0200, Gleb Natapov wrote:
> Add test for opcodes 0x90-0x9f emulation
> 
> Signed-off-by: Gleb Natapov 
> diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
> index bc6b27f..bfc2942 100644

Applied, thanks.


Re: >2 serial ports?

2010-03-17 Thread Michael Tokarev

Neo Jia wrote:
> On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev  wrote:
>> Neo Jia wrote:
>>> May I ask if it is possible to bind a real physical serial port to a guest?
>> It is all described in the documentation, quite a long list of
>> various things you can attach to a virtual serial port, incl.
>> a real one.
> 
> I have tried -serial /dev/ttyS0 but I can't use it to debug my Windows guest.

That's an entirely different issue: inability to debug Windows guests.
Please don't hijack other threads for unrelated issues; it makes
finding information and replying more difficult.  If it does not
work for you, ask in a new thread.  But first, try to research
the issue a bit; I've seen several discussions about debugging
guests over a serial port in kvm.  Besides, I've no idea what
you are really trying to do: debugging a guest in kvm is much easier
than setting up another HOST and connecting two HOSTS over a
null-modem serial cable.

/mjt


Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb

Vivek Goyal  writes:

> Are you using CFQ in the host? What is the host kernel version? I am not sure
> what is the problem here but you might want to play with IO controller and put
> these guests in individual cgroups and see if you get better throughput even
> with cache=writethrough.

Hi. We're using the deadline IO scheduler on 2.6.32.7. We got better
performance from deadline than from cfq when we last tested, which was
admittedly around the 2.6.30 timescale so is now a rather outdated
measurement.

> If the problem is that if sync writes from different guests get intermixed
> resulting in more seeks, IO controller might help as these writes will now
> go on different group service trees and in CFQ, we try to service requests
> from one service tree at a time for a period before we switch the service
> tree.

Thanks for the suggestion: I'll have a play with this. I currently use
/sys/kernel/uids/N/cpu_share with one UID per guest to divide up the CPU
between guests, but this could just as easily be done with a cgroup per
guest if a side-effect is to provide a hint about IO independence to CFQ.

Best wishes,

Chris.

Re: >2 serial ports?

2010-03-17 Thread Neo Jia

On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev  wrote:
> Neo Jia wrote:
>> May I ask if it is possible to bind a real physical serial port to a guest?
>
> It is all described in the documentation, quite a long list of
> various things you can attach to a virtual serial port, incl.
> a real one.

I have tried -serial /dev/ttyS0 but I can't use it to debug my Windows guest.

Thanks,
Neo

>
> /mjt
>



-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!

Re: [PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Laurent Chavey

Acked-by: Laurent Chavey 

On Wed, Mar 17, 2010 at 10:54 AM, Laurent Chavey  wrote:
> Acked-by: cha...@google.com
>
>
> On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin  wrote:
>> Stanse found a locking problem in vhost_set_vring:
>> several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
>> VHOST_SET_VRING_ERR with the vq->mutex held.
>> Fix these up.
>>
>> Reported-by: Jiri Slaby 
>> Signed-off-by: Michael S. Tsirkin 
>> ---
>>  drivers/vhost/vhost.c |   18 --
>>  1 files changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 7cd55e0..7bd7a1e 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>                if (r < 0)
>>                        break;
>>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -               if (IS_ERR(eventfp))
>> -                       return PTR_ERR(eventfp);
>> +               if (IS_ERR(eventfp)) {
>> +                       r = PTR_ERR(eventfp);
>> +                       break;
>> +               }
>>                if (eventfp != vq->kick) {
>>                        pollstop = filep = vq->kick;
>>                        pollstart = vq->kick = eventfp;
>> @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>                if (r < 0)
>>                        break;
>>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -               if (IS_ERR(eventfp))
>> -                       return PTR_ERR(eventfp);
>> +               if (IS_ERR(eventfp)) {
>> +                       r = PTR_ERR(eventfp);
>> +                       break;
>> +               }
>>                if (eventfp != vq->call) {
>>                        filep = vq->call;
>>                        ctx = vq->call_ctx;
>> @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>                if (r < 0)
>>                        break;
>>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -               if (IS_ERR(eventfp))
>> -                       return PTR_ERR(eventfp);
>> +               if (IS_ERR(eventfp)) {
>> +                       r = PTR_ERR(eventfp);
>> +                       break;
>> +               }
>>                if (eventfp != vq->error) {
>>                        filep = vq->error;
>>                        vq->error = eventfp;
>> --
>> 1.7.0.18.g0d53a5
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
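
The pattern the quoted patch applies (record the error and break, so that the single unlock at the end is always reached) can be shown in a small userspace sketch. This is not the vhost code: `struct sketch_vq`, `vq_lock`, `vq_unlock`, `set_vring_sketch` and `SKETCH_SET_KICK` are hypothetical names, and a plain flag stands in for vq->mutex.

```c
#include <errno.h>

/* Toy stand-in for a virtqueue; mutex_held models vq->mutex. */
struct sketch_vq {
    int mutex_held;
    int kick;
};

enum { SKETCH_SET_KICK = 1 };

static void vq_lock(struct sketch_vq *vq)   { vq->mutex_held = 1; }
static void vq_unlock(struct sketch_vq *vq) { vq->mutex_held = 0; }

static long set_vring_sketch(struct sketch_vq *vq, int ioctl, int fd)
{
    long r = 0;

    vq_lock(vq);
    switch (ioctl) {
    case SKETCH_SET_KICK:
        if (fd < -1) {      /* stand-in for IS_ERR(eventfp) */
            r = -EBADF;     /* record the error ... */
            break;          /* ... and leave through the common exit */
        }
        vq->kick = fd;
        break;
    default:
        r = -EINVAL;
    }
    vq_unlock(vq);          /* reached on every path, including errors */
    return r;
}
```

An early "return" inside the switch would skip vq_unlock(), which is the kind of problem described in the patch.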

Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-03-17 Thread Alexander Graf

Marcelo Tosatti wrote:
> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
>   
>> Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
>> supported processors.
>>
>> Signed-off-by: Sheng Yang 
>> 
>
> Applied, thanks.
>   

So without this patch kvm breaks with ept=0? Sounds like a stable
candidate to me.
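
For readers following along, the dependency the patch enforces can be written as a small standalone function. This is a sketch rather than the vmx.c code: `adjust_secondary_controls` is a made-up name, the real patch clears the global enable_unrestricted_guest instead of a parameter, and the two bit values are quoted from memory of the VMX secondary execution controls, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the secondary processor-based controls. */
#define SECONDARY_EXEC_ENABLE_EPT         0x00000002u
#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080u

/*
 * Hypothetical helper mirroring the patched logic: without EPT,
 * unrestricted guest must be switched off as well.
 */
static uint32_t adjust_secondary_controls(uint32_t exec_control,
                                          bool enable_ept,
                                          bool enable_unrestricted_guest)
{
    if (!enable_ept) {
        exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
        enable_unrestricted_guest = false;
    }
    if (!enable_unrestricted_guest)
        exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
    return exec_control;
}
```

With ept=0 both bits end up cleared, so the unrestricted guest control is never left set without EPT.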


Alex

Re: [PATCH] KVM: MMU: Disassociate direct maps from guest levels

2010-03-17 Thread Marcelo Tosatti

On Sun, Mar 14, 2010 at 10:22:52AM +0200, Avi Kivity wrote:
> Direct maps are linear translations for a section of memory, used for
> real mode or with large pages.  As such, they are independent of the guest
> levels.
> 
> Teach the mmu about this by making page->role.glevels = 0 for direct maps.
> This allows direct maps to be shared among real mode and the various paging
> modes.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/mmu.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b137515..a984bc1 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1328,6 +1328,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	role = vcpu->arch.mmu.base_role;
>  	role.level = level;
>  	role.direct = direct;
> +	if (role.direct)
> +		role.glevels = 0;
>  	role.access = access;
>  	if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
>  		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
> -- 
> 1.7.0.2

Isn't this what happens already, since for tdp base_role.glevels is not 
initialized?

Re: qemu-kvm crashes with Assertion ... failed.

2010-03-17 Thread Marcelo Tosatti

On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:
> Hi,
> I cloned the qemu-kvm git repository today with "git clone
> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
> qemu-kvm-2010-03-14", ran configure and compiled it and did a "make
> install". Everything went fine without warnings or errors.
> For configure output take a look here: http://pastebin.com/BL4DYCRY
> 
> Here is my Server Hardware:
> Asus P5Q Mainbaord
> Intel Q9300
> 8GB RAM
> RAID5 with mdadm consisting of 4x 1TB disks
> The volume /dev/storage/Windows7test mentioned below is on this RAID5.
> 
> I ran my virtual machine with the following command:
> 
> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
> tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name
> Windows7test,process=Windows7test -drive
> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native

Andre,

Can you try qemu-kvm-0.12.3 ?

> Windows7Test_600G.img is a qcow2 file and contains a Windows 7 Pro image.
> /dev/storage/Windows7test is formated with XFS
> 
> After starting the machine with the above command line, I booted
> into an Ubuntu 9.10 x86_64 Live Image via PXE and mounted /dev/sdb1
> (/dev/storage/Windows7test) under /mnt. I then did "cd /mnt/" and
> ran "iozone -Ra -g 2G -b /tmp/iozone-aoi-linux-xls"
> 
> iozone ran some test and then kvm simply quit with the following
> error message:
> qemu-system-x86_64:
> /usr/local/src/qemu-kvm-2010-03-10/hw/ide/internal.h:510:
> bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.
> 
> /var/log/syslog contained the folowing:
> Mar 14 09:18:14 server kernel: [318080.627468] kvm: 1361: cpu0
> kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server  kernel: [318080.627473] kvm: 1361: cpu0
> kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627476] kvm: 1361: cpu0
> unhandled wrmsr: 0x400 data 
> Mar 14 09:18:14 server kernel: [318080.627506] kvm: 1361: cpu1
> kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server  kernel: [318080.627509] kvm: 1361: cpu1
> kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627511] kvm: 1361: cpu1
> unhandled wrmsr: 0x400 data 
> Mar 14 09:18:14 server kernel: [318080.627538] kvm: 1361: cpu2
> kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server kernel: [318080.627540] kvm: 1361: cpu2
> kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627543] kvm: 1361: cpu2
> unhandled wrmsr: 0x400 data 
> 
> 
> I ws able to reproduce this error 3 times in a row.
> 
> Regards,
>  André

Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset

2010-03-17 Thread Alexander Graf

Eduardo Habkost wrote:
> svm_vcpu_reset() was not properly resetting the contents of the guest-visible
> cr0 register, causing the following issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=525699
>
> Without resetting cr0 properly, the vcpu was running the SIPI bootstrap
> routine with paging enabled, making the vcpu get a pagefault exception
> while trying to run it.
>
> Instead of setting vmcb->save.cr0 directly, the new code just resets
> kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
> vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
>
> kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
> kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
>
> Signed-off-by: Eduardo Habkost 
>   

Should this go into -stable?


Alex

Re: [PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Laurent Chavey

Acked-by: cha...@google.com


On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin  wrote:
> Stanse found a locking problem in vhost_set_vring:
> several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
> VHOST_SET_VRING_ERR with the vq->mutex held.
> Fix these up.
>
> Reported-by: Jiri Slaby 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/vhost.c |   18 ++++++++++++------
>  1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 7cd55e0..7bd7a1e 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>                if (r < 0)
>                        break;
>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -               if (IS_ERR(eventfp))
> -                       return PTR_ERR(eventfp);
> +               if (IS_ERR(eventfp)) {
> +                       r = PTR_ERR(eventfp);
> +                       break;
> +               }
>                if (eventfp != vq->kick) {
>                        pollstop = filep = vq->kick;
>                        pollstart = vq->kick = eventfp;
> @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>                if (r < 0)
>                        break;
>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -               if (IS_ERR(eventfp))
> -                       return PTR_ERR(eventfp);
> +               if (IS_ERR(eventfp)) {
> +                       r = PTR_ERR(eventfp);
> +                       break;
> +               }
>                if (eventfp != vq->call) {
>                        filep = vq->call;
>                        ctx = vq->call_ctx;
> @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>                if (r < 0)
>                        break;
>                eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -               if (IS_ERR(eventfp))
> -                       return PTR_ERR(eventfp);
> +               if (IS_ERR(eventfp)) {
> +                       r = PTR_ERR(eventfp);
> +                       break;
> +               }
>                if (eventfp != vq->error) {
>                        filep = vq->error;
>                        vq->error = eventfp;
> --
> 1.7.0.18.g0d53a5
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
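
For readers skimming the diff, the pattern it restores can be shown in a
small userspace sketch. This is only an illustration under stated
assumptions: toy_lock, toy_set_vring() and the TOY_* request codes are
hypothetical stand-ins, not the vhost API, and a plain flag replaces
vq->mutex so the example runs anywhere. The point is that every error path
stores its code in r and leaves the switch with break, so the single unlock
at the bottom always runs.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for vq->mutex (hypothetical): records whether it is held. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

enum { TOY_SET_KICK, TOY_SET_CALL };   /* hypothetical request codes */

/*
 * Hypothetical handler.  lookup_ok == 0 stands in for eventfd_fget()
 * returning an error pointer.
 */
static long toy_set_vring(struct toy_lock *lock, int request, int lookup_ok)
{
    long r = 0;

    toy_lock_acquire(lock);
    switch (request) {
    case TOY_SET_KICK:
    case TOY_SET_CALL:
        if (!lookup_ok) {
            r = -EBADF;   /* record the error instead of returning ...   */
            break;        /* ... and leave the switch so the unlock runs */
        }
        /* normal path: the new eventfd would be installed here */
        break;
    default:
        r = -EINVAL;
    }
    toy_lock_release(lock);   /* single exit: the lock is always dropped */
    return r;
}
```

With a direct "return" in the error branch, the second call in a row would
trip the assertion in toy_lock_acquire(), which is the userspace analogue of
the hang the patch fixes.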

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Vivek Goyal

On Wed, Mar 17, 2010 at 03:14:10PM +, Chris Webb wrote:
> Anthony Liguori  writes:
> 
> > This really gets down to your definition of "safe" behaviour.  As it
> > stands, if you suffer a power outage, it may lead to guest
> > corruption.
> > 
> > While we are correct in advertising a write-cache, write-caches are
> > volatile and should a drive lose power, it could lead to data
> > corruption.  Enterprise disks tend to have battery backed write
> > caches to prevent this.
> > 
> > In the set up you're emulating, the host is acting as a giant write
> > cache.  Should your host fail, you can get data corruption.
> 
> Hi Anthony. I suspected my post might spark an interesting discussion!
> 
> Before considering anything like this, we did quite a bit of testing with
> OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
> power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
> NTFS filesystems despite these efforts.
> 
> Is your claim here that:-
> 
>   (a) qemu doesn't emulate a disk write cache correctly; or
> 
>   (b) operating systems are inherently unsafe running on top of a disk with
>   a write-cache; or
> 
>   (c) installations that are already broken and lose data with a physical
>   drive with a write-cache can lose much more in this case because the
>   write cache is much bigger?
> 
> Following Christoph Hellwig's patch series from last September, I'm pretty
> convinced that (a) isn't true apart from the inability to disable the
> write-cache at run-time, which is something that neither recent linux nor
> windows seem to want to do out-of-the box.
> 
> Given that modern SATA drives come with fairly substantial write-caches
> nowadays which operating systems leave on without widespread disaster, I
> don't really believe in (b) either, at least for the ide and scsi case.
> Filesystems know they have to flush the disk cache to avoid corruption.
> (Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so
> I know virtio-blk has to be avoided for current windows and obsolete linux
> when writeback caching is on.)
> 
> I can certainly imagine (c) might be the case, although when I use strace to
> watch the IO to the block device, I see pretty regular fdatasyncs being
> issued by the guests, interleaved with the writes, so I'm not sure how
> likely the problem would be in practice. Perhaps my test guests were
> unrepresentatively well-behaved.
> 
> However, the potentially unlimited time-window for loss of incorrectly
> unsynced data is also something one could imagine fixing at the qemu level.
> Perhaps I should be implementing something like
> cache=writeback,flushtimeout=N which, upon a write being issued to the block
> device, starts an N second timer if it isn't already running. The timer is
> destroyed on flush, and if it expires before it's destroyed, a gratuitous
> flush is sent. Do you think this is worth doing? Just a simple 'while sleep
> 10; do sync; done' on the host even!
> 
> We've used cache=none and cache=writethrough, and whilst performance is fine
> with a single guest accessing a disk, when we chop the disks up with LVM and
> run even a small handful of guests, the constant seeking to serve tiny
> synchronous IOs leads to truly abysmal throughput---we've seen less than
> 700kB/s streaming write rates within guests when the backing store is
> capable of 100MB/s.
> 
> With cache=writeback, there's still IO contention between guests, but the
> write granularity is a bit coarser, so the host's elevator seems to get a
> bit more of a chance to help us out and we can at least squeeze out 5-10MB/s
> from two or three concurrently running guests, getting a total of 20-30% of
> the performance of the underlying block device rather than a total of around
> 5%.

Hi Chris,

Are you using CFQ in the host? What is the host kernel version? I am not sure
what the problem is here, but you might want to play with the IO controller:
put these guests in individual cgroups and see if you get better throughput
even with cache=writethrough.

If the problem is that sync writes from different guests get intermixed,
resulting in more seeks, the IO controller might help, as these writes will
now go onto different group service trees, and in CFQ we try to service
requests from one service tree for a period before we switch to another.

The issue will be that all the logic is in CFQ, which works at the leaf nodes
of the storage stack and not at the LVM nodes. So first you might want to try
it with a single partitioned disk. If it helps, then it might help with the
LVM configuration too (IO control working at leaf nodes).

Thanks
Vivek

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:57 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
>> Chris, can you carry out an experiment?  Write a program that pwrite()s
>> a byte to a file at the same location repeatedly, with the file opened
>> using O_SYNC.  Measure the write rate, and run blktrace on the host to
>> see what the disk (/dev/sda, not the volume) sees.  Should be a (write,
>> flush, write, flush) per pwrite pattern or similar (for writing the data
>> and a journal block, perhaps even three writes will be needed).
>>
>> Then scale this across multiple guests, measure and trace again.  If
>> we're lucky, the flushes will be coalesced, if not, we need to work on it.
>
> As the person who has written quite a bit of the current O_SYNC
> implementation and also reviewed the rest of it I can tell you that
> those flushes won't be coalesced.  If we always rewrite the same block
> we do the cache flush from the fsync method and there is nothing
> to coalesce it with there.  If you actually do modify metadata (e.g. by
> using the new real O_SYNC instead of the old one that always was O_DSYNC
> that I introduced in 2.6.33 but that isn't picked up by userspace yet)
> you might hit a very limited transaction merging window in some
> filesystems, but it's generally very small for a good reason.  If it
> were too large we'd make one process wait for I/O in another just
> because we might expect transactions to coalesce later.  There's been
> some long discussion about that fsync transaction batching tuning
> for ext3 a while ago.

I definitely don't expect flush merging for a single guest, but for
multiple guests there is certainly an opportunity for merging.  Most
likely we don't take advantage of it and that's one of the problems.
Copying data into pagecache so that we can merge the flushes seems like
a very unsatisfactory implementation.

--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:52 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
>> They should be reorderable.  Otherwise host filesystems on several
>> volumes would suffer the same problems.
>
> They are reorderable, just not as extremely as the page cache.
> Remember that the request queue really is just a relatively small queue
> of outstanding I/O, and that is absolutely intentional.  Large scale
> _caching_ is done by the VM in the pagecache, with all the usual aging,
> pressure, etc algorithms applied to it.

We already have the large scale caching and stuff running in the guest.
We have a stream of optimized requests coming out of guests, running the
same algorithm again shouldn't improve things.  The host has an
opportunity to do inter-guest optimization, but given each guest has its
own disk area, I don't see how any reordering or merging could help here
(beyond sorting guests according to disk order).

> The block devices have a
> relatively small fixed size request queue associated with them to
> facilitate request merging and limited reordering and having fully
> set up I/O requests for the device.

We should enlarge the queues, increase request reorderability, and merge
flushes (delay flushes until after unrelated writes, then adjacent
flushes can be collapsed).

Collapsing flushes should get us better than linear scaling (since we
collapse N writes + M flushes into N writes and 1 flush).  However the
writes themselves scale worse than linearly, since they now span a
larger disk space and cause higher seek penalties.
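
To make the "N writes + M flushes into N writes and 1 flush" arithmetic
concrete, here is a minimal sketch over an in-memory batch of requests. It
is not block layer code: struct req and coalesce_flushes() are hypothetical
names, and it assumes that a flush only promises "every write issued before
me is durable" (no ordering barrier), and that each guest's flush is
reported complete only after the one shared flush finishes.

```c
#include <assert.h>
#include <stddef.h>

enum req_type { REQ_WRITE, REQ_FLUSH };

/* Hypothetical request record: just a type and the issuing guest. */
struct req {
    enum req_type type;
    int guest;
};

/*
 * Copy the batch in[0..n) to out, keeping every write in its original
 * order and replacing all flushes with a single flush at the end.
 * out must have room for n entries.  Returns the number of entries used.
 */
static size_t coalesce_flushes(const struct req *in, size_t n, struct req *out)
{
    size_t count = 0;
    int saw_flush = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i].type == REQ_FLUSH)
            saw_flush = 1;          /* delay it past the unrelated writes */
        else
            out[count++] = in[i];
    }
    if (saw_flush) {
        out[count].type = REQ_FLUSH;
        out[count].guest = -1;      /* shared flush, owned by no one guest */
        count++;
    }
    return count;
}
```

Three guests each issuing one write and one flush go in as six requests and
come out as four. The seek cost of the writes themselves is untouched, which
is the "worse than linearly" half of the paragraph above.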


--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:58 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
>> Meanwhile I looked at the code, and it looks bad.  There is an
>> IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
>> issuing it.  In any case, qemu doesn't use it as far as I could tell,
>> and even if it did, device-mapper doesn't implement the needed
>> ->aio_fsync() operation.
>
> No one implements it, and all surrounding code is dead wood.  It would
> require us to do asynchronous pagecache operations, which involve
> major surgery of the VM code.  Patches to do this were rejected multiple
> times.

Pity.  What about the O_DIRECT aio case?  It's ridiculous that you can
submit async write requests but have to wait synchronously for them to
actually hit the disk if you have a write cache.


--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig

On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
> Meanwhile I looked at the code, and it looks bad.  There is an 
> IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before 
> issuing it.  In any case, qemu doesn't use it as far as I could tell, 
> and even if it did, device-mapper doesn't implement the needed 
> ->aio_fsync() operation.

No one implements it, and all surrounding code is dead wood.  It would
require us to do asynchronous pagecache operations, which involve
major surgery of the VM code.  Patches to do this were rejected multiple
times.


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig

On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
> Chris, can you carry out an experiment?  Write a program that pwrite()s 
> a byte to a file at the same location repeatedly, with the file opened 
> using O_SYNC.  Measure the write rate, and run blktrace on the host to 
> see what the disk (/dev/sda, not the volume) sees.  Should be a (write, 
> flush, write, flush) per pwrite pattern or similar (for writing the data 
> and a journal block, perhaps even three writes will be needed).
> 
> Then scale this across multiple guests, measure and trace again.  If 
> we're lucky, the flushes will be coalesced, if not, we need to work on it.

As the person who has written quite a bit of the current O_SYNC
implementation and also reviewed the rest of it I can tell you that
those flushes won't be coalesced.  If we always rewrite the same block
we do the cache flush from the fsync method and there is nothing
to coalesce it with there.  If you actually do modify metadata (e.g. by
using the new real O_SYNC instead of the old one that always was O_DSYNC
that I introduced in 2.6.33 but that isn't picked up by userspace yet)
you might hit a very limited transaction merging window in some
filesystems, but it's generally very small for a good reason.  If it
were too large we'd make one process wait for I/O in another just
because we might expect transactions to coalesce later.  There's been
some long discussion about that fsync transaction batching tuning
for ext3 a while ago.


Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1

2010-03-17 Thread Alexander Graf

Anthony Liguori wrote:
> On 03/08/2010 08:34 AM, Chris Webb wrote:
>> During boot, the screen gets resized to height 1 and a mouse click at this
>> point will cause a division by zero when calculating the absolute pointer
>> position from the pixel (x, y). Return a click in the middle of the screen
>> instead in this case.
>>
>> Signed-off-by: Chris Webb
>>
> Applied.  Thanks.

Also queued it to stable?


Alex
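
For context, the guard described in the quoted commit message can be
sketched as below. This is an illustration, not the VNC code itself:
scale_abs() is a hypothetical helper, and the 0..0x7FFF absolute range with
0x4000 as its midpoint is an assumption of this sketch.

```c
#include <assert.h>

/*
 * Map a pixel coordinate in [0, size - 1] onto an absolute pointer range
 * of [0, 0x7FFF].  With size == 1 the divisor (size - 1) would be zero,
 * so report the middle of the range instead of dividing.
 */
static int scale_abs(int pos, int size)
{
    assert(size >= 1 && pos >= 0 && pos < size);
    if (size <= 1)
        return 0x4000;                  /* degenerate axis: middle of screen */
    return pos * 0x7FFF / (size - 1);
}
```

A click on a 1-pixel-high screen then yields 0x4000 for y, while ordinary
sizes still map their first and last pixels to 0 and 0x7FFF.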

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:47 PM, Chris Webb wrote:
> Avi Kivity  writes:
>
>> Chris, can you carry out an experiment?  Write a program that
>> pwrite()s a byte to a file at the same location repeatedly, with the
>> file opened using O_SYNC.  Measure the write rate, and run blktrace
>> on the host to see what the disk (/dev/sda, not the volume) sees.
>> Should be a (write, flush, write, flush) per pwrite pattern or
>> similar (for writing the data and a journal block, perhaps even
>> three writes will be needed).
>>
>> Then scale this across multiple guests, measure and trace again.  If
>> we're lucky, the flushes will be coalesced, if not, we need to work
>> on it.
>
> Sure, sounds like an excellent plan. I don't have a test machine at the
> moment as the last host I was using for this has gone into production, but
> I'm due to get another one to install later today or first thing tomorrow
> which would be ideal for doing this. I'll follow up with the results once I
> have them.

Meanwhile I looked at the code, and it looks bad.  There is an
IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
issuing it.  In any case, qemu doesn't use it as far as I could tell,
and even if it did, device-mapper doesn't implement the needed
->aio_fsync() operation.

So, there's a lot of plumbing needed before we can get cache flushes
merged into each other.  Given cache=writeback does allow merging, I
think we explained part of the problem at least.


--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Christoph Hellwig

On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
> They should be reorderable.  Otherwise host filesystems on several 
> volumes would suffer the same problems.

They are reorderable, just not as extremely as the page cache.
Remember that the request queue really is just a relatively small queue
of outstanding I/O, and that is absolutely intentional.  Large scale
_caching_ is done by the VM in the pagecache, with all the usual aging,
pressure, etc algorithms applied to it.  The block devices have a
relatively small fixed size request queue associated with them to
facilitate request merging and limited reordering and having fully
set up I/O requests for the device.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb

Avi Kivity  writes:

> Chris, can you carry out an experiment?  Write a program that
> pwrite()s a byte to a file at the same location repeatedly, with the
> file opened using O_SYNC.  Measure the write rate, and run blktrace
> on the host to see what the disk (/dev/sda, not the volume) sees.
> Should be a (write, flush, write, flush) per pwrite pattern or
> similar (for writing the data and a journal block, perhaps even
> three writes will be needed).
> 
> Then scale this across multiple guests, measure and trace again.  If
> we're lucky, the flushes will be coalesced, if not, we need to work
> on it.

Sure, sounds like an excellent plan. I don't have a test machine at the
moment as the last host I was using for this has gone into production, but
I'm due to get another one to install later today or first thing tomorrow
which would be ideal for doing this. I'll follow up with the results once I
have them.

Cheers,

Chris.

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:22 PM, Avi Kivity wrote:
>> Also, if my guest kernel issues (say) three small writes, one at the start
>> of the disk, one in the middle, one at the end, and then does a flush, can
>> virtio really express this as one non-contiguous O_DIRECT write (the three
>> components of which can be reordered by the elevator with respect to one
>> another) rather than three distinct O_DIRECT writes which can't be permuted?
>> Can qemu issue a write like that? cache=writeback + flush allows this to be
>> optimised by the block layer in the normal way.
>
> Guest side virtio will send this as three requests followed by a
> flush.  Qemu will issue these as three distinct requests and then
> flush.  The requests are marked, as Christoph says, in a way that
> limits their reorderability, and perhaps if we fix these two problems
> performance will improve.
>
> Something that comes to mind is merging of flush requests.  If N
> guests issue one write and one flush each, we should issue N writes
> and just one flush - a flush for the disk applies to all volumes on
> that disk.

Chris, can you carry out an experiment?  Write a program that pwrite()s 
a byte to a file at the same location repeatedly, with the file opened 
using O_SYNC.  Measure the write rate, and run blktrace on the host to 
see what the disk (/dev/sda, not the volume) sees.  Should be a (write, 
flush, write, flush) per pwrite pattern or similar (for writing the data 
and a journal block, perhaps even three writes will be needed).


Then scale this across multiple guests, measure and trace again.  If 
we're lucky, the flushes will be coalesced, if not, we need to work on it.
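
A minimal sketch of such a test program might look like the following. The
function name osync_rewrite(), the file mode and the use of
CLOCK_MONOTONIC are choices made for this sketch only; the blktrace side is
run separately on the host and is not shown.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pwrite() and clock_gettime() */
#endif

#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Rewrite one byte at offset 0 of path, count times, through a descriptor
 * opened with O_SYNC.  On success stores the elapsed wall-clock time in
 * *seconds and returns 0; returns -1 if the open or any write fails.
 */
static int osync_rewrite(const char *path, int count, double *seconds)
{
    struct timespec start, end;
    char byte = 'x';
    int fd, i;

    assert(path != NULL && seconds != NULL && count > 0);

    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; i++) {
        /* same location every time: each call must reach stable storage */
        if (pwrite(fd, &byte, 1, 0) != 1) {
            close(fd);
            return -1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    *seconds = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return 0;
}
```

Calling it inside a guest with, say, 1000 iterations and dividing 1000 by
the returned time gives the write rate to compare against the host trace.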


--
error compiling committee.c: too many arguments to function


Re: [PATCH 05/10] Don't call apic functions directly from kvm code

2010-03-17 Thread Avi Kivity


On 03/17/2010 04:00 PM, Glauber Costa wrote:
> On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote:
>> On 02/26/2010 10:12 PM, Glauber Costa wrote:
>>> It is actually not necessary to call a tpr function to save and load cr8,
>>> as cr8 is part of the processor state, and thus, it is much easier
>>> to just add it to CPUState.
>>>
>>> As for apic base, wrap kvm usages, so we can call either the qemu device,
>>> or the in kernel version.
>>>
>>>   }
>>>
>>> +static void kvm_set_apic_base(CPUState *env, uint64_t val)
>>> +{
>>> +    if (!kvm_irqchip_in_kernel())
>>> +        cpu_set_apic_base(env, val);
>>
>> What if it is in kernel?  Just ignored?  Doesn't seem right.
>
> At this point it is right, because there is no irqchip in kernel yet.
>
> In a later patch, irqchip in kernel begins to exist, and this function
> gets filled.

Ok.  In the future please code things like that without the if (), and
add it when you introduce the other side.  Helps fend off nit-pickers.


--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb

Anthony Liguori  writes:

> On 03/17/2010 10:14 AM, Chris Webb wrote:
> >   (c) installations that are already broken and lose data with a physical
> >   drive with a write-cache can lose much more in this case because the
> >   write cache is much bigger?
> 
> This is the closest to the most accurate.
> 
> It basically boils down to this: most enterprises use disks with
> battery backed write caches.  Having the host act as a giant write
> cache means that you can lose data.
> 
> I agree that a well behaved file system will not become corrupt, but
> my contention is that for many types of applications, data loss ==
> corruption and not all file systems are well behaved.  And it's
> certainly valid to argue about whether common filesystems are
> "broken" but from a purely pragmatic perspective, this is going to
> be the case.

Okay. What I was driving at in describing these systems as 'already broken'
is that they will already lose data (in this sense) if they're run on bare
metal with normal commodity SATA disks with their 32MB write caches on. That
configuration surely describes the vast majority of PC-class desktops and
servers!

If I understand correctly, your point here is that the small cache on a real
SATA drive gives a relatively small time window for data loss, whereas the
worry with cache=writeback is that the host page cache can be gigabytes, so
the time window for unsynced data to be lost is potentially enormous.

Isn't the fix for that just forcing periodic sync on the host to bound-above
the time window for unsynced data loss in the guest?

Cheers,

Chris.

Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-17 Thread Avi Kivity


On 03/17/2010 06:06 PM, Paul Brook wrote:
>> On 03/16/2010 10:10 PM, Blue Swirl wrote:
>>>> Yes, and is what tlb_protect_code() does and it's called from
>>>> tb_alloc_page() which is what's called when a TB is created.
>>>
>>> Just a tangential note: a long time ago, I tried to disable self
>>> modifying code detection for Sparc. On most RISC architectures, SMC
>>> needs explicit flushing so in theory we need not track code memory
>>> writes. However, during exceptions the translator needs to access the
>>> original unmodified code that was used to generate the TB. But maybe
>>> there are other ways to avoid SMC tracking, on x86 it's still needed
>>
>> On x86 you're supposed to execute a serializing instruction (one of
>> INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
>> register, with the exception of MOV CR8), MOV (to debug register),
>> WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.
>
> Last time I checked, a jump instruction was sufficient to ensure coherency
> within a core.  Serializing instructions are only required for coherency
> between cores on SMP systems.

Yeah, the docs say either a jump or a serializing instruction is needed.

> QEMU effectively has a very large physically tagged icache[1] with very
> expensive cache loads.  AFAIK The only practical way to maintain that cache on
> x86 targets is to do write snooping via dirty bits. On targets that mandate
> explicit icache invalidation we might be able to get away with this, however I
> doubt it actually gains you anything - a correctly written guest is going to
> invalidate at least as much as we get from dirty tracking, and we still need
> to provide correct behaviour when executing with cache disabled.

Agreed.

>>> but I suppose SMC is pretty rare.
>>
>> Every time you demand load a code page from disk, you're running self
>> modifying code (though it usually doesn't exist in the tlb, so there's
>> no previous version that can cause trouble).
>
> I think you're confusing TLB flushes with TB flushes.

No - my thinking was page fault, load page, invlpg, continue.  But the
invlpg is unneeded, and "continue" has to include a jump anyway.


--
error compiling committee.c: too many arguments to function


Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Balbir Singh

* Anthony Liguori  [2010-03-17 10:55:47]:

> On 03/17/2010 10:14 AM, Chris Webb wrote:
> >Anthony Liguori  writes:
> >
> >>This really gets down to your definition of "safe" behaviour.  As it
> >>stands, if you suffer a power outage, it may lead to guest
> >>corruption.
> >>
> >>While we are correct in advertising a write-cache, write-caches are
> >>volatile and should a drive lose power, it could lead to data
> >>corruption.  Enterprise disks tend to have battery backed write
> >>caches to prevent this.
> >>
> >>In the set up you're emulating, the host is acting as a giant write
> >>cache.  Should your host fail, you can get data corruption.
> >Hi Anthony. I suspected my post might spark an interesting discussion!
> >
> >Before considering anything like this, we did quite a bit of testing with
> >OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
> >power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
> >NTFS filesystems despite these efforts.
> >
> >Is your claim here that:-
> >
> >   (a) qemu doesn't emulate a disk write cache correctly; or
> >
> >   (b) operating systems are inherently unsafe running on top of a disk with
> >   a write-cache; or
> >
> >   (c) installations that are already broken and lose data with a physical
> >   drive with a write-cache can lose much more in this case because the
> >   write cache is much bigger?
> 
> This is the closest to the most accurate.
> 
> It basically boils down to this: most enterprises use disks with
> battery backed write caches.  Having the host act as a giant write
> cache means that you can lose data.
> 

Dirty limits can help control how much we lose, but also affect how
much we write out.

> I agree that a well behaved file system will not become corrupt, but
> my contention is that for many types of applications, data loss ==
> corruption and not all file systems are well behaved.  And it's
> certainly valid to argue about whether common filesystems are
> "broken" but from a purely pragmatic perspective, this is going to
> be the case.
>

I think it is a trade-off for end users to decide on. cache=writeback
does provide performance benefits, but can cause data loss.


-- 
Three Cheers,
Balbir

Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 05:24 PM, Chris Webb wrote:
> Avi Kivity  writes:
>
> > On 03/15/2010 10:23 PM, Chris Webb wrote:
> >
> > > Wasteful duplication of page cache between guest and host notwithstanding,
> > > turning on cache=writeback is a spectacular performance win for our guests.
> >
> > Is this with qcow2, raw file, or direct volume access?
>
> This is with direct access to logical volumes. No file systems or qcow2 in
> the stack. Our typical host has a couple of SATA disks, combined in md
> RAID1, chopped up into volumes with LVM2 (really just dm linear targets).
> The performance measured outside qemu is excellent, inside qemu-kvm is fine
> too until multiple guests are trying to access their drives at once, but
> then everything starts to grind badly.

OK.

> > I can understand it for qcow2, but for direct volume access this
> > shouldn't happen.  The guest schedules as many writes as it can,
> > followed by a sync.  The host (and disk) can then reschedule them
> > whether they are in the writeback cache or in the block layer, and
> > must sync in the same way once completed.
>
> I don't really understand what's going on here, but I wonder if the
> underlying problem might be that all the O_DIRECT/O_SYNC writes from the
> guests go down into the same block device at the bottom of the device mapper
> stack, and thus can't be reordered with respect to one another.

They should be reorderable.  Otherwise host filesystems on several
volumes would suffer the same problems.

Whether the filesystem is in the host or guest shouldn't matter.

> For our
> purposes,
>
>   Guest A   Guest B       Guest A   Guest B       Guest A   Guest B
>   write A1                write A1                          write B1
>             write B1      write A2                write A1
>   write A2                          write B1      write A2
>
> are all equivalent, but the system isn't allowed to reorder in this way
> because there isn't a separate request queue for each logical volume, just
> the one at the bottom. (I don't know whether nested request queues would
> behave remotely reasonably either, though!)
>
> Also, if my guest kernel issues (say) three small writes, one at the start
> of the disk, one in the middle, one at the end, and then does a flush, can
> virtio really express this as one non-contiguous O_DIRECT write (the three
> components of which can be reordered by the elevator with respect to one
> another) rather than three distinct O_DIRECT writes which can't be permuted?
> Can qemu issue a write like that? cache=writeback + flush allows this to be
> optimised by the block layer in the normal way.

Guest side virtio will send this as three requests followed by a flush.
Qemu will issue these as three distinct requests and then flush.  The
requests are marked, as Christoph says, in a way that limits their
reorderability, and perhaps if we fix these two problems performance
will improve.


Something that comes to mind is merging of flush requests.  If N guests 
issue one write and one flush each, we should issue N writes and just 
one flush - a flush for the disk applies to all volumes on that disk.
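
The flush-merging idea above could be sketched roughly as follows. This is
only an illustration in Python, not qemu or block layer code: merge_flushes
and the (guest, op) tuples are made-up names, and the sketch assumes that all
volumes sit on one physical disk and that each guest's flush is reported
complete only after the single merged flush has finished.

```python
def merge_flushes(requests):
    """Collapse per-guest flushes into a single trailing disk flush.

    `requests` is a list of (guest, op) tuples where op is "write" or
    "flush". Hypothetical model: every volume lives on the same disk, so
    one disk flush covers all writes issued before it.
    """
    merged = []
    pending_flush = False
    for guest, op in requests:
        if op == "flush":
            pending_flush = True          # remember it, do not issue yet
        else:
            merged.append((guest, op))    # writes pass through unchanged
    if pending_flush:
        merged.append(("disk", "flush"))  # one flush for all volumes
    return merged


batch = [("A", "write"), ("A", "flush"), ("B", "write"), ("B", "flush")]
result = merge_flushes(batch)
```

With two guests issuing one write and one flush each, the batch shrinks from
two flushes to one while keeping both writes.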


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-17 Thread Paul Brook

> On 03/16/2010 10:10 PM, Blue Swirl wrote:
> >>   Yes, and that is what tlb_protect_code() does and it's called from
> >> tb_alloc_page() which is what's called when a TB is created.
> >
> > Just a tangential note: a long time ago, I tried to disable self
> > modifying code detection for Sparc. On most RISC architectures, SMC
> > needs explicit flushing so in theory we need not track code memory
> > writes. However, during exceptions the translator needs to access the
> > original unmodified code that was used to generate the TB. But maybe
> > there are other ways to avoid SMC tracking, on x86 it's still needed
> 
> On x86 you're supposed to execute a serializing instruction (one of
> INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
> register, with the exception of MOV CR8), MOV (to debug register),
> WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.

Last time I checked, a jump instruction was sufficient to ensure coherency
within a core.  Serializing instructions are only required for coherency
between cores on SMP systems.

QEMU effectively has a very large physically tagged icache[1] with very 
expensive cache loads.  AFAIK the only practical way to maintain that cache on 
x86 targets is to do write snooping via dirty bits. On targets that mandate 
explicit icache invalidation we might be able to get away with this, however I 
doubt it actually gains you anything - a correctly written guest is going to 
invalidate at least as much as we get from dirty tracking, and we still need 
to provide correct behaviour when executing with cache disabled.
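
To make the "write snooping" point concrete, here is a toy model in Python.
It is not QEMU code: TranslationCache and its methods are invented for
illustration, pages are assumed to be 4096 bytes, and the real dirty-bit
bookkeeping is reduced to "a write to a page drops every translation on that
page".

```python
class TranslationCache:
    """Toy model of a physically tagged translated-code cache."""

    PAGE_SIZE = 4096  # assumed page size for the illustration

    def __init__(self):
        self.blocks = {}          # page number -> set of cached block addresses
        self.translations = 0     # counts the (expensive) translations

    def execute(self, addr):
        page = addr // self.PAGE_SIZE
        cached = self.blocks.setdefault(page, set())
        if addr not in cached:
            self.translations += 1    # cache miss: translate the block
            cached.add(addr)

    def write(self, addr):
        # Write snooping: any store to a page holding translated code
        # drops every translation on that page.
        self.blocks.pop(addr // self.PAGE_SIZE, None)


tc = TranslationCache()
tc.execute(0x1000)
tc.execute(0x1000)            # served from the cache
hits_before_write = tc.translations
tc.write(0x5000)              # unrelated page: nothing is invalidated
tc.execute(0x1000)
after_unrelated_write = tc.translations
tc.write(0x1004)              # same page as the cached block
tc.execute(0x1000)            # has to be translated again
after_code_write = tc.translations
```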

> > but I suppose SMC is pretty rare.
> 
> Every time you demand load a code page from disk, you're running self
> modifying code (though it usually doesn't exist in the tlb, so there's
> no previous version that can cause trouble).

I think you're confusing TLB flushes with TB flushes.

Paul

[1] Even modern x86 CPUs have only a relatively small icache. The large L2/L3
caches aren't relevant as they are unified I/D caches.

Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1

2010-03-17 Thread Anthony Liguori


On 03/08/2010 08:34 AM, Chris Webb wrote:
> During boot, the screen gets resized to height 1 and a mouse click at this
> point will cause a division by zero when calculating the absolute pointer
> position from the pixel (x, y). Return a click in the middle of the screen
> instead in this case.
>
> Signed-off-by: Chris Webb

Applied.  Thanks.

Regards,

Anthony Liguori

---
  vnc.c |6 --
  1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/vnc.c b/vnc.c
index 01353a9..676a707 100644
--- a/vnc.c
+++ b/vnc.c
@@ -1457,8 +1457,10 @@ static void pointer_event(VncState *vs, int button_mask, int x, int y)
  dz = 1;

  if (vs->absolute) {
-kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
-y * 0x7FFF / (ds_get_height(vs->ds) - 1),
+kbd_mouse_event(ds_get_width(vs->ds) > 1 ?
+  x * 0x7FFF / (ds_get_width(vs->ds) - 1) : 0x4000,
+ds_get_height(vs->ds) > 1 ?
+  y * 0x7FFF / (ds_get_height(vs->ds) - 1) : 0x4000,
  dz, buttons);
  } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
  x -= 0x7FFF;
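
The guarded scaling in the patch above can be restated as a small Python
sketch. scale_abs is a hypothetical helper name, not something in vnc.c; it
applies the same expression to one axis and uses floor division, which matches
C integer division for the non-negative values involved here.

```python
def scale_abs(pos, extent):
    """Map a pixel coordinate to the 0..0x7FFF absolute pointer range.

    Mirrors the guarded expression in the patch: when the display is only
    one pixel along this axis there is nothing to divide by, so report the
    middle of the range instead.
    """
    if extent > 1:
        return pos * 0x7FFF // (extent - 1)
    return 0x4000


left_edge = scale_abs(0, 640)
right_edge = scale_abs(639, 640)
degenerate = scale_abs(0, 1)      # height or width of 1: no division
```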
   



Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Anthony Liguori


On 03/17/2010 10:14 AM, Chris Webb wrote:
> Anthony Liguori  writes:
>
> > This really gets down to your definition of "safe" behaviour.  As it
> > stands, if you suffer a power outage, it may lead to guest
> > corruption.
> >
> > While we are correct in advertising a write-cache, write-caches are
> > volatile and should a drive lose power, it could lead to data
> > corruption.  Enterprise disks tend to have battery backed write
> > caches to prevent this.
> >
> > In the set up you're emulating, the host is acting as a giant write
> > cache.  Should your host fail, you can get data corruption.
>
> Hi Anthony. I suspected my post might spark an interesting discussion!
>
> Before considering anything like this, we did quite a bit of testing with
> OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
> power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
> NTFS filesystems despite these efforts.
>
> Is your claim here that:-
>
>    (a) qemu doesn't emulate a disk write cache correctly; or
>
>    (b) operating systems are inherently unsafe running on top of a disk with
>    a write-cache; or
>
>    (c) installations that are already broken and lose data with a physical
>    drive with a write-cache can lose much more in this case because the
>    write cache is much bigger?

This is the closest to the most accurate.

It basically boils down to this: most enterprises use disks with
battery backed write caches.  Having the host act as a giant write cache
means that you can lose data.

I agree that a well behaved file system will not become corrupt, but my
contention is that for many types of applications, data loss ==
corruption and not all file systems are well behaved.  And it's
certainly valid to argue about whether common filesystems are "broken"
but from a purely pragmatic perspective, this is going to be the case.


Regards,

Anthony Liguori

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb

Avi Kivity  writes:

> On 03/15/2010 10:23 PM, Chris Webb wrote:
>
> >Wasteful duplication of page cache between guest and host notwithstanding,
> >turning on cache=writeback is a spectacular performance win for our guests.
> 
> Is this with qcow2, raw file, or direct volume access?

This is with direct access to logical volumes. No file systems or qcow2 in
the stack. Our typical host has a couple of SATA disks, combined in md
RAID1, chopped up into volumes with LVM2 (really just dm linear targets).
The performance measured outside qemu is excellent, inside qemu-kvm is fine
too until multiple guests are trying to access their drives at once, but
then everything starts to grind badly.

> I can understand it for qcow2, but for direct volume access this
> shouldn't happen.  The guest schedules as many writes as it can,
> followed by a sync.  The host (and disk) can then reschedule them
> whether they are in the writeback cache or in the block layer, and
> must sync in the same way once completed.

I don't really understand what's going on here, but I wonder if the
underlying problem might be that all the O_DIRECT/O_SYNC writes from the
guests go down into the same block device at the bottom of the device mapper
stack, and thus can't be reordered with respect to one another. For our
purposes,

  Guest A   Guest B       Guest A   Guest B       Guest A   Guest B
  write A1                write A1                          write B1
            write B1      write A2                write A1
  write A2                          write B1      write A2

are all equivalent, but the system isn't allowed to reorder in this way
because there isn't a separate request queue for each logical volume, just
the one at the bottom. (I don't know whether nested request queues would
behave remotely reasonably either, though!)
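
A small Python sketch of what "equivalent" means here, reading the three
interleavings above as the orders (A1, B1, A2), (A1, A2, B1) and
(B1, A1, A2). The helper name and data layout are made up for illustration;
the only property checked is that each guest still sees its own writes in the
order it issued them.

```python
def preserves_guest_order(schedule, per_guest):
    """True if `schedule` keeps each guest's own writes in issue order."""
    for writes in per_guest.values():
        seen = [w for w in schedule if w in writes]
        if seen != writes:
            return False
    return True


per_guest = {"A": ["A1", "A2"], "B": ["B1"]}
schedules = [
    ["A1", "B1", "A2"],
    ["A1", "A2", "B1"],
    ["B1", "A1", "A2"],
]
all_equivalent = all(preserves_guest_order(s, per_guest) for s in schedules)
swapped_ok = preserves_guest_order(["A2", "B1", "A1"], per_guest)
```

Any schedule that swaps A1 and A2 fails the check, which is the only kind of
reordering that would actually matter to guest A.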

Also, if my guest kernel issues (say) three small writes, one at the start
of the disk, one in the middle, one at the end, and then does a flush, can
virtio really express this as one non-contiguous O_DIRECT write (the three
components of which can be reordered by the elevator with respect to one
another) rather than three distinct O_DIRECT writes which can't be permuted?
Can qemu issue a write like that? cache=writeback + flush allows this to be
optimised by the block layer in the normal way.

Cheers,

Chris.

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Chris Webb

Anthony Liguori  writes:

> This really gets down to your definition of "safe" behaviour.  As it
> stands, if you suffer a power outage, it may lead to guest
> corruption.
> 
> While we are correct in advertising a write-cache, write-caches are
> volatile and should a drive lose power, it could lead to data
> corruption.  Enterprise disks tend to have battery backed write
> caches to prevent this.
> 
> In the set up you're emulating, the host is acting as a giant write
> cache.  Should your host fail, you can get data corruption.

Hi Anthony. I suspected my post might spark an interesting discussion!

Before considering anything like this, we did quite a bit of testing with
OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
NTFS filesystems despite these efforts.

Is your claim here that:-

  (a) qemu doesn't emulate a disk write cache correctly; or

  (b) operating systems are inherently unsafe running on top of a disk with
  a write-cache; or

  (c) installations that are already broken and lose data with a physical
  drive with a write-cache can lose much more in this case because the
  write cache is much bigger?

Following Christoph Hellwig's patch series from last September, I'm pretty
convinced that (a) isn't true apart from the inability to disable the
write-cache at run-time, which is something that neither recent linux nor
windows seem to want to do out of the box.

Given that modern SATA drives come with fairly substantial write-caches
nowadays which operating systems leave on without widespread disaster, I
don't really believe in (b) either, at least for the ide and scsi case.
Filesystems know they have to flush the disk cache to avoid corruption.
(Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so
I know virtio-blk has to be avoided for current windows and obsolete linux
when writeback caching is on.)

I can certainly imagine (c) might be the case, although when I use strace to
watch the IO to the block device, I see pretty regular fdatasyncs being
issued by the guests, interleaved with the writes, so I'm not sure how
likely the problem would be in practice. Perhaps my test guests were
unrepresentatively well-behaved.

However, the potentially unlimited time-window for loss of incorrectly
unsynced data is also something one could imagine fixing at the qemu level.
Perhaps I should be implementing something like
cache=writeback,flushtimeout=N which, upon a write being issued to the block
device, starts an N second timer if it isn't already running. The timer is
destroyed on flush, and if it expires before it's destroyed, a gratuitous
flush is sent. Do you think this is worth doing? Just a simple 'while sleep
10; do sync; done' on the host even!
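
As a rough sketch of the flushtimeout=N idea (not qemu code; FlushTimer and
its methods are hypothetical, and time is passed in explicitly so that the
logic can be followed without a real clock):

```python
class FlushTimer:
    """Toy model of the proposed flushtimeout=N behaviour."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None      # None means no timer is running
        self.forced_flushes = 0

    def on_write(self, now):
        if self.deadline is None:         # start the timer only once
            self.deadline = now + self.timeout

    def on_flush(self, now):
        self.deadline = None              # guest flushed: cancel the timer

    def tick(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.forced_flushes += 1      # timer expired: gratuitous flush
            self.deadline = None


timer = FlushTimer(timeout=10)
timer.on_write(0)
timer.on_write(5)             # timer already running, deadline stays at 10
timer.tick(9)
before_deadline = timer.forced_flushes
timer.tick(10)
at_deadline = timer.forced_flushes
timer.on_write(12)
timer.on_flush(15)            # guest flushes on its own
timer.tick(30)
after_guest_flush = timer.forced_flushes
```

The window for unsynced data is then bounded by N seconds from the first
unflushed write, whatever the guest does.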

We've used cache=none and cache=writethrough, and whilst performance is fine
with a single guest accessing a disk, when we chop the disks up with LVM and
run even a small handful of guests, the constant seeking to serve tiny
synchronous IOs leads to truly abysmal throughput---we've seen less than
700kB/s streaming write rates within guests when the backing store is
capable of 100MB/s.

With cache=writeback, there's still IO contention between guests, but the
write granularity is a bit coarser, so the host's elevator seems to get a
bit more of a chance to help us out and we can at least squeeze out 5-10MB/s
from two or three concurrently running guests, getting a total of 20-30% of
the performance of the underlying block device rather than a total of around
5%.

Cheers,

Chris.

[PATCH] vhost: fix interrupt mitigation with raw sockets

2010-03-17 Thread Michael S. Tsirkin

A thinko in code means we never trigger interrupt
mitigation. Fix this.

Reported-by: Juan Quintela 
Reported-by: Unai Uribarri 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fcafb6b..a6a88df 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -125,7 +125,7 @@ static void handle_tx(struct vhost_net *net)
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
 
-   if (wmem < sock->sk->sk_sndbuf * 2)
+   if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
hdr_size = vq->hdr_size;
 
-- 
1.7.0.18.g0d53a5
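
One possible reading of the one-line change, as a sketch only. It assumes
(this is not visible in the hunk above) that handle_tx() has already returned
early whenever wmem >= sk_sndbuf. Under that assumption the old "* 2"
comparison is always true, so tx_poll_stop() would run unconditionally.
stops_polling is an illustrative helper, not a vhost function.

```python
def stops_polling(wmem, sndbuf, fixed=True):
    """Evaluate the comparison guarding tx_poll_stop().

    fixed=True uses the patched `sndbuf / 2` limit, fixed=False the old
    `sndbuf * 2` one. Floor division stands in for C's `/` on ints.
    """
    limit = sndbuf // 2 if fixed else sndbuf * 2
    return wmem < limit


SNDBUF = 1000
# Assumed precondition: only values below SNDBUF reach the comparison.
old_always_true = all(stops_polling(w, SNDBUF, fixed=False) for w in range(SNDBUF))
new_below_half = stops_polling(400, SNDBUF)
new_above_half = stops_polling(600, SNDBUF)
```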

[PATCH] vhost: fix error handling in vring ioctls

2010-03-17 Thread Michael S. Tsirkin

Stanse found a locking problem in vhost_set_vring:
several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
VHOST_SET_VRING_ERR with the vq->mutex held.
Fix these up.

Reported-by: Jiri Slaby 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7cd55e0..7bd7a1e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
if (r < 0)
break;
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-   if (IS_ERR(eventfp))
-   return PTR_ERR(eventfp);
+   if (IS_ERR(eventfp)) {
+   r = PTR_ERR(eventfp);
+   break;
+   }
if (eventfp != vq->kick) {
pollstop = filep = vq->kick;
pollstart = vq->kick = eventfp;
@@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
if (r < 0)
break;
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-   if (IS_ERR(eventfp))
-   return PTR_ERR(eventfp);
+   if (IS_ERR(eventfp)) {
+   r = PTR_ERR(eventfp);
+   break;
+   }
if (eventfp != vq->call) {
filep = vq->call;
ctx = vq->call_ctx;
@@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
if (r < 0)
break;
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
-   if (IS_ERR(eventfp))
-   return PTR_ERR(eventfp);
+   if (IS_ERR(eventfp)) {
+   r = PTR_ERR(eventfp);
+   break;
+   }
if (eventfp != vq->error) {
filep = vq->error;
vq->error = eventfp;
-- 
1.7.0.18.g0d53a5
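
The locking problem fixed above follows a common pattern, sketched here in
Python with threading.Lock standing in for vq->mutex. set_vring, its
arguments and the -9 error value are stand-ins for illustration, not the
kernel function: the point is only that an early return skips the unlock,
while "record the error and fall through" does not.

```python
import threading


def set_vring(lock, fail, use_break=True):
    """Model of an ioctl handler that takes a mutex around a switch.

    Returns the result code; with use_break=False it mimics the old code
    path that returned while still holding the lock.
    """
    lock.acquire()
    r = 0
    if fail:
        if not use_break:
            return -9            # bug: early return, lock never released
        r = -9                   # fix: record the error, fall through
    lock.release()
    return r


fixed_lock = threading.Lock()
fixed_result = set_vring(fixed_lock, fail=True)
buggy_lock = threading.Lock()
buggy_result = set_vring(buggy_lock, fail=True, use_break=False)
```

Both variants report the same error, but only the fixed one leaves the lock
free for the next caller.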

Re: [PATCH 05/10] Don't call apic functions directly from kvm code

2010-03-17 Thread Glauber Costa

On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote:
> On 02/26/2010 10:12 PM, Glauber Costa wrote:
> >It is actually not necessary to call a tpr function to save and load cr8,
> >as cr8 is part of the processor state, and thus, it is much easier
> >to just add it to CPUState.
> >
> >As for apic base, wrap kvm usages, so we can call either the qemu device,
> >or the in kernel version.
> >
> >
> >  }
> >
> >+static void kvm_set_apic_base(CPUState *env, uint64_t val)
> >+{
> >+if (!kvm_irqchip_in_kernel())
> >+cpu_set_apic_base(env, val);
> 
> What if it is in kernel?  Just ignored?  Doesn't seem right.
At this point it is right, because there is no irqchip in kernel yet.

In a later patch, irqchip in kernel begins to exist, and this function
gets filled.

Re: [Autotest] [Autotest PATCH] KVM-test: Add a subtest 'qemu_img'

2010-03-17 Thread Lucas Meneghel Rodrigues

Copying Michael on the message.

Hi Yolkfull, I have reviewed this patch and I have some comments to
make on it, similar to the ones I made on an earlier version of it:

One of the things that I noticed is that this patch doesn't work very
well out of the box:

[...@freedom kvm]$ ./scan_results.py
Test                                          Status   Seconds  Info
----                                          ------   -------  ----
(Result file: ../../results/default/status)
smp2.Fedora.11.64.qemu_img.check              GOOD     47       completed successfully
smp2.Fedora.11.64.qemu_img.create             GOOD     44       completed successfully
smp2.Fedora.11.64.qemu_img.convert.to_qcow2   FAIL     45       Image converted failed; Command:
    /usr/bin/qemu-img convert -f qcow2 -O qcow2
    /tmp/kvm_autotest_root/images/fc11-64.qcow2
    /tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_qcow2;Output is:
    qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.convert.to_raw     FAIL     46       Image converted failed; Command:
    /usr/bin/qemu-img convert -f qcow2 -O raw
    /tmp/kvm_autotest_root/images/fc11-64.qcow2
    /tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_raw;Output is:
    qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.snapshot           FAIL     44       Create snapshot failed via command:
    /usr/bin/qemu-img snapshot -c snapshot0
    /tmp/kvm_autotest_root/images/fc11-64.qcow2;Output is: qemu-img: Could
    not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2'
smp2.Fedora.11.64.qemu_img.commit             GOOD     44       completed successfully
smp2.Fedora.11.64.qemu_img.info               FAIL     44       Unhandled str: Unhandled TypeError:
    argument of type 'NoneType' is not iterable
smp2.Fedora.11.64.qemu_img.rebase             TEST_NA  43       Current kvm user space version
    does not support 'rebase' subcommand
                                              GOOD     412

We need to fix that before upstream inclusion.

Also, one thing that I've noticed is that this test doesn't depend on
any other variants, so we don't need to repeat it for every combination
of guest and qemu command line options. Michael, can you think of a way
to get this test out of the variants block, so it gets executed only
once per job and not for every combination of guest and other qemu
options?

On Fri, Jan 29, 2010 at 4:00 AM, Yolkfull Chow  wrote:
> This is designed to test all subcommands of 'qemu-img' however
> so far 'commit' is not implemented.
>
> * For 'check' subcommand test, it will 'dd' to create a file with specified
> size and see whether it's supported to be checked. Then convert it to be
> supported formats ('qcow2' and 'raw' so far) to see whether there's error
> after conversion.
>
> * For 'convert' subcommand test, it will convert both to 'qcow2' and 'raw'
> from the format specified in config file. And only check 'qcow2' after
> conversion.
>
> * For 'snapshot' subcommand test, it will create two snapshots and list them.
> Finally delete them if no errors found.
>
> * For 'info' subcommand test, it will check image format & size according to
> output of 'info' subcommand at specified image file.
>
> * For 'rebase' subcommand test, it will create first snapshot 'sn1' based on
> original base_img, and create second snapshot based on sn1. And then rebase
> sn2 to base_img. After rebase check the backing_file of sn2.
>
> This supports two rebase modes: unsafe mode and safe mode:
> Unsafe mode:
> With -u an unsafe mode is enabled that doesn't require the backing files to
> exist. It merely changes the backing file reference in the COW image. This is
> useful for renaming or moving the backing file. The user is responsible to
> make sure that the new backing file has no changes compared to the old one,
> or corruption may occur.
>
> Safe Mode:
> Both the current and the new backing file need to exist, and after the
> rebase, the COW image is guaranteed to have the same guest visible content as
> before. To achieve this, old and new backing file are compared and, if
> necessary, data is copied from the old backing file into the COW image.
>
> Signed-off-by: Yolkfull Chow 
> ---
>  client/tests/kvm/tests/qemu_img.py     |  235 
> 
>  client/tests/kvm/tests_base.cfg.sample |   40 ++
>  2 files changed, 275 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/qemu_img.py
>
> diff --git a/client/tests/kvm/tests/qemu_img.py b/client/tests/kvm/tests/qemu_img.py
> new file mode 100644
> index 000..e6352a0
> --- /dev/null
> +++ b/client/tests/kvm/tests/qemu_img.py
> @@ -0,0 +1,235 @@
> +import re, os, logging, commands
> +from autotest_lib.client.common_lib

Re: [PATCH] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS

2010-03-17 Thread Alexander Graf

Avi Kivity wrote:
> On 11/12/2009 02:04 AM, Jan Kiszka wrote:
>> This new IOCTL exports all yet user-invisible states related to
>> exceptions, interrupts, and NMIs. Together with appropriate user space
>> changes, this fixes sporadic problems of vmsave/restore, live migration
>> and system reset.
>>
>>
>
> Applied, thanks.  I added a flags field to the structure in case we
> discover a new bit that needs to fit in there.  Please take a look
> (separate commit in kvm-next).
>

So without this patch migration fails? Sounds like a stable candidate to
me. Same goes for the follow-up that adds the shadow field.


Alex

Re: Corrupt qcow2 image, recovery?

2010-03-17 Thread RW

Liang Guo gave me this advice some weeks ago:

> you may use kvm-nbd or qemu-nbd to present kvm image
> as a NBD device, so
> that you can use nbd-client to access them. eg:
>
> kvm-nbd /vm/sid1.img
> modprobe nbd
> nbd-client localhost 1024 /dev/nbd0
> fdisk -l /dev/nbd0

Didn't work for me because I always got a segfault,
but maybe it works for you.

- Robert




On 03/16/10 19:21, Christian Nilsson wrote:
> Hi!
> 
> I'm running kvm / qemu-kvm on a couple of production servers; everything (or
> at least most things) works as it should.
> However, today someone thought it was a "good" idea to restart one of the
> servers, and after that the Windows 2k3 guest on that server doesn't boot
> anymore.
> 
> kvm on this server is a bit "outdated": "QEMU PC emulator version 0.9.1 
> (kvm-83)"
> (I guess this is one of the qcow2 corruption bugs, and i can only blame 
> myself for not upgrading kvm sooner.)
> The guest.qcow2 is a 21GiB file for a 60GiB disk
> 
> i have tried a couple of things kvm-img convert -f qcow2 -O raw guest.qcow2 
> guest.raw
> this stops and does nothing after creating a guest.raw that is 60GiB but only 
> using 60MiB
> 
> so mounted the fs from another server running: "QEMU PC emulator version 
> 0.12.1 (qemu-kvm-0.12.1.2)"
> 
> and run qemu-img with the same options as above and after a few secs got 
> "qemu-img: error while reading"
> and the same 60MiB used by guest.raw
> 
> i also tried booting qemu-kvm with a linux guest and this qcow2 image but 
> only get I/O Errors (and no partitions found)
> 
> # qemu-img check guest.qcow2
> ERROR: invalid cluster offset=0x10a000  
> ERROR OFLAG_COPIED: l2_offset=ee73 refcount=1   
> ERROR l2_offset=ee73: Table is not cluster aligned; L1 entry corrupted
> ERROR: invalid cluster offset=0x11d44100080   
> ERROR: invalid cluster offset=0x11d61600080   
> ERROR: invalid cluster offset=0x11d68600080   
> ERROR: invalid cluster offset=0x11d95300080
> (and a lot more in this style; the full log can be provided if
> it would be of help to anybody)
> 
> 
> 
> Is there any possibility to repair this file, or convert it to a RAW file
> (even with parts padded that are not "safe" from the qcow2 image), or, as a
> last resort, are there any debug tools for qcow2 images that might be of use?
>
> I have read up on the qcow file format, but right now I'm a bit short of time
> and I need the data in this guest's disk image (or at least the MS SQL
> datafiles that are on this disk). I have also checked the qcow2 file and it
> does contain an NTLDR string and a lot of other NTFS-recognized strings, so I
> know that all data is not gone. The question is: how can I access it as a
> filesystem again?
> 
> 
> Any help would be appreciated!
> 
> Regards
> Christian Nilsson

[PATCHv6 4/4] virtio-pci: irqfd support

2010-03-17 Thread Michael S. Tsirkin

Use irqfd when supported by kernel.
This uses msix mask notifiers: when vector is masked, we poll it from
userspace.  When it is unmasked, we poll it from kernel.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-pci.c |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4255d98..f8d8022 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -402,6 +402,27 @@ static void virtio_pci_guest_notifier_read(void *opaque)
 }
 }
 
+static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector,
+void *opaque, int masked)
+{
+VirtQueue *vq = opaque;
+EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
+int r = kvm_set_irqfd(dev->msix_irq_entries[vector].gsi,
+  event_notifier_get_fd(notifier),
+  !masked);
+if (r < 0) {
+return (r == -ENOSYS) ? 0 : r;
+}
+if (masked) {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+virtio_pci_guest_notifier_read, NULL, vq);
+} else {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+NULL, NULL, NULL);
+}
+return 0;
+}
+
 static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -415,7 +436,11 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
 }
 qemu_set_fd_handler(event_notifier_get_fd(notifier),
 virtio_pci_guest_notifier_read, NULL, vq);
+msix_set_mask_notifier(&proxy->pci_dev,
+   virtio_queue_vector(proxy->vdev, n), vq);
 } else {
+msix_set_mask_notifier(&proxy->pci_dev,
+   virtio_queue_vector(proxy->vdev, n), NULL);
 qemu_set_fd_handler(event_notifier_get_fd(notifier),
 NULL, NULL, NULL);
 event_notifier_cleanup(notifier);
@@ -500,6 +525,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
 
 proxy->pci_dev.config_write = virtio_write_config;
 
+proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier;
+
 size = VIRTIO_PCI_REGION_SIZE(&proxy->pci_dev) + vdev->config_len;
 if (size & (size-1))
 size = 1 << qemu_fls(size);
-- 
1.7.0.18.g0d53a5

[PATCHv6 3/4] msix: add mask/unmask notifiers

2010-03-17 Thread Michael S. Tsirkin

Support per-vector callbacks for msix mask/unmask.
Will be used for vhost net.

Signed-off-by: Michael S. Tsirkin 
---
 hw/msix.c |   36 +++-
 hw/msix.h |1 +
 hw/pci.h  |6 ++
 3 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index faee0b2..3ec8805 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -317,6 +317,13 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
 }
+if (was_masked != msix_is_masked(dev, vector) &&
+dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+int r = dev->msix_mask_notifier(dev, vector,
+   dev->msix_mask_notifier_opaque[vector],
+   msix_is_masked(dev, vector));
+assert(r >= 0);
+}
 msix_handle_mask_update(dev, vector);
 }
 
@@ -355,10 +362,18 @@ void msix_mmio_map(PCIDevice *d, int region_num,
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
-int vector;
+int vector, r;
 for (vector = 0; vector < nentries; ++vector) {
 unsigned offset = vector * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL;
+int was_masked = msix_is_masked(dev, vector);
 dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
+if (was_masked != msix_is_masked(dev, vector) &&
+dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+r = dev->msix_mask_notifier(dev, vector,
+dev->msix_mask_notifier_opaque[vector],
+msix_is_masked(dev, vector));
+assert(r >= 0);
+}
 }
 }
 
@@ -381,6 +396,9 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 sizeof *dev->msix_irq_entries);
 }
 #endif
+dev->msix_mask_notifier_opaque =
+qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
+dev->msix_mask_notifier = NULL;
 dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
 sizeof *dev->msix_entry_used);
 
@@ -443,6 +461,8 @@ int msix_uninit(PCIDevice *dev)
 dev->msix_entry_used = NULL;
 qemu_free(dev->msix_irq_entries);
 dev->msix_irq_entries = NULL;
+qemu_free(dev->msix_mask_notifier_opaque);
+dev->msix_mask_notifier_opaque = NULL;
 dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
 return 0;
 }
@@ -586,3 +606,17 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 return;
 msix_free_irq_entries(dev);
 }
+
+int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
+{
+int r = 0;
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+return 0;
+
+if (dev->msix_mask_notifier)
+r = dev->msix_mask_notifier(dev, vector, opaque,
+msix_is_masked(dev, vector));
+if (r >= 0)
+dev->msix_mask_notifier_opaque[vector] = opaque;
+return r;
+}
diff --git a/hw/msix.h b/hw/msix.h
index a9f7993..f167231 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -33,4 +33,5 @@ void msix_reset(PCIDevice *dev);
 
 extern int msix_supported;
 
+int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque);
 #endif
diff --git a/hw/pci.h b/hw/pci.h
index 1eab8f2..100104c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -136,6 +136,9 @@ enum {
 #define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
 #define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10
 
+typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
+  void *opaque, int masked);
+
 struct PCIDevice {
 DeviceState qdev;
 /* PCI config space */
@@ -201,6 +204,9 @@ struct PCIDevice {
 
 struct kvm_irq_routing_entry *msix_irq_entries;
 
+void **msix_mask_notifier_opaque;
+msix_mask_notifier_func msix_mask_notifier;
+
 /* Device capability configuration space */
 struct {
 int supported;
-- 
1.7.0.18.g0d53a5
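
The notifier rules in this patch can be modelled outside QEMU. The sketch below is a stand-alone approximation and not QEMU code: toy_dev, toy_set_mask_notifier, toy_set_mask and count_events are hypothetical names, the mask state is a plain array instead of the MSI-X table page, and the msix_entry_used check is left out. It illustrates two things the patch appears to rely on: the notifier fires only when a vector's mask bit actually changes, and only for vectors that have an opaque pointer registered.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NVECTORS 4

struct toy_dev;

/* Same shape as msix_mask_notifier_func in the patch, minus PCIDevice. */
typedef int (*toy_mask_notifier_func)(struct toy_dev *dev, unsigned vector,
                                      void *opaque, int masked);

struct toy_dev {
    int masked[TOY_NVECTORS];            /* stand-in for the vector control bits */
    void *notifier_opaque[TOY_NVECTORS]; /* per-vector callback argument */
    toy_mask_notifier_func notifier;     /* one callback per device */
};

/* Register per-vector state. Like msix_set_mask_notifier(), report the
 * current mask state first and store opaque only if that succeeded. */
static int toy_set_mask_notifier(struct toy_dev *dev, unsigned vector,
                                 void *opaque)
{
    int r = 0;

    assert(dev != NULL);
    if (vector >= TOY_NVECTORS)
        return 0;
    if (dev->notifier)
        r = dev->notifier(dev, vector, opaque, dev->masked[vector]);
    if (r >= 0)
        dev->notifier_opaque[vector] = opaque;
    return r;
}

/* Change a mask bit; notify only on a real transition. */
static int toy_set_mask(struct toy_dev *dev, unsigned vector, int masked)
{
    int was_masked;

    assert(dev != NULL);
    if (vector >= TOY_NVECTORS)
        return -1;
    was_masked = dev->masked[vector];
    dev->masked[vector] = masked;
    if (was_masked != masked && dev->notifier && dev->notifier_opaque[vector])
        return dev->notifier(dev, vector, dev->notifier_opaque[vector], masked);
    return 0;
}

/* Example notifier: counts the calls it receives through opaque. */
static int count_events(struct toy_dev *dev, unsigned vector,
                        void *opaque, int masked)
{
    (void)dev;
    (void)vector;
    (void)masked;
    ++*(int *)opaque;
    return 0;
}
```

A caller would set dev.notifier once, register a counter with toy_set_mask_notifier(), and then see one callback per real mask or unmask transition on that vector.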


[PATCHv6 2/4] kvm: irqfd support

2010-03-17 Thread Michael S. Tsirkin

Add API to assign/deassign irqfd to kvm.
Add stub so that users do not have to use
ifdefs.

Signed-off-by: Michael S. Tsirkin 
---
 kvm-all.c |   19 +++
 kvm.h |   10 ++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7b05462..1a15662 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1200,5 +1200,24 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
 }
 #endif
 
+#if defined(KVM_IRQFD)
+int kvm_set_irqfd(int gsi, int fd, bool assigned)
+{
+struct kvm_irqfd irqfd = {
+.fd = fd,
+.gsi = gsi,
+.flags = assigned ? 0 : KVM_IRQFD_FLAG_DEASSIGN,
+};
+int r;
+if (!kvm_enabled() || !kvm_irqchip_in_kernel())
+return -ENOSYS;
+
+r = kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+if (r < 0)
+return r;
+return 0;
+}
+#endif
+
 #undef PAGE_SIZE
 #include "qemu-kvm.c"
diff --git a/kvm.h b/kvm.h
index 0951380..72dcaca 100644
--- a/kvm.h
+++ b/kvm.h
@@ -180,4 +180,14 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign)
 }
 #endif
 
+#if defined(KVM_IRQFD) && defined(CONFIG_KVM)
+int kvm_set_irqfd(int gsi, int fd, bool assigned);
+#else
+static inline
+int kvm_set_irqfd(int gsi, int fd, bool assigned)
+{
+return -ENOSYS;
+}
+#endif
+
 #endif
-- 
1.7.0.18.g0d53a5
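
As a rough illustration of the assign/deassign logic above, here is a sketch that builds without <linux/kvm.h>. Everything prefixed toy_ is hypothetical: toy_irqfd is a cut-down stand-in for struct kvm_irqfd, TOY_IRQFD_FLAG_DEASSIGN stands in for KVM_IRQFD_FLAG_DEASSIGN, and a function pointer replaces kvm_vm_ioctl(). A NULL hook plays the role of "kvm not enabled or no in-kernel irqchip", which is the case where the patch returns -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without the kernel headers. */
#define TOY_IRQFD_FLAG_DEASSIGN (1u << 0)

struct toy_irqfd {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;
};

/* Hypothetical hook standing in for kvm_vm_ioctl(kvm_state, KVM_IRQFD, ...). */
typedef int (*toy_ioctl_fn)(const struct toy_irqfd *req);

/* Mirrors the control flow of kvm_set_irqfd() in the patch. */
static int toy_set_irqfd(toy_ioctl_fn do_ioctl, int gsi, int fd, bool assigned)
{
    struct toy_irqfd req = {
        .fd = (uint32_t)fd,
        .gsi = (uint32_t)gsi,
        .flags = assigned ? 0 : TOY_IRQFD_FLAG_DEASSIGN,
    };
    int r;

    if (!do_ioctl)      /* models !kvm_enabled() || !kvm_irqchip_in_kernel() */
        return -ENOSYS;
    r = do_ioctl(&req);
    return r < 0 ? r : 0;
}

/* Fake backend that remembers the last request it saw. */
static struct toy_irqfd toy_last_req;

static int toy_record_ioctl(const struct toy_irqfd *req)
{
    assert(req != NULL);
    toy_last_req = *req;
    return 0;
}
```

Because the stub in kvm.h has the same signature and also returns -ENOSYS, a caller can treat "not compiled in" and "not available at run time" the same way.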


[PATCHv6 1/4] qemu-kvm: add vhost.h header

2010-03-17 Thread Michael S. Tsirkin

This makes it possible to build vhost support
on systems which do not have this header.

Signed-off-by: Michael S. Tsirkin 
---
 kvm/include/linux/vhost.h |  130 +
 1 files changed, 130 insertions(+), 0 deletions(-)
 create mode 100644 kvm/include/linux/vhost.h

diff --git a/kvm/include/linux/vhost.h b/kvm/include/linux/vhost.h
new file mode 100644
index 000..165a484
--- /dev/null
+++ b/kvm/include/linux/vhost.h
@@ -0,0 +1,130 @@
+#ifndef _LINUX_VHOST_H
+#define _LINUX_VHOST_H
+/* Userspace interface for in-kernel virtio accelerators. */
+
+/* vhost is used to reduce the number of system calls involved in virtio.
+ *
+ * Existing virtio net code is used in the guest without modification.
+ *
+ * This header includes interface used by userspace hypervisor for
+ * device configuration.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+struct vhost_vring_state {
+   unsigned int index;
+   unsigned int num;
+};
+
+struct vhost_vring_file {
+   unsigned int index;
+   int fd; /* Pass -1 to unbind from file. */
+
+};
+
+struct vhost_vring_addr {
+   unsigned int index;
+   /* Option flags. */
+   unsigned int flags;
+   /* Flag values: */
+   /* Whether log address is valid. If set enables logging. */
+#define VHOST_VRING_F_LOG 0
+
+   /* Start of array of descriptors (virtually contiguous) */
+   __u64 desc_user_addr;
+   /* Used structure address. Must be 32 bit aligned */
+   __u64 used_user_addr;
+   /* Available structure address. Must be 16 bit aligned */
+   __u64 avail_user_addr;
+   /* Logging support. */
+   /* Log writes to used structure, at offset calculated from specified
+* address. Address must be 32 bit aligned. */
+   __u64 log_guest_addr;
+};
+
+struct vhost_memory_region {
+   __u64 guest_phys_addr;
+   __u64 memory_size; /* bytes */
+   __u64 userspace_addr;
+   __u64 flags_padding; /* No flags are currently specified. */
+};
+
+/* All region addresses and sizes must be 4K aligned. */
+#define VHOST_PAGE_SIZE 0x1000
+
+struct vhost_memory {
+   __u32 nregions;
+   __u32 padding;
+   struct vhost_memory_region regions[0];
+};
+
+/* ioctls */
+
+#define VHOST_VIRTIO 0xAF
+
+/* Features bitmask for forward compatibility.  Transport bits are used for
+ * vhost specific features. */
+#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
+
+/* Set current process as the (exclusive) owner of this file descriptor.  This
+ * must be called before any other vhost command.  Further calls to
+ * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */
+#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
+/* Give up ownership, and reset the device to default values.
+ * Allows subsequent call to VHOST_OWNER_SET to succeed. */
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+
+/* Set up/modify memory layout */
+#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
+
+/* Write logging setup. */
+/* Memory writes can optionally be logged by setting bit at an offset
+ * (calculated from the physical address) from specified log base.
+ * The bit is set using an atomic 32 bit operation. */
+/* Set base address for logging. */
+#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
+/* Specify an eventfd file descriptor to signal on log write. */
+#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
+
+/* Ring setup. */
+/* Set number of descriptors in ring. This parameter can not
+ * be modified while ring is running (bound to a device). */
+#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
+/* Set addresses for the ring. */
+#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
+/* Base value where queue looks for available descriptors */
+#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+/* Get accessor: reads index, writes value in num */
+#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+
+/* The following ioctls use eventfd file descriptors to signal and poll
+ * for events. */
+
+/* Set eventfd to poll for added buffers */
+#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
+/* Set eventfd to signal when buffers have beed used */
+#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
+/* Set eventfd to signal an error */
+#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+
+/* VHOST_NET specific defines */
+
+/* Attach virtio net ring to a raw socket, or tap device.
+ * The socket must be already bound to an ethernet device, this device will be
+ * used for transmit.  Pass fd -1 to unbind from the socket and the transmit
+ * device.  This can be used to stop the ring (e.g. for migration). */
+#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, s
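
The memory table in this header ends in a zero-length array, so userspace has to size the allocation itself before passing it to VHOST_SET_MEM_TABLE. A minimal sketch of that, under stated assumptions: the toy_ types are local copies of the layouts above using <stdint.h> types and a C99 flexible array member, and toy_vhost_memory_alloc and toy_region_is_aligned are hypothetical helpers, not part of any vhost API. The alignment check follows the "must be 4K aligned" comment in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the layouts from the header, using <stdint.h> types. */
struct toy_vhost_memory_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;      /* bytes */
    uint64_t userspace_addr;
    uint64_t flags_padding;    /* no flags are currently specified */
};

struct toy_vhost_memory {
    uint32_t nregions;
    uint32_t padding;
    struct toy_vhost_memory_region regions[];  /* C99 flexible array member */
};

#define TOY_VHOST_PAGE_SIZE 0x1000u

/* True when guest address, size and userspace address are all 4K aligned. */
static int toy_region_is_aligned(const struct toy_vhost_memory_region *r)
{
    assert(r != NULL);
    return ((r->guest_phys_addr | r->memory_size | r->userspace_addr) &
            (TOY_VHOST_PAGE_SIZE - 1)) == 0;
}

/* Allocate a zeroed table with room for nregions entries; caller frees. */
static struct toy_vhost_memory *toy_vhost_memory_alloc(uint32_t nregions)
{
    size_t size = sizeof(struct toy_vhost_memory) +
                  (size_t)nregions * sizeof(struct toy_vhost_memory_region);
    struct toy_vhost_memory *mem = calloc(1, size);

    if (mem)
        mem->nregions = nregions;
    return mem;
}
```

The caller fills in mem->regions[i] for each guest RAM range, checks alignment, hands the table to the ioctl, and frees it afterwards.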

[PATCHv6 0/4] qemu-kvm: vhost net port

2010-03-17 Thread Michael S. Tsirkin

This is a port of the vhost v6 patch set I posted previously to qemu-kvm, for
those who want to get good performance out of it :) This patchset should
be applied once the qemu.git one gets merged; it includes irqchip
support.

Changes from previous version:
- check kvm_enabled in irqfd call

Michael S. Tsirkin (4):
  qemu-kvm: add vhost.h header
  kvm: irqfd support
  msix: add mask/unmask notifiers
  virtio-pci: irqfd support

 hw/msix.c |   36 -
 hw/msix.h |1 +
 hw/pci.h  |6 ++
 hw/virtio-pci.c   |   27 +
 kvm-all.c |   19 +++
 kvm.h |   10 
 kvm/include/linux/vhost.h |  130 +
 7 files changed, 228 insertions(+), 1 deletions(-)
 create mode 100644 kvm/include/linux/vhost.h

Re: [Autotest] [PATCH 1/2] KVM test: Refactoring the 'autotest' subtest

2010-03-17 Thread Lucas Meneghel Rodrigues

On Fri, Feb 26, 2010 at 1:13 AM, sudhir kumar  wrote:
> Looks good to me. It will definitely boost test speed for certain
> tests and give flexibility to use existing autotest strength in more
> granular way.

Thank you! FYI, this patch was applied, mainly because it's not
dependent on cpu_set test itself:

http://autotest.kernel.org/changeset/4308

> On Fri, Feb 26, 2010 at 1:13 AM, Lucas Meneghel Rodrigues
>  wrote:
>> Refactor autotest subtest into a utility function, so other
>> KVM subtests can run autotest control files in hosts as part
>> of their routine.
>>
>> This arrangement was made to accommodate the upcoming 'cpu_set'
>> test.
>>
>> Signed-off-by: Lucas Meneghel Rodrigues 
>> ---
>>  client/tests/kvm/kvm_test_utils.py |  165 +++-
>>  client/tests/kvm/tests/autotest.py |  153 ++---
>>  2 files changed, 171 insertions(+), 147 deletions(-)
>>
>> diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
>> index 7d96d6e..71d6303 100644
>> --- a/client/tests/kvm/kvm_test_utils.py
>> +++ b/client/tests/kvm/kvm_test_utils.py
>> @@ -24,7 +24,7 @@ More specifically:
>>  import time, os, logging, re, commands
>>  from autotest_lib.client.common_lib import error
>>  from autotest_lib.client.bin import utils
>> -import kvm_utils, kvm_vm, kvm_subprocess
>> +import kvm_utils, kvm_vm, kvm_subprocess, scan_results
>>
>>
>>  def get_living_vm(env, vm_name):
>> @@ -237,3 +237,166 @@ def get_memory_info(lvms):
>>     meminfo = meminfo[0:-2] + "}"
>>
>>     return meminfo
>> +
>> +
>> +def run_autotest(vm, session, control_path, timeout, test_name, outputdir):
>> +    """
>> +    Run an autotest control file inside a guest (linux only utility).
>> +
>> +    @param vm: VM object.
>> +    @param session: A shell session on the VM provided.
>> +    @param control: An autotest control file.
>> +    @param timeout: Timeout under which the autotest test must complete.
>> +    @param test_name: Autotest client test name.
>> +    @param outputdir: Path on host where we should copy the guest autotest
>> +            results to.
>> +    """
>> +    def copy_if_size_differs(vm, local_path, remote_path):
>> +        """
>> +        Copy a file to a guest if it doesn't exist or if its size differs.
>> +
>> +        @param vm: VM object.
>> +        @param local_path: Local path.
>> +        @param remote_path: Remote path.
>> +        """
>> +        copy = False
>> +        basename = os.path.basename(local_path)
>> +        local_size = os.path.getsize(local_path)
>> +        output = session.get_command_output("ls -l %s" % remote_path)
>> +        if "such file" in output:
>> +            logging.info("Copying %s to guest (remote file is missing)" %
>> +                         basename)
>> +            copy = True
>> +        else:
>> +            try:
>> +                remote_size = output.split()[4]
>> +                remote_size = int(remote_size)
>> +            except IndexError, ValueError:
>> +                logging.error("Check for remote path size %s returned %s. "
>> +                              "Cannot process.", remote_path, output)
>> +                raise error.TestFail("Failed to check for %s (Guest died?)" %
>> +                                     remote_path)
>> +            if remote_size != local_size:
>> +                logging.debug("Copying %s to guest due to size mismatch"
>> +                              "(remote size %s, local size %s)" %
>> +                              (basename, remote_size, local_size))
>> +                copy = True
>> +
>> +        if copy:
>> +            if not vm.copy_files_to(local_path, remote_path):
>> +                raise error.TestFail("Could not copy %s to guest" % local_path)
>> +
>> +
>> +    def extract(vm, remote_path, dest_dir="."):
>> +        """
>> +        Extract a .tar.bz2 file on the guest.
>> +
>> +        @param vm: VM object
>> +        @param remote_path: Remote file path
>> +        @param dest_dir: Destination dir for the contents
>> +        """
>> +        basename = os.path.basename(remote_path)
>> +        logging.info("Extracting %s..." % basename)
>> +        (status, output) = session.get_command_status_output(
>> +                                  "tar xjvf %s -C %s" % (remote_path, dest_dir))
>> +        if status != 0:
>> +            logging.error("Uncompress output:\n%s" % output)
>> +            raise error.TestFail("Could not extract % on guest")
>> +
>> +    if not os.path.isfile(control_path):
>> +        raise error.TestError("Invalid path to autotest control file: %s" %
>> +                              control_path)
>> +
>> +    tarred_autotest_path = "/tmp/autotest.tar.bz2"
>> +    tarred_test_path = "/tmp/%s.tar.bz2" % test_name
>> +
>> +    # To avoid problems, let's make the test use the current AUTODIR
>> +    # (autotest client path) location
>> +    autotest_path =

Re: [Autotest] [PATCH] KVM-test: SR-IOV: Fix a bug that wrongly check VFs count

2010-03-17 Thread Lucas Meneghel Rodrigues

On Thu, Mar 11, 2010 at 2:54 AM, Yolkfull Chow  wrote:
> The parameter 'devices_requested' is unrelated to the driver_option 'max_vfs'
> of 'igb'.
>
> NIC card 82576 has two network interfaces and each can be
> virtualized up to 7 virtual functions, therefore we multiply
> two for the value of driver_option 'max_vfs' and can thus get
> the total number of VFs.

Applied, thanks!

> Signed-off-by: Yolkfull Chow 
> ---
>  client/tests/kvm/kvm_utils.py |   19 +--
>  1 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index 4565dc1..1813ed1 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -1012,17 +1012,22 @@ class PciAssignable(object):
>         """
>         Get VFs count number according to lspci.
>         """
> +        # FIXME: Need to think out a method of identify which
> +        # 'virtual function' belongs to which physical card considering
> +        # that if the host has more than one 82576 card. PCI_ID?
>         cmd = "lspci | grep 'Virtual Function' | wc -l"
> -        # For each VF we'll see 2 prints of 'Virtual Function', so let's
> -        # divide the result per 2
> -        return int(commands.getoutput(cmd)) / 2
> +        return int(commands.getoutput(cmd))
>
>
>     def check_vfs_count(self):
>         """
>         Check VFs count number according to the parameter driver_options.
>         """
> -        return (self.get_vfs_count == self.devices_requested)
> +        # Network card 82576 has two network interfaces and each can be
> +        # virtualized up to 7 virtual functions, therefore we multiply
> +        # two for the value of driver_option 'max_vfs'.
> +        expected_count = int((re.findall("(\d)", self.driver_option)[0])) * 2
> +        return (self.get_vfs_count == expected_count)
>
>
>     def is_binded_to_stub(self, full_id):
> @@ -1054,15 +1059,17 @@ class PciAssignable(object):
>         elif not self.check_vfs_count():
>             os.system("modprobe -r %s" % self.driver)
>             re_probe = True
> +        else:
> +            return True
>
>         # Re-probe driver with proper number of VFs
>         if re_probe:
>             cmd = "modprobe %s %s" % (self.driver, self.driver_option)
> +            logging.info("Loading the driver '%s' with option '%s'" %
> +                                   (self.driver, self.driver_option))
>             s, o = commands.getstatusoutput(cmd)
>             if s:
>                 return False
> -            if not self.check_vfs_count():
> -                return False
>             return True
>
>
> --
> 1.7.0.1
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas

[PATCH] KVM test: Make qcow2 check image non critical

2010-03-17 Thread Lucas Meneghel Rodrigues

Instead of forcing the VMs to shut down for the qemu-img
check step, just make the postprocess step non-critical,
i.e., a failure there doesn't make the test fail. The check
is still there, but it won't mask the results of the tests
themselves, while providing useful additional info.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests_base.cfg.sample |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index beae786..bb455e6 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1049,8 +1049,7 @@ variants:
 post_command = " python scripts/check_image.py;"
 remove_image = no
 post_command_timeout = 600
-kill_vm = yes
-kill_vm_gracefully = yes
+post_command_noncritical = yes
 - vmdk:
 only Fedora Ubuntu Windows
 only smp2
-- 
1.6.6.1


Re: [Qemu-devel] Re: >2 serial ports?

2010-03-17 Thread Paul Brook

> Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
>   A bit too strict.
> 
> /me goes dig out a old patch which never made it upstream for some
> reason I forgot.  Attached.

This is wrong. Two devices should never be manipulating the same qemu_irq 
object.  If you want multiple devices connected to the same IRQ then you need 
an explicit multiplexer. e.g. arm_timer.c:sp804_set_irq.

Paul

Re: >2 serial ports?

2010-03-17 Thread Michael Tokarev

Neo Jia wrote:
> May I ask if it is possible to bind a real physical serial port to a guest?

It is all described in the documentation, quite a long list of
various things you can attach to a virtual serial port, incl.
a real one.

/mjt

Re: >2 serial ports?

2010-03-17 Thread Michael Tokarev

Gerd Hoffmann wrote:
> On 03/17/10 09:38, Michael Tokarev wrote:
>> Since 0.12, it appears that kvm does not allow more than
>> 2 serial ports for a guest:
>>
>> $ kvm \
>>   -serial unix:s1,server,nowait \
>>   -serial unix:s2,server,nowait \
>>   -serial unix:s3,server,nowait
>> isa irq 4 already assigned
>>
>> Is there a work-around for this?
> 
> Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
>  A bit too strict.
> 
> /me goes dig out a old patch which never made it upstream for some
> reason I forgot.  Attached.

I tried the patch, and it now appears to work.  I did not try
to run various stress tests so far, but basic tests are fine.

Thank you Gerd!  And I think it's time to push it finally :)

/mjt

Re: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.

2010-03-17 Thread Michael S. Tsirkin

On Wed, Mar 17, 2010 at 05:48:10PM +0800, Xin, Xiaohui wrote:
> >> Michael,
> >> I don't use the kiocb comes from the sendmsg/recvmsg,
> > >since I have embeded the kiocb in page_info structure,
> > >and allocate it when page_info allocated.
> 
> >So what I suggested was that vhost allocates and tracks the iocbs, and
> >passes them to your device with sendmsg/ recvmsg calls. This way your
> >device won't need to share structures and locking strategy with vhost:
> >you get an iocb, handle it, invoke a callback to notify vhost about
> >completion.
> 
> >This also gets rid of the 'receiver' callback
> 
> I'm not sure the receiver callback can be removed here.
> The patch describes a work flow like this:
> netif_receive_skb() gets the packet; it does nothing but queue the skb
> and wake up the handle_rx() of vhost. handle_rx() then calls the receiver
> callback to deal with the skb and put the necessary notify info into a list.
> vhost owns the list and, in the same handle_rx() context, uses it to complete.
> 
> We use the "receiver" callback here because only handle_rx() is woken up from
> netif_receive_skb(), and we need the mp device context to deal with the skb and
> the notify info attached to it. We also take a lock in the callback function.
> 
> If I remove the receiver callback, I can only deal with the skb and notify
> info in netif_receive_skb(), but this function runs in interrupt context,
> where I think taking the lock is not allowed. But I cannot remove the lock there.
> which I think lock is not allowed there. But I cannot remove the lock there.
> 

The basic idea is that vhost passes iocb to recvmsg and backend
completes the iocb to signal that data is ready. That completion could
be in interrupt context and so we need to switch to workqueue to handle
the event, it is true, but the code to do this would live in vhost.c or
net.c.

With this structure your device won't depend on
vhost, and can go under drivers/net/, opening up possibility
to use it for zero copy without vhost in the future.
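
The split described above can be illustrated in user space. This is only a rough sketch and not vhost or kernel code: toy_iocb, toy_frontend and the toy_ functions are hypothetical names, the "interrupt context" is simply a hook that does nothing except queue, and a singly linked list stands in for the workqueue. The point it tries to show is that the backend only ever touches the iocb and its completion hook, while all deferral and bookkeeping stays on the front-end (vhost) side.

```c
#include <assert.h>
#include <stddef.h>

/* What the backend sees: a request plus a completion hook. */
struct toy_iocb {
    void (*complete)(struct toy_iocb *iocb, int result);
    int result;
    struct toy_iocb *next;  /* used only by the front end's pending list */
    void *owner;            /* front-end private data; backend never looks */
};

/* Front-end state: completions waiting to be processed in a safe context. */
struct toy_frontend {
    struct toy_iocb *pending;
    int handled;
    int bytes;
};

/* Completion hook installed by the front end. It may run where real work
 * is not allowed, so it only records the result and queues the iocb. */
static void toy_frontend_complete(struct toy_iocb *iocb, int result)
{
    struct toy_frontend *fe = iocb->owner;

    iocb->result = result;
    iocb->next = fe->pending;
    fe->pending = iocb;
}

/* Front end prepares an iocb before handing it to the backend. */
static void toy_frontend_submit(struct toy_frontend *fe, struct toy_iocb *iocb)
{
    assert(fe != NULL && iocb != NULL);
    iocb->complete = toy_frontend_complete;
    iocb->owner = fe;
    iocb->next = NULL;
    iocb->result = 0;
}

/* Backend side: data arrived, signal it through the hook and nothing else. */
static void toy_backend_complete(struct toy_iocb *iocb, int nbytes)
{
    assert(iocb != NULL && iocb->complete != NULL);
    iocb->complete(iocb, nbytes);
}

/* Later, in a safe context (the workqueue in the design above), do the work. */
static void toy_frontend_drain(struct toy_frontend *fe)
{
    assert(fe != NULL);
    while (fe->pending) {
        struct toy_iocb *iocb = fe->pending;

        fe->pending = iocb->next;
        fe->handled++;
        fe->bytes += iocb->result;
    }
}
```

With this shape the backend needs no knowledge of the front end's structures or locks, which is the property that would let such a device live outside drivers/vhost.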



> >> Please have a review and thanks for the instruction
> >> for replying email which helps me a lot.
> >> 
> > >Thanks,
> > >Xiaohui
> > >
> > > > drivers/vhost/net.c   |  159 +++--
> >>  drivers/vhost/vhost.h |   12 
> >>  2 files changed, 166 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > >index 22d5fef..5483848 100644
> > >--- a/drivers/vhost/net.c
> > >+++ b/drivers/vhost/net.c
> > >@@ -17,11 +17,13 @@
> > > #include 
> > > #include 
> > > #include 
> > >+#include 
> > > 
> > > #include 
> > > #include 
> > > #include 
> > > #include 
> > >+#include 
> > > 
> > > #include 
> > > 
> > >@@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> > >   net->tx_poll_state = VHOST_NET_POLL_STARTED;
> > > }
> > > 
> > >+static void handle_async_rx_events_notify(struct vhost_net *net,
> > >+  struct vhost_virtqueue *vq);
> > >+
> > >+static void handle_async_tx_events_notify(struct vhost_net *net,
> > >+  struct vhost_virtqueue *vq);
> > >+
> 
> >A couple of style comments:
> >
> >- It's better to arrange functions in such order that forward declarations
> >aren't necessary.  Since we don't have recursion, this should always be
> >possible.
> 
> >- continuation lines should be indented at least to the position of '('
> >on the previous line.
> 
> Thanks. I'll correct that.
> 
> >>  /* Expects to be always run from workqueue - which acts as
> >>   * read-size critical section for our kind of RCU. */
> >>  static void handle_tx(struct vhost_net *net)
> >> @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net)
> >>tx_poll_stop(net);
> >>hdr_size = vq->hdr_size;
> >>  
> >> +  handle_async_tx_events_notify(net, vq);
> > >+
> >>for (;;) {
> >>head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> >> ARRAY_SIZE(vq->iov),
> > >@@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net)
> >>/* Skip header. TODO: support TSO. */
> >>s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
> >>msg.msg_iovlen = out;
> > >+
> > >+  if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
> > >+  vq->head = head;
> > >+  msg.msg_control = (void *)vq;
> 
> >So here a device gets a pointer to vhost_virtqueue structure. If it gets
> >an iocb and invokes a callback, it would not care about vhost internals.
> 
> >> +  }
> >> +
> >>len = iov_length(vq->iov, out);
> >>/* Sanity check */
> >>if (!len) {
> >> @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net)
> >>tx_poll_start(net, sock);
> >>break;
> >>}
> >> +
> >> +  if (vq->link_state == VHOST_VQ_LINK_ASYNC)
> >> +  continue;
> >>+
> >>if (err != len)
> >>p

Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()

2010-03-17 Thread Xiao Guangrong


Takuya Yoshikawa wrote:

> 
> Ah, probably checking the git log will explain you why it is like that!
> Marcelo's work? IIRC.

Oh, I found this commit:

commit 706831a7faec7ac0d3057d20df8234c45bbbc3c5
Author: Marcelo Tosatti 
Date:   Wed Dec 23 14:35:22 2009 -0200

KVM: use SRCU for dirty log

Signed-off-by: Marcelo Tosatti 

But I don't know why Marcelo separated kvm_get_dirty_log()'s code
into kvm_vm_ioctl_get_dirty_log(). :-(

Thanks,
Xiao


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Avi Kivity


On 03/17/2010 11:51 AM, Sheng Yang wrote:
>> I think you need DM_NMI for that to work correctly.
>>
>> An alternative is to call the NMI handler directly.
>
> apic_send_IPI_self() already took care of APIC_DM_NMI.

So it does (though not for x2apic?).  I don't see why it doesn't work.

> And NMI handler would block the following NMI?

It wouldn't - won't work without extensive changes.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang

On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote:
> On 03/17/2010 11:28 AM, Sheng Yang wrote:
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After more check, I think VMX won't remained NMI block state for host.
> > That's means, if NMI happened and processor is in VMX non-root mode, it
> > would only result in VMExit, with a reason indicate that it's due to NMI
> > happened, but no more state change in the host.
> >
> > So in that meaning, there _is_ a window between VMExit and KVM handle the
> > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> > code because "int $2" don't have effect to block following NMI.
> 
> That's pretty bad, as NMI runs on a separate stack (via IST).  So if
> another NMI happens while our int $2 is running, the stack will be
> corrupted.

Though hardware didn't provide this kind of block, software at least would 
warn about it... nmi_enter() still would be executed by "int $2", and result 
in BUG() if we are already in NMI context(OK, it is a little better than 
mysterious crash due to corrupted stack).
> 
> > And if the NMI sequence is not important(I think so), then we need to
> > generate a real NMI in current vmexit-after code. Seems let APIC send a
> > NMI IPI to itself is a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
> 
> I think you need DM_NMI for that to work correctly.
> 
> An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.

And NMI handler would block the following NMI?

-- 
regards
Yang, Sheng

RE: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.

2010-03-17 Thread Xin, Xiaohui

>> Michael,
>> I don't use the kiocb comes from the sendmsg/recvmsg,
> >since I have embeded the kiocb in page_info structure,
> >and allocate it when page_info allocated.

>So what I suggested was that vhost allocates and tracks the iocbs, and
>passes them to your device with sendmsg/ recvmsg calls. This way your
>device won't need to share structures and locking strategy with vhost:
>you get an iocb, handle it, invoke a callback to notify vhost about
>completion.

>This also gets rid of the 'receiver' callback

I'm not sure the receiver callback can be removed here.
The patch describes a work flow like this:
netif_receive_skb() gets the packet; it does nothing but queue the skb
and wake up the handle_rx() of vhost. handle_rx() then calls the receiver
callback to deal with the skb and put the necessary notify info into a list.
vhost owns the list and, in the same handle_rx() context, uses it to complete.

We use the "receiver" callback here because only handle_rx() is woken up from
netif_receive_skb(), and we need the mp device context to deal with the skb and
the notify info attached to it. We also take a lock in the callback function.

If I remove the receiver callback, I can only deal with the skb and notify
info in netif_receive_skb(), but this function runs in interrupt context,
where I think taking the lock is not allowed. But I cannot remove the lock there.


>> Please have a review and thanks for the instruction
>> for replying email which helps me a lot.
>> 
> >Thanks,
> >Xiaohui
> >
> > drivers/vhost/net.c   |  159 +++--
>>  drivers/vhost/vhost.h |   12 
>>  2 files changed, 166 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >index 22d5fef..5483848 100644
> >--- a/drivers/vhost/net.c
> >+++ b/drivers/vhost/net.c
> >@@ -17,11 +17,13 @@
> > #include 
> > #include 
> > #include 
> >+#include 
> > 
> > #include 
> > #include 
> > #include 
> > #include 
> >+#include 
> > 
> > #include 
> > 
> >@@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> > net->tx_poll_state = VHOST_NET_POLL_STARTED;
> > }
> > 
> >+static void handle_async_rx_events_notify(struct vhost_net *net,
> >+struct vhost_virtqueue *vq);
> >+
> >+static void handle_async_tx_events_notify(struct vhost_net *net,
> >+struct vhost_virtqueue *vq);
> >+

>A couple of style comments:
>
>- It's better to arrange functions in such order that forward declarations
>aren't necessary.  Since we don't have recursion, this should always be
>possible.

>- continuation lines should be indented at least to the position of '('
>on the previous line.

Thanks. I'll correct that.

>>  /* Expects to be always run from workqueue - which acts as
>>   * read-size critical section for our kind of RCU. */
>>  static void handle_tx(struct vhost_net *net)
>> @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net)
>>  tx_poll_stop(net);
>>  hdr_size = vq->hdr_size;
>>  
>> +handle_async_tx_events_notify(net, vq);
> >+
>>  for (;;) {
>>  head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>>   ARRAY_SIZE(vq->iov),
> >@@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net)
>>  /* Skip header. TODO: support TSO. */
>>  s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
>>  msg.msg_iovlen = out;
> >+
> >+if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
> >+vq->head = head;
> >+msg.msg_control = (void *)vq;

>So here a device gets a pointer to vhost_virtqueue structure. If it gets
>an iocb and invokes a callback, it would not care about vhost internals.

>> +}
>> +
>>  len = iov_length(vq->iov, out);
>>  /* Sanity check */
>>  if (!len) {
>> @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net)
>>  tx_poll_start(net, sock);
>>  break;
>>  }
>> +
>> +if (vq->link_state == VHOST_VQ_LINK_ASYNC)
>> +continue;
>>+
>>  if (err != len)
>>  pr_err("Truncated TX packet: "
>> " len %d != %zd\n", err, len);
>> @@ -177,6 +197,8 @@ static void handle_tx(struct vhost_net *net)
>>  }
>>  }
>>  
>> +handle_async_tx_events_notify(net, vq);
>> +
>>  mutex_unlock(&vq->mutex);
>>  unuse_mm(net->dev.mm);
>>  }
>>@@ -206,7 +228,8 @@ static void handle_rx(struct vhost_net *net)
>>  int err;
>>  size_t hdr_size;
>>  struct socket *sock = rcu_dereference(vq->private_data);
>> -if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
>> +if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) &&
>> +vq->link_state == VHOST_VQ_LINK_SYNC))
>>

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Avi Kivity


On 03/17/2010 11:28 AM, Sheng Yang wrote:
> > I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> > isn't reentrant till an IRET. YangSheng would like to double check it.
>
> After checking further, I think VMX does not retain the NMI-blocked state for
> the host. That means that if an NMI happens while the processor is in VMX
> non-root mode, it only results in a VMExit, with an exit reason indicating
> that it was caused by the NMI, but no further state change in the host.
>
> So in that sense, there _is_ a window between the VMExit and KVM handling the
> NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling code,
> because "int $2" has no effect on blocking subsequent NMIs.

That's pretty bad, as NMI runs on a separate stack (via IST).  So if
another NMI happens while our int $2 is running, the stack will be
corrupted.

> And if the NMI sequence is not important (I think it is not), then we need to
> generate a real NMI in the current post-vmexit code. Having the APIC send an
> NMI IPI to itself seems a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
> "int $2". Something unexpected is happening...

I think you need DM_NMI for that to work correctly.

An alternative is to call the NMI handler directly.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Sheng Yang

On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > > Right, but there is a scope between kvm_guest_enter and really running
> > > in guest os, where a perf event might overflow. Anyway, the scope is
> > > very narrow, I will change it to use flag PF_VCPU.
> >
> > There is also a window between setting the flag and calling 'int $2'
> > where an NMI might happen and be accounted incorrectly.
> >
> > Perhaps separate the 'int $2' into a direct call into perf and another
> > call for the rest of NMI handling.  I don't see how it would work on svm
> > though - AFAICT the NMI is held whereas vmx swallows it.
> >
> >  I guess NMIs
> > will be disabled until the next IRET so it isn't racy, just tricky.
> 
> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> isn't reentrant till an IRET. YangSheng would like to double check it.

After checking further, I think VMX does not retain the NMI-blocked state for
the host. That means that if an NMI happens while the processor is in VMX
non-root mode, it only results in a VMExit, with an exit reason indicating
that it was caused by the NMI, but no further state change in the host.

So in that sense, there _is_ a window between the VMExit and KVM handling the
NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling code,
because "int $2" has no effect on blocking subsequent NMIs.

And if the NMI sequence is not important (I think it is not), then we need to
generate a real NMI in the current post-vmexit code. Having the APIC send an
NMI IPI to itself seems a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
"int $2". Something unexpected is happening...

-- 
regards
Yang, Sheng

Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Zhang, Yanmin

On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> * Zhang, Yanmin  wrote:
> 
> > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > From: Zhang, Yanmin
> > > > >
> > > > > > Based on the discussion in KVM community, I worked out the patch to
> > > > > > support perf to collect guest os statistics from host side. This
> > > > > > patch is implemented with Ingo, Peter and some other guys' kind
> > > > > > help. Yang Sheng pointed out a critical bug and provided good
> > > > > > suggestions with other guys. I really appreciate their kind help.
> > > > >
> > > > > The patch adds new subcommand kvm to perf.
> > > > >
> > > > >perf kvm top
> > > > >perf kvm record
> > > > >perf kvm report
> > > > >perf kvm diff
> > > > >
> > > > > > The new perf could profile guest os kernel except guest os user
> > > > > > space, but it could summarize guest os user space utilization per
> > > > > > guest os.
> > > > >
> > > > > Below are some examples.
> > > > > 1) perf kvm top
> > > > > [r...@lkp-ne01 norm]# perf kvm --host --guest 
> > > > > --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > >
> > > > >
> > > > 
> > > Thanks for your kind comments.
> > > 
> > > > Excellent, support for guest kernel != host kernel is critical (I can't 
> > > > remember the last time I ran same kernels).
> > > > 
> > > > How would we support multiple guests with different kernels?
> > > With the patch, 'perf kvm report --sort pid' could show
> > > summary statistics for all guest os instances. Then, use
> > > parameter --pid of 'perf kvm record' to collect single problematic 
> > > instance data.
> > Sorry. I found currently --pid isn't process but a thread (main thread).
> > 
> > Ingo,
> > 
> > Is it possible to support a new parameter or extend --inherit, so 'perf 
> > record' and 'perf top' could collect data on all threads of a process when 
> > the process is running?
> > 
> > If not, I need add a new ugly parameter which is similar to --pid to filter 
> > out process data in userspace.
> 
> Yeah. For maximum utility i'd suggest to extend --pid to include this, and 
> introduce --tid for the previous, limited-to-a-single-task functionality.
> 
> Most users would expect --pid to work like a 'late attach' - i.e. to work
> like strace -f or like a gdb attach.

Thanks Ingo, Avi.

I worked out the patch below against tip/master of March 15th.

Subject: [PATCH] Change perf's parameter --pid to process-wide collection
From: Zhang, Yanmin 

Change parameter -p (--pid) to mean a real process pid and add -t (--tid) for
a thread id. Now, --pid means perf collects the statistics of all threads of
the process, while --tid means perf collects the statistics of just that thread.

BTW, the patch fixes a bug in 'perf stat -p'. 'perf stat' always configures
attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
and no forks, 'perf stat -p' doesn't collect any data. In addition, the
while(!done) loop in run_perf_stat consumes 100% of a single cpu, which has a
bad impact on the running workload. I added a sleep(1) in the loop.

Signed-off-by: Zhang Yanmin 

---
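
To make the process-wide --pid semantics described above concrete, here is a minimal userspace sketch of how the thread ids of a process can be enumerated on Linux by listing /proc/<pid>/task. It is an illustration only and not the code from this patch; the function name find_all_tids and its interface are assumptions made for the example.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Collect the thread ids of process `pid` by listing /proc/<pid>/task.
 * On success, returns the number of threads and stores a malloc()ed
 * array in *tids (the caller frees it). Returns -1 on error.
 * find_all_tids is a hypothetical name used only in this sketch.
 */
static int find_all_tids(pid_t pid, pid_t **tids)
{
        char path[64];
        struct dirent *ent;
        pid_t *list = NULL;
        int count = 0, cap = 0;
        DIR *dir;

        assert(tids != NULL);
        snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
        dir = opendir(path);
        if (!dir)
                return -1;

        while ((ent = readdir(dir)) != NULL) {
                char *end;
                long tid = strtol(ent->d_name, &end, 10);

                /* Skip ".", ".." and any entry that is not a plain number. */
                if (end == ent->d_name || *end != '\0')
                        continue;
                if (count == cap) {
                        int new_cap = cap ? cap * 2 : 8;
                        pid_t *tmp = realloc(list, new_cap * sizeof(*list));

                        if (!tmp) {
                                free(list);
                                closedir(dir);
                                return -1;
                        }
                        list = tmp;
                        cap = new_cap;
                }
                list[count++] = (pid_t)tid;
        }
        closedir(dir);
        *tids = list;
        return count;
}
```

For example, find_all_tids(getpid(), &tids) returns at least 1, and the list includes the main thread, whose tid equals the pid. A tool following the --pid semantics would then open one counter per returned tid, while --tid would use a single id directly.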

diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c 2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c 2010-03-17 16:30:17.71706 +0800
@@ -27,7 +27,7 @@
 #include 
 #include 
 
-static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int *fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static long default_interval =  0;
 
@@ -43,6 +43,9 @@ static int raw_samples =  0;
 static int system_wide =  0;
 static int profile_cpu = -1;
 static pid_t   target_pid  = -1;
+static pid_t   target_tid  = -1;
+static int *all_tids   =  NULL;
+static int thread_num  =  0;
 static pid_t   child_pid   = -1;
 static int inherit =  1;
 static int force   =  0;
@@ -60,7 +63,7 @@ static struct timeval this_read;
 
 static u64 bytes_written   =  0;
 
-static struct pollfd   event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct pollfd   *event_array;
 
 static int nr_poll =  0;
 static int n

Re: >2 serial ports?

2010-03-17 Thread Neo Jia

May I ask if it is possible to bind a real physical serial port to a guest?

Thanks,
Neo

On Wed, Mar 17, 2010 at 1:38 AM, Michael Tokarev  wrote:
> Since 0.12, it appears that kvm does not allow more than
> 2 serial ports for a guest:
>
> $ kvm \
>  -serial unix:s1,server,nowait \
>  -serial unix:s2,server,nowait \
>  -serial unix:s3,server,nowait
> isa irq 4 already assigned
>
> Is there a work-around for this?
>
> Thanks!
>
> /mjt



-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!

Re: >2 serial ports?

2010-03-17 Thread Gerd Hoffmann


On 03/17/10 09:38, Michael Tokarev wrote:
> Since 0.12, it appears that kvm does not allow more than
> 2 serial ports for a guest:
>
> $ kvm \
>   -serial unix:s1,server,nowait \
>   -serial unix:s2,server,nowait \
>   -serial unix:s3,server,nowait
> isa irq 4 already assigned
>
> Is there a work-around for this?

Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
A bit too strict.

/me goes to dig out an old patch which never made it upstream for some
reason I forgot.  Attached.


HTH,
  Gerd

From 7d5d53e8a23544ac6413487a8ecdd43537ade9f3 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Fri, 11 Sep 2009 13:43:46 +0200
Subject: [PATCH] isa: refine irq reservations

There are a few cases where IRQ sharing on the ISA bus is used and
possible.  In general only devices of the same kind can do that.
A few use cases:

  * serial lines 1+3 share irq 4
  * serial lines 2+4 share irq 3
  * parallel ports share irq 7
  * ppc/prep: ide ports share irq 13

This patch refines the irq reservation mechanism for the isa bus to
handle those cases.  It keeps track of the driver which owns the IRQ in
question and allows irq sharing for devices handled by the same driver.

Signed-off-by: Gerd Hoffmann 
---
 hw/isa-bus.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 4d489d2..bd2f69c 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -26,6 +26,7 @@ struct ISABus {
     BusState qbus;
     qemu_irq *irqs;
     uint32_t assigned;
+    DeviceInfo *irq_owner[16];
 };
 static ISABus *isabus;
 
@@ -71,7 +72,9 @@ qemu_irq isa_reserve_irq(int isairq)
         exit(1);
     }
     if (isabus->assigned & (1 << isairq)) {
-        fprintf(stderr, "isa irq %d already assigned\n", isairq);
+        DeviceInfo *owner = isabus->irq_owner[isairq];
+        fprintf(stderr, "isa irq %d already assigned (%s)\n",
+                isairq, owner ? owner->name : "unknown");
         exit(1);
     }
     isabus->assigned |= (1 << isairq);
@@ -82,10 +85,17 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 {
     assert(dev->nirqs < ARRAY_SIZE(dev->isairq));
     if (isabus->assigned & (1 << isairq)) {
-        fprintf(stderr, "isa irq %d already assigned\n", isairq);
-        exit(1);
+        DeviceInfo *owner = isabus->irq_owner[isairq];
+        if (owner == dev->qdev.info) {
+            /* irq sharing is ok in case the same driver handles both */;
+        } else {
+            fprintf(stderr, "isa irq %d already assigned (%s)\n",
+                    isairq, owner ? owner->name : "unknown");
+            exit(1);
+        }
     }
     isabus->assigned |= (1 << isairq);
+    isabus->irq_owner[isairq] = dev->qdev.info;
     dev->isairq[dev->nirqs] = isairq;
     *p = isabus->irqs[isairq];
     dev->nirqs++;
-- 
1.6.6.1
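
The sharing rule in the patch above can also be seen in isolation with a small standalone model. This is a sketch under simplifying assumptions, not QEMU code: struct irq_table and irq_claim are made-up names, the owner is tracked by driver name and compared with strcmp() rather than by DeviceInfo pointer, and a conflict returns -1 instead of calling exit(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ISA_NUM_IRQS 16

/* Simplified stand-in for the bus state; this is not QEMU's ISABus. */
struct irq_table {
        uint32_t assigned;               /* bit n set: IRQ n is taken */
        const char *owner[ISA_NUM_IRQS]; /* name of the driver owning IRQ n */
};

/*
 * Claim an IRQ for a driver. Returns 0 on success, or -1 if the IRQ
 * number is out of range or the IRQ is owned by a different driver.
 */
static int irq_claim(struct irq_table *t, int irq, const char *driver)
{
        assert(t != NULL && driver != NULL);
        if (irq < 0 || irq >= ISA_NUM_IRQS)
                return -1;
        if (t->assigned & (UINT32_C(1) << irq)) {
                /* Sharing is allowed only for devices of the same driver. */
                if (t->owner[irq] == NULL ||
                    strcmp(t->owner[irq], driver) != 0)
                        return -1;
                return 0;
        }
        t->assigned |= UINT32_C(1) << irq;
        t->owner[irq] = driver;
        return 0;
}
```

With a zero-initialized table, two claims of IRQ 4 by "isa-serial" both succeed, which corresponds to serial lines 1+3 sharing irq 4, while a later claim of IRQ 4 by a different driver name is rejected.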

Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-17 Thread Avi Kivity


On 03/17/2010 10:49 AM, Christoph Hellwig wrote:
> On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote:
> > If the batch size is larger than the virtio queue size, or if there are
> > no flushes at all, then yes the huge write cache gives more opportunity
> > for reordering.  But we're already talking hundreds of requests here.
>
> Yes.  And remember those don't have to come from the same host.  Also
> remember that we rather limit excessive reordering of O_DIRECT requests
> in the I/O scheduler because they are "synchronous" type I/O while
> we don't do that for pagecache writeback.

Maybe we should relax that for kvm.  Perhaps some of the problem comes
from the fact that we call io_submit() once per request.

> And we don't have unlimited virtio queue size, in fact it's quite
> limited.

That can be extended easily if it fixes the problem.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-17 Thread Ingo Molnar


* Avi Kivity  wrote:

> On 03/17/2010 10:16 AM, Ingo Molnar wrote:
> >* Avi Kivity  wrote:
> >
> >> Monitoring guests from the host is useful for kvm developers, but less so
> >> for users.
> >
> > Guest space profiling is easy, and 'perf kvm' is not about that. (plain 
> > 'perf' will work if a proper paravirt channel is opened to the host)
> >
> > I think you might have misunderstood the purpose and role of the 'perf 
> > kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who 
> > improve KVM code, not guest kernel users.
> 
> Of course I understood it.  My point was that 'perf kvm' serves a tiny 
> minority of users. [...]

I hope you won't be disappointed to learn that 100% of Linux, all 13+ million 
lines of it, was and is being developed by a tiny, tiny, tiny minority of 
users ;-)

> [...]  That doesn't mean it isn't useful, just that it doesn't satisfy all 
> needs by itself.

Of course - and it doesn't bring world peace either. One step at a time.

Thanks,

Ingo
