[PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
supported processors.

Signed-off-by: Sheng Yang
---
Please apply this to 2.6.32 stable. Thanks!

Patch already in the upstream, commit:
046d87103addc117f0d397196e85189722d4d7de

 arch/x86/kvm/vmx.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 80367c5..1092e8a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2316,8 +2316,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 		if (vmx->vpid == 0)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
-		if (!enable_ept)
+		if (!enable_ept) {
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+			enable_unrestricted_guest = 0;
+		}
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
--
1.5.4.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
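The masking logic in the hunk can be modelled outside the kernel. The following is a minimal sketch, assuming illustrative bit values and a hypothetical helper name (secondary_exec_control); it is not the kernel code, only a model of why clearing the EPT bit must also clear the unrestricted-guest bit.

```python
# Bit values are illustrative assumptions for this sketch, not copied from
# the kernel's vmx.h.
SECONDARY_EXEC_ENABLE_EPT = 0x02
SECONDARY_EXEC_UNRESTRICTED_GUEST = 0x80


def secondary_exec_control(supported, enable_ept, enable_unrestricted_guest):
    """Return the secondary execution controls after parameter masking.

    Hypothetical helper: it mirrors the order of the checks in the patched
    vmx_vcpu_setup() hunk, nothing more.
    """
    exec_control = supported
    if not enable_ept:
        exec_control &= ~SECONDARY_EXEC_ENABLE_EPT
        # The fix: unrestricted guest must not stay enabled once EPT is off.
        enable_unrestricted_guest = False
    if not enable_unrestricted_guest:
        exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST
    return exec_control
```

With ept=0 on hardware that supports both features, the model returns a control word with neither bit set, which is the state the patch is meant to guarantee before the value is written to the VMCS.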
Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
On Thursday 18 March 2010 13:51:41 Alexander Graf wrote:
> On 18.03.2010, at 02:50, Sheng Yang wrote:
>> On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
>>> Marcelo Tosatti wrote:
>>>> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
>>>>> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
>>>>> guest supported processors.
>>>>>
>>>>> Signed-off-by: Sheng Yang
>>>>
>>>> Applied, thanks.
>>>
>>> So without this patch kvm breaks with ept=0? Sounds like a stable
>>> candidate to me.
>>
>> Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had
>> already fixed this issue. So it should be fine.
>
> Are you sure? I don't see the patch in 2.6.32-stable git.

Yes, you are right. It is not in 2.6.32-stable... I will post a patch for
stable.

Thanks

--
regards
Yang, Sheng
Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
On 18.03.2010, at 02:50, Sheng Yang wrote:

> On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
>> Marcelo Tosatti wrote:
>>> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
>>>> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
>>>> guest supported processors.
>>>>
>>>> Signed-off-by: Sheng Yang
>>>
>>> Applied, thanks.
>>
>> So without this patch kvm breaks with ept=0? Sounds like a stable
>> candidate to me.
>
> Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had
> already fixed this issue. So it should be fine.

Are you sure? I don't see the patch in 2.6.32-stable git.

Alex
Re: [PATCH 23/42] KVM: Activate Virtualization On Demand
Dieter Ries wrote:
> On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote:
[]
>> Are you 100% sure you don't have vmware, virtualbox, parallels, whatever
>> running in parallel on that machine?
>
> Definitely. I have virtualbox installed, but haven't used it in months.
> The others I don't use at all, so they are not installed either.

Dieter, we talked with you on IRC yesterday... Can you take a look at
what's in the startup script sequence on your machine, and which loaded
modules may be related?

What I'm trying to say: I don't know how virtualbox works, but it may come
with a kernel module or a bootup script that touches SVM settings.

/mjt
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Thursday 18 March 2010 13:22:28 Sheng Yang wrote:
> On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
>> On 03/17/2010 03:19 PM, Sheng Yang wrote:
>>> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>>>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>>>> Right, but there is a scope between kvm_guest_enter and really
>>>>>>>> running in guest os, where a perf event might overflow. Anyway,
>>>>>>>> the scope is very narrow, I will change it to use flag PF_VCPU.
>>>>>>>
>>>>>>> There is also a window between setting the flag and calling 'int
>>>>>>> $2' where an NMI might happen and be accounted incorrectly.
>>>>>>>
>>>>>>> Perhaps separate the 'int $2' into a direct call into perf and
>>>>>>> another call for the rest of NMI handling. I don't see how it
>>>>>>> would work on svm though - AFAICT the NMI is held whereas vmx
>>>>>>> swallows it.
>>>>>>>
>>>>>>> I guess NMIs will be disabled until the next IRET so it isn't
>>>>>>> racy, just tricky.
>>>>>>
>>>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>>>> context isn't reentrant till a IRET. YangSheng would like to double
>>>>>> check it.
>>>>>
>>>>> After more check, I think VMX won't remained NMI block state for
>>>>> host. That's means, if NMI happened and processor is in VMX non-root
>>>>> mode, it would only result in VMExit, with a reason indicate that
>>>>> it's due to NMI happened, but no more state change in the host.
>>>>>
>>>>> So in that meaning, there _is_ a window between VMExit and KVM handle
>>>>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
>>>>> handling code because "int $2" don't have effect to block following
>>>>> NMI.
>>>>>
>>>>> And if the NMI sequence is not important(I think so), then we need to
>>>>> generate a real NMI in current vmexit-after code. Seems let APIC send
>>>>> a NMI IPI to itself is a good idea.
>>>>>
>>>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>>>> replace "int $2". Something unexpected is happening...
>>>>
>>>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
>>>> supposed to be able to.
>>>
>>> Um? Why?
>>>
>>> Especially kernel is already using it to deliver NMI.
>>
>> That's the only defined case, and it is defined because the vector field
>> is ignored for DM_NMI. Vol 3A (exact section numbers may vary depending
>> on your version).
>>
>> 8.5.1 / 8.6.1
>>
>> '100 (NMI) Delivers an NMI interrupt to the target processor or
>> processors. The vector information is ignored'
>>
>> 8.5.2 Valid Interrupt Vectors
>>
>> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
>> 255) as valid interrupts.'
>>
>> 8.8.4 Interrupt Acceptance for Fixed Interrupts
>>
>> '...; vectors 0 through 15 are reserved by the APIC (see also: Section
>> 8.5.2, "Valid Interrupt Vectors")'
>>
>> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
>> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.
>
> As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI,
> it would need a specific delivery mode rather than vector number.
>
> And if you look at code, if we specify NMI_VECTOR, the delivery mode would
> be set to NMI.
>
> So what's wrong here?

OK, I think I understand your points now. You meant that these vectors can't
be filled in vector field directly, right? But NMI is an exception due to
DM_NMI. Is that your point? I think we agree on this.

--
regards
Yang, Sheng
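The point both sides converge on can be written down as a small model. This is a sketch only, assuming hypothetical helper names (delivery_mode_for, ipi_is_valid); the 100b encoding and the 16-255 range come from the manual excerpts quoted in the thread, and the NMI_VECTOR special case follows Sheng's description of the kernel code.

```python
NMI_VECTOR = 2     # vector number passed to send_IPI_self() in the thread
DM_FIXED = 0b000   # fixed delivery mode
DM_NMI = 0b100     # "100 (NMI)": the vector information is ignored


def delivery_mode_for(vector):
    """Model of the special case: NMI_VECTOR selects NMI delivery mode."""
    return DM_NMI if vector == NMI_VECTOR else DM_FIXED


def ipi_is_valid(delivery_mode, vector):
    """Check an IPI request against the rules quoted from Vol 3A."""
    if delivery_mode == DM_NMI:
        return True                    # vector field is ignored
    if delivery_mode == DM_FIXED:
        return 16 <= vector <= 255     # vectors 0-15 are reserved
    raise ValueError("delivery mode not covered by this sketch")
```

Under this model, sending vector 2 with fixed delivery is rejected, while the same vector routed through the NMI delivery mode is accepted, which is why apic->send_IPI_self(NMI_VECTOR) is a defined case.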
RE: [PATCH] Enhance perf to collect KVM guest os statistics from host side
Hi Avi, Ingo,

I've been following this long thread since the very first email.

I'm a performance engineer whose job is to tune workloads run on top of KVM
(and Xen previously). As a performance engineer, I desperately want to have
a tool that can monitor the host and guests at the same time. Think about
>100 guests mixed with Linux/Windows running together on a single system:
being able to know what's happening is critical for performance analysis.
Actually I am the person who asked Yanmin to add the feature for CPU
utilization breakdown (into host_usr, host_krn, guest_usr, guest_krn) so that
I can monitor dozens of running guests. I haven't made this patch work on my
system yet but I _do_ think this patch is a very good start.

And finally, monitoring guests from the host is useful for users too
(administrators and performance guys like me). I really appreciate you guys'
work and would love to provide feedback from my point of view if needed.

Regards,

HUANG, Zhiteng

Intel SSG/SSD/SPA/PRC Scalability Lab

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Avi Kivity
Sent: Wednesday, March 17, 2010 11:55 AM
To: Frank Ch. Eigler
Cc: Anthony Liguori; Ingo Molnar; Zhang, Yanmin; Peter Zijlstra; Sheng Yang; linux-ker...@vger.kernel.org; kvm@vger.kernel.org; Marcelo Tosatti; Joerg Roedel; Jes Sorensen; Gleb Natapov; Zachary Amsden; ziteng.hu...@intel.com
Subject: Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote:
> Hi -
>
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
>
>> [...]
>> The only way to really address this is to change the interaction.
>> Instead of running perf externally to qemu, we should support a perf
>> command in the qemu monitor that can then tie directly to the perf
>> tooling. That gives us the best possible user experience.
>>
> To what extent could this be solved with less crossing of
> isolation/abstraction layers, if the perfctr facilities were properly
> virtualized?
>

That's the more interesting (by far) usage model. In general guest owners
don't have access to the host, and host owners can't (and shouldn't) change
guests.

Monitoring guests from the host is useful for kvm developers, but less so
for users.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
> On 03/17/2010 03:19 PM, Sheng Yang wrote:
>> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>>> Right, but there is a scope between kvm_guest_enter and really
>>>>>>> running in guest os, where a perf event might overflow. Anyway, the
>>>>>>> scope is very narrow, I will change it to use flag PF_VCPU.
>>>>>>
>>>>>> There is also a window between setting the flag and calling 'int $2'
>>>>>> where an NMI might happen and be accounted incorrectly.
>>>>>>
>>>>>> Perhaps separate the 'int $2' into a direct call into perf and
>>>>>> another call for the rest of NMI handling. I don't see how it would
>>>>>> work on svm though - AFAICT the NMI is held whereas vmx swallows it.
>>>>>>
>>>>>> I guess NMIs will be disabled until the next IRET so it isn't racy,
>>>>>> just tricky.
>>>>>
>>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>>> context isn't reentrant till a IRET. YangSheng would like to double
>>>>> check it.
>>>>
>>>> After more check, I think VMX won't remained NMI block state for host.
>>>> That's means, if NMI happened and processor is in VMX non-root mode, it
>>>> would only result in VMExit, with a reason indicate that it's due to
>>>> NMI happened, but no more state change in the host.
>>>>
>>>> So in that meaning, there _is_ a window between VMExit and KVM handle
>>>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
>>>> handling code because "int $2" don't have effect to block following
>>>> NMI.
>>>>
>>>> And if the NMI sequence is not important(I think so), then we need to
>>>> generate a real NMI in current vmexit-after code. Seems let APIC send a
>>>> NMI IPI to itself is a good idea.
>>>>
>>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>>> replace "int $2". Something unexpected is happening...
>>>
>>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
>>> supposed to be able to.
>>
>> Um? Why?
>>
>> Especially kernel is already using it to deliver NMI.
>
> That's the only defined case, and it is defined because the vector field
> is ignored for DM_NMI. Vol 3A (exact section numbers may vary depending
> on your version).
>
> 8.5.1 / 8.6.1
>
> '100 (NMI) Delivers an NMI interrupt to the target processor or
> processors. The vector information is ignored'
>
> 8.5.2 Valid Interrupt Vectors
>
> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
> 255) as valid interrupts.'
>
> 8.8.4 Interrupt Acceptance for Fixed Interrupts
>
> '...; vectors 0 through 15 are reserved by the APIC (see also: Section
> 8.5.2, "Valid Interrupt Vectors")'
>
> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.

As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI, it
would need a specific delivery mode rather than vector number.

And if you look at code, if we specify NMI_VECTOR, the delivery mode would be
set to NMI.

So what's wrong here?

--
regards
Yang, Sheng
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On 03/17/2010 03:19 PM, Sheng Yang wrote:
> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>> Right, but there is a scope between kvm_guest_enter and really
>>>>>> running in guest os, where a perf event might overflow. Anyway, the
>>>>>> scope is very narrow, I will change it to use flag PF_VCPU.
>>>>>
>>>>> There is also a window between setting the flag and calling 'int $2'
>>>>> where an NMI might happen and be accounted incorrectly.
>>>>>
>>>>> Perhaps separate the 'int $2' into a direct call into perf and
>>>>> another call for the rest of NMI handling. I don't see how it would
>>>>> work on svm though - AFAICT the NMI is held whereas vmx swallows it.
>>>>>
>>>>> I guess NMIs will be disabled until the next IRET so it isn't racy,
>>>>> just tricky.
>>>>
>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>> context isn't reentrant till a IRET. YangSheng would like to double
>>>> check it.
>>>
>>> After more check, I think VMX won't remained NMI block state for host.
>>> That's means, if NMI happened and processor is in VMX non-root mode, it
>>> would only result in VMExit, with a reason indicate that it's due to
>>> NMI happened, but no more state change in the host.
>>>
>>> So in that meaning, there _is_ a window between VMExit and KVM handle
>>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
>>> handling code because "int $2" don't have effect to block following
>>> NMI.
>>>
>>> And if the NMI sequence is not important(I think so), then we need to
>>> generate a real NMI in current vmexit-after code. Seems let APIC send a
>>> NMI IPI to itself is a good idea.
>>>
>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>> replace "int $2". Something unexpected is happening...
>>
>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
>> supposed to be able to.
>
> Um? Why?
>
> Especially kernel is already using it to deliver NMI.

That's the only defined case, and it is defined because the vector field
is ignored for DM_NMI. Vol 3A (exact section numbers may vary depending
on your version).

8.5.1 / 8.6.1

'100 (NMI) Delivers an NMI interrupt to the target processor or
processors. The vector information is ignored'

8.5.2 Valid Interrupt Vectors

'Local and I/O APICs support 240 of these vectors (in the range of 16 to
255) as valid interrupts.'

8.8.4 Interrupt Acceptance for Fixed Interrupts

'...; vectors 0 through 15 are reserved by the APIC (see also: Section
8.5.2, "Valid Interrupt Vectors")'

So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.

Zach
virtio_blk_load() question
Hi,

I have a question regarding virtio_blk_load().
(qemu-kvm.git d1fa468c1cc03ea362d8fe3ed9269bab4d197510)

The VirtIOBlockReq structure is a linked list of requests, but it doesn't
seem to be properly linked in virtio_blk_load().

...
req->next = s->rq;
s->rq = req->next;
...

In this case, we're losing req, and s->rq always points to the same entry.
If I'm understanding correctly, s->rq is NULL initially, and it would stay
NULL.

Although I'm not sure how these requests should be ordered, if the requests
should be added to the head of the list to restore the status saved by
virtio_blk_save(), I think the following code is correct. However, it seems
to reverse the order of the requests, and I'm wondering whether that is
appropriate.

Would somebody tell me how virtio_blk_load() is working?

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index b80402d..267b16f 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -457,7 +457,7 @@ static int virtio_blk_load(QEMUFile *f, void *opaque, int version_id)
         VirtIOBlockReq *req = virtio_blk_alloc_request(s);
         qemu_get_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem));
         req->next = s->rq;
-        s->rq = req->next;
+        s->rq = req;
     }

     return 0;
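Both behaviours described in the question can be reproduced with a toy list. This is a minimal sketch, assuming a hypothetical Req class standing in for VirtIOBlockReq and plain strings standing in for the saved request elements; it models only the two pointer assignments, not the QEMU load path.

```python
class Req:
    """Hypothetical stand-in for VirtIOBlockReq: a tag plus a next pointer."""

    def __init__(self, tag):
        self.tag = tag
        self.next = None


def load_buggy(tags):
    """Models 's->rq = req->next': the head never advances."""
    rq = None
    for tag in tags:
        req = Req(tag)
        req.next = rq
        rq = req.next      # rq is reassigned to its old value; req is lost
    return rq


def load_fixed(tags):
    """Models 's->rq = req': each request becomes the new head."""
    rq = None
    for tag in tags:
        req = Req(tag)
        req.next = rq
        rq = req
    return rq


def to_list(rq):
    """Walk the list from the head and collect the tags."""
    tags = []
    while rq is not None:
        tags.append(rq.tag)
        rq = rq.next
    return tags
```

Loading "a", "b", "c" with the buggy variant leaves an empty list, while the fixed variant yields "c", "b", "a": every request is kept, but in the reverse of the order in which it was read, which is the ordering concern raised in the message.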
KVM autotest patch queue report 03-18-2010
Once again I'll try to resume the patch queue report for kvm autotest, since
it's a good way to keep folks posted about the status of their patches. I
will figure out some script to extract most of the information from
patchwork, which will allow me to do this with less effort.

Summary
===
Total patches: 8
Reviewed patches: 6
Reviews unfinished: 2

Autotest patchwork
http://patchwork.test.kernel.org/project/autotest/list/

[KVM-AUTOTEST] fix tap interface for parallel execution
2010-03-10 yogi lmr Under Review

Michael already explained that in order to enable parallel mode, the pools
have to be modified, not the address_index. So this will be superseded in
favor of a patch that Michael will create.

KVM-Test: Add kvm userspace unit test
2010-03-05 sshang lmr Under Review

This patch does not depend on guest OSes or on different qemu command line
scenarios, so we could add it to build.cfg instead of tests_base.cfg.
Discussed this idea and pointed out some better messaging and API usage.

[2/2] KVM test: Add cpu_set subtest
2010-02-25 Lucas Meneghel Rodrigues lmr Under Review

This patch will stay on the queue until the feature tested gets in better
shape on KVM upstream.

KVM test: Add support for ipv6 addresses
2010-02-24 Lucas Meneghel Rodrigues lmr Under Review

This test was reviewed and the decision is that it will stay on the queue
until we have more extensive guest network testing.

KVM test: Memory ballooning test for KVM guest
2010-02-11 pradeep lmr Under Review

Made comments to the patch originator, waiting for a revised version.

KVM-test: Add a subtest 'qemu_img'
2010-01-29 Yolkfull Chow lmr Under Review

Made comments to the patch originator, waiting for a revised version.

[2/2] KVM test: subtest migration: Add rem_host and rem_port for migrate()
2009-12-08 Yolkfull Chow lmr Under Review

[1/2,-,V3] Add a server-side test - kvm_migration
2009-12-08 Yolkfull Chow lmr Under Review

This patchset still needs full review, but remote migration is something
that we want to take slowly, so we can have a first version integrated
upstream with a good round of testing. Right now I am not sure if the
approach in this patchset is the right way of approaching the problem.
Re: [Autotest] [PATCH] KVM-Test: Add kvm userspace unit test
Hi Shuxi, sorry that it took so long before I could get back to you on
this one. The general idea is just fine, but there is one gotcha that
will need more thought:

This depends on having the KVM source code for testing (ie, it depends on
the build test *and* the build mode has to involve source code, such as
git builds; things like koji install will also not work). By default we
are not making the tests depend directly on build, so we have to figure
out a way to have this integrated without breaking things for users who
are not interested in running the build test.

Today I was reviewing the qemu-img functional test, and it occurred to me
that all those tests that do not depend on guests and different qemu
command line options could be made dependent on the build test. This way
we'd have the separation that we need, while not breaking anything for
users that do not care about build and other types of test.

Michael, what do you think? Should we put the config of tests like this
one and qemu_img on build.cfg, making them depend on build?

Oh Shuxi, on the code below I have some small comments to make:

On Fri, Mar 5, 2010 at 3:22 AM, sshang wrote:
> The test use kvm test harness kvmctl load binary test case file to test
> various function of kvm kernel module.
>
> Signed-off-by: sshang
> ---
>  client/tests/kvm/tests/unit_test.py    | 29 +
>  client/tests/kvm/tests_base.cfg.sample | 7 +++
>  2 files changed, 36 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/unit_test.py
>
> diff --git a/client/tests/kvm/tests/unit_test.py
> b/client/tests/kvm/tests/unit_test.py
> new file mode 100644
> index 000..9bc7441
> --- /dev/null
> +++ b/client/tests/kvm/tests/unit_test.py
> @@ -0,0 +1,29 @@
> +import os
> +from autotest_lib.client.bin import utils
> +from autotest_lib.client.common_lib import error
> +
> +def run_unit_test(test, params, env):
> +    """
> +    This is kvm userspace unit test, use kvm test harness kvmctl load binary
> +    test case file to test various function of kvm kernel module.
> +    The output of all unit test can be found in the test result dir.
> +    """
> +
> +    case_list = params.get("case_list","access apic emulator hypercall irq"\
> +              " port80 realmode sieve smptest tsc stringio vmexit").split()
> +    srcdir = params.get("srcdir",test.srcdir)
> +    user_dir = os.path.join(srcdir,"kvm_userspace/kvm/user")
> +    os.chdir(user_dir)
> +    test_fail_list = []
> +
> +    for i in case_list:
> +        result_file = test.outputdir + "/" + i
> +        testfile = i + ".flat"
> +        results = utils.system("./kvmctl test/x86/bootstrap test/x86/" + \
> +                  testfile + " > " + result_file,ignore_status=True)

About the above statement: In general you should not use shell
redirection to write the output of your program to the log files. Please
take advantage of the fact that utils.run allows you to connect the stdout
and stderr pipes to the result file. Also, utils.run returns a CmdResult
object, which has a number of useful properties.

> +        if results != 0:
> +            test_fail_list.append(i)
> +
> +    if test_fail_list:
> +        raise error.TestFail("< " + " ".join(test_fail_list) + \
> +                             " >")

In the above, you could just have used

    raise error.TestFail("KVM module unit test failed. Test cases failed: %s" % test_fail_list)

IMHO it's easier to understand.

> diff --git a/client/tests/kvm/tests_base.cfg.sample
> b/client/tests/kvm/tests_base.cfg.sample
> index 040d0c3..0918c26 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -300,6 +300,13 @@ variants:
>          shutdown_method = shell
>          kill_vm = yes
>          kill_vm_gracefully = no
> +
> +    - unit_test:
> +        type = unit_test
> +        case_list = access apic emulator hypercall msr port80 realmode sieve
> smptest tsc stringio vmexit
> +        #srcdir should be same as build.cfg
> +        srcdir =
> +        vms = ''
>      # Do not define test variants below shutdown
>
>
> --
> 1.5.5.6
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>

--
Lucas
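The review comment about avoiding shell redirection can be illustrated with the standard library. This is a sketch under stated assumptions: it uses Python's subprocess module as a stand-in, not autotest's utils.run (whose exact signature is not shown in this thread), and run_case and run_cases are hypothetical helper names.

```python
import os
import subprocess
import tempfile


def run_case(cmd, result_path):
    """Run cmd (an argv list) with stdout and stderr sent to result_path.

    No shell is involved, so nothing in cmd is interpreted as redirection.
    Returns the exit status of the command.
    """
    with open(result_path, "wb") as out:
        completed = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return completed.returncode


def run_cases(cases, outputdir=None):
    """Run (name, argv) pairs; return (names that failed, output directory)."""
    if outputdir is None:
        outputdir = tempfile.mkdtemp(prefix="unit_test_")
    failed = [name for name, cmd in cases
              if run_case(cmd, os.path.join(outputdir, name)) != 0]
    return failed, outputdir
```

A caller would then raise a single failure listing the returned names, along the lines of the TestFail message suggested in the review, while each case's output sits in its own file under the output directory.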
KVM Shared memory ivshmem enquiry
Cam and others,

I have been trying to enable shared memory in KVM but I am not clear on the
correct procedures and requirements. I am new to KVM, kernel building and
git, so I am on a very steep learning curve.

I have an application that requires shared memory between host and guest.
I have been using VMware Workstation 6.0.5, but all later versions do not
support shared memory, and WS 6 is no longer available.

I think I have managed to build and install the guest's kvm_ivshmem module,
from http://www.gitorious.org/nahanni/

I used

cd kernem_modules;make;sudo make install;sudo modprobe kvm_ivshmem

Everything seems to have worked.

On the host side I am very confused as to what is required. I have created
a git repository using

git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

I have been able to patch, build and install, but the result does NOT run.
I have checked out qemu-kvm-0.11.0 and built and installed it, but then I
get a version mismatch (this was unpatched, as the patch does not work on
this version).

The host is Ubuntu 9.10 64 bit, with Ubuntu's KVM installed.

Can I simply somehow build and install the ivshmem module, or do I need to
rebuild the kernel, e.g. get kvm.git and build and install a new kernel?
Is there another KVM binary that I can use, instead of Ubuntu's?
Is the ivshmem patch likely to be accepted anytime soon?

Thanks
Nev
Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()
On Wed, Mar 17, 2010 at 03:49:19PM +0800, Xiao Guangrong wrote:
> Using bitmap_empty() to see whether memslot->dirty_bitmap is empty
>
> Changelog:
> cleanup x86 specific kvm_vm_ioctl_get_dirty_log() and fix a local
> parameter's type address Takuya Yoshikawa's suggestion
>
> Signed-off-by: Xiao Guangrong
> ---
>  arch/x86/kvm/x86.c  |   17 -
>  virt/kvm/kvm_main.c |    7 ++-
>  2 files changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bcf52d1..e6cbbd4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2644,22 +2644,17 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  				      struct kvm_dirty_log *log)
>  {
> -	int r, n, i;
> +	int r, n, is_dirty = 0;
>  	struct kvm_memory_slot *memslot;
> -	unsigned long is_dirty = 0;
>  	unsigned long *dirty_bitmap = NULL;
>
>  	mutex_lock(&kvm->slots_lock);
>
> -	r = -EINVAL;
> -	if (log->slot >= KVM_MEMORY_SLOTS)
> +	r = kvm_get_dirty_log(kvm, log, &is_dirty);
> +	if (r)
>  		goto out;
>
>  	memslot = &kvm->memslots->memslots[log->slot];
> -	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> -		goto out;
> -

It's different because the user copy must be done after the SRCU
assignment.

> index bcd08b8..b08a7de 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -767,9 +767,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  			struct kvm_dirty_log *log, int *is_dirty)
>  {
>  	struct kvm_memory_slot *memslot;
> -	int r, i;
> -	int n;
> -	unsigned long any = 0;
> +	int r, n, any = 0;
>
>  	r = -EINVAL;
>  	if (log->slot >= KVM_MEMORY_SLOTS)
> @@ -782,8 +780,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>
>  	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
>
> -	for (i = 0; !any && i < n/sizeof(long); ++i)
> -		any = memslot->dirty_bitmap[i];
> +	any = !bitmap_empty(memslot->dirty_bitmap, memslot->npages);

The open-coded version should be faster than __bitmap_empty because the
dirty bitmaps are always unsigned long aligned (and there is also a
function call).
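The two ways of asking "is any page dirty?" can be compared on a toy bitmap. This is only a model, assuming a bitmap stored as a list of 64-bit words and hypothetical function names; it says nothing about speed and only shows when the word-wise scan and a bit-exact check give the same answer.

```python
BITS_PER_LONG = 64  # assumption for this sketch: 64-bit longs


def words_for(npages):
    """Number of whole words covering npages bits (rounds up)."""
    return (npages + BITS_PER_LONG - 1) // BITS_PER_LONG


def any_dirty_wordwise(bitmap, npages):
    """Model of the open-coded loop: test whole words, stop at the first hit."""
    return any(word != 0 for word in bitmap[:words_for(npages)])


def any_dirty_bitwise(bitmap, npages):
    """Model of a bit-exact emptiness check over the first npages bits."""
    for bit in range(npages):
        if (bitmap[bit // BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1:
            return True
    return False
```

The two functions agree whenever the padding bits past npages in the last word are zero, which is what makes scanning whole words sufficient for a bitmap that is always allocated in unsigned long units.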
Re: [PATCH v2] KVM MMU: check reserved bits only when CR4.PSE=1 or CR4.PAE=1
On Wed, Mar 17, 2010 at 11:43:06AM +0800, Xiao Guangrong wrote:
> - The RSV bit is possibility set in error code when #PF occurred
>   only if CR4.PSE=1 or CR4.PAE=1
>
> - context->rsvd_bits_mask[1][0] is always 0
>
> Changelog:
> Move this operation to reset_rsvds_bits_mask() address Avi Kivity's suggestion
>
> Signed-off-by: Xiao Guangrong
> ---
>  arch/x86/kvm/mmu.c |   12 +---
>  1 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b137515..c49f8ec 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2288,18 +2288,26 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
>
>  	if (!is_nx(vcpu))
>  		exb_bit_rsvd = rsvd_bits(63, 63);
> +
> +	context->rsvd_bits_mask[1][0] = 0;

So if the guest enables PAT at PTE level you completely disable reserved
bit checking? You should only disable checking for [1][1] if !PSE.
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote: > On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote: > > * Zhang, Yanmin wrote: > > > > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote: > > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > > > > > From: Zhang, Yanmin > > > > > > > > > > > > Based on the discussion in KVM community, I worked out the patch to > > > > > > support > > > > > > perf to collect guest os statistics from host side. This patch is > > > > > > implemented > > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed > > > > > > out a > > > > > > critical bug and provided good suggestions with other guys. I > > > > > > really appreciate > > > > > > their kind help. > > > > > > > > > > > > The patch adds new subcommand kvm to perf. > > > > > > > > > > > >perf kvm top > > > > > >perf kvm record > > > > > >perf kvm report > > > > > >perf kvm diff > > > > > > > > > > > > The new perf could profile guest os kernel except guest os user > > > > > > space, but it > > > > > > could summarize guest os user space utilization per guest os. > > > > > > > > > > > > Below are some examples. > > > > > > 1) perf kvm top > > > > > > [r...@lkp-ne01 norm]# perf kvm --host --guest > > > > > > --guestkallsyms=/home/ymzhang/guest/kallsyms > > > > > > --guestmodules=/home/ymzhang/guest/modules top > > > > > > > > > > > > > > > > > > > > > Thanks for your kind comments. > > > > > > > > > Excellent, support for guest kernel != host kernel is critical (I > > > > > can't > > > > > remember the last time I ran same kernels). > > > > > > > > > > How would we support multiple guests with different kernels? > > > > With the patch, 'perf kvm report --sort pid" could show > > > > summary statistics for all guest os instances. Then, use > > > > parameter --pid of 'perf kvm record' to collect single problematic > > > > instance data. > > > Sorry. 
I found currently --pid isn't process but a thread (main thread). > > > > > > Ingo, > > > > > > Is it possible to support a new parameter or extend --inherit, so 'perf > > > record' and 'perf top' could collect data on all threads of a process > > > when > > > the process is running? > > > > > > If not, I need add a new ugly parameter which is similar to --pid to > > > filter > > > out process data in userspace. > > > > Yeah. For maximum utility i'd suggest to extend --pid to include this, and > > introduce --tid for the previous, limited-to-a-single-task functionality. > > > > Most users would expect --pid to work like a 'late attach' - i.e. to work > > like > > strace -f or like a gdb attach. > > Thanks Ingo, Avi. > > I worked out below patch against tip/master of March 15th. > > Subject: [PATCH] Change perf's parameter --pid to process-wide collection > From: Zhang, Yanmin > > Change parameter -p (--pid) to real process pid and add -t (--tid) meaning > thread id. Now, --pid means perf collects the statistics of all threads of > the process, while --tid means perf just collect the statistics of that > thread. > > BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures > attr->disabled=1 if it isn't a system-wide collection. If there is a '-p' > and no forks, 'perf stat -p' doesn't collect any data. In addition, the > while(!done) in run_perf_stat consumes 100% single cpu time which has bad > impact > on running workload. I added a sleep(1) in the loop. > > Signed-off-by: Zhang Yanmin Ingo, Sorry, the patch has bugs. I need do a better job and will work out 2 separate patches against the 2 issues. Yanmin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
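The distinction discussed above, --pid covering every thread of a process and --tid covering a single thread, can be illustrated by listing the threads that belong to one process. This is only a minimal sketch, assuming a Linux host where /proc/<pid>/task holds one entry per thread; the helper name threads_of is hypothetical and is not part of perf.

```python
import os

def threads_of(pid):
    """Return the sorted thread ids (tids) that belong to process `pid`.

    Assumes Linux procfs, where /proc/<pid>/task has one entry per thread.
    Falls back to [pid] when procfs is unavailable; that fallback is an
    assumption made only so the sketch also runs on other systems.
    """
    task_dir = "/proc/%d/task" % pid
    try:
        return sorted(int(name) for name in os.listdir(task_dir))
    except OSError:
        return [pid]

if __name__ == "__main__":
    pid = os.getpid()
    # Per-thread collection (--tid in the proposal) would attach to one id;
    # process-wide collection (--pid) would attach to every id in this list.
    print("process %d has threads %s" % (pid, threads_of(pid)))
```

A tool that only attaches to the first id in this list sees the main thread alone, which is the limitation reported for the old --pid behaviour.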
Re: [PATCH] KVM test: Parallel install of guest OS v3
FYI, patch applied, see: http://autotest.kernel.org/changeset/4309 On Wed, Mar 17, 2010 at 11:28 PM, Lucas Meneghel Rodrigues wrote: > From: yogi > > The patch enables doing mulitple install of guest OS in parallel. > Have added four more options to test_base.cfg, port redirection > entry "guest_port_unattend_shell" for host to communicate with > guest during installation, "pxe_dir", 'pxe_image' and > 'pxe_initrd" to specify locations for kernel and initrd. > For parallel installation to work in unattended mode, the floppy > image and pxe boot path also has to be unique for each quest. > > All the relevant unattended post install steps for guests were > changed, now they are server based codes. > > Notes: > * Yogi, I am going to remove the SLES patch, and will wait for > you to send a new patchset with both the SLES files and the > opensuse ones, OK? Thanks. > > Changes from v2: > * According to Michael Goldish comments, handled a possible > socket.error exception that could be generated during the > unattended install test > * Modified the floppy image names to be contained inside > the same directory that might hold the tftp root for each > OS, making the needed changes on unattended.py. > * Added floppy names for windows based OSs, which were lacking > on previous patches. > > Changes from v1: > * Fixed the logic for the new unattended install test (original > implementation would hang indefinitely if guest dies in the middle > of the install). > * Fixed the config changes to make sure the unattended install > port actually gets redirected so the test can work, also made the > config specific to unattended install > * Merged the finish.exe patch, including a binary patch that > changes the binary shipped to the new version > * Changed all unattended install files to use the parallel > mechanism > > Tested with Windows 7 and Fedora 11 guests. 
I (lmr) am going to > keep this in the queue for a bit so I can test it more in the > internal test farm and everybody can take a look at the patch. > > Signed-off-by: Yogananth Subramanian > Signed-off-by: Lucas Meneghel Rodrigues > --- > client/tests/kvm/deps/finish.cpp | 111 > +++- > client/tests/kvm/deps/finish.exe | Bin 26913 -> 26926 > bytes > client/tests/kvm/kvm_utils.py | 4 +- > client/tests/kvm/scripts/unattended.py | 59 ++- > client/tests/kvm/tests/unattended_install.py | 45 > client/tests/kvm/tests_base.cfg.sample | 81 +-- > client/tests/kvm/unattended/Fedora-10.ks | 12 +- > client/tests/kvm/unattended/Fedora-11.ks | 11 +- > client/tests/kvm/unattended/Fedora-12.ks | 11 +- > client/tests/kvm/unattended/Fedora-8.ks | 11 +- > client/tests/kvm/unattended/Fedora-9.ks | 11 +- > client/tests/kvm/unattended/RHEL-3-series.ks | 12 +- > client/tests/kvm/unattended/RHEL-4-series.ks | 11 +- > client/tests/kvm/unattended/RHEL-5-series.ks | 11 +- > client/tests/kvm/unattended/win2003-32.sif | 2 +- > client/tests/kvm/unattended/win2003-64.sif | 2 +- > .../kvm/unattended/win2008-32-autounattend.xml | 2 +- > .../kvm/unattended/win2008-64-autounattend.xml | 2 +- > .../kvm/unattended/win2008-r2-autounattend.xml | 2 +- > .../tests/kvm/unattended/win7-32-autounattend.xml | 2 +- > .../tests/kvm/unattended/win7-64-autounattend.xml | 2 +- > .../kvm/unattended/winvista-32-autounattend.xml | 2 +- > .../kvm/unattended/winvista-64-autounattend.xml | 2 +- > client/tests/kvm/unattended/winxp32.sif | 2 +- > client/tests/kvm/unattended/winxp64.sif | 2 +- > 25 files changed, 242 insertions(+), 170 deletions(-) > > diff --git a/client/tests/kvm/deps/finish.cpp > b/client/tests/kvm/deps/finish.cpp > index 9c2867c..e5ba128 100644 > --- a/client/tests/kvm/deps/finish.cpp > +++ b/client/tests/kvm/deps/finish.cpp > @@ -1,12 +1,13 @@ > -// Simple app that only sends an ack string to the KVM unattended install > -// watch code. 
> +// Simple application that creates a server socket, listening for connections > +// of the unattended install test. Once it gets a client connected, the > +// app will send back an ACK string, indicating the install process is done. > // > // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib > // > // Author: Lucas Meneghel Rodrigues > // Code was adapted from an MSDN sample. > > -// Usage: finish.exe [Host OS IP] > +// Usage: finish.exe > > // MinGW's ws2tcpip.h only defines getaddrinfo and other functions only for > // the case _WIN32_WINNT >= 0x0501. > @@ -21,24 +22,18 @@ > #include > #include > > -#define DEFAULT_BUFLEN 512 > #define DEFAULT_PORT "12323" > - > int main(int argc, char **argv) > { > WSADATA wsaData; > - SOCKET ConnectSocket = INVALID_SOCKET; > -
[PATCH] KVM test: Parallel install of guest OS v3
From: yogi

The patch enables multiple guest OS installs to run in parallel.
It adds four more options to test_base.cfg: the port redirection
entry "guest_port_unattend_shell", which lets the host communicate
with the guest during installation, and "pxe_dir", "pxe_image" and
"pxe_initrd", which specify the locations of the kernel and initrd.
For parallel installation to work in unattended mode, the floppy
image and PXE boot path also have to be unique for each guest.

All the relevant unattended post-install steps for guests were
changed; they are now server-based code.

Notes:
* Yogi, I am going to remove the SLES patch, and will wait for
  you to send a new patchset with both the SLES files and the
  opensuse ones, OK? Thanks.

Changes from v2:
* Following Michael Goldish's comments, handled a possible
  socket.error exception that could be generated during the
  unattended install test.
* Modified the floppy image names to be contained inside
  the same directory that might hold the tftp root for each
  OS, making the needed changes in unattended.py.
* Added floppy names for Windows-based OSs, which were lacking
  in previous patches.

Changes from v1:
* Fixed the logic for the new unattended install test (the original
  implementation would hang indefinitely if the guest died in the
  middle of the install).
* Fixed the config changes to make sure the unattended install
  port actually gets redirected so the test can work; also made the
  config specific to unattended install.
* Merged the finish.exe patch, including a binary patch that
  changes the binary shipped to the new version.
* Changed all unattended install files to use the parallel
  mechanism.

Tested with Windows 7 and Fedora 11 guests. I (lmr) am going to
keep this in the queue for a bit so I can test it more in the
internal test farm and everybody can take a look at the patch.
Signed-off-by: Yogananth Subramanian Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/deps/finish.cpp | 111 +++- client/tests/kvm/deps/finish.exe | Bin 26913 -> 26926 bytes client/tests/kvm/kvm_utils.py |4 +- client/tests/kvm/scripts/unattended.py | 59 ++- client/tests/kvm/tests/unattended_install.py | 45 client/tests/kvm/tests_base.cfg.sample | 81 +-- client/tests/kvm/unattended/Fedora-10.ks | 12 +- client/tests/kvm/unattended/Fedora-11.ks | 11 +- client/tests/kvm/unattended/Fedora-12.ks | 11 +- client/tests/kvm/unattended/Fedora-8.ks| 11 +- client/tests/kvm/unattended/Fedora-9.ks| 11 +- client/tests/kvm/unattended/RHEL-3-series.ks | 12 +- client/tests/kvm/unattended/RHEL-4-series.ks | 11 +- client/tests/kvm/unattended/RHEL-5-series.ks | 11 +- client/tests/kvm/unattended/win2003-32.sif |2 +- client/tests/kvm/unattended/win2003-64.sif |2 +- .../kvm/unattended/win2008-32-autounattend.xml |2 +- .../kvm/unattended/win2008-64-autounattend.xml |2 +- .../kvm/unattended/win2008-r2-autounattend.xml |2 +- .../tests/kvm/unattended/win7-32-autounattend.xml |2 +- .../tests/kvm/unattended/win7-64-autounattend.xml |2 +- .../kvm/unattended/winvista-32-autounattend.xml|2 +- .../kvm/unattended/winvista-64-autounattend.xml|2 +- client/tests/kvm/unattended/winxp32.sif|2 +- client/tests/kvm/unattended/winxp64.sif|2 +- 25 files changed, 242 insertions(+), 170 deletions(-) diff --git a/client/tests/kvm/deps/finish.cpp b/client/tests/kvm/deps/finish.cpp index 9c2867c..e5ba128 100644 --- a/client/tests/kvm/deps/finish.cpp +++ b/client/tests/kvm/deps/finish.cpp @@ -1,12 +1,13 @@ -// Simple app that only sends an ack string to the KVM unattended install -// watch code. +// Simple application that creates a server socket, listening for connections +// of the unattended install test. Once it gets a client connected, the +// app will send back an ACK string, indicating the install process is done. 
// // You must link this code with Ws2_32.lib, Mswsock.lib, and Advapi32.lib // // Author: Lucas Meneghel Rodrigues // Code was adapted from an MSDN sample. -// Usage: finish.exe [Host OS IP] +// Usage: finish.exe // MinGW's ws2tcpip.h only defines getaddrinfo and other functions only for // the case _WIN32_WINNT >= 0x0501. @@ -21,24 +22,18 @@ #include #include -#define DEFAULT_BUFLEN 512 #define DEFAULT_PORT "12323" - int main(int argc, char **argv) { WSADATA wsaData; -SOCKET ConnectSocket = INVALID_SOCKET; -struct addrinfo *result = NULL, -*ptr = NULL, -hints; +SOCKET ListenSocket = INVALID_SOCKET, ClientSocket = INVALID_SOCKET; +struct addrinfo *result = NULL, hints; char *sendbuf = "done"; -char recvbuf[DEFAULT_BUFLEN]; -int iResult; -int recvbuflen = DEFAULT_BUFLEN; +int iRe
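The patch description says the guest now runs a small server that sends back an ACK string ("done", on DEFAULT_PORT 12323 in finish.cpp) and that the test handles socket.error. The host side of that handshake might look roughly like the sketch below. The function name wait_for_install_ack and its parameters are hypothetical; this is not the code from unattended_install.py.

```python
import socket

def wait_for_install_ack(host, port, timeout=5.0, expected=b"done"):
    """Connect to the guest-side server and report whether it sent the ACK.

    Returns False on any socket error (refused, timed out, reset) so that
    a caller can retry later instead of crashing.
    """
    try:
        client = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    try:
        data = b""
        # recv() may return fewer bytes than requested, so read in a loop.
        while len(data) < len(expected):
            chunk = client.recv(len(expected) - len(data))
            if not chunk:
                break  # peer closed the connection early
            data += chunk
        return data == expected
    except (socket.error, socket.timeout):
        return False
    finally:
        client.close()
```

A test loop would call this repeatedly against the redirected port until it returns True or an overall install timeout expires. Which host port gets redirected to the guest is a configuration detail assumed here.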
Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
On Thursday 18 March 2010 02:37:10 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
> >> Otherwise would cause VMEntry failure when using ept=0 on unrestricted
> >> guest supported processors.
> >>
> >> Signed-off-by: Sheng Yang
> >
> > Applied, thanks.
>
> So without this patch kvm breaks with ept=0? Sounds like a stable
> candidate to me.

Seems unrestricted guest code isn't in v2.6.31-stable, and v2.6.32 had
already fixed this issue. So it should be fine.

--
regards
Yang, Sheng
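The dependency the patch enforces, unrestricted guest being usable only when EPT is enabled, can be modelled outside the kernel. The sketch below mirrors the masking logic of vmx_vcpu_setup() in Python. The bit values follow the kernel's VMX header definitions; the function name and its argument list are hypothetical, and the kernel patch clears a module-wide flag where this sketch uses a local variable.

```python
# Secondary processor-based VM-execution control bits.
SECONDARY_EXEC_ENABLE_EPT = 0x00000002
SECONDARY_EXEC_ENABLE_VPID = 0x00000020
SECONDARY_EXEC_UNRESTRICTED_GUEST = 0x00000080

def secondary_exec_control(supported, enable_ept, enable_unrestricted_guest, vpid):
    """Return the control word after dropping features that cannot be used."""
    exec_control = supported
    if vpid == 0:
        exec_control &= ~SECONDARY_EXEC_ENABLE_VPID
    if not enable_ept:
        exec_control &= ~SECONDARY_EXEC_ENABLE_EPT
        # The fix: without EPT, unrestricted guest must be turned off too.
        enable_unrestricted_guest = False
    if not enable_unrestricted_guest:
        exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST
    return exec_control

if __name__ == "__main__":
    everything = (SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID |
                  SECONDARY_EXEC_UNRESTRICTED_GUEST)
    # ept=0 on a processor that supports unrestricted guest.
    print(hex(secondary_exec_control(everything, False, True, vpid=1)))
```

Without the extra assignment, the ept=0 case would leave the unrestricted guest bit set while the EPT bit is clear, which is the combination reported to fail at VM entry.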
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: > On 03/16/2010 11:28 PM, Sheng Yang wrote: > > On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: > >> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: > >>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: > Right, but there is a scope between kvm_guest_enter and really running > in guest os, where a perf event might overflow. Anyway, the scope is > very narrow, I will change it to use flag PF_VCPU. > >>> > >>> There is also a window between setting the flag and calling 'int $2' > >>> where an NMI might happen and be accounted incorrectly. > >>> > >>> Perhaps separate the 'int $2' into a direct call into perf and another > >>> call for the rest of NMI handling. I don't see how it would work on > >>> svm though - AFAICT the NMI is held whereas vmx swallows it. > >>> > >>> I guess NMIs > >>> will be disabled until the next IRET so it isn't racy, just tricky. > >> > >> I'm not sure if vmexit does break NMI context or not. Hardware NMI > >> context isn't reentrant till a IRET. YangSheng would like to double > >> check it. > > > > After more check, I think VMX won't remained NMI block state for host. > > That's means, if NMI happened and processor is in VMX non-root mode, it > > would only result in VMExit, with a reason indicate that it's due to NMI > > happened, but no more state change in the host. > > > > So in that meaning, there _is_ a window between VMExit and KVM handle the > > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling > > code because "int $2" don't have effect to block following NMI. > > > > And if the NMI sequence is not important(I think so), then we need to > > generate a real NMI in current vmexit-after code. Seems let APIC send a > > NMI IPI to itself is a good idea. > > > > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to > > replace "int $2". Something unexpected is happening... 
> > You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't > supposed to be able to. Um? Why? Especially kernel is already using it to deliver NMI. -- regards Yang, Sheng -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 23/42] KVM: Activate Virtualization On Demand
On 17.03.2010, at 23:40, Dieter Ries wrote: > On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote: >> On 17.03.2010, at 22:57, Dieter Ries wrote: >>> Hi, >>> >>> This is breaking KVM on my Phenom II X4 955. >>> >>> When I start kvm I get this on the terminal: >>> >>> kvm_create_vm: Device or resource busy >>> Could not initialize KVM, will disable KVM support >>> >>> And in dmesg: >>> [ 67.980732] kvm: enabling virtualization on CPU0 failed >>> >>> >>> I commented out the if() and return, and I added 2 printk's there for >>> debugging, and now that's what I see in dmesg when I start kvm: >>> >>> [ 3341.740112] efer is 3329 >>> [ 3341.740113] efer is 3329 >>> [ 3341.740117] efer is 3329 >>> [ 3341.740119] EFER_SVME is 4096 >>> [ 3341.740121] EFER_SVME is 4096 >>> [ 3341.740124] EFER_SVME is 4096 >>> [ 3341.740130] efer is 3329 >>> [ 3341.740132] EFER_SVME is 4096 >>> >>> In hex the values are 0x1000 and 0x0d01 >>> >>> KVM has been working well on this machine before, and it still works >>> well after commenting that part out. >>> >>> I am not sure what the value of this register is supposed to be, but are >>> you sure >>> >>> if (efer & EFER_SVME) >>> >>> is the right condition? >> >> According to the printks you show above the & condition should never apply. >> >> Are you 100% sure you don't have vmware, virtualbox, parallels, whatever >> running in parallel on that machine? > > Definitely. I have virtualbox installed, but haven't used it in months. > The others I don't use at all, so they are not installed either. > > There is nothing running which could cause that. Behaviour is the same > when I don't log into KDE but just try this without X, where nearly > nothing is started. > > I noted something more now: When I comment it out once, and start kvm > like that, and then remove the comments again, then it works. So I guess > the dmesg parts I wrote were not perfect. 
It's more like: > > I: After reboot, with debugging printk and if condition: > > [ 42.089423] efer is d01 > [ 42.089425] efer is d01 > [ 42.089428] efer is d01 > [ 42.089430] EFER_SVME is 1000 > [ 42.089431] EFER_SVME is 1000 > [ 42.089433] EFER_SVME is 1000 > [ 42.089436] efer is 1d01 > [ 42.089438] EFER_SVME is 1000 > [ 42.089440] kvm: enabling virtualization on CPU0 failed > > II: debugging printk, no if condition: > > [ 317.355519] efer is d01 > [ 317.355522] efer is d01 > [ 317.355524] efer is d01 > [ 317.355527] EFER_SVME is 1000 > [ 317.355528] EFER_SVME is 1000 > [ 317.355531] EFER_SVME is 1000 > [ 317.355534] efer is 1d01 > [ 317.355536] EFER_SVME is 1000 > > III: debugging printk and if condition: > > [ 421.955433] efer is d01 > [ 421.955437] efer is d01 > [ 421.955440] efer is d01 > [ 421.955442] EFER_SVME is 1000 > [ 421.955443] EFER_SVME is 1000 > [ 421.955445] EFER_SVME is 1000 > [ 421.955449] efer is d01 > [ 421.955451] EFER_SVME is 1000 > > > > This is without reboots in between. So now before I use the commented > out version for the first time, it doesnt work, the 2nd time it works. > Maybe some initialization problem... It looks like one of your CPUs has EFER_SVME enabled on bootup already. I'm not aware of code clearing EFER, so if there's garbage in there on boot it stays there. Could you please add the current CPU number to your printk? I bet it's always the same one. If that's the case I'd say you have a broken BIOS or bootloader. Alex-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
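The values in the dmesg output above are easier to compare once they are decoded into EFER flag names. The following sketch does that for the decimal value 3329 from the first report and for the hex values d01 and 1d01 from the later runs. The bit positions are the architectural EFER layout; the helper name decode_efer is hypothetical.

```python
# Architectural EFER bit positions (subset relevant to the values seen here).
EFER_BITS = {
    0: "SCE",    # syscall enable
    8: "LME",    # long mode enable
    10: "LMA",   # long mode active
    11: "NX",    # no-execute enable
    12: "SVME",  # secure virtual machine enable
}
EFER_SVME = 1 << 12  # 0x1000, printed as 4096 in the first report

def decode_efer(value):
    """Return the names of the known EFER flags set in `value`."""
    return [name for bit, name in sorted(EFER_BITS.items())
            if value & (1 << bit)]

if __name__ == "__main__":
    for efer in (3329, 0xd01, 0x1d01):
        busy = bool(efer & EFER_SVME)
        print("%#06x -> %s, (efer & EFER_SVME) is %s"
              % (efer, decode_efer(efer), busy))
```

Only the single 1d01 reading has SVME set, so the "efer & EFER_SVME" test fails for exactly one of the printed values.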
Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot
On 03/17/2010 12:17 PM, Thomas Løcke wrote:
> On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden wrote:
>> What's your host CPU load get up to. You only have a single core?
>
> Dual core. If I only run a single Windows VM, the host load is pretty
> low. Sure it goes up a bit when for example copying a file, but it's
> nothing serious. It's not getting hammered in any way.
>
>> Including -rtc-td-hack ?
>
> Yup, tried that as per suggested by one of the #kvm users. Didn't fix
> it. But come to think of it, I didn't change any of the other options.
> Should I have dropped -localtime and/or -tdf options? I will try again
> tomorrow.

-rtc localtime is required for Windows to get the proper RTC time, and
-tdf should have no effect on Windows guests.

You might try -rtc localtime,clock=host,driftfix=slew

>> As always, make sure you are running the latest and greatest modules, those
>> matter even more than the kernel, and check for any warning messages in
>> dmesg and qemu output.
>
> But don't the latest kvm modules come with the kernel? So if I compile
> a new kernel, the kvm modules should be updated too, yes?
>
> I will try the latest qemu-kvm.

I use git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git and track a 2.6
kernel branch directly so I always have latest module source regardless
of host kernel.

Zach
Re: [PATCH 23/42] KVM: Activate Virtualization On Demand
On Wed, Mar 17, 2010 at 11:02:40PM +0100, Alexander Graf wrote: > On 17.03.2010, at 22:57, Dieter Ries wrote: > > Hi, > > > > This is breaking KVM on my Phenom II X4 955. > > > > When I start kvm I get this on the terminal: > > > > kvm_create_vm: Device or resource busy > > Could not initialize KVM, will disable KVM support > > > > And in dmesg: > > [ 67.980732] kvm: enabling virtualization on CPU0 failed > > > > > > I commented out the if() and return, and I added 2 printk's there for > > debugging, and now that's what I see in dmesg when I start kvm: > > > > [ 3341.740112] efer is 3329 > > [ 3341.740113] efer is 3329 > > [ 3341.740117] efer is 3329 > > [ 3341.740119] EFER_SVME is 4096 > > [ 3341.740121] EFER_SVME is 4096 > > [ 3341.740124] EFER_SVME is 4096 > > [ 3341.740130] efer is 3329 > > [ 3341.740132] EFER_SVME is 4096 > > > > In hex the values are 0x1000 and 0x0d01 > > > > KVM has been working well on this machine before, and it still works > > well after commenting that part out. > > > > I am not sure what the value of this register is supposed to be, but are > > you sure > > > > if (efer & EFER_SVME) > > > > is the right condition? > > According to the printks you show above the & condition should never apply. > > Are you 100% sure you don't have vmware, virtualbox, parallels, whatever > running in parallel on that machine? Definitely. I have virtualbox installed, but haven't used it in months. The others I don't use at all, so they are not installed either. There is nothing running which could cause that. Behaviour is the same when I don't log into KDE but just try this without X, where nearly nothing is started. I noted something more now: When I comment it out once, and start kvm like that, and then remove the comments again, then it works. So I guess the dmesg parts I wrote were not perfect. 
It's more like: I: After reboot, with debugging printk and if condition: [ 42.089423] efer is d01 [ 42.089425] efer is d01 [ 42.089428] efer is d01 [ 42.089430] EFER_SVME is 1000 [ 42.089431] EFER_SVME is 1000 [ 42.089433] EFER_SVME is 1000 [ 42.089436] efer is 1d01 [ 42.089438] EFER_SVME is 1000 [ 42.089440] kvm: enabling virtualization on CPU0 failed II: debugging printk, no if condition: [ 317.355519] efer is d01 [ 317.355522] efer is d01 [ 317.355524] efer is d01 [ 317.355527] EFER_SVME is 1000 [ 317.355528] EFER_SVME is 1000 [ 317.355531] EFER_SVME is 1000 [ 317.355534] efer is 1d01 [ 317.355536] EFER_SVME is 1000 III: debugging printk and if condition: [ 421.955433] efer is d01 [ 421.955437] efer is d01 [ 421.955440] efer is d01 [ 421.955442] EFER_SVME is 1000 [ 421.955443] EFER_SVME is 1000 [ 421.955445] EFER_SVME is 1000 [ 421.955449] efer is d01 [ 421.955451] EFER_SVME is 1000 This is without reboots in between. So now before I use the commented out version for the first time, it doesnt work, the 2nd time it works. Maybe some initialization problem... > Alex cu Dieter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
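One possible reading of runs I to III is that a single CPU came up with a stale SVME bit, that the unchecked run enabled virtualization everywhere, and that the later disable cleared the bit on all CPUs. The sketch below simulates that sequence. It rests on two assumptions not confirmed in this thread: that hardware disable clears EFER.SVME, and that only CPUs which were successfully enabled get disabled afterwards. Placing the stale bit on CPU 0 is also just a guess taken from the "CPU0 failed" message.

```python
import errno

EFER_SVME = 0x1000

def start_kvm(efers, check_busy):
    """Simulate one KVM start/stop cycle over a list of per-CPU EFER values.

    Mutates `efers` in place and returns the per-CPU enable results.
    """
    results = []
    for cpu, efer in enumerate(efers):
        if check_busy and efer & EFER_SVME:
            results.append(-errno.EBUSY)  # the check added by the patch
        else:
            efers[cpu] = efer | EFER_SVME
            results.append(0)
    # Assumption: on shutdown only successfully enabled CPUs are disabled,
    # and disabling clears SVME.
    for cpu, result in enumerate(results):
        if result == 0:
            efers[cpu] &= ~EFER_SVME
    return results

if __name__ == "__main__":
    efers = [0x1d01, 0xd01, 0xd01, 0xd01]  # hypothetical: stale bit on CPU 0
    print("I  ", start_kvm(efers, check_busy=True))
    print("II ", start_kvm(efers, check_busy=False))
    print("III", start_kvm(efers, check_busy=True))
```

Under these assumptions the first run fails on one CPU, the unchecked run succeeds and cleans up the stale bit, and the third run succeeds with the check back in place, which matches the reported sequence.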
Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot
On Wed, Mar 17, 2010 at 8:33 PM, Zachary Amsden wrote:
> What's your host CPU load get up to. You only have a single core?

Dual core. If I only run a single Windows VM, the host load is pretty
low. Sure it goes up a bit when for example copying a file, but it's
nothing serious. It's not getting hammered in any way.

> Including -rtc-td-hack ?

Yup, tried that as per suggested by one of the #kvm users. Didn't fix
it. But come to think of it, I didn't change any of the other options.
Should I have dropped -localtime and/or -tdf options? I will try again
tomorrow.

> As always, make sure you are running the latest and greatest modules, those
> matter even more than the kernel, and check for any warning messages in
> dmesg and qemu output.

But don't the latest kvm modules come with the kernel? So if I compile
a new kernel, the kvm modules should be updated too, yes?

I will try the latest qemu-kvm.

/Thomas
Re: [PATCH 23/42] KVM: Activate Virtualization On Demand
On 16.11.2009 13:19, Avi Kivity wrote:
> From: Alexander Graf
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f54c4f9..59fe4d5 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage)
>      cpu_svm_disable();
>  }
>
> -static void svm_hardware_enable(void *garbage)
> +static int svm_hardware_enable(void *garbage)
>  {
>
>      struct svm_cpu_data *svm_data;
> @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage)
>      struct desc_struct *gdt;
>      int me = raw_smp_processor_id();
>
> +    rdmsrl(MSR_EFER, efer);
> +    if (efer & EFER_SVME)
> +        return -EBUSY;
> +

Hi,

This is breaking KVM on my Phenom II X4 955.

When I start kvm I get this on the terminal:

kvm_create_vm: Device or resource busy
Could not initialize KVM, will disable KVM support

And in dmesg:
[ 67.980732] kvm: enabling virtualization on CPU0 failed

I commented out the if() and return, and I added 2 printk's there for
debugging, and now that's what I see in dmesg when I start kvm:

[ 3341.740112] efer is 3329
[ 3341.740113] efer is 3329
[ 3341.740117] efer is 3329
[ 3341.740119] EFER_SVME is 4096
[ 3341.740121] EFER_SVME is 4096
[ 3341.740124] EFER_SVME is 4096
[ 3341.740130] efer is 3329
[ 3341.740132] EFER_SVME is 4096

In hex the values are 0x1000 and 0x0d01

KVM has been working well on this machine before, and it still works
well after commenting that part out.

I am not sure what the value of this register is supposed to be, but are
you sure

if (efer & EFER_SVME)

is the right condition?

cu
Dieter
Re: [PATCH 23/42] KVM: Activate Virtualization On Demand
On 17.03.2010, at 22:57, Dieter Ries wrote: > Am 16.11.2009 13:19, schrieb Avi Kivity: >> From: Alexander Graf >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index f54c4f9..59fe4d5 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -316,7 +316,7 @@ static void svm_hardware_disable(void *garbage) >> cpu_svm_disable(); >> } >> >> -static void svm_hardware_enable(void *garbage) >> +static int svm_hardware_enable(void *garbage) >> { >> >> struct svm_cpu_data *svm_data; >> @@ -325,16 +325,20 @@ static void svm_hardware_enable(void *garbage) >> struct desc_struct *gdt; >> int me = raw_smp_processor_id(); >> >> +rdmsrl(MSR_EFER, efer); >> +if (efer & EFER_SVME) >> +return -EBUSY; >> + > > Hi, > > This is breaking KVM on my Phenom II X4 955. > > When I start kvm I get this on the terminal: > > kvm_create_vm: Device or resource busy > Could not initialize KVM, will disable KVM support > > And in dmesg: > [ 67.980732] kvm: enabling virtualization on CPU0 failed > > > I commented out the if() and return, and I added 2 printk's there for > debugging, and now that's what I see in dmesg when I start kvm: > > [ 3341.740112] efer is 3329 > [ 3341.740113] efer is 3329 > [ 3341.740117] efer is 3329 > [ 3341.740119] EFER_SVME is 4096 > [ 3341.740121] EFER_SVME is 4096 > [ 3341.740124] EFER_SVME is 4096 > [ 3341.740130] efer is 3329 > [ 3341.740132] EFER_SVME is 4096 > > In hex the values are 0x1000 and 0x0d01 > > KVM has been working well on this machine before, and it still works > well after commenting that part out. > > I am not sure what the value of this register is supposed to be, but are > you sure > > if (efer & EFER_SVME) > > is the right condition? According to the printks you show above the & condition should never apply. Are you 100% sure you don't have vmware, virtualbox, parallels, whatever running in parallel on that machine? 
Alex
Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset
On 17.03.2010, at 22:42, Eduardo Habkost wrote:
> On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
>> Eduardo Habkost wrote:
>>> svm_vcpu_reset() was not properly resetting the contents of the guest-visible
>>> cr0 register, causing the following issue:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=525699
>>>
>>> Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
>>> with paging enabled, making the vcpu get a pagefault exception while trying to
>>> run it.
>>>
>>> Instead of setting vmcb->save.cr0 directly, the new code just resets
>>> kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
>>> vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
>>>
>>> kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
>>> kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
>>>
>>> Signed-off-by: Eduardo Habkost
>>
>> Should this go into -stable?
>
> I think so. The patch is from October, was -stable branched before that?

If I read the diff log correctly 2.6.32 kvm development was branched off
end of July 2009. The important question is if this patch fixes a
regression introduced by some speedup magic.

Alex
Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset
On Wed, Mar 17, 2010 at 07:17:32PM +0100, Alexander Graf wrote:
> Eduardo Habkost wrote:
> > svm_vcpu_reset() was not properly resetting the contents of the guest-visible
> > cr0 register, causing the following issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=525699
> >
> > Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
> > with paging enabled, making the vcpu get a pagefault exception while trying to
> > run it.
> >
> > Instead of setting vmcb->save.cr0 directly, the new code just resets
> > kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
> > vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
> >
> > kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
> > kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
> >
> > Signed-off-by: Eduardo Habkost
> >
>
> Should this go into -stable?

I think so. The patch is from October, was -stable branched before that?

--
Eduardo
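The patch description separates the guest-visible cr0 from the value placed in vmcb->save.cr0, where PG and WP are forced on and CD and NW forced off. A small sketch of that relationship follows. The CR0 bit positions and the architectural reset value 0x60000010 are standard x86. The helper name hardware_cr0 is hypothetical, and it assumes the shadow-paging case in which those four bits are adjusted as the description states.

```python
X86_CR0_PE = 1 << 0    # protection enable
X86_CR0_ET = 1 << 4    # extension type
X86_CR0_WP = 1 << 16   # write protect
X86_CR0_NW = 1 << 29   # not write-through
X86_CR0_CD = 1 << 30   # cache disable
X86_CR0_PG = 1 << 31   # paging

# Architectural CR0 value after reset: caches off, paging off.
RESET_CR0 = X86_CR0_CD | X86_CR0_NW | X86_CR0_ET

def hardware_cr0(guest_cr0):
    """Value the hypervisor would run with for a given guest-visible cr0.

    Sketch of the adjustment named in the patch description:
    PG and WP set, CD and NW cleared.
    """
    return (guest_cr0 | X86_CR0_PG | X86_CR0_WP) & ~(X86_CR0_CD | X86_CR0_NW)

if __name__ == "__main__":
    print("guest-visible cr0 after reset: %#010x" % RESET_CR0)
    print("cr0 used by hardware:          %#010x" % hardware_cr0(RESET_CR0))
```

The point of the fix is the first line: after a vcpu reset the guest-visible value must have PG clear, even though the value the hardware runs with keeps paging enabled.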
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On 03/16/2010 11:28 PM, Sheng Yang wrote:
> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>> Right, but there is a scope between kvm_guest_enter and really running
>>>> in guest os, where a perf event might overflow. Anyway, the scope is
>>>> very narrow, I will change it to use flag PF_VCPU.
>>>
>>> There is also a window between setting the flag and calling 'int $2'
>>> where an NMI might happen and be accounted incorrectly.
>>>
>>> Perhaps separate the 'int $2' into a direct call into perf and another
>>> call for the rest of NMI handling. I don't see how it would work on
>>> svm though - AFAICT the NMI is held whereas vmx swallows it.
>>>
>>> I guess NMIs
>>> will be disabled until the next IRET so it isn't racy, just tricky.
>>
>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>> context isn't reentrant till a IRET. YangSheng would like to double
>> check it.
>
> After more check, I think VMX won't remained NMI block state for host.
> That's means, if NMI happened and processor is in VMX non-root mode, it
> would only result in VMExit, with a reason indicate that it's due to NMI
> happened, but no more state change in the host.
>
> So in that meaning, there _is_ a window between VMExit and KVM handle the
> NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> code because "int $2" don't have effect to block following NMI.
>
> And if the NMI sequence is not important(I think so), then we need to
> generate a real NMI in current vmexit-after code. Seems let APIC send a
> NMI IPI to itself is a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> replace "int $2". Something unexpected is happening...

You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
supposed to be able to.

Zach
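One way to reconcile the two positions in this thread is that the limit on low vector numbers concerns fixed-delivery interrupts, while an NMI is requested through the delivery-mode field of the APIC interrupt command register, where the vector field is not used. The sketch below composes the low ICR word for both cases. The constant names and values mirror those in the kernel's APIC definitions; the helper icr_low is hypothetical, and the cut-off of 16 for illegal fixed-mode vectors is an assumption based on my reading of the Intel manual, narrower than the 0x00-0x1f range mentioned above.

```python
APIC_DM_FIXED = 0x00000   # delivery mode 000b: the vector field is used
APIC_DM_NMI = 0x00400     # delivery mode 100b: the vector field is ignored
APIC_DEST_SELF = 0x40000  # destination shorthand 01b: deliver to self
NMI_VECTOR = 0x02

def icr_low(delivery_mode, vector, shorthand=APIC_DEST_SELF):
    """Compose the low 32 bits of the interrupt command register (sketch)."""
    if delivery_mode == APIC_DM_FIXED and vector < 16:
        # Assumption: fixed-mode vectors below 16 are rejected by the APIC.
        raise ValueError("vector %#x is illegal for fixed delivery" % vector)
    if delivery_mode == APIC_DM_NMI:
        vector = 0  # not used in NMI mode, so leave the field clear
    return shorthand | delivery_mode | (vector & 0xFF)

if __name__ == "__main__":
    print("self NMI IPI:        %#07x" % icr_low(APIC_DM_NMI, NMI_VECTOR))
    print("self fixed IPI 0x30: %#07x" % icr_low(APIC_DM_FIXED, 0x30))
```

Whether a self-directed NMI IPI behaves as intended in the vmexit path is exactly what was still being debugged in the thread; the sketch only shows why the vector number itself need not be the obstacle.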
Re: qemu-kvm crashes with Assertion ... failed.
On 17.03.2010 19:22, Marcelo Tosatti wrote:
> On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:
>> Hi,
>> I cloned the qemu-kvm git repository today with "git clone
>> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
>> qemu-kvm-2010-03-14", ran configure and compiled it and did a "make
>> install". Everything went fine without warnings or errors.
>> For configure output take a look here: http://pastebin.com/BL4DYCRY
>>
>> Here is my Server Hardware:
>> Asus P5Q Mainboard
>> Intel Q9300
>> 8GB RAM
>> RAID5 with mdadm consisting of 4x 1TB disks
>> The volume /dev/storage/Windows7test mentioned below is on this RAID5.
>>
>> I ran my virtual machine with the following command:
>>
>> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
>> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
>> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
>> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
>> tap,script=/usr/local/bin/qemu-ifup -monitor pty -name
>> Windows7test,process=Windows7test -drive
>> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native
>
> Andre,
>
> Can you try qemu-kvm-0.12.3 ?

I did the following:

git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2010-03-17
cd qemu-kvm-2010-03-17
git checkout -b test qemu-kvm-0.12.3
./configure
make -j6 && make install

I started the VM again exactly as I did the last time and it crashed
again with the same error message:

"qemu-system-x86_64: /usr/local/src/qemu-kvm-2010-03-17/hw/ide/internal.h:507:
bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed."

André
Re: -enable-kvm - can it be a required option?
On 03/17/2010 03:18 PM, Michael Tokarev wrote:
> What I mean is: if asked to enable kvm but kvm can't be initialized
> for some reason (lack of virt extensions on the cpu, permission
> denied and so on), can we stop with a fatal error instead of
> continuing in emulated mode?

What I've been thinking, is that we should make kvm enablement a -cpu
option. Something like:

-cpu host,accel=kvm
-cpu host,accel=tcg
-cpu host,accel=kvm:tcg

(1) would be KVM only, (2) would be TCG only, (3) would be KVM falling
back to TCG.

What's nice about this approach, is that we already pull CPU model
definitions from a global config file which means that you could tweak
this parameter to your liking.

Regards,

Anthony Liguori
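The fallback order of the third form (accel=kvm:tcg) can be sketched as a walk over a colon-separated list. This is only an illustration of the proposal above, not qemu code: pick_accel(), accel_available() and the have_kvm flag are hypothetical names, and the availability check is a stand-in for a real KVM probe.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical probe: "kvm" is usable only when have_kvm is true,
 * "tcg" (emulation) is always usable, anything else is unknown. */
static bool accel_available(const char *name, size_t len, bool have_kvm)
{
    if (len == 3 && strncmp(name, "kvm", 3) == 0)
        return have_kvm;
    if (len == 3 && strncmp(name, "tcg", 3) == 0)
        return true;
    return false;
}

/*
 * Walk a colon-separated list such as "kvm:tcg" and copy the first
 * usable accelerator name into out. Returns 0 on success and -1 when
 * nothing in the list is usable, in which case the caller would stop
 * with a fatal error rather than silently fall back to emulation.
 */
int pick_accel(const char *list, bool have_kvm, char *out, size_t outlen)
{
    const char *p = list;

    while (*p) {
        size_t len = strcspn(p, ":");

        if (len > 0 && len < outlen && accel_available(p, len, have_kvm)) {
            memcpy(out, p, len);
            out[len] = '\0';
            return 0;
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return -1;
}
```

With this shape, "kvm" alone gives the hard failure asked for in the original question, while "kvm:tcg" keeps the current fall-back behaviour as an explicit choice.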
-enable-kvm - can it be a required option?
What I mean is: if asked to enable kvm but kvm can't be initialized
for some reason (lack of virt extensions on the cpu, permission denied
and so on), can we stop with a fatal error instead of continuing in
emulated mode? Or maybe with another option, like -require-kvm?

I understand that -enable-kvm is now in upstream qemu too, and _there_
it means something different, that is, it enables something that is
disabled by default. But even with that, if user asks for something
and that something isn't available, it seems like a good idea to stop
here instead of producing a warning and continuing... This is
especially true for kvm where -enable-kvm is the default anyway.

I see more and more people are using this option now in a hope that
kvm will actually stop when no virt extensions are available. It was
my first reaction too, "wow, now I can force it to require kvm
extensions instead of running 1000 times slower!". So this has
something to think about, it looks like... ;)

Thanks!

/mjt
SIGSEGV with -smp 17+, and error handling around...
When run with -smp 17 or greater, kvm fails like this:

$ kvm -smp 17
kvm_create_vcpu: Invalid argument
kvm_setup_mce FAILED: Invalid argument
KVM_SET_LAPIC failed
Segmentation fault
$ _

In qemu-kvm.c, the kvm_create_vcpu() routine (which is used in a vcpu
thread to set up vcpu) is declared as void, i.e, no error return. And
the code that calls it blindly assumes that it will never fail...

But the first error message above is from kernel, which - apparently -
refuses to create 17th vCPU. Hence we've a vcpu thread which is
empty/dummy and not even fully initialized... so it fails later in the
game.

This all looks quite... raw, not polished ;)

Can we somehow handle the (several possible) errors in that (and
other) places, and how we ever can act on them? Abort? Warn the user
and reduce the number of vcpus accordingly (seems wrong, esp. if it
were some first vcpus or in the middle which failed)...

Thanks!

/mjt
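One way to picture the "return an error instead of void" option is the sketch below. It is not the qemu-kvm code: fake_create_vcpu_ioctl() stands in for the real KVM_CREATE_VCPU ioctl, the limit of 16 is an assumption taken from the -smp 17 failure above, and create_vcpu_checked() / create_all_vcpus() are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the kernel accepts vcpu ids 0..15 and rejects the 17th. */
#define MAX_VCPUS_SUPPORTED 16

/* Stand-in for the KVM_CREATE_VCPU ioctl: a fake fd, or -EINVAL. */
static int fake_create_vcpu_ioctl(int id)
{
    return id < MAX_VCPUS_SUPPORTED ? 100 + id : -EINVAL;
}

/* Returns 0 on success or a negative errno, instead of being void. */
int create_vcpu_checked(int id, int *fd_out)
{
    int fd = fake_create_vcpu_ioctl(id);

    if (fd < 0) {
        fprintf(stderr, "create vcpu %d: %s\n", id, strerror(-fd));
        return fd;
    }
    *fd_out = fd;
    return 0;
}

/*
 * Create n vcpus, stopping at the first failure. *created tells the
 * caller how many succeeded, so it can abort cleanly rather than run
 * a half-initialized vcpu thread.
 */
int create_all_vcpus(int n, int *fds, int *created)
{
    *created = 0;
    for (int i = 0; i < n; i++) {
        int r = create_vcpu_checked(i, &fds[i]);

        if (r < 0)
            return r;
        (*created)++;
    }
    return 0;
}
```

Whether the caller then aborts or continues with fewer vcpus is the policy question raised above; the sketch only makes the failure visible to the caller.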
Re: guest kernel debugging through serial port
Here is what I have asked before. The problem that I want to assign a
real serial port to the guest is that the debugging through network
becomes really slow.

Thanks,
Neo

On Thu, Mar 11, 2010 at 2:44 AM, Neo Jia wrote:
> hi,
>
> I have followed the windows guest debugging procedure from
> http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging. And
> it works when I start two guests and bind tcp port to guest serial
> port, but it is really slow.
>
> And if I use -serial /dev/ttyS1 for the guest debugging target, I
> can't talk to it from my dev machine that has connected to ttyS1 with
> target machine (host).
>
> Is this a known problem?
>
> Thanks,
> Neo
>
> --
> I would remember that if researchers were not ambitious
> probably today we haven't the technology we are using!

--
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!
[ kvm-Bugs-2972152 ] guest crash when -cpu kvm64
Bugs item #2972152, was opened at 2010-03-17 14:43
Message generated for change (Tracker Item Submitted) made by high33
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2972152&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: libkvm
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: hugohiggins (high33)
Assigned to: Nobody/Anonymous (nobody)
Summary: guest crash when -cpu kvm64

Initial Comment:
When using -cpu kvm64 guest crashes when X starts up.

dmesg on hypervisor says:
[6149047.906364] kvm: 29020: cpu0 unhandled rdmsr: 0xc0010112

Guest boots OK without -cpu parameter

cpu: dual opteron 2435 (12 cores total)
ram: 32gig
host dist: ubuntu 9.04
host kernel: 2.6.28-16-generic #55-Ubuntu SMP
guest dist: xubuntu-9.10-amd64

# /usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 -name "ubu64 localhost:69" -M pc \
  -m 2048 -boot d -vga std \
  -net nic,macaddr=BA:DD:C0:FF:EE:F6,model=virtio -net vde \
  -drive file=/dev/sdp,if=scsi,boot=on \
  -cpu kvm64 \
  -cdrom iso/xubuntu-9.10-desktop-amd64.iso -k en-us -localtime -sdl -vnc localhost:69 -daemonize -usbdevice tablet
Re: qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot
On 03/17/2010 09:22 AM, Thomas Løcke wrote:
> Hey all,
>
> I'm working on moving from a mixture of physical servers and
> virtualized servers running on Virtualbox, to a pure KVM setup. But
> I'm having some problems with my Windows XP guests in my test-setup.
>
> This is the host I'm testing on:
>
> CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
> RAM: 8GB
> 2x320GB WD SATA disks (one for host OS and one for KVM guest images)
> 2x1GBs Intel nics (bonded)
>
> Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
> 2.6.29.6-generic, 2.6.33 and 2.6.33.1
>
> qemu-kvm is 0.12.3

qemu's been changing a lot, might be best to build from the actual git
repository, which is 0.12.50 now.

> My Linux guests works like a charm. When they boot up I do a single
> "ntpdate -b europe.pool.ntp.org" and after that the time stays in
> near perfect sync with the host, with no ntpd running on the guests.
>
> My Windows XP guests on the other hand drifts backwards in time,
> especially when there's load on the guest, for example when I'm
> copying a large file from my samba server to the Windows XP guest.
> The guest can easily lose 10 minutes while copying a 600MB file. Or
> if I start a few browsers and point them at some horrible flash heavy
> sites and just let them sit there, then the VM also start losing a
> lot of time real fast.

What's your host CPU load get up to. You only have a single core?

> This is the commandline I use to start the Windows XP guests:
>
> qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
> -smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
> -net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm
>
> I use the same commandline for my Linux guests, except the nic is
> virtio.
>
> I'm at my wits end. I've tried the -tdf option with no success. I've
> tried setting various -rtc options with no success.

Including -rtc-td-hack ?

> Could it be I'm missing some key-component in the kernel? Or is there
> perhaps some dev version of qemu-kvm I could/should try?
>
> According to some of the #kvm residents, this should "just work"
> (tm), but I simply cannot make it happen. Any and all advice are more
> than welcome.

As always, make sure you are running the latest and greatest modules,
those matter even more than the kernel, and check for any warning
messages in dmesg and qemu output.

Zach
qemu-kvm 0.12.3, Slackware 13 host and Windows XP guest - time drift a lot
Hey all,

I'm working on moving from a mixture of physical servers and
virtualized servers running on Virtualbox, to a pure KVM setup. But
I'm having some problems with my Windows XP guests in my test-setup.

This is the host I'm testing on:

CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
RAM: 8GB
2x320GB WD SATA disks (one for host OS and one for KVM guest images)
2x1GBs Intel nics (bonded)

Host OS is Slackware 13 with the following kernels: 2.6.29.6-huge,
2.6.29.6-generic, 2.6.33 and 2.6.33.1

qemu-kvm is 0.12.3

My Linux guests works like a charm. When they boot up I do a single
"ntpdate -b europe.pool.ntp.org" and after that the time stays in near
perfect sync with the host, with no ntpd running on the guests.

My Windows XP guests on the other hand drifts backwards in time,
especially when there's load on the guest, for example when I'm
copying a large file from my samba server to the Windows XP guest. The
guest can easily lose 10 minutes while copying a 600MB file. Or if I
start a few browsers and point them at some horrible flash heavy sites
and just let them sit there, then the VM also start losing a lot of
time real fast.

This is the commandline I use to start the Windows XP guests:

qemu-system-x86_64 -hda winxppro.raw -boot c -m 1024 -vnc :1 -k da
-smp 1 -localtime -daemonize -name qemu_winxppro,process=qemu_winxppro
-net nic,macaddr=de:ad:be:ef:00:01,model=rtl8139 -net tap -runas kvm

I use the same commandline for my Linux guests, except the nic is
virtio.

I'm at my wits end. I've tried the -tdf option with no success. I've
tried setting various -rtc options with no success.

Could it be I'm missing some key-component in the kernel? Or is there
perhaps some dev version of qemu-kvm I could/should try?

According to some of the #kvm residents, this should "just work" (tm),
but I simply cannot make it happen. Any and all advice are more than
welcome.

:o)
/Thomas
Re: [PATCH] KVM: Cleanup: change to use bool return values
On Mon, Mar 15, 2010 at 05:29:09PM +0800, Gui Jianfeng wrote:
> Make use of bool as return values, and remove some useless
> bool value converting. Thanks Avi to point this out.
>
> Signed-off-by: Gui Jianfeng

Applied, thanks.
Re: [PATCH rework] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling
On Mon, Mar 15, 2010 at 10:13:30PM +0900, Takuya Yoshikawa wrote:
> kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> mmio ring page and dev even after it has freed them.
>
> Also, if this function fails, though it might be rare, it seems to be
> suggesting the system's serious state: so we'd better stop the works
> following the kvm_create_vm().
>
> This patch clears these problems.
>
> We move the coalesced mmio's initialization out of kvm_create_vm().
> This seems to be natural because it includes a registration which
> can be done only when vm is successfully created.
>
> Signed-off-by: Takuya Yoshikawa
> ---
>  virt/kvm/coalesced_mmio.c |    2 ++
>  virt/kvm/kvm_main.c       |   12 ++++++++----
>  2 files changed, 10 insertions(+), 4 deletions(-)

Applied, thanks.
Re: Broken loadvm ?
On Tue, Mar 16, 2010 at 05:25:13PM +0200, Alpár Török wrote:
> PS: It just occurred to me, that it does indeed freeze and cause a
> 100% CPU usage. At least I can say for sure that network, serial line,
> keyboard, nor mouse work. If loadvm is loaded from the command line.
> If loaded from the monitor, everything seems to work, except the
> mouse. After a -loadvm from the command line, repeating the command
> from the monitor doesn't unfreeze it.
>
> I am really stuck with this. Any help is greatly appreciated, as
> downgrading is not an option.

Upgrade to qemu-kvm-0.12.3?
Re: [PATCH] add "xchg ax, reg" emulator test
On Tue, Mar 16, 2010 at 02:42:52PM +0200, Gleb Natapov wrote:
> Add test for opcodes 0x90-0x9f emulation
>
> Signed-off-by: Gleb Natapov
> diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
> index bc6b27f..bfc2942 100644

Applied, thanks.
Re: >2 serial ports?
Neo Jia wrote:
> On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev wrote:
>> Neo Jia wrote:
>>> May I ask if it is possible to bind a real physical serial port to
>>> a guest?
>> It is all described in the documentation, quite a long list of
>> various things you can attach to a virtual serial port, incl.
>> a real one.
>
> I have tried -serial /dev/ttyS0 but I can't use it to debug my
> Windows guest.

That's entirely different issue, -- inability to debug windows guests.

Please don't hijack other threads for unrelated issues -- it makes
finding information and replying more difficult.

If it does not work for you, ask in a new thread. But before, try to
research the issue a bit, I've seen several discussions about
debugging guests over serial port in kvm.

Besides, I've no idea what are you really trying to do - debugging a
guest is much easier in kvm than to set up another HOST and connect
two HOSTS over a null-modem serial cable

/mjt
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Vivek Goyal writes:

> Are you using CFQ in the host? What is the host kernel version? I am
> not sure what is the problem here but you might want to play with IO
> controller and put these guests in individual cgroups and see if you
> get better throughput even with cache=writethrough.

Hi. We're using the deadline IO scheduler on 2.6.32.7. We got better
performance from deadline than from cfq when we last tested, which was
admittedly around the 2.6.30 timescale so is now a rather outdated
measurement.

> If the problem is that if sync writes from different guests get
> intermixed resulting in more seeks, IO controller might help as these
> writes will now go on different group service trees and in CFQ, we
> try to service requests from one service tree at a time for a period
> before we switch the service tree.

Thanks for the suggestion: I'll have a play with this. I currently use
/sys/kernel/uids/N/cpu_share with one UID per guest to divide up the
CPU between guests, but this could just as easily be done with a
cgroup per guest if a side-effect is to provide a hint about IO
independence to CFQ.

Best wishes,

Chris.
Re: >2 serial ports?
On Wed, Mar 17, 2010 at 3:35 AM, Michael Tokarev wrote:
> Neo Jia wrote:
>> May I ask if it is possible to bind a real physical serial port to
>> a guest?
>
> It is all described in the documentation, quite a long list of
> various things you can attach to a virtual serial port, incl.
> a real one.

I have tried -serial /dev/ttyS0 but I can't use it to debug my Windows
guest.

Thanks,
Neo

>
> /mjt
>

--
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!
Re: [PATCH] vhost: fix error handling in vring ioctls
Acked-by: Laurent Chavey

On Wed, Mar 17, 2010 at 10:54 AM, Laurent Chavey wrote:
> Acked-by: cha...@google.com
>
>
> On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin wrote:
>> Stanse found a locking problem in vhost_set_vring:
>> several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
>> VHOST_SET_VRING_ERR with the vq->mutex held.
>> Fix these up.
>>
>> Reported-by: Jiri Slaby
>> Signed-off-by: Michael S. Tsirkin
>> ---
>>  drivers/vhost/vhost.c |   18 ++++++++++++------
>>  1 files changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 7cd55e0..7bd7a1e 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>  		if (r < 0)
>>  			break;
>>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -		if (IS_ERR(eventfp))
>> -			return PTR_ERR(eventfp);
>> +		if (IS_ERR(eventfp)) {
>> +			r = PTR_ERR(eventfp);
>> +			break;
>> +		}
>>  		if (eventfp != vq->kick) {
>>  			pollstop = filep = vq->kick;
>>  			pollstart = vq->kick = eventfp;
>> @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>  		if (r < 0)
>>  			break;
>>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -		if (IS_ERR(eventfp))
>> -			return PTR_ERR(eventfp);
>> +		if (IS_ERR(eventfp)) {
>> +			r = PTR_ERR(eventfp);
>> +			break;
>> +		}
>>  		if (eventfp != vq->call) {
>>  			filep = vq->call;
>>  			ctx = vq->call_ctx;
>> @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>>  		if (r < 0)
>>  			break;
>>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>> -		if (IS_ERR(eventfp))
>> -			return PTR_ERR(eventfp);
>> +		if (IS_ERR(eventfp)) {
>> +			r = PTR_ERR(eventfp);
>> +			break;
>> +		}
>>  		if (eventfp != vq->error) {
>>  			filep = vq->error;
>>  			vq->error = eventfp;
>> --
>> 1.7.0.18.g0d53a5
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
Marcelo Tosatti wrote:
> On Fri, Nov 27, 2009 at 04:46:26PM +0800, Sheng Yang wrote:
>
>> Otherwise would cause VMEntry failure when using ept=0 on
>> unrestricted guest supported processors.
>>
>> Signed-off-by: Sheng Yang
>>
>
> Applied, thanks.
>

So without this patch kvm breaks with ept=0? Sounds like a stable
candidate to me.

Alex
Re: [PATCH] KVM: MMU: Disassociate direct maps from guest levels
On Sun, Mar 14, 2010 at 10:22:52AM +0200, Avi Kivity wrote:
> Direct maps are linear translations for a section of memory, used for
> real mode or with large pages. As such, they are independent of the
> guest levels.
>
> Teach the mmu about this by making page->role.glevels = 0 for direct
> maps. This allows direct maps to be shared among real mode and the
> various paging modes.
>
> Signed-off-by: Avi Kivity
> ---
>  arch/x86/kvm/mmu.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b137515..a984bc1 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1328,6 +1328,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	role = vcpu->arch.mmu.base_role;
>  	role.level = level;
>  	role.direct = direct;
> +	if (role.direct)
> +		role.glevels = 0;
>  	role.access = access;
>  	if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
>  		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
> --
> 1.7.0.2

Isn't this what happens already, since for tdp base_role.glevels is not
initialized?
Re: qemu-kvm crashes with Assertion ... failed.
On Sun, Mar 14, 2010 at 09:57:52AM +0100, André Weidemann wrote:
> Hi,
> I cloned the qemu-kvm git repository today with "git clone
> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
> qemu-kvm-2010-03-14", ran configure and compiled it and did a "make
> install". Everything went fine without warnings or errors.
> For configure output take a look here: http://pastebin.com/BL4DYCRY
>
> Here is my Server Hardware:
> Asus P5Q Mainboard
> Intel Q9300
> 8GB RAM
> RAID5 with mdadm consisting of 4x 1TB disks
> The volume /dev/storage/Windows7test mentioned below is on this RAID5.
>
> I ran my virtual machine with the following command:
>
> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
> tap,script=/usr/local/bin/qemu-ifup -monitor pty -name
> Windows7test,process=Windows7test -drive
> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native

Andre,

Can you try qemu-kvm-0.12.3 ?

> Windows7Test_600G.img is a qcow2 file and contains a Windows 7 Pro image.
> /dev/storage/Windows7test is formatted with XFS
>
> After starting the machine with the above command line, I booted
> into an Ubuntu 9.10 x86_64 Live Image via PXE and mounted /dev/sdb1
> (/dev/storage/Windows7test) under /mnt. I then did "cd /mnt/" and
> ran "iozone -Ra -g 2G -b /tmp/iozone-aoi-linux-xls"
>
> iozone ran some test and then kvm simply quit with the following
> error message:
> qemu-system-x86_64:
> /usr/local/src/qemu-kvm-2010-03-10/hw/ide/internal.h:510:
> bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> /var/log/syslog contained the following:
> Mar 14 09:18:14 server kernel: [318080.627468] kvm: 1361: cpu0 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server kernel: [318080.627473] kvm: 1361: cpu0 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627476] kvm: 1361: cpu0 unhandled wrmsr: 0x400 data
> Mar 14 09:18:14 server kernel: [318080.627506] kvm: 1361: cpu1 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server kernel: [318080.627509] kvm: 1361: cpu1 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627511] kvm: 1361: cpu1 unhandled wrmsr: 0x400 data
> Mar 14 09:18:14 server kernel: [318080.627538] kvm: 1361: cpu2 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
> Mar 14 09:18:14 server kernel: [318080.627540] kvm: 1361: cpu2 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
> Mar 14 09:18:14 server kernel: [318080.627543] kvm: 1361: cpu2 unhandled wrmsr: 0x400 data
>
>
> I was able to reproduce this error 3 times in a row.
>
> Regards,
> André
Re: [PATCH 2/3] kvm: svm: reset cr0 properly on vcpu reset
Eduardo Habkost wrote:
> svm_vcpu_reset() was not properly resetting the contents of the
> guest-visible cr0 register, causing the following issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=525699
>
> Without resetting cr0 properly, the vcpu was running the SIPI
> bootstrap routine with paging enabled, making the vcpu get a
> pagefault exception while trying to run it.
>
> Instead of setting vmcb->save.cr0 directly, the new code just resets
> kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared
> on vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by
> svm_set_cr0().
>
> kvm_set_cr0() is used instead of calling svm_set_cr0() directly to
> make sure kvm_mmu_reset_context() is called to reset the mmu to
> nonpaging mode.
>
> Signed-off-by: Eduardo Habkost
>

Should this go into -stable?

Alex
Re: [PATCH] vhost: fix error handling in vring ioctls
Acked-by: cha...@google.com


On Wed, Mar 17, 2010 at 7:42 AM, Michael S. Tsirkin wrote:
> Stanse found a locking problem in vhost_set_vring:
> several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
> VHOST_SET_VRING_ERR with the vq->mutex held.
> Fix these up.
>
> Reported-by: Jiri Slaby
> Signed-off-by: Michael S. Tsirkin
> ---
>  drivers/vhost/vhost.c |   18 ++++++++++++------
>  1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 7cd55e0..7bd7a1e 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  		if (r < 0)
>  			break;
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -		if (IS_ERR(eventfp))
> -			return PTR_ERR(eventfp);
> +		if (IS_ERR(eventfp)) {
> +			r = PTR_ERR(eventfp);
> +			break;
> +		}
>  		if (eventfp != vq->kick) {
>  			pollstop = filep = vq->kick;
>  			pollstart = vq->kick = eventfp;
> @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  		if (r < 0)
>  			break;
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -		if (IS_ERR(eventfp))
> -			return PTR_ERR(eventfp);
> +		if (IS_ERR(eventfp)) {
> +			r = PTR_ERR(eventfp);
> +			break;
> +		}
>  		if (eventfp != vq->call) {
>  			filep = vq->call;
>  			ctx = vq->call_ctx;
> @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  		if (r < 0)
>  			break;
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> -		if (IS_ERR(eventfp))
> -			return PTR_ERR(eventfp);
> +		if (IS_ERR(eventfp)) {
> +			r = PTR_ERR(eventfp);
> +			break;
> +		}
>  		if (eventfp != vq->error) {
>  			filep = vq->error;
>  			vq->error = eventfp;
> --
> 1.7.0.18.g0d53a5
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
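The pattern behind the fix (record the error and break out of the switch so the single unlock at the end always runs) can be shown in a small userspace sketch. It is not the vhost code: it uses a pthread mutex in place of the kernel mutex, and struct ring, check_fd() and ring_ioctl() are hypothetical names made up for the illustration.

```c
#include <errno.h>
#include <pthread.h>

enum ring_op { RING_SET_KICK, RING_SET_CALL };

struct ring {
    pthread_mutex_t mutex;
    int kick;
    int call;
};

/* Hypothetical validation: -1 means "no fd", anything below is bad. */
static int check_fd(int fd)
{
    return fd < -1 ? -EBADF : 0;
}

long ring_ioctl(struct ring *vq, int op, int fd)
{
    long r = 0;

    pthread_mutex_lock(&vq->mutex);
    switch (op) {
    case RING_SET_KICK:
        r = check_fd(fd);
        if (r < 0)
            break;          /* not "return r": the mutex is still held */
        vq->kick = fd;
        break;
    case RING_SET_CALL:
        r = check_fd(fd);
        if (r < 0)
            break;
        vq->call = fd;
        break;
    default:
        r = -EINVAL;
    }
    /* Single exit path: every case above ends up here. */
    pthread_mutex_unlock(&vq->mutex);
    return r;
}
```

Had the error branches returned directly, the next call on the same ring would block forever on the mutex, which is the class of bug Stanse reported.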
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Wed, Mar 17, 2010 at 03:14:10PM +, Chris Webb wrote: > Anthony Liguori writes: > > > This really gets down to your definition of "safe" behaviour. As it > > stands, if you suffer a power outage, it may lead to guest > > corruption. > > > > While we are correct in advertising a write-cache, write-caches are > > volatile and should a drive lose power, it could lead to data > > corruption. Enterprise disks tend to have battery backed write > > caches to prevent this. > > > > In the set up you're emulating, the host is acting as a giant write > > cache. Should your host fail, you can get data corruption. > > Hi Anthony. I suspected my post might spark an interesting discussion! > > Before considering anything like this, we did quite a bit of testing with > OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool > power off to kill the host. I didn't manage to corrupt any ext3, ext4 or > NTFS filesystems despite these efforts. > > Is your claim here that:- > > (a) qemu doesn't emulate a disk write cache correctly; or > > (b) operating systems are inherently unsafe running on top of a disk with > a write-cache; or > > (c) installations that are already broken and lose data with a physical > drive with a write-cache can lose much more in this case because the > write cache is much bigger? > > Following Christoph Hellwig's patch series from last September, I'm pretty > convinced that (a) isn't true apart from the inability to disable the > write-cache at run-time, which is something that neither recent linux nor > windows seem to want to do out-of-the box. > > Given that modern SATA drives come with fairly substantial write-caches > nowadays which operating systems leave on without widespread disaster, I > don't really believe in (b) either, at least for the ide and scsi case. > Filesystems know they have to flush the disk cache to avoid corruption. 
> (Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so > I know virtio-blk has to be avoided for current windows and obsolete linux > when writeback caching is on.) > > I can certainly imagine (c) might be the case, although when I use strace to > watch the IO to the block device, I see pretty regular fdatasyncs being > issued by the guests, interleaved with the writes, so I'm not sure how > likely the problem would be in practice. Perhaps my test guests were > unrepresentatively well-behaved. > > However, the potentially unlimited time-window for loss of incorrectly > unsynced data is also something one could imagine fixing at the qemu level. > Perhaps I should be implementing something like > cache=writeback,flushtimeout=N which, upon a write being issued to the block > device, starts an N second timer if it isn't already running. The timer is > destroyed on flush, and if it expires before it's destroyed, a gratuitous > flush is sent. Do you think this is worth doing? Just a simple 'while sleep > 10; do sync; done' on the host even! > > We've used cache=none and cache=writethrough, and whilst performance is fine > with a single guest accessing a disk, when we chop the disks up with LVM and > run a even a small handful of guests, the constant seeking to serve tiny > synchronous IOs leads to truly abysmal throughput---we've seen less than > 700kB/s streaming write rates within guests when the backing store is > capable of 100MB/s. > > With cache=writeback, there's still IO contention between guests, but the > write granularity is a bit coarser, so the host's elevator seems to get a > bit more of a chance to help us out and we can at least squeeze out 5-10MB/s > from two or three concurrently running guests, getting a total of 20-30% of > the performance of the underlying block device rather than a total of around > 5%. Hi Chris, Are you using CFQ in the host? What is the host kernel version? 
I am not sure what the problem is here, but you might want to try the IO
controller: put these guests in individual cgroups and see whether you get
better throughput even with cache=writethrough.

If the problem is that sync writes from different guests get intermixed,
resulting in more seeks, the IO controller might help, because these writes
will then go to different group service trees. In CFQ, we try to service
requests from one service tree at a time for a period before we switch to
another service tree.

The issue is that all of this logic lives in CFQ, which works at the leaf
nodes of the storage stack and not at the LVM nodes. So you might first want
to try it with a single partitioned disk. If it helps there, then it might
also help with the LVM configuration (IO control working at leaf nodes).

Thanks
Vivek
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 06:57 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
>> Chris, can you carry out an experiment? Write a program that
>> pwrite()s a byte to a file at the same location repeatedly, with the
>> file opened using O_SYNC. Measure the write rate, and run blktrace
>> on the host to see what the disk (/dev/sda, not the volume) sees.
>> Should be a (write, flush, write, flush) per pwrite pattern or
>> similar (for writing the data and a journal block, perhaps even
>> three writes will be needed).
>>
>> Then scale this across multiple guests, measure and trace again. If
>> we're lucky, the flushes will be coalesced, if not, we need to work
>> on it.
>
> As the person who has written quite a bit of the current O_SYNC
> implementation and also reviewed the rest of it I can tell you that
> those flushes won't be coalesced. If we always rewrite the same block
> we do the cache flush from the fsync method and there is nothing to
> coalesce it with there. If you actually do modify metadata (e.g. by
> using the new real O_SYNC instead of the old one that always was
> O_DSYNC that I introduced in 2.6.33 but that isn't picked up by
> userspace yet) you might hit a very limited transaction merging window
> in some filesystems, but it's generally very small for a good reason.
> If it were too large we'd make the one process wait for I/O in another
> just because we might expect transactions to coalesce later. There's
> been some long discussion about that fsync transaction batching tuning
> for ext3 a while ago.

I definitely don't expect flush merging for a single guest, but for
multiple guests there is certainly an opportunity for merging. Most
likely we don't take advantage of it and that's one of the problems.
Copying data into pagecache so that we can merge the flushes seems like
a very unsatisfactory implementation.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 06:52 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
>> They should be reorderable. Otherwise host filesystems on several
>> volumes would suffer the same problems.
>
> They are reorderable, just not as extremely as the page cache.
> Remember that the request queue really is just a relatively small queue
> of outstanding I/O, and that is absolutely intentional. Large scale
> _caching_ is done by the VM in the pagecache, with all the usual aging,
> pressure, etc algorithms applied to it.

We already have the large scale caching and stuff running in the guest.
We have a stream of optimized requests coming out of guests; running the
same algorithm again shouldn't improve things. The host has an
opportunity to do inter-guest optimization, but given each guest has its
own disk area, I don't see how any reordering or merging could help here
(beyond sorting guests according to disk order).

> The block devices have a relatively small fixed size request queue
> associated with them to facilitate request merging and limited
> reordering and having fully set up I/O requests for the device.

We should enlarge the queues, increase request reorderability, and merge
flushes (delay flushes until after unrelated writes, then adjacent
flushes can be collapsed).

Collapsing flushes should get us better than linear scaling (since we
collapse N writes + M flushes into N writes and 1 flush). However, the
writes themselves scale worse than linearly, since they now span a
larger disk space and cause higher seek penalties.

--
error compiling committee.c: too many arguments to function
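
The flush-collapsing arithmetic in this message (N writes + M flushes
becoming N writes and 1 flush) can be illustrated with a small sketch. It
is a toy model only, not block-layer or qemu code: `struct req`,
`coalesce_flushes()` and the `volume` field are hypothetical names, and it
assumes that every request in the batch targets the same physical disk
and that no write in the batch depends on an earlier flush in the batch.

```c
#include <assert.h>
#include <stddef.h>

enum req_type { REQ_WRITE, REQ_FLUSH };

struct req {
    enum req_type type;
    int volume;              /* which guest volume issued the request */
};

/*
 * Rewrites a batch in place: writes keep their relative order, every
 * flush is deferred, and all deferred flushes collapse into a single
 * trailing flush. Returns the new batch length.
 */
size_t coalesce_flushes(struct req *batch, size_t n)
{
    size_t out = 0, i;
    int saw_flush = 0;

    assert(batch != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (batch[i].type == REQ_FLUSH)
            saw_flush = 1;               /* remember it, emit nothing yet */
        else
            batch[out++] = batch[i];     /* writes pass straight through */
    }
    if (saw_flush) {
        /* out < n here, because at least one flush was skipped above */
        batch[out].type = REQ_FLUSH;
        batch[out].volume = -1;          /* one disk flush covers all volumes */
        out++;
    }
    return out;
}
```

With three guests each issuing one write and one flush, the six-entry
batch shrinks to three writes followed by one flush. The sketch says
nothing about the seek cost of the writes themselves, which is the part
the message expects to scale worse than linearly.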
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 06:58 PM, Christoph Hellwig wrote:
> On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
>> Meanwhile I looked at the code, and it looks bad. There is an
>> IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
>> issuing it. In any case, qemu doesn't use it as far as I could tell,
>> and even if it did, device-mapper doesn't implement the needed
>> ->aio_fsync() operation.
>
> No one implements it, and all surrounding code is dead wood. It would
> require us to do asynchronous pagecache operations, which involve
> major surgery of the VM code. Patches to do this were rejected multiple
> times.

Pity. What about the O_DIRECT aio case? It's ridiculous that you can
submit async write requests but have to wait synchronously for them to
actually hit the disk if you have a write cache.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Wed, Mar 17, 2010 at 06:53:34PM +0200, Avi Kivity wrote:
> Meanwhile I looked at the code, and it looks bad. There is an
> IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
> issuing it. In any case, qemu doesn't use it as far as I could tell,
> and even if it did, device-mapper doesn't implement the needed
> ->aio_fsync() operation.

No one implements it, and all surrounding code is dead wood. It would
require us to do asynchronous pagecache operations, which involve
major surgery of the VM code. Patches to do this were rejected multiple
times.
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Wed, Mar 17, 2010 at 06:40:30PM +0200, Avi Kivity wrote:
> Chris, can you carry out an experiment? Write a program that pwrite()s
> a byte to a file at the same location repeatedly, with the file opened
> using O_SYNC. Measure the write rate, and run blktrace on the host to
> see what the disk (/dev/sda, not the volume) sees. Should be a (write,
> flush, write, flush) per pwrite pattern or similar (for writing the data
> and a journal block, perhaps even three writes will be needed).
>
> Then scale this across multiple guests, measure and trace again. If
> we're lucky, the flushes will be coalesced, if not, we need to work on it.

As the person who has written quite a bit of the current O_SYNC
implementation and also reviewed the rest of it I can tell you that
those flushes won't be coalesced. If we always rewrite the same block
we do the cache flush from the fsync method and there is nothing to
coalesce it with there. If you actually do modify metadata (e.g. by
using the new real O_SYNC instead of the old one that always was
O_DSYNC that I introduced in 2.6.33 but that isn't picked up by
userspace yet) you might hit a very limited transaction merging window
in some filesystems, but it's generally very small for a good reason.
If it were too large we'd make the one process wait for I/O in another
just because we might expect transactions to coalesce later. There's
been some long discussion about that fsync transaction batching tuning
for ext3 a while ago.
Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1
Anthony Liguori wrote:
> On 03/08/2010 08:34 AM, Chris Webb wrote:
>> During boot, the screen gets resized to height 1 and a mouse click at
>> this point will cause a division by zero when calculating the absolute
>> pointer position from the pixel (x, y). Return a click in the middle
>> of the screen instead in this case.
>>
>> Signed-off-by: Chris Webb
>>
> Applied. Thanks.

Has this also been queued for stable?

Alex
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 06:47 PM, Chris Webb wrote:
> Avi Kivity writes:
>
>> Chris, can you carry out an experiment? Write a program that
>> pwrite()s a byte to a file at the same location repeatedly, with the
>> file opened using O_SYNC. Measure the write rate, and run blktrace
>> on the host to see what the disk (/dev/sda, not the volume) sees.
>> Should be a (write, flush, write, flush) per pwrite pattern or
>> similar (for writing the data and a journal block, perhaps even
>> three writes will be needed).
>>
>> Then scale this across multiple guests, measure and trace again. If
>> we're lucky, the flushes will be coalesced, if not, we need to work
>> on it.
>
> Sure, sounds like an excellent plan. I don't have a test machine at the
> moment as the last host I was using for this has gone into production, but
> I'm due to get another one to install later today or first thing tomorrow
> which would be ideal for doing this. I'll follow up with the results once I
> have them.

Meanwhile I looked at the code, and it looks bad. There is an
IO_CMD_FDSYNC, but it isn't tagged, so we have to drain the queue before
issuing it. In any case, qemu doesn't use it as far as I could tell,
and even if it did, device-mapper doesn't implement the needed
->aio_fsync() operation.

So, there's a lot of plumbing needed before we can get cache flushes
merged into each other. Given that cache=writeback does allow merging, I
think we have explained part of the problem at least.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Wed, Mar 17, 2010 at 06:22:29PM +0200, Avi Kivity wrote:
> They should be reorderable. Otherwise host filesystems on several
> volumes would suffer the same problems.

They are reorderable, just not as extremely as the page cache.
Remember that the request queue really is just a relatively small queue
of outstanding I/O, and that is absolutely intentional. Large scale
_caching_ is done by the VM in the pagecache, with all the usual aging,
pressure, etc algorithms applied to it.

The block devices have a relatively small fixed size request queue
associated with them to facilitate request merging and limited
reordering and having fully set up I/O requests for the device.
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Avi Kivity writes:

> Chris, can you carry out an experiment? Write a program that
> pwrite()s a byte to a file at the same location repeatedly, with the
> file opened using O_SYNC. Measure the write rate, and run blktrace
> on the host to see what the disk (/dev/sda, not the volume) sees.
> Should be a (write, flush, write, flush) per pwrite pattern or
> similar (for writing the data and a journal block, perhaps even
> three writes will be needed).
>
> Then scale this across multiple guests, measure and trace again. If
> we're lucky, the flushes will be coalesced, if not, we need to work
> on it.

Sure, sounds like an excellent plan. I don't have a test machine at the
moment as the last host I was using for this has gone into production, but
I'm due to get another one to install later today or first thing tomorrow
which would be ideal for doing this. I'll follow up with the results once I
have them.

Cheers,

Chris.
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 06:22 PM, Avi Kivity wrote:
>> Also, if my guest kernel issues (say) three small writes, one at the
>> start of the disk, one in the middle, one at the end, and then does a
>> flush, can virtio really express this as one non-contiguous O_DIRECT
>> write (the three components of which can be reordered by the elevator
>> with respect to one another) rather than three distinct O_DIRECT
>> writes which can't be permuted? Can qemu issue a write like that?
>> cache=writeback + flush allows this to be optimised by the block layer
>> in the normal way.
>
> Guest side virtio will send this as three requests followed by a
> flush. Qemu will issue these as three distinct requests and then
> flush. The requests are marked, as Christoph says, in a way that
> limits their reorderability, and perhaps if we fix these two problems
> performance will improve.
>
> Something that comes to mind is merging of flush requests. If N
> guests issue one write and one flush each, we should issue N writes
> and just one flush - a flush for the disk applies to all volumes on
> that disk.

Chris, can you carry out an experiment? Write a program that pwrite()s
a byte to a file at the same location repeatedly, with the file opened
using O_SYNC. Measure the write rate, and run blktrace on the host to
see what the disk (/dev/sda, not the volume) sees. Should be a (write,
flush, write, flush) per pwrite pattern or similar (for writing the data
and a journal block, perhaps even three writes will be needed).

Then scale this across multiple guests, measure and trace again. If
we're lucky, the flushes will be coalesced, if not, we need to work on
it.

--
error compiling committee.c: too many arguments to function
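
The experiment proposed in this message could look roughly like the
following. This is a minimal sketch, not a program posted to the list:
the function name `osync_rewrite_rate()` and its interface are
assumptions made for illustration. It uses only POSIX open(), pwrite()
and clock_gettime(), and measures the rate from inside the guest; the
blktrace side of the experiment still has to be run separately on the
host.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Rewrites one byte at offset 0 `iterations` times through an O_SYNC
 * descriptor. Returns the achieved rate in writes per second, or a
 * negative value on error.
 */
double osync_rewrite_rate(const char *path, int iterations)
{
    struct timespec start, end;
    const char byte = 'x';
    double secs;
    int fd, i;

    assert(path != NULL);
    if (iterations <= 0)
        return -1.0;

    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++) {
        /* Same location every time, so nothing can be batched by offset. */
        if (pwrite(fd, &byte, 1, 0) != 1) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    secs = (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;             /* guard against a zero-length interval */
    return iterations / secs;
}
```

Calling it with a path on the guest's virtual disk and a few thousand
iterations gives the single-guest baseline; running the same call in
several guests at once gives the scaled figure the message asks for.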
Re: [PATCH 05/10] Don't call apic functions directly from kvm code
On 03/17/2010 04:00 PM, Glauber Costa wrote:
> On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote:
>> On 02/26/2010 10:12 PM, Glauber Costa wrote:
>>> It is actually not necessary to call a tpr function to save and load
>>> cr8, as cr8 is part of the processor state, and thus, it is much
>>> easier to just add it to CPUState.
>>>
>>> As for apic base, wrap kvm usages, so we can call either the qemu
>>> device, or the in kernel version.
>>>
>>>   }
>>>
>>> +static void kvm_set_apic_base(CPUState *env, uint64_t val)
>>> +{
>>> +    if (!kvm_irqchip_in_kernel())
>>> +        cpu_set_apic_base(env, val);
>>
>> What if it is in kernel? Just ignored? Doesn't seem right.
>
> At this point it is right, because there is no irqchip in kernel yet.
> In a later patch, irqchip in kernel begins to exist, and this function
> gets filled.

Ok. In the future please code things like that without the if (), and
add it when you introduce the other side. Helps fend off nit-pickers.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Anthony Liguori writes:

> On 03/17/2010 10:14 AM, Chris Webb wrote:
> >   (c) installations that are already broken and lose data with a physical
> >       drive with a write-cache can lose much more in this case because the
> >       write cache is much bigger?
>
> This is the closest to the most accurate.
>
> It basically boils down to this: most enterprises use disks with
> battery backed write caches. Having the host act as a giant write
> cache means that you can lose data.
>
> I agree that a well behaved file system will not become corrupt, but
> my contention is that for many types of applications, data loss ==
> corruption and not all file systems are well behaved. And it's
> certainly valid to argue about whether common filesystems are
> "broken" but from a purely pragmatic perspective, this is going to
> be the case.

Okay. What I was driving at in describing these systems as 'already broken'
is that they will already lose data (in this sense) if they're run on bare
metal with normal commodity SATA disks with their 32MB write caches on. That
configuration surely describes the vast majority of PC-class desktops and
servers!

If I understand correctly, your point here is that the small cache on a real
SATA drive gives a relatively small time window for data loss, whereas the
worry with cache=writeback is that the host page cache can be gigabytes, so
the time window for unsynced data to be lost is potentially enormous.

Isn't the fix for that just forcing a periodic sync on the host, to put an
upper bound on the time window for unsynced data loss in the guest?

Cheers,

Chris.
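
The bounded-window idea in this message matches the
cache=writeback,flushtimeout=N proposal made earlier in the thread: a
write starts an N second timer if none is running, a flush destroys it,
and expiry triggers a gratuitous flush. A minimal sketch of that state
machine follows. It is a toy model with explicit timestamps, not qemu
code; `struct flush_timer` and the `ft_*` functions are hypothetical
names, and no flushtimeout option is claimed to exist.

```c
#include <assert.h>
#include <stdbool.h>

struct flush_timer {
    long timeout;     /* N, in seconds */
    long deadline;    /* absolute time at which a flush is forced */
    bool armed;       /* true while unflushed writes are outstanding */
};

void ft_init(struct flush_timer *t, long timeout)
{
    assert(timeout > 0);
    t->timeout = timeout;
    t->deadline = 0;
    t->armed = false;
}

/* A write starts the timer only if it is not already running. */
void ft_on_write(struct flush_timer *t, long now)
{
    if (!t->armed) {
        t->armed = true;
        t->deadline = now + t->timeout;
    }
}

/* A flush issued by the guest destroys the timer. */
void ft_on_flush(struct flush_timer *t)
{
    t->armed = false;
}

/*
 * Polled periodically. Returns true when a gratuitous flush is due, and
 * disarms the timer because that flush covers all outstanding writes.
 */
bool ft_poll(struct flush_timer *t, long now)
{
    if (t->armed && now >= t->deadline) {
        t->armed = false;
        return true;
    }
    return false;
}
```

Because the timer is armed by the first unflushed write and never
extended by later ones, no write stays unflushed for longer than N
seconds plus the polling interval, which is the upper bound the message
asks for.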
Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.
On 03/17/2010 06:06 PM, Paul Brook wrote:
>> On 03/16/2010 10:10 PM, Blue Swirl wrote:
>>>> Yes, and is what tlb_protect_code() does and it's called from
>>>> tb_alloc_page() which is what's called when a TB is created.
>>>
>>> Just a tangential note: a long time ago, I tried to disable self
>>> modifying code detection for Sparc. On most RISC architectures, SMC
>>> needs explicit flushing so in theory we need not track code memory
>>> writes. However, during exceptions the translator needs to access the
>>> original unmodified code that was used to generate the TB. But maybe
>>> there are other ways to avoid SMC tracking, on x86 it's still needed
>>
>> On x86 you're supposed to execute a serializing instruction (one of
>> INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
>> register, with the exception of MOV CR8), MOV (to debug register),
>> WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.
>
> Last time I checked, a jump instruction was sufficient to ensure coherency
> within a core. Serializing instructions are only required for coherency
> between cores on SMP systems.

Yeah, the docs say either a jump or a serializing instruction is needed.

> QEMU effectively has a very large physically tagged icache[1] with very
> expensive cache loads. AFAIK the only practical way to maintain that cache
> on x86 targets is to do write snooping via dirty bits. On targets that
> mandate explicit icache invalidation we might be able to get away with
> this, however I doubt it actually gains you anything - a correctly written
> guest is going to invalidate at least as much as we get from dirty
> tracking, and we still need to provide correct behaviour when executing
> with cache disabled.

Agreed.

>>> but I suppose SMC is pretty rare.
>>
>> Every time you demand load a code page from disk, you're running self
>> modifying code (though it usually doesn't exist in the tlb, so there's
>> no previous version that can cause trouble).
>
> I think you're confusing TLB flushes with TB flushes.

No - my thinking was page fault, load page, invlpg, continue. But the
invlpg is unneeded, and "continue" has to include a jump anyway.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
* Anthony Liguori [2010-03-17 10:55:47]:

> On 03/17/2010 10:14 AM, Chris Webb wrote:
> >Anthony Liguori writes:
> >
> >>This really gets down to your definition of "safe" behaviour. As it
> >>stands, if you suffer a power outage, it may lead to guest
> >>corruption.
> >>
> >>While we are correct in advertising a write-cache, write-caches are
> >>volatile and should a drive lose power, it could lead to data
> >>corruption. Enterprise disks tend to have battery backed write
> >>caches to prevent this.
> >>
> >>In the set up you're emulating, the host is acting as a giant write
> >>cache. Should your host fail, you can get data corruption.
> >Hi Anthony. I suspected my post might spark an interesting discussion!
> >
> >Before considering anything like this, we did quite a bit of testing with
> >OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
> >power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
> >NTFS filesystems despite these efforts.
> >
> >Is your claim here that:-
> >
> >   (a) qemu doesn't emulate a disk write cache correctly; or
> >
> >   (b) operating systems are inherently unsafe running on top of a disk with
> >       a write-cache; or
> >
> >   (c) installations that are already broken and lose data with a physical
> >       drive with a write-cache can lose much more in this case because the
> >       write cache is much bigger?
>
> This is the closest to the most accurate.
>
> It basically boils down to this: most enterprises use disks with
> battery backed write caches. Having the host act as a giant write
> cache means that you can lose data.
>

Dirty limits can help control how much we lose, but also affect how much
we write out.

> I agree that a well behaved file system will not become corrupt, but
> my contention is that for many types of applications, data loss ==
> corruption and not all file systems are well behaved. And it's
> certainly valid to argue about whether common filesystems are
> "broken" but from a purely pragmatic perspective, this is going to
> be the case.
>

I think it is a trade-off for end users to decide on. cache=writeback does
provide performance benefits, but can cause data loss.

--
	Three Cheers,
	Balbir
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 05:24 PM, Chris Webb wrote:
> Avi Kivity writes:
>
>> On 03/15/2010 10:23 PM, Chris Webb wrote:
>>
>>> Wasteful duplication of page cache between guest and host
>>> notwithstanding, turning on cache=writeback is a spectacular
>>> performance win for our guests.
>>
>> Is this with qcow2, raw file, or direct volume access?
>
> This is with direct access to logical volumes. No file systems or qcow2 in
> the stack. Our typical host has a couple of SATA disks, combined in md
> RAID1, chopped up into volumes with LVM2 (really just dm linear targets).
> The performance measured outside qemu is excellent, inside qemu-kvm is fine
> too until multiple guests are trying to access their drives at once, but
> then everything starts to grind badly.

OK.

>> I can understand it for qcow2, but for direct volume access this
>> shouldn't happen. The guest schedules as many writes as it can,
>> followed by a sync. The host (and disk) can then reschedule them
>> whether they are in the writeback cache or in the block layer, and
>> must sync in the same way once completed.
>
> I don't really understand what's going on here, but I wonder if the
> underlying problem might be that all the O_DIRECT/O_SYNC writes from the
> guests go down into the same block device at the bottom of the device
> mapper stack, and thus can't be reordered with respect to one another.

They should be reorderable. Otherwise host filesystems on several
volumes would suffer the same problems.

Whether the filesystem is in the host or guest shouldn't matter.

> For our purposes,
>
>   Guest A   Guest B       Guest A   Guest B       Guest A   Guest B
>   write A1                write A1                          write B1
>             write B1      write A2                write A1
>   write A2                          write B1      write A2
>
> are all equivalent, but the system isn't allowed to reorder in this way
> because there isn't a separate request queue for each logical volume, just
> the one at the bottom. (I don't know whether nested request queues would
> behave remotely reasonably either, though!)
>
> Also, if my guest kernel issues (say) three small writes, one at the start
> of the disk, one in the middle, one at the end, and then does a flush, can
> virtio really express this as one non-contiguous O_DIRECT write (the three
> components of which can be reordered by the elevator with respect to one
> another) rather than three distinct O_DIRECT writes which can't be
> permuted? Can qemu issue a write like that? cache=writeback + flush allows
> this to be optimised by the block layer in the normal way.

Guest side virtio will send this as three requests followed by a flush.
Qemu will issue these as three distinct requests and then flush. The
requests are marked, as Christoph says, in a way that limits their
reorderability, and perhaps if we fix these two problems performance
will improve.

Something that comes to mind is merging of flush requests. If N guests
issue one write and one flush each, we should issue N writes and just
one flush - a flush for the disk applies to all volumes on that disk.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.
> On 03/16/2010 10:10 PM, Blue Swirl wrote:
> >> Yes, and is what tlb_protect_code() does and it's called from
> >> tb_alloc_page() which is what's called when a TB is created.
> >
> > Just a tangential note: a long time ago, I tried to disable self
> > modifying code detection for Sparc. On most RISC architectures, SMC
> > needs explicit flushing so in theory we need not track code memory
> > writes. However, during exceptions the translator needs to access the
> > original unmodified code that was used to generate the TB. But maybe
> > there are other ways to avoid SMC tracking, on x86 it's still needed
>
> On x86 you're supposed to execute a serializing instruction (one of
> INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
> register, with the exception of MOV CR8), MOV (to debug register),
> WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.

Last time I checked, a jump instruction was sufficient to ensure coherency
within a core. Serializing instructions are only required for coherency
between cores on SMP systems.

QEMU effectively has a very large physically tagged icache[1] with very
expensive cache loads. AFAIK the only practical way to maintain that cache
on x86 targets is to do write snooping via dirty bits. On targets that
mandate explicit icache invalidation we might be able to get away with
this, however I doubt it actually gains you anything - a correctly written
guest is going to invalidate at least as much as we get from dirty
tracking, and we still need to provide correct behaviour when executing
with cache disabled.

> > but I suppose SMC is pretty rare.
>
> Every time you demand load a code page from disk, you're running self
> modifying code (though it usually doesn't exist in the tlb, so there's
> no previous version that can cause trouble).

I think you're confusing TLB flushes with TB flushes.

Paul

[1] Even modern x86 CPUs only have a relatively small icache. The large
L2/L3 caches aren't relevant as they are unified I/D caches.
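
The "write snooping via dirty bits" approach described in this message
can be pictured with a toy model. The sketch below is an illustration
only and does not reproduce QEMU's data structures or function names:
`struct toy_code_cache`, `toy_translate()` and `toy_store()` are invented
here, and a single flag per page stands in for the real dirty-bit
bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

struct toy_code_cache {
    bool has_code[TOY_NPAGES];   /* translations exist for this guest page */
    unsigned flushes;            /* times translations had to be dropped */
};

/* Translating code from a page marks it so that later stores are snooped. */
void toy_translate(struct toy_code_cache *c, size_t page)
{
    assert(page < TOY_NPAGES);
    c->has_code[page] = true;
}

/* Every guest store checks the mark; a hit discards the stale translations. */
void toy_store(struct toy_code_cache *c, size_t page)
{
    assert(page < TOY_NPAGES);
    if (c->has_code[page]) {
        c->has_code[page] = false;
        c->flushes++;
    }
}
```

Stores to pages that hold no translated code cost only the flag check,
which is why the scheme works without any cooperation from the guest; the
expensive part is retranslating after a hit.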
Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1
On 03/08/2010 08:34 AM, Chris Webb wrote:
> During boot, the screen gets resized to height 1 and a mouse click at
> this point will cause a division by zero when calculating the absolute
> pointer position from the pixel (x, y). Return a click in the middle of
> the screen instead in this case.
>
> Signed-off-by: Chris Webb

Applied. Thanks.

Regards,

Anthony Liguori

> ---
>  vnc.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/vnc.c b/vnc.c
> index 01353a9..676a707 100644
> --- a/vnc.c
> +++ b/vnc.c
> @@ -1457,8 +1457,10 @@ static void pointer_event(VncState *vs, int button_mask, int x, int y)
>          dz = 1;
>
>      if (vs->absolute) {
> -        kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
> -                        y * 0x7FFF / (ds_get_height(vs->ds) - 1),
> +        kbd_mouse_event(ds_get_width(vs->ds) > 1 ?
> +                          x * 0x7FFF / (ds_get_width(vs->ds) - 1) : 0x4000,
> +                        ds_get_height(vs->ds) > 1 ?
> +                          y * 0x7FFF / (ds_get_height(vs->ds) - 1) : 0x4000,
>                          dz, buttons);
>      } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
>          x -= 0x7FFF;
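
The guard added by this patch can be isolated into a small standalone
function to make the failing case easier to see. This is a sketch for
illustration: `scale_abs_coord()` is a hypothetical helper, not a
function in vnc.c, and it takes the display extent as a plain integer
instead of calling ds_get_width() or ds_get_height().

```c
#include <assert.h>

/*
 * Maps a pixel coordinate in [0, extent - 1] onto the absolute pointer
 * range [0, 0x7FFF]. With extent == 1 the divisor (extent - 1) would be
 * zero, so the midpoint 0x4000 is reported instead, as in the patch.
 */
int scale_abs_coord(int pos, int extent)
{
    assert(pos >= 0);
    if (extent <= 1)
        return 0x4000;
    return pos * 0x7FFF / (extent - 1);
}
```

For a 640-pixel-wide display, column 0 maps to 0 and column 639 maps to
0x7FFF; for the height-1 screen seen during boot, every click maps to
0x4000 rather than dividing by zero.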
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 10:14 AM, Chris Webb wrote:
> Anthony Liguori writes:
>
>> This really gets down to your definition of "safe" behaviour. As it
>> stands, if you suffer a power outage, it may lead to guest
>> corruption.
>>
>> While we are correct in advertising a write-cache, write-caches are
>> volatile and should a drive lose power, it could lead to data
>> corruption. Enterprise disks tend to have battery backed write
>> caches to prevent this.
>>
>> In the set up you're emulating, the host is acting as a giant write
>> cache. Should your host fail, you can get data corruption.
>
> Hi Anthony. I suspected my post might spark an interesting discussion!
>
> Before considering anything like this, we did quite a bit of testing with
> OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool
> power off to kill the host. I didn't manage to corrupt any ext3, ext4 or
> NTFS filesystems despite these efforts.
>
> Is your claim here that:-
>
>   (a) qemu doesn't emulate a disk write cache correctly; or
>
>   (b) operating systems are inherently unsafe running on top of a disk with
>       a write-cache; or
>
>   (c) installations that are already broken and lose data with a physical
>       drive with a write-cache can lose much more in this case because the
>       write cache is much bigger?

This is the closest to being accurate.

It basically boils down to this: most enterprises use disks with
battery backed write caches. Having the host act as a giant write cache
means that you can lose data.

I agree that a well behaved file system will not become corrupt, but my
contention is that for many types of applications, data loss ==
corruption and not all file systems are well behaved. And it's
certainly valid to argue about whether common filesystems are "broken"
but from a purely pragmatic perspective, this is going to be the case.

Regards,

Anthony Liguori
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Avi Kivity writes: > On 03/15/2010 10:23 PM, Chris Webb wrote: > > >Wasteful duplication of page cache between guest and host notwithstanding, > >turning on cache=writeback is a spectacular performance win for our guests. > > Is this with qcow2, raw file, or direct volume access? This is with direct access to logical volumes. No file systems or qcow2 in the stack. Our typical host has a couple of SATA disks, combined in md RAID1, chopped up into volumes with LVM2 (really just dm linear targets). The performance measured outside qemu is excellent, inside qemu-kvm is fine too until multiple guests are trying to access their drives at once, but then everything starts to grind badly. > I can understand it for qcow2, but for direct volume access this > shouldn't happen. The guest schedules as many writes as it can, > followed by a sync. The host (and disk) can then reschedule them > whether they are in the writeback cache or in the block layer, and > must sync in the same way once completed. I don't really understand what's going on here, but I wonder if the underlying problem might be that all the O_DIRECT/O_SYNC writes from the guests go down into the same block device at the bottom of the device mapper stack, and thus can't be reordered with respect to one another. For our purposes, Guest AA Guest BB Guest AA Guest BB Guest AA Guest BB write A1 write A1 write B1 write B1 write A2 write A1 write A2 write B1 write A2 are all equivalent, but the system isn't allowed to reorder in this way because there isn't a separate request queue for each logical volume, just the one at the bottom. (I don't know whether nested request queues would behave remotely reasonably either, though!) 
Also, if my guest kernel issues (say) three small writes, one at the start of the disk, one in the middle, one at the end, and then does a flush, can virtio really express this as one non-contiguous O_DIRECT write (the three components of which can be reordered by the elevator with respect to one another) rather than three distinct O_DIRECT writes which can't be permuted? Can qemu issue a write like that? cache=writeback + flush allows this to be optimised by the block layer in the normal way. Cheers, Chris. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
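The pattern described in the message above (three small writes at the start, middle and end of a disk, followed by one flush) can be sketched from the host's point of view. This is only an illustration under stated assumptions: it uses an ordinary file rather than a block device, the helper name write_batch_then_flush is hypothetical, and os.fsync() stands in for the guest's flush. It does not show what qemu or virtio actually do.

```python
import os
import tempfile


def write_batch_then_flush(path, writes):
    """Issue scattered buffered writes, then a single flush.

    `writes` is a list of (offset, bytes) pairs. Without O_DIRECT or O_SYNC
    the writes land in the host page cache, so the kernel is free to order
    them as it likes before the fsync() forces them out.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        for offset, data in writes:
            os.pwrite(fd, data, offset)
        os.fsync(fd)  # plays the role of the guest's flush
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    # One write at the start, one in the middle, one at the end.
    write_batch_then_flush(path, [(0, b"start"), (4096, b"middle"), (8192, b"end")])
    print(os.path.getsize(path))
    os.unlink(path)
```

The point of the sketch is only that the three writes carry no ordering constraint relative to one another until the flush, which is the freedom the message argues is lost when each write is issued synchronously.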
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Anthony Liguori writes: > This really gets down to your definition of "safe" behaviour. As it > stands, if you suffer a power outage, it may lead to guest > corruption. > > While we are correct in advertising a write-cache, write-caches are > volatile and should a drive lose power, it could lead to data > corruption. Enterprise disks tend to have battery backed write > caches to prevent this. > > In the set up you're emulating, the host is acting as a giant write > cache. Should your host fail, you can get data corruption. Hi Anthony. I suspected my post might spark an interesting discussion! Before considering anything like this, we did quite a bit of testing with OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool power off to kill the host. I didn't manage to corrupt any ext3, ext4 or NTFS filesystems despite these efforts. Is your claim here that:- (a) qemu doesn't emulate a disk write cache correctly; or (b) operating systems are inherently unsafe running on top of a disk with a write-cache; or (c) installations that are already broken and lose data with a physical drive with a write-cache can lose much more in this case because the write cache is much bigger? Following Christoph Hellwig's patch series from last September, I'm pretty convinced that (a) isn't true apart from the inability to disable the write-cache at run-time, which is something that neither recent linux nor windows seem to want to do out-of-the box. Given that modern SATA drives come with fairly substantial write-caches nowadays which operating systems leave on without widespread disaster, I don't really believe in (b) either, at least for the ide and scsi case. Filesystems know they have to flush the disk cache to avoid corruption. (Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so I know virtio-blk has to be avoided for current windows and obsolete linux when writeback caching is on.) 
I can certainly imagine (c) might be the case, although when I use strace to watch the IO to the block device, I see pretty regular fdatasyncs being issued by the guests, interleaved with the writes, so I'm not sure how likely the problem would be in practice. Perhaps my test guests were unrepresentatively well-behaved. However, the potentially unlimited time-window for loss of incorrectly unsynced data is also something one could imagine fixing at the qemu level. Perhaps I should be implementing something like cache=writeback,flushtimeout=N which, upon a write being issued to the block device, starts an N second timer if it isn't already running. The timer is destroyed on flush, and if it expires before it's destroyed, a gratuitous flush is sent. Do you think this is worth doing? Just a simple 'while sleep 10; do sync; done' on the host even! We've used cache=none and cache=writethrough, and whilst performance is fine with a single guest accessing a disk, when we chop the disks up with LVM and run even a small handful of guests, the constant seeking to serve tiny synchronous IOs leads to truly abysmal throughput---we've seen less than 700kB/s streaming write rates within guests when the backing store is capable of 100MB/s. With cache=writeback, there's still IO contention between guests, but the write granularity is a bit coarser, so the host's elevator seems to get a bit more of a chance to help us out and we can at least squeeze out 5-10MB/s from two or three concurrently running guests, getting a total of 20-30% of the performance of the underlying block device rather than a total of around 5%. Cheers, Chris.
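The cache=writeback,flushtimeout=N idea floated in the message above can be sketched as a small state machine. This is a model of the proposal as worded there, not qemu code: the option does not exist, and the class name FlushTimer, its methods, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions made for illustration.

```python
class FlushTimer:
    """Model of the proposed flushtimeout=N behaviour (hypothetical option)."""

    def __init__(self, timeout, gratuitous_flush):
        self.timeout = timeout                    # N seconds
        self.gratuitous_flush = gratuitous_flush  # callable that syncs the device
        self.deadline = None                      # None means no timer is running

    def on_write(self, now):
        # Start the timer only if it is not already running.
        if self.deadline is None:
            self.deadline = now + self.timeout

    def on_guest_flush(self):
        # A flush issued by the guest destroys the timer.
        self.deadline = None

    def tick(self, now):
        # Called periodically; sends a gratuitous flush if the timer expired.
        if self.deadline is not None and now >= self.deadline:
            self.gratuitous_flush()
            self.deadline = None
```

With N = 10, a write at time 0 followed by another at time 4 leaves the deadline at 10, so unsynced data never sits in the host cache for longer than N seconds after the first write of a batch.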
[PATCH] vhost: fix interrupt mitigation with raw sockets
A thinko in code means we never trigger interrupt mitigation. Fix this. Reported-by: Juan Quintela Reported-by: Unai Uribarri Signed-off-by: Michael S. Tsirkin --- drivers/vhost/net.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index fcafb6b..a6a88df 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -125,7 +125,7 @@ static void handle_tx(struct vhost_net *net) mutex_lock(&vq->mutex); vhost_disable_notify(vq); - if (wmem < sock->sk->sk_sndbuf * 2) + if (wmem < sock->sk->sk_sndbuf / 2) tx_poll_stop(net); hdr_size = vq->hdr_size; -- 1.7.0.18.g0d53a5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
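The one-character change in the patch above is easier to see with numbers. The sketch below only models the comparison from handle_tx; the function name and the `buggy` switch are hypothetical, and it assumes that wmem (presumably the send-buffer memory currently in use) stays at or below sk_sndbuf in normal operation.

```python
def stops_tx_poll(wmem, sndbuf, buggy=False):
    """Return True when the check in handle_tx would call tx_poll_stop()."""
    # Original code compared against twice the send buffer size;
    # the fix compares against half of it (integer division, as in C).
    threshold = sndbuf * 2 if buggy else sndbuf // 2
    return wmem < threshold


if __name__ == "__main__":
    sndbuf = 1000
    for wmem in (100, 600, 900):
        print(wmem, stops_tx_poll(wmem, sndbuf, buggy=True), stops_tx_poll(wmem, sndbuf))
```

Under that assumption the `* 2` form is true for every wmem value, so the two branches of the check were never distinguished; with `/ 2` the condition holds only while less than half of the buffer is in use.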
[PATCH] vhost: fix error handling in vring ioctls
Stanse found a locking problem in vhost_set_vring: several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL, VHOST_SET_VRING_ERR with the vq->mutex held. Fix these up. Reported-by: Jiri Slaby Signed-off-by: Michael S. Tsirkin --- drivers/vhost/vhost.c | 18 -- 1 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 7cd55e0..7bd7a1e 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -476,8 +476,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp) if (r < 0) break; eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd); - if (IS_ERR(eventfp)) - return PTR_ERR(eventfp); + if (IS_ERR(eventfp)) { + r = PTR_ERR(eventfp); + break; + } if (eventfp != vq->kick) { pollstop = filep = vq->kick; pollstart = vq->kick = eventfp; @@ -489,8 +491,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp) if (r < 0) break; eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd); - if (IS_ERR(eventfp)) - return PTR_ERR(eventfp); + if (IS_ERR(eventfp)) { + r = PTR_ERR(eventfp); + break; + } if (eventfp != vq->call) { filep = vq->call; ctx = vq->call_ctx; @@ -505,8 +509,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp) if (r < 0) break; eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd); - if (IS_ERR(eventfp)) - return PTR_ERR(eventfp); + if (IS_ERR(eventfp)) { + r = PTR_ERR(eventfp); + break; + } if (eventfp != vq->error) { filep = vq->error; vq->error = eventfp; -- 1.7.0.18.g0d53a5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
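The class of bug fixed by the patch above (returning from the middle of a function while a mutex is held) can be shown in miniature. This is a Python model, not the kernel code: VirtQueue, lookup_eventfd and both set_vring_kick_* functions are hypothetical stand-ins, and threading.Lock plays the role of vq->mutex.

```python
import errno
import threading


class VirtQueue:
    def __init__(self):
        self.mutex = threading.Lock()
        self.kick = None


def lookup_eventfd(fd, table):
    """Stand-in for eventfd_fget(): return an object or a negative errno."""
    return table.get(fd, -errno.EBADF)


def set_vring_kick_buggy(vq, fd, table):
    vq.mutex.acquire()
    eventfp = lookup_eventfd(fd, table)
    if isinstance(eventfp, int) and eventfp < 0:
        return eventfp          # early return: the mutex is never released
    vq.kick = eventfp
    vq.mutex.release()
    return 0


def set_vring_kick_fixed(vq, fd, table):
    vq.mutex.acquire()
    r = 0
    eventfp = lookup_eventfd(fd, table)
    if isinstance(eventfp, int) and eventfp < 0:
        r = eventfp             # record the error and fall through
    else:
        vq.kick = eventfp
    vq.mutex.release()          # the single exit path always unlocks
    return r
```

The fixed variant mirrors the patch's shape: the error is stored in r and control falls through to the common unlock, instead of returning directly. In idiomatic Python a `with vq.mutex:` block would give the same guarantee.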
Re: [PATCH 05/10] Don't call apic functions directly from kvm code
On Tue, Mar 09, 2010 at 03:27:02PM +0200, Avi Kivity wrote: > On 02/26/2010 10:12 PM, Glauber Costa wrote: > >It is actually not necessary to call a tpr function to save and load cr8, > >as cr8 is part of the processor state, and thus, it is much easier > >to just add it to CPUState. > > > >As for apic base, wrap kvm usages, so we can call either the qemu device, > >or the in kernel version. > > > > > > } > > > >+static void kvm_set_apic_base(CPUState *env, uint64_t val) > >+{ > >+if (!kvm_irqchip_in_kernel()) > >+cpu_set_apic_base(env, val); > > What if it is in kernel? Just ignored? Doesn't seem right. At this point it is right, because there is no irqchip in kernel yet. In a later patch, irqchip in kernel begins to exist, and this function gets filled.
Re: [Autotest] [Autotest PATCH] KVM-test: Add a subtest 'qemu_img'
Copying Michael on the message. Hi Yolkfull, I have reviewed this patch and I have some comments to make on it, similar to the ones I made on an earlier version of it: One of the things that I noticed is that this patch doesn't work very well out of the box: [...@freedom kvm]$ ./scan_results.py TestStatus Seconds Info -- --- (Result file: ../../results/default/status) smp2.Fedora.11.64.qemu_img.checkGOOD47 completed successfully smp2.Fedora.11.64.qemu_img.create GOOD44 completed successfully smp2.Fedora.11.64.qemu_img.convert.to_qcow2 FAIL45 Image converted failed; Command: /usr/bin/qemu-img convert -f qcow2 -O qcow2 /tmp/kvm_autotest_root/images/fc11-64.qcow2 /tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_qcow2;Output is: qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2' smp2.Fedora.11.64.qemu_img.convert.to_raw FAIL46 Image converted failed; Command: /usr/bin/qemu-img convert -f qcow2 -O raw /tmp/kvm_autotest_root/images/fc11-64.qcow2 /tmp/kvm_autotest_root/images/fc11-64.qcow2.converted_raw;Output is: qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2' smp2.Fedora.11.64.qemu_img.snapshot FAIL44 Create snapshot failed via command: /usr/bin/qemu-img snapshot -c snapshot0 /tmp/kvm_autotest_root/images/fc11-64.qcow2;Output is: qemu-img: Could not open '/tmp/kvm_autotest_root/images/fc11-64.qcow2' smp2.Fedora.11.64.qemu_img.commit GOOD44 completed successfully smp2.Fedora.11.64.qemu_img.info FAIL44 Unhandled str: Unhandled TypeError: argument of type 'NoneType' is not iterable smp2.Fedora.11.64.qemu_img.rebase TEST_NA 43 Current kvm user space version does not support 'rebase' subcommand GOOD412 We need to fix that before upstream inclusion. Also, one thing that I've noticed is that this test doesn't depend of any other variants, so we don't need to repeat it to every combination of guest and qemu command line options. 
Michael, does it occur to you a way to get this test out of the variants block, so it gets executed only once per job and not every combination of guest and other qemu options? On Fri, Jan 29, 2010 at 4:00 AM, Yolkfull Chow wrote: > This is designed to test all subcommands of 'qemu-img' however > so far 'commit' is not implemented. > > * For 'check' subcommand test, it will 'dd' to create a file with specified > size and see whether it's supported to be checked. Then convert it to be > supported formats ('qcow2' and 'raw' so far) to see whether there's error > after convertion. > > * For 'convert' subcommand test, it will convert both to 'qcow2' and 'raw' > from > the format specified in config file. And only check 'qcow2' after convertion. > > * For 'snapshot' subcommand test, it will create two snapshots and list them. > Finally delete them if no errors found. > > * For 'info' subcommand test, it will check image format & size according to > output of 'info' subcommand at specified image file. > > * For 'rebase' subcommand test, it will create first snapshot 'sn1' based on > original > base_img, and create second snapshot based on sn1. And then rebase sn2 to > base_img. > After rebase check the baking_file of sn2. > > This supports two rebase mode: unsafe mode and safe mode: > Unsafe mode: > With -u an unsafe mode is enabled that doesn't require the backing files to > exist. > It merely changes the backing file reference in the COW image. This is useful > for > renaming or moving the backing file. The user is responsible to make sure > that the > new backing file has no changes compared to the old one, or corruption may > occur. > > Safe Mode: > Both the current and the new backing file need to exist, and after the > rebase, the > COW image is guaranteed to have the same guest visible content as before. > To achieve this, old and new backing file are compared and, if necessary, > data is > copied from the old backing file into the COW image. 
> > Signed-off-by: Yolkfull Chow > --- > client/tests/kvm/tests/qemu_img.py | 235 > > client/tests/kvm/tests_base.cfg.sample | 40 ++ > 2 files changed, 275 insertions(+), 0 deletions(-) > create mode 100644 client/tests/kvm/tests/qemu_img.py > > diff --git a/client/tests/kvm/tests/qemu_img.py > b/client/tests/kvm/tests/qemu_img.py > new file mode 100644 > index 000..e6352a0 > --- /dev/null > +++ b/client/tests/kvm/tests/qemu_img.py > @@ -0,0 +1,235 @@ > +import re, os, logging, commands > +from autotest_lib.client.common_lib
Re: [PATCH] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
Avi Kivity wrote: > On 11/12/2009 02:04 AM, Jan Kiszka wrote: >> This new IOCTL exports all yet user-invisible states related to >> exceptions, interrupts, and NMIs. Together with appropriate user space >> changes, this fixes sporadic problems of vmsave/restore, live migration >> and system reset. >> >> > > Applied, thanks. I added a flags field to the structure in case we > discover a new bit that needs to fit in there. Please take a look > (separate commit in kvm-next). > So without this patch migration fails? Sounds like a stable candidate to me. Same goes for the follow-up that adds the shadow field. Alex
Re: Corrupt qcow2 image, recovery?
Liang Guo gave me this advice some weeks ago: > you may use kvm-nbd or qemu-nbd to present kvm image > as a NBD device, so > that you can use nbd-client to access them. eg: > > kvm-nbd /vm/sid1.img > modprobe nbd > nbd-client localhost 1024 /dev/nbd0 > fdisk -l /dev/nbd0 Didn't work for me because I always got a segfault but maybe it works for you. - Robert On 03/16/10 19:21, Christian Nilsson wrote: > Hi! > > I'm running kvm / qemu-kvm on a couple of production servers everything (or > at least most things) works as it should. > However today someone thought it was a "good" idea to restart one of the > servers and after that the windows 2k3 guest on that server doesn't boot > anymore. > > kvm on this server is a bit "outdated": "QEMU PC emulator version 0.9.1 > (kvm-83)" > (I guess this is one of the qcow2 corruption bugs, and i can only blame > myself for not upgrading kvm sooner.) > The guest.qcow2 is a 21GiB file for a 60GiB disk > > i have tried a couple of things kvm-img convert -f qcow2 -O raw guest.qcow2 > guest.raw > this stops and does nothing after creating a guest.raw that is 60GiB but only > using 60MiB > > so mounted the fs from another server running: "QEMU PC emulator version > 0.12.1 (qemu-kvm-0.12.1.2)" > > and run qemu-img with the same options as above and after a few secs got > "qemu-img: error while reading" > and the same 60MiB used by guest.raw > > i also tried booting qemu-kvm with a linux guest and this qcow2 image but > only get I/O Errors (and no partitions found) > > # qemu-img check guest.qcow2 > ERROR: invalid cluster offset=0x10a000 > ERROR OFLAG_COPIED: l2_offset=ee73 refcount=1 > ERROR l2_offset=ee73: Table is not cluster aligned; L1 entry corrupted > ERROR: invalid cluster offset=0x11d44100080 > ERROR: invalid cluster offset=0x11d61600080 > ERROR: invalid cluster offset=0x11d68600080 > ERROR: invalid cluster offset=0x11d95300080 > (and a lot more in this style, full log can be provided if > it would be of help to anybody) > > > >
is there any possibility to repair this file, or convert it to a RAW file > (even with parts padded that are not "safe" from the qcow2 image), or as a > last resort, are there any debug tools for qcow2 images that might be of use? > > I have read up on the qcow fileformat but right now i'm a bit short of time, > (i need the data in this guest's disk image, or at least the MS SQL datafiles > that are on this disk) i have also checked the qcow2 file and it does contain a > NTLDR string and a lot of other NTFS recognized strings so i know that all > data is not gone. the question is how can i access it as a Filesystem again? > > > Any help would be appreciated! > > Regards > Christian Nilsson
[PATCHv6 4/4] virtio-pci: irqfd support
Use irqfd when supported by kernel. This uses msix mask notifiers: when vector is masked, we poll it from userspace. When it is unmasked, we poll it from kernel. Signed-off-by: Michael S. Tsirkin --- hw/virtio-pci.c | 27 +++ 1 files changed, 27 insertions(+), 0 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 4255d98..f8d8022 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -402,6 +402,27 @@ static void virtio_pci_guest_notifier_read(void *opaque) } } +static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector, +void *opaque, int masked) +{ +VirtQueue *vq = opaque; +EventNotifier *notifier = virtio_queue_get_guest_notifier(vq); +int r = kvm_set_irqfd(dev->msix_irq_entries[vector].gsi, + event_notifier_get_fd(notifier), + !masked); +if (r < 0) { +return (r == -ENOSYS) ? 0 : r; +} +if (masked) { +qemu_set_fd_handler(event_notifier_get_fd(notifier), +virtio_pci_guest_notifier_read, NULL, vq); +} else { +qemu_set_fd_handler(event_notifier_get_fd(notifier), +NULL, NULL, NULL); +} +return 0; +} + static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign) { VirtIOPCIProxy *proxy = opaque; @@ -415,7 +436,11 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign) } qemu_set_fd_handler(event_notifier_get_fd(notifier), virtio_pci_guest_notifier_read, NULL, vq); +msix_set_mask_notifier(&proxy->pci_dev, + virtio_queue_vector(proxy->vdev, n), vq); } else { +msix_set_mask_notifier(&proxy->pci_dev, + virtio_queue_vector(proxy->vdev, n), NULL); qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL); event_notifier_cleanup(notifier); @@ -500,6 +525,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, proxy->pci_dev.config_write = virtio_write_config; +proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier; + size = VIRTIO_PCI_REGION_SIZE(&proxy->pci_dev) + vdev->config_len; if (size & (size-1)) size = 1 << qemu_fls(size); -- 1.7.0.18.g0d53a5 -- To unsubscribe from this 
list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
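The control flow of virtio_pci_mask_notifier in the patch above ("when vector is masked, we poll it from userspace; when it is unmasked, we poll it from kernel") can be modelled in a few lines. This is a sketch, not qemu code: the two callables passed in are hypothetical stand-ins for kvm_set_irqfd() and qemu_set_fd_handler(), and only the decision logic is reproduced.

```python
import errno


def mask_notifier(masked, set_irqfd, set_userspace_handler):
    """Return 0 on success or a negative errno, following the patch's flow."""
    # The irqfd is assigned while the vector is unmasked, deassigned otherwise.
    r = set_irqfd(assign=not masked)
    if r < 0:
        # -ENOSYS means no irqfd support: leave the handler as it is.
        return 0 if r == -errno.ENOSYS else r
    # Masked vectors are polled from userspace, unmasked ones by the kernel.
    set_userspace_handler(enabled=masked)
    return 0
```

Treating -ENOSYS as success is what lets the same code run on kernels without irqfd: the userspace handler installed earlier simply stays in place.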
[PATCHv6 3/4] msix: add mask/unmask notifiers
Support per-vector callbacks for msix mask/unmask. Will be used for vhost net. Signed-off-by: Michael S. Tsirkin --- hw/msix.c | 36 +++- hw/msix.h |1 + hw/pci.h |6 ++ 3 files changed, 42 insertions(+), 1 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index faee0b2..3ec8805 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -317,6 +317,13 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr, if (kvm_enabled() && kvm_irqchip_in_kernel()) { kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector)); } +if (was_masked != msix_is_masked(dev, vector) && +dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) { +int r = dev->msix_mask_notifier(dev, vector, + dev->msix_mask_notifier_opaque[vector], + msix_is_masked(dev, vector)); +assert(r >= 0); +} msix_handle_mask_update(dev, vector); } @@ -355,10 +362,18 @@ void msix_mmio_map(PCIDevice *d, int region_num, static void msix_mask_all(struct PCIDevice *dev, unsigned nentries) { -int vector; +int vector, r; for (vector = 0; vector < nentries; ++vector) { unsigned offset = vector * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL; +int was_masked = msix_is_masked(dev, vector); dev->msix_table_page[offset] |= MSIX_VECTOR_MASK; +if (was_masked != msix_is_masked(dev, vector) && +dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) { +r = dev->msix_mask_notifier(dev, vector, +dev->msix_mask_notifier_opaque[vector], +msix_is_masked(dev, vector)); +assert(r >= 0); +} } } @@ -381,6 +396,9 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries, sizeof *dev->msix_irq_entries); } #endif +dev->msix_mask_notifier_opaque = +qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque); +dev->msix_mask_notifier = NULL; dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES * sizeof *dev->msix_entry_used); @@ -443,6 +461,8 @@ int msix_uninit(PCIDevice *dev) dev->msix_entry_used = NULL; qemu_free(dev->msix_irq_entries); dev->msix_irq_entries = NULL; +qemu_free(dev->msix_mask_notifier_opaque); 
+dev->msix_mask_notifier_opaque = NULL; dev->cap_present &= ~QEMU_PCI_CAP_MSIX; return 0; } @@ -586,3 +606,17 @@ void msix_unuse_all_vectors(PCIDevice *dev) return; msix_free_irq_entries(dev); } + +int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) +{ +int r = 0; +if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) +return 0; + +if (dev->msix_mask_notifier) +r = dev->msix_mask_notifier(dev, vector, opaque, +msix_is_masked(dev, vector)); +if (r >= 0) +dev->msix_mask_notifier_opaque[vector] = opaque; +return r; +} diff --git a/hw/msix.h b/hw/msix.h index a9f7993..f167231 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -33,4 +33,5 @@ void msix_reset(PCIDevice *dev); extern int msix_supported; +int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque); #endif diff --git a/hw/pci.h b/hw/pci.h index 1eab8f2..100104c 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -136,6 +136,9 @@ enum { #define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10 #define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10 +typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector, + void *opaque, int masked); + struct PCIDevice { DeviceState qdev; /* PCI config space */ @@ -201,6 +204,9 @@ struct PCIDevice { struct kvm_irq_routing_entry *msix_irq_entries; +void **msix_mask_notifier_opaque; +msix_mask_notifier_func msix_mask_notifier; + /* Device capability configuration space */ struct { int supported; -- 1.7.0.18.g0d53a5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
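The per-vector notifier added by the patch above fires only when a vector's mask state actually changes and only for vectors that have a registered opaque pointer. A minimal model of that rule follows. It is an illustration under assumptions: the class and method names are hypothetical, vectors are assumed to start masked, and the range and "entry used" checks from msix_set_mask_notifier are left out.

```python
class MsixVectorTable:
    """Toy model of the mask/unmask notifier bookkeeping."""

    def __init__(self, nentries, notifier):
        self.masked = [True] * nentries   # assumption: vectors start masked
        self.opaque = [None] * nentries   # per-vector callback argument
        self.notifier = notifier          # one callback for the whole device

    def set_mask_notifier(self, vector, opaque):
        # Report the current state first; keep the opaque only on success.
        r = 0
        if self.notifier:
            r = self.notifier(vector, opaque, self.masked[vector])
        if r >= 0:
            self.opaque[vector] = opaque
        return r

    def set_masked(self, vector, masked):
        was_masked = self.masked[vector]
        self.masked[vector] = masked
        # Notify only on a real transition for a registered vector.
        if (was_masked != masked and self.notifier
                and self.opaque[vector] is not None):
            r = self.notifier(vector, self.opaque[vector], masked)
            assert r >= 0
```

Writing the same mask value twice therefore produces a single callback, which matches the was_masked comparison in the patch.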
[PATCHv6 2/4] kvm: irqfd support
Add API to assign/deassign irqfd to kvm. Add stub so that users do not have to use ifdefs. Signed-off-by: Michael S. Tsirkin --- kvm-all.c | 19 +++ kvm.h | 10 ++ 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 7b05462..1a15662 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1200,5 +1200,24 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) } #endif +#if defined(KVM_IRQFD) +int kvm_set_irqfd(int gsi, int fd, bool assigned) +{ +struct kvm_irqfd irqfd = { +.fd = fd, +.gsi = gsi, +.flags = assigned ? 0 : KVM_IRQFD_FLAG_DEASSIGN, +}; +int r; +if (!kvm_enabled() || !kvm_irqchip_in_kernel()) +return -ENOSYS; + +r = kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd); +if (r < 0) +return r; +return 0; +} +#endif + #undef PAGE_SIZE #include "qemu-kvm.c" diff --git a/kvm.h b/kvm.h index 0951380..72dcaca 100644 --- a/kvm.h +++ b/kvm.h @@ -180,4 +180,14 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign) } #endif +#if defined(KVM_IRQFD) && defined(CONFIG_KVM) +int kvm_set_irqfd(int gsi, int fd, bool assigned); +#else +static inline +int kvm_set_irqfd(int gsi, int fd, bool assigned) +{ +return -ENOSYS; +} +#endif + #endif -- 1.7.0.18.g0d53a5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv6 1/4] qemu-kvm: add vhost.h header
This makes it possible to build vhost support on systems which do not have this header. Signed-off-by: Michael S. Tsirkin --- kvm/include/linux/vhost.h | 130 + 1 files changed, 130 insertions(+), 0 deletions(-) create mode 100644 kvm/include/linux/vhost.h diff --git a/kvm/include/linux/vhost.h b/kvm/include/linux/vhost.h new file mode 100644 index 000..165a484 --- /dev/null +++ b/kvm/include/linux/vhost.h @@ -0,0 +1,130 @@ +#ifndef _LINUX_VHOST_H +#define _LINUX_VHOST_H +/* Userspace interface for in-kernel virtio accelerators. */ + +/* vhost is used to reduce the number of system calls involved in virtio. + * + * Existing virtio net code is used in the guest without modification. + * + * This header includes interface used by userspace hypervisor for + * device configuration. + */ + +#include + +#include +#include +#include + +struct vhost_vring_state { + unsigned int index; + unsigned int num; +}; + +struct vhost_vring_file { + unsigned int index; + int fd; /* Pass -1 to unbind from file. */ + +}; + +struct vhost_vring_addr { + unsigned int index; + /* Option flags. */ + unsigned int flags; + /* Flag values: */ + /* Whether log address is valid. If set enables logging. */ +#define VHOST_VRING_F_LOG 0 + + /* Start of array of descriptors (virtually contiguous) */ + __u64 desc_user_addr; + /* Used structure address. Must be 32 bit aligned */ + __u64 used_user_addr; + /* Available structure address. Must be 16 bit aligned */ + __u64 avail_user_addr; + /* Logging support. */ + /* Log writes to used structure, at offset calculated from specified +* address. Address must be 32 bit aligned. */ + __u64 log_guest_addr; +}; + +struct vhost_memory_region { + __u64 guest_phys_addr; + __u64 memory_size; /* bytes */ + __u64 userspace_addr; + __u64 flags_padding; /* No flags are currently specified. */ +}; + +/* All region addresses and sizes must be 4K aligned. 
*/ +#define VHOST_PAGE_SIZE 0x1000 + +struct vhost_memory { + __u32 nregions; + __u32 padding; + struct vhost_memory_region regions[0]; +}; + +/* ioctls */ + +#define VHOST_VIRTIO 0xAF + +/* Features bitmask for forward compatibility. Transport bits are used for + * vhost specific features. */ +#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64) +#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64) + +/* Set current process as the (exclusive) owner of this file descriptor. This + * must be called before any other vhost command. Further calls to + * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */ +#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) +/* Give up ownership, and reset the device to default values. + * Allows subsequent call to VHOST_OWNER_SET to succeed. */ +#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02) + +/* Set up/modify memory layout */ +#define VHOST_SET_MEM_TABLE_IOW(VHOST_VIRTIO, 0x03, struct vhost_memory) + +/* Write logging setup. */ +/* Memory writes can optionally be logged by setting bit at an offset + * (calculated from the physical address) from specified log base. + * The bit is set using an atomic 32 bit operation. */ +/* Set base address for logging. */ +#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64) +/* Specify an eventfd file descriptor to signal on log write. */ +#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int) + +/* Ring setup. */ +/* Set number of descriptors in ring. This parameter can not + * be modified while ring is running (bound to a device). */ +#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state) +/* Set addresses for the ring. 
*/ +#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr) +/* Base value where queue looks for available descriptors */ +#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state) +/* Get accessor: reads index, writes value in num */ +#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state) + +/* The following ioctls use eventfd file descriptors to signal and poll + * for events. */ + +/* Set eventfd to poll for added buffers */ +#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file) +/* Set eventfd to signal when buffers have beed used */ +#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file) +/* Set eventfd to signal an error */ +#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file) + +/* VHOST_NET specific defines */ + +/* Attach virtio net ring to a raw socket, or tap device. + * The socket must be already bound to an ethernet device, this device will be + * used for transmit. Pass fd -1 to unbind from the socket and the transmit + * device. This can be used to stop the ring (e.g. for migration). */ +#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, s
[PATCHv6 0/4] qemu-kvm: vhost net port
This is port of vhost v6 patch set I posted previously to qemu-kvm, for those that want to get good performance out of it :) This patchset needs to be applied when qemu.git one gets merged, this includes irqchip support. Changes from previous version: - check kvm_enabled in irqfd call Michael S. Tsirkin (4): qemu-kvm: add vhost.h header kvm: irqfd support msix: add mask/unmask notifiers virtio-pci: irqfd support hw/msix.c | 36 - hw/msix.h |1 + hw/pci.h |6 ++ hw/virtio-pci.c | 27 + kvm-all.c | 19 +++ kvm.h | 10 kvm/include/linux/vhost.h | 130 + 7 files changed, 228 insertions(+), 1 deletions(-) create mode 100644 kvm/include/linux/vhost.h -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [PATCH 1/2] KVM test: Refactoring the 'autotest' subtest
On Fri, Feb 26, 2010 at 1:13 AM, sudhir kumar wrote: > Looks good to me. It will definitely boost test speed for certain > tests and give flexibility to use existing autotest strength in more > granular way. Thank you! FYI, this patch was applied, mainly because it's not dependent on cpu_set test itself: http://autotest.kernel.org/changeset/4308 > On Fri, Feb 26, 2010 at 1:13 AM, Lucas Meneghel Rodrigues > wrote: >> Refactor autotest subtest into a utility function, so other >> KVM subtests can run autotest control files in hosts as part >> of their routine. >> >> This arrangement was made to accommodate the upcoming 'cpu_set' >> test. >> >> Signed-off-by: Lucas Meneghel Rodrigues >> --- >> client/tests/kvm/kvm_test_utils.py | 165 >> +++- >> client/tests/kvm/tests/autotest.py | 153 ++--- >> 2 files changed, 171 insertions(+), 147 deletions(-) >> >> diff --git a/client/tests/kvm/kvm_test_utils.py >> b/client/tests/kvm/kvm_test_utils.py >> index 7d96d6e..71d6303 100644 >> --- a/client/tests/kvm/kvm_test_utils.py >> +++ b/client/tests/kvm/kvm_test_utils.py >> @@ -24,7 +24,7 @@ More specifically: >> import time, os, logging, re, commands >> from autotest_lib.client.common_lib import error >> from autotest_lib.client.bin import utils >> -import kvm_utils, kvm_vm, kvm_subprocess >> +import kvm_utils, kvm_vm, kvm_subprocess, scan_results >> >> >> def get_living_vm(env, vm_name): >> @@ -237,3 +237,166 @@ def get_memory_info(lvms): >> meminfo = meminfo[0:-2] + "}" >> >> return meminfo >> + >> + >> +def run_autotest(vm, session, control_path, timeout, test_name, outputdir): >> + """ >> + Run an autotest control file inside a guest (linux only utility). >> + >> + @param vm: VM object. >> + @param session: A shell session on the VM provided. >> + @param control: An autotest control file. >> + @param timeout: Timeout under which the autotest test must complete. >> + @param test_name: Autotest client test name.
>> + �...@param outputdir: Path on host where we should copy the guest >> autotest >> + results to. >> + """ >> + def copy_if_size_differs(vm, local_path, remote_path): >> + """ >> + Copy a file to a guest if it doesn't exist or if its size differs. >> + >> + �...@param vm: VM object. >> + �...@param local_path: Local path. >> + �...@param remote_path: Remote path. >> + """ >> + copy = False >> + basename = os.path.basename(local_path) >> + local_size = os.path.getsize(local_path) >> + output = session.get_command_output("ls -l %s" % remote_path) >> + if "such file" in output: >> + logging.info("Copying %s to guest (remote file is missing)" % >> + basename) >> + copy = True >> + else: >> + try: >> + remote_size = output.split()[4] >> + remote_size = int(remote_size) >> + except IndexError, ValueError: >> + logging.error("Check for remote path size %s returned %s. " >> + "Cannot process.", remote_path, output) >> + raise error.TestFail("Failed to check for %s (Guest died?)" >> % >> + remote_path) >> + if remote_size != local_size: >> + logging.debug("Copying %s to guest due to size mismatch" >> + "(remote size %s, local size %s)" % >> + (basename, remote_size, local_size)) >> + copy = True >> + >> + if copy: >> + if not vm.copy_files_to(local_path, remote_path): >> + raise error.TestFail("Could not copy %s to guest" % >> local_path) >> + >> + >> + def extract(vm, remote_path, dest_dir="."): >> + """ >> + Extract a .tar.bz2 file on the guest. >> + >> + �...@param vm: VM object >> + �...@param remote_path: Remote file path >> + �...@param dest_dir: Destination dir for the contents >> + """ >> + basename = os.path.basename(remote_path) >> + logging.info("Extracting %s..." 
% basename) >> + (status, output) = session.get_command_status_output( >> + "tar xjvf %s -C %s" % (remote_path, >> dest_dir)) >> + if status != 0: >> + logging.error("Uncompress output:\n%s" % output) >> + raise error.TestFail("Could not extract % on guest") >> + >> + if not os.path.isfile(control_path): >> + raise error.TestError("Invalid path to autotest control file: %s" % >> + control_path) >> + >> + tarred_autotest_path = "/tmp/autotest.tar.bz2" >> + tarred_test_path = "/tmp/%s.tar.bz2" % test_name >> + >> + # To avoid problems, let's make the test use the current AUTODIR >> + # (autotest client path) location >> + autotest_path =
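For readers following the copy_if_size_differs() helper quoted above: the size check boils down to parsing the fifth column of `ls -l`. Below is a minimal Python 3 sketch of that logic, not the code from the patch; the names parse_remote_size() and needs_copy() are made up for illustration. It catches both exceptions as a tuple, since the Python 2 spelling `except IndexError, ValueError:` only catches IndexError and binds the exception to the name ValueError.

```python
def parse_remote_size(ls_output):
    """Return the size field (5th column) of an `ls -l` line, or None."""
    try:
        return int(ls_output.split()[4])
    except (IndexError, ValueError):
        # Too few columns, or the size column is not a number.
        return None


def needs_copy(ls_output, local_size):
    """Decide whether a local file should be copied to the guest.

    Hypothetical helper: copy when the remote file is missing or when
    its size differs from the local one.
    """
    if "such file" in ls_output:
        return True  # remote file is missing
    remote_size = parse_remote_size(ls_output)
    if remote_size is None:
        raise RuntimeError("cannot parse remote size from %r" % ls_output)
    return remote_size != local_size
```

Raising on unparsable output mirrors the patch's choice to fail the test rather than guess when the guest returns something unexpected.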
Re: [Autotest] [PATCH] KVM-test: SR-IOV: Fix a bug that wrongly check VFs count
On Thu, Mar 11, 2010 at 2:54 AM, Yolkfull Chow wrote: > The parameter 'devices_requested' is irrelated to driver_option 'max_vfs' > of 'igb'. > > NIC card 82576 has two network interfaces and each can be > virtualized up to 7 virtual functions, therefore we multiply > two for the value of driver_option 'max_vfs' and can thus get > the total number of VFs. Applied, thanks! > Signed-off-by: Yolkfull Chow > --- > client/tests/kvm/kvm_utils.py | 19 +-- > 1 files changed, 13 insertions(+), 6 deletions(-) > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > index 4565dc1..1813ed1 100644 > --- a/client/tests/kvm/kvm_utils.py > +++ b/client/tests/kvm/kvm_utils.py > @@ -1012,17 +1012,22 @@ class PciAssignable(object): > """ > Get VFs count number according to lspci. > """ > + # FIXME: Need to think out a method of identify which > + # 'virtual function' belongs to which physical card considering > + # that if the host has more than one 82576 card. PCI_ID? > cmd = "lspci | grep 'Virtual Function' | wc -l" > - # For each VF we'll see 2 prints of 'Virtual Function', so let's > - # divide the result per 2 > - return int(commands.getoutput(cmd)) / 2 > + return int(commands.getoutput(cmd)) > > > def check_vfs_count(self): > """ > Check VFs count number according to the parameter driver_options. > """ > - return (self.get_vfs_count == self.devices_requested) > + # Network card 82576 has two network interfaces and each can be > + # virtualized up to 7 virtual functions, therefore we multiply > + # two for the value of driver_option 'max_vfs'. 
> + expected_count = int((re.findall("(\d)", self.driver_option)[0])) * 2 > + return (self.get_vfs_count == expected_count) > > > def is_binded_to_stub(self, full_id): > @@ -1054,15 +1059,17 @@ class PciAssignable(object): > elif not self.check_vfs_count(): > os.system("modprobe -r %s" % self.driver) > re_probe = True > + else: > + return True > > # Re-probe driver with proper number of VFs > if re_probe: > cmd = "modprobe %s %s" % (self.driver, self.driver_option) > + logging.info("Loading the driver '%s' with option '%s'" % > + (self.driver, self.driver_option)) > s, o = commands.getstatusoutput(cmd) > if s: > return False > - if not self.check_vfs_count(): > - return False > return True > > > -- > 1.7.0.1 > > ___ > Autotest mailing list > autot...@test.kernel.org > http://test.kernel.org/cgi-bin/mailman/listinfo/autotest > -- Lucas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
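The arithmetic described in the patch above (max_vfs per interface, times the two interfaces of an 82576) can be sketched as follows. This is only an illustration under assumptions: the option string is assumed to look like "max_vfs=7", and expected_vfs_count() is a made-up name. Unlike the `(\d)` pattern in the patch, the sketch reads the whole number, so a multi-digit value would not be cut to its first digit.

```python
import re


def expected_vfs_count(driver_option, ports=2):
    """Total number of VFs expected for a given driver option string.

    Assumes an option of the form 'max_vfs=N' (N per physical port) and
    a card with `ports` network interfaces, e.g. 2 for an 82576.
    """
    match = re.search(r"max_vfs=(\d+)", driver_option)
    if match is None:
        raise ValueError("no max_vfs value in %r" % driver_option)
    return int(match.group(1)) * ports
```

When comparing against the lspci count, the getter has to be called, i.e. `self.get_vfs_count()`; if get_vfs_count is a plain method, comparing the method object itself to an integer is always false.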
[PATCH] KVM test: Make qcow2 check image non critical
Instead of forcing the VMs to shut down because of the qemu-img check step, just make the postprocess step non-critical, i.e. a failure there doesn't make the test fail. The check is still there, but it won't mask the results of the tests themselves, while still providing useful additional info.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/tests_base.cfg.sample | 3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index beae786..bb455e6 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1049,8 +1049,7 @@ variants:
 post_command = " python scripts/check_image.py;"
 remove_image = no
 post_command_timeout = 600
-kill_vm = yes
-kill_vm_gracefully = yes
+post_command_noncritical = yes
 - vmdk:
 only Fedora Ubuntu Windows
 only smp2
--
1.6.6.1
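To illustrate what "non-critical" means for a postprocess command, here is a minimal Python 3 sketch. It is not the autotest implementation; run_post_command() and its `noncritical` argument are hypothetical names chosen to mirror the post_command_noncritical option. A failing or timed-out command is logged and reported, but only raises when the step is critical.

```python
import logging
import subprocess


def run_post_command(command, timeout, noncritical=False):
    """Run a postprocess shell command; return True on success.

    Hypothetical helper: with noncritical=True a failure is only logged,
    otherwise it is raised so the caller can fail the test.
    """
    try:
        result = subprocess.run(command, shell=True, timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    if ok:
        return True
    if noncritical:
        logging.warning("Post command %r failed (non-critical)", command)
        return False
    raise RuntimeError("Post command %r failed" % command)
```

The caller still gets the outcome of the check through the return value, so the information is kept while the test verdict is left alone.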
Re: [Qemu-devel] Re: >2 serial ports?
> Oh, well, yes, I remember. qemu is more strict on ISA irq sharing now.
> A bit too strict.
>
> /me goes dig out a old patch which never made it upstream for some
> reason I forgot. Attached.

This is wrong. Two devices should never be manipulating the same qemu_irq object. If you want multiple devices connected to the same IRQ then you need an explicit multiplexer, e.g. arm_timer.c:sp804_set_irq.

Paul
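The "explicit multiplexer" idea can be modelled in a few lines. The sketch below is a plain Python illustration, not QEMU code and not the sp804 implementation: IrqMux and its methods are made-up names. Each device drives its own input line, and the mux ORs the input levels into the single output line.

```python
class IrqMux:
    """OR several input lines into one output line (illustrative model)."""

    def __init__(self, n_inputs, set_output):
        self.levels = [0] * n_inputs   # last level seen on each input
        self.set_output = set_output   # callback driving the shared IRQ

    def set_input(self, n, level):
        """Record the level of input n and update the output line."""
        self.levels[n] = 1 if level else 0
        self.set_output(int(any(self.levels)))
```

Because every device has a private input, one device lowering its line cannot clear an interrupt that another device still asserts, which is the problem with two devices writing to the same line directly.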
Re: >2 serial ports?
Neo Jia wrote:
> May I ask if it is possible to bind a real physical serial port to a guest?

It is all described in the documentation: there is quite a long list of things you can attach to a virtual serial port, including a real one.

/mjt
Re: >2 serial ports?
Gerd Hoffmann wrote:
> On 03/17/10 09:38, Michael Tokarev wrote:
>> Since 0.12, it appears that kvm does not allow more than
>> 2 serial ports for a guest:
>>
>> $ kvm \
>> -serial unix:s1,server,nowait \
>> -serial unix:s2,server,nowait \
>> -serial unix:s3,server,nowait
>> isa irq 4 already assigned
>>
>> Is there a work-around for this?
>
> Oh, well, yes, I remember. qemu is more strict on ISA irq sharing now.
> A bit too strict.
>
> /me goes dig out a old patch which never made it upstream for some
> reason I forgot. Attached.

I tried the patch, and it now appears to work. I have not run any stress tests so far, but basic tests are fine.

Thank you Gerd! And I think it's time to push it finally :)

/mjt
Re: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.
On Wed, Mar 17, 2010 at 05:48:10PM +0800, Xin, Xiaohui wrote: > >> Michael, > >> I don't use the kiocb comes from the sendmsg/recvmsg, > > >since I have embeded the kiocb in page_info structure, > > >and allocate it when page_info allocated. > > >So what I suggested was that vhost allocates and tracks the iocbs, and > >passes them to your device with sendmsg/ recvmsg calls. This way your > >device won't need to share structures and locking strategy with vhost: > >you get an iocb, handle it, invoke a callback to notify vhost about > >completion. > > >This also gets rid of the 'receiver' callback > > I'm not sure receiver callback can be removed here: > The patch describes a work flow like this: > netif_receive_skb() gets the packet, it does nothing but just queue the skb > and wakeup the handle_rx() of vhost. handle_rx() then calls the receiver > callback > to deal with skb and and get the necessary notify info into a list, vhost > owns the > list and in the same handle_rx() context use it to complete. > > We use "receiver" callback here is because only handle_rx() is waked up from > netif_receive_skb(), and we need mp device context to deal with the skb and > notify info attached to it. We also have some lock in the callback function. > > If I remove the receiver callback, I can only deal with the skb and notify > info in netif_receive_skb(), but this function is in an interrupt context, > which I think lock is not allowed there. But I cannot remove the lock there. > The basic idea is that vhost passes iocb to recvmsg and backend completes the iocb to signal that data is ready. That completion could be in interrupt context and so we need to switch to workqueue to handle the event, it is true, but the code to do this would live in vhost.c or net.c. With this structure your device won't depend on vhost, and can go under drivers/net/, opening up possibility to use it for zero copy without vhost in the future. 
> >> Please have a review and thanks for the instruction > >> for replying email which helps me a lot. > >> > > >Thanks, > > >Xiaohui > > > > > > drivers/vhost/net.c | 159 > > > +++-- > >> drivers/vhost/vhost.h | 12 > >> 2 files changed, 166 insertions(+), 5 deletions(-) > >> > >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > >index 22d5fef..5483848 100644 > > >--- a/drivers/vhost/net.c > > >+++ b/drivers/vhost/net.c > > >@@ -17,11 +17,13 @@ > > > #include > > > #include > > > #include > > >+#include > > > > > > #include > > > #include > > > #include > > > #include > > >+#include > > > > > > #include > > > > > >@@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct > > >socket *sock) > > > net->tx_poll_state = VHOST_NET_POLL_STARTED; > > > } > > > > > >+static void handle_async_rx_events_notify(struct vhost_net *net, > > >+ struct vhost_virtqueue *vq); > > >+ > > >+static void handle_async_tx_events_notify(struct vhost_net *net, > > >+ struct vhost_virtqueue *vq); > > >+ > > >A couple of style comments: > > > >- It's better to arrange functions in such order that forward declarations > >aren't necessary. Since we don't have recursion, this should always be > >possible. > > >- continuation lines should be idented at least at the position of '(' > >on the previous line. > > Thanks. I'd correct that. > > >> /* Expects to be always run from workqueue - which acts as > >> * read-size critical section for our kind of RCU. */ > >> static void handle_tx(struct vhost_net *net) > >> @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net) > >>tx_poll_stop(net); > >>hdr_size = vq->hdr_size; > >> > >> + handle_async_tx_events_notify(net, vq); > > >+ > >>for (;;) { > >>head = vhost_get_vq_desc(&net->dev, vq, vq->iov, > >> ARRAY_SIZE(vq->iov), > > >@@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net) > >>/* Skip header. TODO: support TSO. 
*/ > >>s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out); > >>msg.msg_iovlen = out; > > >+ > > >+ if (vq->link_state == VHOST_VQ_LINK_ASYNC) { > > >+ vq->head = head; > > >+ msg.msg_control = (void *)vq; > > >So here a device gets a pointer to vhost_virtqueue structure. If it gets > >an iocb and invokes a callback, it would not care about vhost internals. > > >> + } > >> + > >>len = iov_length(vq->iov, out); > >>/* Sanity check */ > >>if (!len) { > >> @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net) > >>tx_poll_start(net, sock); > >>break; > >>} > >> + > >> + if (vq->link_state == VHOST_VQ_LINK_ASYNC) > >> + continue; > >>+ > >>if (err != len) > >>p
Re: [PATCH v2] KVM: cleanup {kvm_vm_ioctl, kvm}_get_dirty_log()
Takuya Yoshikawa wrote:
>
> Ah, probably checking the git log will explain you why it is like that!
> Marcelo's work? IIRC.

Oh, I found this commit:

commit 706831a7faec7ac0d3057d20df8234c45bbbc3c5
Author: Marcelo Tosatti
Date: Wed Dec 23 14:35:22 2009 -0200

    KVM: use SRCU for dirty log

    Signed-off-by: Marcelo Tosatti

But I don't know why Marcelo separated kvm_get_dirty_log()'s code into kvm_vm_ioctl_get_dirty_log(). :-(

Thanks,
Xiao
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On 03/17/2010 11:51 AM, Sheng Yang wrote:
>> I think you need DM_NMI for that to work correctly.
>>
>> An alternative is to call the NMI handler directly.
>
> apic_send_IPI_self() already took care of APIC_DM_NMI.

So it does (though not for x2apic?). I don't see why it doesn't work.

> And NMI handler would block the following NMI?

It wouldn't - won't work without extensive changes.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote:
> On 03/17/2010 11:28 AM, Sheng Yang wrote:
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After more check, I think VMX won't remained NMI block state for host.
> > That's means, if NMI happened and processor is in VMX non-root mode, it
> > would only result in VMExit, with a reason indicate that it's due to NMI
> > happened, but no more state change in the host.
> >
> > So in that meaning, there _is_ a window between VMExit and KVM handle the
> > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> > code because "int $2" don't have effect to block following NMI.
>
> That's pretty bad, as NMI runs on a separate stack (via IST). So if
> another NMI happens while our int $2 is running, the stack will be
> corrupted.

Though the hardware doesn't provide this kind of blocking, software at least would warn about it: nmi_enter() would still be executed by "int $2", and would result in BUG() if we are already in NMI context (OK, that is a little better than a mysterious crash due to a corrupted stack).

> > And if the NMI sequence is not important(I think so), then we need to
> > generate a real NMI in current vmexit-after code. Seems let APIC send a
> > NMI IPI to itself is a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
>
> I think you need DM_NMI for that to work correctly.
>
> An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.

And NMI handler would block the following NMI?

--
regards
Yang, Sheng
RE: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.
>> Michael, >> I don't use the kiocb comes from the sendmsg/recvmsg, > >since I have embeded the kiocb in page_info structure, > >and allocate it when page_info allocated. >So what I suggested was that vhost allocates and tracks the iocbs, and >passes them to your device with sendmsg/ recvmsg calls. This way your >device won't need to share structures and locking strategy with vhost: >you get an iocb, handle it, invoke a callback to notify vhost about >completion. >This also gets rid of the 'receiver' callback I'm not sure receiver callback can be removed here: The patch describes a work flow like this: netif_receive_skb() gets the packet, it does nothing but just queue the skb and wakeup the handle_rx() of vhost. handle_rx() then calls the receiver callback to deal with skb and and get the necessary notify info into a list, vhost owns the list and in the same handle_rx() context use it to complete. We use "receiver" callback here is because only handle_rx() is waked up from netif_receive_skb(), and we need mp device context to deal with the skb and notify info attached to it. We also have some lock in the callback function. If I remove the receiver callback, I can only deal with the skb and notify info in netif_receive_skb(), but this function is in an interrupt context, which I think lock is not allowed there. But I cannot remove the lock there. >> Please have a review and thanks for the instruction >> for replying email which helps me a lot. 
>> > >Thanks, > >Xiaohui > > > > drivers/vhost/net.c | 159 > > +++-- >> drivers/vhost/vhost.h | 12 >> 2 files changed, 166 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > >index 22d5fef..5483848 100644 > >--- a/drivers/vhost/net.c > >+++ b/drivers/vhost/net.c > >@@ -17,11 +17,13 @@ > > #include > > #include > > #include > >+#include > > > > #include > > #include > > #include > > #include > >+#include > > > > #include > > > >@@ -91,6 +93,12 @@ static void tx_poll_start(struct vhost_net *net, struct > >socket *sock) > > net->tx_poll_state = VHOST_NET_POLL_STARTED; > > } > > > >+static void handle_async_rx_events_notify(struct vhost_net *net, > >+struct vhost_virtqueue *vq); > >+ > >+static void handle_async_tx_events_notify(struct vhost_net *net, > >+struct vhost_virtqueue *vq); > >+ >A couple of style comments: > >- It's better to arrange functions in such order that forward declarations >aren't necessary. Since we don't have recursion, this should always be >possible. >- continuation lines should be idented at least at the position of '(' >on the previous line. Thanks. I'd correct that. >> /* Expects to be always run from workqueue - which acts as >> * read-size critical section for our kind of RCU. */ >> static void handle_tx(struct vhost_net *net) >> @@ -124,6 +132,8 @@ static void handle_tx(struct vhost_net *net) >> tx_poll_stop(net); >> hdr_size = vq->hdr_size; >> >> +handle_async_tx_events_notify(net, vq); > >+ >> for (;;) { >> head = vhost_get_vq_desc(&net->dev, vq, vq->iov, >> ARRAY_SIZE(vq->iov), > >@@ -151,6 +161,12 @@ static void handle_tx(struct vhost_net *net) >> /* Skip header. TODO: support TSO. */ >> s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out); >> msg.msg_iovlen = out; > >+ > >+if (vq->link_state == VHOST_VQ_LINK_ASYNC) { > >+vq->head = head; > >+msg.msg_control = (void *)vq; >So here a device gets a pointer to vhost_virtqueue structure. 
If it gets >an iocb and invokes a callback, it would not care about vhost internals. >> +} >> + >> len = iov_length(vq->iov, out); >> /* Sanity check */ >> if (!len) { >> @@ -166,6 +182,10 @@ static void handle_tx(struct vhost_net *net) >> tx_poll_start(net, sock); >> break; >> } >> + >> +if (vq->link_state == VHOST_VQ_LINK_ASYNC) >> +continue; >>+ >> if (err != len) >> pr_err("Truncated TX packet: " >> " len %d != %zd\n", err, len); >> @@ -177,6 +197,8 @@ static void handle_tx(struct vhost_net *net) >> } >> } >> >> +handle_async_tx_events_notify(net, vq); >> + >> mutex_unlock(&vq->mutex); >> unuse_mm(net->dev.mm); >> } >>@@ -206,7 +228,8 @@ static void handle_rx(struct vhost_net *net) >> int err; >> size_t hdr_size; >> struct socket *sock = rcu_dereference(vq->private_data); >> -if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue)) >> +if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) && >> +vq->link_state == VHOST_VQ_LINK_SYNC)) >>
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On 03/17/2010 11:28 AM, Sheng Yang wrote:
>> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
>> isn't reentrant till a IRET. YangSheng would like to double check it.
>
> After more check, I think VMX won't remained NMI block state for host.
> That's means, if NMI happened and processor is in VMX non-root mode, it
> would only result in VMExit, with a reason indicate that it's due to NMI
> happened, but no more state change in the host.
>
> So in that meaning, there _is_ a window between VMExit and KVM handle the
> NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> code because "int $2" don't have effect to block following NMI.

That's pretty bad, as NMI runs on a separate stack (via IST). So if another NMI happens while our int $2 is running, the stack will be corrupted.

> And if the NMI sequence is not important(I think so), then we need to
> generate a real NMI in current vmexit-after code. Seems let APIC send a
> NMI IPI to itself is a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> replace "int $2". Something unexpected is happening...

I think you need DM_NMI for that to work correctly.

An alternative is to call the NMI handler directly.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > > Right, but there is a scope between kvm_guest_enter and really running
> > > in guest os, where a perf event might overflow. Anyway, the scope is
> > > very narrow, I will change it to use flag PF_VCPU.
> >
> > There is also a window between setting the flag and calling 'int $2'
> > where an NMI might happen and be accounted incorrectly.
> >
> > Perhaps separate the 'int $2' into a direct call into perf and another
> > call for the rest of NMI handling. I don't see how it would work on svm
> > though - AFAICT the NMI is held whereas vmx swallows it.
> >
> > I guess NMIs will be disabled until the next IRET so it isn't racy,
> > just tricky.
>
> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> isn't reentrant till a IRET. YangSheng would like to double check it.

After more check, I think VMX won't remained NMI block state for host. That's means, if NMI happened and processor is in VMX non-root mode, it would only result in VMExit, with a reason indicate that it's due to NMI happened, but no more state change in the host.

So in that meaning, there _is_ a window between VMExit and KVM handle the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling code because "int $2" don't have effect to block following NMI.

And if the NMI sequence is not important(I think so), then we need to generate a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to itself is a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace "int $2". Something unexpected is happening...

--
regards
Yang, Sheng
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote: > * Zhang, Yanmin wrote: > > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote: > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > > > > From: Zhang, Yanmin > > > > > > > > > > Based on the discussion in KVM community, I worked out the patch to > > > > > support > > > > > perf to collect guest os statistics from host side. This patch is > > > > > implemented > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed > > > > > out a > > > > > critical bug and provided good suggestions with other guys. I really > > > > > appreciate > > > > > their kind help. > > > > > > > > > > The patch adds new subcommand kvm to perf. > > > > > > > > > >perf kvm top > > > > >perf kvm record > > > > >perf kvm report > > > > >perf kvm diff > > > > > > > > > > The new perf could profile guest os kernel except guest os user > > > > > space, but it > > > > > could summarize guest os user space utilization per guest os. > > > > > > > > > > Below are some examples. > > > > > 1) perf kvm top > > > > > [r...@lkp-ne01 norm]# perf kvm --host --guest > > > > > --guestkallsyms=/home/ymzhang/guest/kallsyms > > > > > --guestmodules=/home/ymzhang/guest/modules top > > > > > > > > > > > > > > > > > Thanks for your kind comments. > > > > > > > Excellent, support for guest kernel != host kernel is critical (I can't > > > > remember the last time I ran same kernels). > > > > > > > > How would we support multiple guests with different kernels? > > > With the patch, 'perf kvm report --sort pid" could show > > > summary statistics for all guest os instances. Then, use > > > parameter --pid of 'perf kvm record' to collect single problematic > > > instance data. > > Sorry. I found currently --pid isn't process but a thread (main thread). 
> > > > Ingo, > > > > Is it possible to support a new parameter or extend --inherit, so 'perf > > record' and 'perf top' could collect data on all threads of a process when > > the process is running? > > > > If not, I need add a new ugly parameter which is similar to --pid to filter > > out process data in userspace. > > Yeah. For maximum utility i'd suggest to extend --pid to include this, and > introduce --tid for the previous, limited-to-a-single-task functionality. > > Most users would expect --pid to work like a 'late attach' - i.e. to work > like > strace -f or like a gdb attach. Thanks Ingo, Avi. I worked out below patch against tip/master of March 15th. Subject: [PATCH] Change perf's parameter --pid to process-wide collection From: Zhang, Yanmin Change parameter -p (--pid) to real process pid and add -t (--tid) meaning thread id. Now, --pid means perf collects the statistics of all threads of the process, while --tid means perf just collect the statistics of that thread. BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures attr->disabled=1 if it isn't a system-wide collection. If there is a '-p' and no forks, 'perf stat -p' doesn't collect any data. In addition, the while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact on running workload. I added a sleep(1) in the loop. 
Signed-off-by: Zhang Yanmin --- diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c --- linux-2.6_tipmaster0315/tools/perf/builtin-record.c 2010-03-16 08:59:54.896488489 +0800 +++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c 2010-03-17 16:30:17.71706 +0800 @@ -27,7 +27,7 @@ #include #include -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; static longdefault_interval= 0; @@ -43,6 +43,9 @@ static intraw_samples = 0; static int system_wide = 0; static int profile_cpu = -1; static pid_t target_pid = -1; +static pid_t target_tid = -1; +static int *all_tids = NULL; +static int thread_num = 0; static pid_t child_pid = -1; static int inherit = 1; static int force = 0; @@ -60,7 +63,7 @@ static struct timeval this_read; static u64 bytes_written = 0; -static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS]; +static struct pollfd *event_array; static int nr_poll = 0; static int n
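The difference between --pid (whole process) and --tid (single thread) described in the patch above comes down to enumerating all threads of a process. On Linux the thread ids of a process appear as entries under /proc/<pid>/task. The Python sketch below shows that enumeration; it is an illustration only, and whether the patch fills its all_tids array in exactly this way is not visible in the quoted part. list_thread_ids() and its proc_root parameter are made-up names.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread ids of process `pid`.

    Reads the numeric entries of <proc_root>/<pid>/task; proc_root is a
    parameter only so the function can be exercised without a real /proc.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir)
                  if name.isdigit())
```

A process-wide collection would then open one counter per returned thread id, while --tid corresponds to a list with a single entry.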
Re: >2 serial ports?
May I ask if it is possible to bind a real physical serial port to a guest?

Thanks,
Neo

On Wed, Mar 17, 2010 at 1:38 AM, Michael Tokarev wrote:
> Since 0.12, it appears that kvm does not allow more than
> 2 serial ports for a guest:
>
> $ kvm \
> -serial unix:s1,server,nowait \
> -serial unix:s2,server,nowait \
> -serial unix:s3,server,nowait
> isa irq 4 already assigned
>
> Is there a work-around for this?
>
> Thanks!
>
> /mjt

--
I would remember that if researchers were not ambitious probably today we haven't the technology we are using!
Re: >2 serial ports?
On 03/17/10 09:38, Michael Tokarev wrote:
> Since 0.12, it appears that kvm does not allow more than
> 2 serial ports for a guest:
>
> $ kvm \
> -serial unix:s1,server,nowait \
> -serial unix:s2,server,nowait \
> -serial unix:s3,server,nowait
> isa irq 4 already assigned
>
> Is there a work-around for this?

Oh, well, yes, I remember. qemu is more strict on ISA irq sharing now. A bit too strict.

/me goes dig out a old patch which never made it upstream for some reason I forgot. Attached.

HTH,
  Gerd

From 7d5d53e8a23544ac6413487a8ecdd43537ade9f3 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann
Date: Fri, 11 Sep 2009 13:43:46 +0200
Subject: [PATCH] isa: refine irq reservations

There are a few cases where IRQ sharing on the ISA bus is used and
possible. In general only devices of the same kind can do that.
A few use cases:

 * serial lines 1+3 share irq 4
 * serial lines 2+4 share irq 3
 * parallel ports share irq 7
 * ppc/prep: ide ports share irq 13

This patch refines the irq reservation mechanism for the isa bus to
handle those cases. It keeps track of the driver which owns the IRQ in
question and allows irq sharing for devices handled by the same driver.

Signed-off-by: Gerd Hoffmann
---
 hw/isa-bus.c | 16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 4d489d2..bd2f69c 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -26,6 +26,7 @@ struct ISABus {
     BusState qbus;
     qemu_irq *irqs;
     uint32_t assigned;
+    DeviceInfo *irq_owner[16];
 };
 
 static ISABus *isabus;
@@ -71,7 +72,9 @@ qemu_irq isa_reserve_irq(int isairq)
         exit(1);
     }
     if (isabus->assigned & (1 << isairq)) {
-        fprintf(stderr, "isa irq %d already assigned\n", isairq);
+        DeviceInfo *owner = isabus->irq_owner[isairq];
+        fprintf(stderr, "isa irq %d already assigned (%s)\n",
+                isairq, owner ? owner->name : "unknown");
         exit(1);
     }
     isabus->assigned |= (1 << isairq);
@@ -82,10 +85,17 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 {
     assert(dev->nirqs < ARRAY_SIZE(dev->isairq));
     if (isabus->assigned & (1 << isairq)) {
-        fprintf(stderr, "isa irq %d already assigned\n", isairq);
-        exit(1);
+        DeviceInfo *owner = isabus->irq_owner[isairq];
+        if (owner == dev->qdev.info) {
+            /* irq sharing is ok in case the same driver handles both */;
+        } else {
+            fprintf(stderr, "isa irq %d already assigned (%s)\n",
+                    isairq, owner ? owner->name : "unknown");
+            exit(1);
+        }
     }
     isabus->assigned |= (1 << isairq);
+    isabus->irq_owner[isairq] = dev->qdev.info;
     dev->isairq[dev->nirqs] = isairq;
     *p = isabus->irqs[isairq];
     dev->nirqs++;
--
1.6.6.1
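The reservation rule in the patch above (an IRQ may be shared only by devices handled by the same driver) can be modelled in a few lines of Python. This is a simplified illustration, not QEMU code: IsaBusModel is a made-up class, drivers are represented by plain name strings, and a conflict raises an exception where the C code prints a message and exits.

```python
class IsaBusModel:
    """Toy model of ISA irq reservation with per-irq owner tracking."""

    def __init__(self, n_irqs=16):
        self.owner = [None] * n_irqs  # driver name owning each irq, if any

    def init_irq(self, driver, isairq):
        """Reserve `isairq` for `driver`; same-driver sharing is allowed."""
        current = self.owner[isairq]
        if current is not None and current != driver:
            raise RuntimeError("isa irq %d already assigned (%s)"
                               % (isairq, current))
        self.owner[isairq] = driver
```

With this rule a second serial device can take irq 4, as in the three-serial-port command line at the top of the thread, while a device from a different driver asking for the same irq is still rejected.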
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/17/2010 10:49 AM, Christoph Hellwig wrote:

On Tue, Mar 16, 2010 at 01:08:28PM +0200, Avi Kivity wrote:

If the batch size is larger than the virtio queue size, or if there are no flushes at all, then yes the huge write cache gives more opportunity for reordering. But we're already talking hundreds of requests here.

Yes. And remember those don't have to come from the same host. Also remember that we rather limit excessive reordering of O_DIRECT requests in the I/O scheduler because they are "synchronous" type I/O while we don't do that for pagecache writeback.

Maybe we should relax that for kvm. Perhaps some of the problem comes from the fact that we call io_submit() once per request.

And we don't have unlimited virtio queue size, in fact it's quite limited.

That can be extended easily if it fixes the problem.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
* Avi Kivity wrote:

> On 03/17/2010 10:16 AM, Ingo Molnar wrote:
> >* Avi Kivity wrote:
> >
> >> Monitoring guests from the host is useful for kvm developers, but less so
> >> for users.
> >
> > Guest space profiling is easy, and 'perf kvm' is not about that. (plain
> > 'perf' will work if a proper paravirt channel is opened to the host)
> >
> > I think you might have misunderstood the purpose and role of the 'perf
> > kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who
> > improve KVM code, not guest kernel users.
>
> Of course I understood it. My point was that 'perf kvm' serves a tiny
> minority of users. [...]

I hope you won't be disappointed to learn that 100% of Linux, all 13+ million lines of it, was and is being developed by a tiny, tiny, tiny minority of users ;-)

> [...] That doesn't mean it isn't useful, just that it doesn't satisfy all
> needs by itself.

Of course - and it doesn't bring world peace either. One step at a time.

Thanks,

	Ingo