Re: [PATCH] kvm: add missing void __user * cast to access_ok() call
On Tue, 24 May 2011 07:51:27 +0200 Heiko Carstens wrote:

> From: Heiko Carstens
>
> fa3d315a "KVM: Validate userspace_addr of memslot when registered" introduced
> this new warning on s390:
>
> kvm_main.c: In function '__kvm_set_memory_region':
> kvm_main.c:654:7: warning: passing argument 1 of '__access_ok' makes pointer
> from integer without a cast
> arch/s390/include/asm/uaccess.h:53:19: note: expected 'const void *' but
> argument is of type '__u64'
>
> Add the missing cast to get rid of it again...
>

Looks good to me, thank you!
I should have checked s390's type checking...

Takuya

> Cc: Takuya Yoshikawa
> Signed-off-by: Heiko Carstens
> ---
> virt/kvm/kvm_main.c |3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -651,7 +651,8 @@ int __kvm_set_memory_region(struct kvm *
>          /* We can read the guest memory with __xxx_user() later on. */
>          if (user_alloc &&
>              ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> -             !access_ok(VERIFY_WRITE, mem->userspace_addr, mem->memory_size)))
> +             !access_ok(VERIFY_WRITE, (void __user *)mem->userspace_addr,
> +                        mem->memory_size)))
>                  goto out;
>          if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
>                  goto out;

--
Takuya Yoshikawa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: add missing void __user * cast to access_ok() call
From: Heiko Carstens

fa3d315a "KVM: Validate userspace_addr of memslot when registered" introduced
this new warning on s390:

kvm_main.c: In function '__kvm_set_memory_region':
kvm_main.c:654:7: warning: passing argument 1 of '__access_ok' makes pointer
from integer without a cast
arch/s390/include/asm/uaccess.h:53:19: note: expected 'const void *' but
argument is of type '__u64'

Add the missing cast to get rid of it again...

Cc: Takuya Yoshikawa
Signed-off-by: Heiko Carstens
---
 virt/kvm/kvm_main.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -651,7 +651,8 @@ int __kvm_set_memory_region(struct kvm *
         /* We can read the guest memory with __xxx_user() later on. */
         if (user_alloc &&
             ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-             !access_ok(VERIFY_WRITE, mem->userspace_addr, mem->memory_size)))
+             !access_ok(VERIFY_WRITE, (void __user *)mem->userspace_addr,
+                        mem->memory_size)))
                 goto out;
         if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
                 goto out;
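For readers outside the kernel tree, the shape of the fixed check can be tried in user space. This is only a sketch under stated assumptions: range_ok() is a hypothetical stand-in for access_ok() (it is not the kernel's implementation), and the limit and page size are passed in as plain parameters. It shows the alignment test followed by an explicit integer-to-pointer cast, which is what silences the "makes pointer from integer without a cast" warning.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's access_ok(): accepts a pointer
 * and checks that [addr, addr + size) fits below 'limit' without wrapping.
 */
static int range_ok(const void *addr, uint64_t size, uint64_t limit)
{
    uint64_t start = (uint64_t)(uintptr_t)addr;

    return size <= limit && start <= limit - size;
}

/*
 * Same shape as the check in the patch: the address arrives as a 64-bit
 * integer (like __u64 userspace_addr), must be page aligned, and is cast
 * explicitly before being handed to the pointer-typed range check.
 */
static int memslot_addr_ok(uint64_t userspace_addr, uint64_t memory_size,
                           uint64_t page_size, uint64_t limit)
{
    /* page_size must be a non-zero power of two for the mask to work */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (userspace_addr & (page_size - 1))
        return 0;
    return range_ok((const void *)(uintptr_t)userspace_addr,
                    memory_size, limit);
}
```

Calling memslot_addr_ok(0x1000, 0x2000, 0x1000, 0x100000) accepts the slot, while an unaligned address or a range running past the limit is rejected.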
RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
> From: Nadav Har'El
> Sent: Tuesday, May 24, 2011 2:51 AM
>
> > >+        vmcs_init(vmx->loaded_vmcs->vmcs);
> > >+        vmx->loaded_vmcs->cpu = -1;
> > >+        vmx->loaded_vmcs->launched = 0;
> >
> > Perhaps a loaded_vmcs_init() to encapsulate initialization of these
> > three fields, you'll probably reuse it later.
>
> It's good you pointed this out, because it made me suddenly realise that I
> forgot to VMCLEAR the new vmcs02's I allocate. In practice it never made a
> difference, but better safe than sorry.

Yes, that's what the spec requires. You need to VMCLEAR any new VMCS; the
instruction does implementation-specific initialization of that VMCS region.

>
> I had to restructure some of the code a bit to be able to properly use this
> new function (in 3 places - __loaded_vmcs_clear, nested_get_current_vmcs02,
> vmx_create_vcpu).
>
> > Please repost separately after the fix, I'd like to apply it before the
> > rest of the series.
>
> I am adding a new version of this patch at the end of this mail.
>
> > (regarding interrupts, I think we can do that work post-merge. But I'd
> > like to see Kevin's comments addressed)
>
> I replied to his comments. Done some of the things he asked, and asked for
> more info on why/where he believes the current code is incorrect where I
> didn't understand what problems he pointed to, and am now waiting for him
> to reply.

As I replied in another thread, I believe this has been explained clearly by
Nadav.

>
>
> --- 8< -- 8< -- 8< -- 8< --- 8< ---
>
> Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.
>
> In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
> because (at least in theory) the processor might not have written all of its
> content back to memory. Since a patch from June 26, 2008, this is done using
> a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
>
> The problem is that with nested VMX, we no longer have the concept of a
> vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
> L2s), and each of those may have been last loaded on a different cpu.
>
> So instead of linking the vcpus, we link the VMCSs, using a new structure
> loaded_vmcs. This structure contains the VMCS, and the information pertaining
> to its loading on a specific cpu (namely, the cpu number, and whether it
> was already launched on this cpu once). In nested we will also use the same
> structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
> currently active VMCS.
>
> Signed-off-by: Nadav Har'El
> ---
> arch/x86/kvm/vmx.c | 150 ---
> 1 file changed, 86 insertions(+), 64 deletions(-)
>
> --- .before/arch/x86/kvm/vmx.c 2011-05-23 21:46:14.0 +0300
> +++ .after/arch/x86/kvm/vmx.c 2011-05-23 21:46:14.0 +0300
> @@ -116,6 +116,18 @@ struct vmcs {
>          char data[0];
>  };
>
> +/*
> + * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
> + * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
> + * loaded on this CPU (so we can clear them if the CPU goes down).
> + */
> +struct loaded_vmcs {
> +        struct vmcs *vmcs;
> +        int cpu;
> +        int launched;
> +        struct list_head loaded_vmcss_on_cpu_link;
> +};
> +
>  struct shared_msr_entry {
>          unsigned index;
>          u64 data;
> @@ -124,9 +136,7 @@ struct shared_msr_entry {
>
>  struct vcpu_vmx {
>          struct kvm_vcpu vcpu;
> -        struct list_head local_vcpus_link;
>          unsigned long host_rsp;
> -        int launched;
>          u8 fail;
>          u8 cpl;
>          bool nmi_known_unmasked;
> @@ -140,7 +150,14 @@ struct vcpu_vmx {
>          u64 msr_host_kernel_gs_base;
>          u64 msr_guest_kernel_gs_base;
>  #endif
> -        struct vmcs *vmcs;
> +        /*
> +         * loaded_vmcs points to the VMCS currently used in this vcpu. For a
> +         * non-nested (L1) guest, it always points to vmcs01. For a nested
> +         * guest (L2), it points to a different VMCS.
> +         */
> +        struct loaded_vmcs vmcs01;
> +        struct loaded_vmcs *loaded_vmcs;
> +        bool __launched; /* temporary, used in vmx_vcpu_run */
>          struct msr_autoload {
>                  unsigned nr;
>                  struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
> @@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> -static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
> +/*
> + * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
> + * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
> + */
> +static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host
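The loaded_vmcs_init() helper suggested in the review could look roughly like the following. This is a user-space sketch, not the code from the series: struct vmcs is reduced to a single flag, and vmcs_clear_stub() is a hypothetical placeholder for the real VMCLEAR instruction. It only illustrates the three steps being grouped: clear the region, mark it as not loaded on any cpu, mark it as not launched.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the hardware VMCS region; only records that it was cleared. */
struct vmcs {
    int cleared;
};

/* Reduced version of the structure from the patch (list link omitted). */
struct loaded_vmcs {
    struct vmcs *vmcs;
    int cpu;        /* -1 means "not loaded on any cpu" */
    int launched;   /* non-zero once VMLAUNCHed on 'cpu' */
};

/* Hypothetical placeholder for executing VMCLEAR on the region. */
static void vmcs_clear_stub(struct vmcs *vmcs)
{
    vmcs->cleared = 1;
}

/* Put a freshly allocated (or recycled) VMCS into a known state. */
static void loaded_vmcs_init(struct loaded_vmcs *lv)
{
    assert(lv != NULL && lv->vmcs != NULL);

    vmcs_clear_stub(lv->vmcs);
    lv->cpu = -1;
    lv->launched = 0;
}
```

Having one helper means a newly allocated vmcs02 cannot be forgotten in the way described above, since every allocation path would go through the same function.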
RE: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2
> From: Nadav Har'El [mailto:n...@math.technion.ac.il]
> Sent: Sunday, May 22, 2011 4:30 PM
>
> Hi,
>
> On Fri, May 20, 2011, Tian, Kevin wrote about "RE: [PATCH 07/31] nVMX:
> Introduce vmcs02: VMCS used to run L2":
> > Possibly we can maintain the vmcs02 pool along with L1 VMCLEAR ops, which
> > is similar to the hardware behavior with regard to the cleared and
> > launched state.
>
> If you set VMCS02_POOL_SIZE to a large size, and L1, like typical hypervisors,
> only keeps around a few VMCSs (and VMCLEARs the ones it will not use again),
> then we'll only have a few vmcs02: handle_vmclear() removes from the pool the
> vmcs02 that L1 explicitly told us it won't need again.

Yes.

>
> > > +struct saved_vmcs {
> > > +        struct vmcs *vmcs;
> > > +        int cpu;
> > > +        int launched;
> > > +};
> >
> > "saved" looks a bit misleading here. It's simply a list of all active
> > vmcs02 tracked by kvm, isn't it?
>
> I have rewritten this part of the code, based on Avi's and Marcelo's requests,
> and the new name for this structure is "loaded_vmcs", i.e., a structure
> describing where a VMCS was loaded.

Great, I'll take a look at your new code.

Thanks
Kevin
RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
> From: Avi Kivity
> Sent: Monday, May 23, 2011 11:49 PM
>
> (regarding interrupts, I think we can do that work post-merge. But I'd
> like to see Kevin's comments addressed)

My earlier comment has been addressed by Nadav with his explanation.

Thanks
Kevin
RE: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2
> From: Nadav Har'El [mailto:n...@math.technion.ac.il]
> Sent: Sunday, May 22, 2011 3:23 PM
>
> Hi,
>
> On Sun, May 22, 2011, Tian, Kevin wrote about "RE: [PATCH 07/31] nVMX:
> Introduce vmcs02: VMCS used to run L2":
> > Here the vmcs02 being overridden may have been run on another processor
> > before but is not vmclear-ed yet. When you resume this vmcs02 with new
> > content on a separate processor, the risk of corruption exists.
>
> I still believe that my current code is correct (in this area). I'll try to
> explain it here and would be grateful if you could point out to me the error
> (if there is one) in my logic:
>
> nested_vmx_run() is our function which switches from running L1 to L2
> (patch 18).
>
> This function starts by calling nested_get_current_vmcs02(), which gets us
> *some* vmcs to use for vmcs02. This may be a fresh new VMCS, or a "recycled"
> VMCS, some VMCS we've previously used to run some, potentially different L2
> guest on some, potentially different, CPU.
> nested_get_current_vmcs02() returns a "saved_vmcs" structure, which
> not only contains a VMCS, but also remembers on which (if any) cpu it is
> currently loaded (and whether it was VMLAUNCHed once on that cpu).
>
> The next thing that nested_vmx_run() now does is to set up in the vcpu object
> the vmcs, cpu and launched fields according to what was returned above.
>
> Now it calls vmx_vcpu_load(). This standard KVM function checks if we're now
> running on a different CPU from the vcpu->cpu, and if it is a different one,
> it uses vcpu_clear() to VMCLEAR the vmcs on the CPU where it was last loaded
> (using an IPI). Only after it vmclears the VMCS on the old CPU can it finally
> load the VMCS on the new CPU.
>
> Only now can nested_vmx_run() call prepare_vmcs02, which starts VMWRITEing
> to this VMCS, and finally returns.
>

Yes, you're correct. Previously I just looked around 07/31 and raised the
above concern. Along with nested_vmx_run as you explained above, this part is
clear to me now. :-)

> P.S. Seeing that you're from Intel, maybe you can help me with a pointer:
> I found what appears to be a small error in the SDM - who can I report it to?
>

Let me ask for you.

Thanks
Kevin
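The ordering argument in the quoted explanation (VMCLEAR on the old cpu, then load on the new cpu, and only then VMWRITEs) can be checked with a small event trace. This is a user-space sketch with made-up names (vmcs_sim, vcpu_load_sim, nested_run_sim); it does not call any KVM code and stands in for vmx_vcpu_load() and prepare_vmcs02() only in terms of the order of operations.

```c
#include <assert.h>

enum ev { EV_VMCLEAR, EV_VMPTRLD, EV_VMWRITE };

/* Hypothetical model of a recycled vmcs02 and where it was last loaded. */
struct vmcs_sim {
    int cpu;       /* -1 if not loaded anywhere */
    int launched;
};

#define MAX_EVENTS 16
static struct { enum ev what; int cpu; } trace[MAX_EVENTS];
static int ntrace;

static void record(enum ev what, int cpu)
{
    assert(ntrace < MAX_EVENTS);
    trace[ntrace].what = what;
    trace[ntrace].cpu = cpu;
    ntrace++;
}

/* Models the load step: clear on the old cpu first, then load locally. */
static void vcpu_load_sim(struct vmcs_sim *v, int cpu)
{
    if (v->cpu != -1 && v->cpu != cpu) {
        record(EV_VMCLEAR, v->cpu);   /* would be an IPI to the old cpu */
        v->launched = 0;
    }
    record(EV_VMPTRLD, cpu);
    v->cpu = cpu;
}

/* Models the nested entry: writes happen only after the load step. */
static void nested_run_sim(struct vmcs_sim *recycled, int cpu)
{
    vcpu_load_sim(recycled, cpu);
    record(EV_VMWRITE, cpu);          /* stands for prepare_vmcs02() */
}
```

Running nested_run_sim() on a VMCS last loaded on cpu 0 with target cpu 2 leaves the trace VMCLEAR(0), VMPTRLD(2), VMWRITE(2), which is the sequence the explanation relies on.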
memory zones and the KVM guest kernel
Hi,

When I boot my guest kernel with KVM, the dmesg output says:

...
[0.00] Zone PFN ranges:
[0.00]   DMA      0x0010 -> 0x1000
[0.00]   DMA32    0x1000 -> 0x0010
[0.00]   Normal   empty
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00]     0: 0x0010 -> 0x009f
[0.00]     0: 0x0100 -> 0x0007fffd
...

Why is the Normal zone empty? Is it possible to have some of the guest's
memory mapped in the Normal zone? Is there a good reference that talks about
the Normal, Movable, etc. memory zones?

Thanks,
\dae
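One way to read the zone table, sketched under assumptions that the post does not state: an x86-64 guest with 4 KiB pages, where the DMA zone covers memory below 16 MiB, DMA32 covers memory below 4 GiB, and Normal holds everything above that. The names below (zone_of_pfn, normal_zone_empty and the *_SK constants) are illustrative and not kernel identifiers. Under those assumptions the highest PFN in the log, 0x0007fffd, is roughly 2 GiB, so all of the guest's memory falls below the DMA32 limit and nothing is left for Normal.

```c
#include <assert.h>
#include <stdint.h>

enum zone_sk { ZONE_DMA_SK, ZONE_DMA32_SK, ZONE_NORMAL_SK };

/* Assumed x86-64 layout with 4 KiB pages (not taken from the post). */
#define PAGE_SHIFT_SK      12
#define DMA_LIMIT_PFN_SK   ((16ULL << 20) >> PAGE_SHIFT_SK)  /* 16 MiB */
#define DMA32_LIMIT_PFN_SK ((4ULL << 30) >> PAGE_SHIFT_SK)   /* 4 GiB  */

/* Classify a page frame number by the zone it would fall into. */
static enum zone_sk zone_of_pfn(uint64_t pfn)
{
    if (pfn < DMA_LIMIT_PFN_SK)
        return ZONE_DMA_SK;
    if (pfn < DMA32_LIMIT_PFN_SK)
        return ZONE_DMA32_SK;
    return ZONE_NORMAL_SK;
}

/* Normal is empty when no page frame lies at or above the 4 GiB mark. */
static int normal_zone_empty(uint64_t max_pfn)
{
    assert(max_pfn > 0);
    return max_pfn <= DMA32_LIMIT_PFN_SK;
}
```

With these limits, normal_zone_empty(0x7fffd) is true, while a guest given more than 4 GiB of memory would get a non-empty Normal zone.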
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On Mon, May 23, 2011 at 09:59:01PM +0300, Nadav Har'El wrote:
> On Mon, May 23, 2011, Gleb Natapov wrote about "Re: [PATCH 08/31] nVMX: Fix
> local_vcpus_link handling":
> > On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
> > > (regarding interrupts, I think we can do that work post-merge. But
> > > I'd like to see Kevin's comments addressed)
> > >
> > To be fair this wasn't addressed for almost two years now.
>
> Gleb, I assume by "this" you meant the idt-vectoring information issue, not
> Kevin's comments (which I only saw a couple of days ago)?
>
Yes, of course.

--
Gleb.
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On Mon, May 23, 2011, Gleb Natapov wrote about "Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling":
> On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
> > (regarding interrupts, I think we can do that work post-merge. But
> > I'd like to see Kevin's comments addressed)
> >
> To be fair this wasn't addressed for almost two years now.

Gleb, I assume by "this" you meant the idt-vectoring information issue, not
Kevin's comments (which I only saw a couple of days ago)?

--
Nadav Har'El                        | Monday, May 23 2011, 20 Iyyar 5771
n...@math.technion.ac.il            |-
Phone +972-523-790466, ICQ 13349191 |Someone offered you a cute little quote
http://nadav.harel.org.il           |for your signature? JUST SAY NO!
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
Hi, and thanks again for the reviews,

On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling":
> >          if (need_emulate_wbinvd(vcpu)) {
> >                  if (kvm_x86_ops->has_wbinvd_exit())
> >                          cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
> >-                 else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
> >+                 else if (vcpu->cpu != -1 && vcpu->cpu != cpu
> >+                          && cpu_online(vcpu->cpu))
> >                          smp_call_function_single(vcpu->cpu,
> >                                          wbinvd_ipi, NULL, 1);
> >          }
>
> Is this a necessary part of this patch? Or a semi-related bugfix?
>
> I think that it can't actually trigger before this patch due to luck.
> svm doesn't clear vcpu->cpu on cpu offline, but on the other hand it
> ->has_wbinvd_exit().

Well, this was Marcelo's patch: When I suggested that we might have problems
because vcpu->cpu now isn't cleared to -1 when a cpu is offlined, he looked
at the code and said that he thinks this is the only place that will have
problems, and offered this patch, which I simply included in mine. I have to
admit I don't understand that part of the code, so I can't judge whether this
is important or not. I'll drop it from my patch for now (and you can apply
Marcelo's patch separately).

> >+        if (vmx->loaded_vmcs->cpu != cpu) {
> >                  struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
> >                  unsigned long sysenter_esp;
> >
> >                  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> >                  local_irq_disable();
> >-                 list_add(&vmx->local_vcpus_link,
> >-                          &per_cpu(vcpus_on_cpu, cpu));
> >+                 list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
> >+                          &per_cpu(loaded_vmcss_on_cpu, cpu));
> >                  local_irq_enable();
> >
> >                  /*
> >@@ -999,13 +1020,15 @@ static void vmx_vcpu_load(struct kvm_vcp
> >                  rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
> >                  vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
> >          }
> >+        vmx->loaded_vmcs->cpu = cpu;
> This should be within the if () block.

Makes sense :-) Done.

> >+        vmcs_init(vmx->loaded_vmcs->vmcs);
> >+        vmx->loaded_vmcs->cpu = -1;
> >+        vmx->loaded_vmcs->launched = 0;
>
> Perhaps a loaded_vmcs_init() to encapsulate initialization of these
> three fields, you'll probably reuse it later.

It's good you pointed this out, because it made me suddenly realise that I
forgot to VMCLEAR the new vmcs02's I allocate. In practice it never made a
difference, but better safe than sorry.

I had to restructure some of the code a bit to be able to properly use this
new function (in 3 places - __loaded_vmcs_clear, nested_get_current_vmcs02,
vmx_create_vcpu).

> Please repost separately after the fix, I'd like to apply it before the
> rest of the series.

I am adding a new version of this patch at the end of this mail.

> (regarding interrupts, I think we can do that work post-merge. But I'd
> like to see Kevin's comments addressed)

I replied to his comments. I did some of the things he asked, and where I
didn't understand what problems he was pointing to, I asked for more info on
why/where he believes the current code is incorrect. I am now waiting for him
to reply.


--- 8< -- 8< -- 8< -- 8< --- 8< ---

Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.

Signed-off-by: Nadav Har'El
---
 arch/x86/kvm/vmx.c | 150 ---
 1 file changed, 86 insertions(+), 64 deletions(-)

--- .before/arch/x86/kvm/vmx.c 2011-05-23 21:46:14.0 +0300
+++ .after/arch/x86/kvm/vmx.c 2011-05-23 21:46:14.0 +0300
@@ -116,6 +116,18 @@ struct vmcs {
         char data[0];
 };

+/*
+ * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
+ * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
+ * loaded on this CPU (so we can clear them if the CPU g
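The bookkeeping described in the commit message can be modelled in user space as follows. This is a sketch only: a plain singly linked list stands in for the kernel's list_head, NR_CPUS_SK is an arbitrary small constant, and the function names (loaded_vmcs_unlink, loaded_vmcs_load, cpu_down_clear_all) are illustrative, not the ones in the patch. The point it shows is that the list is keyed by VMCS rather than by vcpu, so bringing a cpu down clears exactly the VMCSs last loaded there.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SK 4

/* Reduced loaded_vmcs: the hardware region itself is omitted. */
struct loaded_vmcs {
    int cpu;                    /* -1 if not loaded on any cpu */
    int launched;
    struct loaded_vmcs *next;   /* stand-in for the list_head link */
};

/* One list per cpu of the VMCSs currently loaded on it. */
static struct loaded_vmcs *loaded_vmcss_on_cpu[NR_CPUS_SK];

/* Models VMCLEAR: take the VMCS off its cpu's list and reset its state. */
static void loaded_vmcs_unlink(struct loaded_vmcs *lv)
{
    struct loaded_vmcs **pp;

    if (lv->cpu == -1)
        return;
    for (pp = &loaded_vmcss_on_cpu[lv->cpu]; *pp; pp = &(*pp)->next) {
        if (*pp == lv) {
            *pp = lv->next;
            break;
        }
    }
    lv->next = NULL;
    lv->cpu = -1;
    lv->launched = 0;
}

/* Models loading a VMCS on 'cpu', clearing it from its old cpu first. */
static void loaded_vmcs_load(struct loaded_vmcs *lv, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SK);
    if (lv->cpu == cpu)
        return;
    loaded_vmcs_unlink(lv);
    lv->next = loaded_vmcss_on_cpu[cpu];
    loaded_vmcss_on_cpu[cpu] = lv;
    lv->cpu = cpu;
}

/* Models cpu offline: clear every VMCS still loaded there; returns count. */
static int cpu_down_clear_all(int cpu)
{
    int n = 0;

    while (loaded_vmcss_on_cpu[cpu]) {
        loaded_vmcs_unlink(loaded_vmcss_on_cpu[cpu]);
        n++;
    }
    return n;
}
```

For example, if two VMCSs of the same vcpu are loaded on cpu 1 and one of them later moves to cpu 2, taking cpu 1 down clears only the one that stayed behind.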
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 23.05.2011, at 17:23, Avi Kivity wrote:

> On 05/23/2011 05:44 PM, Nadav Har'El wrote:
>> On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested
>> VMX, v9":
>> > vmcs01 and vmcs02 will both be generated from vmcs12.
>>
>> If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be
>> generated from vmcs12... while L2 runs, it is possible that it modifies
>> vmcs02 (e.g., non-trapped bits of guest_cr0), and these modifications are
>> not copied back to vmcs12 until the nested exit (when prepare_vmcs12() is
>> called to perform this task).
>>
>> If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
>> indeed vmcs02 can be thrown away and regenerated.
>
> You would flush this state back to the vmcs. But that just confirms Joerg's
> statement that a fake vmexit/vmrun is more or less equivalent.
>
> The question is whether %rip points to the VMRUN/VMLAUNCH instruction,
> HOST_RIP (or the next instruction for svm), or to guest code. But the actual
> things we need to do are all very similar subsets of a vmexit.

%rip should certainly point to VMRUN. That way there is no need to save any
information whatsoever, as the VMCB is already in a sane state and nothing
needs to be special cased, as the next VCPU_RUN would simply go back into
guest mode - which is exactly what we want.

The only tricky part is how we distinguish between "I need to live migrate"
and "info registers". In the former case, %rip should be on VMRUN. In the
latter, on the guest rip.

Alex
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On 05/23/2011 07:43 PM, Roedel, Joerg wrote:
> On Mon, May 23, 2011 at 11:49:17AM -0400, Avi Kivity wrote:
>
> > Joerg, is
> >
> >     if (unlikely(cpu != vcpu->cpu)) {
> >         svm->asid_generation = 0;
> >         mark_all_dirty(svm->vmcb);
> >     }
> >
> > susceptible to cpu offline/online?
>
> I don't think so. This should be safe for cpu offline/online as long as
> the cpu-number value is not reused for another physical cpu. But that
> should be the case afaik.

Why not? offline/online does reuse cpu numbers AFAIK (and it must, if
you have a fully populated machine and offline/online just one cpu).

--
error compiling committee.c: too many arguments to function
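The concern about reused cpu numbers can be illustrated with a toy model. Everything below is hypothetical: the per-cpu epoch counter is an invented device for this sketch and is not something the thread proposes or KVM is known to implement. It only demonstrates that a comparison of cpu numbers alone cannot observe an offline/online cycle of the same cpu number, whereas any check that also tracks "this cpu has been reset since I last ran here" can.

```c
#include <assert.h>

#define NR_CPUS_SK 4

/* Hypothetical: bumped each time a cpu number comes back online. */
static unsigned long cpu_epoch[NR_CPUS_SK];

struct vcpu_sim {
    int cpu;                   /* cpu the vcpu last ran on */
    unsigned long epoch_seen;  /* epoch of that cpu at the time */
};

/* The check from the quoted snippet: compares cpu numbers only. */
static int changed_by_number(const struct vcpu_sim *v, int cpu)
{
    return cpu != v->cpu;
}

/* Illustrative alternative that also notices a reset of the same number. */
static int changed_by_epoch(const struct vcpu_sim *v, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SK);
    return cpu != v->cpu || cpu_epoch[cpu] != v->epoch_seen;
}

/* Models taking a cpu offline and bringing the same number back. */
static void cpu_offline_online(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SK);
    cpu_epoch[cpu]++;
}
```

After cpu_offline_online(1), a vcpu that last ran on cpu 1 still sees changed_by_number() return 0 for cpu 1, while changed_by_epoch() returns 1.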
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On Mon, May 23, 2011 at 11:49:17AM -0400, Avi Kivity wrote:

> Joerg, is
>
>     if (unlikely(cpu != vcpu->cpu)) {
>         svm->asid_generation = 0;
>         mark_all_dirty(svm->vmcb);
>     }
>
> susceptible to cpu offline/online?

I don't think so. This should be safe for cpu offline/online as long as
the cpu-number value is not reused for another physical cpu. But that
should be the case afaik.

Joerg
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
> (regarding interrupts, I think we can do that work post-merge. But
> I'd like to see Kevin's comments addressed)
>
To be fair this wasn't addressed for almost two years now.

--
Gleb.
Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
On 05/22/2011 11:57 AM, Nadav Har'El wrote:
> Hi Avi and Marcelo, here is the new first patch to the nvmx patch set,
> which overhauls the handling of vmcss on cpus, as you asked.
>
> As you guessed, the nested entry and exit code becomes much simpler and
> cleaner, with the whole VMCS switching code on entry, for example, reduced
> to:
>         cpu = get_cpu();
>         vmx->loaded_vmcs = vmcs02;
>         vmx_vcpu_put(vcpu);
>         vmx_vcpu_load(vcpu, cpu);
>         vcpu->cpu = cpu;
>         put_cpu();

That's wonderful, it indicates the code is much better integrated.
Perhaps later we can refine it to have separate _load and _put for
host-related and guest-related parts (I think they already exist in the
code, except they are always called together), but that is an
optimization, and not the most important one by far.

> You can apply this patch separately from the rest of the patch set, if you
> wish. I'm sending just this one, like you asked - and can send the rest of
> the patches when you ask me to.
>
>
> Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.
>
> In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
> because (at least in theory) the processor might not have written all of its
> content back to memory. Since a patch from June 26, 2008, this is done using
> a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
>
> The problem is that with nested VMX, we no longer have the concept of a
> vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
> L2s), and each of those may have been last loaded on a different cpu.
>
> So instead of linking the vcpus, we link the VMCSs, using a new structure
> loaded_vmcs. This structure contains the VMCS, and the information pertaining
> to its loading on a specific cpu (namely, the cpu number, and whether it
> was already launched on this cpu once). In nested we will also use the same
> structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
> currently active VMCS.
>
> --- .before/arch/x86/kvm/x86.c 2011-05-22 11:41:57.0 +0300
> +++ .after/arch/x86/kvm/x86.c 2011-05-22 11:41:57.0 +0300
> @@ -2119,7 +2119,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
>          if (need_emulate_wbinvd(vcpu)) {
>                  if (kvm_x86_ops->has_wbinvd_exit())
>                          cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
> -                else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
> +                else if (vcpu->cpu != -1 && vcpu->cpu != cpu
> +                         && cpu_online(vcpu->cpu))
>                          smp_call_function_single(vcpu->cpu,
>                                          wbinvd_ipi, NULL, 1);
>          }

Is this a necessary part of this patch? Or a semi-related bugfix?

I think that it can't actually trigger before this patch due to luck.
svm doesn't clear vcpu->cpu on cpu offline, but on the other hand it
->has_wbinvd_exit().

Joerg, is

    if (unlikely(cpu != vcpu->cpu)) {
        svm->asid_generation = 0;
        mark_all_dirty(svm->vmcb);
    }

susceptible to cpu offline/online?

> @@ -971,22 +992,22 @@ static void vmx_vcpu_load(struct kvm_vcp
>
>          if (!vmm_exclusive)
>                  kvm_cpu_vmxon(phys_addr);
> -        else if (vcpu->cpu != cpu)
> -                vcpu_clear(vmx);
> +        else if (vmx->loaded_vmcs->cpu != cpu)
> +                loaded_vmcs_clear(vmx->loaded_vmcs);
>
> -        if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
> -                per_cpu(current_vmcs, cpu) = vmx->vmcs;
> -                vmcs_load(vmx->vmcs);
> +        if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> +                per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> +                vmcs_load(vmx->loaded_vmcs->vmcs);
>          }
>
> -        if (vcpu->cpu != cpu) {
> +        if (vmx->loaded_vmcs->cpu != cpu) {
>                  struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
>                  unsigned long sysenter_esp;
>
>                  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>                  local_irq_disable();
> -                list_add(&vmx->local_vcpus_link,
> -                         &per_cpu(vcpus_on_cpu, cpu));
> +                list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
> +                         &per_cpu(loaded_vmcss_on_cpu, cpu));
>                  local_irq_enable();
>
>                  /*
> @@ -999,13 +1020,15 @@ static void vmx_vcpu_load(struct kvm_vcp
>                  rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>                  vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>          }
> +        vmx->loaded_vmcs->cpu = cpu;

This should be within the if () block.

> @@ -4344,11 +4369,13 @@ static struct kvm_vcpu *vmx_create_vcpu(
>                  goto uninit_vcpu;
>          }
>
> -        vmx->vmcs = alloc_vmcs();
> -        if (!vmx->vmcs)
> +        vmx->loaded_vmcs = &vmx->vmcs01;
> +        vmx->loaded_vmcs->vmcs = alloc_vmcs();
> +        if (!vmx->loaded_vmcs->vmcs)
>                  goto free_msrs;
> -
> -        vmcs_init(vmx->vmcs);
> +        vmcs_init(vmx->loaded_vmcs->vmcs);
> +
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 05:44 PM, Nadav Har'El wrote:
> On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > vmcs01 and vmcs02 will both be generated from vmcs12.
>
> If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
> from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
> non-trapped bits of guest_cr0), and these modifications are not copied back
> to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
> this task).
>
> If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
> indeed vmcs02 can be thrown away and regenerated.

You would flush this state back to the vmcs. But that just confirms
Joerg's statement that a fake vmexit/vmrun is more or less equivalent.

The question is whether %rip points to the VMRUN/VMLAUNCH instruction,
HOST_RIP (or the next instruction for svm), or to guest code. But the
actual things we need to do are all very similar subsets of a vmexit.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 05:58 PM, Joerg Roedel wrote:
> On Mon, May 23, 2011 at 05:34:20PM +0300, Avi Kivity wrote:
> > On 05/23/2011 05:28 PM, Joerg Roedel wrote:
>
> >> To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
> >> does all the necessary things.
> >
> > Or we can automatically flush things on any exit to userspace. They
> > should be very rare in guest mode.
>
> This would make nesting mostly transparent to migration, so it sounds
> good in this regard.
>
> I do not completely agree that user-space exits in guest-mode are rare,
> this depends on the hypervisor in the L1. In Hyper-V for example the
> root-domain uses hardware virtualization too and has direct access to
> devices (at least to some degree). IOIO is not intercepted in the
> root-domain, for example. Not sure about the MMIO regions.

Good point. We were also talking about passing through virtio (or even
host) devices to the guest.

So an ioctl to flush volatile state to memory would be a good idea.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011 at 05:34:20PM +0300, Avi Kivity wrote:
> On 05/23/2011 05:28 PM, Joerg Roedel wrote:

>> To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
>> does all the necessary things.
>
> Or we can automatically flush things on any exit to userspace. They
> should be very rare in guest mode.

This would make nesting mostly transparent to migration, so it sounds
good in this regard.

I do not completely agree that user-space exits in guest-mode are rare,
this depends on the hypervisor in the L1. In Hyper-V for example the
root-domain uses hardware virtualization too and has direct access to
devices (at least to some degree). IOIO is not intercepted in the
root-domain, for example. Not sure about the MMIO regions.

Joerg
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> vmcs01 and vmcs02 will both be generated from vmcs12.

If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
non-trapped bits of guest_cr0), and these modifications are not copied back
to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
this task).

If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
indeed vmcs02 can be thrown away and regenerated.

Nadav.

--
Nadav Har'El                        | Monday, May 23 2011, 19 Iyyar 5771
n...@math.technion.ac.il            |-
Phone +972-523-790466, ICQ 13349191 |Jury: Twelve people who determine which
http://nadav.harel.org.il           |client has the better lawyer.
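The point about stale vmcs12 contents can be shown with a toy model. This is a sketch with invented names (nested_sim, nested_entry, l2_writes_cr0, nested_exit) and a single field standing in for the whole VMCS; nested_exit() plays the role the message assigns to prepare_vmcs12(), and none of it is the actual KVM code.

```c
#include <assert.h>

/* One field stands in for all guest state held in a VMCS. */
struct vmcs_fields {
    unsigned long guest_cr0;
};

struct nested_sim {
    struct vmcs_fields vmcs12;  /* what L1 sees, lives in guest memory */
    struct vmcs_fields vmcs02;  /* what the hardware actually runs L2 on */
    int guest_mode;             /* non-zero while L2 is running */
};

/* Nested entry: vmcs02 is generated from vmcs12. */
static void nested_entry(struct nested_sim *n)
{
    n->vmcs02 = n->vmcs12;
    n->guest_mode = 1;
}

/* L2 changes a non-trapped bit: only vmcs02 sees it. */
static void l2_writes_cr0(struct nested_sim *n, unsigned long cr0)
{
    assert(n->guest_mode);
    n->vmcs02.guest_cr0 = cr0;
}

/* Nested exit (real or "fake"): state is copied back into vmcs12. */
static void nested_exit(struct nested_sim *n)
{
    n->vmcs12 = n->vmcs02;
    n->guest_mode = 0;
}
```

Between l2_writes_cr0() and nested_exit() the two copies disagree, which is why regenerating vmcs02 from vmcs12 is only safe after some form of exit has brought vmcs12 up to date.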
[PATCH] kvm tools: Drop unused vars from int10.c code
There are a couple of functions which define an 'ah' variable but never
actually use it, so the gcc 4.6.x series complains:

  CC       bios/bios-rom.bin
bios/int10.c: In function ‘int10_putchar’:
bios/int10.c:86:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
bios/int10.c: In function ‘int10_vesa’:
bios/int10.c:96:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

so get rid of them.

Signed-off-by: Cyrill Gorcunov
CC: Sasha Levin
---
 tools/kvm/bios/int10.c |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

Index: linux-2.6.git/tools/kvm/bios/int10.c
===
--- linux-2.6.git.orig/tools/kvm/bios/int10.c
+++ linux-2.6.git/tools/kvm/bios/int10.c
@@ -83,22 +83,18 @@ static inline void outb(unsigned short p
  */
 static inline void int10_putchar(struct int10_args *args)
 {
-        u8 al, ah;
-
-        al = args->eax & 0xFF;
-        ah = (args->eax & 0xFF00) >> 8;
+        u8 al = args->eax & 0xFF;

         outb(0x3f8, al);
 }

 static void int10_vesa(struct int10_args *args)
 {
-        u8 al, ah;
+        u8 al;
         struct vesa_general_info *destination;
         struct vminfo *vi;

         al = args->eax;
-        ah = args->eax >> 8;

         switch (al) {
         case 0:
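For reference, the values the removed lines computed are the two low bytes of EAX. A minimal stand-alone sketch of that split follows; the helper names reg_al, reg_ah and check_split are made up for this example and do not exist in tools/kvm.

```c
#include <assert.h>
#include <stdint.h>

/* AL is bits 0-7 of EAX. */
static inline uint8_t reg_al(uint32_t eax)
{
    return eax & 0xFF;
}

/* AH is bits 8-15 of EAX. */
static inline uint8_t reg_ah(uint32_t eax)
{
    return (eax & 0xFF00) >> 8;
}

/* Sanity check: AH and AL together reproduce AX, the low 16 bits. */
static inline void check_split(uint32_t eax)
{
    assert((((uint32_t)reg_ah(eax) << 8) | reg_al(eax)) == (eax & 0xFFFF));
}
```

With eax = 0x4F02, reg_al() yields 0x02 and reg_ah() yields 0x4F. A value that is assigned but never read is exactly what -Wunused-but-set-variable reports, so computing AH only where it is consumed avoids the error.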
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 05:28 PM, Joerg Roedel wrote: On Mon, May 23, 2011 at 04:52:47PM +0300, Avi Kivity wrote: > On 05/23/2011 04:40 PM, Joerg Roedel wrote: >> The next benefit is that it works seemlessly even if the state that >> needs to be transfered is extended (e.g. by emulating a new >> virtualization hardware feature). This support can be implemented in the >> kernel module and no changes to qemu are required. > > I agree it's a benefit. But I don't like making the fake vmexit part of > live migration, if it turns out the wrong choice it's hard to undo it. Well, saving the state to the host-save-area and doing a fake-vmexit is logically the same, only the memory where the information is stored differs. Right. I guess the main difference is "info registers" after a stop. To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which does all the necessary things. Or we can automatically flush things on any exit to userspace. They should be very rare in guest mode. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 05:10 PM, Nadav Har'El wrote: On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9": > I think for Intel there is no hidden state apart from in-guest-mode > (there is the VMPTR, but it is an actual register accessible via > instructions). is_guest_mode(vcpu), vmx->nested.vmxon, vmx->nested.current_vmptr are the only three things I can think of. Vmxon is actually more than a boolean (there's also a vmxon pointer). What do you mean by the current_vmptr being available through an instruction? It is (VMPTRST), but this would be an instruction run on L1 (emulated by L0). How would L0's user space use that instruction? I mean that it is an architectural register rather than "hidden state". It doesn't mean that L0 user space can use it. > I agree it's a benefit. But I don't like making the fake vmexit part of > live migration, if it turns out the wrong choice it's hard to undo it. If you don't do this "fake vmexit", you'll need to migrate both vmcs01 and the current vmcs02 - the fact that vmcs12 is in guest memory will not be enough, because vmcs02 isn't copied back to vmcs12 until the nested exit. vmcs01 and vmcs02 will both be generated from vmcs12. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011 at 04:52:47PM +0300, Avi Kivity wrote:
> On 05/23/2011 04:40 PM, Joerg Roedel wrote:
>> The next benefit is that it works seamlessly even if the state that
>> needs to be transferred is extended (e.g. by emulating a new
>> virtualization hardware feature). This support can be implemented in the
>> kernel module and no changes to qemu are required.
>
> I agree it's a benefit. But I don't like making the fake vmexit part of
> live migration, if it turns out the wrong choice it's hard to undo it.

Well, saving the state to the host-save-area and doing a fake-vmexit is
logically the same, only the memory where the information is stored
differs.

To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which does
all the necessary things.

	Joerg
Re: [PATCH 5/5 V2] kvm tools: Initialize and use VESA and VNC
On Mon, May 23, 2011 at 2:19 PM, Sasha Levin wrote:
> Requirements - Kernel compiled with:
> CONFIG_FB_BOOT_VESA_SUPPORT=y
> CONFIG_FB_VESA=y
> CONFIG_FRAMEBUFFER_CONSOLE=y

Dunno if it's possible but it would be nice to have a more readable
error message if you don't have that compiled in:

penberg@tiger:~/linux/tools/kvm$ ./kvm run --vnc -d ~/images/debian_squeeze_amd64_standard.img
  # kvm run -k ../../arch/x86/boot/bzImage -m 320 -c 2
  Warning: Config tap device error. Are you root?
23/05/2011 17:08:19 Listening for VNC connections on TCP port 5900
Undefined video mode number: 312
Press <ENTER> to see video modes available, <SPACE> to continue, or wait 30 sec
Killed

This obviously isn't an issue for merging this patch.

			Pekka
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9": > I think for Intel there is no hidden state apart from in-guest-mode > (there is the VMPTR, but it is an actual register accessible via > instructions). is_guest_mode(vcpu), vmx->nested.vmxon, vmx->nested.current_vmptr are the only three things I can think of. Vmxon is actually more than a boolean (there's also a vmxon pointer). What do you mean by the current_vmptr being available through an instruction? It is (VMPTRST), but this would be an instruction run on L1 (emulated by L0). How would L0's user space use that instruction? > I agree it's a benefit. But I don't like making the fake vmexit part of > live migration, if it turns out the wrong choice it's hard to undo it. If you don't do this "fake vmexit", you'll need to migrate both vmcs01 and the current vmcs02 - the fact that vmcs12 is in guest memory will not be enough, because vmcs02 isn't copied back to vmcs12 until the nested exit. -- Nadav Har'El| Monday, May 23 2011, 19 Iyyar 5771 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |The world is coming to an end ... SAVE http://nadav.harel.org.il |YOUR BUFFERS!!! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 04:40 PM, Joerg Roedel wrote: On Mon, May 23, 2011 at 04:08:00PM +0300, Avi Kivity wrote: > On 05/23/2011 04:02 PM, Joerg Roedel wrote: >> About live-migration with nesting, we had discussed the idea of just >> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate. >> The problem was that the hypervisor may not expect an INTR intercept. >> >> How about doing an implicit VMEXIT in this case and an implicit VMRUN >> after the vcpu is migrated? > > What if there's something in EXIT_INT_INFO? On real SVM hardware EXIT_INT_INFO should only contain something for exception and npt intercepts. These are all handled in the kernel and do not cause an exit to user-space so that no valid EXIT_INT_INFO should be around when we actually go back to user-space (so that migration can happen). The exception might be the #PF/NPT intercept when the guest is doing very obscure things like putting an exception/interrupt handler on mmio memory, but that isn't really supported by KVM anyway so I doubt we should care. Unless I miss something here we should be safe by just not looking at EXIT_INT_INFO while migrating. Agree. >>The nested hypervisor will not see the >> vmexit and the vcpu will be in a state where it is safe to migrate. This >> should work for nested-vmx too if the guest-state is written back to >> guest memory on VMEXIT. Is this the case? > > It is the case with the current implementation, and we can/should make > it so in future implementations, just before exit to userspace. Or at > least provide an ABI to sync memory. > > But I don't see why we shouldn't just migrate all the hidden state (in > guest mode flag, svm host paging mode, svm host interrupt state, vmcb > address/vmptr, etc.). It's more state, but no thinking is involved, so > it's clearly superior. An issue is that there is different state to migrate for Intel and AMD hosts. 
If we keep all that information in guest memory the kvm kernel module can handle those details and all KVM needs to migrate is the in-guest-mode flag and the gpa of the vmcb/vmcs which is currently executed. This state should be enough for Intel and AMD nesting. I think for Intel there is no hidden state apart from in-guest-mode (there is the VMPTR, but it is an actual register accessible via instructions). For svm we can keep the hidden state in the host state-save area (including the vmcb pointer). The only risk is that svm will gain hardware support for nesting, and will choose a different format than ours. An alternative is a fake MSR for storing this data, or just another get/set ioctl pair. We'll have a flags field that says which fields are filled in. The next benefit is that it works seemlessly even if the state that needs to be transfered is extended (e.g. by emulating a new virtualization hardware feature). This support can be implemented in the kernel module and no changes to qemu are required. I agree it's a benefit. But I don't like making the fake vmexit part of live migration, if it turns out the wrong choice it's hard to undo it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
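As a rough sketch of the "get/set ioctl pair with a flags field" idea above, here is what such a state blob could look like for the VMX case. None of this is an existing KVM interface: the struct, the NESTED_F_* bits and both helpers are hypothetical, and the field list simply follows the three items named elsewhere in the thread (in-guest-mode, vmxon plus its pointer, and the current vmptr).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits: each one says which part of the blob is valid. */
#define NESTED_F_GUEST_MODE    (1u << 0)	/* vcpu was running L2 */
#define NESTED_F_VMXON         (1u << 1)	/* vmxon_ptr is valid */
#define NESTED_F_CURRENT_VMPTR (1u << 2)	/* current_vmptr is valid */

/* Hypothetical blob that a get/set pair would pass to and from userspace. */
struct nested_state {
	uint32_t flags;
	uint32_t reserved;
	uint64_t vmxon_ptr;	/* gpa of the VMXON region */
	uint64_t current_vmptr;	/* gpa of the vmcs12 in use */
};

/* Sender side: fill in only what is valid and flag it. */
struct nested_state nested_state_pack(bool guest_mode,
				      bool vmxon, uint64_t vmxon_ptr,
				      bool have_vmptr, uint64_t current_vmptr)
{
	struct nested_state s = { 0 };

	if (vmxon) {
		s.flags |= NESTED_F_VMXON;
		s.vmxon_ptr = vmxon_ptr;
	}
	if (have_vmptr) {
		s.flags |= NESTED_F_CURRENT_VMPTR;
		s.current_vmptr = current_vmptr;
	}
	if (guest_mode)
		s.flags |= NESTED_F_GUEST_MODE;
	return s;
}

/* Receiver side: 0 if the flag combination makes sense, -1 otherwise. */
int nested_state_check(const struct nested_state *s)
{
	if ((s->flags & NESTED_F_CURRENT_VMPTR) && !(s->flags & NESTED_F_VMXON))
		return -1;	/* a current vmptr without vmxon */
	if ((s->flags & NESTED_F_GUEST_MODE) && !(s->flags & NESTED_F_CURRENT_VMPTR))
		return -1;	/* in guest mode but no vmcs12 to run */
	return 0;
}
```

The flags field is what would let the format grow later: a receiver that does not know a new bit can reject the blob instead of silently dropping state.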
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011 at 04:08:00PM +0300, Avi Kivity wrote: > On 05/23/2011 04:02 PM, Joerg Roedel wrote: >> About live-migration with nesting, we had discussed the idea of just >> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate. >> The problem was that the hypervisor may not expect an INTR intercept. >> >> How about doing an implicit VMEXIT in this case and an implicit VMRUN >> after the vcpu is migrated? > > What if there's something in EXIT_INT_INFO? On real SVM hardware EXIT_INT_INFO should only contain something for exception and npt intercepts. These are all handled in the kernel and do not cause an exit to user-space so that no valid EXIT_INT_INFO should be around when we actually go back to user-space (so that migration can happen). The exception might be the #PF/NPT intercept when the guest is doing very obscure things like putting an exception/interrupt handler on mmio memory, but that isn't really supported by KVM anyway so I doubt we should care. Unless I miss something here we should be safe by just not looking at EXIT_INT_INFO while migrating. >> The nested hypervisor will not see the >> vmexit and the vcpu will be in a state where it is safe to migrate. This >> should work for nested-vmx too if the guest-state is written back to >> guest memory on VMEXIT. Is this the case? > > It is the case with the current implementation, and we can/should make > it so in future implementations, just before exit to userspace. Or at > least provide an ABI to sync memory. > > But I don't see why we shouldn't just migrate all the hidden state (in > guest mode flag, svm host paging mode, svm host interrupt state, vmcb > address/vmptr, etc.). It's more state, but no thinking is involved, so > it's clearly superior. An issue is that there is different state to migrate for Intel and AMD hosts. 
If we keep all that information in guest memory the kvm kernel module can handle those details and all KVM needs to migrate is the in-guest-mode flag and the gpa of the vmcb/vmcs which is currently executed. This state should be enough for Intel and AMD nesting. The next benefit is that it works seemlessly even if the state that needs to be transfered is extended (e.g. by emulating a new virtualization hardware feature). This support can be implemented in the kernel module and no changes to qemu are required. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011, Joerg Roedel wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> About live-migration with nesting, we had discussed the idea of just
> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
> The problem was that the hypervisor may not expect an INTR intercept.
>
> How about doing an implicit VMEXIT in this case and an implicit VMRUN
> after the vcpu is migrated? The nested hypervisor will not see the
> vmexit and the vcpu will be in a state where it is safe to migrate. This
> should work for nested-vmx too if the guest-state is written back to
> guest memory on VMEXIT. Is this the case?

Indeed, on nested exit (L2 to L1), the L2 guest state is written back to
vmcs12 (in guest memory). In theory, at that point, the vmcs02 (the vmcs
used by L0 to actually run L2) can be discarded, without risking losing
anything.

The receiving hypervisor will need to remember to do that implicit VMRUN
when it starts the guest; it also needs to know what the current L2 guest
is - in VMX this would be vmx->nested.current_vmptr, which needs to be
migrated as well (on the other hand, other variables, like
vmx->nested.current_vmcs12, will need to be recalculated by the receiver,
and not migrated as-is).

I haven't started considering how to wrap up all these pieces into a
complete working solution - it is one of the things on my TODO list after
the basic nested VMX is merged.

-- 
Nadav Har'El                        | Monday, May 23 2011, 19 Iyyar 5771
n...@math.technion.ac.il             |-
Phone +972-523-790466, ICQ 13349191 |Live as if you were to die tomorrow,
http://nadav.harel.org.il           |learn as if you were to live forever.
Managedsave does not work with kernel >=2.6.38
Hello,

we recently noticed that the "managedsave" command from libvirt does not
work when using kernel >= 2.6.38. It saves the state to a file, but the
domain does not resume from the file. Instead, a started domain always
gets rebooted. With kernel 2.6.37, "managedsave" works without problems.

We are currently using:
- libvirt 0.9.0
- kvm 0.14.0
- kernel 2.6.38.6

Is this a known bug?

Best regards
Sebastian Nickel
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/23/2011 04:02 PM, Joerg Roedel wrote: On Mon, May 23, 2011 at 12:52:50PM +0300, Avi Kivity wrote: > On 05/22/2011 10:32 PM, Nadav Har'El wrote: >> What do we need to do with this idt-vectoring-information? In regular (non- >> nested) guests, the answer is simple: On the next entry, we need to inject >> this event again into the guest, so it can resume the delivery of the >> same event it was trying to deliver. This is why the nested-unaware code >> has a vmx_complete_interrupts which basically adds this idt-vectoring-info >> into KVM's event queue, which on the next entry will be injected similarly >> to the way virtual interrupts from userspace are injected, and so on. > > The other thing we may need to do, is to expose it to userspace in case > we're live migrating at exactly this point in time. About live-migration with nesting, we had discussed the idea of just doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate. The problem was that the hypervisor may not expect an INTR intercept. How about doing an implicit VMEXIT in this case and an implicit VMRUN after the vcpu is migrated? What if there's something in EXIT_INT_INFO? The nested hypervisor will not see the vmexit and the vcpu will be in a state where it is safe to migrate. This should work for nested-vmx too if the guest-state is written back to guest memory on VMEXIT. Is this the case? It is the case with the current implementation, and we can/should make it so in future implementations, just before exit to userspace. Or at least provide an ABI to sync memory. But I don't see why we shouldn't just migrate all the hidden state (in guest mode flag, svm host paging mode, svm host interrupt state, vmcb address/vmptr, etc.). It's more state, but no thinking is involved, so it's clearly superior. 
-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Mon, May 23, 2011 at 12:52:50PM +0300, Avi Kivity wrote: > On 05/22/2011 10:32 PM, Nadav Har'El wrote: >> What do we need to do with this idt-vectoring-information? In regular (non- >> nested) guests, the answer is simple: On the next entry, we need to inject >> this event again into the guest, so it can resume the delivery of the >> same event it was trying to deliver. This is why the nested-unaware code >> has a vmx_complete_interrupts which basically adds this idt-vectoring-info >> into KVM's event queue, which on the next entry will be injected similarly >> to the way virtual interrupts from userspace are injected, and so on. > > The other thing we may need to do, is to expose it to userspace in case > we're live migrating at exactly this point in time. About live-migration with nesting, we had discussed the idea of just doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate. The problem was that the hypervisor may not expect an INTR intercept. How about doing an implicit VMEXIT in this case and an implicit VMRUN after the vcpu is migrated? The nested hypervisor will not see the vmexit and the vcpu will be in a state where it is safe to migrate. This should work for nested-vmx too if the guest-state is written back to guest memory on VMEXIT. Is this the case? Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 4/5] kvm tools: Update makefile and feature tests
From: John Floren Update feature tests to test for libvncserver. VESA support doesn't get compiled in unless libvncserver is installed. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/Makefile | 11 ++- tools/kvm/config/feature-tests.mak | 10 ++ 2 files changed, 20 insertions(+), 1 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index e6e8d4e..2ebc86c 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -58,6 +58,14 @@ ifeq ($(has_bfd),y) LIBS+= -lbfd endif +FLAGS_VNCSERVER=$(CFLAGS) -lvncserver +has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER)) +ifeq ($(has_vncserver),y) + CFLAGS += -DCONFIG_HAS_VNCSERVER + OBJS+= hw/vesa.o + LIBS+= -lvncserver +endif + DEPS := $(patsubst %.o,%.d,$(OBJS)) # Exclude BIOS object files from header dependencies. @@ -153,9 +161,10 @@ bios/bios.o: bios/bios.S bios/bios-rom.bin bios/bios-rom.bin: bios/bios-rom.S bios/e820.c $(E) " CC " $@ $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/e820.c -o bios/e820.o + $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/int10.c -o bios/int10.o $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios-rom.S -o bios/bios-rom.o $(E) " LD " $@ - $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o + $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o bios/int10.o $(E) " OBJCOPY " $@ $(Q) objcopy -O binary -j .text bios/bios-rom.bin.elf bios/bios-rom.bin $(E) " NM " $@ diff --git a/tools/kvm/config/feature-tests.mak b/tools/kvm/config/feature-tests.mak index 6170fd2..0801b54 100644 --- a/tools/kvm/config/feature-tests.mak +++ b/tools/kvm/config/feature-tests.mak @@ -126,3 +126,13 @@ int main(void) return 0; } endef + +define SOURCE_VNCSERVER +#include + +int main(void) +{ + rfbIsActive((void *)0); + return 0; +} +endef -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a 
message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 5/5] kvm tools: Initialize and use VESA and VNC
From: John Floren Requirements - Kernel compiled with: CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_VESA=y CONFIG_FRAMEBUFFER_CONSOLE=y Start VNC server by starting kvm tools with "--vnc". Connect to the VNC server by running: "vncviewer :0". Since there is no support for input devices at this time, it may be useful starting kvm tools with an additional ' -p "console=ttyS0" ' parameter so that it would be possible to use a serial console alongside with a graphic one. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/kvm-run.c | 17 +++-- 1 files changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index 288e1fb..adbb25b 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -28,6 +28,7 @@ #include #include #include +#include /* header files for gitish interface */ #include @@ -66,6 +67,7 @@ static const char *virtio_9p_dir; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; static bool virtio_rng; +static bool vnc; extern bool ioport_debug; extern int active_console; @@ -110,6 +112,7 @@ static const struct option options[] = { OPT_STRING('\0', "kvm-dev", &kvm_dev, "kvm-dev", "KVM device file"), OPT_STRING('\0', "virtio-9p", &virtio_9p_dir, "root dir", "Enable 9p over virtio"), + OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -413,6 +416,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) char *hi; int i; void *ret; + u16 vidmode = 0; signal(SIGALRM, handle_sigalrm); signal(SIGQUIT, handle_sigquit); @@ -511,7 +515,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm->nrcpus = nrcpus; memset(real_cmdline, 0, sizeof(real_cmdline)); - strcpy(real_cmdline, "notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serial"); + strcpy(real_cmdline, "notsc noapic noacpi pci=conf1"); + if (vnc) { + strcat(real_cmdline, 
" video=vesafb console=tty0"); + vidmode = 0x312; + } else { + strcat(real_cmdline, " console=ttyS0 earlyprintk=serial"); + } strcat(real_cmdline, " "); if (kernel_cmdline) strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline)); @@ -543,7 +553,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) printf(" # kvm run -k %s -m %Lu -c %d\n", kernel_filename, ram_size / 1024 / 1024, nrcpus); if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename, - real_cmdline)) + real_cmdline, vidmode)) die("unable to load kernel %s", kernel_filename); kvm->vmlinux= vmlinux_filename; @@ -597,6 +607,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm__init_ram(kvm); + if (vnc) + vesa__init(kvm); + thread_pool__init(nr_online_cpus); for (i = 0; i < nrcpus; i++) { -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 3/5] kvm tools: Add VESA device
From: John Floren Add a simple VESA device which simply moves a framebuffer from guest kernel to a VNC server. VESA device PCI code is very similar to virtio-* PCI code. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/hw/vesa.c| 108 tools/kvm/include/kvm/ioport.h |2 + tools/kvm/include/kvm/vesa.h | 27 tools/kvm/include/kvm/virtio-pci-dev.h |3 + 4 files changed, 140 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/hw/vesa.c create mode 100644 tools/kvm/include/kvm/vesa.h diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c new file mode 100644 index 000..3003aa5 --- /dev/null +++ b/tools/kvm/hw/vesa.c @@ -0,0 +1,108 @@ +#include "kvm/vesa.h" +#include "kvm/ioport.h" +#include "kvm/util.h" +#include "kvm/kvm.h" +#include "kvm/pci.h" +#include "kvm/kvm-cpu.h" +#include "kvm/irq.h" +#include "kvm/virtio-pci-dev.h" + +#include + +#include +#include +#include +#include + +#define VESA_QUEUE_SIZE128 +#define VESA_IRQ 14 + +/* + * This "6000" value is pretty much the result of experimentation + * It seems that around this value, things update pretty smoothly + */ +#define VESA_UPDATE_TIME 6000 + +u8 videomem[VESA_MEM_SIZE]; + +static bool vesa_pci_io_in(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa in port=%u\n", port); + return true; +} + +static bool vesa_pci_io_out(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa out port=%u\n", port); + return true; +} + +static struct ioport_operations vesa_io_ops = { + .io_in = vesa_pci_io_in, + .io_out = vesa_pci_io_out, +}; + +static struct pci_device_header vesa_pci_device = { + .vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET, + .device_id = PCI_DEVICE_ID_VESA, + .header_type= PCI_HEADER_TYPE_NORMAL, + .revision_id= 0, + .class = 0x03, + .subsys_vendor_id = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET, + .subsys_id = PCI_SUBSYSTEM_ID_VESA, + .bar[0] = IOPORT_VESA | PCI_BASE_ADDRESS_SPACE_IO, + .bar[1] = 
VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY, +}; + + +void vesa_mmio_callback(u64 addr, u8 *data, u32 len, u8 is_write) +{ + if (is_write) + memcpy(&videomem[addr - VESA_MEM_ADDR], data, len); + + return; +} + +void vesa__init(struct kvm *kvm) +{ + u8 dev, line, pin; + pthread_t thread; + + if (irq__register_device(PCI_DEVICE_ID_VESA, &dev, &pin, &line) < 0) + return; + + vesa_pci_device.irq_pin = pin; + vesa_pci_device.irq_line = line; + pci__register(&vesa_pci_device, dev); + ioport__register(IOPORT_VESA, &vesa_io_ops, IOPORT_VESA_SIZE); + + kvm__register_mmio(VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback); + pthread_create(&thread, NULL, vesa__dovnc, kvm); +} + +/* + * This starts a VNC server to display the framebuffer. + * It's not altogether clear this belongs here rather than in kvm-run.c + */ +void *vesa__dovnc(void *v) +{ + /* +* Make a fake argc and argv because the getscreen function +* seems to want it. +*/ + int ac = 1; + char av[1][1] = {{0} }; + rfbScreenInfoPtr server; + + server = rfbGetScreen(&ac, (char **)av, VESA_WIDTH, VESA_HEIGHT, 8, 3, 4); + server->frameBuffer = (char *)videomem; + server->alwaysShared = TRUE; + rfbInitServer(server); + + while (rfbIsActive(server)) { + rfbMarkRectAsModified(server, 0, 0, VESA_WIDTH, VESA_HEIGHT); + rfbProcessEvents(server, server->deferUpdateTime * VESA_UPDATE_TIME); + } + return NULL; +} + diff --git a/tools/kvm/include/kvm/ioport.h b/tools/kvm/include/kvm/ioport.h index 218530c..8253938 100644 --- a/tools/kvm/include/kvm/ioport.h +++ b/tools/kvm/include/kvm/ioport.h @@ -7,6 +7,8 @@ /* some ports we reserve for own use */ #define IOPORT_DBG 0xe0 +#define IOPORT_VESA0xa200 +#define IOPORT_VESA_SIZE 256 #define IOPORT_VIRTIO_P9 0xb200 /* Virtio 9P device */ #define IOPORT_VIRTIO_P9_SIZE 256 #define IOPORT_VIRTIO_BLK 0xc200 /* Virtio block device */ diff --git a/tools/kvm/include/kvm/vesa.h b/tools/kvm/include/kvm/vesa.h new file mode 100644 index 000..ff3ec75 --- /dev/null +++ 
b/tools/kvm/include/kvm/vesa.h @@ -0,0 +1,27 @@ +#ifndef KVM__VESA_H +#define KVM__VESA_H + +#include + +#define VESA_WIDTH 640 +#define VESA_HEIGHT480 + +#define VESA_MEM_ADDR 0xd000 +#define VESA_MEM_SIZE (4*VESA_WIDTH*VESA_HEIGHT) +#define VESA_BPP 32 + +struct kvm; +struct int10_args; + +void vesa_mmio_callback(u64, u8*,
[PATCH V3 2/5] kvm tools: Add video mode to kernel initialization
From: John Floren Allow setting video mode in guest kernel. For possible values see Documentation/fb/vesafb.txt Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/include/kvm/kvm.h |2 +- tools/kvm/kvm.c |7 --- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index 08c6fda..f951f2d 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -41,7 +41,7 @@ int kvm__max_cpus(struct kvm *kvm); void kvm__init_ram(struct kvm *kvm); void kvm__delete(struct kvm *kvm); bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, - const char *initrd_filename, const char *kernel_cmdline); + const char *initrd_filename, const char *kernel_cmdline, u16 vidmode); void kvm__setup_bios(struct kvm *kvm); void kvm__start_timer(struct kvm *kvm); void kvm__stop_timer(struct kvm *kvm); diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index 4393a41..7284211 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -320,7 +320,7 @@ static int load_flat_binary(struct kvm *kvm, int fd) static const char *BZIMAGE_MAGIC = "HdrS"; static bool load_bzimage(struct kvm *kvm, int fd_kernel, - int fd_initrd, const char *kernel_cmdline) + int fd_initrd, const char *kernel_cmdline, u16 vidmode) { struct boot_params *kern_boot; unsigned long setup_sects; @@ -383,6 +383,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, kern_boot->hdr.type_of_loader = 0xff; kern_boot->hdr.heap_end_ptr = 0xfe00; kern_boot->hdr.loadflags|= CAN_USE_HEAP; + kern_boot->hdr.vid_mode = vidmode; /* * Read initrd image into guest memory @@ -441,7 +442,7 @@ static bool initrd_check(int fd) } bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, - const char *initrd_filename, const char *kernel_cmdline) + const char *initrd_filename, const char *kernel_cmdline, u16 vidmode) { bool ret; int fd_kernel = -1, fd_initrd = -1; @@ -459,7 +460,7 @@ bool 
kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, die("%s is not an initrd", initrd_filename); } - ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline); + ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode); if (initrd_filename) close(fd_initrd); -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
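For reference, the value handed to kern_boot->hdr.vid_mode is the Linux video mode number, which Documentation/fb/vesafb.txt gives as the VESA mode number plus 0x200. A minimal sketch of that arithmetic follows; the helper and macro names are made up here and are not part of the patch.

```c
#include <stdint.h>

/* Linux video mode number = VESA mode number + 0x200 (see vesafb.txt). */
#define LINUX_VESA_MODE_OFFSET 0x200

/* Hypothetical helper: convert a VESA mode number to a vid_mode value. */
static inline uint16_t vesa_to_linux_vid_mode(uint16_t vesa_mode)
{
	return (uint16_t)(vesa_mode + LINUX_VESA_MODE_OFFSET);
}
```

So VESA mode 0x112 (640x480 with 16M colors), the one listed in the modes[] table of the INT10 patch, corresponds to vidmode 0x312 as used in kvm-run.c.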
[PATCH V3 1/5] kvm tools: Add BIOS INT10 handler
From: John Floren INT10 handler is a basic implementation of BIOS video services. The handler implements a VESA interface which is initialized at the very beginning of loading the kernel. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/bios/bios-rom.S | 56 tools/kvm/bios/int10.c| 161 + 2 files changed, 189 insertions(+), 28 deletions(-) create mode 100644 tools/kvm/bios/int10.c diff --git a/tools/kvm/bios/bios-rom.S b/tools/kvm/bios/bios-rom.S index 8a53dcd..5645cd2 100644 --- a/tools/kvm/bios/bios-rom.S +++ b/tools/kvm/bios/bios-rom.S @@ -27,36 +27,36 @@ ENTRY_END(bios_intfake) * We ignore bx settings */ ENTRY(bios_int10) - test $0x0e, %ah - jne 1f + pushw %fs + pushl %es + pushl %edi + pushl %esi + pushl %ebp + pushl %esp + pushl %edx + pushl %ecx + pushl %ebx + pushl %eax + + movl%esp, %eax + /* this is way easier than doing it in assembly */ + /* just push all the regs and jump to a C handler */ + callint10_handler + + popl%eax + popl%ebx + popl%ecx + popl%edx + popl%esp + popl%ebp + popl%esi + popl%edi + popl%es + popw%fs -/* - * put char in AL at current cursor and - * increment cursor position - */ -putchar: - stack_swap - - push %fs - push %bx - - mov $VGA_RAM_SEG, %bx - mov %bx, %fs - mov %cs:(cursor), %bx - mov %al, %fs:(%bx) - inc %bx - test $VGA_PAGE_SIZE, %bx - jb putchar_new - xor %bx, %bx -putchar_new: - mov %bx, %fs:(cursor) - - pop %bx - pop %fs - - stack_restore -1: IRET + + /* * private IRQ data */ diff --git a/tools/kvm/bios/int10.c b/tools/kvm/bios/int10.c new file mode 100644 index 000..1ab3a67 --- /dev/null +++ b/tools/kvm/bios/int10.c @@ -0,0 +1,161 @@ +#include "kvm/segment.h" +#include "kvm/bios.h" +#include "kvm/util.h" +#include "kvm/vesa.h" +#include + +#define VESA_MAGIC ('V' + ('E' << 8) + ('S' << 16) + ('A' << 24)) + +struct int10_args { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; + u32 esp; + u32 ebp; + u32 esi; + u32 edi; + u32 es; +}; + +/* VESA General Information 
table */ +struct vesa_general_info { + u32 signature; /* 0 Magic number = "VESA" */ + u16 version;/* 4 */ + void *vendor_string;/* 6 */ + u32 capabilities; /* 10 */ + void *video_mode_ptr; /* 14 */ + u16 total_memory; /* 18 */ + + u8 reserved[236]; /* 20 */ +} __attribute__ ((packed)); + + +struct vminfo { + u16 mode_attr; /* 0 */ + u8 win_attr[2];/* 2 */ + u16 win_grain; /* 4 */ + u16 win_size; /* 6 */ + u16 win_seg[2]; /* 8 */ + u32 win_scheme; /* 12 */ + u16 logical_scan; /* 16 */ + + u16 h_res; /* 18 */ + u16 v_res; /* 20 */ + u8 char_width; /* 22 */ + u8 char_height;/* 23 */ + u8 memory_planes; /* 24 */ + u8 bpp;/* 25 */ + u8 banks; /* 26 */ + u8 memory_layout; /* 27 */ + u8 bank_size; /* 28 */ + u8 image_planes; /* 29 */ + u8 page_function; /* 30 */ + + u8 rmask; /* 31 */ + u8 rpos; /* 32 */ + u8 gmask; /* 33 */ + u8 gpos; /* 34 */ + u8 bmask; /* 35 */ + u8 bpos; /* 36 */ + u8 resv_mask; /* 37 */ + u8 resv_pos; /* 38 */ + u8 dcm_info; /* 39 */ + + u32 lfb_ptr;/* 40 Linear frame buffer address */ + u32 offscreen_ptr; /* 44 Offscreen memory address */ + u16 offscreen_size; /* 48 */ + + u8 reserved[206]; /* 50 */ +}; + +char oemstring[11] = "KVM VESA"; +u16 modes[2] = { 0x0112, 0x }; + +static inline void outb(unsigned short port, unsigned char val) +{ + asm volatile("outb %0, %1" : : "a"(val), "Nd"(port)); +} + +/* + * It's probably much more useful to make this print to the serial + * line rather than print to a non-displayed VGA memory + */ +static inline void int10_putchar(struct int10_args *args) +{ + u8 al, ah; + + al = args->eax & 0xFF; +
Re: [PATCH 5/5 V2] kvm tools: Initialize and use VESA and VNC
On 5/23/11 2:38 PM, Ingo Molnar wrote:
> * Sasha Levin wrote:
>
> > @@ -511,7 +515,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
> >  	kvm->nrcpus = nrcpus;
> >
> >  	memset(real_cmdline, 0, sizeof(real_cmdline));
> > -	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serial");
> > +	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1");
> > +	if (vnc) {
> > +		strcat(real_cmdline, " video=vesafb console=tty0");
> > +		vidmode = 0x312;
> > +	} else {
> > +		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial");
> > +	}
>
> Hm, i think all the kernel parameter handling code wants to move into driver
> specific routines as well. Something like:
>
> 	serial_init(kvm, real_cmdline);
>
> where serial_init() would append to real_cmdline if needed. This removes a bit
> of serial-driver specific knowledge from kvm-run.c.
>
> Same goes for the VESA driver and the above video mode flag logic.
>
> > @@ -597,6 +607,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
> >
> >  	kvm__init_ram(kvm);
> >
> > +	if (vnc)
> > +		vesa__init(kvm);
>
> Shouldn't vesa__init() itself know about whether it's active (i.e. the 'vnc'
> flag is set) and return early if it's not set?
>
> That way this could become more encapsulated and self-sufficient:
>
> 	vesa__init(kvm);
>
> With no VESA driver specific state exposed to the generic kvm_cmd_run()
> function.
>
> Ideally kvm_cmd_run() should just be a series of:
>
> 	serial_init(kvm, real_cmdline);
> 	vesa_init(kvm, real_cmdline);
> 	...
>
> initialization routines. Later on even this could be removed: using section
> tricks we can put init functions into a section and drivers could register
> their init function like initcall(func) functions are registered within the
> kernel.
>
> kvm_cmd_run() could thus iterate over that (build time constructed) section
> like this:
>
> 	extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
>
> 	static void __init do_initcalls(void)
> 	{
> 		initcall_t *fn;
>
> 		for (fn = __early_initcall_end; fn < __initcall_end; fn++)
> 			do_one_initcall(*fn);
> 	}
>
> and would not actually have *any* knowledge about what drivers were built in.
>
> Currently it's fine to initialize everything explicitly - but this would be the
> long term model to work towards ...

Prasad, didn't you have patches to do exactly that?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
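The per-driver init idea quoted above can be sketched roughly as follows. This is only an illustration, not code from the patch series: struct kvm_sketch, cmdline_append(), serial_init_sketch() and vesa_init_sketch() are hypothetical names, and the size argument is an addition so that the append can be bounds-checked (the patch itself uses plain strcat()). The strings and the 0x312 mode are the ones from the quoted hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the tool's state; only what the sketch needs. */
struct kvm_sketch {
	bool vnc;			/* set by the --vnc option */
	unsigned short vidmode;		/* later handed to the kernel loader */
};

/* Append src to dst; returns false (dst untouched) if it would not fit. */
static bool cmdline_append(char *dst, size_t size, const char *src)
{
	size_t used = strlen(dst);
	size_t len = strlen(src);

	if (used >= size || len >= size - used)
		return false;
	memcpy(dst + used, src, len + 1);
	return true;
}

/* The serial driver owns its own kernel parameters. */
static bool serial_init_sketch(struct kvm_sketch *kvm, char *cmdline, size_t size)
{
	if (kvm->vnc)
		return true;	/* graphical console requested: nothing to add */
	return cmdline_append(cmdline, size, " console=ttyS0 earlyprintk=serial");
}

/* The VESA driver checks its own flag and returns early when inactive. */
static bool vesa_init_sketch(struct kvm_sketch *kvm, char *cmdline, size_t size)
{
	if (!kvm->vnc)
		return true;
	kvm->vidmode = 0x312;
	return cmdline_append(cmdline, size, " video=vesafb console=tty0");
}
```

With this shape the caller just runs every init routine in turn and never tests the vnc flag itself, which is the encapsulation being asked for.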
Re: [PATCH 5/5 V2] kvm tools: Initialize and use VESA and VNC
* Sasha Levin wrote:

> @@ -511,7 +515,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
>  	kvm->nrcpus = nrcpus;
>
>  	memset(real_cmdline, 0, sizeof(real_cmdline));
> -	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serial");
> +	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1");
> +	if (vnc) {
> +		strcat(real_cmdline, " video=vesafb console=tty0");
> +		vidmode = 0x312;
> +	} else {
> +		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial");
> +	}

Hm, i think all the kernel parameter handling code wants to move into driver
specific routines as well. Something like:

	serial_init(kvm, real_cmdline);

where serial_init() would append to real_cmdline if needed. This removes a bit
of serial-driver specific knowledge from kvm-run.c.

Same goes for the VESA driver and the above video mode flag logic.

> @@ -597,6 +607,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
>
>  	kvm__init_ram(kvm);
>
> +	if (vnc)
> +		vesa__init(kvm);

Shouldn't vesa__init() itself know about whether it's active (i.e. the 'vnc'
flag is set) and return early if it's not set?

That way this could become more encapsulated and self-sufficient:

	vesa__init(kvm);

With no VESA driver specific state exposed to the generic kvm_cmd_run()
function.

Ideally kvm_cmd_run() should just be a series of:

	serial_init(kvm, real_cmdline);
	vesa_init(kvm, real_cmdline);
	...

initialization routines. Later on even this could be removed: using section
tricks we can put init functions into a section and drivers could register
their init function like initcall(func) functions are registered within the
kernel.

kvm_cmd_run() could thus iterate over that (build time constructed) section
like this:

	extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];

	static void __init do_initcalls(void)
	{
		initcall_t *fn;

		for (fn = __early_initcall_end; fn < __initcall_end; fn++)
			do_one_initcall(*fn);
	}

and would not actually have *any* knowledge about what drivers were built in.

Currently it's fine to initialize everything explicitly - but this would be the
long term model to work towards ...

Thanks,

	Ingo
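For a userspace tool, the "section trick" described above could look roughly like the sketch below. It is an assumption-laden illustration, not the kernel's initcall machinery and not code from this thread: it relies on GCC or Clang attributes and on an ELF linker such as GNU ld, which provides __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier. The section name tool_initcalls, the tool_initcall() macro and the two example drivers are hypothetical.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

/*
 * Place a pointer to fn into the "tool_initcalls" section. The "used"
 * attribute keeps the otherwise unreferenced variable from being dropped.
 */
#define tool_initcall(fn) \
	static initcall_t initcall_entry_##fn \
	__attribute__((used, section("tool_initcalls"))) = fn

/* Section bounds, provided by the linker (GNU ld behaviour on ELF). */
extern initcall_t __start_tool_initcalls[];
extern initcall_t __stop_tool_initcalls[];

/* Two hypothetical drivers registering themselves. */
static int serial_ready, vesa_ready;

static int serial_init_sketch(void)
{
	serial_ready = 1;
	return 0;
}
tool_initcall(serial_init_sketch);

static int vesa_init_sketch(void)
{
	vesa_ready = 1;
	return 0;
}
tool_initcall(vesa_init_sketch);

/*
 * Generic code: walks the section without knowing which drivers exist.
 * Returns the number of init functions run, or -1 if one of them failed.
 */
static int do_initcalls(void)
{
	initcall_t *fn;
	int count = 0;

	for (fn = __start_tool_initcalls; fn < __stop_tool_initcalls; fn++) {
		if ((*fn)() != 0)
			return -1;
		count++;
	}
	return count;
}
```

The order in which entries land in the section follows link order, so anything that needs ordering guarantees would want separate levels, much like the kernel's early and late initcalls.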
Re: [PATCH 3/5 V2] kvm tools: Add VESA device
* Sasha Levin wrote:

> +struct int10args;

this should be int10_args.

Thanks,

	Ingo
Re: [PATCH 2/5 V2] kvm tools: Add video mode to kernel initialization
* Sasha Levin wrote:

>  bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
> -			const char *initrd_filename, const char *kernel_cmdline);
> +			const char *initrd_filename, const char *kernel_cmdline, u16 vidmode);

Suggestion for future cleanup: we really want to grow a 'struct kernel_params'
kind of thing which could be passed along here by address. That would make it
easier to extend it with whatever may come along in the future, and would make
the code look cleaner as well.

Thanks,

	Ingo
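One possible shape for such a 'struct kernel_params' is sketched below. The field names simply mirror the current kvm__load_kernel() arguments; the struct layout and the kernel_params_valid() helper are hypothetical and were not posted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Everything the loader needs, gathered in one place. */
struct kernel_params {
	const char *kernel_filename;
	const char *initrd_filename;	/* NULL when no initrd is used */
	const char *kernel_cmdline;
	uint16_t vidmode;		/* 0 leaves the video mode alone */
};

/* Hypothetical sanity check a loader could run before opening files. */
static bool kernel_params_valid(const struct kernel_params *p)
{
	return p != NULL && p->kernel_filename != NULL &&
	       p->kernel_cmdline != NULL;
}
```

A caller would fill the struct with designated initializers and pass its address, so a later addition (another boot option, say) only touches the struct and the code that consumes the new field, not every prototype along the call chain.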
Re: [PATCH 1/5 V2] kvm tools: Add BIOS INT10 handler
* Sasha Levin wrote: > INT10 handler is a basic implementation of BIOS video services. > > The handler implements a VESA interface which is initialized at > the very beginning of loading the kernel. > > Signed-off-by: John Floren > [ turning code into patches and cleanup ] > Signed-off-by: Sasha Levin > --- > tools/kvm/bios/bios-rom.S | 56 > tools/kvm/bios/int10.c| 161 > + > 2 files changed, 189 insertions(+), 28 deletions(-) > create mode 100644 tools/kvm/bios/int10.c > > diff --git a/tools/kvm/bios/bios-rom.S b/tools/kvm/bios/bios-rom.S > index 8a53dcd..b636cb8 100644 > --- a/tools/kvm/bios/bios-rom.S > +++ b/tools/kvm/bios/bios-rom.S > @@ -27,36 +27,36 @@ ENTRY_END(bios_intfake) > * We ignore bx settings > */ > ENTRY(bios_int10) > - test $0x0e, %ah > - jne 1f > + pushw %fs > + pushl %es > + pushl %edi > + pushl %esi > + pushl %ebp > + pushl %esp > + pushl %edx > + pushl %ecx > + pushl %ebx > + pushl %eax > + > + movl%esp, %eax > + /* this is way easier than doing it in assembly */ > + /* just push all the regs and jump to a C handler */ > + callint10handler > + > + popl%eax > + popl%ebx > + popl%ecx > + popl%edx > + popl%esp > + popl%ebp > + popl%esi > + popl%edi > + popl%es > + popw%fs > > -/* > - * put char in AL at current cursor and > - * increment cursor position > - */ > -putchar: > - stack_swap > - > - push %fs > - push %bx > - > - mov $VGA_RAM_SEG, %bx > - mov %bx, %fs > - mov %cs:(cursor), %bx > - mov %al, %fs:(%bx) > - inc %bx > - test $VGA_PAGE_SIZE, %bx > - jb putchar_new > - xor %bx, %bx > -putchar_new: > - mov %bx, %fs:(cursor) > - > - pop %bx > - pop %fs > - > - stack_restore > -1: > IRET > + > + > /* > * private IRQ data > */ > diff --git a/tools/kvm/bios/int10.c b/tools/kvm/bios/int10.c > new file mode 100644 > index 000..98205c3 > --- /dev/null > +++ b/tools/kvm/bios/int10.c > @@ -0,0 +1,161 @@ > +#include "kvm/segment.h" > +#include "kvm/bios.h" > +#include "kvm/util.h" > +#include "kvm/vesa.h" > +#include > + > +#define VESA_MAGIC ('V' + ('E' 
<< 8) + ('S' << 16) + ('A' << 24)) > + > +struct int10args { > + u32 eax; > + u32 ebx; > + u32 ecx; > + u32 edx; > + u32 esp; > + u32 ebp; > + u32 esi; > + u32 edi; > + u32 es; > +}; > + > +/* VESA General Information table */ > +struct vesa_general_info { > + u32 signature; /* 0 Magic number = "VESA" */ > + u16 version;/* 4 */ > + void *vendor_string;/* 6 */ > + u32 capabilities; /* 10 */ > + void *video_mode_ptr; /* 14 */ > + u16 total_memory; /* 18 */ > + > + u8 reserved[236]; /* 20 */ > +} __attribute__ ((packed)); > + > + > +struct vminfo { > + u16 mode_attr; /* 0 */ > + u8 win_attr[2];/* 2 */ > + u16 win_grain; /* 4 */ > + u16 win_size; /* 6 */ > + u16 win_seg[2]; /* 8 */ > + u32 win_scheme; /* 12 */ > + u16 logical_scan; /* 16 */ > + > + u16 h_res; /* 18 */ > + u16 v_res; /* 20 */ > + u8 char_width; /* 22 */ > + u8 char_height;/* 23 */ > + u8 memory_planes; /* 24 */ > + u8 bpp;/* 25 */ > + u8 banks; /* 26 */ > + u8 memory_layout; /* 27 */ > + u8 bank_size; /* 28 */ > + u8 image_planes; /* 29 */ > + u8 page_function; /* 30 */ > + > + u8 rmask; /* 31 */ > + u8 rpos; /* 32 */ > + u8 gmask; /* 33 */ > + u8 gpos; /* 34 */ > + u8 bmask; /* 35 */ > + u8 bpos; /* 36 */ > + u8 resv_mask; /* 37 */ > + u8 resv_pos; /* 38 */ > + u8 dcm_info; /* 39 */ > + > + u32 lfb_ptr;/* 40 Linear frame buffer address */ > + u32 offscreen_ptr; /* 44 Offscreen memory address */ > + u16 offscreen_size; /* 48 */ > + > + u8 reserved[206]; /* 50 */ > +}; > + > +char oemstring[11] = "KVM VESA"; > +u16 modes[2] = { 0x0112, 0x }; > + > +static inline void outb(unsigned short port, unsigned char val) > +{ > + asm volatile("outb %0, %1" : : "a"(val), "Nd"(port)); > +} > + > +/* > + * It's probably much more useful to make this print to the serial > + * line r
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin" wrote:
> > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" wrote:
> > > > Current code might introduce a lot of latency variation
> > > > if there are many pending bufs at the time we
> > > > attempt to transmit a new one. This is bad for
> > > > real-time applications and can't be good for TCP either.
> > >
> > > Do we have more than speculation to back that up, BTW?
> >
> > Need to dig this up: I thought we saw some reports of this on the list?
>
> I think so too, but a reference needs to be here too.
>
> It helps to have exact benchmarks on what's being tested, otherwise we
> risk unexpected interaction with the other optimization patches.
>
> > > >  	struct sk_buff *skb;
> > > >  	unsigned int len;
> > > > -
> > > > -	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > > +	bool c;
> > > > +	int n;
> > > > +
> > > > +	/* We try to free up at least 2 skbs per one sent, so that we'll get
> > > > +	 * all of the memory back if they are used fast enough. */
> > > > +	for (n = 0;
> > > > +	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > > > +	     ((skb = virtqueue_get_buf(vi->svq, &len)));
> > > > +	     ++n) {
> > > >  		pr_debug("Sent skb %p\n", skb);
> > > >  		vi->dev->stats.tx_bytes += skb->len;
> > > >  		vi->dev->stats.tx_packets++;
> > > >  		dev_kfree_skb_any(skb);
> > > >  	}
> > > > +	return !c;
> > >
> > > This is for() abuse :)
> > >
> > > Why is the capacity check in there at all? Surely it's simpler to try
> > > to free 2 skbs each time around?
> >
> > This is in case we can't use indirect: we want to free up
> > enough buffers for the following add_buf to succeed.
>
> Sure, or we could just count the frags of the skb we're taking out,
> which would be accurate for both cases and far more intuitive.
>
> ie. always try to free up twice as much as we're about to put in.
>
> Can we hit problems with OOM? Sure, but no worse than now...
> The problem is that this "virtqueue_get_capacity()" returns the worst
> case, not the normal case. So using it is deceptive.
>

Maybe just document this?

I still believe capacity really needs to be decided
at the virtqueue level, not in the driver.
E.g. with indirect each skb uses a single entry: freeing
1 small skb is always enough to have space for a large one.

I do understand how it seems a waste to leave direct space
in the ring while we might in practice have space
due to indirect. Didn't come up with a nice way to
solve this yet - but 'no worse than now :)'

> > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > sure we have enough space in the buffer. Another way to do
> > that is with a define :).
>
> To do this properly, we should really be using the actual number of sg
> elements needed, but we'd have to do most of xmit_skb beforehand so we
> know how many.
>
> Cheers,
> Rusty.

Maybe I'm confused here. The problem isn't the failing
add_buf for the given skb IIUC. What we are trying to do here is stop
the queue *before xmit_skb fails*. We can't look at the
number of fragments in the current skb - the next one can be
much larger. That's why we check capacity after xmit_skb,
not before it, right?

--
MST
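To make the loop under discussion easier to follow, here is a toy model of it written as a plain while loop instead of the three-clause for(). It is only a sketch: struct toy_vq and both functions are invented for illustration and are not the virtio API, and the model assumes each reclaimed buffer gives back a fixed number of descriptors. The logic is meant to match the quoted hunk: keep reclaiming while capacity is short or fewer than min_bufs buffers have been freed, and report whether the wanted capacity was reached.

```c
#include <stdbool.h>

/* Toy send queue: counts only, no real buffers. */
struct toy_vq {
	unsigned int capacity;		/* free descriptors */
	unsigned int used;		/* completed buffers not yet reclaimed */
	unsigned int descs_per_buf;	/* descriptors returned per buffer */
};

/* Stand-in for fetching one completed buffer; false when none is left. */
static bool toy_get_buf(struct toy_vq *vq)
{
	if (vq->used == 0)
		return false;
	vq->used--;
	vq->capacity += vq->descs_per_buf;
	return true;
}

/*
 * Reclaim completed buffers until at least min_bufs were freed and the
 * queue has room for "wanted" descriptors, or nothing is left to reclaim.
 * Returns true when the wanted capacity is available afterwards.
 */
static bool toy_free_old_xmit(struct toy_vq *vq, unsigned int wanted,
			      unsigned int min_bufs)
{
	unsigned int freed = 0;

	while (vq->capacity < wanted || freed < min_bufs) {
		if (!toy_get_buf(vq))
			break;
		freed++;
	}
	return vq->capacity >= wanted;
}
```

In this model the return value is what the caller would use to decide whether to stop the queue after a transmit, which is the "check capacity after xmit_skb" point made in the reply above.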
[PATCH 5/5 V2] kvm tools: Initialize and use VESA and VNC
Requirements - Kernel compiled with: CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_VESA=y CONFIG_FRAMEBUFFER_CONSOLE=y Start VNC server by starting kvm tools with "--vnc". Connect to the VNC server by running: "vncviewer :0". Since there is no support for input devices at this time, it may be useful starting kvm tools with an additional ' -p "console=ttyS0" ' parameter so that it would be possible to use a serial console alongside with a graphic one. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/kvm-run.c | 17 +++-- 1 files changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index 288e1fb..adbb25b 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -28,6 +28,7 @@ #include #include #include +#include /* header files for gitish interface */ #include @@ -66,6 +67,7 @@ static const char *virtio_9p_dir; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; static bool virtio_rng; +static bool vnc; extern bool ioport_debug; extern int active_console; @@ -110,6 +112,7 @@ static const struct option options[] = { OPT_STRING('\0', "kvm-dev", &kvm_dev, "kvm-dev", "KVM device file"), OPT_STRING('\0', "virtio-9p", &virtio_9p_dir, "root dir", "Enable 9p over virtio"), + OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -413,6 +416,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) char *hi; int i; void *ret; + u16 vidmode = 0; signal(SIGALRM, handle_sigalrm); signal(SIGQUIT, handle_sigquit); @@ -511,7 +515,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm->nrcpus = nrcpus; memset(real_cmdline, 0, sizeof(real_cmdline)); - strcpy(real_cmdline, "notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serial"); + strcpy(real_cmdline, "notsc noapic noacpi pci=conf1"); + if (vnc) { + strcat(real_cmdline, " video=vesafb 
console=tty0"); + vidmode = 0x312; + } else { + strcat(real_cmdline, " console=ttyS0 earlyprintk=serial"); + } strcat(real_cmdline, " "); if (kernel_cmdline) strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline)); @@ -543,7 +553,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) printf(" # kvm run -k %s -m %Lu -c %d\n", kernel_filename, ram_size / 1024 / 1024, nrcpus); if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename, - real_cmdline)) + real_cmdline, vidmode)) die("unable to load kernel %s", kernel_filename); kvm->vmlinux= vmlinux_filename; @@ -597,6 +607,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm__init_ram(kvm); + if (vnc) + vesa__init(kvm); + thread_pool__init(nr_online_cpus); for (i = 0; i < nrcpus; i++) { -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5 V2] kvm tools: Update makefile and feature tests
Update feature tests to test for libvncserver. VESA support doesn't get compiled in unless libvncserver is installed. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/Makefile | 11 ++- tools/kvm/config/feature-tests.mak | 10 ++ 2 files changed, 20 insertions(+), 1 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index e6e8d4e..2ebc86c 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -58,6 +58,14 @@ ifeq ($(has_bfd),y) LIBS+= -lbfd endif +FLAGS_VNCSERVER=$(CFLAGS) -lvncserver +has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER)) +ifeq ($(has_vncserver),y) + CFLAGS += -DCONFIG_HAS_VNCSERVER + OBJS+= hw/vesa.o + LIBS+= -lvncserver +endif + DEPS := $(patsubst %.o,%.d,$(OBJS)) # Exclude BIOS object files from header dependencies. @@ -153,9 +161,10 @@ bios/bios.o: bios/bios.S bios/bios-rom.bin bios/bios-rom.bin: bios/bios-rom.S bios/e820.c $(E) " CC " $@ $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/e820.c -o bios/e820.o + $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/int10.c -o bios/int10.o $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios-rom.S -o bios/bios-rom.o $(E) " LD " $@ - $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o + $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o bios/int10.o $(E) " OBJCOPY " $@ $(Q) objcopy -O binary -j .text bios/bios-rom.bin.elf bios/bios-rom.bin $(E) " NM " $@ diff --git a/tools/kvm/config/feature-tests.mak b/tools/kvm/config/feature-tests.mak index 6170fd2..0801b54 100644 --- a/tools/kvm/config/feature-tests.mak +++ b/tools/kvm/config/feature-tests.mak @@ -126,3 +126,13 @@ int main(void) return 0; } endef + +define SOURCE_VNCSERVER +#include + +int main(void) +{ + rfbIsActive((void *)0); + return 0; +} +endef -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5 V2] kvm tools: Add VESA device
Add a simple VESA device which simply moves a framebuffer from guest kernel to a VNC server. VESA device PCI code is very similar to virtio-* PCI code. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/hw/vesa.c| 108 tools/kvm/include/kvm/ioport.h |2 + tools/kvm/include/kvm/vesa.h | 27 tools/kvm/include/kvm/virtio-pci-dev.h |3 + 4 files changed, 140 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/hw/vesa.c create mode 100644 tools/kvm/include/kvm/vesa.h diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c new file mode 100644 index 000..3003aa5 --- /dev/null +++ b/tools/kvm/hw/vesa.c @@ -0,0 +1,108 @@ +#include "kvm/vesa.h" +#include "kvm/ioport.h" +#include "kvm/util.h" +#include "kvm/kvm.h" +#include "kvm/pci.h" +#include "kvm/kvm-cpu.h" +#include "kvm/irq.h" +#include "kvm/virtio-pci-dev.h" + +#include + +#include +#include +#include +#include + +#define VESA_QUEUE_SIZE128 +#define VESA_IRQ 14 + +/* + * This "6000" value is pretty much the result of experimentation + * It seems that around this value, things update pretty smoothly + */ +#define VESA_UPDATE_TIME 6000 + +u8 videomem[VESA_MEM_SIZE]; + +static bool vesa_pci_io_in(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa in port=%u\n", port); + return true; +} + +static bool vesa_pci_io_out(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa out port=%u\n", port); + return true; +} + +static struct ioport_operations vesa_io_ops = { + .io_in = vesa_pci_io_in, + .io_out = vesa_pci_io_out, +}; + +static struct pci_device_header vesa_pci_device = { + .vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET, + .device_id = PCI_DEVICE_ID_VESA, + .header_type= PCI_HEADER_TYPE_NORMAL, + .revision_id= 0, + .class = 0x03, + .subsys_vendor_id = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET, + .subsys_id = PCI_SUBSYSTEM_ID_VESA, + .bar[0] = IOPORT_VESA | PCI_BASE_ADDRESS_SPACE_IO, + .bar[1] = VESA_MEM_ADDR | 
PCI_BASE_ADDRESS_SPACE_MEMORY, +}; + + +void vesa_mmio_callback(u64 addr, u8 *data, u32 len, u8 is_write) +{ + if (is_write) + memcpy(&videomem[addr - VESA_MEM_ADDR], data, len); + + return; +} + +void vesa__init(struct kvm *kvm) +{ + u8 dev, line, pin; + pthread_t thread; + + if (irq__register_device(PCI_DEVICE_ID_VESA, &dev, &pin, &line) < 0) + return; + + vesa_pci_device.irq_pin = pin; + vesa_pci_device.irq_line = line; + pci__register(&vesa_pci_device, dev); + ioport__register(IOPORT_VESA, &vesa_io_ops, IOPORT_VESA_SIZE); + + kvm__register_mmio(VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback); + pthread_create(&thread, NULL, vesa__dovnc, kvm); +} + +/* + * This starts a VNC server to display the framebuffer. + * It's not altogether clear this belongs here rather than in kvm-run.c + */ +void *vesa__dovnc(void *v) +{ + /* +* Make a fake argc and argv because the getscreen function +* seems to want it. +*/ + int ac = 1; + char av[1][1] = {{0} }; + rfbScreenInfoPtr server; + + server = rfbGetScreen(&ac, (char **)av, VESA_WIDTH, VESA_HEIGHT, 8, 3, 4); + server->frameBuffer = (char *)videomem; + server->alwaysShared = TRUE; + rfbInitServer(server); + + while (rfbIsActive(server)) { + rfbMarkRectAsModified(server, 0, 0, VESA_WIDTH, VESA_HEIGHT); + rfbProcessEvents(server, server->deferUpdateTime * VESA_UPDATE_TIME); + } + return NULL; +} + diff --git a/tools/kvm/include/kvm/ioport.h b/tools/kvm/include/kvm/ioport.h index 218530c..8253938 100644 --- a/tools/kvm/include/kvm/ioport.h +++ b/tools/kvm/include/kvm/ioport.h @@ -7,6 +7,8 @@ /* some ports we reserve for own use */ #define IOPORT_DBG 0xe0 +#define IOPORT_VESA0xa200 +#define IOPORT_VESA_SIZE 256 #define IOPORT_VIRTIO_P9 0xb200 /* Virtio 9P device */ #define IOPORT_VIRTIO_P9_SIZE 256 #define IOPORT_VIRTIO_BLK 0xc200 /* Virtio block device */ diff --git a/tools/kvm/include/kvm/vesa.h b/tools/kvm/include/kvm/vesa.h new file mode 100644 index 000..3e58587 --- /dev/null +++ b/tools/kvm/include/kvm/vesa.h @@ -0,0 
+1,27 @@ +#ifndef KVM__VESA_H +#define KVM__VESA_H + +#include + +#define VESA_WIDTH 640 +#define VESA_HEIGHT480 + +#define VESA_MEM_ADDR 0xd000 +#define VESA_MEM_SIZE (4*VESA_WIDTH*VESA_HEIGHT) +#define VESA_BPP 32 + +struct kvm; +struct int10args; + +void vesa_mmio_callback(u64, u8*, u32, u8); +void vesa_
[PATCH 2/5 V2] kvm tools: Add video mode to kernel initialization
Allow setting video mode in guest kernel.

For possible values see Documentation/fb/vesafb.txt

Signed-off-by: John Floren
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin
---
 tools/kvm/include/kvm/kvm.h |    2 +-
 tools/kvm/kvm.c             |    7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 08c6fda..f951f2d 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -41,7 +41,7 @@ int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
 void kvm__delete(struct kvm *kvm);
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
-			const char *initrd_filename, const char *kernel_cmdline);
+			const char *initrd_filename, const char *kernel_cmdline, u16 vidmode);
 void kvm__setup_bios(struct kvm *kvm);
 void kvm__start_timer(struct kvm *kvm);
 void kvm__stop_timer(struct kvm *kvm);
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 4393a41..7284211 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -320,7 +320,7 @@ static int load_flat_binary(struct kvm *kvm, int fd)
 static const char *BZIMAGE_MAGIC = "HdrS";
 
 static bool load_bzimage(struct kvm *kvm, int fd_kernel,
-			int fd_initrd, const char *kernel_cmdline)
+			int fd_initrd, const char *kernel_cmdline, u16 vidmode)
 {
 	struct boot_params *kern_boot;
 	unsigned long setup_sects;
@@ -383,6 +383,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel,
 	kern_boot->hdr.type_of_loader	= 0xff;
 	kern_boot->hdr.heap_end_ptr	= 0xfe00;
 	kern_boot->hdr.loadflags	|= CAN_USE_HEAP;
+	kern_boot->hdr.vid_mode		= vidmode;
 
 	/*
 	 * Read initrd image into guest memory
@@ -441,7 +442,7 @@ static bool initrd_check(int fd)
 }
 
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
-		const char *initrd_filename, const char *kernel_cmdline)
+		const char *initrd_filename, const char *kernel_cmdline, u16 vidmode)
 {
 	bool ret;
 	int fd_kernel = -1, fd_initrd = -1;
@@ -459,7 +460,7 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
 		die("%s is not an initrd", initrd_filename);
 	}
 
-	ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline);
+	ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode);
 
 	if (initrd_filename)
 		close(fd_initrd);
--
1.7.5.rc3
[PATCH 1/5 V2] kvm tools: Add BIOS INT10 handler
INT10 handler is a basic implementation of BIOS video services. The handler implements a VESA interface which is initialized at the very beginning of loading the kernel. Signed-off-by: John Floren [ turning code into patches and cleanup ] Signed-off-by: Sasha Levin --- tools/kvm/bios/bios-rom.S | 56 tools/kvm/bios/int10.c| 161 + 2 files changed, 189 insertions(+), 28 deletions(-) create mode 100644 tools/kvm/bios/int10.c diff --git a/tools/kvm/bios/bios-rom.S b/tools/kvm/bios/bios-rom.S index 8a53dcd..b636cb8 100644 --- a/tools/kvm/bios/bios-rom.S +++ b/tools/kvm/bios/bios-rom.S @@ -27,36 +27,36 @@ ENTRY_END(bios_intfake) * We ignore bx settings */ ENTRY(bios_int10) - test $0x0e, %ah - jne 1f + pushw %fs + pushl %es + pushl %edi + pushl %esi + pushl %ebp + pushl %esp + pushl %edx + pushl %ecx + pushl %ebx + pushl %eax + + movl%esp, %eax + /* this is way easier than doing it in assembly */ + /* just push all the regs and jump to a C handler */ + callint10handler + + popl%eax + popl%ebx + popl%ecx + popl%edx + popl%esp + popl%ebp + popl%esi + popl%edi + popl%es + popw%fs -/* - * put char in AL at current cursor and - * increment cursor position - */ -putchar: - stack_swap - - push %fs - push %bx - - mov $VGA_RAM_SEG, %bx - mov %bx, %fs - mov %cs:(cursor), %bx - mov %al, %fs:(%bx) - inc %bx - test $VGA_PAGE_SIZE, %bx - jb putchar_new - xor %bx, %bx -putchar_new: - mov %bx, %fs:(cursor) - - pop %bx - pop %fs - - stack_restore -1: IRET + + /* * private IRQ data */ diff --git a/tools/kvm/bios/int10.c b/tools/kvm/bios/int10.c new file mode 100644 index 000..98205c3 --- /dev/null +++ b/tools/kvm/bios/int10.c @@ -0,0 +1,161 @@ +#include "kvm/segment.h" +#include "kvm/bios.h" +#include "kvm/util.h" +#include "kvm/vesa.h" +#include + +#define VESA_MAGIC ('V' + ('E' << 8) + ('S' << 16) + ('A' << 24)) + +struct int10args { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; + u32 esp; + u32 ebp; + u32 esi; + u32 edi; + u32 es; +}; + +/* VESA General Information table */ +struct 
vesa_general_info { + u32 signature; /* 0 Magic number = "VESA" */ + u16 version;/* 4 */ + void *vendor_string;/* 6 */ + u32 capabilities; /* 10 */ + void *video_mode_ptr; /* 14 */ + u16 total_memory; /* 18 */ + + u8 reserved[236]; /* 20 */ +} __attribute__ ((packed)); + + +struct vminfo { + u16 mode_attr; /* 0 */ + u8 win_attr[2];/* 2 */ + u16 win_grain; /* 4 */ + u16 win_size; /* 6 */ + u16 win_seg[2]; /* 8 */ + u32 win_scheme; /* 12 */ + u16 logical_scan; /* 16 */ + + u16 h_res; /* 18 */ + u16 v_res; /* 20 */ + u8 char_width; /* 22 */ + u8 char_height;/* 23 */ + u8 memory_planes; /* 24 */ + u8 bpp;/* 25 */ + u8 banks; /* 26 */ + u8 memory_layout; /* 27 */ + u8 bank_size; /* 28 */ + u8 image_planes; /* 29 */ + u8 page_function; /* 30 */ + + u8 rmask; /* 31 */ + u8 rpos; /* 32 */ + u8 gmask; /* 33 */ + u8 gpos; /* 34 */ + u8 bmask; /* 35 */ + u8 bpos; /* 36 */ + u8 resv_mask; /* 37 */ + u8 resv_pos; /* 38 */ + u8 dcm_info; /* 39 */ + + u32 lfb_ptr;/* 40 Linear frame buffer address */ + u32 offscreen_ptr; /* 44 Offscreen memory address */ + u16 offscreen_size; /* 48 */ + + u8 reserved[206]; /* 50 */ +}; + +char oemstring[11] = "KVM VESA"; +u16 modes[2] = { 0x0112, 0x }; + +static inline void outb(unsigned short port, unsigned char val) +{ + asm volatile("outb %0, %1" : : "a"(val), "Nd"(port)); +} + +/* + * It's probably much more useful to make this print to the serial + * line rather than print to a non-displayed VGA memory + */ +static inline void int10putchar(struct int10args *args) +{ + u8 al, ah; + + al = args->eax & 0xFF; + ah = (args->eax &
Re: Some errors when running KVM-Autotest on kernel-2.6.39
On 05/23/2011 12:43 PM, Zhi Yong Wu wrote:

HI, guys,

Some warnings and errors appear when running KVM-autotest on kernel 2.6.39

Can anyone give some comments?

Is it a known issue, new, or a problem with my setup?

/home/zwu/work/virt/autotest/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20110523-101151-G8Zb',server,nowait -serial unix:'/tmp/serial-20110523-101151-G8Zb',server,nowait -m 512 -smp 2 -kernel '/home/zwu/work/virt/autotest/client/tests/kvm/unittests/emulator.flat' -vnc :0 -chardev file,id=testlog,path=/tmp/testlog-20110523-101151-G8Zb -device testdev,chardev=testlog -S

10:13:16 INFO | (qemu) Code=44 24 08 03 00 00 00 c7 44 24 0c 04 00 00 00 66 0f 6f 04 24 0f 7f 03 48 89 de 48 89 e7 e8 a8 ee ff ff 0f b6 f0 bf cb c3 40 00 e8 c4 ee ff ff c7 03

That's a movdqu instruction. 2.6.40 gained support for emulating this
instruction, and the emulator unit test has a new test for it. Obviously
it will fail on earlier kernels.

We need some way to tell the test to expect failures on older kernels.

--
error compiling committee.c: too many arguments to function
KVM call agenda for May 24th
Please send in any agenda items you are interested in covering. Thanks, Juan.
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On 05/22/2011 10:32 PM, Nadav Har'El wrote: On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9": > > But if my interpretation of the code is correct, SVM isn't much closer > > than VMX to the goal of moving this logic to x86.c. When some logic is > > moved there, both SVM and VMX code will need to change - perhaps even > > considerably. So how will it be helpful to make VMX behave exactly like > > SVM does now, when the latter will also need to change considerably? > > > SVM design is much close to the goal of moving the logic into x86.c > because IIRC it does not bypass parsing of IDT vectoring info into arch > independent structure. VMX code uses vmx->idt_vectoring_info directly. At the risk of sounding blasphemous, I'd like to make the case that perhaps the current nested-VMX design - regarding the IDT-vectoring-info-field handling - is actually closer than nested-SVM to the goal of moving clean nested-supporting logic into x86.c, instead of having ad-hoc, unnatural, workarounds. Let me explain, and see if you agree with my logic: We discover at exit time whether the virtualization hardware (VMX or SVM) exited while *delivering* an interrupt or exception to the current guest. This is known as "idt-vectoring-information" in VMX. What do we need to do with this idt-vectoring-information? In regular (non- nested) guests, the answer is simple: On the next entry, we need to inject this event again into the guest, so it can resume the delivery of the same event it was trying to deliver. This is why the nested-unaware code has a vmx_complete_interrupts which basically adds this idt-vectoring-info into KVM's event queue, which on the next entry will be injected similarly to the way virtual interrupts from userspace are injected, and so on. The other thing we may need to do, is to expose it to userspace in case we're live migrating at exactly this point in time. 
But with nested virtualization, this is *not* what is supposed to happen - we do not *always* need to inject the event to the guest. We will only need to inject the event if the next entry will be again to the same guest, i.e., L1 after L1, or L2 after L2. If the idt-vectoring-info came from L2, but our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject the event as usual, but should rather pass this idt-vectoring-info field as the exit information that L1 gets (in nested vmx terminology, in vmcs12).

However, at the time of exit, we cannot know for sure whether L2 will actually run next, because it is still possible that an injection from user space, before the next entry, will cause us to decide to exit to L1.

Therefore, I believe that the clean solution isn't to leave the original non-nested logic that always queues the idt-vectoring-info assuming it will be injected, and then if it shouldn't (because we want to exit during entry) we need to skip the entry once as a "trick" to avoid this wrong injection.

Rather, a clean solution is, I think, to recognize that in nested virtualization, idt-vectoring-info is a different kind of beast than regular injected events, and it needs to be saved at exit time in a different field (which will of course be common to SVM and VMX). Only at entry time, after the regular injection code (which may cause a nested exit), we can call a x86_op to handle this special injection.

The benefit of this approach, which is closer to the current vmx code, is, I think, that x86.c will contain clear, self-explanatory nested logic, instead of relying on vmx.c or svm.c circumventing various x86.c functions and mechanisms to do something different from what they were meant to do.

IMO this will cause confusion, especially with the user interfaces used to read/write pending events. I think what we need to do is:

1. change ->interrupt_allowed() to return true if the interrupt flag is unmasked OR if in a nested guest, and we're intercepting interrupts
2. change ->set_irq() to cause a nested vmexit if in a nested guest and we're intercepting interrupts
3. change ->nmi_allowed() and ->set_nmi() in a similar way
4. add a .injected flag to the interrupt queue which overrides the nested vmexit for VM_ENTRY_INTR_INFO_FIELD and the svm equivalent; if present normal injection takes place (or an error vmexit if the interrupt flag is clear and we cannot inject)

-- error compiling committee.c: too many arguments to function
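The four steps above can be read as a small decision table. The sketch below is a toy model of one reading of that proposal, not KVM code: every type and function name is hypothetical, the NMI variant of step 3 is omitted, and the handling of the .injected flag follows the wording of step 4 as literally as possible.

```c
#include <stdbool.h>

/* Toy vcpu state; none of these fields are KVM's real structures. */
struct toy_vcpu {
	bool guest_if;            /* guest interrupt flag: interrupts unmasked */
	bool in_nested_guest;     /* currently running L2 */
	bool l1_intercepts_intr;  /* L1 asked to intercept external interrupts */
};

/* Toy interrupt queue entry. */
struct toy_irq {
	int vector;
	bool injected;  /* step 4: event was already in delivery, re-inject it */
};

enum toy_action {
	TOY_INJECT,         /* normal injection into the current guest */
	TOY_NESTED_VMEXIT,  /* reflect the interrupt to L1 */
	TOY_ERROR_VMEXIT,   /* cannot inject because the interrupt flag is clear */
};

/* Step 1: allowed if unmasked, or if L2 is running and L1 intercepts. */
static bool toy_interrupt_allowed(const struct toy_vcpu *v)
{
	return v->guest_if || (v->in_nested_guest && v->l1_intercepts_intr);
}

/* Steps 2 and 4: .injected overrides the nested vmexit. */
static enum toy_action toy_set_irq(const struct toy_vcpu *v,
				   const struct toy_irq *irq)
{
	if (irq->injected)
		return v->guest_if ? TOY_INJECT : TOY_ERROR_VMEXIT;
	if (v->in_nested_guest && v->l1_intercepts_intr)
		return TOY_NESTED_VMEXIT;
	return TOY_INJECT;
}
```

With L2 running, interrupts masked and L1 intercepting, a fresh interrupt is allowed and turns into a nested vmexit, while a re-injected one (.injected set) bypasses the nested vmexit and, in this model, ends in the error vmexit because the interrupt flag is clear.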
Some errors when running KVM-Autotest on kernel-2.6.39
HI, guys, Some warnings and errors appear when running KVM-autotest on kernel 2.6.39 Can anyone give some comments? Is it a known issue, new, or a problem with my setup? [root@f12 linux-2.6]# uname -a Linux f12 2.6.39 #2 SMP Fri May 20 19:51:05 CST 2011 x86_64 x86_64 x86_64 GNU/Linux [root@f12 linux-2.6]# modinfo kvm filename: /lib/modules/2.6.39/kernel/arch/x86/kvm/kvm.ko license:GPL author: Qumranet srcversion: ABB5612DB8B1955AA82288F depends: vermagic: 2.6.39 SMP mod_unload parm: oos_shadow:bool parm: ignore_msrs:bool [root@f12 linux-2.6]# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz stepping: 11 cpu MHz : 2667.000 cache size : 4096 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority bogomips: 5320.45 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz stepping: 11 cpu MHz : 2000.000 cache size : 4096 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority bogomips: 5319.99 clflush size: 64 cache_alignment : 64 address 
sizes : 36 bits physical, 48 bits virtual power management: Below is the output. 10:13:04 INFO | Running apic 10:13:04 WARNI| Could not send monitor command 'screendump /home/zwu/work/virt/autotest/client/results/default/kvm.unittest/debug/pre_vm1.ppm' ([Errno 32] Broken pipe) 10:13:04 INFO | Running qemu command: /home/zwu/work/virt/autotest/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20110523-101151-G8Zb',server,nowait -serial unix:'/tmp/serial-20110523-101151-G8Zb',server,nowait -m 512 -smp 2 -kernel '/home/zwu/work/virt/autotest/client/tests/kvm/unittests/apic.flat' -vnc :0 -chardev file,id=testlog,path=/tmp/testlog-20110523-101151-G8Zb -device testdev,chardev=testlog -S -cpu qemu64,+x2apic 10:13:05 INFO | Waiting for unittest apic to complete, timeout 600, output in /tmp/testlog-20110523-101151-G8Zb 10:13:07 INFO | (qemu) /bin/sh: line 1: 18661 Segmentation fault (core dumped) /home/zwu/work/virt/autotest/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20110523-101151-G8Zb',server,nowait -serial unix:'/tmp/serial-20110523-101151-G8Zb',server,nowait -m 512 -smp 2 -kernel '/home/zwu/work/virt/autotest/client/tests/kvm/unittests/apic.flat' -vnc :0 -chardev file,id=testlog,path=/tmp/testlog-20110523-101151-G8Zb -device testdev,chardev=testlog -S -cpu qemu64,+x2apic 10:13:07 INFO | (qemu) (Process terminated with status 139) 10:13:07 ERROR| Unit test apic failed 10:13:07 INFO | Unit test log collected and available under /home/zwu/work/virt/autotest/client/results/default/kvm.unittest/debug/apic.log 10:13:07 INFO | Running svm 10:13:07 WARNI| Could not send monitor command 'screendump /home/zwu/work/virt/autotest/client/results/default/kvm.unittest/debug/pre_vm1.ppm' ([Errno 32] Broken pipe) 10:13:07 INFO | Running qemu command: /home/zwu/work/virt/autotest/client/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20110523-101151-G8Zb',server,nowait -serial 
unix:'/tmp/serial-20110523-101151-G8Zb',server,nowait -m 512 -smp 2 -kernel '/home/zwu/work/virt/autotest/client/tests/kvm/unittests/svm.flat' -vnc :0 -chardev file,id=testlog,path=/tmp/testlog-20110523-101151-G8Zb -device testdev,chardev=testlog -S -enable-nesting -cpu qemu64,+svm 10:13:08 INFO | Waiting for unittest svm to complete, timeout 600, output in /tmp/testlog-20110523-101151-G8Zb 10:13:09 INFO | (qemu) (Process terminated with status 0) 10:13:10 INFO | Unit te
Re: [PATCH 1/5] kvm tools: Add BIOS INT10 handler
* Sasha Levin wrote: > INT10 handler is a basic implementation of BIOS video services. > > The handler implements a VESA interface which is initialized at > the very beginning of loading the kernel. > > Signed-off-by: John Floren > Signed-off-by: Sasha Levin Btw., the signoff chain looks broken - this will look odd in Git. If you took most of this from John then please put this in the first line of the patch: From: John Floren That way John will be marked by Git as the author and you are the patch maintainer who nursed along the patch. If you did significant changes to the patch (such as splitting it off a larger patch, cleaning it up, etc.) you can mark this before your SOB entry: Signed-off-by: John Floren [ split up the patch and cleaned it up ] Signed-off-by: Sasha Levin If you did so many changes to a patch that you can reasonably be called the main author then you can be the From line and can mark John's first version as: Originally-From: John Floren Signed-off-by: Sasha Levin If John has put copyright notices into the file then those should be preserved, and you can add yours as well, if you so wish. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/30] nVMX: Nested VMX, v9
On Sun, May 22, 2011 at 10:32:39PM +0300, Nadav Har'El wrote:
> At the risk of sounding blasphemous, I'd like to make the case that perhaps
> the current nested-VMX design - regarding the IDT-vectoring-info-field
> handling - is actually closer than nested-SVM to the goal of moving clean
> nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
> workarounds.

Well, the nested SVM implementation is certainly not perfect in this regard :)

> Therefore, I believe that the clean solution isn't to leave the original
> non-nested logic that always queues the idt-vectoring-info assuming it will
> be injected, and then if it shouldn't (because we want to exit during entry)
> we need to skip the entry once as a "trick" to avoid this wrong injection.
>
> Rather, a clean solution is, I think, to recognize that in nested
> virtualization, idt-vectoring-info is a different kind of beast than regular
> injected events, and it needs to be saved at exit time in a different field
> (which will of course be common to SVM and VMX). Only at entry time, after
> the regular injection code (which may cause a nested exit), we can call a
> x86_op to handle this special injection.

Things are complicated either way. If you keep the vectoring-info separate from the kvm exception queue you need special logic to combine the vectoring-info and the queue. For example, imagine something is pending in idt-vectoring info and the intercept causes another exception for the guest. KVM then needs to turn this into a #DF. When we just queue the vectoring-info into the exception queue we get this implicitly without extra code. This is a cleaner way imho.

On the other hand, when using the exception queue we need to keep extra information for nesting in the queue, because an event which is just re-injected into L2 must not cause a nested vmexit, even if the exception vector is intercepted by L1. But this is the same for SVM and VMX, so we can do this in generic x86 code. This is not the case when keeping track of idt-vectoring info separately in architecture code.

Regards, Joerg
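The #DF case mentioned above is the "special logic" that a single exception queue provides in one place. A minimal sketch of that merge rule follows, assuming the benign / contributory / page-fault classification from the x86 architecture manuals; the function names are hypothetical and the triple-fault case (a fault while delivering #DF) is deliberately left out.

```c
/* x86 exception vectors used by the classification below. */
#define DE_VECTOR  0   /* divide error */
#define DF_VECTOR  8   /* double fault */
#define TS_VECTOR 10   /* invalid TSS */
#define NP_VECTOR 11   /* segment not present */
#define SS_VECTOR 12   /* stack-segment fault */
#define GP_VECTOR 13   /* general protection */
#define PF_VECTOR 14   /* page fault */

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

static enum exc_class classify(int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXC_PAGE_FAULT;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXC_CONTRIBUTORY;
	default:
		return EXC_BENIGN;
	}
}

/*
 * 'first' was being delivered (the vectoring-info event) when 'second'
 * was raised. Returns the vector that should be delivered next:
 * DF_VECTOR when the pair combines into a double fault, otherwise
 * 'second' (the events are handled serially).
 */
static int merge_exceptions(int first, int second)
{
	enum exc_class c1 = classify(first);
	enum exc_class c2 = classify(second);

	if ((c1 == EXC_CONTRIBUTORY && c2 == EXC_CONTRIBUTORY) ||
	    (c1 == EXC_PAGE_FAULT && c2 != EXC_BENIGN))
		return DF_VECTOR;
	return second;
}
```

For example, a #GP raised while a page fault is being delivered merges into #DF, whereas a page fault raised while a #GP is being delivered is simply delivered as a page fault.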
Re: [PATCH 5/5] kvm tools: Initialize and use VESA and VNC
* Sasha Levin wrote:
> @@ -598,6 +608,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
>
>         kvm__init_ram(kvm);
>
> +       if (vnc) {
> +               pthread_t thread;
> +
> +               vesa__init(kvm);
> +               pthread_create(&thread, NULL, vesa__dovnc, kvm);
> +       }
> +

This should be encapsulated better: it should probably all be done within vesa__init(), and the only kvm_cmd_run() exposure should be:

        vesa__init(kvm);

vesa__init() would wrap to an empty inline function if the library prereqs are not present.

Thanks, Ingo
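The "empty inline function" pattern suggested above could look roughly like the sketch below. It assumes the CONFIG_HAS_VNCSERVER macro from patch 4/5 is what signals that libvncserver is present; toy_cmd_run() is a hypothetical stand-in for the call site in kvm_cmd_run(), and the real vesa__init() body (device registration, starting the VNC thread) is not shown.

```c
struct kvm;  /* opaque in this sketch */

#ifdef CONFIG_HAS_VNCSERVER
/* Real implementation lives in hw/vesa.c when libvncserver is available. */
void vesa__init(struct kvm *self);
#else
/* No libvncserver: the call compiles away to nothing. */
static inline void vesa__init(struct kvm *self)
{
	(void)self;
}
#endif

/* Hypothetical call site: no #ifdef and no thread handling needed here. */
static int toy_cmd_run(struct kvm *kvm)
{
	vesa__init(kvm);
	return 0;
}
```

The point of the design is that the caller stays free of conditionals. Whether the "--vnc" check belongs inside vesa__init() or at the call site is left open by the message above.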
[PATCH 5/5] kvm tools: Initialize and use VESA and VNC
Requirements - Kernel compiled with: CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_VESA=y CONFIG_FRAMEBUFFER_CONSOLE=y Start VNC server by starting kvm tools with "--vnc". Connect to the VNC server by running: "vncviewer :0". Since there is no support for input devices at this time, it may be useful starting kvm tools with an additional ' -p "console=ttyS0" ' parameter so that it would be possible to use a serial console alongside with a graphic one. Signed-off-by: John Floren Signed-off-by: Sasha Levin --- tools/kvm/kvm-run.c | 21 +++-- 1 files changed, 19 insertions(+), 2 deletions(-) diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index f7de0fb..5acddb2 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -28,6 +28,7 @@ #include #include #include +#include /* header files for gitish interface */ #include @@ -67,6 +68,7 @@ static const char *virtio_9p_dir; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; static bool virtio_rng; +static bool vnc; extern bool ioport_debug; extern int active_console; @@ -111,6 +113,7 @@ static const struct option options[] = { OPT_STRING('\0', "kvm-dev", &kvm_dev, "kvm-dev", "KVM device file"), OPT_STRING('\0', "virtio-9p", &virtio_9p_dir, "root dir", "Enable 9p over virtio"), + OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -414,6 +417,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) char *hi; int i; void *ret; + u16 vidmode = 0; signal(SIGALRM, handle_sigalrm); signal(SIGQUIT, handle_sigquit); @@ -512,7 +516,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm->nrcpus = nrcpus; memset(real_cmdline, 0, sizeof(real_cmdline)); - strcpy(real_cmdline, "notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serial"); + strcpy(real_cmdline, "notsc noapic noacpi pci=conf1"); + if (vnc) { + strcat(real_cmdline, " video=vesafb:ypan console=tty0"); + vidmode = 0x312; + } 
else { + strcat(real_cmdline, " console=ttyS0 earlyprintk=serial"); + } strcat(real_cmdline, " "); if (kernel_cmdline) strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline)); @@ -544,7 +554,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) printf(" # kvm run -k %s -m %Lu -c %d\n", kernel_filename, ram_size / 1024 / 1024, nrcpus); if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename, - real_cmdline)) + real_cmdline, vidmode)) die("unable to load kernel %s", kernel_filename); kvm->vmlinux= vmlinux_filename; @@ -598,6 +608,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm__init_ram(kvm); + if (vnc) { + pthread_t thread; + + vesa__init(kvm); + pthread_create(&thread, NULL, vesa__dovnc, kvm); + } + thread_pool__init(nr_online_cpus); for (i = 0; i < nrcpus; i++) { -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] kvm tools: Update makefile and feature tests
Update feature tests to test for libvncserver. VESA support doesn't get compiled in unless libvncserver is installed. Signed-off-by: John Floren Signed-off-by: Sasha Levin --- tools/kvm/Makefile | 11 ++- tools/kvm/config/feature-tests.mak | 10 ++ 2 files changed, 20 insertions(+), 1 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index e6e8d4e..2ebc86c 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -58,6 +58,14 @@ ifeq ($(has_bfd),y) LIBS+= -lbfd endif +FLAGS_VNCSERVER=$(CFLAGS) -lvncserver +has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER)) +ifeq ($(has_vncserver),y) + CFLAGS += -DCONFIG_HAS_VNCSERVER + OBJS+= hw/vesa.o + LIBS+= -lvncserver +endif + DEPS := $(patsubst %.o,%.d,$(OBJS)) # Exclude BIOS object files from header dependencies. @@ -153,9 +161,10 @@ bios/bios.o: bios/bios.S bios/bios-rom.bin bios/bios-rom.bin: bios/bios-rom.S bios/e820.c $(E) " CC " $@ $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/e820.c -o bios/e820.o + $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/int10.c -o bios/int10.o $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios-rom.S -o bios/bios-rom.o $(E) " LD " $@ - $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o + $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o bios/int10.o $(E) " OBJCOPY " $@ $(Q) objcopy -O binary -j .text bios/bios-rom.bin.elf bios/bios-rom.bin $(E) " NM " $@ diff --git a/tools/kvm/config/feature-tests.mak b/tools/kvm/config/feature-tests.mak index 6170fd2..0801b54 100644 --- a/tools/kvm/config/feature-tests.mak +++ b/tools/kvm/config/feature-tests.mak @@ -126,3 +126,13 @@ int main(void) return 0; } endef + +define SOURCE_VNCSERVER +#include + +int main(void) +{ + rfbIsActive((void *)0); + return 0; +} +endef -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] kvm tools: Add VESA device
Add a simple VESA device which simply moves a framebuffer from guest kernel to a VNC server. VESA device PCI code is very similar to virtio-* PCI code. Signed-off-by: John Floren Signed-off-by: Sasha Levin --- tools/kvm/hw/vesa.c| 106 tools/kvm/include/kvm/ioport.h |2 + tools/kvm/include/kvm/vesa.h | 31 + tools/kvm/include/kvm/virtio-pci-dev.h |3 + 4 files changed, 142 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/hw/vesa.c create mode 100644 tools/kvm/include/kvm/vesa.h diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c new file mode 100644 index 000..c1a4c64 --- /dev/null +++ b/tools/kvm/hw/vesa.c @@ -0,0 +1,106 @@ +#include "kvm/vesa.h" +#include "kvm/ioport.h" +#include "kvm/util.h" +#include "kvm/kvm.h" +#include "kvm/pci.h" +#include "kvm/kvm-cpu.h" +#include "kvm/irq.h" +#include "kvm/virtio-pci-dev.h" + +#include + +#include +#include +#include +#include + +#define VESA_QUEUE_SIZE128 +#define VESA_IRQ 14 + +/* + * This "6000" value is pretty much the result of experimentation + * It seems that around this value, things update pretty smoothly + */ +#define VESA_UPDATE_TIME 6000 + +u8 videomem[VESA_MEM_SIZE]; + +static bool vesa_pci_io_in(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa in port=%u\n", port); + return true; +} + +static bool vesa_pci_io_out(struct kvm *kvm, u16 port, void *data, int size, u32 count) +{ + printf("vesa out port=%u\n", port); + return true; +} + +static struct ioport_operations vesa_io_ops = { + .io_in = vesa_pci_io_in, + .io_out = vesa_pci_io_out, +}; + +static struct pci_device_header vesa_pci_device = { + .vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET, + .device_id = PCI_DEVICE_ID_VESA, + .header_type= PCI_HEADER_TYPE_NORMAL, + .revision_id= 0, + .class = 0x03, + .subsys_vendor_id = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET, + .subsys_id = PCI_SUBSYSTEM_ID_VESA, + .bar[0] = IOPORT_VESA | PCI_BASE_ADDRESS_SPACE_IO, + .bar[1] = VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY, +}; + + +void 
vesa_mmio_callback(u64 addr, u8 *data, u32 len, u8 is_write) +{ + if (is_write) + memcpy(&videomem[addr - VESA_MEM_ADDR], data, len); + + return; +} + +void vesa__init(struct kvm *kvm) +{ + u8 dev, line, pin; + + if (irq__register_device(PCI_DEVICE_ID_VESA, &dev, &pin, &line) < 0) + return; + + vesa_pci_device.irq_pin = pin; + vesa_pci_device.irq_line = line; + pci__register(&vesa_pci_device, dev); + ioport__register(IOPORT_VESA, &vesa_io_ops, IOPORT_VESA_SIZE); + + kvm__register_mmio(VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback); +} + +/* + * This starts a VNC server to display the framebuffer. + * It's not altogether clear this belongs here rather than in kvm-run.c + */ +void *vesa__dovnc(void *v) +{ + /* +* Make a fake argc and argv because the getscreen function +* seems to want it. +*/ + int ac = 1; + char av[1][1] = {{0} }; + rfbScreenInfoPtr server; + + server = rfbGetScreen(&ac, (char **)av, VESA_WIDTH, VESA_HEIGHT, 8, 3, 4); + server->frameBuffer = (char *)videomem; + server->alwaysShared = TRUE; + rfbInitServer(server); + + while (rfbIsActive(server)) { + rfbMarkRectAsModified(server, 0, 0, VESA_WIDTH, VESA_HEIGHT); + rfbProcessEvents(server, server->deferUpdateTime * VESA_UPDATE_TIME); + } + return NULL; +} + diff --git a/tools/kvm/include/kvm/ioport.h b/tools/kvm/include/kvm/ioport.h index 218530c..8253938 100644 --- a/tools/kvm/include/kvm/ioport.h +++ b/tools/kvm/include/kvm/ioport.h @@ -7,6 +7,8 @@ /* some ports we reserve for own use */ #define IOPORT_DBG 0xe0 +#define IOPORT_VESA0xa200 +#define IOPORT_VESA_SIZE 256 #define IOPORT_VIRTIO_P9 0xb200 /* Virtio 9P device */ #define IOPORT_VIRTIO_P9_SIZE 256 #define IOPORT_VIRTIO_BLK 0xc200 /* Virtio block device */ diff --git a/tools/kvm/include/kvm/vesa.h b/tools/kvm/include/kvm/vesa.h new file mode 100644 index 000..dfa3d941 --- /dev/null +++ b/tools/kvm/include/kvm/vesa.h @@ -0,0 +1,31 @@ +#ifndef KVM__VESA_H +#define KVM__VESA_H + +#include + +#define VESA_WIDTH 640 +#define VESA_HEIGHT480 + 
+#define VESA_MEM_ADDR 0xd000 +#define VESA_MEM_SIZE (4*VESA_WIDTH*VESA_HEIGHT) +#define VESA_BPP 32 + +struct kvm; +struct int10args; + +#ifdef CONFIG_HAS_VNCSERVER +void vesa_mmio_callback(u64, u8*, u32, u8); +void vesa__init(struct kvm *self); +void *vesa__dovnc(void *); +#else +void vesa__init(struct kvm *self)
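The MMIO callback in this patch copies guest writes into videomem at addr - VESA_MEM_ADDR without checking the range. A bounds-checked variant might look like the sketch below. It is an illustration only: the names are hypothetical, and because the VESA_MEM_ADDR value appears truncated in this archive, the base address used here is an arbitrary placeholder rather than the patch's real value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_WIDTH    640
#define TOY_HEIGHT   480
#define TOY_MEM_ADDR 0x10000000ULL  /* arbitrary placeholder base address */
#define TOY_MEM_SIZE (4 * TOY_WIDTH * TOY_HEIGHT)  /* 32 bpp framebuffer */

static uint8_t toy_videomem[TOY_MEM_SIZE];

/*
 * Copy a guest write into the shadow framebuffer.
 * Returns false, without copying, if any byte would fall outside it.
 */
static bool toy_mmio_write(uint64_t addr, const uint8_t *data, uint32_t len)
{
	uint64_t off;

	if (addr < TOY_MEM_ADDR)
		return false;
	off = addr - TOY_MEM_ADDR;
	if (off > TOY_MEM_SIZE || len > TOY_MEM_SIZE - off)
		return false;

	memcpy(&toy_videomem[off], data, len);
	return true;
}
```

The length check is written as len > TOY_MEM_SIZE - off rather than off + len > TOY_MEM_SIZE so that it cannot wrap around for large values.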
[PATCH 2/5] kvm tools: Add video mode to kernel initialization
Allow setting video mode in guest kernel. For possible values see Documentation/fb/vesafb.txt Signed-off-by: John Floren Signed-off-by: Sasha Levin --- tools/kvm/include/kvm/kvm.h |2 +- tools/kvm/kvm.c |7 --- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index 3cf6e6c..49ebd95 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -39,7 +39,7 @@ int kvm__max_cpus(struct kvm *kvm); void kvm__init_ram(struct kvm *kvm); void kvm__delete(struct kvm *kvm); bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, - const char *initrd_filename, const char *kernel_cmdline); + const char *initrd_filename, const char *kernel_cmdline, u16 vidmode); void kvm__setup_bios(struct kvm *kvm); void kvm__start_timer(struct kvm *kvm); void kvm__stop_timer(struct kvm *kvm); diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index 4393a41..7284211 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -320,7 +320,7 @@ static int load_flat_binary(struct kvm *kvm, int fd) static const char *BZIMAGE_MAGIC = "HdrS"; static bool load_bzimage(struct kvm *kvm, int fd_kernel, - int fd_initrd, const char *kernel_cmdline) + int fd_initrd, const char *kernel_cmdline, u16 vidmode) { struct boot_params *kern_boot; unsigned long setup_sects; @@ -383,6 +383,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, kern_boot->hdr.type_of_loader = 0xff; kern_boot->hdr.heap_end_ptr = 0xfe00; kern_boot->hdr.loadflags|= CAN_USE_HEAP; + kern_boot->hdr.vid_mode = vidmode; /* * Read initrd image into guest memory @@ -441,7 +442,7 @@ static bool initrd_check(int fd) } bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, - const char *initrd_filename, const char *kernel_cmdline) + const char *initrd_filename, const char *kernel_cmdline, u16 vidmode) { bool ret; int fd_kernel = -1, fd_initrd = -1; @@ -459,7 +460,7 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, die("%s 
is not an initrd", initrd_filename); } - ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline); + ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode); if (initrd_filename) close(fd_initrd); -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
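As a worked example of the values involved: patch 5/5 of this series passes vidmode 0x312, while the INT10 handler in patch 1/5 advertises VESA mode 0x0112. The two agree because, per Documentation/fb/vesafb.txt, the Linux video mode number is the VESA mode number plus 0x200. A tiny sketch of that arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Offset between a VESA mode number and the Linux vid_mode value. */
#define LINUX_VID_MODE_VESA_BASE 0x200

/* Hypothetical helper: VESA mode number -> value for hdr.vid_mode. */
static uint16_t vesa_to_vid_mode(uint16_t vesa_mode)
{
	return (uint16_t)(vesa_mode + LINUX_VID_MODE_VESA_BASE);
}
```

So vesa_to_vid_mode(0x112) yields 0x312, the value that ends up in kern_boot->hdr.vid_mode.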
[PATCH 1/5] kvm tools: Add BIOS INT10 handler
INT10 handler is a basic implementation of BIOS video services. The handler implements a VESA interface which is initialized at the very beginning of loading the kernel. Signed-off-by: John Floren Signed-off-by: Sasha Levin --- tools/kvm/bios/bios-rom.S | 56 tools/kvm/bios/int10.c| 161 + 2 files changed, 189 insertions(+), 28 deletions(-) create mode 100644 tools/kvm/bios/int10.c diff --git a/tools/kvm/bios/bios-rom.S b/tools/kvm/bios/bios-rom.S index 8a53dcd..b636cb8 100644 --- a/tools/kvm/bios/bios-rom.S +++ b/tools/kvm/bios/bios-rom.S @@ -27,36 +27,36 @@ ENTRY_END(bios_intfake) * We ignore bx settings */ ENTRY(bios_int10) - test $0x0e, %ah - jne 1f + pushw %fs + pushl %es + pushl %edi + pushl %esi + pushl %ebp + pushl %esp + pushl %edx + pushl %ecx + pushl %ebx + pushl %eax + + movl%esp, %eax + /* this is way easier than doing it in assembly */ + /* just push all the regs and jump to a C handler */ + callint10handler + + popl%eax + popl%ebx + popl%ecx + popl%edx + popl%esp + popl%ebp + popl%esi + popl%edi + popl%es + popw%fs -/* - * put char in AL at current cursor and - * increment cursor position - */ -putchar: - stack_swap - - push %fs - push %bx - - mov $VGA_RAM_SEG, %bx - mov %bx, %fs - mov %cs:(cursor), %bx - mov %al, %fs:(%bx) - inc %bx - test $VGA_PAGE_SIZE, %bx - jb putchar_new - xor %bx, %bx -putchar_new: - mov %bx, %fs:(cursor) - - pop %bx - pop %fs - - stack_restore -1: IRET + + /* * private IRQ data */ diff --git a/tools/kvm/bios/int10.c b/tools/kvm/bios/int10.c new file mode 100644 index 000..98205c3 --- /dev/null +++ b/tools/kvm/bios/int10.c @@ -0,0 +1,161 @@ +#include "kvm/segment.h" +#include "kvm/bios.h" +#include "kvm/util.h" +#include "kvm/vesa.h" +#include + +#define VESA_MAGIC ('V' + ('E' << 8) + ('S' << 16) + ('A' << 24)) + +struct int10args { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; + u32 esp; + u32 ebp; + u32 esi; + u32 edi; + u32 es; +}; + +/* VESA General Information table */ +struct vesa_general_info { + u32 signature; /* 0 Magic 
number = "VESA" */ + u16 version;/* 4 */ + void *vendor_string;/* 6 */ + u32 capabilities; /* 10 */ + void *video_mode_ptr; /* 14 */ + u16 total_memory; /* 18 */ + + u8 reserved[236]; /* 20 */ +} __attribute__ ((packed)); + + +struct vminfo { + u16 mode_attr; /* 0 */ + u8 win_attr[2];/* 2 */ + u16 win_grain; /* 4 */ + u16 win_size; /* 6 */ + u16 win_seg[2]; /* 8 */ + u32 win_scheme; /* 12 */ + u16 logical_scan; /* 16 */ + + u16 h_res; /* 18 */ + u16 v_res; /* 20 */ + u8 char_width; /* 22 */ + u8 char_height;/* 23 */ + u8 memory_planes; /* 24 */ + u8 bpp;/* 25 */ + u8 banks; /* 26 */ + u8 memory_layout; /* 27 */ + u8 bank_size; /* 28 */ + u8 image_planes; /* 29 */ + u8 page_function; /* 30 */ + + u8 rmask; /* 31 */ + u8 rpos; /* 32 */ + u8 gmask; /* 33 */ + u8 gpos; /* 34 */ + u8 bmask; /* 35 */ + u8 bpos; /* 36 */ + u8 resv_mask; /* 37 */ + u8 resv_pos; /* 38 */ + u8 dcm_info; /* 39 */ + + u32 lfb_ptr;/* 40 Linear frame buffer address */ + u32 offscreen_ptr; /* 44 Offscreen memory address */ + u16 offscreen_size; /* 48 */ + + u8 reserved[206]; /* 50 */ +}; + +char oemstring[11] = "KVM VESA"; +u16 modes[2] = { 0x0112, 0x }; + +static inline void outb(unsigned short port, unsigned char val) +{ + asm volatile("outb %0, %1" : : "a"(val), "Nd"(port)); +} + +/* + * It's probably much more useful to make this print to the serial + * line rather than print to a non-displayed VGA memory + */ +static inline void int10putchar(struct int10args *args) +{ + u8 al, ah; + + al = args->eax & 0xFF; + ah = (args->eax & 0xFF00) >> 8; + + outb(0x3f8, al);
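The offset comments in struct vesa_general_info only hold if pointers are 4 bytes wide, which is the case for the 16/32-bit BIOS build but not for a 64-bit host compile. The sketch below restates the layout with fixed-width fields so the offsets can be checked at compile time, and shows that VESA_MAGIC spells "VESA" in memory on a little-endian machine. The struct and function names are hypothetical; only the VESA_MAGIC macro and the field order are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VESA_MAGIC ('V' + ('E' << 8) + ('S' << 16) + ('A' << 24))

/* Same field order as the patch, with 32-bit fields in place of 'void *'. */
struct toy_vesa_general_info {
	uint32_t signature;       /*  0: magic number "VESA" */
	uint16_t version;         /*  4 */
	uint32_t vendor_string;   /*  6: 32-bit pointer value */
	uint32_t capabilities;    /* 10 */
	uint32_t video_mode_ptr;  /* 14: 32-bit pointer value */
	uint16_t total_memory;    /* 18 */
	uint8_t  reserved[236];   /* 20 */
} __attribute__((packed));

/* Compile-time checks against the offsets documented in the comments. */
_Static_assert(offsetof(struct toy_vesa_general_info, vendor_string) == 6,
	       "vendor_string must sit at offset 6");
_Static_assert(offsetof(struct toy_vesa_general_info, total_memory) == 18,
	       "total_memory must sit at offset 18");
_Static_assert(sizeof(struct toy_vesa_general_info) == 256,
	       "info block must be 256 bytes");

/* Returns 1 if the magic is stored as the bytes 'V','E','S','A'. */
static int vesa_magic_spells_vesa(void)
{
	uint32_t sig = VESA_MAGIC;

	return memcmp(&sig, "VESA", 4) == 0;
}
```

The byte-order check only succeeds on little-endian hosts such as x86, which is the only target the BIOS code is meant for.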
[PATCH][RESEND] KVM: Clean up error handling during VCPU creation
So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if it fails. Move this confusing responsibility back into the hands of kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected; all other archs cannot fail. Signed-off-by: Jan Kiszka --- arch/x86/kvm/x86.c |5 - virt/kvm/kvm_main.c | 11 ++- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index da48622..aaa3735 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6126,12 +6126,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) if (r == 0) r = kvm_mmu_setup(vcpu); vcpu_put(vcpu); - if (r < 0) - goto free_vcpu; - return 0; -free_vcpu: - kvm_x86_ops->vcpu_free(vcpu); return r; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3962899..8de7208 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1612,18 +1612,18 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) r = kvm_arch_vcpu_setup(vcpu); if (r) - return r; + goto vcpu_destroy; mutex_lock(&kvm->lock); if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { r = -EINVAL; - goto vcpu_destroy; + goto unlock_vcpu_destroy; } kvm_for_each_vcpu(r, v, kvm) if (v->vcpu_id == id) { r = -EEXIST; - goto vcpu_destroy; + goto unlock_vcpu_destroy; } BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]); @@ -1633,7 +1633,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) r = create_vcpu_fd(vcpu); if (r < 0) { kvm_put_kvm(kvm); - goto vcpu_destroy; + goto unlock_vcpu_destroy; } kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu; @@ -1647,8 +1647,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) mutex_unlock(&kvm->lock); return r; -vcpu_destroy: +unlock_vcpu_destroy: mutex_unlock(&kvm->lock); +vcpu_destroy: kvm_arch_vcpu_destroy(vcpu); return r; } -- 1.7.1
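The two-label unwinding introduced by this patch can be illustrated outside the kernel. The sketch below is a toy model, not the KVM function: the struct, the counters standing in for kvm->lock and kvm_arch_vcpu_destroy(), and the setup_fails parameter are all hypothetical. It keeps the property the patch is after: a failure before the lock is taken skips the unlock, while a failure under the lock releases it before destroying the vcpu.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy VM state; counters stand in for the real lock and destructor. */
struct toy_vm {
	int lock_depth;  /* >0 while the stand-in for kvm->lock is held */
	int online;      /* number of created vcpus */
	int max;         /* stand-in for KVM_MAX_VCPUS */
	int destroyed;   /* how often the destroy path ran */
};

static int toy_create_vcpu(struct toy_vm *vm, bool setup_fails)
{
	int r;

	/* Stand-in for kvm_arch_vcpu_setup(): may fail before locking. */
	r = setup_fails ? -ENOMEM : 0;
	if (r)
		goto vcpu_destroy;          /* lock not held: skip the unlock */

	vm->lock_depth++;                   /* mutex_lock() */
	if (vm->online == vm->max) {
		r = -EINVAL;
		goto unlock_vcpu_destroy;   /* lock held: unlock first */
	}

	vm->online++;
	vm->lock_depth--;                   /* mutex_unlock() */
	return 0;

unlock_vcpu_destroy:
	vm->lock_depth--;                   /* mutex_unlock() */
vcpu_destroy:
	vm->destroyed++;                    /* kvm_arch_vcpu_destroy() */
	return r;
}
```

Ordering the labels so that the later one is a suffix of the earlier one is what lets both failure paths share the destroy step without ever unlocking a lock that was not taken.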
[PATCH] qemu-kvm: Fix non-ISA IRQ routing in kernel irqchip mode
Merge regression of d1dcf63406: The KVM i8259 believes it is also an IOAPIC and takes all GSIs. Until we refactor this, work around it by avoiding the isa_irq_handler dispatcher in kernel irqchip mode. Signed-off-by: Jan Kiszka --- hw/pc_piix.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 66c5e04..7af03fa 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -131,7 +131,11 @@ static void pc_init1(ram_addr_t ram_size, if (pci_enabled) { ioapic_init(isa_irq_state); } -isa_irq = qemu_allocate_irqs(isa_irq_handler, isa_irq_state, 24); +if (!(kvm_enabled() && kvm_irqchip_in_kernel())) { +isa_irq = qemu_allocate_irqs(isa_irq_handler, isa_irq_state, 24); +} else { +isa_irq = i8259; +} if (pci_enabled) { if (!xen_enabled()) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] KVM updates for 2.6.40
Linus, please pull from:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/2.6.40

to receive the KVM updates for the 2.6.40 cycle. Changes this time
include emulator correctness (segment checks, nested SVM intercepts),
16-byte MMIO, VIA CPU feature support, virtual TSC rate for newer AMD
processors, better RCU integration, and performance improvements.

Changelog/diffstat (includes already-merged RCU commits):

Avi Kivity (56):
      KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions
      KVM: VMX: Optimize vmx_get_rflags()
      KVM: VMX: Optimize vmx_get_cpl()
      KVM: VMX: Cache cpl
      KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded
      KVM: VMX: Qualify check for host NMI
      KVM: VMX: Refactor vmx_complete_atomic_exit()
      KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally
      KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception
      KVM: VMX: simplify NMI mask management
      KVM: extend in-kernel mmio to handle >8 byte transactions
      KVM: Split mmio completion into a function
      KVM: 16-byte mmio support
      KVM: x86 emulator: do not munge rep prefix
      KVM: x86 emulator: define callbacks for using the guest fpu within the emulator
      KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes
      KVM: x86 emulator: SSE support
      KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f)
      KVM: x86 emulator: add framework for instruction intercepts
      KVM: x86 emulator: add SVM intercepts
      KVM: x86 emulator: Re-add VendorSpecific tag to VMMCALL insn
      KVM: x86 emulator: Drop EFER.SVME requirement from VMMCALL
      KVM: x86 emulator: Add helpers for memory access using segmented addresses
      KVM: x86 emulator: move invlpg emulation into a function
      KVM: x86 emulator: change address linearization to return an error code
      KVM: x86 emulator: pass access size and read/write intent to linearize()
      KVM: x86 emulator: move linearize() downwards
      KVM: x86 emulator: move desc_limit_scaled()
      KVM: x86 emulator: implement segment permission checks
      KVM: x86 emulator: whitespace cleanups
      KVM: x86 emulator: drop vcpu argument from memory read/write callbacks
      KVM: x86 emulator: drop vcpu argument from pio callbacks
      KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks
      KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks
      KVM: x86 emulator: drop vcpu argument from intercept callback
      KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks
      KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()
      KVM: x86 emulator: drop use of is_long_mode()
      KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()
      KVM: x86 emulator: emulate CLTS internally
      KVM: x86 emulator: make emulate_invlpg() an emulator callback
      KVM: x86 emulator: add new ->halt() callback
      KVM: x86 emulator: add ->fix_hypercall() callback
      KVM: x86 emulator: add new ->wbinvd() callback
      KVM: Avoid using x86_emulate_ctxt.vcpu
      KVM: x86 emulator: drop x86_emulate_ctxt::vcpu
      KVM: x86 emulator: move 0F 01 sub-opcodes into their own functions
      KVM: x86 emulator: Don't force #UD for 0F 01 /5
      KVM: x86 emulator: Use opcode::execute for 0F 01 opcode
      KVM: SVM: Get rid of x86_intercept_map::valid
      KVM: MMU: Add unlikely() annotations to walk_addr_generic()
      KVM: x86 emulator: consolidate group handling
      KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
      KVM: x86 emulator: consolidate segment accessors
      KVM: VMX: Cache vmcs segment fields
      Merge commit '29ce83181dd757d3116bf774aafffc4b6b20' into next

Bharat Bhushan (1):
      KVM: PPC: Fix issue clearing exit timing counters

brill...@viatech.com.cn (1):
      KVM: Add CPUID support for VIA CPU

Clemens Noss (1):
      KVM: x86 emulator: avoid calling wbinvd() macro

Duan Jiong (2):
      KVM: remove useless function declarations from file arch/x86/kvm/irq.h
      KVM: remove useless function declaration kvm_inject_pit_timer_irqs()

Glauber Costa (1):
      KVM: expose async pf through our standard mechanism

Gleb Natapov (8):
      KVM: x86: better fix for race between nmi injection and enabling nmi window
      KVM: x86 emulator: do not open code return values from the emulator
      KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
      KVM: mmio_fault_cr2 is not used
      KVM: emulator: Propagate fault in far jump emulation
      KVM: Fix compound mmio
      KVM: call cache_all_regs() only once during instruction emulation
      KVM: make guest mode entry to be rcu quiescent state

Jan Kiszka (2):
      KVM: SVM: Remove unused svm_features
      KVM: VMX: Ensure that vmx_create_vcpu always returns proper error

Jeff Mahoney (2):
      KVM: Fix off by one in kvm_for_each_vcpu
[PATCH 4/5] KVM test: setup tap fd and pass it to qemu-kvm v2
We used to use qemu-ifup to manage the tap, which has several
limitations:

1) If we want to specify a bridge, we must create a customized
   qemu-ifup file, as the default script always matches the first bridge.
2) It's hard to add support for macvtap devices.

So this patch lets the kvm subtest control the tap creation and setup,
then pass it to qemu-kvm. Users can specify the bridge they want to use
in the configuration file. The original autoconfiguration was replaced
by a private bridge setup.

Changes from v1:
* Combine the private bridge config and TAP fd in one patchset,
  dropped the "auto" mode
* Close TAP fds on VM.destroy() (thanks to Amos Kong for finding
  the problem)

Signed-off-by: Jason Wang
Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/scripts/qemu-ifup | 11 --
 client/virt/kvm_vm.py | 60
 client/virt/virt_utils.py | 11 --
 3 files changed, 47 insertions(+), 35 deletions(-)
 delete mode 100755 client/tests/kvm/scripts/qemu-ifup

diff --git a/client/tests/kvm/scripts/qemu-ifup b/client/tests/kvm/scripts/qemu-ifup
deleted file mode 100755
index c4debf5..000
--- a/client/tests/kvm/scripts/qemu-ifup
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/sh
-
-# The following expression selects the first bridge listed by 'brctl show'.
-# Modify it to suit your needs.
-switch=$(/usr/sbin/brctl show | awk 'NR==2 { print $1 }')
-
-/bin/echo 1 > /proc/sys/net/ipv6/conf/${switch}/disable_ipv6
-/sbin/ifconfig $1 0.0.0.0 up
-/usr/sbin/brctl addif ${switch} $1
-/usr/sbin/brctl setfd ${switch} 0
-/usr/sbin/brctl stp ${switch} off
diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py
index 57fc61b..5b1a27b 100644
--- a/client/virt/kvm_vm.py
+++ b/client/virt/kvm_vm.py
@@ -7,7 +7,7 @@ Utility classes and functions to handle Virtual Machine creation using qemu.
import time, os, logging, fcntl, re, commands, glob from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils -import virt_utils, virt_vm, kvm_monitor, aexpect +import virt_utils, virt_vm, virt_test_setup, kvm_monitor, aexpect class VM(virt_vm.BaseVM): @@ -41,6 +41,7 @@ class VM(virt_vm.BaseVM): self.pci_assignable = None self.netdev_id = [] self.device_id = [] +self.tapfds = [] self.uuid = None @@ -231,19 +232,17 @@ class VM(virt_vm.BaseVM): cmd += ",id='%s'" % device_id return cmd -def add_net(help, vlan, mode, ifname=None, script=None, -downscript=None, tftp=None, bootfile=None, hostfwd=[], -netdev_id=None, netdev_extra_params=None): +def add_net(help, vlan, mode, ifname=None, tftp=None, bootfile=None, +hostfwd=[], netdev_id=None, netdev_extra_params=None, +tapfd=None): if has_option(help, "netdev"): cmd = " -netdev %s,id=%s" % (mode, netdev_id) if netdev_extra_params: cmd += ",%s" % netdev_extra_params else: cmd = " -net %s,vlan=%d" % (mode, vlan) -if mode == "tap": -if ifname: cmd += ",ifname='%s'" % ifname -if script: cmd += ",script='%s'" % script -cmd += ",downscript='%s'" % (downscript or "no") +if mode == "tap" and tapfd: +cmd += ",fd=%d" % tapfd elif mode == "user": if tftp and "[,tftp=" in help: cmd += ",tftp='%s'" % tftp @@ -413,20 +412,22 @@ class VM(virt_vm.BaseVM): qemu_cmd += add_nic(help, vlan, nic_params.get("nic_model"), mac, device_id, netdev_id, nic_params.get("nic_extra_params")) # Handle the '-net tap' or '-net user' or '-netdev' part -script = nic_params.get("nic_script") -downscript = nic_params.get("nic_downscript") tftp = nic_params.get("tftp") -if script: -script = virt_utils.get_path(root_dir, script) -if downscript: -downscript = virt_utils.get_path(root_dir, downscript) if tftp: tftp = virt_utils.get_path(root_dir, tftp) -qemu_cmd += add_net(help, vlan, nic_params.get("nic_mode", "user"), -vm.get_ifname(vlan), -script, downscript, tftp, +if nic_params.get("nic_mode") == "tap": +try: +tapfd = 
vm.tapfds[vlan] +except IndexError: +tapfd = None +else: +tapfd = None +qemu_cmd += add_net(help, vlan, +nic_params.get("nic_mode", "user"), +vm.get_ifname(vlan), tftp, nic_params.get("bootp"), redirs, netdev_id, -
[PATCH 2/2] KVM Test: Add a subtest lvm
From: Qingtang Zhou

Changes from v1:
* Made the test use more current kvm autotest api, namely:
  - Error contexts, and session.cmd for shorter, cleaner code
  - Removed pre command, as the functionality needed for image_create
    was implemented on the previous patch

Signed-off-by: Lucas Meneghel Rodrigues

This test sets up an lvm over two images, then formats the lvm and
finally checks the fs using fsck.

Signed-off-by: Yolkfull Chow

Remove the progress of filling up. Add a param, clean, which can
prevent the umount and volume removal commands and lets this case be
used by the following benchmark or stress test. Add dbench to the lvm
tests.

Signed-off-by: Jason Wang

This test depends on the fillup_disk test and the ioquit test.

Signed-off-by: Qingtang Zhou
---
 client/tests/kvm/tests_base.cfg.sample | 48 ++
 client/virt/tests/lvm.py | 84
 2 files changed, 132 insertions(+), 0 deletions(-)
 create mode 100644 client/virt/tests/lvm.py

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 5713513..d1a188d 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -879,6 +879,46 @@ variants: fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1 oflag=direct" kill_vm = yes +- lvm: +only Linux +images += ' stg1 stg2' +image_name_stg1 = storage_4k +image_cluster_size_stg1 = 4096 +image_size_stg1 = 1G +image_format_stg1 = qcow2 +image_name_stg2 = storage_64k +image_cluster_size_stg2 = 65536 +image_size_stg2 = 1G +image_format_stg2 = qcow2 +guest_testdir = /mnt +disks = "/dev/sdb /dev/sdc" +kill_vm = no +post_command_noncritical = no +variants: +lvm_create: +type = lvm +force_create_image_stg1 = yes +force_create_image_stg2 = yes +clean = no +lvm_fill: lvm_create +type = fillup_disk +force_create_image_stg1 = no +force_create_image_stg2 = no +guest_testdir = /mnt/kvm_test_lvm +fillup_timeout = 120 +fillup_size = 20 +fillup_cmd = "dd if=/dev/zero of=%s/fillup.%d bs=%dM count=1 oflag=direct" 
+lvm_ioquit: lvm_create +type = ioquit +force_create_image_stg1 = no +force_create_image_stg2 = no +kill_vm = yes +background_cmd = "for i in 1 2 3 4; do (dd if=/dev/urandom of=/mnt/kvm_test_lvm/file bs=102400 count=1000 &); done" +check_cmd = pgrep dd +clean = yes +remove_image_stg1 = yes +remove_image_stg2 = yes + - ioquit: only Linux type = ioquit @@ -1656,6 +1696,8 @@ variants: md5sum_1m_cd1 = 127081cbed825d7232331a2083975528 fillup_disk: fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" +lvm.lvm_fill: +fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" - 4.7.x86_64: no setup autotest @@ -1677,6 +1719,8 @@ variants: md5sum_1m_cd1 = 58fa63eaee68e269f4cb1d2edf479792 fillup_disk: fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" +lvm.lvm_fill: +fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" - 4.8.i386: no setup autotest @@ -1696,6 +1740,8 @@ variants: sys_path = "/sys/class/net/%s/driver" fillup_disk: fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" +lvm.lvm_fill: +fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" - 4.8.x86_64: @@ -1716,6 +1762,8 @@ variants: sys_path = "/sys/class/net/%s/driver" fillup_disk: fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" +lvm.lvm_fill: +fillup_cmd = "dd if=/dev/zero of=/%s/fillup.%d bs=%dM count=1" - 5.3.i386: diff --git a/client/virt/tests/lvm.py b/client/virt/tests/lvm.py new file mode 100644 index 000..d171747 --- /dev/null +++ b/client/virt/tests/lvm.py @@ -0,0 +1,84 @@ +import logging, os +from autotest_lib.client.common_lib import error + + +@error.context_aware +def mount_lv(lv_path, session): +error.context("mount
[PATCH 1/2] client.virt.virt_vm: Make it possible to specify cluster size for image
For some tests, we need to specify image cluster size for a given
image. Make it possible to specify it so qemu-img is called with the
right parameters. This way we can state things like:

    images += ' stg1 stg2'
    image_name_stg1 = storage_4k
    image_cluster_size_stg1 = 4096
    image_format_stg1 = qcow2
    image_name_stg2 = storage_64k
    image_cluster_size_stg2 = 65536
    image_format_stg2 = qcow2

in the configuration file for a test.

Signed-off-by: Lucas Meneghel Rodrigues
---
 client/virt/virt_vm.py |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/client/virt/virt_vm.py b/client/virt/virt_vm.py
index 983ee02..7236218 100644
--- a/client/virt/virt_vm.py
+++ b/client/virt/virt_vm.py
@@ -218,6 +218,7 @@ def create_image(params, root_dir):
     @note: params should contain:
            image_name -- the name of the image file, without extension
            image_format -- the format of the image (qcow2, raw etc)
+           image_cluster_size (optional) -- the cluster size for the image
            image_size -- the requested size of the image (a string
            qemu-img can understand, such as '10G')
     """
@@ -228,6 +229,10 @@ def create_image(params, root_dir):
     format = params.get("image_format", "qcow2")
     qemu_img_cmd += " -f %s" % format

+    image_cluster_size = params.get("image_cluster_size", None)
+    if image_cluster_size is not None:
+        qemu_img_cmd += " -o cluster_size=%s" % image_cluster_size
+
     image_filename = get_image_filename(params, root_dir)
     qemu_img_cmd += " %s" % image_filename

--
1.7.5.1
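The effect of the optional parameter on the generated command line can
be sketched as below. This is a standalone illustration, not the
autotest code: build_qemu_img_cmd is a hypothetical helper, params is a
plain dict standing in for the per-image test parameters, the image
filename is passed in directly rather than derived by
get_image_filename(), and the "10G" size default is an assumption.

```python
def build_qemu_img_cmd(params, image_filename, qemu_img="qemu-img"):
    """Build a 'qemu-img create' command line (hypothetical helper)."""
    cmd = "%s create" % qemu_img
    cmd += " -f %s" % params.get("image_format", "qcow2")

    # Only pass -o cluster_size when the config asks for it, so images
    # without the parameter keep qemu-img's own default.
    cluster_size = params.get("image_cluster_size")
    if cluster_size is not None:
        cmd += " -o cluster_size=%s" % cluster_size

    cmd += " %s" % image_filename
    # Assumed default size; any string qemu-img understands works here.
    cmd += " %s" % params.get("image_size", "10G")
    return cmd


# Per-image view of the stg1 settings from the example config above.
stg1 = {"image_format": "qcow2",
        "image_cluster_size": "4096",
        "image_size": "1G"}
print(build_qemu_img_cmd(stg1, "storage_4k.qcow2"))
```

With the stg1 settings this yields a command carrying
"-o cluster_size=4096"; with the parameter absent, no -o option is
emitted at all, which matches the "is not None" check in the patch.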
Re: [PATCH 2/5] KVM test: Add helpers to control the TAP/bridge
On Mon, 2011-05-23 at 14:16 +0800, Amos Kong wrote: > On Sat, May 21, 2011 at 01:23:27AM -0300, Lucas Meneghel Rodrigues wrote: > > This patch adds some helpers to assist virt test to setup the bridge or > > macvtap based guest networking. > > > > Changes from v1: > > * Fixed undefined variable errors on the exception class definitions > > > > Signed-off-by: Jason Wang > > Signed-off-by: Lucas Meneghel Rodrigues > > --- > > client/virt/virt_utils.py | 218 > > + > > 1 files changed, 218 insertions(+), 0 deletions(-) > > > > diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py > > index 5510c89..96b9c84 100644 > > --- a/client/virt/virt_utils.py > > +++ b/client/virt/virt_utils.py > > @@ -6,6 +6,7 @@ KVM test utility functions. > > > > import time, string, random, socket, os, signal, re, logging, commands, > > cPickle > > import fcntl, shelve, ConfigParser, threading, sys, UserDict, inspect > > +import struct > > from autotest_lib.client.bin import utils, os_dep > > from autotest_lib.client.common_lib import error, logging_config > > import rss_client, aexpect > > @@ -15,6 +16,20 @@ try: > > except ImportError: > > KOJI_INSTALLED = False > > > > +# From include/linux/sockios.h > > +SIOCSIFHWADDR = 0x8924 > > +SIOCGIFHWADDR = 0x8927 > > +SIOCSIFFLAGS = 0x8914 > > +SIOCGIFINDEX = 0x8933 > > +SIOCBRADDIF = 0x89a2 > > +# From linux/include/linux/if_tun.h > > +TUNSETIFF = 0x400454ca > > +TUNGETIFF = 0x800454d2 > > +TUNGETFEATURES = 0x800454cf > > +IFF_UP = 0x1 > > +IFF_TAP = 0x0002 > > +IFF_NO_PI = 0x1000 > > +IFF_VNET_HDR = 0x4000 > > > > def _lock_file(filename): > > f = open(filename, "w") > > @@ -36,6 +51,76 @@ def is_vm(obj): > > return obj.__class__.__name__ == "VM" > > > > > > +class NetError(Exception): > > +pass > > + > > + > > +class TAPModuleError(NetError): > > +def __init__(self, devname): > > +NetError.__init__(self, devname) > > +self.devname = devname > > + > > +def __str__(self): > > +return "Can't open %s" % self.devname > > + > > +class 
TAPNotExistError(NetError): > > +def __init__(self, ifname): > > +NetError.__init__(self, ifname) > > +self.ifname = ifname > > + > > +def __str__(self): > > +return "Interface %s does not exist" % self.ifname > > + > > + > > +class TAPCreationError(NetError): > > +def __init__(self, ifname): > > +NetError.__init__(self, ifname) > > +self.ifname = ifname > > + > > +def __str__(self): > > +return "Cannot create TAP device %s" % self.ifname > > + > > + > > +class TAPBringUpError(NetError): > > +def __init__(self, ifname): > > +NetError.__init__(self, ifname) > > +self.ifname = ifname > > + > > +def __str__(self): > > +return "Cannot bring up TAP %s" % self.ifname > > + > > + > > +class BRAddIfError(NetError): > > +def __init__(self, ifname, brname, details): > > +NetError.__init__(self, ifname, brname, details) > > +self.ifname = ifname > > +self.brname = brname > > +self.details = details > > + > > +def __str__(self): > > +return ("Can not add if %s to bridge %s: %s" % > > +(self.ifname, self.brname, self.details)) > > + > > + > > +class HwAddrSetError(NetError): > > +def __init__(self, ifname, mac): > > +NetError.__init__(self, ifname, mac) > > +self.ifname = ifname > > +self.mac = mac > > + > > +def __str__(self): > > +return "Can not set mac %s to interface %s" % (self.mac, > > self.ifname) > > + > > + > > +class HwAddrGetError(NetError): > > +def __init__(self, ifname): > > +NetError.__init__(self, ifname) > > +self.ifname = ifname > > + > > +def __str__(self): > > +return "Can not get mac of interface %s" % self.ifname > > + > > + > > class Env(UserDict.IterableUserDict): > > """ > > A dict-like object containing global objects used by tests. > > @@ -2307,3 +2392,136 @@ def install_host_kernel(job, params): > > else: > > logging.info('Chose %s, using the current kernel for the host', > > install_type) > > + > > + > > +def bridge_auto_detect(): > > +""" > > +Automatically detect a bridge for tap through brctl. 
> > +""" > > +try: > > +brctl_output = utils.system_output("ip route list", > > + retain_output=True) > > +brname = re.findall("default.*dev (.*) ", brctl_output)[0] > > +except: > > +raise BRAutoDetectError > > +return brname > > + > > + > > +def if_nametoindex(ifname): > > +""" > > +Map an interface name into its corresponding index. > > +Returns 0 on error, as 0 is not a valid index > > + > > +@param ifname: interface name > > +""" > > +index = 0 > > +ctrl_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0) > > +if