Re: [kvm-devel] kvm-45 problems
Zhao, Yunfeng wrote:

Hi, Avi

With the latest kvm commits, the host soft lockup caused by SMP Linux guests can no longer be reproduced on my machine. Have you fixed it?

Not intentionally... Maybe the rmap fix is somehow responsible.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-
This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] kernel device reset support
Dong, Eddie wrote:

Current VP wake-up logic through INIT/SIPI doesn't support this when the irqchip is in the kernel.

Doesn't this code imply that waiting for SIPI is supported?

It is supported to wake up a VCPU in the kernel, but we can't wake up the VCPU at user level since irqchip_in_kernel is TRUE here. vcpu->mp_state isn't exported to user level.

We never sleep in user level if irqchip_in_kernel(). So the thread will eventually go back to kernel mode.

You can put a goto to the top of the loop to redo the mmu reload. In any case you need to do that because you don't want to execute the reset code with interrupts and preemption disabled.

A goto across functions? It is too aggressive and bad code style IMO. The vcpu->request check is in __vcpu_run, while entering the blocked state is in its parent function kvm_vcpu_ioctl_run.

goto the label 'again' in __vcpu_run(), which has the call to kvm_mmu_reload().

But if you want, we can return a special value, say REQUEST_INTERNAL_LOOP, to kvm_vcpu_ioctl_run and let kvm_vcpu_ioctl_run use special logic to do the goto within the function if it sees the special return value REQUEST_INTERNAL_LOOP. But is it cleaner? Also we will add more kernel-to-user EXIT reasons, such as a RESET request from a guest triple fault detected by the kernel, etc.

There is already a triple fault exit code.

The VCPU may still be executing in the kernel, which may modify kernel device state. E.g. a VCPU may be doing PIO emulation.

In that case we will wait when taking kvm->lock.

A lock doesn't help. A lock can only prevent two or more modifications at the same time. But what we care about is that no other VCPU can modify device state after the BSP does the device reset. Those are different semantics. Maybe you are still arguing it is the AP who does the RESET ops. Let us go to the next discussion first.

We first halt all vcpus, then take the lock.
So:
- other processors won't start after the device reset because they are halted
- we won't do the reset concurrently with other processors because of the lock

If the BSP resets the kernel devices before the VCPU modifies the device state, we are in trouble.

No, VCPU0 (BSP) is the current VCPU (though you don't have the current vcpu parameter explicitly), as mentioned in the previous mail and as a prerequisite of the user-level change. Please refer to my answer above in this mail.

We can't rely on user space not to cause host kernel corruption.

??? Even if an AP triggers RESET, it just sets a reset_request flag at user level. It is another VCPU that will execute the RESET operation. It seems the argument is about who should do the RESET operation, say RST_CPU: BSP only, or AP too. For me, since after RESET only the BSP can execute, and the thread executing qemu_system_reset will continue executing (after the kernel RESET) per the current Qemu code, what we can do is:

1: RST_CPU = BSP. Then the BSP does qemu_system_reset, or
2: RST_CPU = AP, say RAP, does qemu_system_reset; user level then needs to block RAP after qemu_system_reset and wake up the BSP to take over.

A point here: we can't block RAP in case 2 at kernel RESET time, since the kernel RESET may not be the last step of qemu_system_reset. It may go into the kernel again.

If we go with #1, it is just a 1-line change as in my previous mail. If we go with #2, we have to add a new ABI for the AP to enter the kernel wait-for-INIT/SIPI/SIPI state, otherwise a normal INIT/SIPI/SIPI couldn't wake it up. I see much complication in #2, while #1 has the same functionality but is simple.

My view is:
- vcpu threads never sleep in userspace; they will always eventually end up in the kernel, so we can stop or restart them there
- reset is a platform API, so it can't depend on which vcpu thread executes it (if any; it may be executed from an unrelated thread. Remember we plan to move the qemu signal handling code into a separate thread)
- we already have a way to send messages to other vcpus

So it seems to me everything is in place to make it fairly simple.

I'll try writing a patch that does what I mean and post it. Either I'll convince you that in-kernel is simpler, or I'll convince myself that it is harder.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] RFC/patch portability: split kvm_vcpu_ioctl v1
Carsten Otte wrote:

Avi Kivity wrote:

Applied, thanks. I renamed kvm_vcpu_load() and kvm_vcpu_put() back to vcpu_load() and vcpu_put() in order to keep the patch small and simple, and because I'm emotionally attached to the original names.

Oh, I think I had a very good reason for renaming it: it's no longer static, and thus part of the kernel's global namespace in case kvm is built-in. As far as I know, modules are expected to prefix any symbol they use with their module name. I am sorry for the emotional part of it; I tend to stick to old names too once I get used to them. In case you decide you want kvm_vcpu_load/put again, let me know so that I can supply a patch on top of git that renames it.

I agree 100%, I'm just using the "keep the patch dead simple" excuse to delay the change. We can have an 'add kvm_ prefix' patch round later. Let's complete the separation first.

There are some bigger offenders too, like set_crX(), which don't even start with the magic V and are exported to modules.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [RFC] Expose infrastructure for unpinning guest memory
Anthony Liguori wrote:

Now that we have userspace memory allocation, I wanted to play with ballooning. The idea is that when a guest balloons down, we simply unpin the underlying physical memory and the host kernel may or may not swap it. To reclaim ballooned memory, the guest can just start using it and we'll pin it on demand.

The following patch is a stab at providing the right infrastructure for pinning and automatic repinning. I don't have a lot of comfort in the MMU code so I thought I'd get some feedback before going much further.

gpa_to_hpa is a little awkward to hook, but it seems like the right place in the code. I'm most uncertain about the SMP safety of the unpinning. Presumably, I have to hold the kvm lock around the mmu_unshadow and page_cache release to ensure that another VCPU doesn't fault the page back in after mmu_unshadow?

Once we have true swapping capabilities (which imply the ability for the kernel to remove a page from the shadow page tables) you can unpin by calling munmap() or madvise(MADV_REMOVE) on the pages to be unpinned.

Other than that the approach seems right.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [Patch] [0/3] Patches to support new architectures.
Avi Kivity wrote:

Zhang, Xiantao wrote:

x86 will continue to use kvm_x86_ops for that purpose. But other archs should not. x86 will use both mechanisms: first, linkage will select the x86 function, and then kvm_x86_ops will be used to select the implementation-dependent code. The two levels are very different, as kvm_x86_ops is very low level and x86 specific.

Hi Avi,
Maybe linkage is a better choice. But if we need to maintain two different implementations for different archs, it may introduce unnecessary effort. In addition, I can't figure out any disadvantages of function pointers; moreover, they make the source uniform for all architectures, though that is not strictly necessary.

Linkage is more efficient (though I don't think we'll be able to measure the difference) and is also the traditional way of doing things in Linux. I don't see why it causes extra effort. Can you explain?

I originally meant that we would have to wrap all functions related to kvm_x86_ops. But it seems it doesn't introduce extra maintenance effort if other architectures implement these functions directly. Good method!

Thanks
Xiantao
[kvm-devel] [ kvm-Bugs-1812043 ] Cannot boot 32bit smp RHEL5.1 guest
Bugs item #1812043, was opened at 2007-10-12 14:41
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812043&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot boot 32bit smp RHEL5.1 guest

Initial Comment:
A 32bit SMP RHEL5.1 guest cannot boot on either a 32bit host or a 64bit host. With -no-kvm-irqchip it also fails. But if it is booted as UP or with -no-kvm, there is no problem.
Re: [kvm-devel] [Patch] [0/3] Patches to support new architectures.
Zhang, Xiantao wrote:

x86 will continue to use kvm_x86_ops for that purpose. But other archs should not. x86 will use both mechanisms: first, linkage will select the x86 function, and then kvm_x86_ops will be used to select the implementation-dependent code. The two levels are very different, as kvm_x86_ops is very low level and x86 specific.

Hi Avi,
Maybe linkage is a better choice. But if we need to maintain two different implementations for different archs, it may introduce unnecessary effort. In addition, I can't figure out any disadvantages of function pointers; moreover, they make the source uniform for all architectures, though that is not strictly necessary.

Linkage is more efficient (though I don't think we'll be able to measure the difference) and is also the traditional way of doing things in Linux. I don't see why it causes extra effort. Can you explain?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[kvm-devel] [ kvm-Bugs-1812050 ] segfault while booting 64bit linux with 4GB mem
Bugs item #1812050, was opened at 2007-10-12 15:00
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812050&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: segfault while booting 64bit linux with 4GB mem

Initial Comment:
A segmentation fault happens while booting a 64bit linux guest with 4GB mem. Here is the error message:

qemu-system-x86[8052]: segfault at 2b9d3be19000 rip 003d35876d40 rsp 7fff6d586ce8 error 4

The guest is installed with RHEL4U3, the kernel is 2.6.9-34.EL.
The host machine is a harwitch/paxville (16LPs), mem is 8GB.

Here is the command line:

qemu-system-x86_64 . -m 4096 -net nic,macaddr=00:16:3e:17:fa:66,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/tmp-img_CPL_MEM_05_1192172400_1
Re: [kvm-devel] kernel device reset support
I'll try writing a patch that does what I mean and post it. Either I'll convince you that in-kernel is simpler, or I'll convince myself that it is harder.

OK, let us see which one is simpler.

BTW, you have switched to the N+1 SMP model in this discussion, which is not there yet. And this is the difference. For me, the N+1 model means the device emulation will go to the (N+1)th thread while all CPU execution will still be in its own thread. It looks like you are proposing something different.

Eddie
[kvm-devel] [ kvm-Bugs-1812072 ] Cannot boot 64bit Vista
Bugs item #1812072, was opened at 2007-10-12 15:36
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812072&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot boot 64bit Vista

Initial Comment:
I cannot boot 64bit UP Vista. It shows a black window after booting for a while. The issue doesn't occur when -no-kvm-irqchip is added.

Here is the command:

qemu-system-x86_64 . -m 1024 -net nic,macaddr=00:16:3e:3f:02:d6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/tmp-img_gbp17_1192173849_1
Re: [kvm-devel] [ANNOUNCE] kvm-46 release
Avi Kivity wrote:

We've now switched to allocating guest memory in userspace rather than in the kernel.

Hmm, a quick glance over kvmctl.h doesn't show an obvious way to use that. If I want to back vm memory with a file mapping, how can I do that?

cheers,
Gerd
Re: [kvm-devel] RFC/patch portability: split kvm_vcpu_ioctl v1
Avi Kivity wrote:

I agree 100%, I'm just using the "keep the patch dead simple" excuse to delay the change. We can have an 'add kvm_ prefix' patch round later. Let's complete the separation first.

Okay, fine with me.

There are some bigger offenders too, like set_crX(), which don't even start with the magic V and are exported to modules.

Boo. Stop name space pollution, save the planet! Name space pollution causes global warming! We should clean that up one day.
[kvm-devel] Test for KVM, kernel 33aaf..., userspace, 803145...
Hi,

Here is the summary of current KVM quality, kernel 33aafecf3f106ba6aa8847dfdae033a73e5d1b50, userspace 80314525755d89861c6709c5ba6104fbe34a64ea.

4 old issues have been fixed this week, and 3 new issues have been found. In total, 8 major issues still exist.

Four fixed old issues:
1. 64bit linux with 2.6.9 kernel cannot get ip
https://sourceforge.net/tracker/index.php?func=detail&aid=1802580&group_id=180599&atid=893831
2. soft lockup while running SMP linux guest with 4vpus
https://sourceforge.net/tracker/index.php?func=detail&aid=1804597&group_id=180599&atid=893831
3. Creating multiple guests may cause host to hang
https://sourceforge.net/tracker/index.php?func=detail&aid=1741312&group_id=180599&atid=893831
4. Fails to install x86 Vista on 64bit host
https://sourceforge.net/tracker/index.php?func=detail&aid=1805007&group_id=180599&atid=893831

Issue List:

One Network issue:
1. 64bit guest with 2.6.16 kernel crashes when starting up the nic
-no-kvm-irqchip has the same issue
https://sourceforge.net/tracker/index.php?func=detail&aid=1804990&group_id=180599&atid=893831

Four Windows issues:
2. Cannot boot 64bit Vista
-no-kvm-irqchip doesn't have the issue.
https://sourceforge.net/tracker/index.php?func=detail&aid=1812072&group_id=180599&atid=893831
3. windows xp with acpi hal fails to reboot
-no-kvm-irqchip has the same issue
https://sourceforge.net/tracker/index.php?func=detail&aid=1805016&group_id=180599&atid=893831
4. 64bit xpsp2 installer crashed when rebooting
With -no-kvm-irqchip, a blue screen happens on reboot
https://sourceforge.net/tracker/index.php?func=detail&aid=1804990&group_id=180599&atid=893831
5. xpsp2 with 2vpus may fail to boot
-no-kvm-irqchip has the same issue
https://sourceforge.net/tracker/index.php?func=detail&aid=1805017&group_id=180599&atid=893831

Three Linux guest issues:
6. segfault while booting 64bit linux with 4GB mem
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812050&group_id=180599
7. Cannot boot 32bit smp RHEL5.1 guest
Only -no-kvm can boot it.
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812043&group_id=180599
8. Some ltp cases fail on KVM guests
https://sourceforge.net/tracker/index.php?func=detail&aid=1741316&group_id=180599&atid=893831

thanks
Yunfeng
Re: [kvm-devel] [Patch] [0/3] Patches to support new architectures.
Zhang, Xiantao wrote:

I originally meant that we would have to wrap all functions related to kvm_x86_ops. But it seems it doesn't introduce extra maintenance effort if other architectures implement these functions directly. Good method!

That was my idea at first too, until Hollis beat me up on this. Most archs don't have a split like vmx/svm do, and the compiler automagically inlines the wrapper for x86, which gets optimized away completely.
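The two levels discussed in this thread can be contrasted in a small user-space sketch. It is only an illustration under stated assumptions: `vcpu_sketch`, `x86_ops_sketch`, `select_backend()`, `kvm_arch_vcpu_run_sketch()` and the two backends are hypothetical names invented here, not the real KVM interfaces. Generic code calls the arch-level function by name (resolved at link time), and only the x86 version goes on to dispatch through a function-pointer table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vcpu; not the real struct kvm_vcpu. */
struct vcpu_sketch {
	int vcpu_id;
};

/* Second level: a table of function pointers, in the spirit of kvm_x86_ops. */
struct x86_ops_sketch {
	int (*vcpu_run)(struct vcpu_sketch *vcpu);
};

/* Two made-up backends standing in for the vmx/svm split. */
static int vmx_like_run(struct vcpu_sketch *vcpu) { return 100 + vcpu->vcpu_id; }
static int svm_like_run(struct vcpu_sketch *vcpu) { return 200 + vcpu->vcpu_id; }

static const struct x86_ops_sketch vmx_like_ops = { .vcpu_run = vmx_like_run };
static const struct x86_ops_sketch svm_like_ops = { .vcpu_run = svm_like_run };

/* Would be chosen once at load time on real hardware; set by hand here. */
static const struct x86_ops_sketch *x86_ops;

static void select_backend(const struct x86_ops_sketch *ops)
{
	x86_ops = ops;
}

/*
 * First level: generic code calls this symbol directly. Each architecture
 * would supply its own definition and the build links exactly one of them.
 * On x86 the body is a thin wrapper that the compiler can inline; an
 * architecture without a vendor split would put the real work here instead.
 */
static inline int kvm_arch_vcpu_run_sketch(struct vcpu_sketch *vcpu)
{
	assert(x86_ops != NULL);
	return x86_ops->vcpu_run(vcpu);
}
```

A caller would first pick a backend with `select_backend(&vmx_like_ops)` and then call `kvm_arch_vcpu_run_sketch()` without knowing which backend is active.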
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote:

--- /dev/null
+++ b/drivers/kvm/kvm_arch.h
[...]
+struct kvm_arch_vcpu{
+
+	u64 host_tsc;
+
+	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
+	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
+
+	unsigned long cr0;
+	unsigned long cr2;
+	unsigned long cr3;
+	unsigned long cr4;
+	unsigned long cr8;
+	u64 pdptrs[4]; /* pae */
+	u64 shadow_efer;
+	u64 apic_base;
+	struct kvm_lapic *apic;    /* kernel irqchip context */
+
+	u64 ia32_misc_enable_msr;
+
+
+	struct i387_fxsave_struct host_fx_image;
+	struct i387_fxsave_struct guest_fx_image;
+	int fpu_active;
+	int guest_fpu_loaded;
+
+	gva_t mmio_fault_cr2;
+
+	struct {
+		int active;
+		u8 save_iopl;
+		struct kvm_save_segment {
+			u16 selector;
+			unsigned long base;
+			u32 limit;
+			u32 ar;
+		} tr, es, ds, fs, gs;
+	} rmode;
[...]

As far as I can see without applying it, that split is ok for powerpc. I had a similar approach in my local patch queue too. Minor differences in which elements of the structs are arch dependent or not can be changed in small patches later ;-)

But the file name kvm_arch.h confuses me a bit. I assume you had the coming asm split in mind, where every architecture can define its asm/kvm_arch.h. Since we don't have that asm structure for kvm yet, the changes you made to kvm_arch.h may be better located in x86.h atm.

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
+49 7031/16-3385
[EMAIL PROTECTED]
[EMAIL PROTECTED]

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Johann Weihen
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Christian Ehrhardt wrote:

Zhang, Xiantao wrote:

--- /dev/null
+++ b/drivers/kvm/kvm_arch.h
[...]
+struct kvm_arch_vcpu{
+
+	u64 host_tsc;
+
+	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
+	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
+
+	unsigned long cr0;
+	unsigned long cr2;
+	unsigned long cr3;
+	unsigned long cr4;
+	unsigned long cr8;
+	u64 pdptrs[4]; /* pae */
+	u64 shadow_efer;
+	u64 apic_base;
+	struct kvm_lapic *apic;    /* kernel irqchip context */
+
+	u64 ia32_misc_enable_msr;
+
+
+	struct i387_fxsave_struct host_fx_image;
+	struct i387_fxsave_struct guest_fx_image;
+	int fpu_active;
+	int guest_fpu_loaded;
+
+	gva_t mmio_fault_cr2;
+
+	struct {
+		int active;
+		u8 save_iopl;
+		struct kvm_save_segment {
+			u16 selector;
+			unsigned long base;
+			u32 limit;
+			u32 ar;
+		} tr, es, ds, fs, gs;
+	} rmode;
[...]

As far as I can see without applying it, that split is ok for powerpc. I had a similar approach in my local patch queue too. Minor differences in which elements of the structs are arch dependent or not can be changed in small patches later ;-)

But the file name kvm_arch.h confuses me a bit. I assume you had the coming asm split in mind, where every architecture can define its asm/kvm_arch.h. Since we don't have that asm structure for kvm yet, the changes you made to kvm_arch.h may be better located in x86.h atm.

According to our previous discussion, we proposed a source layout that contains an include directory under drivers/kvm/ to hold header files for all archs, and kvm_arch.h will finally go into drivers/kvm/include/kvm-x86/ (linked as kvm at compile time). So every architecture can define its own kvm_arch.h, and the build will choose it per ARCH at compile time. But for now, we can just put it here until another real arch is in. Then we can remove x86.h, since it is not common to all archs. :)

BTW, header files should be managed with a uniform method, because possible archs, such as IA64, may need many of them.

Thanks
Xiantao
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote:

Thank you, I will resend it :)

I do greatly appreciate it. We'll do this together; please do also pick on my patches whenever you see something that doesn't fit what you need for ia64.

thanks,
Carsten
[kvm-devel] [PATCH] Add some \n in ioapic_debug()
Add new-line at end of debug strings.

Signed-off-by: Laurent Vivier [EMAIL PROTECTED]
---
 drivers/kvm/ioapic.c |   25 ++++++++++++++-----------
 1 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/kvm/ioapic.c b/drivers/kvm/ioapic.c
index 3b69541..1a5e59a 100644
--- a/drivers/kvm/ioapic.c
+++ b/drivers/kvm/ioapic.c
@@ -40,8 +40,11 @@
 #include <asm/apicdef.h>
 #include <asm/io_apic.h>
 #include "irq.h"
-/* #define ioapic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
+#if 0
+#define ioapic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg)
+#else
 #define ioapic_debug(fmt, arg...)
+#endif
 static void ioapic_deliver(struct kvm_ioapic *vioapic, int irq);
 
 static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
@@ -113,7 +116,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 	default:
 		index = (ioapic->ioregsel - 0x10) >> 1;
 
-		ioapic_debug("change redir index %x val %x", index, val);
+		ioapic_debug("change redir index %x val %x\n", index, val);
 		if (index >= IOAPIC_NUM_PINS)
 			return;
 		if (ioapic->ioregsel & 1) {
@@ -134,7 +137,7 @@ static void ioapic_inj_irq(struct kvm_ioapic *ioapic,
 			   struct kvm_lapic *target,
 			   u8 vector, u8 trig_mode, u8 delivery_mode)
 {
-	ioapic_debug("irq %d trig %d deliv %d", vector, trig_mode,
+	ioapic_debug("irq %d trig %d deliv %d\n", vector, trig_mode,
 		     delivery_mode);
 
 	ASSERT((delivery_mode == dest_Fixed) ||
@@ -151,7 +154,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 	struct kvm *kvm = ioapic->kvm;
 	struct kvm_vcpu *vcpu;
 
-	ioapic_debug("dest %d dest_mode %d", dest, dest_mode);
+	ioapic_debug("dest %d dest_mode %d\n", dest, dest_mode);
 
 	if (dest_mode == 0) {	/* Physical mode. */
 		if (dest == 0xFF) {	/* Broadcast. */
@@ -179,7 +182,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 				    kvm_apic_match_logical_addr(vcpu->apic, dest))
 				mask |= 1 << vcpu->vcpu_id;
 		}
-	ioapic_debug("mask %x", mask);
+	ioapic_debug("mask %x\n", mask);
 	return mask;
 }
 
@@ -196,12 +199,12 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	int vcpu_id;
 
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
-		     "vector=%x trig_mode=%x",
+		     "vector=%x trig_mode=%x\n",
 		     dest, dest_mode, delivery_mode, vector, trig_mode);
 
 	deliver_bitmask = ioapic_get_delivery_bitmask(ioapic, dest, dest_mode);
 	if (!deliver_bitmask) {
-		ioapic_debug("no target on destination");
+		ioapic_debug("no target on destination\n");
 		return;
 	}
 
@@ -214,7 +217,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 				       trig_mode, delivery_mode);
 		else
 			ioapic_debug("null round robin: "
-				     "mask=%x vector=%x delivery_mode=%x",
+				     "mask=%x vector=%x delivery_mode=%x\n",
 				     deliver_bitmask, vector, dest_LowestPrio);
 		break;
 	case dest_Fixed:
@@ -304,7 +307,7 @@ static void ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
 	struct kvm_ioapic *ioapic = (struct kvm_ioapic *)this->private;
 	u32 result;
 
-	ioapic_debug("addr %lx", (unsigned long)addr);
+	ioapic_debug("addr %lx\n", (unsigned long)addr);
 	ASSERT(!(addr & 0xf));	/* check alignment */
 
 	addr &= 0xff;
@@ -341,8 +344,8 @@ static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
 	struct kvm_ioapic *ioapic = (struct kvm_ioapic *)this->private;
 	u32 data;
 
-	ioapic_debug("ioapic_mmio_write addr=%lx len=%d val=%p\n",
-		     addr, len, val);
+	ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n",
+		     (void*)addr, len, val);
 	ASSERT(!(addr & 0xf));	/* check alignment */
 	if (len == 4 || len == 8)
 		data = *(u32 *) val;
-- 
1.5.2.4
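The compile-time switch used by this patch can be tried outside the kernel with a small sketch. Everything below is an assumption made for illustration: `SKETCH_DEBUG`, `sketch_debug()` and `sketch_log()` are hypothetical names, and the messages go into an in-memory buffer rather than through printk, so that the effect of the trailing newline can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Set to 0 to compile every sketch_debug() call away, like "#if 0" in the patch. */
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1
#endif

/* In-memory log standing in for the kernel log (an assumption of this sketch). */
static char sketch_log_buf[256];
static size_t sketch_log_len;

/* Append one formatted message to the log buffer, truncating if it is full. */
static void sketch_log(const char *fmt, ...)
{
	va_list ap;
	size_t room = sizeof(sketch_log_buf) - sketch_log_len;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(sketch_log_buf + sketch_log_len, room, fmt, ap);
	va_end(ap);
	if (n > 0)	/* advance, clamping when the message was truncated */
		sketch_log_len += ((size_t)n < room) ? (size_t)n : room - 1;
}

#if SKETCH_DEBUG
#define sketch_debug(...) sketch_log(__VA_ARGS__)
#else
#define sketch_debug(...) do { } while (0)
#endif
```

With the newline in each format string, two consecutive calls such as `sketch_debug("mask %x\n", 0x3u)` and `sketch_debug("addr %lx\n", 0x10ul)` end up on separate lines of the log; without it they would run together. The sketch uses C99 `__VA_ARGS__` instead of the GNU `arg...` form seen in the patch.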
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Carsten Otte wrote:

Zhang, Xiantao wrote:

Thank you, I will resend it :)

I do greatly appreciate it. We'll do this together; please do also pick on my patches whenever you see something that doesn't fit what you need for ia64.

Sure :)
Xiantao

thanks,
Carsten
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
int vcpu_id; struct mutex mutex; int cpu; -u64 host_tsc; struct kvm_run *run; int interrupt_window_open; I am not sure if this is the right thing for all archs. We have various forms of interrupts (I/O, external etc) which can all be masked seperately. I think interrupt_window_open should go to arch. Thank you, I will resend it :) int guest_mode; unsigned long requests; unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */ We don't have irq. This works completely different for us, thus this needs to go to arch. DECLARE_BITMAP(irq_pending, KVM_NR_INTERRUPTS); Same here. #define VCPU_MP_STATE_RUNNABLE 0 #define VCPU_MP_STATE_UNINITIALIZED 1 #define VCPU_MP_STATE_INIT_RECEIVED 2 @@ -339,7 +329,6 @@ struct kvm_vcpu { #define VCPU_MP_STATE_HALTED4 int mp_state; int sipi_vector; This one is arch dependent and should go to arch. -u64 ia32_misc_enable_msr; struct kvm_mmu mmu; @@ -354,10 +343,6 @@ struct kvm_vcpu { struct kvm_guest_debug guest_debug; -struct i387_fxsave_struct host_fx_image; -struct i387_fxsave_struct guest_fx_image; -int fpu_active; -int guest_fpu_loaded; I think guest_fpu_loaded should be generic. Don't you want to use the lazy fpu restore with preempt notification too? int mmio_needed; int mmio_read_completed; This is arch dependent, we don't have CONFIG_MMIO. @@ -365,7 +350,6 @@ struct kvm_vcpu { int mmio_size; unsigned char mmio_data[8]; gpa_t mmio_phys_addr; -gva_t mmio_fault_cr2; struct kvm_pio_request pio; void *pio_data; All above are arch dependent. diff --git a/drivers/kvm/kvm_arch.h b/drivers/kvm/kvm_arch.h new file mode 100644 index 000..fe73d3d --- /dev/null +++ b/drivers/kvm/kvm_arch.h @@ -0,0 +1,65 @@ +#ifndef __KVM_ARCH_H +#define __KVM_ARCH_H This should go to x86.h, no new header please. 
+struct kvm_arch_vcpu{ + +u64 host_tsc; + +unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */ +unsigned long rip; /* needs vcpu_load_rsp_rip() */ + +unsigned long cr0; +unsigned long cr2; +unsigned long cr3; +unsigned long cr4; +unsigned long cr8; +u64 pdptrs[4]; /* pae */ +u64 shadow_efer; +u64 apic_base; +struct kvm_lapic *apic; /* kernel irqchip context */ + +u64 ia32_misc_enable_msr; + + +struct i387_fxsave_struct host_fx_image; +struct i387_fxsave_struct guest_fx_image; +int fpu_active; +int guest_fpu_loaded; + +gva_t mmio_fault_cr2; + +struct { +int active; +u8 save_iopl; +struct kvm_save_segment { +u16 selector; +unsigned long base; +u32 limit; +u32 ar; +} tr, es, ds, fs, gs; +} rmode; + +int cpuid_nent; +struct kvm_cpuid_entry cpuid_entries[KVM_MAX_CPUID_ENTRIES]; + +/* emulate context */ + +struct x86_emulate_ctxt emulate_ctxt; +}; + +#endif Very nice. The only thing that shouldn't be here is fpu_active as far as I can tell. Since some archs don't need to care about fpu, I put it under arch. If most archs need it, maybe we can move it to top level. Just a tradeoff. :) I like this split overall, per architecture vcpu data structures are an important step and clearly the right way to go. with kind regards, Carsten
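The split discussed in this message can be sketched in standalone C. This is only an illustration under simplifying assumptions: the field names follow the patch, but the types are reduced to plain integers and the helpers vcpu_has_request() and x86_set_cr3() are hypothetical, not functions from the KVM tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed x86-only state; the real struct has many more fields. */
struct kvm_arch_vcpu {
    uint64_t host_tsc;
    unsigned long cr0;
    unsigned long cr3;
    int fpu_active;
};

/* Architecture-neutral state plus one embedded arch member. */
struct kvm_vcpu {
    int vcpu_id;
    int cpu;
    unsigned long requests;
    struct kvm_arch_vcpu arch;   /* each architecture supplies its own definition */
};

/* Common code only touches generic fields (hypothetical helper). */
static int vcpu_has_request(const struct kvm_vcpu *vcpu, unsigned int bit)
{
    return (int)((vcpu->requests >> bit) & 1UL);
}

/* Arch code reaches its own state through vcpu->arch (hypothetical helper). */
static void x86_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
    vcpu->arch.cr3 = cr3;
}
```

With this layout, moving a field such as interrupt_window_open between the generic and arch parts only changes whether callers write vcpu->field or vcpu->arch.field, which is the mechanical change visible in the ioapic.c hunk of the patch.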
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote: From 12457e0fb85ef32f1a1f808be294bebe8d22667c Mon Sep 17 00:00:00 2001 From: Zhang xiantao [EMAIL PROTECTED] Date: Fri, 12 Oct 2007 13:29:30 +0800 Subject: [PATCH] Split kvm_vcpu to support new archs. Define a new sub field kvm_arch_vcpu to hold arch-specific sections. I am not sure whether data fields related to mmu should be put under kvm_arch_vcpu or not, because the IA64 side doesn't need them, and only needs the kvm module to allocate memory for guests. We don't need them either on 390, and neither does ppc. I think we should consider Avi's ingenious softmmu to be x86 specific. Therefore, those fields should go to the x86 part afaics. diff --git a/drivers/kvm/ioapic.c b/drivers/kvm/ioapic.c index 3b69541..b149c07 100644 --- a/drivers/kvm/ioapic.c +++ b/drivers/kvm/ioapic.c @@ -156,7 +156,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, if (dest_mode == 0) { /* Physical mode. */ if (dest == 0xFF) { /* Broadcast. */ for (i = 0; i < KVM_MAX_VCPUS; ++i) - if (kvm->vcpus[i] && kvm->vcpus[i]->apic) + if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic) mask |= 1 << i; return mask; } Your mail client wraps lines, thus the patch is not applicable when taken from an email. Try using mutt or evolution for sending patches. In evolution, select preformat mode and paste into it. diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h index 4a52d6e..eaa28c8 100644 --- a/drivers/kvm/kvm.h +++ b/drivers/kvm/kvm.h @@ -307,31 +307,21 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr); void kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev); + +#include "kvm_arch.h" + This should be x86.h for now, and later on be moved to include/asm-x86/to-be-named.h struct kvm_vcpu { struct kvm *kvm; struct preempt_notifier preempt_notifier; int vcpu_id; struct mutex mutex; int cpu; - u64 host_tsc; struct kvm_run *run; int interrupt_window_open; I am not sure if this is the right thing for all archs. 
We have various forms of interrupts (I/O, external etc) which can all be masked separately. I think interrupt_window_open should go to arch. int guest_mode; unsigned long requests; unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */ We don't have irq. This works completely differently for us, thus this needs to go to arch. DECLARE_BITMAP(irq_pending, KVM_NR_INTERRUPTS); Same here. #define VCPU_MP_STATE_RUNNABLE 0 #define VCPU_MP_STATE_UNINITIALIZED 1 #define VCPU_MP_STATE_INIT_RECEIVED 2 @@ -339,7 +329,6 @@ struct kvm_vcpu { #define VCPU_MP_STATE_HALTED 4 int mp_state; int sipi_vector; This one is arch dependent and should go to arch. - u64 ia32_misc_enable_msr; struct kvm_mmu mmu; @@ -354,10 +343,6 @@ struct kvm_vcpu { struct kvm_guest_debug guest_debug; - struct i387_fxsave_struct host_fx_image; - struct i387_fxsave_struct guest_fx_image; - int fpu_active; - int guest_fpu_loaded; I think guest_fpu_loaded should be generic. Don't you want to use the lazy fpu restore with preempt notification too? int mmio_needed; int mmio_read_completed; This is arch dependent, we don't have CONFIG_MMIO. @@ -365,7 +350,6 @@ struct kvm_vcpu { int mmio_size; unsigned char mmio_data[8]; gpa_t mmio_phys_addr; - gva_t mmio_fault_cr2; struct kvm_pio_request pio; void *pio_data; All above are arch dependent. diff --git a/drivers/kvm/kvm_arch.h b/drivers/kvm/kvm_arch.h new file mode 100644 index 000..fe73d3d --- /dev/null +++ b/drivers/kvm/kvm_arch.h @@ -0,0 +1,65 @@ +#ifndef __KVM_ARCH_H +#define __KVM_ARCH_H This should go to x86.h, no new header please.
+struct kvm_arch_vcpu{ + + u64 host_tsc; + + unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */ + unsigned long rip; /* needs vcpu_load_rsp_rip() */ + + unsigned long cr0; + unsigned long cr2; + unsigned long cr3; + unsigned long cr4; + unsigned long cr8; + u64 pdptrs[4]; /* pae */ + u64 shadow_efer; + u64 apic_base; + struct kvm_lapic *apic;/* kernel irqchip context */ + + u64 ia32_misc_enable_msr; + + + struct i387_fxsave_struct host_fx_image; + struct i387_fxsave_struct guest_fx_image; + int fpu_active; + int guest_fpu_loaded; + + gva_t mmio_fault_cr2; + + struct { + int active; + u8 save_iopl; + struct kvm_save_segment { + u16 selector; + unsigned long base; + u32 limit; +
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote: According to our previous discussion, we proposed a source layout, which contains an include directory to hold header files for all archs under drivers/kvm/, and kvm_arch.h will finally go into drivers/kvm/include/kvm-x86/ (linked as kvm at compile time). Right. The thing is, I've started a new header for this purpose yesterday. And this should be in the _same_ header, no matter where it'll end up. It is the x86 specific header file, currently named drivers/kvm/x86.h, which needs to be renamed/moved in the future. So, every architecture can define its own kvm_arch.h, and the build will choose it per ARCH at compile time. But for now, we can just put it here until another real new arch is in. Then, we can remove x86.h, since it is not so common for all archs. :) BTW, header files should be managed with a uniform method, because possible archs, such as IA64, may need many of them. That's fine with me. But prior to that we'll need to split x86 so that it can be relocated in its arch directory different from the common kvm location. And until we're there, we use x86.h as a place to store x86 specific header content. so long, Carsten
[kvm-devel] RFC/patch portability: split kvm_vm_ioctl
This patch splits kvm_vm_ioctl into architecture independent parts, and x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c. Common ioctls for all architectures are: KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG I'd really like to see more commonalities, but all others did not fit our needs. I would love to keep KVM_GET_DIRTY_LOG common, so that the ingenious migration code does not need to care too much about different architectures. x86 specific ioctls are: KVM_SET_MEMORY_REGION, KVM_SET_USER_MEMORY_REGION, KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP, KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP While the pic/apic related functions are obviously x86 specific, some other ioctls seem to be common at first glance. KVM_SET_(USER)_MEMORY_REGION for example. We've got a totally different address layout on s390: we cannot support multiple slots, and a user memory range always equals the guest physical memory [guest_phys + vm specific offset = host user address]. We neither have nor need dedicated vmas for the guest memory, we just use what the memory management has in stock. This is true because we reuse the page table for user and guest mode. Looks to me like the s390 might have a lot in common with a future AMD nested page table implementation. If AMD chooses to reuse the page table too, we might share the same ioctl to set up guest addressing with them. 
signed-off-by: Carsten Otte [EMAIL PROTECTED] reviewed-by: Christian Borntraeger [EMAIL PROTECTED] reviewed-by: Christian Ehrhardt [EMAIL PROTECTED] --- Index: kvm/drivers/kvm/kvm.h === --- kvm.orig/drivers/kvm/kvm.h 2007-10-12 13:38:59.0 +0200 +++ kvm/drivers/kvm/kvm.h 2007-10-12 14:22:40.0 +0200 @@ -661,6 +661,9 @@ unsigned int ioctl, unsigned long arg); long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); +long kvm_arch_vm_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg); +void kvm_arch_destroy_vm(struct kvm *kvm); void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); Index: kvm/drivers/kvm/kvm_main.c === --- kvm.orig/drivers/kvm/kvm_main.c 2007-10-12 13:38:59.0 +0200 +++ kvm/drivers/kvm/kvm_main.c 2007-10-12 13:57:30.0 +0200 @@ -40,7 +40,6 @@ #include linux/anon_inodes.h #include linux/profile.h #include linux/kvm_para.h -#include linux/pagemap.h #include asm/processor.h #include asm/msr.h @@ -319,61 +318,6 @@ return kvm; } -static void kvm_free_userspace_physmem(struct kvm_memory_slot *free) -{ - int i; - - for (i = 0; i free-npages; ++i) { - if (free-phys_mem[i]) { - if (!PageReserved(free-phys_mem[i])) - SetPageDirty(free-phys_mem[i]); - page_cache_release(free-phys_mem[i]); - } - } -} - -static void kvm_free_kernel_physmem(struct kvm_memory_slot *free) -{ - int i; - - for (i = 0; i free-npages; ++i) - if (free-phys_mem[i]) - __free_page(free-phys_mem[i]); -} - -/* - * Free any memory in @free but not in @dont. 
- */ -static void kvm_free_physmem_slot(struct kvm_memory_slot *free, - struct kvm_memory_slot *dont) -{ - if (!dont || free-phys_mem != dont-phys_mem) - if (free-phys_mem) { - if (free-user_alloc) - kvm_free_userspace_physmem(free); - else - kvm_free_kernel_physmem(free); - vfree(free-phys_mem); - } - if (!dont || free-rmap != dont-rmap) - vfree(free-rmap); - - if (!dont || free-dirty_bitmap != dont-dirty_bitmap) - vfree(free-dirty_bitmap); - - free-phys_mem = NULL; - free-npages = 0; - free-dirty_bitmap = NULL; -} - -static void kvm_free_physmem(struct kvm *kvm) -{ - int i; - - for (i = 0; i kvm-nmemslots; ++i) - kvm_free_physmem_slot(kvm-memslots[i], NULL); -} - static void free_pio_guest_pages(struct kvm_vcpu *vcpu) { int i; @@ -421,7 +365,7 @@ kfree(kvm-vpic); kfree(kvm-vioapic); kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); + kvm_arch_destroy_vm(kvm); kfree(kvm); } @@ -686,183 +630,6 @@ EXPORT_SYMBOL_GPL(fx_init); /* - * Allocate some memory and give it an address in the guest physical address - * space. - * - * Discontiguous memory is allowed, mostly for framebuffers. - */ -static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, - struct - kvm_userspace_memory_region *mem, -
Re: [kvm-devel] RFC/patch portability: split kvm_vm_ioctl
Carsten Otte wrote: This patch splits kvm_vm_ioctl into architecture independent parts, and x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c. Common ioctls for all architectures are: KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG I'd really like to see more commonalities, but all others did not fit our needs. I would love to keep KVM_GET_DIRTY_LOG common, so that the ingenious migration code does not need to care too much about different architectures. x86 specific ioctls are: KVM_SET_MEMORY_REGION, KVM_SET_USER_MEMORY_REGION, KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP, KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP While the pic/apic related functions are obviously x86 specific, some other ioctls seem to be common at first glance. KVM_SET_(USER)_MEMORY_REGION for example. We've got a totally different address layout on s390: we cannot support multiple slots, and a user memory range always equals the guest physical memory [guest_phys + vm specific offset = host user address]. We neither have nor need dedicated vmas for the guest memory, we just use what the memory management has in stock. This is true because we reuse the page table for user and guest mode. You still need to tell the kernel about the vm specific offset, right? So doesn't KVM_SET_USER_MEMORY_REGION for you just become that? There's nothing wrong with s390 not supporting multiple memory slots, but there's no reason the ioctl interface can't be the same. Regards, Anthony Liguori Looks to me like the s390 might have a lot in common with a future AMD nested page table implementation. If AMD chooses to reuse the page table too, we might share the same ioctl to set up guest addressing with them. signed-off-by: Carsten Otte [EMAIL PROTECTED] reviewed-by: Christian Borntraeger [EMAIL PROTECTED] reviewed-by: Christian Ehrhardt [EMAIL PROTECTED] ---
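The s390 layout described in the quoted mail, [guest_phys + vm specific offset = host user address], amounts to a single bounds check and an addition. A minimal sketch, assuming one contiguous guest range; struct guest_mem and gpa_to_hva() are hypothetical names for illustration and do not come from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-VM description: one contiguous guest memory range. */
struct guest_mem {
    uint64_t user_offset;  /* the "vm specific offset" from the mail */
    uint64_t size;         /* guest physical memory size in bytes */
};

/*
 * Translate a guest physical address into a host user address.
 * Returns 0 on success, -1 if gpa lies outside guest memory.
 */
static int gpa_to_hva(const struct guest_mem *m, uint64_t gpa, uint64_t *hva)
{
    if (gpa >= m->size)
        return -1;
    *hva = gpa + m->user_offset;
    return 0;
}
```

Seen this way, Anthony's point is that a single-slot memory region call would carry exactly these two values, so the interface itself could stay shared even if s390 rejects more than one slot.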
Re: [kvm-devel] [RFC] Expose infrastructure for unpinning guest memory
Avi Kivity wrote: Anthony Liguori wrote: Now that we have userspace memory allocation, I wanted to play with ballooning. The idea is that when a guest balloons down, we simply unpin the underlying physical memory and the host kernel may or may not swap it. To reclaim ballooned memory, the guest can just start using it and we'll pin it on demand. The following patch is a stab at providing the right infrastructure for pinning and automatic repinning. I don't have a lot of comfort in the MMU code so I thought I'd get some feedback before going much further. gpa_to_hpa is a little awkward to hook, but it seems like the right place in the code. I'm most uncertain about the SMP safety of the unpinning. Presumably, I have to hold the kvm lock around the mmu_unshadow and page_cache release to ensure that another VCPU doesn't fault the page back in after mmu_unshadow? Once we have true swapping capabilities (which imply the ability for the kernel to remove a page from the shadow page tables) you can unpin by calling munmap() or madvise(MADV_REMOVE) on the pages to be unpinned. So does MADV_REMOVE remove the backing page but still allow for memory to be faulted in? That is, after calling MADV_REMOVE, there's no guarantee that the contents of a given VA range will remain the same (but it won't SEGV the app if it accesses that memory)? If so, I think that would be the right way to treat it. That allows for two types of hints for the guest to provide: 1) I won't access this memory for a very long time (so it's a good candidate to swap out) and 2) I won't access this memory and don't care about its contents. Regards, Anthony Liguori Other than that the approach seems right.
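The "contents may change, but access does not fault" semantics asked about above can be tried from userspace. The sketch below is Linux-specific and deliberately swaps in MADV_DONTNEED on a private anonymous mapping rather than MADV_REMOVE, since MADV_REMOVE is meant for shared, tmpfs-style mappings; drop_and_reread() is a hypothetical helper written for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of private anonymous memory, dirty it, drop it with
 * MADV_DONTNEED, then read it again.
 * Returns the first byte read after the madvise() call, or -1 on error.
 */
static int drop_and_reread(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int first;

    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);                 /* fault the pages in */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    first = p[0];                         /* no SIGSEGV; the page is faulted in again */
    munmap(p, len);
    return first;
}
```

On Linux the re-read of a private anonymous page after MADV_DONTNEED returns zero-filled memory, which matches hint 2) in the mail: the guest gives up the contents but keeps the address range usable.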
Re: [kvm-devel] RFC/patch portability: split kvm_vm_ioctl
Am Freitag, den 12.10.2007, 15:37 +0200 schrieb Arnd Bergmann: I assume the contents are ok, since you're just moving code around, but please write this 'Signed-off-by' and 'Reviewed-by' (capital letters), and include a diffstat for any patch that doesn't fit on a few pages of mail client screen space. The intend of an rfc is in general to review a patch, not to pick on formalities. Signed-off-by: Carsten Otte [EMAIL PROTECTED] Reviewed-by: Christian Borntraeger [EMAIL PROTECTED] Reviewed-by: Christian Ehrhardt [EMAIL PROTECTED] --- kvm.h |3 kvm_main.c | 460 --- x86.c | 472 + 3 files changed, 478 insertions(+), 457 deletions(-) Index: kvm/drivers/kvm/kvm.h === --- kvm.orig/drivers/kvm/kvm.h 2007-10-12 13:38:59.0 +0200 +++ kvm/drivers/kvm/kvm.h 2007-10-12 14:22:40.0 +0200 @@ -661,6 +661,9 @@ unsigned int ioctl, unsigned long arg); long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); +long kvm_arch_vm_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg); +void kvm_arch_destroy_vm(struct kvm *kvm); void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); Index: kvm/drivers/kvm/kvm_main.c === --- kvm.orig/drivers/kvm/kvm_main.c 2007-10-12 13:38:59.0 +0200 +++ kvm/drivers/kvm/kvm_main.c 2007-10-12 13:57:30.0 +0200 @@ -40,7 +40,6 @@ #include linux/anon_inodes.h #include linux/profile.h #include linux/kvm_para.h -#include linux/pagemap.h #include asm/processor.h #include asm/msr.h @@ -319,61 +318,6 @@ return kvm; } -static void kvm_free_userspace_physmem(struct kvm_memory_slot *free) -{ - int i; - - for (i = 0; i free-npages; ++i) { - if (free-phys_mem[i]) { - if (!PageReserved(free-phys_mem[i])) - SetPageDirty(free-phys_mem[i]); - page_cache_release(free-phys_mem[i]); - } - } -} - -static void kvm_free_kernel_physmem(struct kvm_memory_slot *free) -{ - int i; - - for (i = 0; i free-npages; ++i) - if (free-phys_mem[i]) - __free_page(free-phys_mem[i]); -} - -/* - * Free any memory in 
@free but not in @dont. - */ -static void kvm_free_physmem_slot(struct kvm_memory_slot *free, - struct kvm_memory_slot *dont) -{ - if (!dont || free-phys_mem != dont-phys_mem) - if (free-phys_mem) { - if (free-user_alloc) - kvm_free_userspace_physmem(free); - else - kvm_free_kernel_physmem(free); - vfree(free-phys_mem); - } - if (!dont || free-rmap != dont-rmap) - vfree(free-rmap); - - if (!dont || free-dirty_bitmap != dont-dirty_bitmap) - vfree(free-dirty_bitmap); - - free-phys_mem = NULL; - free-npages = 0; - free-dirty_bitmap = NULL; -} - -static void kvm_free_physmem(struct kvm *kvm) -{ - int i; - - for (i = 0; i kvm-nmemslots; ++i) - kvm_free_physmem_slot(kvm-memslots[i], NULL); -} - static void free_pio_guest_pages(struct kvm_vcpu *vcpu) { int i; @@ -421,7 +365,7 @@ kfree(kvm-vpic); kfree(kvm-vioapic); kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); + kvm_arch_destroy_vm(kvm); kfree(kvm); } @@ -686,183 +630,6 @@ EXPORT_SYMBOL_GPL(fx_init); /* - * Allocate some memory and give it an address in the guest physical address - * space. - * - * Discontiguous memory is allowed, mostly for framebuffers. - */ -static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, - struct - kvm_userspace_memory_region *mem, - int user_alloc) -{ - int r; - gfn_t base_gfn; - unsigned long npages; - unsigned long i; - struct kvm_memory_slot *memslot; - struct kvm_memory_slot old, new; - - r = -EINVAL; - /* General sanity checks */ - if (mem-memory_size (PAGE_SIZE - 1)) - goto out; - if (mem-guest_phys_addr (PAGE_SIZE - 1)) - goto out; - if (mem-slot = KVM_MEMORY_SLOTS) - goto out; - if (mem-guest_phys_addr + mem-memory_size mem-guest_phys_addr) - goto out; - - memslot = kvm-memslots[mem-slot]; - base_gfn = mem-guest_phys_addr PAGE_SHIFT; - npages = mem-memory_size PAGE_SHIFT; - - if (!npages) -
Re: [kvm-devel] RFC/patch portability: split kvm_vm_ioctl
On Friday 12 October 2007, Carsten Otte wrote: This patch splits kvm_vm_ioctl into architecture independent parts, and x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c. I assume the contents are ok, since you're just moving code around, but please signed-off-by: Carsten Otte [EMAIL PROTECTED] reviewed-by: Christian Borntraeger [EMAIL PROTECTED] reviewed-by: Christian Ehrhardt [EMAIL PROTECTED] write this 'Signed-off-by' and 'Reviewed-by' (capital letters), and include a diffstat for any patch that doesn't fit on a few pages of mail client screen space. Arnd
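The shape of the split under review (common ioctls handled in generic code, everything else forwarded to an arch hook) can be sketched outside the kernel. The command numbers and both functions below are hypothetical stand-ins; the real code dispatches on KVM_* ioctl encodings and takes a struct file pointer and argument.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers standing in for the KVM_* ioctl values. */
enum {
    CMD_CREATE_VCPU = 1,
    CMD_GET_DIRTY_LOG,
    CMD_SET_MEMORY_REGION,
    CMD_CREATE_IRQCHIP
};

/* Stand-in for the arch hook: handles only arch-specific commands. */
static long arch_vm_ioctl(unsigned int cmd)
{
    switch (cmd) {
    case CMD_SET_MEMORY_REGION:
    case CMD_CREATE_IRQCHIP:
        return 0;
    default:
        return -EINVAL;                /* unknown to both layers */
    }
}

/* Stand-in for the common handler. */
static long vm_ioctl(unsigned int cmd)
{
    switch (cmd) {
    case CMD_CREATE_VCPU:
    case CMD_GET_DIRTY_LOG:
        return 0;                      /* handled in common code */
    default:
        return arch_vm_ioctl(cmd);     /* everything else is arch-specific */
    }
}
```

Putting the fallthrough in the default branch means a new architecture only has to supply its own arch handler, while unknown commands still end in a single -EINVAL path.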
Re: [kvm-devel] [PATCH][Resend] Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote: diff --git a/drivers/kvm/ioapic.c b/drivers/kvm/ioapic.c index 3b69541..df67292 100644 --- a/drivers/kvm/ioapic.c +++ b/drivers/kvm/ioapic.c @@ -156,7 +156,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, if (dest_mode == 0) { /* Physical mode. */ if (dest == 0xFF) { /* Broadcast. */ for (i = 0; i < KVM_MAX_VCPUS; ++i) - if (kvm->vcpus[i] && kvm->vcpus[i]->apic) + if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic) mask |= 1 << i; return mask; } Your mail client still wraps here, the patch is not applicable. struct kvm_vcpu { struct kvm *kvm; struct preempt_notifier preempt_notifier; int vcpu_id; struct mutex mutex; int cpu; - u64 host_tsc; struct kvm_run *run; int interrupt_window_open; This one should go to arch. int guest_mode; unsigned long requests; unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */ DECLARE_BITMAP(irq_pending, KVM_NR_INTERRUPTS); Both irq related ones too please. int mmio_needed; int mmio_read_completed; Not all architectures have mmio, please put this into the arch specific part. Other than that, the patch looks fine to me.
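For readers following the ioapic.c hunk, the broadcast branch can be reproduced as a standalone function. This is a sketch with simplified, hypothetical types (struct vcpu, struct arch_vcpu, struct lapic) and a small MAX_VCPUS in place of KVM_MAX_VCPUS; only the loop body mirrors the patched line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 4   /* hypothetical; stands in for KVM_MAX_VCPUS */

struct lapic { int unused; };
struct arch_vcpu { struct lapic *apic; };
struct vcpu { struct arch_vcpu arch; };

/* Set bit i for every vcpu slot that exists and has an APIC. */
static uint32_t broadcast_mask(struct vcpu *vcpus[MAX_VCPUS])
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < MAX_VCPUS; ++i)
        if (vcpus[i] && vcpus[i]->arch.apic)
            mask |= (uint32_t)1 << i;
    return mask;
}
```

The only change the patch makes to this loop is the extra .arch hop when reaching the apic pointer; empty slots and vcpus without an APIC are skipped exactly as before.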
Re: [kvm-devel] Working on an entry-level project
Dor Laor wrote: Cam Macdonell wrote: It's a simple test, when there are keyboard/mouse/display changes keep the refresh rate high. When there are no changes start decreasing the rate until a minimum is reached. The performance benefit should also be checked, since if it is minimal there's no use for this optimization. Related to that, what is the status of VMGL's (http://www.cs.toronto.edu/~andreslc/xen-gl/) integration with KVM or QEMU? Has anyone tried it? I've found some pages that refer to QEMU and VMGL but nothing definitive. Go ahead, they claim it can work with qemu. Try first with qemu since it is the repository to contribute the code to. KVM will inherit it from qemu. Is there a way to test with QEMU that is not painfully slow? Cam
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Zhang, Xiantao wrote: So, every architecture can define its own kvm_arch.h, and the build will choose it per ARCH at compile time. But for now, we can just put it here until another real new arch is in. Then, we can remove x86.h, since it is not so common for all archs. :) BTW, header files should be managed with a uniform method, because possible archs, such as IA64, may need many of them. That's fine with me. But prior to that we'll need to split x86 so that it can be relocated in its arch directory different from the common kvm location. And until we're there, we use x86.h as a place to store x86 specific header content. OK, I will change it to x86.h, but we also renamed it to such kvm_arch.h, because kvm.h will include it. Which kvm.h? The one in include/linux or the one in drivers/kvm?
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Carsten Otte wrote: Zhang, Xiantao wrote: According to our previous discussion, we proposed a source layout, which contains an include directory to hold header files for all archs under drivers/kvm/, and kvm_arch.h will finally go into drivers/kvm/include/kvm-x86/ (linked as kvm at compile time). Right. The thing is, I've started a new header for this purpose yesterday. And this should be in the _same_ header, no matter where it'll end up. It is the x86 specific header file, currently named drivers/kvm/x86.h, which needs to be renamed/moved in the future. Agreed. A future rename or remove operation is needed. So, every architecture can define its own kvm_arch.h, and the build will choose it per ARCH at compile time. But for now, we can just put it here until another real new arch is in. Then, we can remove x86.h, since it is not so common for all archs. :) BTW, header files should be managed with a uniform method, because possible archs, such as IA64, may need many of them. That's fine with me. But prior to that we'll need to split x86 so that it can be relocated in its arch directory different from the common kvm location. And until we're there, we use x86.h as a place to store x86 specific header content. OK, I will change it to x86.h, but we also renamed it to such kvm_arch.h, because kvm.h will include it. so long, Carsten
Re: [kvm-devel] Working on an entry-level project
Cam Macdonell wrote: Dor Laor wrote: Cam Macdonell wrote: You may choose the interactivity improvements in http://kvm.qumranet.com/kvmwiki/TODO Dor Thanks Dor, I'll look into it. Beyond the description, can you elaborate on the problem with frame rate during interactivity? Is there a simple test that reveals the problem? It's a simple test, when there are keyboard/mouse/display changes keep the refresh rate high. When there are no changes start decreasing the rate until a minimum is reached. The performance benefit should also be checked, since if it is minimal there's no use for this optimization. Related to that, what is the status of VMGL's (http://www.cs.toronto.edu/~andreslc/xen-gl/) integration with KVM or QEMU? Has anyone tried it? I've found some pages that refer to QEMU and VMGL but nothing definitive. Go ahead, they claim it can work with qemu. Try first with qemu since it is the repository to contribute the code to. KVM will inherit it from qemu. Cam
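The policy Dor describes (refresh fast while there is input or display activity, back off while idle until a floor is reached) could look roughly like this. It is a sketch only: the interval bounds, the doubling step and next_refresh_interval() are assumptions for illustration and are not taken from QEMU.

```c
#include <assert.h>

#define REFRESH_MIN_MS   30   /* hypothetical fastest refresh interval */
#define REFRESH_MAX_MS 3000   /* hypothetical slowest refresh interval */

/*
 * Return the next refresh interval in milliseconds.
 * Activity resets to the fastest rate; idleness doubles the interval
 * until the slowest rate is reached.
 */
static int next_refresh_interval(int current_ms, int had_activity)
{
    int next;

    if (had_activity)
        return REFRESH_MIN_MS;

    next = current_ms * 2;
    if (next > REFRESH_MAX_MS)
        next = REFRESH_MAX_MS;
    if (next < REFRESH_MIN_MS)
        next = REFRESH_MIN_MS;
    return next;
}
```

A caller would invoke this once per refresh tick and re-arm its timer with the result. Whether the saved refreshes are worth the added latency is the measurement Dor asks for.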
Re: [kvm-devel] [RFC] Paravirt timer for KVM
On Fri, 2007-10-12 at 13:08 -0300, Glauber de Oliveira Costa wrote: +config KVM_CLOCK + bool "KVM paravirtualized clock" + depends on PARAVIRT && GENERIC_CLOCKEVENTS + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. I must have missed earlier discussion on this topic, so I'm left wondering... what's the point? What's wrong with PIT (et al) emulation? -- Hollis Blanchard IBM Linux Technology Center
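As background to the question, the clock read path in the patch under discussion reduces to scaling a TSC delta with a fixed-point multiply and a shift by KVM_SCALE (22 in the patch). The sketch below shows that arithmetic in isolation. How the host derives tsc_mult is not shown in the posted patch, so tsc_mult_from_khz() is an assumption made for this example, as is cycles_to_ns().

```c
#include <assert.h>
#include <stdint.h>

#define KVM_SCALE 22

/*
 * Assumed derivation of the multiplier: nanoseconds per cycle
 * (1e6 / tsc_khz) scaled by 2^KVM_SCALE. tsc_khz must be nonzero and
 * large enough that the result fits in 32 bits.
 */
static uint32_t tsc_mult_from_khz(uint32_t tsc_khz)
{
    return (uint32_t)((UINT64_C(1000000) << KVM_SCALE) / tsc_khz);
}

/*
 * Convert a TSC cycle delta to nanoseconds. The 64-bit product can
 * overflow for very large deltas, so callers should keep deltas short.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult)
{
    return (cycles * mult) >> KVM_SCALE;
}
```

For a 1 GHz TSC the multiplier is exactly 2^22, so one cycle maps to one nanosecond. The guest needs no division and no device access on this path, which appears to be part of the motivation for a paravirtual clock over an emulated PIT.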
Re: [kvm-devel] [Patch][RFC]Split kvm_vcpu to support new archs.
Carsten Otte wrote: Zhang, Xiantao wrote: So, every architecture can define its own kvm_arch.h, and the build will choose it per ARCH at compile time. But for now, we can just put it here until another real new arch is in. Then, we can remove x86.h, since it is not so common for all archs. :) BTW, header files should be managed with a uniform method, because possible archs, such as IA64, may need many of them. That's fine with me. But prior to that we'll need to split x86 so that it can be relocated in its arch directory different from the common kvm location. And until we're there, we use x86.h as a place to store x86 specific header content. OK, I will change it to x86.h, but we also renamed it to such kvm_arch.h, because kvm.h will include it. Which kvm.h? The one in include/linux or the one in drivers/kvm? I mean drivers/kvm/kvm.h Xiantao
[kvm-devel] [RFC] Paravirt timer for KVM
Hi, Attached is a first draft of a paravirt timer implementation for KVM. It is inspired by Anthony's last patch about it, but not that much based on it. I'm not using hypercalls to get the current time, but rather registering an address that will get timer updates once in a while. Also, it includes a clockevent oneshot implementation (which is the main point of this patch), that will allow us interesting things like dynticks. It's still not yet working on SMP, and I'm currently not sure why (ok, ok, if you actually read the patch, it will become obvious why: it only delivers interrupts for vector 0x20, but I'm further along with it, this patch is just a snapshot) My next TODOs with it are: * Get SMP working * Try something for stolen time, as in Jeremy's last suggestion for Anthony's patch * Measure the time it takes for a hypercall, and subtract this time when calculating the expiry time for the timer event. * Testing and fixing bugs: I'm sure they exist! Meanwhile, all your suggestions are welcome. -- Glauber de Oliveira Costa. Free as in Freedom http://glommer.net The less confident you are, the more serious you have to act. diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig index 97b64d7..622e4d2 100644 --- a/arch/i386/Kconfig +++ b/arch/i386/Kconfig @@ -236,6 +236,15 @@ config VMI (it could be used by other hypervisors in theory too, but is not at the moment), by linking the kernel to a GPL-ed ROM module provided by the hypervisor. +config KVM_CLOCK + bool "KVM paravirtualized clock" + depends on PARAVIRT && GENERIC_CLOCKEVENTS + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. 
config ACPI_SRAT bool diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile index 9d33b00..90c5dc4 100644 --- a/arch/i386/kernel/Makefile +++ b/arch/i386/kernel/Makefile @@ -42,6 +42,7 @@ obj-$(CONFIG_K8_NB) += k8.o obj-$(CONFIG_MGEODE_LX) += geode.o obj-$(CONFIG_VMI) += vmi.o vmiclock.o +obj-$(CONFIG_KVM_CLOCK) += kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt.o obj-y+= pcspeaker.o diff --git a/arch/i386/kernel/kvmclock.c b/arch/i386/kernel/kvmclock.c new file mode 100644 index 000..8c4df5d --- /dev/null +++ b/arch/i386/kernel/kvmclock.c @@ -0,0 +1,222 @@ +#include linux/clocksource.h +#include linux/clockchips.h +#include linux/interrupt.h +#include linux/kvm_para.h +#include asm/arch_hooks.h +#include asm/i8253.h + +#include mach_ipi.h +#include irq_vectors.h + +#define KVM_SCALE 22 + +static int no_kvmclock = 0; +extern struct clock_event_device *global_clock_event; + +static int parse_no_kvmclock(char *arg) +{ + no_kvmclock = 1; + return 0; +} +early_param(no-kvmclock, parse_no_kvmclock); + +/* The hypervisor will put information about time periodically here */ +struct kvm_hv_clock hv_clock; + +/* + * The wallclock is the time of day when we booted. Since then, some time may + * have elapsed since the hypervisor wrote the data. So we try to account for + * that. Even if the tsc is not accurate, it gives us a more accurate timing + * than not adjusting at all + */ +unsigned long kvm_get_wallclock(void) +{ + unsigned long wallclock; + unsigned long long now; + wallclock = hv_clock.wc.tv_sec; + + rdtscll(now); + + now -= hv_clock.last_tsc; + now = (now * hv_clock.tsc_mult) KVM_SCALE; + now += hv_clock.wc.tv_nsec; + do_div(now, NSEC_PER_SEC); + return wallclock; +} + +int kvm_set_wallclock(unsigned long now) +{ + return 0; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time, and then we can use it to derive a slightly more + * precise notion of elapsed time, converted to nanoseconds. 
+ * + * If the platform provides a stable tsc, we just use it, and there is no need + * for the host to update anything. + */ +static cycle_t kvm_clock_read(void) { + + u64 delta, last_tsc; + struct timespec *now; + + if (hv_clock.stable_tsc) { + rdtscll(last_tsc); + return last_tsc; + } + + do { + last_tsc = hv_clock.last_tsc; + rmb(); + now = hv_clock.now; + rmb(); + } while (hv_clock.last_tsc != last_tsc); + + delta = native_read_tsc() - last_tsc; + delta = (delta * hv_clock.tsc_mult) KVM_SCALE; + + return (cycle_t)now-tv_sec * NSEC_PER_SEC + now-tv_nsec + delta; +} + +static void kvm_timer_set_mode(enum clock_event_mode mode, +struct clock_event_device *evt) +{ + WARN_ON(!irqs_disabled()); + + switch (mode) { + case CLOCK_EVT_MODE_ONESHOT: + /* this is what we want */ + break; + case CLOCK_EVT_MODE_RESUME: + break; + case CLOCK_EVT_MODE_PERIODIC: + WARN_ON(1); + break; + case CLOCK_EVT_MODE_UNUSED: + case CLOCK_EVT_MODE_SHUTDOWN: + kvm_hypercall0(KVM_HCALL_STOP_ONESHOT); + break; + default: + break; + } +} + +/* + * Programming the next event is
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Glauber de Oliveira Costa wrote:
> My next TODOs with it are:
> * Get SMP working
> * Try something for stolen time, as jeremy's last suggestion for anthony's patch
> * Measure the time it takes for a hypercall, and subtract this time for calculating the expiry time for the timer event.

I don't think there's much point in trying to do stuff like this. The guest can be preempted at any time, so there's an arbitrary amount of time between deciding to set a timeout, and the time the timeout actually happens. In theory you can mitigate this by using an absolute rather than relative timeout value, but in practice I don't think it makes much difference.

> +
> +/*
> + * This is our read_clock function. The host puts an tsc timestamp each time
> + * it updates a new time, and then we can use it to derive a slightly more
> + * precise notion of elapsed time, converted to nanoseconds.
> + *
> + * If the platform provides a stable tsc, we just use it, and there is no need
> + * for the host to update anything.

How would you deal with suspend/resume/migrate? Also, do you assume that stable_tsc also means synchronized tsc on an SMP host?

> + */
> +static cycle_t kvm_clock_read(void) {
> +
> +	u64 delta, last_tsc;
> +	struct timespec *now;
> +
> +	if (hv_clock.stable_tsc) {
> +		rdtscll(last_tsc);
> +		return last_tsc;

So this returns a tsc here?

> +	}
> +
> +	do {
> +		last_tsc = hv_clock.last_tsc;
> +		rmb();
> +		now = &hv_clock.now;

Shouldn't this be taking a copy of now, rather than a pointer to it? Otherwise what's the point of this loop?

> +		rmb();
> +	} while (hv_clock.last_tsc != last_tsc);

This won't be an atomic compare on 32-bit; it could get confused by seeing a half-updated tsc value.

> +
> +	delta = native_read_tsc() - last_tsc;
> +	delta = (delta * hv_clock.tsc_mult) >> KVM_SCALE;
> +
> +	return (cycle_t)now->tv_sec * NSEC_PER_SEC + now->tv_nsec + delta;

<--- But returns ns here?

> +}
> +
> +static void kvm_timer_set_mode(enum clock_event_mode mode,
> +				struct clock_event_device *evt)
> +{
> +	WARN_ON(!irqs_disabled());
> +
> +	switch (mode) {
> +	case CLOCK_EVT_MODE_ONESHOT:
> +		/* this is what we want */
> +		break;
> +	case CLOCK_EVT_MODE_RESUME:
> +		break;
> +	case CLOCK_EVT_MODE_PERIODIC:
> +		WARN_ON(1);
> +		break;
> +	case CLOCK_EVT_MODE_UNUSED:
> +	case CLOCK_EVT_MODE_SHUTDOWN:
> +		kvm_hypercall0(KVM_HCALL_STOP_ONESHOT);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/*
> + * Programming the next event is just a matter of asking the host
> + * to generate us an interrupt when the time expires. We pass the
> + * delta on, and hypervisor will do all remaining tricks. For a more
> + * precise timing, we can just subtract the time spent by the hypercall

Not worthwhile. It would be better to make the hypercall take an absolute time, and pass it now+delta. At least then if you get preempted past the timeout period you can return -ETIME, and the clock subsystem will know what to do.

> + */
> +static int kvm_timer_next_event(unsigned long delta,
> +				struct clock_event_device *evt)
> +{
> +	WARN_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
> +	kvm_hypercall1(KVM_HCALL_SET_ALARM, delta);
> +	return 0;
> +}
> +
> diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
> index d474cd6..fd758f9 100644
> --- a/arch/i386/kernel/setup.c
> +++ b/arch/i386/kernel/setup.c
> @@ -46,6 +46,7 @@
>  #include <linux/crash_dump.h>
>  #include <linux/dmi.h>
>  #include <linux/pfn.h>
> +#include <linux/kvm_para.h>
>
>  #include <video/edid.h>
>
> @@ -579,6 +580,9 @@ void __init setup_arch(char **cmdline_p)
>  	vmi_init();
>  #endif
>
> +#ifdef CONFIG_KVM_CLOCK
> +	kvmclock_init();
> +#endif

Why is this necessary? Can't you hook one of the existing pvops?

J
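Two of the review points above (copy the time value instead of keeping a pointer to it, and don't rely on comparing a 64-bit tsc that can be seen half-updated on 32-bit) can be illustrated with a version counter. The following is only a minimal userspace sketch under stated assumptions: the version field, struct shared_clock, clock_publish() and clock_read() are hypothetical names and are not part of the posted patch, and C11 atomics stand in for the kernel's barriers to keep the sketch well-defined.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: the posted struct kvm_hv_clock has no version field. */
struct shared_clock {
	atomic_uint version;        /* odd while the writer is mid-update */
	_Atomic uint64_t last_tsc;  /* tsc value at the last update */
	_Atomic uint64_t now_ns;    /* time of the last update, in ns */
};

/* Private copy handed back to the reader. */
struct clock_snapshot {
	uint64_t last_tsc;
	uint64_t now_ns;
};

/* Writer side (the host, in this sketch): bump version around the update. */
static void clock_publish(struct shared_clock *c, uint64_t tsc, uint64_t ns)
{
	unsigned v = atomic_load_explicit(&c->version, memory_order_relaxed);

	atomic_store_explicit(&c->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->last_tsc, tsc, memory_order_relaxed);
	atomic_store_explicit(&c->now_ns, ns, memory_order_relaxed);
	atomic_store_explicit(&c->version, v + 2, memory_order_release);
}

/* Reader side: copy the fields, retry if the version moved or was odd. */
static struct clock_snapshot clock_read(struct shared_clock *c)
{
	struct clock_snapshot s;
	unsigned before, after;

	do {
		before = atomic_load_explicit(&c->version, memory_order_acquire);
		s.last_tsc = atomic_load_explicit(&c->last_tsc, memory_order_relaxed);
		s.now_ns = atomic_load_explicit(&c->now_ns, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&c->version, memory_order_relaxed);
	} while ((before & 1u) || before != after);

	return s;
}
```

The retry condition only compares the 32-bit version word, so it does not depend on the 64-bit fields being read in one piece, and the caller works on its own copy after the loop.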
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Glauber de Oliveira Costa wrote:
> Hi,
>
> Attached is a first draft to a paravirt implementation for a timer to KVM. It is inspired in anthony's last patch about it, but not that much based on it. I'm not using hypercalls to get the current time, but rather, registering an address that will get timer updates once in a while. Also, it includes a clockevent oneshot implementation (which is the very thing of this patch), that will allow us interest things like dynticks.
>
> It's still not yet working on SMP, and I'm currently not sure why (ok, ok, if you actually read the patch, it will become obvious the why: it only delivers interrupt for vector 0x20, but I'm further with it, this patch is just a snapshot)
>
> My next TODOs with it are:
> * Get SMP working
> * Try something for stolen time, as jeremy's last suggestion for anthony's patch
> * Measure the time it takes for a hypercall, and subtract this time for calculating the expiry time for the timer event.
> * Testing and fixing bugs: I'm sure they exist!
>
> Meanwhile, all your suggestions are welcome.

<snip>

> +
> +void __init kvmclock_init(void)
> +{
> +
> +	unsigned long shared_page = (unsigned long)&hv_clock;
> +	/*
> +	 * If we can't use the paravirt clock, just go with
> +	 * the usual timekeeping
> +	 */
> +	if (!kvm_para_available() || no_kvmclock)
> +		return;

You should also check kvm_para_has_feature() and define a feature flag for the clock.

> +	if (kvm_hypercall1(KVM_HCALL_SET_SHARED_PAGE, shared_page))
> +		return;
> +
> +	paravirt_ops.get_wallclock = kvm_get_wallclock;
> +	paravirt_ops.set_wallclock = kvm_set_wallclock;
> +	paravirt_ops.sched_clock = kvm_sched_clock;
> +	paravirt_ops.time_init = kvm_time_init;
> +	/*
> +	 * If we let the normal APIC initialization code run, they will
> +	 * override our event handler, relying that the APIC will deliver
> +	 * the interrupts in the LOCAL_TIMER_VECTOR. The easy solution is
> +	 * keep the PIT running until then
> +	 */
> +	paravirt_ops.setup_boot_clock = kvm_disable_pit;
> +}
> diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
> index 0f663fe..7baf798 100644
> --- a/drivers/kvm/irq.c
> +++ b/drivers/kvm/irq.c
> @@ -32,6 +32,8 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
>  {
>  	struct kvm_pic *s;
>
> +	if (v->timer_vector != -1)
> +		return 1;
>  	if (kvm_apic_has_interrupt(v) == -1) {	/* LAPIC */
>  		if (kvm_apic_accept_pic_intr(v)) {
>  			s = pic_irqchip(v->kvm);	/* PIC */
> @@ -43,6 +45,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
>  }
>  EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
>
> +static int kvm_get_pvclock_interrupt(struct kvm_vcpu *v)
> +{
> +	int ret = v->timer_vector;
> +	v->timer_vector = -1;
> +	return ret;
> +}
>  /*
>   * Read pending interrupt vector and intack.
>   */
> @@ -51,7 +59,9 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>  	struct kvm_pic *s;
>  	int vector;
>
> -	vector = kvm_get_apic_interrupt(v);	/* APIC */
> +	vector = kvm_get_pvclock_interrupt(v);
> +	if (vector == -1)
> +		vector = kvm_get_apic_interrupt(v);	/* APIC */

It might be better to just rely on the in-kernel APIC to inject an interrupt for the clock (via kvm_pic_set_irq()).

Regards,

Anthony Liguori

>  	if (vector == -1) {
>  		if (kvm_apic_accept_pic_intr(v)) {
>  			s = pic_irqchip(v->kvm)
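The suggestion above, gating the paravirt clock on an advertised feature bit in addition to the "is KVM there at all" check, could look roughly like the following. This is a standalone sketch, not kernel code: PV_FEATURE_CLOCK, pv_has_feature() and pv_clock_should_enable() are hypothetical names chosen for illustration, and the bit number is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit numbers advertised by the host. */
enum pv_feature {
	PV_FEATURE_CLOCK = 0,
	PV_FEATURE_OTHER = 1,
};

/* Test one bit in the feature word reported by the host. */
static bool pv_has_feature(uint32_t features, enum pv_feature bit)
{
	return (features >> bit) & 1u;
}

/*
 * Decide whether the guest should switch to the paravirt clock:
 * the hypervisor must be present, the user must not have disabled it
 * on the command line, and the host must advertise the clock feature.
 */
static bool pv_clock_should_enable(bool para_available,
				   bool disabled_on_cmdline,
				   uint32_t features)
{
	if (!para_available || disabled_on_cmdline)
		return false;
	return pv_has_feature(features, PV_FEATURE_CLOCK);
}
```

With a check of this shape, a guest booted on an older host that lacks the clock interface falls back to the usual timekeeping instead of issuing a hypercall the host does not understand.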
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Hollis Blanchard wrote:
> On Fri, 2007-10-12 at 13:08 -0300, Glauber de Oliveira Costa wrote:
>> +config KVM_CLOCK
>> +	bool "KVM paravirtualized clock"
>> +	depends on PARAVIRT && GENERIC_CLOCKEVENTS
>> +	help
>> +	  Turning on this option will allow you to run a paravirtualized clock
>> +	  when running over the KVM hypervisor. Instead of relying on a PIT
>> +	  (or probably other) emulation by the underlying device model, the host
>> +	  provides the guest with timing infrastructure, as time of day, and
>> +	  timer expiration.
>
> I must have missed earlier discussion on this topic, so I'm left wondering... what's the point? What's wrong with PIT (et al) emulation?

There are three separate reasons, that I know of, to have a PV timer.

1) the PIT is periodic. a PV timer can offer a one shot timer which enables dynticks.

2) the TSC would have to be used as a clocksource. You don't know the frequency which is the first problem with using the TSC but some systems have a TSC that changes frequencies. A PV time source gives you more stable clocksource (although as in glommer's patch, when the TSC can be used, it's better to use it).

3) a PV clock can support stolen time calculation which there really isn't a concept of with emulation.

Regards,

Anthony Liguori
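On point 2: once the host knows the TSC frequency, it can hand the guest a fixed-point multiplier so the guest never has to measure the frequency itself. The sketch below shows that conversion, assuming the operator lost from the patch's `(delta * hv_clock.tsc_mult) KVM_SCALE` was a right shift by 22. The helper names khz_to_mult() and cycles_to_ns() are made up for this example.

```c
#include <stdint.h>

#define SCALE_SHIFT 22	/* same value as KVM_SCALE in the posted patch */

/*
 * One cycle lasts 1e6 / tsc_khz nanoseconds. Store that ratio as a
 * fixed-point number with SCALE_SHIFT fractional bits, rounded to nearest.
 * The result fits in 32 bits for tsc_khz of roughly 1000 and above.
 */
static uint32_t khz_to_mult(uint32_t tsc_khz)
{
	uint64_t tmp = (uint64_t)1000000 << SCALE_SHIFT;

	tmp += tsc_khz / 2;
	return (uint32_t)(tmp / tsc_khz);
}

/*
 * Convert a cycle delta to nanoseconds. The 64-bit product overflows for
 * large deltas, so callers are assumed to pass short intervals only.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult)
{
	return (cycles * mult) >> SCALE_SHIFT;
}
```

For a 2 GHz TSC the multiplier comes out as 2^21, so 2000 cycles convert to 1000 ns. The overflow limit is why the host has to refresh the base timestamp periodically rather than once at boot.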
[kvm-devel] Windows XP PFN_LIST_CORRUPT error during install.
Not much detail here but I'll post all I can. KVM-46 (from tarball, using kvm-46 modules), Ubuntu Gutsy 2.6.22-14-generic amd64, Turion X2 with SVM, 1GB total memory on laptop.

I was in the middle of a Windows XP Pro SP2 install using this command line:

sudo ~/kvm46/bin/qemu-system-x86_64 -m 368 -boot c -cdrom winxp.iso -hda winxp-pro-work.qcow2 -vnc :1 -net user -net nic,mac=00:11:22:33:44:55:66,model=rtl8139 -localtime

While installing, I went to do other work, which involved starting another VM running a Linux boot CD with 256MB of RAM. I was running both side by side without problems, other than the fact that my host X session was sluggish due to the lack of total RAM available (1G-368M-256M). When finished with the other VM, I shut it down, and a few seconds later the Windows install, which was almost complete in the first VM, threw a bluescreen with a 'PFN_LIST_CORRUPT' error. Note that this was in the second stage of the install, where Windows had already rebooted into the install environment on the hard disk and was 'registering components' or something.

There is nothing in the kernel log on the host that I can see. Is there anywhere else I can look?

Unrelated, but earlier I tried a Windows install with '-clock dynticks' and the Windows installation would eventually fail with an IRQL__NOT_LESS_THAN_OR_EQUAL (or something like that.. I'm working from memory). Removing the 'clock' option made it work fine. Not sure if it's supposed to work yet or not, but it didn't for me.

john.c
Re: [kvm-devel] OpenBSD 4.1 failes with kvm-45
Hi, I've applied the patch. Now the kernel detects:

cpu0: Intel Pentium Pro, II or III...
cpu0: FPU,V86,...

But in the next line I get:

kernel: page fault trap, code=0
Stopped at trap+0x16f: testb $0x3,0x38(%ecx)

I don't know what that means. Hopefully you can help.

kind regards, Oliver

On Thursday, 11 October 2007 09:57:19 you wrote:
> On Wed, Oct 10, 2007 at 07:38:55PM +0200, Oliver Kowalke wrote: Will this patch be included into the new kvm version (46)? No. Hopefully it will be included in a near future version of kvm. hmm - I get an error: patch -p1 < ./qemu.patch patching file qemu/hw/pc.c patch: malformed patch at line 6: DisplayState *ds, const char **fd_filename, int snapshot, regards, Oliver
>
> I might be wrong, but this sounds like a patch mangled by the mail system, since here the patch applies cleanly and quietly against both kvm-45 and kvm-46. I'll try to resend it as an attachment. Regards, Dan.
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Hollis Blanchard wrote: On Fri, 2007-10-12 at 15:02 -0500, Anthony Liguori wrote: Hollis Blanchard wrote: On Fri, 2007-10-12 at 13:08 -0300, Glauber de Oliveira Costa wrote: +config KVM_CLOCK + bool KVM paravirtualized clock + depends on PARAVIRT GENERIC_CLOCKEVENTS + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. I must have missed earlier discussion on this topic, so I'm left wondering... what's the point? What's wrong with PIT (et al) emulation? There are three separate reasons, that I know of, to have a PV timer. 1) the PIT is periodic. a PV timer can offer a one shot timer which enables dynticks. Obviously people have figured out how to do dynticks on real x86 hardware, so I don't accept this reason. :) Using more advanced timers like the HPET. 2) the TSC would have to be used as a clocksource. You don't know the frequency which is the first problem with using the TSC but some systems have a TSC that changes frequencies. A PV time source gives you more stable clocksource (although as in glommer's patch, when the TSC can be used, it's better to use it). As I understand it, the TSC is based on CPU frequency, which changes with power management. Architectural bug. However, PV time still doesn't help here: * The TSC is _user_ accessible, so PV time support in the guest kernel doesn't solve the problem. * It looks like external agents can perform out-of-kernel frequency scaling on x86 (at least I see options for it on IBM blades). So there must already exist some mechanism for a kernel to be informed that the TSC frequency has been changed. I don't know if that is scaled transparently to the host OS or just at boot time. 
Keep in mind too, modern Intel processors have fixed frequency TSCs so it's possible that that's only an option for those processors. Regards, Anthony Liguori 3) a PV clock can support stolen time calculation which there really isn't a concept of with emulation. This is true, and I know other platforms support this functionality. I think it's mostly useful for process time accounting. Is that actually supported in this patch?
Re: [kvm-devel] [RFC] Paravirt timer for KVM
On Fri, 2007-10-12 at 15:02 -0500, Anthony Liguori wrote: Hollis Blanchard wrote: On Fri, 2007-10-12 at 13:08 -0300, Glauber de Oliveira Costa wrote: +config KVM_CLOCK + bool KVM paravirtualized clock + depends on PARAVIRT GENERIC_CLOCKEVENTS + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. I must have missed earlier discussion on this topic, so I'm left wondering... what's the point? What's wrong with PIT (et al) emulation? There are three separate reasons, that I know of, to have a PV timer. 1) the PIT is periodic. a PV timer can offer a one shot timer which enables dynticks. Obviously people have figured out how to do dynticks on real x86 hardware, so I don't accept this reason. :) 2) the TSC would have to be used as a clocksource. You don't know the frequency which is the first problem with using the TSC but some systems have a TSC that changes frequencies. A PV time source gives you more stable clocksource (although as in glommer's patch, when the TSC can be used, it's better to use it). As I understand it, the TSC is based on CPU frequency, which changes with power management. Architectural bug. However, PV time still doesn't help here: * The TSC is _user_ accessible, so PV time support in the guest kernel doesn't solve the problem. * It looks like external agents can perform out-of-kernel frequency scaling on x86 (at least I see options for it on IBM blades). So there must already exist some mechanism for a kernel to be informed that the TSC frequency has been changed. 3) a PV clock can support stolen time calculation which there really isn't a concept of with emulation. This is true, and I know other platforms support this functionality. I think it's mostly useful for process time accounting. 
Is that actually supported in this patch? -- Hollis Blanchard IBM Linux Technology Center
[kvm-devel] [PATCH 0/4] Swapping
These patches allow guest memory that is not shadowed to be swapped out. To make this most effective you should run with -kvm-shadow-memory 1 (which will make your machine slow). With -kvm-shadow-memory 1, a 3 GB guest can get down to just 32 MB on the physical host! When not using -kvm-shadow-memory, I saw a 4100 MB machine get as low as 168 MB on the physical host (not as bad as I thought it would be, and surely not as bad as it can be with 41 MB of shadow pages :)).

It seems to be very stable; it didn't crash on me once, and I was able to run: 2 Windows XP guests of 3 GB each + a 5 GB Linux guest, and 2 Windows XP guests of 4.1 GB each, and 2 Windows XP guests of 2 GB each.

A few things to note:

Ignore for now the ugly messages in dmesg; they are due to the fact that gfn_to_page tries to sleep while local interrupts are disabled (we have to split some emulator functions so it won't do that).

I also saw an issue with the new rmap on the Fedora 7 live CD: for some reason, in nonpaging mode, rmap_remove gets called about 50 times less than it needs to be. It doesn't happen with other Linux guests; I need to check this... (for now it means you might have about 200k of memory leak for each Fedora 7 live CD you are running).

Also note that kvm now loads much faster, because no memset over all the memory is needed (because gfn_to_page gets called at run time).

(Avi and Dor, note that this patch includes a small fix to a bug in the patch that I sent you.)
[kvm-devel] [PATCH 1/4] Swapping
this make the rmap keep reverse mapping on all the present shadow pages From 3b5821a55836f82f987b878982cbc6fc8336371f Mon Sep 17 00:00:00 2001 From: Izik Eidus [EMAIL PROTECTED](none) Date: Sat, 13 Oct 2007 01:47:44 +0200 Subject: [PATCH] modify the rmap so it will hold reverse mapping to all present shadow pages Signed-off-by: Izik Eidus [EMAIL PROTECTED] --- drivers/kvm/mmu.c | 52 1 files changed, 36 insertions(+), 16 deletions(-) diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c index f52604a..cfbeec8 100644 --- a/drivers/kvm/mmu.c +++ b/drivers/kvm/mmu.c @@ -211,8 +211,8 @@ static int is_io_pte(unsigned long pte) static int is_rmap_pte(u64 pte) { - return (pte (PT_WRITABLE_MASK | PT_PRESENT_MASK)) - == (PT_WRITABLE_MASK | PT_PRESENT_MASK); + return pte != shadow_trap_nonpresent_pte + pte != shadow_notrap_nonpresent_pte; } static void set_shadow_pte(u64 *sptep, u64 spte) @@ -456,29 +456,51 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) } } -static void rmap_write_protect(struct kvm *kvm, u64 gfn) +static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) { struct kvm_rmap_desc *desc; + struct kvm_rmap_desc *prev_desc; + u64 *prev_spte; + int i; + + if (!*rmapp) + return NULL; + else if (!(*rmapp 1)) { + if (!spte) + return (u64 *)*rmapp; + return NULL; + } + desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); + prev_desc = NULL; + prev_spte = NULL; + while (desc) { + for (i = 0; i RMAP_EXT desc-shadow_ptes[i]; ++i) { + if (prev_spte == spte) +return desc-shadow_ptes[i]; + prev_spte = desc-shadow_ptes[i]; + } + desc = desc-more; + } + return NULL; +} + +static void rmap_write_protect(struct kvm *kvm, u64 gfn) +{ unsigned long *rmapp; u64 *spte; gfn = unalias_gfn(kvm, gfn); rmapp = gfn_to_rmap(kvm, gfn); - while (*rmapp) { - if (!(*rmapp 1)) - spte = (u64 *)*rmapp; - else { - desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); - spte = desc-shadow_ptes[0]; - } + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { BUG_ON(!spte); BUG_ON(!(*spte 
PT_PRESENT_MASK)); - BUG_ON(!(*spte PT_WRITABLE_MASK)); rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte); - rmap_remove(kvm, spte); - set_shadow_pte(spte, *spte ~PT_WRITABLE_MASK); + if (is_writeble_pte(*spte)) + set_shadow_pte(spte, *spte ~PT_WRITABLE_MASK); kvm_flush_remote_tlbs(kvm); + spte = rmap_next(kvm, rmapp, spte); } } @@ -1399,10 +1421,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot) pt = page-spt; for (i = 0; i PT64_ENT_PER_PAGE; ++i) /* avoid RMW */ - if (pt[i] PT_WRITABLE_MASK) { -rmap_remove(kvm, pt[i]); + if (pt[i] PT_WRITABLE_MASK) pt[i] = ~PT_WRITABLE_MASK; - } } } -- 1.5.2.4
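The rmap_next() helper in the patch above walks a reverse-map head word that is either empty, a direct pointer to a single spte (low bit clear), or a pointer to a chain of descriptors (low bit set). A standalone sketch of that traversal follows. It mirrors the idea only: struct rmap_desc here is a simplified stand-in for struct kvm_rmap_desc, the RMAP_EXT value of 4 is an assumption, and an explicit "seen the previous entry" flag replaces the patch's prev_spte comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define RMAP_EXT 4	/* entries per descriptor; value assumed for the sketch */

/* Simplified stand-in for struct kvm_rmap_desc. */
struct rmap_desc {
	uint64_t *sptes[RMAP_EXT];	/* NULL-terminated when not full */
	struct rmap_desc *more;		/* next descriptor in the chain */
};

/*
 * Return the entry that follows prev in the reverse map rooted at head,
 * or the first entry when prev is NULL. Returns NULL at the end.
 */
static uint64_t *rmap_next(uintptr_t head, uint64_t *prev)
{
	struct rmap_desc *desc;
	int found_prev = (prev == NULL);

	if (!head)
		return NULL;			/* empty map */
	if (!(head & 1))			/* single entry stored inline */
		return prev ? NULL : (uint64_t *)head;

	desc = (struct rmap_desc *)(head & ~(uintptr_t)1);
	for (; desc; desc = desc->more) {
		for (int i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
			if (found_prev)
				return desc->sptes[i];
			if (desc->sptes[i] == prev)
				found_prev = 1;
		}
	}
	return NULL;
}
```

A caller iterates with `for (p = rmap_next(head, NULL); p; p = rmap_next(head, p))`, which is the shape rmap_write_protect() takes after the patch. Each step rescans from the head, so a full walk is quadratic in the chain length; that is tolerable only while chains stay short.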
[kvm-devel] [PATCH 2/4] Swapping
this patch make gfn_to_page always safe function (return bad_page in case there is no such page in the guest) From 51a8851a2805f5b61d3fbe506ab317ecb677c3da Mon Sep 17 00:00:00 2001 From: Izik Eidus [EMAIL PROTECTED](none) Date: Sat, 13 Oct 2007 02:01:54 +0200 Subject: [PATCH] change gfn_to_page to be safe always function. Signed-off-by: Izik Eidus [EMAIL PROTECTED] --- drivers/kvm/kvm.h |3 ++- drivers/kvm/kvm_main.c| 26 ++ drivers/kvm/mmu.c | 16 +--- drivers/kvm/paging_tmpl.h | 11 +++ 4 files changed, 24 insertions(+), 32 deletions(-) diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h index 4a52d6e..a155c2b 100644 --- a/drivers/kvm/kvm.h +++ b/drivers/kvm/kvm.h @@ -565,8 +565,9 @@ static inline int is_error_hpa(hpa_t hpa) { return hpa HPA_MSB; } hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva); struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva); -extern hpa_t bad_page_address; +extern struct page *bad_page; +int is_error_page(struct page *page); gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn); struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c index a0f8366..bfa201c 100644 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -1012,6 +1012,12 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) return r; } +int is_error_page(struct page *page) +{ + return page == bad_page; +} +EXPORT_SYMBOL_GPL(is_error_page); + gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) { int i; @@ -1053,7 +1059,7 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) gfn = unalias_gfn(kvm, gfn); slot = __gfn_to_memslot(kvm, gfn); if (!slot) - return NULL; + return bad_page; return slot-phys_mem[gfn - slot-base_gfn]; } EXPORT_SYMBOL_GPL(gfn_to_page); @@ -1073,7 +1079,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, struct page *page; page = gfn_to_page(kvm, gfn); - if (!page) + if 
(is_error_page(page)) return -EFAULT; page_virt = kmap_atomic(page, KM_USER0); @@ -,7 +1117,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, struct page *page; page = gfn_to_page(kvm, gfn); - if (!page) + if (is_error_page(page)) return -EFAULT; page_virt = kmap_atomic(page, KM_USER0); @@ -1149,7 +1155,7 @@ int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len) struct page *page; page = gfn_to_page(kvm, gfn); - if (!page) + if (is_error_page(page)) return -EFAULT; page_virt = kmap_atomic(page, KM_USER0); @@ -3075,7 +3081,7 @@ static struct page *kvm_vm_nopage(struct vm_area_struct *vma, pgoff = ((address - vma-vm_start) PAGE_SHIFT) + vma-vm_pgoff; page = gfn_to_page(kvm, pgoff); - if (!page) + if (is_error_page(page)) return NOPAGE_SIGBUS; get_page(page); if (type != NULL) @@ -3390,7 +3396,7 @@ static struct sys_device kvm_sysdev = { .cls = kvm_sysdev_class, }; -hpa_t bad_page_address; +struct page *bad_page; static inline struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn) @@ -3519,7 +3525,6 @@ EXPORT_SYMBOL_GPL(kvm_exit_x86); static __init int kvm_init(void) { - static struct page *bad_page; int r; r = kvm_mmu_module_init(); @@ -3530,16 +3535,13 @@ static __init int kvm_init(void) kvm_arch_init(); - bad_page = alloc_page(GFP_KERNEL); + bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO); if (bad_page == NULL) { r = -ENOMEM; goto out; } - bad_page_address = page_to_pfn(bad_page) PAGE_SHIFT; - memset(__va(bad_page_address), 0, PAGE_SIZE); - return 0; out: @@ -3552,7 +3554,7 @@ out4: static __exit void kvm_exit(void) { kvm_exit_debug(); - __free_page(pfn_to_page(bad_page_address PAGE_SHIFT)); + __free_page(bad_page); kvm_mmu_module_exit(); } diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c index cfbeec8..e6a9b4a 100644 --- a/drivers/kvm/mmu.c +++ b/drivers/kvm/mmu.c @@ -850,23 +850,17 @@ static void page_header_update_slot(struct kvm *kvm, void *pte, gpa_t gpa) __set_bit(slot, page_head-slot_bitmap); } 
-hpa_t safe_gpa_to_hpa(struct kvm *kvm, gpa_t gpa) -{ - hpa_t hpa = gpa_to_hpa(kvm, gpa); - - return is_error_hpa(hpa) ? bad_page_address | (gpa ~PAGE_MASK): hpa; -} - hpa_t gpa_to_hpa(struct kvm *kvm, gpa_t gpa) { struct page *page; + hpa_t hpa; ASSERT((gpa HPA_ERR_MASK) == 0); page = gfn_to_page(kvm, gpa PAGE_SHIFT); - if (!page) - return gpa | HPA_ERR_MASK; - return ((hpa_t)page_to_pfn(page) PAGE_SHIFT) - | (gpa (PAGE_SIZE-1)); + hpa = ((hpa_t)page_to_pfn(page) PAGE_SHIFT) | (gpa (PAGE_SIZE-1)); + if (is_error_page(page)) + return hpa | HPA_ERR_MASK; + return hpa; } hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva) diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h index a9e687b..58fd35a 100644 ---
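The patch above replaces "return NULL on a missing page" with "return a dedicated bad_page and let callers test it with is_error_page()". A minimal userspace sketch of that sentinel pattern is given below. The struct page and struct memslot types are toy stand-ins, and the single-slot lookup is an assumption made to keep the example short; the real gfn_to_page() resolves aliases and searches all memory slots.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct page. */
struct page {
	int id;
};

/* One statically allocated sentinel that marks "no such guest page". */
static struct page bad_page_storage;
static struct page *const bad_page = &bad_page_storage;

static bool is_error_page(const struct page *p)
{
	return p == bad_page;
}

/* Toy memory slot: a contiguous range of guest frame numbers. */
struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
	struct page **pages;
};

/* Never returns NULL: out-of-range lookups yield the sentinel instead. */
static struct page *gfn_to_page(const struct memslot *slot, unsigned long gfn)
{
	if (!slot || gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return bad_page;
	return slot->pages[gfn - slot->base_gfn];
}
```

Because the result is always a valid pointer, code that only maps or reference-counts the page needs no special case, while code that must fail (such as the guest read and write helpers) checks is_error_page() explicitly.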
[kvm-devel] [PATCH 3/4] Swapping
This patch makes the guest's non-shadowed pages swappable.

From 8e25e215b8ed95ca4ff51cbfcf5bdc438bb799f4 Mon Sep 17 00:00:00 2001
From: Izik Eidus [EMAIL PROTECTED](none)
Date: Sat, 13 Oct 2007 04:03:28 +0200
Subject: [PATCH] make the guest non shadowed memory swappable

Signed-off-by: Izik Eidus [EMAIL PROTECTED]
---
 drivers/kvm/kvm.h         |    1 +
 drivers/kvm/kvm_main.c    |   66 +---
 drivers/kvm/mmu.c         |   13 -
 drivers/kvm/paging_tmpl.h |   23 +--
 4 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index a155c2b..2e83fa7 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -409,6 +409,7 @@ struct kvm_memory_slot {
 	unsigned long *rmap;
 	unsigned long *dirty_bitmap;
 	int user_alloc; /* user allocated memory */
+	unsigned long userspace_addr;
 };
 
 struct kvm {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index bfa201c..0dce93c 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -321,15 +321,6 @@ static struct kvm *kvm_create_vm(void)
 
 static void kvm_free_userspace_physmem(struct kvm_memory_slot *free)
 {
-	int i;
-
-	for (i = 0; i < free->npages; ++i) {
-		if (free->phys_mem[i]) {
-			if (!PageReserved(free->phys_mem[i]))
-				SetPageDirty(free->phys_mem[i]);
-			page_cache_release(free->phys_mem[i]);
-		}
-	}
 }
 
 static void kvm_free_kernel_physmem(struct kvm_memory_slot *free)
@@ -771,19 +762,8 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 	memset(new.phys_mem, 0, npages * sizeof(struct page *));
 	memset(new.rmap, 0, npages * sizeof(*new.rmap));
 	if (user_alloc) {
-		unsigned long pages_num;
-
 		new.user_alloc = 1;
-		down_read(&current->mm->mmap_sem);
-
-		pages_num = get_user_pages(current, current->mm,
-					   mem->userspace_addr,
-					   npages, 1, 0, new.phys_mem,
-					   NULL);
-
-		up_read(&current->mm->mmap_sem);
-		if (pages_num != npages)
-			goto out_unlock;
+		new.userspace_addr = mem->userspace_addr;
 	} else {
 		for (i = 0; i < npages; ++i) {
 			new.phys_mem[i] = alloc_page(GFP_HIGHUSER
@@ -1058,8 +1038,27 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
 	gfn = unalias_gfn(kvm, gfn);
 	slot = __gfn_to_memslot(kvm, gfn);
-	if (!slot)
+	if (!slot) {
+		get_page(bad_page);
 		return bad_page;
+	}
+	if (slot->user_alloc) {
+		struct page *page[1];
+		int npages;
+
+		down_read(&current->mm->mmap_sem);
+		npages = get_user_pages(current, current->mm,
+					slot->userspace_addr
+					+ (gfn - slot->base_gfn) * PAGE_SIZE, 1,
+					1, 0, page, NULL);
+		up_read(&current->mm->mmap_sem);
+		if (npages != 1) {
+			get_page(bad_page);
+			return bad_page;
+		}
+		return page[0];
+	}
+	get_page(slot->phys_mem[gfn - slot->base_gfn]);
 	return slot->phys_mem[gfn - slot->base_gfn];
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
@@ -1079,13 +1078,16 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 	struct page *page;
 
 	page = gfn_to_page(kvm, gfn);
-	if (is_error_page(page))
+	if (is_error_page(page)) {
+		put_page(page);
 		return -EFAULT;
+	}
 	page_virt = kmap_atomic(page, KM_USER0);
 
 	memcpy(data, page_virt + offset, len);
 
 	kunmap_atomic(page_virt, KM_USER0);
+	put_page(page);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
@@ -1117,14 +1119,17 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 	struct page *page;
 
 	page = gfn_to_page(kvm, gfn);
-	if (is_error_page(page))
+	if (is_error_page(page)) {
+		put_page(page);
 		return -EFAULT;
+	}
 	page_virt = kmap_atomic(page, KM_USER0);
 
 	memcpy(page_virt + offset, data, len);
 
 	kunmap_atomic(page_virt, KM_USER0);
 	mark_page_dirty(kvm, gfn);
+	put_page(page);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
@@ -1155,13 +1160,16 @@ int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 	struct page *page;
 
 	page = gfn_to_page(kvm, gfn);
-	if (is_error_page(page))
+	if (is_error_page(page)) {
+		put_page(page);
 		return -EFAULT;
+	}
 	page_virt = kmap_atomic(page, KM_USER0);
 
 	memset(page_virt + offset, 0, len);
 
 	kunmap_atomic(page_virt, KM_USER0);
+	put_page(page);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest_page);
@@ -2090,13 +2098,12 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
 	for (i = 0; i < nr_pages; ++i) {
 		mutex_lock(&vcpu->kvm->lock);
 		page = gva_to_page(vcpu, address + i * PAGE_SIZE);
-		if (page)
-			get_page(page);
 		vcpu->pio.guest_pages[i] = page;
 		mutex_unlock(&vcpu->kvm->lock);
 		if (!page) {
 			inject_gp(vcpu);
 			free_pio_guest_pages(vcpu);
 			return 1;
 		}
 	}
@@ -3081,9 +3088,10 @@ static struct page *kvm_vm_nopage(struct vm_area_struct *vma,
 	pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 	page = gfn_to_page(kvm, pgoff);
-	if (is_error_page(page))
+	if (is_error_page(page)) {
+		put_page(page);
 		return NOPAGE_SIGBUS;
-	get_page(page);
+	}
 	if (type != NULL)
 		*type =
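The address arithmetic that the new gfn_to_page() path hands to get_user_pages() can be looked at in isolation. What follows is only a user-space sketch under stated assumptions: the 4 KiB page size, `struct memslot_sketch` and `gfn_to_hva_sketch()` are hypothetical names made up for illustration and are not part of the patch, and the bounds check is an addition that the patch leaves to __gfn_to_memslot().

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct memslot_sketch {
	uint64_t base_gfn;		/* first guest frame number in the slot */
	unsigned long npages;		/* number of pages in the slot */
	unsigned long userspace_addr;	/* host virtual address backing base_gfn */
};

/*
 * Return the host virtual address backing gfn, or 0 if gfn lies
 * outside the slot. The returned value mirrors the expression
 * userspace_addr + (gfn - base_gfn) * PAGE_SIZE used in the patch.
 */
static unsigned long gfn_to_hva_sketch(const struct memslot_sketch *slot,
				       uint64_t gfn)
{
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return 0;
	return slot->userspace_addr +
	       (unsigned long)(gfn - slot->base_gfn) * SKETCH_PAGE_SIZE;
}
```

Because the page is now looked up (and pinned) on each call instead of once at slot creation, every caller of gfn_to_page() has to pair it with put_page(), which is what the remaining hunks of the patch add.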
[kvm-devel] [PATCH 4/4] Swapping
This patch just removes the memset from kvmctl, so the VM now loads much faster.

From dc0164113041c2f2bf22fc066ca99b9b8531d627 Mon Sep 17 00:00:00 2001
From: Izik Eidus [EMAIL PROTECTED](none)
Date: Sat, 13 Oct 2007 02:56:25 +0200
Subject: [PATCH] now that gfn_to_page gets called at run time, we don't have to do memset on the memory. (it is now much faster to load a VM with a lot of memory)

Signed-off-by: Izik Eidus [EMAIL PROTECTED]
---
 user/kvmctl.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/user/kvmctl.c b/user/kvmctl.c
index 0604f2f..ff2014e 100644
--- a/user/kvmctl.c
+++ b/user/kvmctl.c
@@ -391,7 +391,6 @@ int kvm_alloc_userspace_memory(kvm_context_t kvm, unsigned long memory,
 
 	low_memory.userspace_addr = (unsigned long)*vm_mem;
-	memset((unsigned long *)low_memory.userspace_addr, 0, low_memory.memory_size);
 	/* 640K should be enough. */
 	r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &low_memory);
 	if (r == -1) {
@@ -406,7 +405,6 @@ int kvm_alloc_userspace_memory(kvm_context_t kvm, unsigned long memory,
 		return -1;
 	}
 	extended_memory.userspace_addr = (unsigned long)(*vm_mem + exmem);
-	memset((unsigned long *)extended_memory.userspace_addr, 0, extended_memory.memory_size);
 	r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &extended_memory);
 	if (r == -1) {
 		fprintf(stderr, "kvm_create_memory_region: %m\n");
@@ -422,7 +420,6 @@ int kvm_alloc_userspace_memory(kvm_context_t kvm, unsigned long memory,
 		return -1;
 	}
 	above_4g_memory.userspace_addr = (unsigned long)(*vm_mem + 0x1);
-	memset((unsigned long *)above_4g_memory.userspace_addr, 0, above_4g_memory.memory_size);
 	r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &above_4g_memory);
 	if (r == -1) {
 		fprintf(stderr, "kvm_create_memory_region: %m\n");
-- 
1.5.2.4
Re: [kvm-devel] Test for KVM, kernel 33aaf..., userspace, 803145...
Zhao, Yunfeng wrote:
> Three Linux guest issues:
> 6. segfault while booting 64bit linux with 4GB mem
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812050&group_id=180599

Did it happen to you before kvm-46?
Re: [kvm-devel] Test for KVM, kernel 33aaf..., userspace, 803145...
Yes, it also happened before kvm-46. A guest with 1.5GB of memory doesn't have the problem.

-----Original Message-----
From: Izik Eidus [mailto:[EMAIL PROTECTED]
Sent: October 13, 2007 10:24
To: Zhao, Yunfeng
Cc: kvm-devel
Subject: Re: [kvm-devel] Test for KVM, kernel 33aaf..., userspace, 803145...

Zhao, Yunfeng wrote:
> Three Linux guest issues:
> 6. segfault while booting 64bit linux with 4GB mem
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1812050&group_id=180599

Did it happen to you before kvm-46?
Re: [kvm-devel] RFC/patch portability: split kvm_vm_ioctl
Carsten Otte wrote:
> This patch splits kvm_vm_ioctl into architecture independent parts, and
> x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
>
> Common ioctls for all architectures are:
> KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG
>
> I'd really like to see more commonalities, but all others did not fit
> our needs. I would love to keep KVM_GET_DIRTY_LOG common, so that the
> ingenious migration code does not need to care too much about different
> architectures.
>
> x86 specific ioctls are:
> KVM_SET_MEMORY_REGION, KVM_SET_USER_MEMORY_REGION,
> KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP,
> KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP

I don't see why we don't make KVM_SET_MEMORY_REGION and KVM_SET_USER_MEMORY_REGION common, although I have read the reasons you listed. I think they should work for most architectures, even though they are not very friendly to s390. If we make them arch-specific, we will have to duplicate them many times in the KVM code. One suggestion: maybe we can comment out the current memory allocation logic in userspace for s390, and let s390 use your approach to get its memory.

> While the pic/apic related functions are obviously x86 specific, some
> other ioctls seem to be common at a first glance.
> KVM_SET_(USER)_MEMORY_REGION for example. We've got a totally different
> address layout on s390: we cannot support multiple slots, and a user
> memory range always equals the guest physical memory [guest_phys + vm
> specific offset = host user address]. We neither have nor need
> dedicated vmas for the guest memory, we just use what the memory
> management has in stock. This is true, because we reuse the page table
> for user and guest mode.
> Looks to me like the s390 might have a lot in common with a future AMD
> nested page table implementation. If AMD chooses to reuse the page
> table too, we might share the same ioctl to set up guest addressing
> with them.
Re: [kvm-devel] [PATCH][Resend] Split kvm_vcpu to support new archs.
Carsten Otte wrote:
> Zhang, Xiantao wrote:
>> diff --git a/drivers/kvm/ioapic.c b/drivers/kvm/ioapic.c
>> index 3b69541..df67292 100644
>> --- a/drivers/kvm/ioapic.c
>> +++ b/drivers/kvm/ioapic.c
>> @@ -156,7 +156,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
>>  	if (dest_mode == 0) {	/* Physical mode. */
>>  		if (dest == 0xFF) {	/* Broadcast. */
>>  			for (i = 0; i < KVM_MAX_VCPUS; ++i)
>> -				if (kvm->vcpus[i] && kvm->vcpus[i]->apic)
>> +				if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic)
>>  					mask |= 1 << i;
>>  			return mask;
>>  		}
> Your mail client still wraps here, the patch is not applicable.

Maybe something is wrong with my mail client; I will check it next time. Thanks.

>>  struct kvm_vcpu {
>>  	struct kvm *kvm;
>>  	struct preempt_notifier preempt_notifier;
>>  	int vcpu_id;
>>  	struct mutex mutex;
>>  	int cpu;
>> -	u64 host_tsc;
>>  	struct kvm_run *run;
>>  	int interrupt_window_open;
> This one should go to arch.
>>  	int guest_mode;
>>  	unsigned long requests;
>>  	unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
>>  	DECLARE_BITMAP(irq_pending, KVM_NR_INTERRUPTS);
> Both irq related ones too please.

I don't understand this. Doesn't s390 need userspace to transfer interrupts into the kvm module, or does it use some other approach? If it does, we had better follow the existing KVM infrastructure; otherwise it may introduce something unnecessary for most archs. Please don't forget that we are in the KVM world :)

>>  	int mmio_needed;
>>  	int mmio_read_completed;
> Not all architectures have mmio, please put this into arch specific part.

OK.

> Other than that, the patch looks fine to me.
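To make the shape of the split under discussion concrete, here is a small user-space sketch, not the patch itself. The struct layouts are heavily trimmed, `KVM_MAX_VCPUS` is set to an arbitrary illustrative value, and `broadcast_mask_sketch()` is a hypothetical name; only the `arch.apic` access path and the `mask |= 1 << i` loop are taken from the quoted hunk.

```c
#include <stddef.h>
#include <stdint.h>

#define KVM_MAX_VCPUS 4	/* illustrative value only */

struct kvm_lapic { int unused; };

/* Architecture-specific part: on x86 this would hold the local APIC. */
struct kvm_vcpu_arch {
	struct kvm_lapic *apic;
};

/* Common part: fields every architecture needs, plus the arch member. */
struct kvm_vcpu {
	int vcpu_id;
	struct kvm_vcpu_arch arch;
};

struct kvm {
	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
};

/* One bit per existing vcpu that has a local APIC (physical broadcast). */
static uint32_t broadcast_mask_sketch(const struct kvm *kvm)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < KVM_MAX_VCPUS; ++i)
		if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic)
			mask |= 1u << i;
	return mask;
}
```

Common code then only touches the outer struct, while x86 files such as ioapic.c reach through `->arch`, which is why fields without meaning on other architectures are being asked to move there.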
[kvm-devel] APIC_TMCCT register read bug
Hi,

While booting a non-Linux OS under kvm-46, I noticed that reading APIC_TMCCT before initializing APIC_TDCR to something other than its boot time value would lead to a host kernel divide by zero exception. It's due to apic->timer.divide_count being set to 0 at boot... it should be set to 2 since APIC_TDCR=0 means 'divide count by 2'.

The last hunk of the attached patch results in apic->timer.divide_count being set to 2 and eliminates the oops. The other changes to apic_get_tmcct() are intended to clean it up a bit, although completely untested other than to verify 0 is returned for a read of APIC_TMCCT at boot. 'apic' should not be used before the ASSERT() and using u32 for counter_passed makes it fairly easy to overflow.

Kevin

--- kvm-46.orig/kernel/lapic.c	2007-10-10 02:06:36.0 -0600
+++ kvm-46.fix/kernel/lapic.c	2007-10-12 22:50:01.0 -0600
@@ -487,12 +487,19 @@
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
-	u32 counter_passed;
-	ktime_t passed, now = apic->timer.dev.base->get_time();
-	u32 tmcct = apic_get_reg(apic, APIC_TMICT);
+	u64 counter_passed;
+	ktime_t passed, now;
+	u32 tmcct;
 
 	ASSERT(apic != NULL);
 
+	now = apic->timer.dev.base->get_time();
+	tmcct = apic_get_reg(apic, APIC_TMICT);
+
+	/* if initial count is 0, current count should also be 0 */
+	if (tmcct == 0)
+		return 0;
+
 	if (unlikely(ktime_to_ns(now) <=
 		ktime_to_ns(apic->timer.last_update))) {
 		/* Wrap around */
@@ -507,15 +514,24 @@
 
 	counter_passed = div64_64(ktime_to_ns(passed),
 				  (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
-	tmcct -= counter_passed;
-	if (tmcct <= 0) {
-		if (unlikely(!apic_lvtt_period(apic)))
+	if (counter_passed > tmcct) {
+		if (unlikely(!apic_lvtt_period(apic))) {
+			/* one-shot timers stick at 0 until reset */
 			tmcct = 0;
-		else
-			do {
-				tmcct += apic_get_reg(apic, APIC_TMICT);
-			} while (tmcct <= 0);
+		} else {
+			/*
+			 * periodic timers reset to APIC_TMICT when they
+			 * hit 0. The while loop simulates this happening N
+			 * times. (counter_passed %= tmcct) would also work,
+			 * but might be slower or not work on 32-bit??
+			 */
+			while (counter_passed > tmcct)
+				counter_passed -= tmcct;
+			tmcct -= counter_passed;
+		}
+	} else {
+		tmcct -= counter_passed;
 	}
 
 	return tmcct;
@@ -844,7 +860,7 @@
 		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
-	apic->timer.divide_count = 0;
+	update_divide_count(apic);
 	atomic_set(&apic->timer.pending, 0);
 	if (vcpu->vcpu_id == 0)
 		vcpu->apic_base |= MSR_IA32_APICBASE_BSP;
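The two ideas in this report, deriving the divisor from APIC_TDCR instead of leaving it at 0, and computing the current count without unsigned underflow, can be tried outside the kernel. The sketch below is a user-space approximation under stated assumptions: all function names are hypothetical, the TDCR decode follows the architectural encoding (bits 0, 1 and 3, with 0 meaning divide by 2), and the bus cycle length is passed in as a parameter because the value of APIC_BUS_CYCLE_NS is not given in this thread.

```c
#include <stdint.h>

/* Decode the divide configuration register; a value of 0 selects divide-by-2. */
static uint32_t divide_count_from_tdcr(uint32_t tdcr)
{
	uint32_t tmp1 = tdcr & 0xf;
	uint32_t tmp2 = ((tmp1 & 0x3) | ((tmp1 & 0x8) >> 1)) + 1;

	return 1u << (tmp2 & 0x7);
}

/*
 * Timer ticks elapsed. divide_count must be nonzero, which is why it
 * has to be derived from TDCR at reset rather than initialised to 0.
 */
static uint64_t ticks_passed(uint64_t elapsed_ns, uint32_t bus_cycle_ns,
			     uint32_t divide_count)
{
	return elapsed_ns / ((uint64_t)bus_cycle_ns * divide_count);
}

/* Current count given the initial count and the ticks elapsed. */
static uint32_t current_count_sketch(uint32_t initial, uint64_t ticks,
				     int periodic)
{
	if (initial == 0)
		return 0;	/* timer not armed */
	if (ticks > initial) {
		if (!periodic)
			return 0;	/* one-shot sticks at 0 */
		/* periodic: fold whole periods away, as the patch's loop does */
		while (ticks > initial)
			ticks -= initial;
	}
	return initial - (uint32_t)ticks;
}
```

Note that the loop leaves `ticks` in the range 1..initial, so an exact multiple of the period yields 0; a plain `ticks %= initial` would yield `initial` in that case, so the two are not quite interchangeable.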