Re: [kvm-devel] [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
Christian Borntraeger wrote: On kvm I have seen some rare hangs in stop_machine when I used more guest cpus than hosts cpus. e.g. 32 guest cpus on 1 host cpu triggered the hang quite often. I could also reproduce the problem on a 4 way z/VM host with a 64 way guest. I think that's one of those don't do that then cases ;) It turned out that the guest was consuming all available cpus mostly for spinning on scheduler locks like rq-lock. This is expected as the threads are calling yield all the time. The problem is now, that the host scheduling decisings together with the guest scheduling decisions and spinlocks not being fair managed to create an interesting scenario similar to a live lock. (Sometimes the hang resolved itself after some minutes) I think x86 (at least) is now using ticket locks, which is fair. Which kernel are you seeing this problem on? Changing stop_machine to yield the cpu to the hypervisor when yielding inside the guest fixed the problem for me. While I am not completely happy with this patch, I think it causes no harm and it really improves the situation for me. I used cpu_relax for yielding to the hypervisor, does that work on all architectures? On x86, cpu_relax is just a pause instruction (rep;nop). We don't hook it in paravirt_ops, and while VT/SVM can be used to fault into the hypervisor on this instruction, I don't know if kvm actually does so. Either way, it wouldn't work for VMI, Xen or lguest. J p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use stop_machine_run and both triggered the problem after some retries. Signed-off-by: Christian Borntraeger [EMAIL PROTECTED] CC: Ingo Molnar [EMAIL PROTECTED] CC: Rusty Russell [EMAIL PROTECTED] --- kernel/stop_machine.c |7 --- 1 file changed, 4 insertions(+), 3 deletions(-) Index: kvm/kernel/stop_machine.c === --- kvm.orig/kernel/stop_machine.c +++ kvm/kernel/stop_machine.c @@ -62,8 +62,7 @@ static int stopmachine(void *cpu) * help our sisters onto their CPUs. */ if (!prepared !irqs_disabled) yield(); - else - cpu_relax(); + cpu_relax(); } /* Ack: we are exiting. */ @@ -106,8 +105,10 @@ static int stop_machine(void) } /* Wait for them all to come to life. */ - while (atomic_read(stopmachine_thread_ack) != stopmachine_num_threads) + while (atomic_read(stopmachine_thread_ack) != stopmachine_num_threads) { yield(); + cpu_relax(); + } /* If some failed, kill them all. */ if (ret 0) { ___ Virtualization mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/virtualization - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
Christian Borntraeger wrote: I really like 64 guest cpus as a good testcase for all kind of things. Sure, I do the same kind of thing. I think x86 (at least) is now using ticket locks, which is fair. Which kernel are you seeing this problem on? Sorry, forgot to mention. Its kvm.git from 2 days ago on s390. And on s390 cpu_relax yields the vcpu? That's not common behaviour across architectures. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [ RfC / patch ] kvmclock fixes
Gerd Hoffmann wrote: * Host: make kvm pv clock really compatible with xen pv clock. * Guest/xen: factor out some xen clock code into a separate source file (pvclock.[ch]), so kvm can reuse it. * Guest/kvm: make kvm clock compatible with xen clock by using the common code bits. I guess saving on code duplication is good... +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) +{ + struct pvclock_shadow_time *shadow = get_cpu_var(shadow_time); + cycle_t ret; + + pvclock_get_time_values(shadow, src); + ret = shadow-system_timestamp + pvclock_get_nsec_offset(shadow); You need to put this in a loop in case the system clock parameters change between the pvclock_get_time_values() and pvclock_get_nsec_offset(). How does kvm deal with suspend/resume with respect to time? Is the system timestamp guaranteed to remain monotonic? For Xen, I think we'll need to maintain an offset between the initial system timestamp and whatever it is after resuming. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Gerd Hoffmann wrote: Hmm, I somehow fail to see a case where it could be non-atomic ... get_time_values() copies a consistent snapshot, thus xen_clocksource_read() doesn't race against xen updating the fields. The snapshot is in a per-cpu variable, thus it doesn't race against other guest vcpus running get_time_values() at the same time. Xen could change the parameters in the instant after get_time_values(). That change could be as a result of suspend-resume, so the parameters and the tsc could be wildly different. It's definitely an edge-case, but it's easy enough to deal with. There could be a loopless __get_time_values() for use in this case, but given that it almost never loops, I don't think its worthwhile. in this case ??? I'm confused. There is only a single user of get_nsec_offset(), which is xen_clocksource_read() ... Sure, but get_time_values() has several other callers. If xen_clocksource_read() had its own loop to make sure the read_tsc is atomic with respect to get_time_values, then get_time_values itself needn't loop. But, given that it only loops in the very rare case that it races with Xen updating those parameters, it doesn't seem to make much difference either way. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Gerd Hoffmann wrote: Jeremy Fitzhardinge wrote: Xen could change the parameters in the instant after get_time_values(). That change could be as a result of suspend-resume, so the parameters and the tsc could be wildly different. Ah, ok, forgot the rdtsc in the picture. With that in mind I fully agree that the loop is needed. I think kvm guests can even hit that one with the vcpu migrating to a different physical cpu, so we better handle it correctly ;) Yes, same with Xen. Sure, but get_time_values() has several other callers. Not really. There are only two calls, one in clocksource_read() and one in the init path. The later is superfluous I think because clocksource_read() is the only user of the shadowed time info. Hm. It doesn't look like shadow_time needs to be a static percpu at all. It could just be a local to clocksource_read, I think. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [ RfC / patch ] kvmclock fixes
Gerd Hoffmann wrote: +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) +{ + struct pvclock_shadow_time *shadow; + cycle_t ret; + unsigned version; + + shadow = get_cpu_var(shadow_time); + do { + version = pvclock_get_time_values(shadow, src); + barrier(); + ret = shadow-system_timestamp + pvclock_get_nsec_offset(shadow); + barrier(); Is barrier() strong enough? Does kvm guarantee that the per-cpu time parameters are only ever updated by that cpu? I'm pretty sure Xen does, so that's OK. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Gerd Hoffmann wrote: I'm looking at the guest side of the issue right now, trying to identify common code, and while doing so noticed that xen does the version-check-loop in both get_time_values_from_xen(void) and xen_clocksource_read(void), and I can't see any obvious reason for that. The loop in xen_clocksource_read(void) is not needed IMHO. Can I drop it? No. The get_nsec_offset() needs to be atomic with respect to the get_time_values() parameters. There could be a loopless __get_time_values() for use in this case, but given that it almost never loops, I don't think its worthwhile. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Gerd Hoffmann wrote: Wall clock is off a few hours though. Oops. I think the way wall clock and system clock work together in xen (Jeremy correct me if I'm wrong) is that the wall clock specifies the point in time where the system clock started going. As kvm fills in host system time into the guest system time fields the guest wall clock fields should be filled with the host boot time timestamp I'd say. Yes. The wallclock field in the shared info structure is the wallclock time at boot; you compute the current time by adding the system timestamp to it. System time changes are effected by retroactively changing the boot time of the machine, though that can also change because of suspend/resume/migrate. In general the kernel only reads the wallclock time at boot, and then maintains it for itself from then on. I think. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 01/10] emm: mm_lock: Lock a process against reclaim
Andrea Arcangeli wrote: On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote: I think you can break this if() down a bit: if (!(vma-vm_file vma-vm_file-f_mapping)) continue; It makes no difference at runtime, coding style preferences are quite subjective. Well, overall the formatting of that if statement is very hard to read. Separating out the logically distinct pieces in to different ifs at least shows the reader that they are distinct. Aside from that, doing some manual CSE to remove all the casts and expose the actual thing you're testing for would help a lot (are the casts even necessary?). So this is an O(n^2) algorithm to take the i_mmap_locks from low to high order? A comment would be nice. And O(n^2)? Ouch. How often is it called? It's called a single time when the mmu notifier is registered. It's a very slow path of course. Any other approach to reduce the complexity would require memory allocations and it would require mmu_notifier_register to return -ENOMEM failure. It didn't seem worth it. It's per-mm though. How many processes would need to have notifiers? And is it necessary to mush lock and unlock together? Unlock ordering doesn't matter, so you should just be able to have a much simpler loop, no? That avoids duplicating .text. Originally they were separated. unlock can't be a simpler loop because I didn't reserve vm_flags bitflags to do a single O(N) loop for unlock. If you do malloc+fork+munmap two vmas will point to the same anon-vma lock, that's why the unlock isn't simpler unless I mark what I locked with a vm_flags bitflag. Well, its definitely going to need more comments then. I assumed it would end up locking everything, so unlocking everything would be sufficient. J - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer ends at 11:59 p.m., Monday, April 7! Use priority code J8TLD2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 01/10] emm: mm_lock: Lock a process against reclaim
Christoph Lameter wrote: Provide a way to lock an mm_struct against reclaim (try_to_unmap etc). This is necessary for the invalidate notifier approaches so that they can reliably add and remove a notifier. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/mm.h | 10 mm/mmap.c | 66 + 2 files changed, 76 insertions(+) Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h 2008-04-02 11:41:47.741678873 -0700 +++ linux-2.6/include/linux/mm.h 2008-04-04 15:02:17.660504756 -0700 @@ -1050,6 +1050,16 @@ extern int install_special_mapping(struc unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * Locking and unlocking am mm against reclaim. + * + * mm_lock will take mmap_sem writably (to prevent additional vmas from being + * added) and then take all mapping locks of the existing vmas. With that + * reclaim is effectively stopped. + */ +extern void mm_lock(struct mm_struct *mm); +extern void mm_unlock(struct mm_struct *mm); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, Index: linux-2.6/mm/mmap.c === --- linux-2.6.orig/mm/mmap.c 2008-04-04 14:55:03.477593980 -0700 +++ linux-2.6/mm/mmap.c 2008-04-04 14:59:05.505395402 -0700 @@ -2242,3 +2242,69 @@ int install_special_mapping(struct mm_st return 0; } + +static void mm_lock_unlock(struct mm_struct *mm, int lock) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + + i_mmap_lock_last = NULL; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm-mmap; vma; vma = vma-vm_next) + if (vma-vm_file vma-vm_file-f_mapping I think you can break this if() down a bit: if (!(vma-vm_file vma-vm_file-f_mapping)) continue; + (unsigned long) i_mmap_lock + (unsigned long) + vma-vm_file-f_mapping-i_mmap_lock + (unsigned long) + vma-vm_file-f_mapping-i_mmap_lock + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + vma-vm_file-f_mapping-i_mmap_lock; So this is an O(n^2) algorithm to take the i_mmap_locks from low to high order? A comment would be nice. And O(n^2)? Ouch. How often is it called? And is it necessary to mush lock and unlock together? Unlock ordering doesn't matter, so you should just be able to have a much simpler loop, no? + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + if (lock) + spin_lock(i_mmap_lock); + else + spin_unlock(i_mmap_lock); + } + + anon_vma_lock_last = NULL; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm-mmap; vma; vma = vma-vm_next) + if (vma-anon_vma + (unsigned long) anon_vma_lock + (unsigned long) vma-anon_vma-lock + (unsigned long) vma-anon_vma-lock + (unsigned long) anon_vma_lock_last) + anon_vma_lock = vma-anon_vma-lock; + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + if (lock) + spin_lock(anon_vma_lock); + else + spin_unlock(anon_vma_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +void mm_lock(struct mm_struct * mm) +{ + down_write(mm-mmap_sem); + mm_lock_unlock(mm, 1); +} + +void mm_unlock(struct mm_struct *mm) +{ + mm_lock_unlock(mm, 0); + up_write(mm-mmap_sem); +} - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer
Re: [kvm-devel] [02/17][PATCH] Implement smp_call_function_mask for ia64 - V8
Jes Sorensen wrote: Jeremy Fitzhardinge wrote: Jes Sorensen wrote: This change has been on the x86 side for ages, and not even Ingo made a peep about it ;) Mmmm, last time I looked, x86 didn't scale to any interesting number of CPUs :-) Well, I guess you need all those CPUs if scanning a 64-word bitvector takes anything like the time it takes to do an IPI... I wasn't suggesting we shouldn't have both interfaces, merely questioning why adding what to me seems like an unnecessary performance hit for the classic case of the call. I don't mind how many interfaces there are, so long as there only needs to be one place to hook to plug in the Xen version of smp_call_function_whatever. Perhaps the answer is to just hook the IPI mechanism itself rather than the whole of smp_call_function_mask... Have you looked at Jens Axboe's patches to make all this stuff a lot more arch-common? J - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [02/17][PATCH] Implement smp_call_function_mask for ia64 - V8
Jes Sorensen wrote: I'm a little wary of the performance impact of this change. Doing a cpumask compare on all smp_call_function calls seems a little expensive. Maybe it's just noise in the big picture compared to the actual cost of the IPIs, but I thought I'd bring it up. Keep in mind that a cpumask can be fairly big these days, max NR_CPUS is currently 4096. For those booting a kernel with NR_CPUS at 4096 on a dual CPU machine, it would be a bit expensive. Unless your hardware has remarkably fast IPIs, I think really the cost of scanning 512 bytes is going to be in the noise... This change has been on the x86 side for ages, and not even Ingo made a peep about it ;) Why not keep smp_call_function() the way it was before, rather than implementing it via the call to smp_call_function_mask()? Because Xen needs a different core implementation (because of its different IPI implementation), and it would be better to just have to do one of them rather than N. J - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
Carsten Otte wrote: +struct mm_struct *dup_mm(struct task_struct *tsk); No prototypes in .c files. Put this in an appropriate header. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [EMAIL PROTECTED]: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
Marcelo Tosatti wrote: Forgot to copy you... Ideally all pte updates should be done via the paravirt interface. Hm, are you sure? +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep) +{ + pte_t pte = *ptep; + clear_bit(bit, (unsigned long *)pte.pte); + set_pte(ptep, pte); +} This is not a safe transformation. This will lose hardware A/D bit changes if the pte is part of an active pagetable. Aside from that, clear_bit is atomic and so is the wrong thing to use on a local variable; a plain bitmask would do the job. clear_bit on a PTE is fine, since everyone has to support some level of trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to implement otherwise). Shadow pagetable implementations will do this naturally, and Xen must do it to allow atomic updates on ptes (also because it makes late-pinning/early-unpinning a performance win). + static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h === --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t * #define pte_update(mm, addr, ptep) do { } while (0) #define pte_update_defer(mm, addr, ptep)do { } while (0) +#define pte_clear_bit(bit, ptep) clear_bit(bit, (unsigned long *)ptep-pte) + static inline void paravirt_pagetable_setup_start(pgd_t *base) { native_pagetable_setup_start(base); @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str ({ \ int __changed = !pte_same(*(ptep), entry); \ if (__changed dirty) { \ - *ptep = entry; \ + set_pte(ptep, entry); \ This change is OK, but doesn't really do anything. All hypervisors allow direct access to ptes. pte_update_defer((vma)-vm_mm, (address), (ptep)); \ flush_tlb_page(vma, address); \ } \ @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - clear_bit(_PAGE_BIT_RW, (unsigned long *)ptep-pte); + pte_clear_bit(_PAGE_BIT_RW, ptep); This is only valid if ptep_set_wrprotect is never used on active ptes, which I don't think is the case. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [EMAIL PROTECTED]: [patch 3/4] paravirt: set_access_flags/set_wrprotect should use paravirt interface]
Marcelo Tosatti wrote: On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote: Marcelo Tosatti wrote: Forgot to copy you... Ideally all pte updates should be done via the paravirt interface. Hm, are you sure? +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep) +{ + pte_t pte = *ptep; + clear_bit(bit, (unsigned long *)pte.pte); + set_pte(ptep, pte); +} This is not a safe transformation. This will lose hardware A/D bit changes if the pte is part of an active pagetable. Aside from that, clear_bit is atomic and so is the wrong thing to use on a local variable; a plain bitmask would do the job. True this is racy on baremetal. And under Xen (since the pagetable is the real hardware pagetable). clear_bit on a PTE is fine, since everyone has to support some level of trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to implement otherwise). Shadow pagetable implementations will do this naturally, and Xen must do it to allow atomic updates on ptes (also because it makes late-pinning/early-unpinning a performance win). Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching pte updates via hypercall. Yes, mprotect needs batching, but that probably needs special handling. + static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h === --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t * #define pte_update(mm, addr, ptep) do { } while (0) #define pte_update_defer(mm, addr, ptep)do { } while (0) +#define pte_clear_bit(bit, ptep) clear_bit(bit, (unsigned long *)ptep-pte) + static inline void paravirt_pagetable_setup_start(pgd_t *base) { native_pagetable_setup_start(base); @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str ({ \ int __changed = !pte_same(*(ptep), entry); \ if (__changed dirty) { \ - *ptep = entry; \ + set_pte(ptep, entry); \ This change is OK, but doesn't really do anything. All hypervisors allow direct access to ptes. We want the pte update to go through a hypercall instead of trap'n'emulate. ptep_set_access_flags is only used in fault handling, so there's no scope for batching. pte_update_defer((vma)-vm_mm, (address), (ptep)); \ flush_tlb_page(vma, address); \ } \ @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - clear_bit(_PAGE_BIT_RW, (unsigned long *)ptep-pte); + pte_clear_bit(_PAGE_BIT_RW, ptep); This is only valid if ptep_set_wrprotect is never used on active ptes, which I don't think is the case. Ok, pte_clear_bit() like above is broken. But the point is that we want to batch the write protection of the source pte along with the creation of the write protected destination pte on COW using the paravirt_enter_lazy/paravirt_leave_lazy. Which we are we talking about here? In Xen, the destination pte is unpinned during a fork(), so a plain memory write will work without any hypervisor interaction. That's what I mean about the late pin/early unpin optimisation. Almost all fork/exit pagetable manipulation happens on unpinned pagetables which are plain RW pages. It's only updates on pinned pagetables which need hypervisor interaction. The only significant batchable update is mprotect. I don't know what the issues for KVM/VMI are. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH][RFC] SVM: Add Support for Nested Paging in AMD Fam16 CPUs
Avi Kivity wrote: I find it non-descriptive, and it reminds me of another hypervisor. I suggest 'tlp' for two-level paging. That has its own ambiguity; without other context it reads like two-level pagetable. Anyway, using the same term for the same thing is not a bad idea. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] export notifier #1
Gerd Hoffmann wrote: Another maybe workable approach for Xen is to go through pv_ops (although pte_clear doesn't go through pv_ops right now, so this would be an additional hook too ...). It does for 32-bit PAE. Making pte_clear uniform across all pagetable modes would be a nice cleanup. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 24/24] make vsmp a paravirt client
Glauber de Oliveira Costa wrote: the ifdef only exists because, as I said, the code itself will be always compiled in, to avoid an ifdef in setup_64.c. So it's just a taking it from here, putting it there issue. Kiran seem to prefer this way, but I don't really have a preference. It would be better to have the ifdef in setup_64.c and just make the compilation of vsmp_64.c depend on CONFIG_X86_VSMP. If the ifdef really rankles you could use a weak stub function somewhere, or define an inline stub in vsmp.h. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/24] paravirt_ops for unified x86 - that's me again!
Amit Shah wrote: Glauber, are you planning on consolidating the dma_ops structure for 32- and 64-bit? 32-bit doesn't currently have a dma_mapping_ops structure, which makes paravirtualizing DMA access difficult on 32-bit. I think it's a good idea. While I haven't worked out the details yet, it seems like something I'll need for Xen dom0 support. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 1/24] mm/sparse-vmemmap.c: make sure init_mm is included
Glauber de Oliveira Costa wrote: mm/sparse-vmemmap.c uses init_mm in some places. However, it is not present in any of the headers currently included in the file. init_mm is defined as extern in sched.h, so we add it to the headers list Up to now, this problem was masked by the fact that functions like set_pte_at() and pmd_populate_kernel() are usually macros that expand to simpler variants that does not use the first parameter at all. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] Signed-off-by: Linus Torvalds [EMAIL PROTECTED] --- mm/sparse-vmemmap.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index d3b718b..22620f6 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -24,6 +24,7 @@ #include linux/module.h #include linux/spinlock.h #include linux/vmalloc.h +#include linux/sched.h This is already in git. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 5/24] smp x86 consolidation
Glauber de Oliveira Costa wrote: This patch consolidates part of the pieces of smp for both architectures. (i386 and x86_64). It makes part the calls go through smp_ops, and shares code for those functions in smpcommon.c There's more room for code sharing here, but it is left as an exercise to the reader ;-) I'm getting link errors in 32-bit: arch/x86/kernel/built-in.o: In function `native_smp_send_reschedule': /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/smpcommon.c:262: undefined reference to `genapic' arch/x86/kernel/built-in.o: In function `native_smp_call_function_mask': /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/smpcommon.c:113: undefined reference to `genapic' J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 5/24] smp x86 consolidation
Glauber de Oliveira Costa wrote: arch/x86/kernel/built-in.o: In function `native_smp_send_reschedule': /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/smpcommon.c:262: undefined reference to `genapic' arch/x86/kernel/built-in.o: In function `native_smp_call_function_mask': /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/smpcommon.c:113: undefined reference to `genapic' Ok, it compiled just fine here. I bet it's due to one of that i386 lots of variants. Which subarchitecture are you compiling for, jeremy ? Default (CONFIG_X86_PC). Config attached. J # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc2 # Fri Nov 9 15:19:56 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y # CONFIG_TASK_XACCT is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 # CONFIG_CGROUPS is not set CONFIG_FAIR_GROUP_SCHED=y CONFIG_FAIR_USER_SCHED=y # CONFIG_FAIR_CGROUP_SCHED is not set CONFIG_SYSFS_DEPRECATED=y CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq CONFIG_PREEMPT_NOTIFIERS=y # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_PARAVIRT=y CONFIG_PARAVIRT_GUEST=y CONFIG_XEN=y CONFIG_VMI=y # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MCORE2 is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_XADD=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_HPET_TIMER=y CONFIG_NR_CPUS=4 # CONFIG_SCHED_SMT is not set CONFIG_SCHED_MC=y # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_MCE=y CONFIG_X86_MCE_NONFATAL=y CONFIG_X86_MCE_P4THERMAL=y
Re: [kvm-devel] include files for kvmclock
Glauber de Oliveira Costa wrote: This patch introduces the include files for kvm clock. They'll be needed for both guest and host part. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] --- include/asm-x86/kvm_para.h | 23 +++ include/linux/kvm.h|1 + include/linux/kvm_para.h | 20 3 files changed, 44 insertions(+), 0 deletions(-) diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h index c6f3fd8..af9fb75 100644 --- a/include/asm-x86/kvm_para.h +++ b/include/asm-x86/kvm_para.h @@ -10,15 +10,38 @@ * paravirtualization, the appropriate feature bit should be checked. */ #define KVM_CPUID_FEATURES 0x4001 +#define KVM_FEATURE_CLOCKEVENTS 0 +#define KVM_FEATURE_CLOCKSOURCE 1 + #ifdef __KERNEL__ #include asm/processor.h +extern void kvmclock_init(void); + +union kvm_hv_clock { Why two copies of this structure? + struct { + u64 tsc_mult; + u64 now_ns; + /* That's the wall clock, not the water closet */ + u64 wc_sec; + u64 wc_nsec; + u64 last_tsc; + /* At first, we could use the tsc value as a marker, but Jeremy + * well noted that it will cause us locking problems in 32-bit + * sys, so we have a special version field */ + u32 version; + }; + char page_align[PAGE_SIZE]; +}; + [...] + +union kvm_hv_clock { + struct { + u64 tsc_mult; + u64 now_ns; + /* That's the wall clock, not the water closet */ + u64 wc_sec; + u64 wc_nsec; + u64 last_tsc; + /* At first, we could use the tsc value as a marker, but Jeremy + * well noted that it will cause us locking problems in 32-bit + * sys, so we have a special version field */ + u32 version; + }; + char page_align[PAGE_SIZE]; +}; + /* * hypercalls use architecture specific */ - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] include files for kvmclock
Avi Kivity wrote: Glauber de Oliveira Costa wrote: +union kvm_hv_clock { + struct { + u64 tsc_mult; + u64 now_ns; + /* That's the wall clock, not the water closet */ + u64 wc_sec; + u64 wc_nsec; Do we really need 128-bit time? you must be planning to live forever. Well, he's planning on having lots of very small nanoseconds. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 3/16] read/write_crX, clts and wbinvd for 64-bit paravirt
Glauber de Oliveira Costa wrote: I in fact have seen bugs with mixed reads and writes to the same cr, (cr4), but adding the volatile flag to the read function seemed to fix it. Well, volatile will make a read be repeated rather than caching the previous value, but it has no effect on ordering. Yet, I agree with you that the theorectical problem exists for the reorder, and your proposed fix seems fine (although if we're really desperate about memory usage, we can use a char instead a int and save 3 bytes!) Sure. Ideally the compiler would never even generate a reference to it, and it could just be extern, but in practice the compiler will generate references sometimes. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 3/16] read/write_crX, clts and wbinvd for 64-bit paravirt
Keir Fraser wrote: volatile prevents the asm from being 'moved significantly', according to the gcc manual. I take that to mean that reordering is not allowed. That phrase doesn't appear in the gcc manual; in fact, it specifically says that reordering can happen: The `volatile' keyword indicates that the instruction has important side-effects. GCC will not delete a volatile `asm' if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow will never reach the location of the instruction.) Note that even a volatile `asm' instruction can be moved relative to other code, including across jump instructions. For example, on many targets there is a system register which can be set to control the rounding mode of floating point operations. You might try setting it with a volatile `asm', like this PowerPC example: asm volatile(mtfsf 255,%0 : : f (fpenv)); sum = x + y; This will not work reliably, as the compiler may move the addition back before the volatile `asm'. To make it work you need to add an artificial dependency to the `asm' referencing a variable in the code you don't want moved, for example: asm volatile (mtfsf 255,%1 : =X(sum): f(fpenv)); sum = x + y; I take from this that it is not a good idea to assume asm volatile has any ordering effects at all. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 3/16] read/write_crX, clts and wbinvd for 64-bit paravirt
Glauber de Oliveira Costa wrote: This patch introduces, and patch callers when needed, native versions for read/write_crX functions, clts and wbinvd. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Acked-by: Jeremy Fitzhardinge [EMAIL PROTECTED] --- arch/x86/mm/pageattr_64.c |3 +- include/asm-x86/system_64.h | 60 ++ 2 files changed, 45 insertions(+), 18 deletions(-) diff --git a/arch/x86/mm/pageattr_64.c b/arch/x86/mm/pageattr_64.c index c40afba..59a52b0 100644 --- a/arch/x86/mm/pageattr_64.c +++ b/arch/x86/mm/pageattr_64.c @@ -12,6 +12,7 @@ #include asm/processor.h #include asm/tlbflush.h #include asm/io.h +#include asm/paravirt.h pte_t *lookup_address(unsigned long address) { @@ -77,7 +78,7 @@ static void flush_kernel_map(void *arg) much cheaper than WBINVD. */ /* clflush is still broken. Disable for now. */ if (1 || !cpu_has_clflush) - asm volatile(wbinvd ::: memory); + wbinvd(); else list_for_each_entry(pg, l, lru) { void *adr = page_address(pg); clflush_cache_range(adr, PAGE_SIZE); diff --git a/include/asm-x86/system_64.h b/include/asm-x86/system_64.h index 4cb2384..b558cb2 100644 --- a/include/asm-x86/system_64.h +++ b/include/asm-x86/system_64.h @@ -65,53 +65,62 @@ extern void load_gs_index(unsigned); /* * Clear and set 'TS' bit respectively */ -#define clts() __asm__ __volatile__ (clts) +static inline void native_clts(void) +{ + asm volatile (clts); +} -static inline unsigned long read_cr0(void) -{ +static inline unsigned long native_read_cr0(void) +{ unsigned long cr0; asm volatile(movq %%cr0,%0 : =r (cr0)); return cr0; } This is a pre-existing bug, but it seems to me that these read/write crX asms should have a constraint to stop the compiler from reordering them with respect to each other. The brute-force approach would be to add memory clobbers, but the subtle fix would be to add a variable which is only used to sequence: static int __cr_seq; static inline unsigned long native_read_cr0(void) { unsigned long cr0; asm volatile(mov %%cr0, %0 : =r (cr0), =m (__cr_seq)); } static inline void native_write_cr0(unsigned long val) { asm volatile(mov %1, %%cr0 : +m (__cr_seq) : r (val)); } J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 11/16] turn priviled operation into a macro in head_64.S
Glauber de Oliveira Costa wrote: under paravirt, read cr2 cannot be issued directly anymore. So wrap it in a macro, defined to the operation itself in case paravirt is off, but to something else if we have paravirt in the game Is this actually needed? It's only used in the early fault handler in head_64.S. Will we be taking that path in the paravirt case? If so, should we disable the fault handler altogether, since the hypervisor can probably provide better diagnositcs. J Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Acked-by: Jeremy Fitzhardinge [EMAIL PROTECTED] --- arch/x86/kernel/head_64.S |9 - 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index b6167fe..c31b1c9 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -19,6 +19,13 @@ #include asm/msr.h #include asm/cache.h +#ifdef CONFIG_PARAVIRT +#include asm/asm-offsets.h +#include asm/paravirt.h +#else +#define GET_CR2_INTO_RCX movq %cr2, %rcx +#endif + /* we are not able to switch in one step to the final KERNEL ADRESS SPACE * because we need identity-mapped pages. * @@ -267,7 +274,7 @@ ENTRY(early_idt_handler) xorl %eax,%eax movq 8(%rsp),%rsi # get rip movq (%rsp),%rdx - movq %cr2,%rcx + GET_CR2_INTO_RCX leaq early_idt_msg(%rip),%rdi call early_printk cmpl $2,early_recursion_flag(%rip) - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] raise tsc clocksource rating
Ingo Molnar wrote: * Zachary Amsden [EMAIL PROTECTED] wrote: On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: From: Glauber de Oliveira Costa [EMAIL PROTECTED] tsc is very good time source (when it does not have drifts, does not change it's frequency, i.e. when it works), so it should have its rating raised to a value greater than, or equal 400. Since it's being a tendency among paravirt clocksources to use values around 400, we should declare tsc as even better: So we use 500. Why is the TSC better than a paravirt clocksource? In our case this is definitely inaccurate. Paravirt clocksources should be preferred to TSC, and both must be made available in hardware for platforms which do not support paravirt. if it's inaccurate why are you exposing it to the guest then? Native only uses the TSC if it's safe and accurate to do so. It is used as part of the Xen clocksource as a short term extrapolator, with correction parameters supplied by the hypervisor. It should never be used directly. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] raise tsc clocksource rating
Zachary Amsden wrote: On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: From: Glauber de Oliveira Costa [EMAIL PROTECTED] tsc is very good time source (when it does not have drifts, does not change it's frequency, i.e. when it works), so it should have its rating raised to a value greater than, or equal 400. Since it's being a tendency among paravirt clocksources to use values around 400, we should declare tsc as even better: So we use 500. Why is the TSC better than a paravirt clocksource? In our case this is definitely inaccurate. Paravirt clocksources should be preferred to TSC, and both must be made available in hardware for platforms which do not support paravirt. Also, please cc all the paravirt developers on things related to paravirt, especially things with such broad effect. I think 400 is a good value for a perfect native clocksource. 400 should be reserved for super-real (i.e. paravirt) sources that should always be chosen over a hardware realistic implementation in a virtual environment. Yes, agreed. The tsc is never the right thing to use if there's a paravirt clocksource available. What's wrong with rating it 300? What inferior clocksource does it lose out to? Shouldn't that clocksource be lowered? (Why don't we just use 1 to 10?) J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] raise tsc clocksource rating
Ingo Molnar wrote: that's totally broken then. You cannot create an SMP-safe monotonic clocksource via interpolation - native does not do it either. Good thing this problem got exposed, it needs to be fixed. Sigh, I don't really want to have this fight again. I don't really see what point there is in raising the tsc's rating (why is 300 insufficient again?), but regardless of the value, the Xen clocksource rating needs to be higher. J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] raise tsc clocksource rating
Ingo Molnar wrote: i dont remember us having discussed this before, ever. If there's any fight about monotonicity and SMP then it would be a pretty onesided affair, with you being beaten up seriously ;-) This is part of Xen's ABI, so it isn't easily changed. You're right that getting monotonicity is a pretty ugly affair of doing something like if (now previous_return) return ++previous_return;, but its better than using the tsc. (Last time around, it was Xen can't use the tsc because it can't ever be used as a stable timebase - though I don't think that was you specifically.) J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Carsten Otte wrote: On s390, we've had an instruction... Yes, quite ;) J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [RFC] Paravirt timer for KVM
Glauber de Oliveira Costa wrote: My next TODOs with it are: * Get SMP working * Try something for stolen time, as jeremy's last suggestion for anthony's patch * Measure the time it takes for a hypercall, and subtract this time for calculating the expiry time for the timer event. I don't think there's much point in trying to do stuff like this. The guest can be preempted at any time, so there's an arbitrary amount of time between deciding to set a timeout, and the time the timeout actually happens. In theory you can mitigate this by using an absolute rather than relative timeout value, but in practice I don't think it makes much difference. + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time, and then we can use it to derive a slightly more + * precise notion of elapsed time, converted to nanoseconds. + * + * If the platform provides a stable tsc, we just use it, and there is no need + * for the host to update anything. How would you deal with suspend/resume/migrate? Also, do you assume that stable_tsc also means synchronized tsc on an SMP host? + */ +static cycle_t kvm_clock_read(void) { + + u64 delta, last_tsc; + struct timespec *now; + + if (hv_clock.stable_tsc) { + rdtscll(last_tsc); + return last_tsc; So this returns a tsc here? + } + + do { + last_tsc = hv_clock.last_tsc; + rmb(); + now = hv_clock.now; Shouldn't this be taking a copy of now, rather than a pointer to it? Otherwise what's the point of this loop? + rmb(); + } while (hv_clock.last_tsc != last_tsc); This won't be an atomic compare on 32-bit; it could get confused by seeing a half-updated tsc value. + + delta = native_read_tsc() - last_tsc; + delta = (delta * hv_clock.tsc_mult) KVM_SCALE; + + return (cycle_t)now-tv_sec * NSEC_PER_SEC + now-tv_nsec + delta; --- But returns ns here? +} + +static void kvm_timer_set_mode(enum clock_event_mode mode, + struct clock_event_device *evt) +{ + WARN_ON(!irqs_disabled()); + + switch (mode) { + case CLOCK_EVT_MODE_ONESHOT: + /* this is what we want */ + break; + case CLOCK_EVT_MODE_RESUME: + break; + case CLOCK_EVT_MODE_PERIODIC: + WARN_ON(1); + break; + case CLOCK_EVT_MODE_UNUSED: + case CLOCK_EVT_MODE_SHUTDOWN: + kvm_hypercall0(KVM_HCALL_STOP_ONESHOT); + break; + default: + break; + } +} + +/* + * Programming the next event is just a matter of asking the host + * to generate us an interrupt when the time expires. We pass the + * delta on, and hypervisor will do all remaining tricks. For a more + * precise timing, we can just subtract the time spent by the hypercall Not worthwhile. It would be better to make the hypercall take an absolute time, and pass it now+delta. At least then if you get preempted past the timeout period you can return -ETIME, and the clock subsystem will know what to do. + */ +static int kvm_timer_next_event(unsigned long delta, + struct clock_event_device *evt) +{ + WARN_ON(evt-mode != CLOCK_EVT_MODE_ONESHOT); + kvm_hypercall1(KVM_HCALL_SET_ALARM, delta); + return 0; +} + diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c index d474cd6..fd758f9 100644 --- a/arch/i386/kernel/setup.c +++ b/arch/i386/kernel/setup.c @@ -46,6 +46,7 @@ #include linux/crash_dump.h #include linux/dmi.h #include linux/pfn.h +#include linux/kvm_para.h #include video/edid.h @@ -579,6 +580,9 @@ void __init setup_arch(char **cmdline_p) vmi_init(); #endif +#ifdef CONFIG_KVM_CLOCK + kvmclock_init(); +#endif Why is this necessary? Can't you hook one of the existing pvops? J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Anthony Liguori wrote: Nakajima, Jun wrote: I don't understand the purpose of returning the max leaf. Who is that information useful for? Well, this is the key info to the user of CPUID. It tells which leaves are valid to use. Otherwise, the user cannot tell whether the results of CPUID.0x400N are valid or not (i.e. junk). BTW, this is what we are doing on the native (for the leaf 0, 0x8000, for example). The fact that Xen returns 0x4002 means it only uses 3 leaves today. Then it's just a version ID. You pretty much have to treat it as a version id because if it returns 0x4000 0003 and you only know what 0002 is, then you can't actually use it. Yeah. It's the way all the other cpuid leaf/level stuff works, so it's reasonable to do the same thing here. The question it helps answer is I understand leaf 33, does the [v]CPU?. I much prefer the current use of CPUID in KVM. If 1000 returns the KVM signature, then 1001 *must* be valid and contain a set of feature bits. If we wish to use additional CPUID leaves in the future, then we can just use a feature bit. The real benefit to us is that we can use a discontiguous set of leaves whereas the Xen approach is forced to use a linear set (at least for the result to be meaningful). Well, its also what the CPU itself does. The feature bits tend to relate to specific CPU features rather than CPUID instruction leaves. The features themselves may also have corresponding leaves, but that's secondary. IOW, if feature bit X is set, it may use leaf 0x4000101f, but that doesn't mean leaves 0x40001001-1f are necessarily defined. I'm starting to lean toward just using . If for no other reason than the hypercall space is unsharable. Well, it could be, but it would take affirmative action on the guest's part. If there's feature bits for each supported hypercall interface, then you could have a magic MSR to select which interface you want to use now. That would allow a generic-interface-using guest to probe for the generic interface at cpuid leaf 0x40001000, use 40001001 to determine whether the hypercall interface is available, 4000100x to find the base of the magic msrs, and write appropriate msr to set the desired hypercall style (and all this can be done without using vmcall, so it doesn't matter that hypercall interface is initially established). J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Anthony Liguori wrote: This patch refactors the current hypercall infrastructure to better support live migration and SMP. It eliminates the hypercall page by trapping the UD exception that would occur if you used the wrong hypercall instruction for the underlying architecture and replacing it with the right one lazily. I guess it would be pretty rude/unlikely for these opcodes to get reused in other implementations... But couldn't you make the page trap instead, rather than relying on an instruction fault? It also introduces the infrastructure to probe for hypercall available via CPUID leaves 0x4002. CPUID leaf 0x4003 should be filled out by userspace. Is this compatible with Xen's (and other's) use of cpuid? That is, 0x4000 returns a hypervisor-specific signature in e[bcd]x, and eax has the max hypervisor leaf. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Anthony Liguori wrote: The whole point of using the instruction is to allow hypercalls to be used in many locations. This has the nice side effect of not requiring a central hypercall initialization routine in the guest to fetch the hypercall page. A PV driver can be completely independent of any other code provided that it restricts itself to it's hypercall namespace. I see. So you take the fault, disassemble the instruction, see that its another CPU's vmcall instruction, and then replace it with the current CPU's vmcall? Xen is currently using 0/1/2. I had thought it was only using 0/1. The intention was not to squash Xen's current CPUID usage so that it would still be possible for Xen to make use of the guest code. Can we agree that Xen won't squash leaves 3/4 or is it not worth trying to be compatible at this point? No, the point is that you're supposed to work out which hypervisor it is from the signature in leaf 0, and then the hypervisor can put anything it wants in the other leaves. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Anthony Liguori wrote: Yeah, see, the initial goal was to make it possible to use the KVM paravirtualizations on other hypervisors. However, I don't think this is really going to be possible in general so maybe it's better to just use leaf 0. I'll let others chime in before sending a new patch. Hm. Obviously you can just define a signature for kvm-compatible hypercall interface and make it common that way, but it gets tricky if the hypervisor supports multiple hypercall interfaces, including the kvm one. Start the kvm leaves at 0x40001000 or something? J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Nakajima, Jun wrote: Today, 3 CPUID leaves starting from 0x4000_ are defined in a generic fashion (hypervisor detection, version, and hypercall page), and those are the ones used by Xen today. We should extend those leaves (e.g. starting from 0x4000_0003) for the vmm-independent features as well. If Xen needs additional Xen-specific features, we need to allocate some leaves for those (e.g. 0x4000_1000) But the signature is XenVMMXenVMM, which isn't very generic. If we're presenting a generic interface, it needs to have a generic signature, otherwise guests will need to have a list of all hypervisor signatures supporting their interface. Since 0x4000 has already been established as the base leaf of the hypervisor-specific interfaces, the generic interface will have to be elsewhere. J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure
Nakajima, Jun wrote: The hypervisor detection machanism is generic, and the signature returned is implentation specific. Having a list of all hypervisor signatures sounds fine to me as we are detecting vendor-specific processor(s) in the native. And I don't expect the list is large. I'm confused about what you're proposing. I was thinking that a kernel looking for the generic hypervisor interface would check for a specific signature at some cpuid leaf, and then go about using it from there. If not, how does is it supposed to detect the generic hypervisor interface? J - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH][RFC]: pte notifiers -- support for external page tables
Avi Kivity wrote: It is, but the hooks are in much the same places. It could be argued that you'd embed pte notifiers in paravirt_ops for a host kernel, but that's not doable because pte notifiers use higher-level data strutures (like vmas). Also, I wouldn't like to preclude the possibility of having a kernel that's both a guest and a host (ie, nested vmms). J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/4] Virtual Machine Time Accounting
Laurent Vivier wrote: functionnalities: - allow to measure time spent by a CPU in a virtual CPU. - allow to display in /proc/state this value by CPU - allow to display in /proc/pid/state this value by process - allow KVM to use these 3 previous functionnalities So, currently time spent in a kvm guest is accumulated as qemu-kvm usertime, right? Given that qemu knows when its running in qemu vs guest context, couldn't it provide the breakdown between user and guest time (ditto lguest)? J - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: I don't agree that having paravirt_ops within a normal module is all that useful. By the time modules can be loaded, the kernel has completely booted. There should only be a handful of paravirt_ops implementations and they aren't large so I don't think there's a big size savings either. It doesn't seem terribly valuable to me either. But Zach is talking about something very similar to the kvm case, where you have a fully virtualized environment (with hardware support), but then you load a module containing paravirtualized helpers at some late stage which makes things more efficient but isn't required for functional correctness. More importantly, now device drivers for virtual devices would have a way to inquire into which set of paravirt-ops was loaded by having an official registered interface rather than an ad-hoc (if xxx_running == 1) mess, and now the paravirt driver modules are nicely decoupled from the boot-strap code and can be loaded dynamically. I'm not familiar with the particular problem here, but I don't think that driver modules should be checking to see what paravirt_ops is active. Each VMM has it's own discovery mechanism (KVM and Xen are both based on CPUID) so that seems like a much better method to use. I think he's referring to the xen/kvm/vmi paravirt implementation as a driver here. I think. I don't know of any if (xxx_running) tests at present. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: For a VMM which supports both full emulation and para-virtualization, testing CPUID leaves is not sufficient to determine applicability of a paravirt device driver. This only indicates the presence of the functionality, not the fact that the functionality has been activated. For 32-bit Xen, this might be an already assumed fact - but for VMI, KVM, and HV assisted Xen, which do support guests running without paravirt, you need a way to test whether the particular family of paravirt has been activated - for device drivers which assume paravirt semantics might well require this activation to work, or need to behave differently in an unactivated environment (emulate hypercalls with port I/O, for example). paravirt_ops-style paravirtualization and paravirt device drivers are more or less completely orthogonal. An unmodified OS running under hvm Xen can still have paravirt drivers which can detect the presence of Xenbus and do all the appropriate things. It would presumably be a matter of loading xenbus.ko, which would probe for the presence of the Xen device infrastructure, and that in turn would start pulling in the appropriate paravirt drivers. The state (or existence) of struct paravirt_ops is immaterial. Thus, all the paravirt drivers as modules would need to test if (xen_running) or (vmi_enabled) or (kvm_active), and then all these symbols need to be exported, and now you have an ad-hoc activation detection system for each brand of paravirt. No, I think not. Normal bus/device probing should be able to deal with it. Better to have a standard interface, IMHO, where a paravirt-ops parent module gets registered and activated, and then well defined symbols to query that activation. You also have module dependencies between the parent and child which are nicely modeled with the module system (xenbus dependes on xen, vmitimer depends on vmi, etc..). To a large extent that already exists in the device model. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: Basically, it just makes it easier on distributors and allows any old kernel with paravirt-ops module support to run on any modern, new hypervisor - that might not have even existed at the time the distro was created. Hey, isn't that what VMI's for? ;) I'd been thinking about the possibility of allowing the domain builder to provide a new paravirt_ops implementation to the booting kernel. It would be akin to a kernel module, in that its built for a specific kernel, but obviously run a lot earlier. But at this point I think the idea is too crack-ridden to be taken seriously. In the case of KVM, the paravirt_ops implementation is orthogonal to paravirt device drivers. A PV device driver can happily exist even if the paravirt_ops backend isn't activated. This is assuming that hypercalls aren't used btw. If hypercalls are desirable to use, then the paravirt_ops backend would have to EXPORT_GPL the hypercall interface. I imagine returning a specific errno would suffice. I'm mostly in agreement on that - although making dual hypercall / I/O emulated drivers is a bit more work. Semi-paravirtualized real-hardware drivers seems like a difficult mishmash. I would hope we could deal with it with a virtio-like thing. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: I've been thinking about this wrt the hypercall page in KVM. The problem is that in a model like KVM (or presumably VMI), migration gets really difficult if you have anything but a trivial hypercall page since the hypercall page will change after migration. If you cannot guarantee the guest isn't executing code within the hypercall page (or in your case, doing something with paravirt_ops), then you cannot safely migrate. Hm, you need to quiesce the kernel in some way when you do a migrate, so making sure it isn't in a hypercall would be just part of that. In general you'd make sure all but one CPU is parked somewhere, and the remaining CPU is doing the suspend, right? The tricky part for Xen in all this is how to make sure all mfn references are visible to the hypervisor/toolstack so they can be remapped; hypercall page contents are not a concern by comparison. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: Unless you also migrate the hypercall page itself and impose migration restrictions on compatible hypercall pages. Seems unreasonable, especially if you support migration between VT and SVM machines. The whole point of a hypercall page is to give you a point of indirection in order to hide these kinds of hardware differences; migrating it would defeat the purpose. Although I favor the guarantee that execution within the hypercall page is finished - it is important for protecting against non-reentrancy as well. Think about interrupts during batching / queueing operations. Not quite sure that's specifically relevant to migration, but yes, its important to disable interrupts while doing the setup for a batch of stuff unless you want to see some surprises in your queue (and not Oh, yay, a puppy! kind of surprises). J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: You only need to quiesce if you have guest-visible data-structures that have details about the underlying hardware. So Xen needs to quiesce, but I don't know of any other VMM that would. VMI, KVM and lhype should be capable of transparent migration without guest involvement. Sure; Xen makes the explicit design decision that suspend/resume/migrate are things that the guest is likely to want to have some involvement in if its already doing all the paravirt games. A semi-kinda-paravirtualized hvm Xen guest doesn't have to worry too much about those kinds of things. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: Yes, but if we want to stay with that forward compatibility story, we need a way to allow paravirt device probing to be completely orthogonal to paravirt-ops probing. Either the VMware hypervisor needs to NOT implement a CPUID leaf, keeping the same ROM based detection, or other VMI client drivers (say, as a wild example, a KVM driver running on a VMI to KVM paravirt-ops backend) need not to check CPUID leaf as a condition of execution. Yes, this is something that keeps coming up. hpa originally floated the idea of reserving some PCI bus namespace as a gateway for probing for virtual/paravirtual devices, and Jun Nakajima proposed it again in the context of smart hardware which is virtualization friendly (ie, how to represent PCI-IOV to guests). I'm not wildly happy about the idea of using PCI for probing for otherwise completely non-PCI devices, but some kind of probing mechanism might be nice in the general case. Xen deals with it with Xenbus, but I figure I'm unlikely to convince everyone to adopt that. We at least would like to use a CPUID leaf for the core paravirt-ops on 64-bit and get rid of the need for ROM probing in that case, which would mean we either need a CPUID sub-leaf for the device model, a completely identical device model, or completely orthogonal device probing. Well, cpuid leaf 0x4000 seems to be gaining currency as a (semi-?)formal way for hypervisors to advertise themselves, so that seems completely doable. Since there hasn't been a formal specification for how the device probing should work, or, at least, I don't know all the details of how device probing works for all the various hypervisors, I worry that weird ad-hoc tests could trample the compatibility effort. Yes. That's the thinking behind using PCI as a somewhat common mechanism for device discovery. s390 folks hate it, of course. The completely identical device model is of course ideal, but the implementation and consolidation of that is a long term prospect to move towards, not something that will happen immediately. We at least emulate physical hardware devices already, and will continue to need drivers compatible with those models for some time. Well, physical devices and completely emulated physical devices are fairly straightforward - do it like real hardware. Its the semi-virtual devices which pose problems. Either device emulations with a bit of performance paravirtualization sprinkled over them, or virtualization friendly devices which allow safe direct guest access, but need some paravirtual management interfaces as well. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Zachary Amsden wrote: If I had a gentoo install, Yes, but then you'd be a gentoo user. ;) I would probably go so far as to want to recompile everything after migration across CPU vendors; things like NMIs, MSRs, thermal controls and sleep states are also vendor dependent and either need to be emulated both ways, re-invented in a new way entirely, or just dropped. Many of those things are meaningless for a guest to see or control. I don't think cross-CPU vendor hot migration is particularly compelling, although it certainly is possible, the payoff doesn't seem worth the implementation cost and you will find a maze of brambly thorns blocking your path. We see people trying to do it, and for good reasons. The selling point of VMs is that they can install their thingy once, and it becomes independent from its hardware environment, to the extent that they can update their hardware without having to worry about its effects on their guests. With a stable OS and live migration, its reasonable to consider a guest VM undergoing multiple hardware upgrade transitions with no downtime. I think the general approach is to have a compatible-performance slider which disables/enables non-portable features. Migrating between Intel/AMD is just a slightly more extreme point on the continuum of allowing people to migrate between different models within one manufacturer's line. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: The real trick is doing it without the guest being involved at all. Right now, it won't be a problem in KVM since the hypercall page only differs by a single instruction across platforms. In the future, we'll have to be smarter and wait for all VCPUs to leave the hypercall page. Well, you could just fake the whole acpi suspend/resume ;) The tricky part for Xen in all this is how to make sure all mfn references are visible to the hypervisor/toolstack so they can be remapped; hypercall page contents are not a concern by comparison. I don't know HVM save/resume all that well but I think it's a similar model where the guest isn't involved. Yes, the guest has little to nothing to do. They may have a similar issue when using PV drivers. Hm, not sure. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 5/5] KVM: paravirt time source
Anthony Liguori wrote: I've updated this patch and switched to using a scale/shift like Xen is doing, but I must admit, I don't understand how it helps adjtime. I poked around a bit and it wasn't obvious. Why is having {mult=122, shift=22} better for adjtime than {mult=1, shift=0}? I don't fully understand it myself, but I think its because adjtime plays with the mult factor to scale the rate at which time passes. If the scale is 1, then it can only scale time by integer amounts. By setting it to 2^22, then it can adjust time down to 1 part in 4 million. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 3/5] KVM: Add paravirt MMU write support
Anthony Liguori wrote: Perhaps my grep'ing skills are weak, but I don't seem to see any. Were you thinking of something in particular? __pte(), of course. Sheesh. ;) J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: Perhaps we can just print the banner before batching occurs? Then it's being printed at the last possible moment. s/batching/patching/? Yes, that would work. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 5/5] KVM: paravirt time source
Anthony Liguori wrote: +static cycle_t read_hyper(void) +{ + struct timespec now; + int ret; + + ret = kvm_hypercall(KVM_HYPERCALL_GET_KTIME, (u32)now, 0, 0, 0); + WARN_ON(ret); + + return now.tv_nsec + now.tv_sec * (cycles_t)1e9; Hm, use of FP looks pretty odd. I guess its OK to assume the compiler will completely remove all the FP stuff at compile time. Or you could use NSEC_PER_SEC. +} + +static struct clocksource clocksource_hyper = { + .name = hyper, + .rating = 200, We should probably standardize on this. I guess that if you're in a paravirt environment, and there's a paravirt clocksource, that would always be the best clocksource to use. + .read = read_hyper, + .mask = CLOCKSOURCE_MASK(64), + .mult = 1, + .shift = 0, It would be better to use a scale and shift here, so that adjtime has something to work with when warping time. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 5/5] KVM: paravirt time source
Anthony Liguori wrote: Jeremy Fitzhardinge wrote: Anthony Liguori wrote: Okay. I may remove this patch from the patch series and attempt to sit down next week and work out something more complete that also implements stolen time accounting. Well, that's a separate problem. clocksource.read should always return real time passed, so stolen time doesn't come into it. Right. paravirt_ops.sched_clock should take stolen time into account, but that's almost completely orthogonal. Except that I wanted to change the hypercall to allow querying of real time or available time as VMI puts it. I see. You could have an interface like Xen's runstate interface, which gives you a breakdown of how long each vcpu spends in each state (running, runnable, paused or blocked). The sum of all of them gives you real time, or you can use a subset to work out stolen time (runnable+paused), busyness (blocked / (running+runnable+blocked)), etc. Right now, I'm relying on the PIT but it would be nice to eliminate that. I'd like to move to something PV so that I can make use of tickless guest kernels. I'm very open to suggestion and even more open to reusing other people's code :-) A simple hypercall interface which says raise irq X after N ns is probably the easiest way to go. Try to avoid the pitfalls of VMI's interface in this area, which tries to recycle existing architectural interrupt sources for hypervisor timers, and gets into a bit of mess. Xen is relatively clean, but I'm not sure how it would fit into the mostly-emulated kvm model. If you keep things simple, most of the Xen clockevent code could be reused with little change. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: 1) Not really sure what is needed for CONFIG_PREEMPT support. I'm not sure which paravirt_ops calls are actually re-entrant. I'm not sure that has specifically come up. The main issue is whether a particular call can be preempted and whether that matters. I guess the calls which affect a particular CPU's state will generally be called in a non-preemptable context, but I guess we can't assume that; the best approach is to assume that each call be atomic with respect to preemption. Things like batching must be completed with preemption disabled over the whole batch. I check that with BUG_ON in the Xen code. 2) The paravirt_ops implementation is registered with core_initcall(). However, the paravirt_ops banner is also printed with core_initcall() so that fact that this works now is just the luck of build order. Need a better way to initialize the KVM paravirt_ops backend. Hm. We could make the banner printing later; obviously its purely cosmetic. I put it as a core_initcall on the assumption that pv-ops would be set up very early as it is with Xen and lguest, and so the banner should print relatively early. But if your model is that you boot fully virtualized for a while, and then become paravirtualized later, then it would make sense to defer banner printing until then. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
Anthony Liguori wrote: Regards, Anthony Liguori Subject: [PATCH] KVM: Add hypercall queue for paravirt_ops implementation Author: Anthony Liguori [EMAIL PROTECTED] Implemented a hypercall queue that can be used when paravirt_ops lazy mode is enabled. This patch enables queueing of MMU write operations and CR updates. This results in about a 50% bump in kernbench performance. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c index 07ce38e..4b323f1 100644 --- a/arch/i386/kernel/kvm.c +++ b/arch/i386/kernel/kvm.c @@ -33,8 +33,10 @@ struct kvm_paravirt_state unsigned long cached_cr[5]; int cr_valid[5]; - struct kvm_vmca *vmca; + enum paravirt_lazy_mode mode; struct kvm_hypercall_entry *queue; + + struct kvm_vmca *vmca; void (*hypercall)(void); u64 vmca_gpa; @@ -42,17 +44,17 @@ struct kvm_paravirt_state static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state); +static int do_hypercall_batching; static int do_mmu_write; static int do_cr_read_caching; static int do_nop_io_delay; static u64 msr_set_vmca; -static long kvm_hypercall(unsigned int nr, unsigned long p1, - unsigned long p2, unsigned long p3, - unsigned long p4) +static long _kvm_hypercall(struct kvm_paravirt_state *state, +unsigned int nr, unsigned long p1, +unsigned long p2, unsigned long p3, +unsigned long p4) { - struct kvm_paravirt_state *state - = per_cpu(paravirt_state, smp_processor_id()); long ret; asm volatile(call *(%6) \n\t @@ -69,6 +71,55 @@ static long kvm_hypercall(unsigned int nr, unsigned long p1, return ret; } +static int can_defer_hypercall(struct kvm_paravirt_state *state, +unsigned int nr) +{ + if (state-mode == PARAVIRT_LAZY_MMU) { + if (nr == KVM_HYPERCALL_MMU_WRITE) + return 1; + } else if (state-mode == PARAVIRT_LAZY_CPU) { + if (nr == KVM_HYPERCALL_SET_CR) + return 1; + } + + return 0; +} + +static void _kvm_hypercall_defer(struct kvm_paravirt_state *state, + unsigned int nr, + unsigned long p1, unsigned long p2, + unsigned long p3, unsigned long p4) +{ + struct kvm_hypercall_entry *entry; + + if (state-vmca-queue_index == state-vmca-max_queue_index) + _kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0); + + /* FIXME: are we preempt safe here? */ BUG_ON(preemptible()) would be a reasonable thing to put here to be sure. + entry = state-queue[state-vmca-queue_index++]; + entry-nr = nr; + entry-p1 = p1; + entry-p2 = p2; + entry-p3 = p3; + entry-p4 = p4; +} + +static long kvm_hypercall(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) +{ + struct kvm_paravirt_state *state + = per_cpu(paravirt_state, smp_processor_id()); Rather than using this here and passing state around, you could use either x86_read/write_percpu, or get/put_cpu_var (or __get_vpu_var if you don't need the preempt-disable). + long ret = 0; + + if (can_defer_hypercall(state, nr)) + _kvm_hypercall_defer(state, nr, p1, p2, p3, p4); + else + ret = _kvm_hypercall(state, nr, p1, p2, p3, p4); + + return ret; +} + /* * No need for any IO delay on KVM */ @@ -107,7 +158,9 @@ static void kvm_write_cr(int reg, unsigned long value) state-cr_valid[reg] = 1; state-cached_cr[reg] = value; - if (reg == 0) + if (state-mode == PARAVIRT_LAZY_CPU) + kvm_hypercall(KVM_HYPERCALL_SET_CR, reg, value, 0, 0); + else if (reg == 0) native_write_cr0(value); else if (reg == 3) native_write_cr3(value); @@ -218,6 +271,18 @@ static void kvm_pmd_clear(pmd_t *pmdp) kvm_mmu_write(pmdp, pmd, sizeof(pmd)); } +static void kvm_set_lazy_mode(enum paravirt_lazy_mode mode) +{ + struct kvm_paravirt_state *state + = per_cpu(paravirt_state, smp_processor_id()); + + if (mode == PARAVIRT_LAZY_FLUSH || mode == PARAVIRT_LAZY_NONE) { + if (state-vmca-queue_index) + _kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0); + } + state-mode = mode; No, you don't want to set state-mode to LAZY_FLUSH (its not a mode, just a action which overloads the interface). +} + static void paravirt_ops_setup(void) { paravirt_ops.name = KVM; @@ -249,6 +314,9 @@ static void
Re: [kvm-devel] [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
Jeremy Fitzhardinge wrote: +static int kvm_hypercall_flush(struct kvm_vcpu *vcpu) +{ +struct kvm_hypercall_entry *queue; +struct kvm_vmca *vmca; +int ret = 0; +int i; + +queue = kmap(vcpu-queue_page); +vmca = kmap(vcpu-para_state_page); kmap_atomic? Or why not keep them mapped all the time? Oh, right, this is on the kvm side. Still, kmap_atomic? J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 0/5] KVM paravirt_ops implementation
Anthony Liguori wrote: Hi Jeremy, Jeremy Fitzhardinge wrote: Anthony Liguori wrote: 1) Not really sure what is needed for CONFIG_PREEMPT support. I'm not sure which paravirt_ops calls are actually re-entrant. I'm not sure that has specifically come up. The main issue is whether a particular call can be preempted and whether that matters. I guess the calls which affect a particular CPU's state will generally be called in a non-preemptable context, but I guess we can't assume that; the best approach is to assume that each call be atomic with respect to preemption. So each call would need to disable preemption? I'm not sure that makes a whole lot of sense for something like CR reads/writes. In fact, without passing in a cpu parameter, I'm pretty sure that those operations *have* to require preemption to be disabled. Yeah, its a little unclear to me. If you're poking at a control register, then one presumes you've got a specific CPU's CRx in mind. But in the Xen code I don't care about the preemption state for control register updates - except for write_cr3, which never makes any sense with preemption enabled. For something like MMU operations, preemption really doesn't have to be disabled. Unless you're batching, since the lazy_mode is inherently per-cpu state. Things like batching must be completed with preemption disabled over the whole batch. I check that with BUG_ON in the Xen code. Right now the KVM batching requires preemption to be disabled for batching. I think that's probably overkill. I had to put a few explicit preempt_disable/enables in the Xen code, but mostly the preempt state is reasonable for a given operation (ie, disabled for per-cpu state updates, enabled for memory/global state updates). I don't think that's a hard requirement though since we could pass the current batch PA as part of the flush hypercalls. PA? Things are a lot easier though if we can just assume preemption is disabled :-) Are you aware of any paravirt_ops calls that are probably being called in the kernel with preemption enabled? Erm, I haven't made a breakdown, but many are. The descriptor updates generally are, for example. Pagetable updates could be, but are generally done under a pagetable lock, and so are not preemptible anyway. I don't see a compelling reason to paravirtualize earlier although I also don't see a compelling reason not too. I noticed that VMI hooks setup.c. It wasn't immediately obvious why it was hooking there but perhaps it worthwhile to have a common hook? I suspect VMI and KVM will have a similar model for startup. Well, I was suggesting we could print the banner later rather than forcing an earlier init. The important part is that you set your pv_ops before patching occurs, since that will bake the function calls into the rest of the kernel, and it will ignore any further changes to the paravirt_ops structure. I think Zach was originally thinking of initializing VMI much later (even as a module load), but the subtleties of inveigling its way into the kernel at that late stage got too complex. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [Xen-devel] [PATCH RFC 1/3] virtio infrastructure
Santos, Jose Renato G wrote: It seems that we will still need specific devices drivers for each different virtualization flavor. For example, we will still need to have a specific Xen netfront device that talks to a backend device in dom0, using page grants, and other Xen specific mechanisms. It looks like will still need to maintain all the virtual device drivers and in addition we will now have to maintain another virtualization layer. The hope is that the Xen-specific elements of the driver will be restricted to Xen-specific things like grant tables, and the bulk of the driver and its logic can be common. Whether that can be achieved and still retain the full performance/features of the entirely Xen-specific netfront driver remains to be seen. I haven't had a chance to look at doing any Xen-virtio glue yet, so I'm not really sure how it will work out. J - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] make KVM conform to sucky rdmsr interface
Rusty Russell wrote: It was actually Jeremy's paravirt cleanup patch which changed the calling convention of rdmsr_safe() to match rdmsr(). Oops, my little mind hobgoblin is getting out of control... J - Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] make KVM conform to sucky rdmsr interface
Andrew Morton wrote: Which tree are you patching?? - It looks like its against the previously posted Cleanup: rationalize paravirt wrappers patch. J - Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel