Re: [PATCH v3 6/6] KVM: introduce readonly memslot
On Mon, Jun 18, 2012 at 05:25:05PM -0300, Marcelo Tosatti wrote: > On Mon, Jun 18, 2012 at 12:50:10PM +0300, Avi Kivity wrote: > > On 06/16/2012 05:11 AM, Marcelo Tosatti wrote: > > > > > > Can you introduce a separate exit reason, say KVM_EXIT_READ_FAULT, with > > > information about the fault? > > > > I think you mean WRITE_FAULT. > > Yes. > > > But what's wrong with the normal mmio exit? > > It is necessary to perform an address->mmio region lookup, to verify > whether the mmio exit is due to an actual mmio (no memory slot) or from > a write access to a write protected slot. That information is readily > available in the kernel but is lost if the mmio exit is used to transmit > the information. > Why is it necessary though? Write access to a write protected slot is MMIO by (our) definition. > Moreover, i'd argue the uses are different: one is an mmio emulation > exit, the other is more like handling a pagefault in qemu. > What do you mean by "handling a pagefault in qemu"? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 6/6] KVM: introduce readonly memslot
On 06/18/2012 11:25 PM, Marcelo Tosatti wrote: > On Mon, Jun 18, 2012 at 12:50:10PM +0300, Avi Kivity wrote: >> On 06/16/2012 05:11 AM, Marcelo Tosatti wrote: >> > >> > Can you introduce a separate exit reason, say KVM_EXIT_READ_FAULT, with >> > information about the fault? >> >> I think you mean WRITE_FAULT. > > Yes. > >> But what's wrong with the normal mmio exit? > > It is necessary to perform an address->mmio region lookup, to verify > whether the mmio exit is due to an actual mmio (no memory slot) or from > a write access to a write protected slot. That information is readily > available in the kernel but is lost if the mmio exit is used to transmit > the information. For qemu it doesn't matter, but other userspaces might need it. Can we add it to the mmio exit reason as extra data? Only present if the CAP is available. > Moreover, i'd argue the uses are different: one is an mmio emulation > exit, the other is more like handling a pagefault in qemu. It is not. A pagefault is handled by fixing and retrying (fault). MMIO emulation is done by userspace instead of the kernel or guest (trap). Here we're not fixing anything (say by marking the page writeable; the slot is readonly, not the page (as is the case with ksm). Qemu's ROM and ROMD behaviour follow this (discard access for ROM, emualte for ROMD). There might be value in implementing user visible write protection, for example to implement distributed shared memory or postcopy migration. But as long as per page protection requires separate VMAs it is not practical (and is unlikely to perform well anyway). > >> > Then perform this exit only if userspace allows it by explicit enable, >> > and by default have the exit_read_fault handler jump to the mmio >> > handler. >> >> >> I don't get this. > > > CAN USERSPACE HANDLE WRITE FAULT EXITS? > YES: WRITE FAULT EXIT. > NO: MMIO EXIT. > > But then again userspace won't set read-only slots if it does not know > about them. So it is not necessary. > Okay, okay, don't shout. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] KVM: x86: Implement PCID/INVPCID for guests with EPT
Hi, Avi Any comments for the patch? Best Regards Junjie Mao > -Original Message- > From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > Sent: Saturday, June 16, 2012 10:32 AM > To: Mao, Junjie > Cc: 'kvm@vger.kernel.org'; Avi Kivity > Subject: Re: [PATCH v4] KVM: x86: Implement PCID/INVPCID for guests with > EPT > > On Thu, Jun 14, 2012 at 02:04:25AM +, Mao, Junjie wrote: > > This patch handles PCID/INVPCID for guests. > > > > Process-context identifiers (PCIDs) are a facility by which a logical > > processor may cache information for multiple linear-address spaces so > > that the processor may retain cached information when software > > switches to a different linear address space. Refer to section 4.10.1 > > in IA32 Intel Software Developer's Manual Volume 3A for details. > > > > For guests with EPT, the PCID feature is enabled and INVPCID behaves > > as running natively. > > For guests without EPT, the PCID feature is disabled and INVPCID triggers > #UD. > > > > Changes from v3: > > Rebase to the latest tree > > Expose PCID to nested guests > > Remove the pcid_supported callback > > > > Changes from v2: > > Seperate management of PCID and INVPCID > > Prevent PCID bit in CPUID from exposing on guest hypervisors > > Don't check the lower 12 bits when loading cr3 if cr4.PCIDE is set > > Explicitly disable INVPCID for L2 guests > > Support both enable and disable INVPCID in vmx_cpuid_update() > > > > Changes from v1: > > Move cr0/cr4 writing checks to x86.c > > Update comments for the reason why PCID is disabled for non-EPT guests > > Do not support PCID/INVPCID for nested guests at present > > Clean up useless symbols > > > > Signed-off-by: Junjie Mao > > Looks good to me. > > > --- > > arch/x86/include/asm/kvm_host.h|4 ++- > > arch/x86/include/asm/processor-flags.h |2 + > > arch/x86/include/asm/vmx.h |2 + > > arch/x86/kvm/cpuid.c |6 +++- > > arch/x86/kvm/cpuid.h |8 +++ > > arch/x86/kvm/svm.c |6 + > > arch/x86/kvm/vmx.c | 37 > +++- > > arch/x86/kvm/x86.c | 24 > ++-- > > 8 files changed, 82 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/include/asm/kvm_host.h > > b/arch/x86/include/asm/kvm_host.h index db7c1f2..95828a4 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -48,12 +48,13 @@ > > > > #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1) > > #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | > > X86_CR3_PCD)) > > +#define CR3_PCID_ENABLED_RESERVED_BITS 0xFF00ULL > > #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | > \ > > 0xFF00ULL) > > #define CR4_RESERVED_BITS > \ > > (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | > X86_CR4_DE\ > > | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \ > > - | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \ > > + | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | > X86_CR4_PCIDE \ > > | X86_CR4_OSXSAVE | X86_CR4_SMEP | > X86_CR4_RDWRGSFS \ > > | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE)) > > > > @@ -661,6 +662,7 @@ struct kvm_x86_ops { > > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > > int (*get_lpage_level)(void); > > bool (*rdtscp_supported)(void); > > + bool (*invpcid_supported)(void); > > void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, > > bool host); > > > > void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); diff > > --git a/arch/x86/include/asm/processor-flags.h > > b/arch/x86/include/asm/processor-flags.h > > index f8ab3ea..aea1d1d 100644 > > --- a/arch/x86/include/asm/processor-flags.h > > +++ b/arch/x86/include/asm/processor-flags.h > > @@ -44,6 +44,7 @@ > > */ > > #define X86_CR3_PWT0x0008 /* Page Write Through */ > > #define X86_CR3_PCD0x0010 /* Page Cache Disable */ > > +#define X86_CR3_PCID_MASK 0x0fff /* PCID Mask */ > > > > /* > > * Intel CPU features in CR4 > > @@ -61,6 +62,7 @@ > > #define X86_CR4_OSXMMEXCPT 0x0400 /* enable unmasked SSE > exceptions */ > > #define X86_CR4_VMXE 0x2000 /* enable VMX virtualization */ > > #define X86_CR4_RDWRGSFS 0x0001 /* enable RDWRGSFS support */ > > +#define X86_CR4_PCIDE 0x0002 /* enable PCID support */ > > #define X86_CR4_OSXSAVE 0x0004 /* enable xsave and xrestore */ > > #define X86_CR4_SMEP 0x0010 /* enable SMEP support */ > > > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > > index 31f180c..b81525c 100644 > > --- a/arch/x86/include/asm/vmx.h > > +++ b/arch/x86/include/asm/vmx.h > > @@ -60,6 +60,7 @@ >
Re: [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
On 06/18/2012 08:47 PM, Christoffer Dall wrote: > On Mon, Jun 18, 2012 at 9:08 AM, Avi Kivity wrote: >> On 06/15/2012 10:07 PM, Christoffer Dall wrote: >>> From: Marc Zyngier >>> >>> In order to avoid compilation failure when KVM is not compiled in, >>> guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER >>> and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of >>> the KVM code. >>> >>> >>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER >>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) >>> struct mmu_notifier mmu_notifier; >>> unsigned long mmu_notifier_seq; >>> long mmu_notifier_count; >>> @@ -780,7 +780,7 @@ struct kvm_stats_debugfs_item { >>> extern struct kvm_stats_debugfs_item debugfs_entries[]; >>> extern struct dentry *kvm_debugfs_dir; >>> >>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER >>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) >>> static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long >>> mmu_seq) >>> { >> >> Why not have Kconfig select CONFIG_MMU_NOTIFIER? >> >> > Not sure I understand. Where would you select this option? > > We do select this option when choosing to compile KVM on, but when we > do _not_, then other includes of kvm_host.h fails. Right, my mistake. Didn't notice it was a header. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping
On 06/18/2012 08:55 PM, Christoffer Dall wrote: > On Mon, Jun 18, 2012 at 9:12 AM, Avi Kivity wrote: >> On 06/15/2012 10:07 PM, Christoffer Dall wrote: >>> Adds support in the identity mapping feature that allows KVM to setup >>> identity mapping for the Hyp mode with the AP[1] bit set as required by >>> the specification and also supports freeing created sub pmd's after >>> finished use. >>> >>> These two functions: >>> - hyp_idmap_add(pgd, addr, end); >>> - hyp_idmap_del(pgd, addr, end); >>> are essentially calls to the same function as the non-hyp versions but >>> with a different argument value. KVM calls these functions to setup >>> and teardown the identity mapping used to initialize the hypervisor. >>> >>> Note, the hyp-version of the _del function actually frees the pmd's >>> pointed to by the pgd as opposed to the non-hyp version which just >>> clears them. >> >> >> I asked previously what happens if two data structures share a page, and >> one of them is removed. Is that handled now? How? >> > > I think you asked previously for the general hyp-mode mappings, not > the identity mappings. For the general hyp-mode mappings we simply > don't unmap the data structures, potentially leaking a few pages for > the page tables themselves. > > This is only for initialization, so there are not really any data > structures mapped, only one/two pages to initialize the hypervisor > mode. > >> Why not just identity map all memory? You can use large pages so it's >> fast and doesn't consume a lot of page table memory. > > That's an option, but it still seems like an awful waste since it's > only used once (unless you unload and re-load the module) and there's > really no problem with data structures here. > > The truth is that this is going to go away, and the code will be put > in a section that's idmapped from kernel start. There's a patch under > way from Marc taking care of this which I assmue we'll merge for v9. Okay, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
On 06/18/2012 11:56 PM, Christoffer Dall wrote: > On Mon, Jun 18, 2012 at 9:32 AM, Avi Kivity wrote: >> On 06/15/2012 10:08 PM, Christoffer Dall wrote: >>> From: Christoffer Dall >>> >>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl. >>> This ioctl is used since the sematics are in fact two lines that can be >>> either raised or lowered on the VCPU - the IRQ and FIQ lines. >>> >>> KVM needs to know which VCPU it must operate on and whether the FIQ or >>> IRQ line is raised/lowered. Hence both pieces of information is packed >>> in the kvm_irq_level->irq field. The irq fild value will be: >>> IRQ: vcpu_index << 1 >>> FIQ: (vcpu_index << 1) | 1 >>> >>> This is documented in Documentation/kvm/api.txt. >>> >>> The effect of the ioctl is simply to simply raise/lower the >>> corresponding irq_line field on the VCPU struct, which will cause the >>> world-switch code to raise/lower virtual interrupts when running the >>> guest on next switch. The wait_for_interrupt flag is also cleared for >>> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs >>> in guest mode are kicked to make sure they refresh their interrupt status. >> >>> >>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, >>> + struct kvm_irq_level *irq_level) >>> +{ >>> + int mask; >>> + unsigned int vcpu_idx; >>> + struct kvm_vcpu *vcpu; >>> + unsigned long old, new, *ptr; >>> + >>> + vcpu_idx = irq_level->irq >> 1; >>> + if (vcpu_idx >= KVM_MAX_VCPUS) >>> + return -EINVAL; >>> + >>> + vcpu = kvm_get_vcpu(kvm, vcpu_idx); >>> + if (!vcpu) >>> + return -EINVAL; >>> + >>> + if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE) >>> + mask = HCR_VI; >>> + else /* KVM_ARM_FIQ_LINE */ >>> + mask = HCR_VF; >>> + >>> + trace_kvm_set_irq(irq_level->irq, irq_level->level, 0); >>> + >>> + ptr = (unsigned long *)&vcpu->arch.irq_lines; >>> + do { >>> + old = ACCESS_ONCE(*ptr); >>> + if (irq_level->level) >>> + new = old | mask; >>> + else >>> + new = old & ~mask; >>> + >>> + if (new == old) >>> + return 0; /* no change */ >>> + } while (cmpxchg(ptr, old, new) != old); >> >> Isn't this a complicated >> >> >> if (level) >> set_bit() >> else >> clear_bit() >> >> ? >> > > we need to atomically know if we changed either the FIQ/IRQ fields and > atomically update without locking. So use test_and_set_bit()/test_and_clear_bit(). > (I think you suggested this > approach, in fact). I think so too, but it only makes sense if you need to consider both fields simultaneously (which you don't here). For example, if IRQ was asserted while FIQ was already raised, you don't need to kick. But that's not the case according to the below. > >>> + >>> + /* >>> + * The vcpu irq_lines field was updated, wake up sleeping VCPUs and >>> + * trigger a world-switch round on the running physical CPU to set the >>> + * virtual IRQ/FIQ fields in the HCR appropriately. >>> + */ >>> + kvm_vcpu_kick(vcpu); >> >> No need to wake when the line is asserted so you can make this >> conditional on level. >> > > we need to trigger a world switch to update the virtualized register > from the actual running physical CPU if we changed any of the IRQ/FIQ > fields. We return in the loop above if we didn't change anything. Okay. > >>> + >>> + return 0; >>> +} >>> + >>> long kvm_arch_vcpu_ioctl(struct file *filp, >>>unsigned int ioctl, unsigned long arg) >>> { >>> @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct >>> kvm_dirty_log *log) >>> long kvm_arch_vm_ioctl(struct file *filp, >>> unsigned int ioctl, unsigned long arg) >>> { >>> - return -EINVAL; >>> + struct kvm *kvm = filp->private_data; >>> + void __user *argp = (void __user *)arg; >>> + >>> + switch (ioctl) { >>> + case KVM_IRQ_LINE: { >>> + struct kvm_irq_level irq_event; >>> + >>> + if (copy_from_user(&irq_event, argp, sizeof irq_event)) >>> + return -EFAULT; >>> + return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event); >>> + } >>> + default: >>> + return -EINVAL; >>> + } >>> } >> >> Should be in common code guarded by the define introduced previously. >> >> > > you mean like this: ? > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index a9f209b..5bf2193 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -535,8 +535,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, > struct kvm_run *run) > return ret; > } > > -static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, > - struct kvm_irq_level *irq_level) > +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_le
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On 06/19/2012 01:05 AM, Christoffer Dall wrote: >> Premature, but this is sad. I suggest you split vmid generation from >> next available vmid. This allows you to make the generation counter >> atomic so it may be read outside the lock. >> >> You can do >> >>if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen))) >> return; >> >>spin_lock(... >> > > I knew you were going to say something here :), please take a look at > this and see if you agree: It looks reasonable wrt locking. > + > + /* First user of a new VMID generation? */ > + if (unlikely(kvm_next_vmid == 0)) { > + atomic64_inc(&kvm_vmid_gen); > + kvm_next_vmid = 1; > + > + /* This does nothing on UP */ > + smp_call_function(reset_vm_context, NULL, 1); Does this imply a memory barrier? If not, smp_mb__after_atomic_inc(). > + > + /* > + * On SMP we know no other CPUs can use this CPU's or > + * each other's VMID since the kvm_vmid_lock blocks > + * them from reentry to the guest. > + */ > + > + reset_vm_context(NULL); These two lines can be folded as on_each_cpu(). Don't you have a race here where you can increment the generation just before guest entry? cpu0 cpu1 (vmid=0, gen=1)(gen=0) -- -- gen == global_gen, return gen != global_gen increment global_gen smp_call_function reset_vm_context vmid=0 enter with vmid=0 enter with vmid=0 You must recheck gen after disabling interrupts to ensure global_gen didn't bump after update_vttbr but before entry. x86 has a lot of this, see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to your case but may come in useful later). > >>> + >>> +/** >>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest >>> code >>> + * @vcpu:The VCPU pointer >>> + * @run: The kvm_run structure pointer used for userspace state >>> exchange >>> + * >>> + * This function is called through the VCPU_RUN ioctl called from user >>> space. It >>> + * will execute VM code in a loop until the time slice for the process is >>> used >>> + * or some emulation is needed from user space in which case the function >>> will >>> + * return with return value 0 and with the kvm_run structure filled in >>> with the >>> + * required data for the requested emulation. >>> + */ >>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >>> { >>> - return -EINVAL; >>> + int ret = 0; >>> + sigset_t sigsaved; >>> + >>> + if (vcpu->sigset_active) >>> + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); >>> + >>> + run->exit_reason = KVM_EXIT_UNKNOWN; >>> + while (run->exit_reason == KVM_EXIT_UNKNOWN) { >> >> It's not a good idea to read stuff from run unless it's part of the ABI, >> since userspace can play with it while you're reading it. It's harmless >> here but in other places it can lead to a vulnerability. >> > > ok, so in this case, userspace can 'suppress' an MMIO or interrupt > window operation. > > I can change to keep some local variable to maintain the state and > have some API convention for emulation functions, if you feel strongly > about it, but otherwise it feels to me like the code will be more > complicated. Do you have other ideas? x86 uses: 0 - return to userspace (run prepared) 1 - return to guest (run untouched) -ESOMETHING - return to userspace as return values from handlers and for locals (usually named 'r'). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SOLVED: RE: Garbled audio - Windows 7 64 bit guest on Debian
On 06/18/2012 10:32 PM, Jimmy Crossley wrote: > > I have mostly solved this issue. The sound works much, much, better, but is > still not as good as on my host machine. > > I installed PulseAudio and used it instead of ALSA. In order to get kvm to > use it, I set the environment variable QEMU_AUDIO_DRV=pa. I had been using > sudo to start the VM, and that kept this environment variable from being > used. I did a "sudo setcap cap_net_admin+ep /usr/bin/kvm" in order to be > able to run kvm under a normal user account. Now the sound works quite well. > That is good to hear. But you are giving up on a lot of security if you're running kvm as root (or with CAP_NET_ADMIN). qemu supports setting up the network externally and running with no special privileges. Of course, it may not matter for your use case. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
On 06/19/2012 01:20 AM, Christoffer Dall wrote: > On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity wrote: >> On 06/15/2012 10:09 PM, Christoffer Dall wrote: >>> From: Christoffer Dall >>> >>> Handles the guest faults in KVM by mapping in corresponding user pages >>> in the 2nd stage page tables. >>> >>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and >>> pgprot_guest variables used to map 2nd stage memory for KVM guests. >>> >>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() >>> operation, >>> where we remove the HVA from the 2nd stage translation. All other KVM MMU >>> notifierhooks are NOPs. >> >> I think you must at least support change_pte (possibly by unmapping). >> Andrea? >> > hmmm, at least for KSM support we need to support change_pte (are > there other callers for this type of memory?) > > It's not trivial I guess, since we would need to support COW and > thereby stage-2 permission faults... Marc, right? As I mentioned, you can support change_pte by unmapping. This will cause ksm to be ineffective (pages will only be shared if the guest doesn't touch them at all), but it's enough to get started. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] Improve virtio-blk performance
On Tue, Jun 19, 2012 at 5:24 AM, Asias He wrote: > On 06/18/2012 06:58 PM, Stefan Hajnoczi wrote: >> >> As long as the latency is decreasing that's good. But It's worth >> keeping in mind that these percentages are probably wildly different >> on real storage devices and/or qemu-kvm. What we don't know here is >> whether this bottleneck matters in real environments - results with >> real storage and with qemu-kvm would be interesting. > > > Yes. Here is the performance data on a Fusion-IO SSD device. > > Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using kvm > tool. > > Short version: > With bio-based IO path, sequential read/write, random read/write > IOPS boost : 11%, 11%, 13%, 10% > Latency improvement: 10%, 10%, 12%, 10% Nice, I'm glad the improvement shows on real hardware. Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
On Tue, Jun 19, 2012 at 12:32:06PM +0300, Avi Kivity wrote: > On 06/19/2012 01:20 AM, Christoffer Dall wrote: > > On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity wrote: > >> On 06/15/2012 10:09 PM, Christoffer Dall wrote: > >>> From: Christoffer Dall > >>> > >>> Handles the guest faults in KVM by mapping in corresponding user pages > >>> in the 2nd stage page tables. > >>> > >>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and > >>> pgprot_guest variables used to map 2nd stage memory for KVM guests. > >>> > >>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() > >>> operation, > >>> where we remove the HVA from the 2nd stage translation. All other KVM MMU > >>> notifierhooks are NOPs. > >> > >> I think you must at least support change_pte (possibly by unmapping). > >> Andrea? > >> > > hmmm, at least for KSM support we need to support change_pte (are > > there other callers for this type of memory?) > > > > It's not trivial I guess, since we would need to support COW and > > thereby stage-2 permission faults... Marc, right? > > As I mentioned, you can support change_pte by unmapping. This will > cause ksm to be ineffective (pages will only be shared if the guest > doesn't touch them at all), but it's enough to get started. The main reason change_pte initially was required for KSM to be effective was because gup_fast was called with write=1 unconditionally. change_pte was also responsible to set the spte readonly. But that should have been fixed now on x86, so KSM should be effective even despite lack of change_pte on x86. If the KVM page fault is calling gfn_to_pfn_async(write=0/1) depending if the vmexit was caused by a write or read access (instead of gfn_to_pfn which still has the unconditional write=1), and in turn it's forced to sete the spte readonly after calling gfn_to_pfn_async(write=0), change_pte is still useful but it's only a worthwhile optimization to avoid a spte read fault after every KSM page merged, it's not strictly required for KSM effectiveness anymore. In short if ARM does the right thing with regard of KVM read faults passed to gup_fast(write=0) and setting the spte readonly, all should work good with KSM (even if not as optimal as with change_pte). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page
On 06.06.2012, at 17:52, Alexander Graf wrote: > On 06/06/2012 02:28 PM, Avi Kivity wrote: >> On 06/01/2012 01:20 PM, Paul Mackerras wrote: >>> At the moment we call kvmppc_pin_guest_page() in kvmppc_update_vpa() >>> with two spinlocks held: the vcore lock and the vcpu->vpa_update_lock. >>> This is not good, since kvmppc_pin_guest_page() calls down_read() and >>> get_user_pages_fast(), both of which can sleep. This bug was introduced >>> in 2e25aa5f ("KVM: PPC: Book3S HV: Make virtual processor area >>> registration more robust"). >>> >>> This arranges to drop those spinlocks before calling >>> kvmppc_pin_guest_page() and re-take them afterwards. Dropping the >>> vcore lock in kvmppc_run_core() means we have to set the vcore_state >>> field to VCORE_RUNNING before we drop the lock, so that other vcpus >>> won't try to run this vcore. >>> >>> Signed-off-by: Paul Mackerras >>> --- >>> Since this bug is in Linus' tree, and it can cause a scheduling while >>> atomic bug message, can we send this to Linus for inclusion in 3.5, >>> after review of course? >>> >> Sure, Alex? > > Yup, reviewed and tested. Looks ready to go into the tree to me. Can you pull > it in the short way please? Avi? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] spapr_vscsi: Error handling fixes
Am 19.06.2012 08:02, schrieb Benjamin Herrenschmidt: > We were incorrectly g_free'ing an object that isn't allocated > in one error path and failed to release it completely in another > > This fixes qemu crashes with some cases of IO errors. > > Signed-off-by: Benjamin Herrenschmidt > --- > hw/spapr_vscsi.c |4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c > index d2fe3e5..6afc3b4 100644 > --- a/hw/spapr_vscsi.c > +++ b/hw/spapr_vscsi.c > @@ -801,6 +801,7 @@ static void vscsi_got_payload(VSCSIState *s, vscsi_crq > *crq) > if (crq->s.IU_length > sizeof(union viosrp_iu)) { > fprintf(stderr, "VSCSI: SRP IU too long (%d bytes) !\n", > crq->s.IU_length); > +vscsi_put_req(req); > return; > } > > @@ -808,7 +809,8 @@ static void vscsi_got_payload(VSCSIState *s, vscsi_crq > *crq) > if (spapr_vio_dma_read(&s->vdev, crq->s.IU_data_ptr, &req->iu, > crq->s.IU_length)) { > fprintf(stderr, "vscsi_got_payload: DMA read failure !\n"); > -g_free(req); > +vscsi_put_req(req); > + return; Tab alert. :) /-F > } > memcpy(&req->crq, crq, sizeof(vscsi_crq)); > > > > -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page
On 06/19/2012 01:50 PM, Alexander Graf wrote: > > On 06.06.2012, at 17:52, Alexander Graf wrote: > >> On 06/06/2012 02:28 PM, Avi Kivity wrote: >>> On 06/01/2012 01:20 PM, Paul Mackerras wrote: At the moment we call kvmppc_pin_guest_page() in kvmppc_update_vpa() with two spinlocks held: the vcore lock and the vcpu->vpa_update_lock. This is not good, since kvmppc_pin_guest_page() calls down_read() and get_user_pages_fast(), both of which can sleep. This bug was introduced in 2e25aa5f ("KVM: PPC: Book3S HV: Make virtual processor area registration more robust"). This arranges to drop those spinlocks before calling kvmppc_pin_guest_page() and re-take them afterwards. Dropping the vcore lock in kvmppc_run_core() means we have to set the vcore_state field to VCORE_RUNNING before we drop the lock, so that other vcpus won't try to run this vcore. Signed-off-by: Paul Mackerras --- Since this bug is in Linus' tree, and it can cause a scheduling while atomic bug message, can we send this to Linus for inclusion in 3.5, after review of course? >>> Sure, Alex? >> >> Yup, reviewed and tested. Looks ready to go into the tree to me. Can you >> pull it in the short way please? > > Avi? Sorry! Applied to master, will push upstream shortly. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[FIX PATCH] KVM: Use kvm_kvfree() to free memory allocated by kvm_kvzalloc()
From: Takuya Yoshikawa The following commit did not care about the error handling path: commit c1a7b32a14138f908df52d7c53b5ce3415ec6b50 KVM: Avoid wasting pages for small lpage_info arrays If memory allocation fails, vfree() will be called with the address returned by kzalloc(). This patch fixes this issue. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/x86.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..6ed5983 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6350,7 +6350,7 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages) out_free: for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { - vfree(slot->arch.lpage_info[i]); + kvm_kvfree(slot->arch.lpage_info[i]); slot->arch.lpage_info[i] = NULL; } return -ENOMEM; -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [FIX PATCH] KVM: Use kvm_kvfree() to free memory allocated by kvm_kvzalloc()
On 06/19/2012 04:04 PM, Takuya Yoshikawa wrote: > From: Takuya Yoshikawa > > The following commit did not care about the error handling path: > > commit c1a7b32a14138f908df52d7c53b5ce3415ec6b50 > KVM: Avoid wasting pages for small lpage_info arrays > > If memory allocation fails, vfree() will be called with the address > returned by kzalloc(). This patch fixes this issue. Thanks, applied to next. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
On Mon, 18 Jun 2012 15:11:42 +0300 Avi Kivity wrote: > Potential for improvement: don't do 512 iterations on same large page. > > Something like > > if ((gfn ^ prev_gfn) & mask(level)) > ret |= handler(...) > > with clever selection of the first prev_gfn so it always matches (~gfn > maybe). I thought up a better solution: 1. Separate rmap_pde from lpage_info->write_count and make this a simple array. (I once tried this.) 2. Use gfn_to_index() and loop over rmap array: ... /* intersection check */ start = max(start, memslot->userspace_addr); end = min(end, memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)); if (start > end) continue; /* hva to gfn conversion */ gfn_start = hva_to_gfn_memslot(start); gfn_end = hva_to_gfn_memslot(end); /* main part */ for each level { rmapp = __gfn_to_rmap(gfn_start, level, memslot); for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level); idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) { ... /* loop over rmap array */ ret |= handler(kvm, rmapp + idx, data); } } Thanks, Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: SOLVED: RE: Garbled audio - Windows 7 64 bit guest on Debian
Thanks for that reminder, Avi. You are right - it would be best to not be running kvm as root or with any special capabilities. I mainly wanted to report back that I got the sound working by switching to PulseAudio. I made the other changes so I would not have to run kvm as root and it would see the environment variable. When I run without that capability, it seems that kvm cannot create the tun device. I will do more research and if I can't find the information I need, I'll start a new thread. > - Jimmy Crossley CoNetrix Sentry Plaza III 5214 68th Street, Suite 200 Lubbock, TX 79424 jcross...@conetrix.com https://www.CoNetrix.com Telephone: 806-687-8600 Toll Free: 800-356-6568 Fax: 806-687-8511 This e-mail message (and attachments) may contain confidential CoNetrix information. If you are not the intended recipient, you cannot use, distribute or copy the message or attachments. In such a case, please notify the sender by return e-mail immediately and erase all copies of the message and attachments. Opinions, conclusions and other information in this message and attachments that do not relate to official business are neither given nor endorsed by CoNetrix. -Original Message- > From: Avi Kivity [mailto:a...@redhat.com] > Sent: Tuesday, June 19, 2012 04:21 > To: Jimmy Crossley > Cc: kvm@vger.kernel.org > Subject: Re: SOLVED: RE: Garbled audio - Windows 7 64 bit guest on Debian > > On 06/18/2012 10:32 PM, Jimmy Crossley wrote: > > > > I have mostly solved this issue. The sound works much, much, better, but > > is still not as good as > on my host machine. > > > > I installed PulseAudio and used it instead of ALSA. In order to get kvm to > > use it, I set the > environment variable QEMU_AUDIO_DRV=pa. I had been using sudo to start the > VM, and that kept this > environment variable from being used. I did a "sudo setcap cap_net_admin+ep > /usr/bin/kvm" in order > to be able to run kvm under a normal user account. Now the sound works quite > well. > > > > That is good to hear. But you are giving up on a lot of security if > you're running kvm as root (or with CAP_NET_ADMIN). qemu supports > setting up the network externally and running with no special privileges. > > Of course, it may not matter for your use case. > > -- > error compiling committee.c: too many arguments to function > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for Tuesday, June 19th
Juan Quintela wrote: > Hi > > Please send in any agenda items you are interested in covering. > > Anthony suggested for last week: > - multithreading vhost (and general vhost improvements) > > I suggest: > - status of migration: post-copy, IDL, XBRLE, huge memory, ... > Will send an email with an status before tomorrow call. XBRLE: v12 is coming today or so. This three patches should be a no-brainer (just refactoring code). 1st one is shared with postcopy. [PATCH v11 1/9] Add MigrationParams structure [PATCH v11 5/9] Add uleb encoding/decoding functions [PATCH v11 6/9] Add save_block_hdr function This ones can be be the ones that we can discuss. [PATCH v11 2/9] Add migration capabilites [PATCH v11 3/9] Add XBZRLE documentation [PATCH v11 4/9] Add cache handling functions [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live [PATCH v11 8/9] Add set_cachesize command Postcopy: This is just refactoring that can be integrated. [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4 [PATCH v2 04/41] arch_init: refactor host_from_stream_offset() [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case [PATCH v2 06/41] arch_init: refactor ram_save_block() [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit [PATCH v2 08/41] arch_init/ram_load: refactor ram_load [PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block() [PATCH v2 11/41] arch_init: factor out counting transferred bytes [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr() [PATCH v2 14/41] exec.c: export last_ram_offset() [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd [PATCH v2 23/41] migration.c: remove redundant line in migrate_init() [PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup() [PATCH v2 25/41] migration: factor out parameters into MigrationParams [PATCH v2 26/41] buffered_file: factor out buffer management logic [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory This is postcopy properly. From this one, postcopy needs to be the things addressed on previous review, and from there probably (at least) another review. Thing to have in account is that the umem (or whatever you want to call it), should be able to work over RDMA. Anyone that knows anything about RDMA to comment on this? [PATCH v2 29/41] umem.h: import Linux umem.h [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option [PATCH v2 32/41] savevm: add new section that is used by postcopy [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command [PATCH v2 35/41] postcopy: introduce helper functions for postcopy [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault [PATCH v2 39/41] postcopy/outgoing: implement prefault [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command [PATCH v2 41/41] migration/postcopy: add movebg mode Huge memory migration. This ones should be trivial, and integrated. [PATCH 1/7] Add spent time for migration [PATCH 2/7] Add tracepoints for savevm section start/end [PATCH 3/7] No need to iterate if we already are over the limit [PATCH 4/7] Only TCG needs TLB handling [PATCH 5/7] Only calculate expected_time for stage 2 This one is also trivial, but Anthony on previous reviews wanted to have migration-thread before we integrated this one. [PATCH 6/7] Exit loop if we have been there too long This one, Anthony wanted a different approach improving bitmap handling. Not done yet. [PATCH 7/7] Maintaing number of dirty pages IDL patchset. I am not against generating the VMState information, but I am trying to understand how the patch works. Notice that I don't grok Python, this is is one of the reasos it is taking long. This was one of the
Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th
On 06/19/2012 08:54 AM, Juan Quintela wrote: Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. Anthony suggested for last week: - multithreading vhost (and general vhost improvements) I suggest: - status of migration: post-copy, IDL, XBRLE, huge memory, ... Will send an email with an status before tomorrow call. XBRLE: v12 is coming today or so. This three patches should be a no-brainer (just refactoring code). 1st one is shared with postcopy. [PATCH v11 1/9] Add MigrationParams structure [PATCH v11 5/9] Add uleb encoding/decoding functions [PATCH v11 6/9] Add save_block_hdr function This ones can be be the ones that we can discuss. [PATCH v11 2/9] Add migration capabilites [PATCH v11 3/9] Add XBZRLE documentation [PATCH v11 4/9] Add cache handling functions [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live [PATCH v11 8/9] Add set_cachesize command Postcopy: This is just refactoring that can be integrated. [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4 [PATCH v2 04/41] arch_init: refactor host_from_stream_offset() [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case [PATCH v2 06/41] arch_init: refactor ram_save_block() [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit [PATCH v2 08/41] arch_init/ram_load: refactor ram_load [PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block() [PATCH v2 11/41] arch_init: factor out counting transferred bytes [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr() [PATCH v2 14/41] exec.c: export last_ram_offset() [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd [PATCH v2 23/41] migration.c: remove redundant line in migrate_init() [PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup() [PATCH v2 25/41] migration: factor out parameters into MigrationParams [PATCH v2 26/41] buffered_file: factor out buffer management logic [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory This is postcopy properly. From this one, postcopy needs to be the things addressed on previous review, and from there probably (at least) another review. Thing to have in account is that the umem (or whatever you want to call it), should be able to work over RDMA. Anyone that knows anything about RDMA to comment on this? [PATCH v2 29/41] umem.h: import Linux umem.h [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option [PATCH v2 32/41] savevm: add new section that is used by postcopy [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command [PATCH v2 35/41] postcopy: introduce helper functions for postcopy [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault [PATCH v2 39/41] postcopy/outgoing: implement prefault [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command [PATCH v2 41/41] migration/postcopy: add movebg mode I'm not at all convinced that postcopy is a good idea. There needs a clear expression of what the value proposition is that's backed by benchmarks. Those benchmarks need to include latency measurements of downtime which so far, I've not seen. I don't want to take any postcopy patches until this discussion happens. Regards, Anthony Liguori Huge memory migration. This ones should be trivial, and integrated. [PATCH 1/7] Add spent time for migration [PATCH 2/7] Add tracepoints for savevm section start/end [PATCH 3/7] No need to iterate if we already are over the limit [PATCH 4/7] Only TCG needs TLB handling [PATCH 5/7] Only calculate expected_time for stage 2 This one is also trivial, but Anthony on previous reviews wanted to have migration-thread before we integrated this one. [PATCH 6/7] Exit loop i
Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th
On Tue, 19 Jun 2012 09:01:36 -0500 Anthony Liguori wrote: > I'm not at all convinced that postcopy is a good idea. There needs a clear > expression of what the value proposition is that's backed by benchmarks. > Those > benchmarks need to include latency measurements of downtime which so far, > I've > not seen. > > I don't want to take any postcopy patches until this discussion happens. FWIW: I rather see postcopy as a way of migrating guests forcibly and I know a service in which such a way is needed: emergency migration. There is also a product which does live migration when some hardware problems are detected (as a semi-FT solution) -- in such cases, we cannot wait until the guest becomes calm. Although I am not certain whether QEMU can be used for such products, it may be worth thinking about. Thanks, Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th
Hello, Wanted to share some preliminary data from live migration experiments on a setup that is perhaps one of the larger ones. We used Juan's "huge_memory" patches (without the separate migration thread) and measured the total migration time and the time taken for stage 3 ("downtime"). Note: We didn't change the default "downtime" (30ms?). We had a private 10Gig back-to-back link between the two hosts..and we set the migration speed to 10Gig. The "workloads" chosen were ones that we could easily setup. All experiments were done without using virsh/virt-manager (i.e. direct interaction with the qemu monitor prompt). Pl. see the data below. As the guest size increased (and for busier the workloads) we observed that network connections were getting dropped not only during the "downtime" (i.e. stage 3) but also during at times during iterative pre-copy phase (i.e. stage 2). Perhaps some of this will get fixed when we have the migration thread implemented. We had also briefly tried the proposed delta compression changes (easier to say than XBZRLE :)) on a smaller configuration. For the simple workloads (perhaps there was not much temporal locality in them) it didn't seem to show improvements instead took much longer time to migrate (high cache miss penalty?). Waiting for the updated version of the XBZRLE for further experiments to see how well it scales on this larger set up... FYI Vinod --- 10VCPUs/128G --- 1) Idle guest Total migration time : 124585 ms, Stage_3_time : 941 ms , Total MB transferred : 2720 2) AIM7-compute (2000 users) Total migration time : 123540 ms, Stage_3_time : 726 ms , Total MB transferred : 3580 3) SpecJBB (modified to run 10 warehouse threads for a long duration of time) Total migration time : 165720 ms, Stage_3_time : 6851 ms , Total MB transferred : 19656 4) Google SAT (-s 3600 -C 5 -i 5) Total migration time : 411827 ms, Stage_3_time : 77807 ms , Total MB transferred : 142136 --- 20VCPUs /256G --- 1) Idle guest Total migration time : 259938 ms, Stage_3_time : 1998 ms , Total MB transferred : 5114 2) AIM7-compute (2000 users) Total migration time : 261336 ms, Stage_3_time : 2107 ms , Total MB transferred : 5473 3) SpecJBB (modified to run 20 warehouse threads for a long duration of time) Total migration time : 390548 ms, Stage_3_time : 19596 ms , Total MB transferred : 48109 4) Google SAT (-s 3600 -C 10 -i 10) Total migration time : 780150 ms, Stage_3_time : 90346 ms , Total MB transferred : 251287 30VCPUs/384G --- 1) Idle guest (qemu) Total migration time : 501704 ms, Stage_3_time : 2835 ms , Total MB transferred : 15731 2) AIM7-compute (2000 users) Total migration time : 496001 ms, Stage_3_time : 3884 ms , Total MB transferred : 9375 3) SpecJBB (modified to run 30 warehouse threads for a long duration of time) Total migration time : 611075 ms, Stage_3_time : 17107 ms , Total MB transferred : 48862 4) Google SAT (-s 3600 -C 15 -i 15) (look at /tmp/kvm_30w_Goog) Total migration time : 1348102 ms, Stage_3_time : 128531 ms , Total MB transferred : 367524 --- 40VCPUs/512G --- 1) Idle guest Total migration time : 780257 ms, Stage_3_time : 3770 ms , Total MB transferred : 13330 2) AIM7-compute (2000 users) Total migration time : 720963 ms, Stage_3_time : 3966 ms , Total MB transferred : 10595 3) SpecJBB (modified to run 40 warehouse threads for a long duration of time) Total migration time : 863577 ms, Stage_3_time : 25149 ms , Total MB transferred : 54685 4) Google SAT (-s 3600 -C 20 -i 20) Total migration time : 2585039 ms, Stage_3_time : 177625 ms , Total MB transferred : 493575 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th
On Tue, Jun 19, 2012 at 03:54:23PM +0200, Juan Quintela wrote: > Juan Quintela wrote: > > Hi > > > > Please send in any agenda items you are interested in covering. > > > > Anthony suggested for last week: > > - multithreading vhost (and general vhost improvements) > > > > I suggest: > > - status of migration: post-copy, IDL, XBRLE, huge memory, ... > > Will send an email with an status before tomorrow call. > > XBRLE: v12 is coming today or so. > > > This three patches should be a no-brainer (just refactoring code). > 1st one is shared with postcopy. > > [PATCH v11 1/9] Add MigrationParams structure > [PATCH v11 5/9] Add uleb encoding/decoding functions > [PATCH v11 6/9] Add save_block_hdr function > > This ones can be be the ones that we can discuss. > > [PATCH v11 2/9] Add migration capabilites > [PATCH v11 3/9] Add XBZRLE documentation > [PATCH v11 4/9] Add cache handling functions > [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live > [PATCH v11 8/9] Add set_cachesize command > > Postcopy: This is just refactoring that can be integrated. > > [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() > [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy > [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version > = 4 > [PATCH v2 04/41] arch_init: refactor host_from_stream_offset() > [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE > case > [PATCH v2 06/41] arch_init: refactor ram_save_block() > [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit > [PATCH v2 08/41] arch_init/ram_load: refactor ram_load > [PATCH v2 09/41] arch_init: introduce helper function to find ram block with > id string > [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block() > [PATCH v2 11/41] arch_init: factor out counting transferred bytes > [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset > [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr() > [PATCH v2 14/41] exec.c: export last_ram_offset() > [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, > qemu_file_skip > [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size > [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of > buffered file > [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use > [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd > [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd > [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to > fd_close > [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd > [PATCH v2 23/41] migration.c: remove redundant line in migrate_init() > [PATCH v2 24/41] migration: export migrate_fd_completed() and > migrate_fd_cleanup() > [PATCH v2 25/41] migration: factor out parameters into MigrationParams > [PATCH v2 26/41] buffered_file: factor out buffer management logic > [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write > [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in > memory > > This is postcopy properly. From this one, postcopy needs to be the > things addressed on previous review, and from there probably (at least) > another review. Thing to have in account is that the umem (or whatever > you want to call it), should be able to work over RDMA. Anyone that > knows anything about RDMA to comment on this? > > [PATCH v2 29/41] umem.h: import Linux umem.h > [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to > update-linux-headers.sh > [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option > [PATCH v2 32/41] savevm: add new section that is used by postcopy > [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option > [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command > [PATCH v2 35/41] postcopy: introduce helper functions for postcopy > [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration > [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration > [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify > the size of prefault > [PATCH v2 39/41] postcopy/outgoing: implement prefault > [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command > [PATCH v2 41/41] migration/postcopy: add movebg mode > > Huge memory migration. > This ones should be trivial, and integrated. > > [PATCH 1/7] Add spent time for migration > [PATCH 2/7] Add tracepoints for savevm section start/end > [PATCH 3/7] No need to iterate if we already are over the limit > [PATCH 4/7] Only TCG needs TLB handling > [PATCH 5/7] Only calculate expected_time for stage 2 > > This one is also trivial, but Anthony on previous reviews wanted to have > migration-thread before we integrated this one. > > [PATCH 6/7] Exit loop if we have been there too long > > This one, Anthony wanted a different approach improving bitmap > handling. N
Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th
On Tue, Jun 19, 2012 at 11:34:42PM +0900, Takuya Yoshikawa wrote: > On Tue, 19 Jun 2012 09:01:36 -0500 > Anthony Liguori wrote: > > > I'm not at all convinced that postcopy is a good idea. There needs a clear > > expression of what the value proposition is that's backed by benchmarks. > > Those > > benchmarks need to include latency measurements of downtime which so far, > > I've > > not seen. > > > > I don't want to take any postcopy patches until this discussion happens. > > FWIW: > > I rather see postcopy as a way of migrating guests forcibly and I know > a service in which such a way is needed: emergency migration. There is > also a product which does live migration when some hardware problems are > detected (as a semi-FT solution) -- in such cases, we cannot wait until > the guest becomes calm. Ignoring max downtime values when we've determined that the target is no longer converging would be another option. Essentially having a use_strict_max_downtime that can be set on a per-migration basis, where if not set we can "give up" on maintaining the max_downtime when it's been determined that progress is no longer being made. > > Although I am not certain whether QEMU can be used for such products, > it may be worth thinking about. > > Thanks, > Takuya > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] block: Introduce __blk_segment_map_sg() helper
Hello, On Mon, Jun 18, 2012 at 7:02 PM, Asias He wrote: >> I *hope* this is a bit prettier. e.g. Do we really need to pass in >> @sglist and keep using "goto new_segment"? > > I think this deserves another patch on top of this splitting one. I'd like > to clean this up later. Yeap, doing it in a separate patch would be better. It would be great if you can include such patch in this series. :) Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: asm/kvm_para.h from asm-generic (was: Re: linux-next: triage for April 19, 2012)
> >> > >> I just noticed include/asm-generic/Kbuild.asm already had > >> > >> ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/asm/kvm_para.h \ > >> $(srctree)/include/asm-$(SRCARCH)/kvm_para.h),) > >> header-y += kvm_para.h > >> endif > >> > >> but this doesn't seem to work. > >> > >> Kbuild people: which one is correct? Any use of wildcards in include/asm-generic/Kbuild.asm is bogus. We should: 1) add explicit header-y for kvm.h + a.out.h to the relevant architectures. 2) add explicit header-y for kvm_para.h for the archictectures that needs this file 3) and drop header-y / dummy file fro the others. Anything else is just papering over the initial wrong doings. I actually coded up "header-n" support to explicitly say: do not include this file for this arch - but looking at the actual problem being solved this is just bogus so I killed it off again. Sam -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: asm/kvm_para.h from asm-generic (was: Re: linux-next: triage for April 19, 2012)
On Tue, Jun 19, 2012 at 08:48:03PM +0200, Sam Ravnborg wrote: > > >> > > >> I just noticed include/asm-generic/Kbuild.asm already had > > >> > > >> ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/asm/kvm_para.h \ > > >> $(srctree)/include/asm-$(SRCARCH)/kvm_para.h),) > > >> header-y += kvm_para.h > > >> endif > > >> > > >> but this doesn't seem to work. > > >> > > >> Kbuild people: which one is correct? > > Any use of wildcards in include/asm-generic/Kbuild.asm is bogus. > We should: > 1) add explicit header-y for kvm.h + a.out.h to the relevant architectures. > 2) add explicit header-y for kvm_para.h for the archictectures that needs > this file > 3) and drop header-y / dummy file fro the others. > > Anything else is just papering over the initial wrong doings. > > I actually coded up "header-n" support to explicitly say: > do not include this file for this arch - but looking at the > actual problem being solved this is just bogus so I killed > it off again. Just to be clear - I know this will break headers_check_all. I will try to take a look soon for a better generic solution. Sam -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] spapr_vscsi: Error handling fixes
On Tue, 2012-06-19 at 13:31 +0200, Andreas Färber wrote: > Tab alert. :) Argh :-) Gotta get an emacs config for use with qemu :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] spapr_vscsi: Error handling fixes
On 19.06.2012, at 22:18, Benjamin Herrenschmidt wrote: > On Tue, 2012-06-19 at 13:31 +0200, Andreas Färber wrote: >> Tab alert. :) > > Argh :-) Gotta get an emacs config for use with qemu :-) So do you want to resend? I can just fix it on the fly too. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Regarding improving ple handler (vcpu_on_spin)
In ple handler code, last_boosted_vcpu (lbv) variable is serving as reference point to start when we enter. lbv = kvm->lbv; for each vcpu i of kvm if i is eligible if yield_to(i) is success lbv = i currently this variable is per VM and it is set after we do yield_to(target), unfortunately it may take little longer than we expect to come back again (depending on its lag in rb tree) on successful yield and set the value. So when several ple_handle entry happens before it is set, all of them start from same place. (and overall RR is also slower). Also statistical analysis (below) is showing lbv is not very well distributed with current approach. naturally, first approach is to move lbv before yield_to, without bothering failure case to make RR fast. (was in Rik's V4 vcpu_on_spin patch series). But when I did performance analysis, in no-overcommit scenario, I saw violent/cascaded directed yield happening, leading to more wastage of cpu in spinning. (huge degradation in 1x and improvement in 3x, I assume this was the reason it was moved after yield_to in V5 of vcpu_on_spin series.) Second approach, I tried was, (1) get rid of per kvm lbv variable (2) everybody who enters handler start from a random vcpu as reference point. The above gave good distribution of starting point,(and performance improvement in 32 vcpu guest I tested) and also IMO, it scales well for larger VM's. Analysis = Four 32 vcpu guest running with one of them running kernbench. PLE handler yield stat is the statistics for successfully yielded case (for 32 vcpus) PLE handler start stat is the statistics for frequency of each vcpu index as starting point (for 32 vcpus) snapshot1 = PLE handler yield stat : 274391 33088 32554 46688 46653 48742 48055 37491 38839 31799 28974 30303 31466 45936 36208 51580 32754 53441 28956 30738 37940 37693 26183 40022 31725 41879 23443 35826 40985 30447 37352 35445 PLE handler start stat : 433590 383318 204835 169981 193508 203954 175960 139373 153835 125245 118532 140092 135732 134903 119349 149467 109871 160404 117140 120554 144715 125099 108527 125051 111416 141385 94815 138387 154710 116270 123130 173795 snapshot2 PLE handler yield stat : 1957091 59383 67866 65474 100335 77683 80958 64073 53783 44620 80131 81058 66493 56677 74222 74974 42398 132762 48982 70230 78318 65198 54446 104793 59937 57974 73367 96436 79922 59476 58835 63547 PLE handler start stat : 2555089 611546 461121 346769 435889 452398 407495 314403 354277 298006 364202 461158 344783 288263 342165 357270 270887 451660 300020 332120 378403 317848 307969 414282 351443 328501 352840 426094 375050 330016 347540 371819 So questions I have in mind is, 1. Do you think going for randomizing last_boosted_vcpu and get rid of per VM variable is better? 2. Can/Do we have a mechanism, from which we will be able to decide not to yield to vcpu who is doing frequent PLE exit (possibly because he is doing unnecessary busy-waits) OR doing yield_to better candidate? On a side note: With pv patches I have tried doing yield_to a kicked VCPU, in vcpu_block path and is giving some performance improvement. Please let me know if you have any comments/suggestions. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] spapr_vscsi: Error handling fixes
On Tue, 2012-06-19 at 22:20 +0200, Alexander Graf wrote: > So do you want to resend? I can just fix it on the fly too. If you can then sure, please do :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: handle last_boosted_vcpu = 0 case
On Wed, 20 Jun 2012 01:50:50 +0530 Raghavendra K T wrote: > > In ple handler code, last_boosted_vcpu (lbv) variable is > serving as reference point to start when we enter. > Also statistical analysis (below) is showing lbv is not very well > distributed with current approach. You are the second person to spot this bug today (yes, today). Due to time zones, the first person has not had a chance yet to test the patch below, which might fix the issue... Please let me know how it goes. 8< If last_boosted_vcpu == 0, then we fall through all test cases and may end up with all VCPUs pouncing on vcpu 0. With a large enough guest, this can result in enormous runqueue lock contention, which can prevent vcpu0 from running, leading to a livelock. Changing < to <= makes sure we properly handle that case. Signed-off-by: Rik van Riel --- virt/kvm/kvm_main.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7e14068..1da542b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me) */ for (pass = 0; pass < 2 && !yielded; pass++) { kvm_for_each_vcpu(i, vcpu, kvm) { - if (!pass && i < last_boosted_vcpu) { + if (!pass && i <= last_boosted_vcpu) { i = last_boosted_vcpu; continue; } else if (pass && i > last_boosted_vcpu) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] spapr: Add "memop" hypercall
On 19.06.2012, at 08:21, Benjamin Herrenschmidt wrote: > This adds a qemu-specific hypervisor call to the pseries machine > which allows to do what amounts to memmove, memcpy and xor over > regions of physical memory such as the framebuffer. > > This is the simplest way to get usable framebuffer speed from > SLOF since the framebuffer isn't mapped in the VRMA and so would > otherwise require an hcall per 8 bytes access. > > The performance is still not great but usable, and can be improved > with a more complex implementation of the hcall itself if needed. > > This also adds some documentation for the qemu-specific hypercalls > that we add to PAPR along with a new qemu,hypertas-functions property > that mirrors ibm,hypertas-functions and provides some discoverability > for the new calls. > > Note: I chose note to advertise H_RTAS to the guest via that mechanism. > This is done on purpose, the guest uses the normal RTAS interfaces > provided by qemu (including SLOF) which internally calls H_RTAS. > > We might in the future implement part (or even all) of RTAS inside the > guest like IBM's firmware does and replace H_RTAS with some finer grained > set of private hypercalls. > > Signed-off-by: Benjamin Herrenschmidt Thanks, applied to ppc-next. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs
On 19.06.2012, at 07:56, Benjamin Herrenschmidt wrote: > More recent Power server chips (i.e. based on the 64 bit hash MMU) > support more than just the traditional 4k and 16M page sizes. This > can get quite complicated, because which page sizes are supported, > which combinations are supported within an MMU segment and how these > page sizes are encoded both in the SLB entry and the hash PTE can vary > depending on the CPU model (they are not specified by the > architecture). In addition the firmware or hypervisor may not permit > use of certain page sizes, for various reasons. Whether various page > sizes are supported on KVM, for example, depends on whether the PR or > HV variant of KVM is in use, and on the page size of the memory > backing the guest's RAM. > > This patch adds information to the CPUState and cpu defs to describe > the supported page sizes and encodings. Since TCG does not yet > support any extended page sizes, we just set this to NULL in the > static CPU definitions, expanding this to the default 4k and 16M page > sizes when we initialize the cpu state. When using KVM, however, we > instead determine available page sizes using the new > KVM_PPC_GET_SMMU_INFO call. For old kernels without that call, we use > some defaults, with some guesswork which should do the right thing for > existing HV and PR implementations. The fallback might not be correct > for future versions, but that's ok, because they'll have > KVM_PPC_GET_SMMU_INFO. > > Signed-off-by: Benjamin Herrenschmidt > Signed-off-by: David Gibson Thanks, applied both to ppc-next. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] spapr_vscsi: Error handling fixes
On 19.06.2012, at 08:02, Benjamin Herrenschmidt wrote: > We were incorrectly g_free'ing an object that isn't allocated > in one error path and failed to release it completely in another > > This fixes qemu crashes with some cases of IO errors. > > Signed-off-by: Benjamin Herrenschmidt Thanks, applied to ppc-next. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Fixes related to processing of qemu's -numa option
From: root Changes since v1: - Use bitmap functions that are already in qemu (instead of cpu_set_t macro's) - Added a check for endvalue >= max_cpus. - Fix to address the round-robbing assignment (for the case when cpu's are not explicitly specified) Note: Continuing to use a new constant for allocation of the cpumask (max_cpus was not getting set early enough). --- v1: -- The -numa option to qemu is used to create [fake] numa nodes and expose them to the guest OS instance. There are a couple of issues with the -numa option: a) Max VCPU's that can be specified for a guest while using the qemu's -numa option is 64. Due to a typecasting issue when the number of VCPUs is > 32 the VCPUs don't show up under the specified [fake] numa nodes. b) KVM currently has support for 160VCPUs per guest. The qemu's -numa option has only support for upto 64VCPUs per guest. This patch addresses these two issues. Below are examples of (a) and (b) a) >32 VCPUs are specified with the -numa option: /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ 71:01:01 \ -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4 ... Upstream qemu : -- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 6 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0 size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3 cpus: 30 node 3 size: 131072 MB node 4 cpus: node 4 size: 131072 MB node 5 cpus: 31 node 5 size: 131072 MB With the patch applied : --- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 6 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 node 0 size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3 cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4 cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5 cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB b) >64 VCPUs specified with -numa option: /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \ -smp sockets=8,cores=10,threads=1 \ -numa node,nodeid=0,cpus=0-9,mem=64g \ -numa node,nodeid=1,cpus=10-19,mem=64g \ -numa node,nodeid=2,cpus=20-29,mem=64g \ -numa node,nodeid=3,cpus=30-39,mem=64g \ -numa node,nodeid=4,cpus=40-49,mem=64g \ -numa node,nodeid=5,cpus=50-59,mem=64g \ -numa node,nodeid=6,cpus=60-69,mem=64g \ -numa node,nodeid=7,cpus=70-79,mem=64g \ -m 524288 \ -name vm1 \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \ -drive file=/dev/libvirt_lvm/vm.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,macaddr=52:54:00:71:01:01 \ -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4 ... Upstream qemu : -- only 63 CPUs in NUMA mode supported. only 64 CPUs in NUMA mode supported. QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 8 nodes node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus: node 4 size: 65536 MB node 5 cpus: node 5 size: 65536 MB node 6 cpus: 31 63 node 6 size: 65536 MB node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7 size: 65536 MB With the patch applied : --- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 8 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 node 0 size: 65536 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3 cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75 76 77 78 79 Signed-off-by: Chegu Vinod , Jim Hull , Craig Hada --- cpus.c |3 ++- hw/pc.c |4 +++- sysemu.h |3 ++- vl.c | 48 ++-- 4 files changed, 33 insertions(+), 25 deletions(-) diff --git a/cpus.c b/cpus.c index b182b3d..89ce04d 100644 --- a/cpus.c +++ b/cpus.c @@ -36,6 +36,7 @@ #include "cpus.h" #include "qtest.h" #include "main-loop.h" +#incl
how to map a host virtual address (hva) to guest address space (gva/gpa/gfn)
Hi all, is there any existing function in KVM that can map a host virtual address (hva) to the guest address space. So that the guest VM can access it. i.e. translate hva to a gpa or gva or gfn, so the guest VM can use that converted address to access? Thanks, Sheng -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
VMM OSX port
Hey everyone, I just subscribed to this list so please forgive me if I goof something up. I was looking into the possibility of porting VMM to OSX. Is this on the todo list already? Or something that anyone else is hoping for? Since it's all written in Python (with GTK for the UI), it doesn't seem like it would be too bad of a port. The hard part is finding all of the pre-req's for OSX. Would someone that knows this code better than I, know of any walls I might run into? Of course the gnome libs won't be available but I was thinking those would get replaced with cocoa interface libs. I've never ported anything before (probably obvious :) so any tips/advice are welcome. (Yay!) python >= 2.4 pygtk2 >= 1.99.12-6 libxml2-python >= 2.6.23 (Not sure if these exist for osx) libvirt-python >= 0.4.0 dbus-python >= 0.61 vte >= 0.12.2 gtk-vnc >= 0.0.1 python-virtinst >= 0.300.0 PolicyKit >= 0.6 (Most likely don't exist for osx :) gnome-python2-gconf >= 1.99.11-7 gnome-keyring >= 0.4.9 gnome-python-desktop >= 2.15.4 -- Gene Crucean - Emmy winning - Oscar nominated VFX Supervisor / iOS-OSX Developer / Filmmaker / Photographer ** Freelance for hire ** www.genecrucean.com ~~ Please use my website's contact form on www.genecrucean.com for any personal emails. Thanks. I may not get them at this address. ~~ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to map a host virtual address (hva) to guest address space (gva/gpa/gfn)
On Tue, Jun 19, 2012 at 6:09 PM, sheng qiu wrote: > Hi all, > > is there any existing function in KVM that can map a host virtual > address (hva) to the guest address space. So that the guest VM can > access it. > i.e. translate hva to a gpa or gva or gfn, so the guest VM can use > that converted address to access? I am not aware of a function that exists in kvm, but you can write one using __gfn_to_memslot() as a reference implementation. you can check if the hva falls in a particular memslot using userspace_addr member of the memslot structure, and then calculate the corresponding gfn from that. > > Thanks, > Sheng > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
On Tue, Jun 19, 2012 at 4:49 AM, Avi Kivity wrote: > On 06/18/2012 11:56 PM, Christoffer Dall wrote: >> On Mon, Jun 18, 2012 at 9:32 AM, Avi Kivity wrote: >>> On 06/15/2012 10:08 PM, Christoffer Dall wrote: From: Christoffer Dall Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl. This ioctl is used since the sematics are in fact two lines that can be either raised or lowered on the VCPU - the IRQ and FIQ lines. KVM needs to know which VCPU it must operate on and whether the FIQ or IRQ line is raised/lowered. Hence both pieces of information is packed in the kvm_irq_level->irq field. The irq fild value will be: IRQ: vcpu_index << 1 FIQ: (vcpu_index << 1) | 1 This is documented in Documentation/kvm/api.txt. The effect of the ioctl is simply to simply raise/lower the corresponding irq_line field on the VCPU struct, which will cause the world-switch code to raise/lower virtual interrupts when running the guest on next switch. The wait_for_interrupt flag is also cleared for raised IRQs or FIQs causing an idle VCPU to become active again. CPUs in guest mode are kicked to make sure they refresh their interrupt status. >>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, + struct kvm_irq_level *irq_level) +{ + int mask; + unsigned int vcpu_idx; + struct kvm_vcpu *vcpu; + unsigned long old, new, *ptr; + + vcpu_idx = irq_level->irq >> 1; + if (vcpu_idx >= KVM_MAX_VCPUS) + return -EINVAL; + + vcpu = kvm_get_vcpu(kvm, vcpu_idx); + if (!vcpu) + return -EINVAL; + + if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE) + mask = HCR_VI; + else /* KVM_ARM_FIQ_LINE */ + mask = HCR_VF; + + trace_kvm_set_irq(irq_level->irq, irq_level->level, 0); + + ptr = (unsigned long *)&vcpu->arch.irq_lines; + do { + old = ACCESS_ONCE(*ptr); + if (irq_level->level) + new = old | mask; + else + new = old & ~mask; + + if (new == old) + return 0; /* no change */ + } while (cmpxchg(ptr, old, new) != old); >>> >>> Isn't this a complicated >>> >>> >>> if (level) >>> set_bit() >>> else >>> clear_bit() >>> >>> ? >>> >> >> we need to atomically know if we changed either the FIQ/IRQ fields and >> atomically update without locking. > > So use test_and_set_bit()/test_and_clear_bit(). > >> (I think you suggested this >> approach, in fact). > > I think so too, but it only makes sense if you need to consider both > fields simultaneously (which you don't here). For example, if IRQ was > asserted while FIQ was already raised, you don't need to kick. But > that's not the case according to the below. > ok, fair enough. I simplified it a bit. >> + + /* + * The vcpu irq_lines field was updated, wake up sleeping VCPUs and + * trigger a world-switch round on the running physical CPU to set the + * virtual IRQ/FIQ fields in the HCR appropriately. + */ + kvm_vcpu_kick(vcpu); >>> >>> No need to wake when the line is asserted so you can make this >>> conditional on level. >>> >> >> we need to trigger a world switch to update the virtualized register >> from the actual running physical CPU if we changed any of the IRQ/FIQ >> fields. We return in the loop above if we didn't change anything. > > Okay. > > >> + + return 0; +} + long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - return -EINVAL; + struct kvm *kvm = filp->private_data; + void __user *argp = (void __user *)arg; + + switch (ioctl) { + case KVM_IRQ_LINE: { + struct kvm_irq_level irq_event; + + if (copy_from_user(&irq_event, argp, sizeof irq_event)) + return -EFAULT; + return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event); + } + default: + return -EINVAL; + } } >>> >>> Should be in common code guarded by the define introduced previously. >>> >>> >> >> you mean like this: ? >> >> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index a9f209b..5bf2193 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -535,8 +535,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcp
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity wrote: > On 06/19/2012 01:05 AM, Christoffer Dall wrote: >>> Premature, but this is sad. I suggest you split vmid generation from >>> next available vmid. This allows you to make the generation counter >>> atomic so it may be read outside the lock. >>> >>> You can do >>> >>> if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen))) >>> return; >>> >>> spin_lock(... >>> >> >> I knew you were going to say something here :), please take a look at >> this and see if you agree: > > It looks reasonable wrt locking. > >> + >> + /* First user of a new VMID generation? */ >> + if (unlikely(kvm_next_vmid == 0)) { >> + atomic64_inc(&kvm_vmid_gen); >> + kvm_next_vmid = 1; >> + >> + /* This does nothing on UP */ >> + smp_call_function(reset_vm_context, NULL, 1); > > Does this imply a memory barrier? If not, smp_mb__after_atomic_inc(). > yes, it implies a memory barrier. >> + >> + /* >> + * On SMP we know no other CPUs can use this CPU's or >> + * each other's VMID since the kvm_vmid_lock blocks >> + * them from reentry to the guest. >> + */ >> + >> + reset_vm_context(NULL); > > These two lines can be folded as on_each_cpu(). > > Don't you have a race here where you can increment the generation just > before guest entry? I don't think I do. > > cpu0 cpu1 > (vmid=0, gen=1) (gen=0) > -- -- > gen == global_gen, return > > gen != global_gen > increment global_gen > smp_call_function > reset_vm_context > vmid=0 > > enter with vmid=0 enter with vmid=0 I can't see how the scenario above can happen. First, no-one can run with vmid 0 - it is reserved for the host. If we bump global_gen on cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure that after this call, all cpus(N-1) potentially being inside a VM will have exited, called reset_vm_context, but before they can re-enter into the guest, they will call update_vttbr, and if their generation counter differs from global_gen, they will try to grab that spinlock and everything should happen in order. > > You must recheck gen after disabling interrupts to ensure global_gen > didn't bump after update_vttbr but before entry. x86 has a lot of this, > see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to > your case but may come in useful later). > >> + +/** + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code + * @vcpu: The VCPU pointer + * @run: The kvm_run structure pointer used for userspace state exchange + * + * This function is called through the VCPU_RUN ioctl called from user space. It + * will execute VM code in a loop until the time slice for the process is used + * or some emulation is needed from user space in which case the function will + * return with return value 0 and with the kvm_run structure filled in with the + * required data for the requested emulation. + */ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) { - return -EINVAL; + int ret = 0; + sigset_t sigsaved; + + if (vcpu->sigset_active) + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); + + run->exit_reason = KVM_EXIT_UNKNOWN; + while (run->exit_reason == KVM_EXIT_UNKNOWN) { >>> >>> It's not a good idea to read stuff from run unless it's part of the ABI, >>> since userspace can play with it while you're reading it. It's harmless >>> here but in other places it can lead to a vulnerability. >>> >> >> ok, so in this case, userspace can 'suppress' an MMIO or interrupt >> window operation. >> >> I can change to keep some local variable to maintain the state and >> have some API convention for emulation functions, if you feel strongly >> about it, but otherwise it feels to me like the code will be more >> complicated. Do you have other ideas? > > x86 uses: > > 0 - return to userspace (run prepared) > 1 - return to guest (run untouched) > -ESOMETHING - return to userspace > yeah, we can do that I guess, that's fair. > as return values from handlers and for locals (usually named 'r'). > > > -- > error compiling committee.c: too many arguments to function > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall wrote: > On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity wrote: >> On 06/19/2012 01:05 AM, Christoffer Dall wrote: Premature, but this is sad. I suggest you split vmid generation from next available vmid. This allows you to make the generation counter atomic so it may be read outside the lock. You can do if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen))) return; spin_lock(... >>> >>> I knew you were going to say something here :), please take a look at >>> this and see if you agree: >> >> It looks reasonable wrt locking. >> >>> + >>> + /* First user of a new VMID generation? */ >>> + if (unlikely(kvm_next_vmid == 0)) { >>> + atomic64_inc(&kvm_vmid_gen); >>> + kvm_next_vmid = 1; >>> + >>> + /* This does nothing on UP */ >>> + smp_call_function(reset_vm_context, NULL, 1); >> >> Does this imply a memory barrier? If not, smp_mb__after_atomic_inc(). >> > > yes, it implies a memory barrier. > >>> + >>> + /* >>> + * On SMP we know no other CPUs can use this CPU's or >>> + * each other's VMID since the kvm_vmid_lock blocks >>> + * them from reentry to the guest. >>> + */ >>> + >>> + reset_vm_context(NULL); >> >> These two lines can be folded as on_each_cpu(). >> >> Don't you have a race here where you can increment the generation just >> before guest entry? > > I don't think I do. > uh, strike that, I do. >> >> cpu0 cpu1 >> (vmid=0, gen=1) (gen=0) >> -- -- >> gen == global_gen, return >> >> gen != global_gen >> increment global_gen >> smp_call_function >> reset_vm_context >> vmid=0 >> >> enter with vmid=0 enter with vmid=0 > > I can't see how the scenario above can happen. First, no-one can run > with vmid 0 - it is reserved for the host. If we bump global_gen on > cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure > that after this call, all cpus(N-1) potentially being inside a VM will > have exited, called reset_vm_context, but before they can re-enter > into the guest, they will call update_vttbr, and if their generation > counter differs from global_gen, they will try to grab that spinlock > and everything should happen in order. > the whole vmid=0 confused me a bit. The point is since we moved the generation check outside the spin_lock we have to re-check, I see: diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 19fe3b0..74760af 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -313,6 +313,24 @@ static void reset_vm_context(void *info) } /** + * check_new_vmid_gen - check that the VMID is still valid + * @kvm: The VM's VMID to checkt + * + * return true if there is a new generation of VMIDs being used + * + * The hardware supports only 256 values with the value zero reserved for the + * host, so we check if an assigned value belongs to a previous generation, + * which which requires us to assign a new value. If we're the first to use a + * VMID for the new generation, we must flush necessary caches and TLBs on all + * CPUs. + */ +static bool check_new_vmid_gen(struct kvm *kvm) +{ + if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen))) + return; +} + +/** * update_vttbr - Update the VTTBR with a valid VMID before the guest runs * @kvmThe guest that we are about to run * @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm) { phys_addr_t pgd_phys; - /* -* Check that the VMID is still valid. -* (The hardware supports only 256 values with the value zero -* reserved for the host, so we check if an assigned value belongs to -* a previous generation, which which requires us to assign a new -* value. If we're the first to use a VMID for the new generation, -* we must flush necessary caches and TLBs on all CPUs.) -*/ - if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen))) + if (!check_new_vmid_gen(kvm)) return; spin_lock(&kvm_vmid_lock); @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) */ preempt_disable(); local_irq_disable(); + + if (check_new_vmid_gen(kvm)) { + local_irq_enable(); + preempt_enable(); + continue; + } + kvm_guest_enter(); vcpu->mode = IN_GUEST_MODE; >> >> You must recheck gen after disabling interrupts to ensure global_gen >> didn't bump after update_vttbr but before entry. x86 has a lot of this, >> see vcpu_ente
Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
On 06/19/2012 02:21 PM, Dor Laor wrote: On 06/19/2012 05:51 AM, Asias He wrote: On 06/18/2012 07:39 PM, Sasha Levin wrote: On Mon, 2012-06-18 at 14:14 +0300, Dor Laor wrote: On 06/18/2012 01:05 PM, Rusty Russell wrote: On Mon, 18 Jun 2012 16:03:23 +0800, Asias He wrote: On 06/18/2012 03:46 PM, Rusty Russell wrote: On Mon, 18 Jun 2012 14:53:10 +0800, Asias He wrote: This patch introduces bio-based IO path for virtio-blk. Why make it optional? request-based IO path is useful for users who do not want to bypass the IO scheduler in guest kernel, e.g. users using spinning disk. For users using fast disk device, e.g. SSD device, they can use bio-based IO path. Users using a spinning disk still get IO scheduling in the host though. What benefit is there in doing it in the guest as well? The io scheduler waits for requests to merge and thus batch IOs together. It's not important w.r.t spinning disks since the host can do it but it causes much less vmexits which is the key issue for VMs. Is the amount of exits caused by virtio-blk significant at all with EVENT_IDX? Yes. EVENT_IDX saves the number of notify and interrupt. Let's take the interrupt as an example, The guest fires 200K request to host, the number of interrupt is about 6K thanks to EVENT_IDX. The ratio is 200K / 6K = 33. The ratio of merging is 4K / 200K = 20. In this case, why don't you always recommend bio over request based? This case shows that IO scheduler's merging in guest saves a lot of requests to host side. Why should I recommend bio over request based here? -- Asias -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Implementing a Virtio host module for 802.11 media
Greetings! I want to develop a Virtio host module to emulate 802.11 wireless networks. Here's the motivation: I am a PhD student working in wireless networks. I've been using a cluster of 32 KVM guest nodes to emulate wireless nodes and testing mobile Ad-hoc routing protocols. I use "iptables" MAC address filtering to emulate the physical isolation of wireless nodes caused by distance, and use "tc" command to control the bandwidth. However, nodes within communication range share the wireless media, which means they can interfere with each other and there are packets collisions, as a result, the bandwidth of each node is also affected by nodes around it. I find that no matter how I use the "tc" command to do the traffic control, it's always hard to approach the real media sharing model in 802.11 wireless networks. I have to synchronize not only virtual location, but also real-time network usage of each node. Even if synchronizing these, it's still only approximate, and it's ugly. So I begin to believe that I'm trying to solve the problem in a wrong way. I should probably emulate these in KVM host. I want to emulate the real world wireless network, where interference and packets collision can take place just like the real wireless communication. The model does not have to very precisely describe the real media, but should consider those factors affecting the wireless bandwidth and delay. I have some Linux development and network experience, but I'm quite new to the community. I don't really have much kernel development experience but I'll be pleased to learn. Do you guys have any advices about where I can get started? Any suggestion ( or collaboration if anybody wants ) will be appreciated. Thanks! Kind regards, -- Song Gao https://profiles.google.com/song.gao.beta/about -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html