Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/06/2011 04:42 AM, Ben Hutchings wrote:
> On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
>> In order to let the packets of a flow be passed to the desired guest
>> cpu, we can co-operate with devices by programming the flow director,
>> which is just a hash-to-queue table.
>>
>> This kind of co-operation is done through the accelerated RFS support:
>> a device-specific flow steering method, virtnet_fd(), is used to
>> modify the flow director based on the RFS mapping. The desired queue
>> is calculated through a reverse mapping of the irq affinity table. In
>> order to parallelize the ingress path, the irq affinity of each rx
>> queue is also provided by the driver.
>>
>> In addition to accelerated RFS, we can also use the guest scheduler to
>> balance the TX load and reduce lock contention on the egress path, so
>> smp_processor_id() is used for tx queue selection.
> [...]
>> +#ifdef CONFIG_RFS_ACCEL
>> +
>> +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
>> +	       u16 rxq_index, u32 flow_id)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(net_dev);
>> +	u16 *table = NULL;
>> +
>> +	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
>> +		return -EPROTONOSUPPORT;
>
> Why only IPv4?

Oops, IPv6 should work also.

>> +	table = kmap_atomic(vi->fd_page);
>> +	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
>> +	kunmap_atomic(table);
>> +
>> +	return 0;
>> +}
>> +#endif
>
> This is not a proper implementation of ndo_rx_flow_steer. If you steer
> a flow by changing the RSS table this can easily cause packet
> reordering in other flows. The filtering should be more precise,
> ideally matching exactly a single flow by e.g. VID and IP 5-tuple.
>
> I think you need to add a second hash table which records exactly
> which flow is supposed to be steered. Also, you must call
> rps_may_expire_flow() to check whether an entry in this table may be
> replaced; otherwise you can cause packet reordering in the flow that
> was previously being steered.
>
> Finally, this function must return the table index it assigned, so
> that rps_may_expire_flow() works.

Thanks for the explanation. How about documenting this briefly in
scaling.txt?

>> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>> +{
>> +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>> +		  smp_processor_id();
>> +
>> +	/* As we make use of the accelerate rfs which let the scheduler to
>> +	 * balance the load, it make sense to choose the tx queue also based on
>> +	 * the processor id?
>> +	 */
>> +	while (unlikely(txq >= dev->real_num_tx_queues))
>> +		txq -= dev->real_num_tx_queues;
>> +	return txq;
>> +}
> [...]
>
> Don't do this, let XPS handle it.
>
> Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
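Here is a minimal user-space sketch of the exact-match scheme Ben describes. It is for illustration only. struct flow_key, struct steer_entry, steer_flow() and expire_never() are hypothetical names. The key is simplified to VLAN ID plus IPv4 5-tuple, and the may_expire callback only stands in for rps_may_expire_flow(); it is not its real signature. The sketch shows three things:

- Each slot remembers the complete flow it steers.
- A different flow may take over the slot only when the expiry check agrees.
- The slot index is returned, so it can later be handed back to that check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exact-match key: VLAN ID plus IPv4 5-tuple. */
struct flow_key {
	uint16_t vid;
	uint8_t proto;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

#define STEER_TABLE_SIZE 64	/* power of two; size chosen arbitrarily */

struct steer_entry {
	bool in_use;
	struct flow_key key;	/* the one flow this slot steers */
	uint32_t flow_id;	/* RFS flow id, handed back to the expiry check */
	uint16_t rxq;
};

/* Stand-in for rps_may_expire_flow(): may the flow in this slot be evicted? */
typedef bool (*may_expire_fn)(uint16_t rxq, uint32_t flow_id, int filter_id);

/* Most conservative policy: never evict a flow that is still steered. */
static bool expire_never(uint16_t rxq, uint32_t flow_id, int filter_id)
{
	(void)rxq;
	(void)flow_id;
	(void)filter_id;
	return false;
}

static bool flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	/* Field by field, so struct padding never affects the result. */
	return a->vid == b->vid && a->proto == b->proto &&
	       a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/*
 * Steer exactly one flow to rxq.  Returns the slot used (the "filter id"),
 * or -EBUSY if the slot belongs to another flow that may not expire yet.
 */
static int steer_flow(struct steer_entry *table, const struct flow_key *key,
		      uint32_t hash, uint32_t flow_id, uint16_t rxq,
		      may_expire_fn may_expire)
{
	int idx = (int)(hash & (STEER_TABLE_SIZE - 1));
	struct steer_entry *e = &table[idx];

	if (e->in_use && !flow_key_equal(&e->key, key) &&
	    !may_expire(e->rxq, e->flow_id, idx))
		return -EBUSY;	/* evicting now could reorder that flow */

	e->in_use = true;
	e->key = *key;
	e->flow_id = flow_id;
	e->rxq = rxq;
	return idx;
}
```

Re-steering the same flow just updates its slot. A colliding flow gets -EBUSY instead of silently moving the flow already steered there. With a whole-bucket RSS table, both flows would have moved together.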
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support
On 12/06/2011 04:09 AM, Ben Hutchings wrote:
> On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:
>> This patch adds a simple flow director to the tun/tap device. It is
>> just a page that contains the hash-to-queue mapping, which can be
>> changed by user space. The backend (tap/macvtap) queries this table to
>> get the desired queue of a packet when it sends packets to user space.
>
> This is just flow hashing (RSS), not flow steering.
>
>> The page address is set through a new kind of ioctl, TUNSETFD, and is
>> pinned until the device exits or another new page is specified.
> [...]
>
> You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.
>
> Ben.

I don't fully understand this. The page belongs to the guest, and the
idea is to let the guest driver easily change any entry. If
ethtool_set_rxfh_indir() is used, this kind of change is not easy, as it
needs one copy and can only accept the whole table as its parameter.
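To compare the two interfaces, it helps to see what a hash-to-queue (RSS indirection) table does. In the sketch below, the names rss_indir, rss_select_queue(), rss_init_default() and rss_set_table() and the 256-entry size are hypothetical. rss_set_table() only imitates the whole-table semantics Jason describes for ETHTOOL_SRXFHINDIR; it is not the ethtool code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 256	/* hypothetical table size; power of two */

struct rss_indir {
	uint16_t nqueues;
	uint16_t table[INDIR_SIZE];	/* hash bucket -> rx queue */
};

/* RSS lookup: every flow whose hash lands in a bucket shares its queue. */
static uint16_t rss_select_queue(const struct rss_indir *r, uint32_t hash)
{
	return r->table[hash & (INDIR_SIZE - 1)];
}

/* Default spread: buckets assigned to queues round-robin. */
static void rss_init_default(struct rss_indir *r, uint16_t nqueues)
{
	assert(nqueues > 0);
	r->nqueues = nqueues;
	for (size_t i = 0; i < INDIR_SIZE; i++)
		r->table[i] = (uint16_t)(i % nqueues);
}

/* Whole-table replacement: validate everything before touching the table. */
static int rss_set_table(struct rss_indir *r, const uint16_t *new_table,
			 size_t n)
{
	if (n != INDIR_SIZE)
		return -EINVAL;
	for (size_t i = 0; i < n; i++)
		if (new_table[i] >= r->nqueues)
			return -EINVAL;
	for (size_t i = 0; i < n; i++)
		r->table[i] = new_table[i];
	return 0;
}
```

Rewriting one bucket moves every flow that hashes into it, not only the flow of interest. That is why Ben calls this flow hashing rather than flow steering.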
Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On 12/06/2011 08:57 AM, Konrad Rzeszutek Wilk wrote:
>> +static inline void add_stats(enum kvm_contention_stat var, int val)
>
> You probably want 'int val' to be 'u32 val' as that is the type in
> contention_stats.

Yes, thanks for pointing that out, since it is cumulative. It is indeed
u32 in the #else branch :). I'll change that.
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On Mon, Dec 5, 2011 at 4:41 PM, Gleb Natapov wrote:
> On Mon, Dec 05, 2011 at 01:39:37PM +0800, Liu ping fan wrote:
>> On Sun, Dec 4, 2011 at 8:10 PM, Gleb Natapov wrote:
>> > On Sun, Dec 04, 2011 at 07:53:37PM +0800, Liu ping fan wrote:
>> >> On Sat, Dec 3, 2011 at 2:26 AM, Jan Kiszka wrote:
>> >> > On 2011-12-02 07:26, Liu Ping Fan wrote:
>> >> >> From: Liu Ping Fan
>> >> >>
>> >> >> Currently, vcpu can be destructed only when kvm instance destroyed.
>> >> >> Change this to vcpu's destruction taken when its refcnt is zero,
>> >> >> and then vcpu MUST and CAN be destroyed before kvm's destroy.
>> >> >
>> >> > I'm lacking the big picture yet (would be good to have in the change log
>> >> > - at least I'm too lazy to read the code):
>> >> >
>> >> > What increments the refcnt, what decrements it again? IOW, how does user
>> >> > space controls the life-cycle of a vcpu after your changes?
>> >> >
>> >> In local APIC mode, delivering IPI to target APIC, target's refcnt is
>> >> incremented, and decremented when finished. At other times, using RCU to
>> > Why is this needed?
>> >
>> Suppose the following scene:
>>
>> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
>> 	for (idx = 0; \
>> 	     idx < atomic_read(&kvm->online_vcpus) && \
>> 	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>> 	     idx++)
>>
>> -->
>> Here kvm_vcpu's destruction is called
>> 	vcpup->vcpu_id ...  //oops!
>>
>>
> And this is exactly how your code looks. i.e you do not increment
> reference count in most of the loops, you only increment it twice
> (in pic_unlock() and kvm_irq_delivery_to_apic()) because you are using
> vcpu outside of rcu_read_lock() protected section and I do not see why
> not just extend protected section to include kvm_vcpu_kick(). As far as
> I can see this function does not sleep.
>
:-) I just want to minimize the RCU critical section, and as you say, we
can extend the protected section to include kvm_vcpu_kick().

> What should protect vcpu from disappearing in your example above is RCU
> itself if you are using it right. But since I do not see any calls to
> rcu_assign_pointer()/rcu_dereference() I doubt you are using it right
> actually.
>
Sorry, but I don't think that is the case. Please help me check my
reasoning:

struct kvm_vcpu *kvm_vcpu_get(struct kvm_vcpu *vcpu)
{
	if (vcpu == NULL)
		return NULL;
	if (atomic_add_unless(&vcpu->refcount, 1, 0))	/* increment */
		return vcpu;
	return NULL;
}

void kvm_vcpu_put(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm;

	if (atomic_dec_and_test(&vcpu->refcount)) {	/* decrement */
		kvm = vcpu->kvm;
		mutex_lock(&kvm->lock);
		kvm->vcpus[vcpu->vcpu_id] = NULL;
		atomic_dec(&kvm->online_vcpus);
		mutex_unlock(&kvm->lock);
		call_rcu(&vcpu->head, kvm_vcpu_zap);
	}
}

The atomic increment and decrement are kept consistent by the
cache-coherence protocol. So once we hold a valid kvm_vcpu pointer
obtained through kvm_vcpu_get(), it stays valid until we release it;
only after that can the destruction happen.

Thanks and regards,
ping fan

> --
> Gleb.
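kvm_vcpu_get() above relies on "increment unless zero" semantics, i.e. atomic_add_unless(&refcount, 1, 0). The sketch below models that idiom with C11 atomics. The struct obj type is a hypothetical stand-in for struct kvm_vcpu.

It helps to be clear about what the idiom guarantees:

- Once obj_tryget() succeeds, the object stays alive until the matching obj_put().
- It does not make the memory holding the counter valid at the moment obj_tryget() is called.

In the kernel, the gap between loading the pointer and taking the reference is covered by an rcu_read_lock() section plus a call_rcu()-deferred free. That is the point Gleb raises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm_vcpu and its refcount. */
struct obj {
	atomic_int refcount;
	bool freed;	/* set when the last reference is dropped */
};

/*
 * Take a reference only while the object is live (count != 0), i.e. the
 * semantics of atomic_add_unless(&o->refcount, 1, 0).
 */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;	/* already dying: never resurrect it */
}

/* Drop a reference; whoever drops the last one tears the object down. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		o->freed = true;	/* the patch defers the free with call_rcu() */
}
```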
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
>> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	struct virtio_device *vdev = vi->vdev;
>> +
>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
>> +		vdev->config->set(vdev,
>> +				  offsetof(struct virtio_net_config_fd, addr),
>> +				  &pfn, sizeof(u32));
>
> Please use the virtio model (i.e. virtqueues) instead of shared memory.
> Mapping a page breaks the virtio abstraction.

Using the control virtqueue is more suitable, but there are also some
problems:

One problem is the interface. If we use the control virtqueue, we need
an interface between the backend and tap/macvtap to change the flow
mapping. But qemu and vhost_net only know about the file descriptor, so
more information or interfaces need to be exposed in order to let
ethtool or an ioctl work.

Another problem is the delay introduced by the control vq. It would be
used in the critical path in the guest, and it busy-waits for the
response, so the delay is not negligible.

> Stefan
[PATCH 13/13] KVM: PPC: Allow for read-only pages backing a Book3S HV guest
With this, if a guest does an H_ENTER with a read/write HPTE on a page
which is currently read-only, we make the actual HPTE inserted be a
read-only version of the HPTE.  We now intercept protection faults as
well as HPTE not found faults, and for a protection fault we work out
whether it should be reflected to the guest (e.g. because the guest HPTE
didn't allow write access to usermode) or handled by switching to kernel
context and calling kvmppc_book3s_hv_page_fault, which will then request
write access to the page and update the actual HPTE.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   20 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |   33 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   32 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |    4 +-
 4 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 75a1b42..37755d0 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -115,6 +115,22 @@ static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
 }
 
+static inline int hpte_is_writable(unsigned long ptel)
+{
+	unsigned long pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);
+
+	return pp != PP_RXRX && pp != PP_RXXX;
+}
+
+static inline unsigned long hpte_make_readonly(unsigned long ptel)
+{
+	if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
+		ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
+	else
+		ptel |= PP_RXRX;
+	return ptel;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 {
 	unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -134,7 +150,7 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
  * Lock and read a linux PTE.  If it's present and writable, atomically
  * set dirty and referenced bits and return the PTE, otherwise return 0.
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
 {
 	pte_t pte, tmp;
 
@@ -152,7 +168,7 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
 
 	if (pte_present(pte)) {
 		pte = pte_mkyoung(pte);
-		if (pte_write(pte))
+		if (writing && pte_write(pte))
 			pte = pte_mkdirty(pte);
 	}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6919d99..b1b31c7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -502,6 +502,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	struct page *page, *pages[1];
 	long index, ret, npages;
 	unsigned long is_io;
+	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 
 	/*
@@ -552,8 +553,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	pfn = 0;
 	page = NULL;
 	pte_size = PAGE_SIZE;
+	writing = (dsisr & DSISR_ISSTORE) != 0;
+	/* If writing != 0, then the HPTE must allow writing, if we get here */
+	write_ok = writing;
 	hva = gfn_to_hva_memslot(memslot, gfn);
-	npages = get_user_pages_fast(hva, 1, 1, pages);
+	npages = get_user_pages_fast(hva, 1, writing, pages);
 	if (npages < 1) {
 		/* Check if it's an I/O mapping */
 		down_read(&current->mm->mmap_sem);
@@ -564,6 +568,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 				((hva - vma->vm_start) >> PAGE_SHIFT);
 			pte_size = psize;
 			is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+			write_ok = vma->vm_flags & VM_WRITE;
 		}
 		up_read(&current->mm->mmap_sem);
 		if (!pfn)
@@ -574,6 +579,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		page = compound_head(page);
 		pte_size <<= compound_order(page);
 	}
+	/* if the guest wants write access, see if that is OK */
+	if (!writing && hpte_is_writable(hpte[2])) {
+		pte_t *ptep, pte;
+
+		ptep = find_linux_pte_or_hugepte(current->mm->pgd,
+						 hva, NULL);
+		if (ptep && pte_present(*ptep)) {
+			pte = kvmppc_read_update_linux_pte(ptep, 1);
+			if (pte_write(pte))
+				write_ok = 1;
+		}
+	}
 	pfn = page_to_pfn(pa
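The read-only downgrade in hpte_make_readonly() is easier to follow with the PP encodings spelled out. The constant values below are assumptions based on the powerpc mmu-hash64.h definitions; PP_RXXX is taken to be HPTE_R_PP0 | 2. hpte_for_host_page() is a hypothetical helper. It only illustrates the H_ENTER case from the commit message, where the guest asks for a read/write HPTE on a page that is currently read-only in the host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on arch/powerpc/include/asm/mmu-hash64.h. */
#define HPTE_R_PP0	0x8000000000000000ULL
#define HPTE_R_PP	0x0000000000000003ULL
#define PP_RWXX	0ULL			/* supervisor read/write, user none */
#define PP_RWRX	1ULL			/* supervisor read/write, user read */
#define PP_RWRW	2ULL			/* supervisor read/write, user read/write */
#define PP_RXRX	3ULL			/* supervisor read, user read */
#define PP_RXXX	(HPTE_R_PP0 | 2ULL)	/* supervisor read, user none */

/* Same logic as the patch's hpte_is_writable(). */
static int hpte_is_writable(uint64_t ptel)
{
	uint64_t pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);

	return pp != PP_RXRX && pp != PP_RXXX;
}

/*
 * Same logic as the patch's hpte_make_readonly(): entries with PP0 set or
 * with PP_RWXX become PP_RXXX; all others become PP_RXRX.
 */
static uint64_t hpte_make_readonly(uint64_t ptel)
{
	if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
		ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
	else
		ptel |= PP_RXRX;
	return ptel;
}

/*
 * Hypothetical helper: the HPTE actually inserted for a guest H_ENTER,
 * downgraded when the backing host page is currently read-only.
 */
static uint64_t hpte_for_host_page(uint64_t guest_ptel, int host_writable)
{
	if (!host_writable && hpte_is_writable(guest_ptel))
		return hpte_make_readonly(guest_ptel);
	return guest_ptel;
}
```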
[PATCH 12/13] KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. 
Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s.h    |    4 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   31
 arch/powerpc/include/asm/kvm_host.h      |   16 ++
 arch/powerpc/include/asm/reg.h           |    3 +
 arch/powerpc/kvm/Kconfig                 |    1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  268 --
 arch/powerpc/kvm/book3s_hv.c             |   25 ++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  140
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   49 ++
 arch/powerpc/kvm/powerpc.c               |    3 +
 arch/powerpc/mm/hugetlbpage.c            |    2 +
 11 files changed, 483 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5ac53f9..72688d8 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -145,6 +145,10 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
 extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+			unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
+			unsigned long pte_index);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
 			unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9a59b6d..75a1b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -130,6 +130,37 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 	return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
 }
 
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+{
+	pte_t pte, tmp;
+
+	/* wait until _PAGE_BUSY is clear then set it atomically */
+	__asm__ __volatile__ (
+		"1:	ldarx	%0,0,%3\n"
+		"	andi.	%1,%0,%4\n"
+		"	bne-	1b\n"
+		"	ori	%1,%0,%4\n"
+		"	stdcx.	%1,0,%3\n"
+		"	bne-	1b"
+		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
+		: "r" (p), "i" (_PAGE_BUSY)
+		: "cc");
+
+	if (pte_present(pte)) {
+		pte = pte_mkyoung(pte);
+		if (pte_write(pte))
+			pte = pte_mkdirty(pte);
+	}
+
+	*p = pte;	/* clears _PAGE_BUSY */
+
+	return pte;
+}
+
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c9c92f0..eb20ddc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
@@ -43,6 +44,19 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #endif
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+#include
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+
+struct kvm;
+extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+extern void kvm_set_spte_hva(struct kv
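The ldarx/stdcx. loop in kvmppc_read_update_linux_pte() is a small bit-lock. It spins while _PAGE_BUSY is set, sets the bit atomically, updates the PTE, then stores it back, which also clears the bit. Below is a portable C11 sketch of the same protocol. The PG_* bit values are made up for the example; the real _PAGE_* values differ. The `writing` argument reflects the later change in patch 13/13, which only sets the dirty bit for write faults.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Made-up bit layout for the sketch; the real _PAGE_* values differ. */
#define PG_PRESENT	0x01ULL
#define PG_BUSY		0x02ULL
#define PG_RW		0x04ULL
#define PG_ACCESSED	0x08ULL
#define PG_DIRTY	0x10ULL

/*
 * Wait until BUSY is clear and set it atomically (the ldarx/andi./stdcx.
 * loop), update the referenced/dirty bits, then store the PTE back, which
 * also clears BUSY.  Returns the updated PTE.
 */
static uint64_t read_update_pte(_Atomic uint64_t *p, int writing)
{
	uint64_t pte = atomic_load_explicit(p, memory_order_relaxed);

	for (;;) {
		if (pte & PG_BUSY) {	/* someone else holds it: reload */
			pte = atomic_load_explicit(p, memory_order_relaxed);
			continue;
		}
		/* Weak CAS may fail spuriously, like stdcx.; just retry. */
		if (atomic_compare_exchange_weak_explicit(p, &pte, pte | PG_BUSY,
				memory_order_acquire, memory_order_relaxed))
			break;		/* we own the PTE; pte has BUSY clear */
	}

	if (pte & PG_PRESENT) {
		pte |= PG_ACCESSED;
		if (writing && (pte & PG_RW))
			pte |= PG_DIRTY;
	}
	atomic_store_explicit(p, pte, memory_order_release);	/* drops BUSY */
	return pte;
}
```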
[PATCH 07/13] KVM: PPC: Allow use of small pages to back Book3S HV guests
This relaxes the requirement that the guest memory be provided as
16MB huge pages, allowing it to be provided as normal memory, i.e.
in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
the kvm->arch.slot_phys[] arrays with a small page index, even if
huge pages are being used, and use the low-order 5 bits of each
entry to store the order of the enclosing page with respect to
normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s_64.h |    8 ++
 arch/powerpc/include/asm/kvm_host.h      |    3 +-
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +-
 arch/powerpc/include/asm/reg.h           |    1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  122 --
 arch/powerpc/kvm/book3s_hv.c             |   57 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |    6 +-
 7 files changed, 130 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index ab6772e..d55e6b4 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,4 +107,12 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 	return 0;				/* error */
 }
 
+static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
+				   unsigned long pagesize)
+{
+	unsigned long mask = (pagesize >> PAGE_SHIFT) - 1;
+
+	return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
+}
+
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2a52bdb..ba1da85 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -176,14 +176,13 @@ struct revmap_entry {
 };
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
+#define KVMPPC_PAGE_ORDER_MASK	0x1f
 #define KVMPPC_GOT_PAGE		0x80
 
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	unsigned long hpt_virt;
 	struct revmap_entry *revmap;
-	unsigned long ram_psize;
-	unsigned long ram_porder;
 	unsigned int lpid;
 	unsigned int host_lpid;
 	unsigned long host_lpcr;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 111e1b4..a61b5b5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -122,7 +122,7 @@ extern void kvmppc_free_hpt(struct kvm *kvm);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
-			struct kvm_memory_slot *memslot);
+			struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 559da19..4599d12 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -237,6 +237,7 @@
 #define LPCR_ISL	(1ul << (63-2))
 #define LPCR_VC_SH	(63-2)
 #define LPCR_DPFD_SH	(63-11)
+#define LPCR_VRMASD	(0x1ful << (63-16))
 #define LPCR_VRMA_L	(1ul << (63-12))
 #define LPCR_VRMA_LP0	(1ul << (63-15))
 #define LPCR_VRMA_LP1	(1ul << (63-16))
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 87016cc..cc18f3d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -34,8 +34,6 @@
 #include
 #include
 
-/* Pages in the VRMA are 16MB pages */
-#define VRMA_PAGE_ORDER	24
 #define VRMA_VSID	0x1ffffffUL	/* 1TB VSID reserved for VRMA */
 
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
@@ -95,17 +93,31 @@ void kvmppc_free_hpt(struct kvm *kvm)
 	free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
 }
 
-void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot)
+/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
+{
+	return (pgsize > 0x1000) ? HPTE_V_LARGE : 0;
+}
+
+/* Bits in second HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte1_pgsize_encoding(unsigned long pgsize)
+{
+	return (pgsize == 0x10000) ? 0x1000 : 0;
+}
+
+void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
+		     unsigned long porder)
 {
-	struct kvm *kvm = vcpu->kvm;
 	unsigned long i;
 	unsigned long npages;
 	unsigned long hp_v, hp_r;
 	unsigned long addr, hash;
-	unsigned long porder = kvm->arch.ram_porder;
+	unsigned long psize;
+	unsigned long hp0, hp1;
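The slot_phys[] scheme from the commit message can be sketched in user-space C. slot_is_aligned() follows the patch. The packing in slot_phys_entry() is only one plausible layout consistent with the description: a page-aligned address, the KVMPPC_GOT_PAGE flag, and the order in the low five bits. The 4k PAGE_SHIFT, struct memslot and the helper names are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT		12	/* assumed 4k base pages for the sketch */
#define KVMPPC_PAGE_ORDER_MASK	0x1fULL	/* value from the patch */
#define KVMPPC_GOT_PAGE		0x80ULL	/* value from the patch */

/* Hypothetical mini memslot: only the fields slot_is_aligned() reads. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

/* Can this slot be backed by pages of size pagesize?  (logic as in the patch) */
static bool slot_is_aligned(const struct memslot *s, uint64_t pagesize)
{
	uint64_t mask = (pagesize >> PAGE_SHIFT) - 1;

	return !(s->base_gfn & mask) && !(s->npages & mask);
}

/* log2(enclosing_page_size / PAGE_SIZE) */
static unsigned int page_order(uint64_t enclosing_size)
{
	unsigned int order = 0;

	while ((1ULL << (PAGE_SHIFT + order)) < enclosing_size)
		order++;
	return order;
}

/* Hypothetical packing: page-aligned address | GOT flag | order. */
static uint64_t slot_phys_entry(uint64_t phys, uint64_t enclosing_size)
{
	unsigned int order = page_order(enclosing_size);

	assert((phys & ((1ULL << PAGE_SHIFT) - 1)) == 0);
	assert(order <= KVMPPC_PAGE_ORDER_MASK);
	return phys | KVMPPC_GOT_PAGE | order;
}
```

With 4k base pages, a 16MB enclosing page has order 12, which fits comfortably in the five bits KVMPPC_PAGE_ORDER_MASK reserves.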
[PATCH 01/13] KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code
This moves the get/set_one_reg implementation down from powerpc.c into
booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
but more importantly, it fixes a bug on Book3s HV where we were
accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
macro) and corrupting memory, causing random crashes and file corruption.

On Book3s HV we only accept setting the HIOR to zero, since the guest
runs in supervisor mode and its vectors are never offset from zero.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_ppc.h |    3 ++
 arch/powerpc/kvm/book3s_hv.c       |   33 ++
 arch/powerpc/kvm/book3s_pr.c       |   33 ++
 arch/powerpc/kvm/booke.c           |   10 +
 arch/powerpc/kvm/powerpc.c         |   39
 5 files changed, 79 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5192c2e..fc2d696 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -176,6 +176,9 @@ int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 void kvmppc_get_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+
 void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ecc77fa..5efdd5b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -390,6 +390,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	int r = -EINVAL;
+
+	switch (reg->id) {
+	case KVM_ONE_REG_PPC_HIOR:
+		reg->u.reg64 = 0;
+		r = 0;
+		break;
+	default:
+		break;
+	}
+
+	return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	int r = -EINVAL;
+
+	switch (reg->id) {
+	case KVM_ONE_REG_PPC_HIOR:
+		/* Only allow this to be set to zero */
+		if (reg->u.reg64 == 0)
+			r = 0;
+		break;
+	default:
+		break;
+	}
+
+	return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
 	if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index cbb7051..1abe35c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -837,6 +837,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	int r = -EINVAL;
+
+	switch (reg->id) {
+	case KVM_ONE_REG_PPC_HIOR:
+		reg->u.reg64 = to_book3s(vcpu)->hior;
+		r = 0;
+		break;
+	default:
+		break;
+	}
+
+	return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	int r = -EINVAL;
+
+	switch (reg->id) {
+	case KVM_ONE_REG_PPC_HIOR:
+		to_book3s(vcpu)->hior = reg->u.reg64;
+		to_book3s(vcpu)->hior_explicit = true;
+		r = 0;
+		break;
+	default:
+		break;
+	}
+
+	return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
 	return 0;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 9e41f45..ee9e1ee 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -887,6 +887,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return kvmppc_core_set_sregs(vcpu, sregs);
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+	return -EINVAL;
+}
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
 	return -ENOTSUPP;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 34515e8..1239c6f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -620,45 +620,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	return r;
 }
 
-static int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
-				      struct kvm_one_reg *reg)
-{
-	int r = -EINVAL;
-
-	switch (reg->id) {
-#ifdef CONFIG_PPC_BOOK3S
-	case KVM_ONE_REG_PPC_HIOR:
-		reg->u.reg64 = to_book3s(vcpu)->hior;
-		r = 0;
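Moving the handlers into each platform file means each flavour only touches its own vcpu layout. Below is a user-space sketch of the two Book3S behaviours. struct one_reg, struct pr_vcpu and REG_PPC_HIOR are hypothetical stand-ins; REG_PPC_HIOR is not the real KVM_ONE_REG_PPC_HIOR value. The HV handlers never dereference PR-only state, which is the out-of-bounds access the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define REG_PPC_HIOR 1	/* placeholder id, not KVM's real register id */

struct one_reg {
	uint64_t id;
	uint64_t reg64;
};

/* Only PR vcpus carry an HIOR; HV vcpus have no such field at all. */
struct pr_vcpu {
	uint64_t hior;
	bool hior_explicit;
};

/* Book3S HV: the guest's vectors are never offset, so HIOR reads as 0. */
static int hv_get_one_reg(struct one_reg *reg)
{
	switch (reg->id) {
	case REG_PPC_HIOR:
		reg->reg64 = 0;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Book3S HV: accept only a (no-op) set to zero. */
static int hv_set_one_reg(const struct one_reg *reg)
{
	switch (reg->id) {
	case REG_PPC_HIOR:
		return reg->reg64 == 0 ? 0 : -EINVAL;
	default:
		return -EINVAL;
	}
}

/* Book3S PR: HIOR is real per-vcpu state and is stored as given. */
static int pr_set_one_reg(struct pr_vcpu *v, const struct one_reg *reg)
{
	switch (reg->id) {
	case REG_PPC_HIOR:
		v->hior = reg->reg64;
		v->hior_explicit = true;
		return 0;
	default:
		return -EINVAL;
	}
}
```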
[PATCH 05/13] KVM: PPC: Make the H_ENTER hcall more reliable
At present, our implementation of H_ENTER only makes one try at locking
each slot that it looks at, and doesn't even retry the ldarx/stdcx.
atomic update sequence that it uses to attempt to lock the slot.  Thus
it can return the H_PTEG_FULL error unnecessarily, particularly when
the H_EXACT flag is set, meaning that the caller wants a specific PTEG
slot.

This improves the situation by making a second pass when no free HPTE
slot is found, where we spin until we succeed in locking each slot in
turn and then check whether it is full while we hold the lock.  If the
second pass fails, then we return H_PTEG_FULL.

This also moves lock_hpte to a header file (since later commits in this
series will need to use it from other source files) and renames it to
try_lock_hpte, which is a somewhat less misleading name.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   25
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   63 --
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 23bb17e..fe45a81 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,6 +37,31 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK	(HPT_NPTEG - 1)
 #endif
 
+/*
+ * We use a lock bit in HPTE dword 0 to synchronize updates and
+ * accesses to each HPTE, and another bit to indicate non-present
+ * HPTEs.
+ */
+#define HPTE_V_HVLOCK	0x40UL
+
+static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
+{
+	unsigned long tmp, old;
+
+	asm volatile("	ldarx	%0,0,%2\n"
+		     "	and.	%1,%0,%3\n"
+		     "	bne	2f\n"
+		     "	ori	%0,%0,%4\n"
+		     "	stdcx.	%0,0,%2\n"
+		     "	beq+	2f\n"
+		     "	li	%1,%3\n"
+		     "2:	isync"
+		     : "=&r" (tmp), "=&r" (old)
+		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
+		     : "cc", "memory");
+	return old == 0;
+}
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 					     unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f45ba7..659175f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -56,26 +56,6 @@ static void *real_vmalloc_addr(void *x)
 	return __va(addr);
 }
 
-#define HPTE_V_HVLOCK	0x40UL
-
-static inline long lock_hpte(unsigned long *hpte, unsigned long bits)
-{
-	unsigned long tmp, old;
-
-	asm volatile("	ldarx	%0,0,%2\n"
-		     "	and.	%1,%0,%3\n"
-		     "	bne	2f\n"
-		     "	ori	%0,%0,%4\n"
-		     "	stdcx.	%0,0,%2\n"
-		     "	beq+	2f\n"
-		     "	li	%1,%3\n"
-		     "2:	isync"
-		     : "=&r" (tmp), "=&r" (old)
-		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-		     : "cc", "memory");
-	return old == 0;
-}
-
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		    long pte_index, unsigned long pteh, unsigned long ptel)
 {
@@ -129,24 +109,49 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	pteh &= ~0x60UL;
 	ptel &= ~(HPTE_R_PP0 - kvm->arch.ram_psize);
 	ptel |= pa;
+
 	if (pte_index >= HPT_NPTE)
 		return H_PARAMETER;
 	if (likely((flags & H_EXACT) == 0)) {
 		pte_index &= ~7UL;
 		hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
-		for (i = 0; ; ++i) {
-			if (i == 8)
-				return H_PTEG_FULL;
+		for (i = 0; i < 8; ++i) {
 			if ((*hpte & HPTE_V_VALID) == 0 &&
-			    lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
+			    try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
 				break;
 			hpte += 2;
 		}
+		if (i == 8) {
+			/*
+			 * Since try_lock_hpte doesn't retry (not even stdcx.
+			 * failures), it could be that there is a free slot
+			 * but we transiently failed to lock it.  Try again,
+			 * actually locking each slot and checking it.
+			 */
+			hpte -= 16;
+			for (i = 0; i < 8; ++i) {
+				while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+					cpu_relax();
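The two-pass search can be modelled with C11 atomics. This is a sketch, not the real-mode kernel code:

- A weak compare-and-swap plays the role of a single ldarx/stdcx. attempt. It may fail spuriously, just as stdcx. may.
- The HPTE_V_VALID value and find_free_slot() are assumptions.
- The second pass follows the commit message's description, because the hunk above is cut off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HPTE_V_VALID	0x01ULL	/* assumed value for the sketch */
#define HPTE_V_HVLOCK	0x40ULL	/* value from the patch */
#define PTEG_SLOTS	8

/*
 * One attempt, like try_lock_hpte(): fail if any of `bits` is set, and do
 * not retry when the atomic update itself fails (weak CAS ~ stdcx.).
 */
static bool try_lock_hpte(_Atomic uint64_t *hpte, uint64_t bits)
{
	uint64_t old = atomic_load_explicit(hpte, memory_order_relaxed);

	if (old & bits)
		return false;
	return atomic_compare_exchange_weak_explicit(hpte, &old,
			old | HPTE_V_HVLOCK, memory_order_acquire,
			memory_order_relaxed);
}

static void unlock_hpte(_Atomic uint64_t *hpte)
{
	atomic_fetch_and_explicit(hpte, ~HPTE_V_HVLOCK, memory_order_release);
}

/*
 * Find and lock a free slot in one PTEG.  Pass 1 skips any slot it cannot
 * lock at once; pass 2 spins for each lock and checks VALID while holding
 * it, so a transient failure no longer yields a spurious "PTEG full".
 * Returns the slot index (left locked) or -1, i.e. H_PTEG_FULL.
 */
static int find_free_slot(_Atomic uint64_t pteg[PTEG_SLOTS])
{
	int i;

	for (i = 0; i < PTEG_SLOTS; i++)
		if (!(atomic_load(&pteg[i]) & HPTE_V_VALID) &&
		    try_lock_hpte(&pteg[i], HPTE_V_HVLOCK | HPTE_V_VALID))
			return i;

	for (i = 0; i < PTEG_SLOTS; i++) {
		while (!try_lock_hpte(&pteg[i], HPTE_V_HVLOCK))
			;	/* cpu_relax() in the kernel */
		if (!(atomic_load(&pteg[i]) & HPTE_V_VALID))
			return i;
		unlock_hpte(&pteg[i]);
	}
	return -1;
}
```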
[PATCH 10/13] KVM: PPC: Implement MMIO emulation support for Book3S HV guests
This provides the low-level support for MMIO emulation in Book3S HV guests. When the guest tries to map a page which is not covered by any memslot, that page is taken to be an MMIO emulation page. Instead of inserting a valid HPTE, we insert an HPTE that has the valid bit clear but another hypervisor software-use bit set, which we call HPTE_V_ABSENT, to indicate that this is an absent page. An absent page is treated much like a valid page as far as guest hcalls (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that an absent HPTE doesn't need to be invalidated with tlbie since it was never valid as far as the hardware is concerned. When the guest accesses a page for which there is an absent HPTE, it will take a hypervisor data storage interrupt (HDSI) since we now set the VPM1 bit in the LPCR. Our HDSI handler for HPTE-not-present faults looks up the hash table and if it finds an absent HPTE mapping the requested virtual address, will switch to kernel mode and handle the fault in kvmppc_book3s_hv_page_fault(), which at present just calls kvmppc_hv_emulate_mmio() to set up the MMIO emulation. This is based on an earlier patch by Benjamin Herrenschmidt, but since heavily reworked. 
Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h|5 + arch/powerpc/include/asm/kvm_book3s_64.h | 26 +++ arch/powerpc/include/asm/kvm_host.h |5 + arch/powerpc/include/asm/mmu-hash64.h|2 +- arch/powerpc/include/asm/ppc-opcode.h|4 +- arch/powerpc/include/asm/reg.h |1 + arch/powerpc/kernel/asm-offsets.c|1 + arch/powerpc/kernel/exceptions-64s.S |8 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 228 +-- arch/powerpc/kvm/book3s_hv.c | 21 ++- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 262 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 127 --- 12 files changed, 607 insertions(+), 83 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5e7e04b..5ac53f9 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -121,6 +121,11 @@ extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu); extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte); extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr); extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu); +extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run, + struct kvm_vcpu *vcpu, unsigned long addr, + unsigned long status); +extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, + unsigned long slb_v, unsigned long valid); extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte); extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 90e6658..9a59b6d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -37,12 +37,15 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu) #define HPT_HASH_MASK (HPT_NPTEG - 1) #endif +#define VRMA_VSID 0x1ffUL /* 1TB VSID reserved for VRMA */ + /* * We use a lock bit in HPTE dword 0 to synchronize 
updates and * accesses to each HPTE, and another bit to indicate non-present * HPTEs. */ #define HPTE_V_HVLOCK 0x40UL +#define HPTE_V_ABSENT 0x20UL static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits) { @@ -138,6 +141,29 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val) #endif } +static inline bool hpte_read_permission(unsigned long pp, unsigned long key) +{ + if (key) + return PP_RWRX <= pp && pp <= PP_RXRX; + return 1; +} + +static inline bool hpte_write_permission(unsigned long pp, unsigned long key) +{ + if (key) + return pp == PP_RWRW; + return pp <= PP_RWRW; +} + +static inline int hpte_get_skey_perm(unsigned long hpte_r, unsigned long amr) +{ + unsigned long skey; + + skey = ((hpte_r & HPTE_R_KEY_HI) >> 57) | + ((hpte_r & HPTE_R_KEY_LO) >> 9); + return (amr >> (62 - 2 * skey)) & 3; +} + static inline void lock_rmap(unsigned long *rmap) { do { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e369d49..c9c92f0 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -209,6 +209,7 @@ struct kvm_arch { unsigned long lpcr; unsigned long rmor; struct kvmppc_rma_info *rma; + unsigned long vrma_slb_v; int rma_setup_done; struct list_head spapr_tce_tables; spinlock_t slot_phys_lock; @@ -451,6 +452,10 @@
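The storage-key helper added to kvm_book3s_64.h builds a 5-bit key from two fields of the second HPTE doubleword. It then indexes the AMR, which holds two bits per key, with key 0 in the most-significant pair. The sketch below reproduces that arithmetic and the two PP checks in user space. The HPTE_R_KEY_HI/HPTE_R_KEY_LO masks and the PP_* encodings are assumptions based on the usual hash-MMU definitions and are not taken from this patch. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed masks: key bits 0-2 at HPTE bits 9-11, key bits 3-4 at bits 60-61. */
#define HPTE_R_KEY_LO  0x0000000000000e00ULL
#define HPTE_R_KEY_HI  0x3000000000000000ULL

/* Assumed page-protection encodings. */
#define PP_RWXX 0
#define PP_RWRX 1
#define PP_RWRW 2
#define PP_RXRX 3

static unsigned int hpte_skey(uint64_t hpte_r)
{
	/* Bits 60-61 >> 57 land in key bits 3-4; bits 9-11 >> 9 in bits 0-2. */
	return (unsigned int)(((hpte_r & HPTE_R_KEY_HI) >> 57) |
			      ((hpte_r & HPTE_R_KEY_LO) >> 9));
}

/* Two AMR bits per key, key 0 in bits 62-63, as in hpte_get_skey_perm(). */
static unsigned int skey_perm(uint64_t hpte_r, uint64_t amr)
{
	return (unsigned int)((amr >> (62 - 2 * hpte_skey(hpte_r))) & 3);
}

/* Same logic as hpte_read_permission() in the patch. */
static bool read_ok(unsigned long pp, unsigned long key)
{
	if (key)
		return PP_RWRX <= pp && pp <= PP_RXRX;
	return true;
}

/* Same logic as hpte_write_permission() in the patch. */
static bool write_ok(unsigned long pp, unsigned long key)
{
	if (key)
		return pp == PP_RWRW;
	return pp <= PP_RWRW;
}
```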
[PATCH 08/13] KVM: PPC: Allow I/O mappings in memory slots
This provides for the case where userspace maps an I/O device into the address range of a memory slot using a VM_PFNMAP mapping. In that case, we work out the pfn from vma->vm_pgoff, and record the cache enable bits from vma->vm_page_prot in two low-order bits in the slot_phys array entries. Then, in kvmppc_h_enter() we check that the cache bits in the HPTE that the guest wants to insert match the cache bits in the slot_phys array entry. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 26 +++ arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 67 -- arch/powerpc/kvm/book3s_hv_rm_mmu.c |5 +- 4 files changed, 76 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index d55e6b4..a98e0f6 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -107,6 +107,32 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) return 0; /* error */ } +static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) +{ + unsigned int wimg = ptel & HPTE_R_WIMG; + + /* Handle SAO */ + if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) && + cpu_has_feature(CPU_FTR_ARCH_206)) + wimg = HPTE_R_M; + + if (!io_type) + return wimg == HPTE_R_M; + + return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type; +} + +/* Return HPTE cache control bits corresponding to Linux pte bits */ +static inline unsigned long hpte_cache_bits(unsigned long pte_val) +{ +#if _PAGE_NO_CACHE == HPTE_R_I && _PAGE_WRITETHRU == HPTE_R_W + return pte_val & (HPTE_R_W | HPTE_R_I); +#else + return ((pte_val & _PAGE_NO_CACHE) ? HPTE_R_I : 0) + + ((pte_val & _PAGE_WRITETHRU) ? 
HPTE_R_W : 0); +#endif +} + static inline bool slot_is_aligned(struct kvm_memory_slot *memslot, unsigned long pagesize) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ba1da85..9b1c247 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -177,6 +177,8 @@ struct revmap_entry { /* Low-order bits in kvm->arch.slot_phys[][] */ #define KVMPPC_PAGE_ORDER_MASK 0x1f +#define KVMPPC_PAGE_NO_CACHE HPTE_R_I /* 0x20 */ +#define KVMPPC_PAGE_WRITETHRU HPTE_R_W /* 0x40 */ #define KVMPPC_GOT_PAGE 0x80 struct kvm_arch { diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index cc18f3d..b904c40 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -199,7 +199,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn, struct page *page, *hpage, *pages[1]; unsigned long s, pgsize; unsigned long *physp; - unsigned int got, pgorder; + unsigned int is_io, got, pgorder; + struct vm_area_struct *vma; unsigned long pfn, i, npages; physp = kvm->arch.slot_phys[memslot->id]; @@ -208,34 +209,51 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn, if (physp[gfn - memslot->base_gfn]) return 0; + is_io = 0; + got = 0; page = NULL; pgsize = psize; + err = -EINVAL; start = gfn_to_hva_memslot(memslot, gfn); /* Instantiate and get the page we want access to */ np = get_user_pages_fast(start, 1, 1, pages); - if (np != 1) - return -EINVAL; - page = pages[0]; - got = KVMPPC_GOT_PAGE; + if (np != 1) { + /* Look up the vma for the page */ + down_read(&current->mm->mmap_sem); + vma = find_vma(current->mm, start); + if (!vma || vma->vm_start > start || + start + psize > vma->vm_end || + !(vma->vm_flags & VM_PFNMAP)) + goto up_err; + is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot)); + pfn = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); + /* check alignment of pfn vs.
requested page size */ + if (psize > PAGE_SIZE && (pfn & ((psize >> PAGE_SHIFT) - 1))) + goto up_err; + up_read(&current->mm->mmap_sem); - /* See if this is a large page */ - s = PAGE_SIZE; - if (PageHuge(page)) { - hpage = compound_head(page); - s <<= compound_order(hpage); - /* Get the whole large page if slot alignment is ok */ - if (s > psize && slot_is_aligned(memslot, s) && - !(memslot->userspace_addr & (s - 1))) { - start &= ~(s - 1); - pgsize = s; -
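Two pieces of arithmetic in this patch are easy to check in isolation. First, the pfn of a VM_PFNMAP mapping is vm_pgoff plus the page offset into the VMA, and a large-page request is rejected unless that pfn is aligned to the page size. Second, hpte_cache_flags_ok() compares the guest's WIMG bits with the cache bits recorded for the slot. This user-space sketch makes some assumptions. It uses 4 kB base pages, HPTE_R_M = 0x10 and HPTE_R_G = 0x08. HPTE_R_I = 0x20 and HPTE_R_W = 0x40 match the comments in the patch. A plain flag replaces cpu_has_feature() for the SAO case, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT   12              /* assumed 4 kB base pages */
#define PAGE_SIZE    (1UL << PAGE_SHIFT)

#define HPTE_R_G     0x08UL          /* assumed */
#define HPTE_R_M     0x10UL          /* assumed */
#define HPTE_R_I     0x20UL          /* KVMPPC_PAGE_NO_CACHE in the patch */
#define HPTE_R_W     0x40UL          /* KVMPPC_PAGE_WRITETHRU in the patch */
#define HPTE_R_WIMG  (HPTE_R_W | HPTE_R_I | HPTE_R_M | HPTE_R_G)

/* pfn backing 'start' in a VM_PFNMAP vma; false if misaligned for psize. */
static bool pfnmap_pfn(unsigned long vm_start, unsigned long vm_pgoff,
		       unsigned long start, unsigned long psize,
		       unsigned long *pfn)
{
	*pfn = vm_pgoff + ((start - vm_start) >> PAGE_SHIFT);
	if (psize > PAGE_SIZE && (*pfn & ((psize >> PAGE_SHIFT) - 1)))
		return false;
	return true;
}

/* Mirrors hpte_cache_flags_ok(): io_type 0 means ordinary cacheable RAM. */
static bool cache_flags_ok(unsigned long ptel, unsigned long io_type,
			   bool has_arch206)
{
	unsigned long wimg = ptel & HPTE_R_WIMG;

	/* W|I|M encodes strong access ordering on ISA 2.06; treat it as M. */
	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) && has_arch206)
		wimg = HPTE_R_M;

	if (!io_type)
		return wimg == HPTE_R_M;

	return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
}
```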
[PATCH 09/13] KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn
This expands the reverse mapping array to contain two links for each HPTE which are used to link together HPTEs that correspond to the same guest logical page. Each circular list of HPTEs is pointed to by the rmap array entry for the guest logical page, pointed to by the relevant memslot. Links are 32-bit HPT entry indexes rather than full 64-bit pointers, to save space. We use 3 of the remaining 32 bits in the rmap array entries as a lock bit, a referenced bit and a present bit (the present bit is needed since HPTE index 0 is valid). The bit lock for the rmap chain nests inside the HPTE lock bit. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 18 ++ arch/powerpc/include/asm/kvm_host.h | 17 ++- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 84 +- 3 files changed, 117 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index a98e0f6..90e6658 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -107,6 +107,11 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) return 0; /* error */ } +static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize) +{ + return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT; +} + static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) { unsigned int wimg = ptel & HPTE_R_WIMG; @@ -133,6 +138,19 @@ static inline unsigned long hpte_cache_bits(unsigned long pte_val) #endif } +static inline void lock_rmap(unsigned long *rmap) +{ + do { + while (test_bit(KVMPPC_RMAP_LOCK_BIT, rmap)) + cpu_relax(); + } while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmap)); +} + +static inline void unlock_rmap(unsigned long *rmap) +{ + __clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmap); +} + static inline bool slot_is_aligned(struct kvm_memory_slot *memslot, unsigned long pagesize) { diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h index 9b1c247..e369d49 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -169,12 +169,27 @@ struct kvmppc_rma_info { /* * The reverse mapping array has one entry for each HPTE, * which stores the guest's view of the second word of the HPTE - * (including the guest physical address of the mapping). + * (including the guest physical address of the mapping), + * plus forward and backward pointers in a doubly-linked ring + * of HPTEs that map the same host page. The pointers in this + * ring are 32-bit HPTE indexes, to save space. */ struct revmap_entry { unsigned long guest_rpte; + unsigned int forw, back; }; +/* + * We use the top bit of each memslot->rmap entry as a lock bit, + * and bit 32 as a present flag. The bottom 32 bits are the + * index in the guest HPT of a HPTE that points to the page. + */ +#define KVMPPC_RMAP_LOCK_BIT 63 +#define KVMPPC_RMAP_REF_BIT 33 +#define KVMPPC_RMAP_REFERENCED (1ul << KVMPPC_RMAP_REF_BIT) +#define KVMPPC_RMAP_PRESENT 0x100000000ul +#define KVMPPC_RMAP_INDEX 0xfffffffful + /* Low-order bits in kvm->arch.slot_phys[][] */ #define KVMPPC_PAGE_ORDER_MASK 0x1f #define KVMPPC_PAGE_NO_CACHE HPTE_R_I /* 0x20 */ diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 88d2add..b600f8c 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -57,6 +57,70 @@ static void *real_vmalloc_addr(void *x) return __va(addr); } +/* + * Add this HPTE into the chain for the real page. + * Must be called with the chain locked; it unlocks the chain.
+ */ +static void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, +unsigned long *rmap, long pte_index, int realmode) +{ + struct revmap_entry *head, *tail; + unsigned long i; + + if (*rmap & KVMPPC_RMAP_PRESENT) { + i = *rmap & KVMPPC_RMAP_INDEX; + head = &kvm->arch.revmap[i]; + if (realmode) + head = real_vmalloc_addr(head); + tail = &kvm->arch.revmap[head->back]; + if (realmode) + tail = real_vmalloc_addr(tail); + rev->forw = i; + rev->back = head->back; + tail->forw = pte_index; + head->back = pte_index; + } else { + rev->forw = rev->back = pte_index; + i = pte_index; + } + smp_wmb(); + *rmap = i | KVMPPC_RMAP_REFERENCED | KVMPPC_RMAP_PRESENT; /* unlock */ +} + +/* Remove this HPTE from the chain for a real page */ +static void remove_revmap_chain(struct kvm *kvm, long pte_index, +
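The ring insertion in kvmppc_add_revmap_chain() can be exercised outside the kernel. In the sketch below, the rmap word layout follows the comment in kvm_host.h: present flag at bit 32, referenced flag at bit 33, and the head HPTE index in the low 32 bits. Locking, real-mode address translation and the smp_wmb() are left out. The function names and the ring_len() walker are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RMAP_PRESENT     (1ULL << 32)   /* bit 32 */
#define RMAP_REFERENCED  (1ULL << 33)   /* bit 33 */
#define RMAP_INDEX       0xffffffffULL  /* low 32 bits: head HPTE index */

struct revmap_entry {
	uint64_t guest_rpte;
	uint32_t forw, back;   /* 32-bit HPTE indexes, not pointers */
};

/* Link HPTE pte_index into the ring whose head index is stored in *rmap. */
static void add_revmap_chain(struct revmap_entry *revmap, uint64_t *rmap,
			     uint32_t pte_index)
{
	struct revmap_entry *rev = &revmap[pte_index];
	uint32_t i;

	if (*rmap & RMAP_PRESENT) {
		/* Insert just before the head, i.e. at the tail of the ring. */
		i = (uint32_t)(*rmap & RMAP_INDEX);
		struct revmap_entry *head = &revmap[i];
		struct revmap_entry *tail = &revmap[head->back];

		rev->forw = i;
		rev->back = head->back;
		tail->forw = pte_index;
		head->back = pte_index;
	} else {
		/* First HPTE mapping this page: a ring of one. */
		rev->forw = rev->back = pte_index;
		i = pte_index;
	}
	*rmap = i | RMAP_REFERENCED | RMAP_PRESENT;
}

/* Count the HPTEs on the ring by following forw links from the head. */
static unsigned int ring_len(const struct revmap_entry *revmap, uint64_t rmap)
{
	if (!(rmap & RMAP_PRESENT))
		return 0;
	uint32_t head = (uint32_t)(rmap & RMAP_INDEX);
	uint32_t i = head;
	unsigned int n = 0;

	do {
		n++;
		i = revmap[i].forw;
	} while (i != head);
	return n;
}
```

Because the present flag is separate from the index, an rmap word whose index field is 0 still distinguishes "HPTE 0 maps this page" from "nothing maps this page". The patch description gives this as the reason for the present bit.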
[PATCH 04/13] KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests
This adds two new functions, kvmppc_pin_guest_page() and kvmppc_unpin_guest_page(), and uses them to pin the guest pages where the guest has registered areas of memory for the hypervisor to update, (i.e. the per-cpu virtual processor areas, SLB shadow buffers and dispatch trace logs) and then unpin them when they are no longer required. Although it is not strictly necessary to pin the pages at this point, since all guest pages are already pinned, later commits in this series will mean that guest pages aren't all pinned. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h |3 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 38 ++ arch/powerpc/kvm/book3s_hv.c | 67 ++--- 3 files changed, 78 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index deb8a4e..16db48c 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -140,6 +140,9 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); +extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, + unsigned long *nb_ret); +extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr); extern void kvmppc_entry_trampoline(void); extern void kvmppc_hv_entry_trampoline(void); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index e4c6069..dcd39dc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -184,6 +184,44 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, return -ENOENT; } +void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa, + unsigned long *nb_ret) +{ + struct kvm_memory_slot *memslot; + unsigned long gfn = gpa >> PAGE_SHIFT; + 
struct page *page; + unsigned long offset; + unsigned long pfn, pa; + unsigned long *physp; + + memslot = gfn_to_memslot(kvm, gfn); + if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) + return NULL; + physp = kvm->arch.slot_phys[memslot->id]; + if (!physp) + return NULL; + physp += (gfn - memslot->base_gfn) >> + (kvm->arch.ram_porder - PAGE_SHIFT); + pa = *physp; + if (!pa) + return NULL; + pfn = pa >> PAGE_SHIFT; + page = pfn_to_page(pfn); + get_page(page); + offset = gpa & (kvm->arch.ram_psize - 1); + if (nb_ret) + *nb_ret = kvm->arch.ram_psize - offset; + return page_address(page) + offset; +} + +void kvmppc_unpin_guest_page(struct kvm *kvm, void *va) +{ + struct page *page = virt_to_page(va); + + page = compound_head(page); + put_page(page); +} + void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu) { struct kvmppc_mmu *mmu = &vcpu->arch.mmu; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index c2ee5a7..6e94af8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -137,12 +137,10 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu, unsigned long vcpuid, unsigned long vpa) { struct kvm *kvm = vcpu->kvm; - unsigned long gfn, pg_index, ra, len; - unsigned long pg_offset; + unsigned long len, nb; void *va; struct kvm_vcpu *tvcpu; - struct kvm_memory_slot *memslot; - unsigned long *physp; + int err = H_PARAMETER; tvcpu = kvmppc_find_vcpu(kvm, vcpuid); if (!tvcpu) @@ -155,51 +153,41 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu, if (flags < 4) { if (vpa & 0x7f) return H_PARAMETER; + if (flags >= 2 && !tvcpu->arch.vpa) + return H_RESOURCE; /* registering new area; convert logical addr to real */ - gfn = vpa >> PAGE_SHIFT; - memslot = gfn_to_memslot(kvm, gfn); - if (!memslot || !(memslot->flags & KVM_MEMSLOT_INVALID)) - return H_PARAMETER; - physp = kvm->arch.slot_phys[memslot->id]; - if (!physp) - return H_PARAMETER; - pg_index = (gfn - memslot->base_gfn) >> - (kvm->arch.ram_porder 
- PAGE_SHIFT); - pg_offset = vpa & (kvm->arch.ram_psize - 1); - ra = physp[pg_index]; - if (!ra) + va = kvmppc_pin_guest_page(kvm, vpa, &nb); + if (va == NULL) return H_PARAMETER; -
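At this point in the series, every slot_phys[] entry covers one ram_psize page. The address arithmetic in kvmppc_pin_guest_page() and the 128-byte alignment check in do_h_register_vpa() can therefore be shown with plain integers. The sketch assumes 4 kB base pages and 16 MB backing pages (ram_porder = 24). It also assumes that the memslot base is aligned to the backing page size. The struct and functions are hypothetical and stand in for the memslot lookup.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12            /* assumed 4 kB base pages */

struct vpa_lookup {
	unsigned long index;     /* which slot_phys[] entry backs the address */
	unsigned long offset;    /* byte offset inside that backing page */
	unsigned long nb;        /* bytes from gpa to the end of the page */
};

/* Hypothetical helper mirroring the arithmetic in kvmppc_pin_guest_page(). */
static bool lookup_guest_addr(unsigned long gpa, unsigned long base_gfn,
			      unsigned int ram_porder, struct vpa_lookup *out)
{
	unsigned long gfn = gpa >> PAGE_SHIFT;
	unsigned long ram_psize = 1UL << ram_porder;

	if (gfn < base_gfn)
		return false;
	out->index = (gfn - base_gfn) >> (ram_porder - PAGE_SHIFT);
	out->offset = gpa & (ram_psize - 1);   /* assumes psize-aligned slot */
	out->nb = ram_psize - out->offset;     /* what *nb_ret would report */
	return true;
}

/* do_h_register_vpa() rejects areas that are not 128-byte aligned. */
static bool vpa_aligned(unsigned long vpa)
{
	return (vpa & 0x7f) == 0;
}
```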
[PATCH 03/13] KVM: PPC: Keep page physical addresses in per-slot arrays
This allocates an array for each memory slot that is added to store the physical addresses of the pages in the slot. This array is vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr(). This allows us to remove the ram_pginfo field from the kvm_arch struct, and removes the 64GB guest RAM limit that we had. We use the low-order bits of the array entries to store a flag indicating that we have done get_page on the corresponding page, and therefore need to call put_page when we are finished with the page. Currently this is set for all pages except those in our special RMO regions. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_host.h |8 ++- arch/powerpc/kvm/book3s_64_mmu_hv.c | 18 +++--- arch/powerpc/kvm/book3s_hv.c| 114 +-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 44 - 4 files changed, 109 insertions(+), 75 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 629df2e..cf6b4d7 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -175,25 +175,27 @@ struct revmap_entry { unsigned long guest_rpte; }; +/* Low-order bits in kvm->arch.slot_phys[][] */ +#define KVMPPC_GOT_PAGE0x80 + struct kvm_arch { #ifdef CONFIG_KVM_BOOK3S_64_HV unsigned long hpt_virt; struct revmap_entry *revmap; - unsigned long ram_npages; unsigned long ram_psize; unsigned long ram_porder; - struct kvmppc_pginfo *ram_pginfo; unsigned int lpid; unsigned int host_lpid; unsigned long host_lpcr; unsigned long sdr1; unsigned long host_sdr1; int tlbie_lock; - int n_rma_pages; unsigned long lpcr; unsigned long rmor; struct kvmppc_rma_info *rma; struct list_head spapr_tce_tables; + unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; + int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; unsigned short last_vcpu[NR_CPUS]; struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; #endif /* CONFIG_KVM_BOOK3S_64_HV */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 80ece8d..e4c6069 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -98,16 +98,16 @@ void kvmppc_free_hpt(struct kvm *kvm) void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) { unsigned long i; - unsigned long npages = kvm->arch.ram_npages; - unsigned long pfn; + unsigned long npages; + unsigned long pa; unsigned long *hpte; unsigned long hash; unsigned long porder = kvm->arch.ram_porder; struct revmap_entry *rev; - struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo; + unsigned long *physp; - if (!pginfo) - return; + physp = kvm->arch.slot_phys[mem->slot]; + npages = kvm->arch.slot_npages[mem->slot]; /* VRMA can't be > 1TB */ if (npages > 1ul << (40 - porder)) @@ -117,9 +117,10 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) npages = HPT_NPTEG; for (i = 0; i < npages; ++i) { - pfn = pginfo[i].pfn; - if (!pfn) + pa = physp[i]; + if (!pa) break; + pa &= PAGE_MASK; /* can't use hpt_hash since va > 64 bits */ hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK; /* @@ -131,8 +132,7 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) hash = (hash << 3) + 7; hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 4)); /* HPTE low word - RPN, protection, etc. */ - hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C | - HPTE_R_M | PP_RWXX; + hpte[1] = pa | HPTE_R_R | HPTE_R_C | HPTE_R_M | PP_RWXX; smp_wmb(); hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) | (i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED | diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 5efdd5b..c2ee5a7 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -48,14 +48,6 @@ #include #include -/* - * For now, limit memory to 64GB and require it to be large pages. 
- * This value is chosen because it makes the ram_pginfo array be - * 64kB in size, which is about as large as we want to be trying - * to allocate with kmalloc. - */ -#define MAX_MEM_ORDER 36 - #define LARGE_PAGE_ORDER 24 /* 16MB pages */ /* #define EXIT_DEBUG */ @@ -145,10 +137,12 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
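Each slot_phys[] entry is a page-aligned physical address with flags packed into the low-order bits. This is why kvmppc_map_vrma() masks the value with PAGE_MASK before building the HPTE. A minimal encode/decode sketch follows. KVMPPC_GOT_PAGE (0x80) comes from this patch. The order, no-cache and write-through fields (0x1f, 0x20, 0x40) come from later patches in this series. The 4 kB page size is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                                   /* assumed 4 kB pages */
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))

#define KVMPPC_PAGE_ORDER_MASK 0x1f  /* log2 of the backing page size */
#define KVMPPC_PAGE_NO_CACHE   0x20
#define KVMPPC_PAGE_WRITETHRU  0x40
#define KVMPPC_GOT_PAGE        0x80  /* we hold a get_page() reference */

static uint64_t slot_phys_encode(uint64_t pa, unsigned int order,
				 unsigned int flags)
{
	/* The flags fit only because pa is at least page aligned. */
	return (pa & PAGE_MASK) | (order & KVMPPC_PAGE_ORDER_MASK) | flags;
}

static uint64_t slot_phys_pa(uint64_t entry)
{
	return entry & PAGE_MASK;    /* what kvmppc_map_vrma() puts in the HPTE */
}

static unsigned int slot_phys_order(uint64_t entry)
{
	return (unsigned int)(entry & KVMPPC_PAGE_ORDER_MASK);
}
```

An entry of 0 still means "not looked up yet", which is the test the VRMA loop uses to stop.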
[PATCH 02/13] KVM: PPC: Keep a record of HV guest view of hashed page table entries
This adds an array that parallels the guest hashed page table (HPT), that is, it has one entry per HPTE, used to store the guest's view of the second doubleword of the corresponding HPTE. The first doubleword in the HPTE is the same as the guest's idea of it, so we don't need to store a copy, but the second doubleword in the HPTE has the real page number rather than the guest's logical page number. This allows us to remove the back_translate() and reverse_xlate() functions. This "reverse mapping" array is vmalloc'd, meaning that to access it in real mode we have to walk the kernel's page tables explicitly. That is done by the new real_vmalloc_addr() function. (In fact this returns an address in the linear mapping, so the result is usable both in real mode and in virtual mode.) There are also some minor cleanups here: moving the definitions of HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h |8 +++ arch/powerpc/include/asm/kvm_host.h | 10 arch/powerpc/kvm/book3s_64_mmu_hv.c | 44 +++ arch/powerpc/kvm/book3s_hv_rm_mmu.c | 87 ++ 4 files changed, 103 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index d0ac94f..23bb17e 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -29,6 +29,14 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu) #define SPAPR_TCE_SHIFT12 +#ifdef CONFIG_KVM_BOOK3S_64_HV +/* For now use fixed-size 16MB page table */ +#define HPT_ORDER 24 +#define HPT_NPTEG (1ul << (HPT_ORDER - 7))/* 128B per pteg */ +#define HPT_NPTE (HPT_NPTEG << 3)/* 8 PTEs per PTEG */ +#define HPT_HASH_MASK (HPT_NPTEG - 1) +#endif + static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, unsigned long pte_index) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h 
index 66c75cd..629df2e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -166,9 +166,19 @@ struct kvmppc_rma_info { atomic_t use_count; }; +/* + * The reverse mapping array has one entry for each HPTE, + * which stores the guest's view of the second word of the HPTE + * (including the guest physical address of the mapping). + */ +struct revmap_entry { + unsigned long guest_rpte; +}; + struct kvm_arch { #ifdef CONFIG_KVM_BOOK3S_64_HV unsigned long hpt_virt; + struct revmap_entry *revmap; unsigned long ram_npages; unsigned long ram_psize; unsigned long ram_porder; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index bc3a2ea..80ece8d 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -33,11 +34,6 @@ #include #include -/* For now use fixed-size 16MB page table */ -#define HPT_ORDER 24 -#define HPT_NPTEG (1ul << (HPT_ORDER - 7))/* 128B per pteg */ -#define HPT_HASH_MASK (HPT_NPTEG - 1) - /* Pages in the VRMA are 16MB pages */ #define VRMA_PAGE_ORDER24 #define VRMA_VSID 0x1ffUL /* 1TB VSID reserved for VRMA */ @@ -51,7 +47,9 @@ long kvmppc_alloc_hpt(struct kvm *kvm) { unsigned long hpt; unsigned long lpid; + struct revmap_entry *rev; + /* Allocate guest's hashed page table */ hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN, HPT_ORDER - PAGE_SHIFT); if (!hpt) { @@ -60,12 +58,20 @@ long kvmppc_alloc_hpt(struct kvm *kvm) } kvm->arch.hpt_virt = hpt; + /* Allocate reverse map array */ + rev = vmalloc(sizeof(struct revmap_entry) * HPT_NPTE); + if (!rev) { + pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n"); + goto out_freehpt; + } + kvm->arch.revmap = rev; + + /* Allocate the guest's logical partition ID */ do { lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS); if (lpid >= NR_LPIDS) { pr_err("kvm_alloc_hpt: No LPIDs free\n"); - free_pages(hpt, 
HPT_ORDER - PAGE_SHIFT); - return -ENOMEM; + goto out_freeboth; } } while (test_and_set_bit(lpid, lpid_inuse)); @@ -74,11 +80,18 @@ long kvmppc_alloc_hpt(struct kvm *kvm) pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid); return 0; + + out_freeboth: + vfree(rev); + out_freehpt: + free_pages(hpt, HPT_ORDER - PAGE_SHIFT); +
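The sizes behind the new definitions are easy to verify. With HPT_ORDER 24, the table is 16 MB of 128-byte PTEGs, each holding eight 16-byte HPTEs, and the reverse map needs one revmap_entry per HPTE. The sketch below checks that arithmetic and uses the same goto-based unwinding that kvmppc_alloc_hpt() now uses. In the sketch, calloc()/malloc() stand in for __get_free_pages() and vmalloc(), and the LPID allocation is reduced to a hypothetical flag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HPT_ORDER     24
#define HPT_NPTEG     (1UL << (HPT_ORDER - 7))  /* 128 bytes per PTEG */
#define HPT_NPTE      (HPT_NPTEG << 3)          /* 8 HPTEs per PTEG */
#define HPT_HASH_MASK (HPT_NPTEG - 1)

struct revmap_entry {
	uint64_t guest_rpte;   /* guest's view of HPTE doubleword 1 */
};

struct hpt {
	void *table;
	struct revmap_entry *revmap;
};

/* Returns 0 on success; on failure frees whatever was already allocated. */
static int alloc_hpt(struct hpt *h, int lpid_available)
{
	h->revmap = NULL;
	h->table = calloc(1, 1UL << HPT_ORDER);
	if (!h->table)
		return -1;

	h->revmap = malloc(sizeof(struct revmap_entry) * HPT_NPTE);
	if (!h->revmap)
		goto out_freehpt;

	if (!lpid_available)   /* stands in for the "No LPIDs free" case */
		goto out_freeboth;

	return 0;

out_freeboth:
	free(h->revmap);
	h->revmap = NULL;
out_freehpt:
	free(h->table);
	h->table = NULL;
	return -1;
}
```

The labels run in reverse order of allocation, so each failure point jumps past only the frees for resources it does not yet hold.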
[PATCH 06/13] KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()
This removes the code from kvmppc_core_prepare_memory_region() that looked up the VMA for the region being added and called hva_to_page to get the pfns for the memory. We have no guarantee that there will be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION ioctl call; userspace can do that ioctl and then map memory into the region later. Instead we defer looking up the pfn for each memory page until it is needed, which generally means when the guest does an H_ENTER hcall on the page. Since we can't call get_user_pages in real mode, if we don't already have the pfn for the page, kvmppc_h_enter() will return H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back to kernel context. That calls kvmppc_get_guest_page() to get the pfn for the page, and then calls back to kvmppc_h_enter() to redo the HPTE insertion. When the first vcpu starts executing, we need to have the RMO or VRMA region mapped so that the guest's real mode accesses will work. Thus we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set up and if not, call kvmppc_hv_setup_rma(). It checks if the memslot starting at guest physical 0 now has RMO memory mapped there; if so it sets it up for the guest, otherwise on POWER7 it sets up the VRMA. The function that does that, kvmppc_map_vrma, is now a bit simpler, as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself. Since we are now potentially updating entries in the slot_phys[] arrays from multiple vcpu threads, we now have a spinlock protecting those updates to ensure that we don't lose track of any references to pages. 
Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h|4 + arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++ arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/kvm_ppc.h |4 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 130 +--- arch/powerpc/kvm/book3s_hv.c | 244 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 56 7 files changed, 291 insertions(+), 161 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 16db48c..5e7e04b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -143,6 +143,10 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, unsigned long *nb_ret); extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr); +extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, + long pte_index, unsigned long pteh, unsigned long ptel); +extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, + long pte_index, unsigned long pteh, unsigned long ptel); extern void kvmppc_entry_trampoline(void); extern void kvmppc_hv_entry_trampoline(void); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index fe45a81..ab6772e 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -95,4 +95,16 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, return rb; } +static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) +{ + /* only handle 4k, 64k and 16M pages for now */ + if (!(h & HPTE_V_LARGE)) + return 1ul << 12; /* 4k page */ + if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206)) + return 1ul << 16; /* 64k page */ + if ((l & 0xff000) == 0) + return 1ul << 24; /* 16M page */ + return 0; /* error */ +} + #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git 
a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index cf6b4d7..2a52bdb 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -193,7 +193,9 @@ struct kvm_arch { unsigned long lpcr; unsigned long rmor; struct kvmppc_rma_info *rma; + int rma_setup_done; struct list_head spapr_tce_tables; + spinlock_t slot_phys_lock; unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; unsigned short last_vcpu[NR_CPUS]; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index fc2d696..111e1b4 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -121,8 +121,8 @@ extern long kvmppc_alloc_hpt(struct kvm *kvm); extern void kvmppc_free_hpt(struct kvm *kvm); extern long kvmppc_prepare_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem); -
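The deferred lookup described above is a two-level retry. The real-mode handler succeeds only if the pfn is already recorded. Otherwise it returns H_TOO_HARD, and the virtual-mode path fills in slot_phys[] and calls it again. A user-space sketch of that control flow follows. It assumes H_SUCCESS = 0, H_PARAMETER = -4 and H_TOO_HARD = 9999. It also uses a fake_get_page() stand-in for get_user_pages() and hypothetical function names. The slot_phys_lock is only noted in a comment.

```c
#include <assert.h>

#define H_SUCCESS    0
#define H_PARAMETER  (-4)     /* assumed value */
#define H_TOO_HARD   9999     /* assumed value: "retry in virtual mode" */
#define NPAGES       16

static unsigned long slot_phys[NPAGES];   /* 0 = pfn not looked up yet */

/* Stand-in for get_user_pages(): only allowed outside real mode. */
static unsigned long fake_get_page(unsigned long gfn)
{
	return 0x1000 + gfn;   /* pretend host pfn */
}

/* Real mode: may only use what is already recorded in slot_phys[]. */
static long realmode_h_enter(unsigned long gfn, unsigned long *pfn_out)
{
	if (gfn >= NPAGES)
		return H_PARAMETER;
	if (!slot_phys[gfn])
		return H_TOO_HARD;
	*pfn_out = slot_phys[gfn];
	return H_SUCCESS;
}

/* Virtual mode: record the missing pfn, then redo the real-mode insert. */
static long virtmode_h_enter(unsigned long gfn, unsigned long *pfn_out)
{
	if (gfn >= NPAGES)
		return H_PARAMETER;
	/* The kernel holds slot_phys_lock here so two vcpus cannot both
	 * record the page and lose track of a page reference. */
	if (!slot_phys[gfn])
		slot_phys[gfn] = fake_get_page(gfn);
	return realmode_h_enter(gfn, pfn_out);
}

/* The guest's H_ENTER: real mode first, then the exit to virtual mode. */
static long handle_h_enter(unsigned long gfn, unsigned long *pfn_out)
{
	long ret = realmode_h_enter(gfn, pfn_out);

	if (ret == H_TOO_HARD)
		ret = virtmode_h_enter(gfn, pfn_out);
	return ret;
}
```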
[PATCH 0/13] KVM: PPC: Update Book3S HV memory handling
This series of patches updates the Book3S-HV KVM code that manages the guest hashed page table (HPT) to enable several things:

* MMIO emulation and MMIO pass-through
* Use of small pages (4kB or 64kB, depending on config) to back the guest memory
* Pageable guest memory - i.e. backing pages can be removed from the guest and reinstated on demand, using the MMU notifier mechanism.
* Guests can be given read-only access to pages even though they think they have mapped them read/write. When they try to write to them their access is upgraded to read/write. This allows KSM to share pages between guests.

On PPC970 we have no way to get DSIs and ISIs to come to the hypervisor, so we can't do MMIO emulation or pageable guest memory. On POWER7 we set the VPM1 bit in the LPCR to make all DSIs and ISIs come to the hypervisor (host) as HDSIs or HISIs.

This code is working well in my tests. The sporadic crashes that I was seeing earlier are fixed by the first patch in the series. Somewhat to my surprise, when I implemented the last patch in the series I started to see KSM coalescing pages without any further effort on my part -- my tests were on a machine with Fedora 16 installed, and it has ksmtuned running by default.

This series is on top of Alex Graf's kvm-ppc-next branch, although the last patch on that branch ("KVM: PPC: booke: Improve timer register emulation") is causing the decrementer not to work properly in Book3S HV guests, for reasons that I haven't fully determined yet. These patches only touch arch/powerpc except for patch 11, which adds a couple of barriers to allow mmu_notifier_retry() to be used outside of the kvm->mmu_lock.

Unlike the previous version of these patches, we don't look at what's mapped in the user address space at the time that kvmppc_core_prepare_memory_region or kvmppc_core_commit_memory_region gets called; we look up pages only when they are needed, either because the guest wants to map them with an H_ENTER hypercall, or for the pages needed for the virtual real-mode area (VRMA), at the time of the first VCPU_RUN ioctl.

Paul.
-- 
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/13] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give the correct answer when called without kvm->mmu_lock being held. PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than a single global spinlock in order to improve the scalability of updates to the guest MMU hashed page table, and so needs this. Signed-off-by: Paul Mackerras --- include/linux/kvm_host.h | 14 +- virt/kvm/kvm_main.c |6 +++--- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c6a2ec9..17319ad 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -672,12 +672,16 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se if (unlikely(vcpu->kvm->mmu_notifier_count)) return 1; /* -* Both reads happen under the mmu_lock and both values are -* modified under mmu_lock, so there's no need of smb_rmb() -* here in between, otherwise mmu_notifier_count should be -* read before mmu_notifier_seq, see -* mmu_notifier_invalidate_range_end write side. +* Ensure the read of mmu_notifier_count happens before the read +* of mmu_notifier_seq. This interacts with the smp_wmb() in +* mmu_notifier_invalidate_range_end to make sure that the caller +* either sees the old (non-zero) value of mmu_notifier_count or +* the new (incremented) value of mmu_notifier_seq. +* PowerPC Book3s HV KVM calls this under a per-page lock +* rather than under kvm->mmu_lock, for scalability, so +* can't rely on kvm->mmu_lock to keep things ordered. */ + smp_rmb(); if (vcpu->kvm->mmu_notifier_seq != mmu_seq) return 1; return 0; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d9cfb78..ad2a912 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -357,11 +357,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, * been freed. 
*/ kvm->mmu_notifier_seq++; + smp_wmb(); /* * The above sequence increase must be visible before the -* below count decrease but both values are read by the kvm -* page fault under mmu_lock spinlock so we don't need to add -* a smb_wmb() here in between the two. +* below count decrease, which is ensured by the smp_wmb above +* in conjunction with the smp_rmb in mmu_notifier_retry(). */ kvm->mmu_notifier_count--; spin_unlock(&kvm->mmu_lock); -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
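The pairing argued for in this patch can be modelled in user space. The sketch below is an analogy rather than kernel code. It uses C11 `<stdatomic.h>` fences as stand-ins for `smp_wmb()` and `smp_rmb()`, and the names `invalidate_range_start()`, `invalidate_range_end()` and `notifier_retry()` are hypothetical simplifications of the kernel functions. The writer bumps the sequence number before dropping the count. The reader checks the count before the sequence number. As a result, a reader that sees the count already back at zero must also see the new sequence number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kvm->mmu_notifier_{seq,count}. */
static atomic_ulong notifier_seq;
static atomic_long notifier_count;

/* Writer side, modelled on the invalidate_range_start notifier. */
static void invalidate_range_start(void)
{
	atomic_fetch_add_explicit(&notifier_count, 1, memory_order_relaxed);
}

/* Writer side, modelled on kvm_mmu_notifier_invalidate_range_end(). */
static void invalidate_range_end(void)
{
	long prev;

	atomic_fetch_add_explicit(&notifier_seq, 1, memory_order_relaxed);
	/* Analogue of smp_wmb(): publish the new seq before the count drops. */
	atomic_thread_fence(memory_order_release);
	prev = atomic_fetch_sub_explicit(&notifier_count, 1, memory_order_relaxed);
	assert(prev > 0); /* every end must match a start */
	(void)prev;
}

/* Reader side, modelled on mmu_notifier_retry(): true means "retry". */
static bool notifier_retry(unsigned long snapshot_seq)
{
	if (atomic_load_explicit(&notifier_count, memory_order_relaxed) != 0)
		return true;
	/* Analogue of smp_rmb(): read count before seq. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&notifier_seq, memory_order_relaxed) != snapshot_seq;
}
```

Suppose the reader observes the zero count written by the decrement. Then the release fence synchronizes with the acquire fence, and the sequence increment is guaranteed to be visible. Without the fences, a weakly ordered CPU such as POWER could let the reader see a zero count together with the stale sequence number, and it would wrongly skip the retry.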
Re: [PATCH 00/28] kvm tools: Prepare kvmtool for another architecture
On 06/12/11 14:35, Matt Evans wrote:
> This patch series rearranges and tidies various parts of kvmtool to pave the way
> for the addition of support for another architecture -- SPAPR PPC64. A second
> patch series will follow to present the PPC64 support.

I forgot to mention, of course, that these two sets apply on top of git://github.com/penberg/linux-kvm.git master as of d5e6b9fa. Also, I have been testing PPC64 kvmtool using the book3s_hv KVM mode.

Matt
Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity wrote: > On 12/05/2011 02:10 AM, Rusty Russell wrote: > > On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity wrote: > > > On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote: > > > > > There's also the used ring, but that's a > > > > > mistake if you have out of order completion. We should have used > > > > > copying. > > > > > > > > Seems unrelated... unless you want used to be written into > > > > descriptor ring itself? > > > > > > The avail/used rings are in addition to the regular ring, no? If you > > > copy descriptors, then it goes away. > > > > There were two ideas which drove the current design: > > > > 1) The Van-Jacobson style "no two writers to same cacheline makes rings > > fast" idea. Empirically, this doesn't show any winnage. > > Write/write is the same as write/read or read/write. Both cases have to > send a probe and wait for the result. What we really need is to > minimize cache line ping ponging, and the descriptor pool fails that > with ooo completion. I doubt it's measurable though except with the > very fastest storage providers. The claim was that going exclusive->shared->exclusive was cheaper than exclusive->invalid->exclusive. When VJ said it, it seemed convincing :) > > 2) Allowing a generic inter-guest copy mechanism, so we could have > > genuinely untrusted driver domains. Yet no one ever did this so it's > > hardly a killer feature :( > > It's still a goal, though not an important one. But we have to > translate rings anyway, don't we, since buffers are in guest physical > addresses, and we're moving into an address space that doesn't map those. Yes, but the hypervisor/trusted party would simply have to do the copy; the rings themselves would be shared: A would say "copy this to/from B's ring entry N" and you know that A can't have changed B's entry. > I thought of having a vhost-copy driver that could do ring translation, > using a dma engine for the copy.
As long as we get the length of data written from the vhost-copy driver (i.e. not just the network header). Otherwise a malicious other guest can send short packets, and a local process can read uninitialized memory. And pre-zeroing the buffers for this corner case sucks. > > So if we're going to revisit and drop those requirements, I'd say: > > > > 1) Shared device/driver rings like Xen. Xen uses device-specific ring > > contents, I'd be tempted to stick to our pre-headers, and a 'u64 > > addr; u64 len_and_flags; u64 cookie;' generic style. Then use > > the same ring for responses. That's a slight space-win, since > > we're 24 bytes vs 26 bytes now. > > Let's cheat and have inline contents. Take three bits from > len_and_flags to specify additional descriptors as inline data. Nice, I like this optimization. > Also, stuff the cookie into len_and_flags as well. Every driver really wants to put a pointer in there. We have an array to map desc. numbers to cookies inside the virtio core. We really want 64 bits. > > 2) Stick with physically-contiguous rings, but use them of size (2^n)-1. > > Makes the indexing harder, but that -1 lets us stash the indices in > > the first entry and makes the ring a nice 2^n size. > > Allocate at least a cache line for those. The 2^n size is not really > material, a division is never necessary. We free-run our indices, so we *do* a division (by truncation). If we limit indices to ringsize, then we have to handle empty/full confusion. It's nice for simple OSes if things pack nicely into pages, but it's not a killer feature IMHO. > > > > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's > > > > > 4MB worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers > > > > > it's just 1.5 ms. In any case I think it's sufficient. > > > > > > > > Right. So I think that without indirect, we waste about 3 entries > > > > per packet for virtio header and transport etc headers. > > > > > > That does suck.
Are there issues in increasing the ring size? Or > > > making it discontiguous? > > > > Because the qemu implementation is broken. > > I was talking about something else, but this is more important. Every > time we make a simplifying assumption, it turns around and bites us, and > the code becomes twice as complicated as it would have been in the first > place, and the test matrix explodes. True, though we seem to be improving. But this is why I don't want optional features in the spec; I want us always to exercise all of it. > > We can often put the virtio > > header at the head of the packet. In practice, the qemu implementation > > insists the header be a single descriptor. > > > > (At least, it used to, perhaps it has now been fixed. We need a > > VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit). > > We'll run out of bits in no time. We had one already: VIRTIO_F_BAD_FEATURE. We haven't used it in a long time (if ever), and I ju
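Rusty's remark that free-running indices imply "a division (by truncation)" can be made concrete. The sketch below is single-threaded and assumes a power-of-two ring with 16-bit indices. The type and function names are hypothetical, not virtio's. The producer and consumer counters are never reduced modulo the ring size. Instead, a slot is found by masking, and `prod - cons` in 16-bit unsigned arithmetic gives the fill level even across wraparound. As a result, "empty" (0) and "full" (RING_SIZE) never share a representation. If the indices were instead bounded to the ring size, both states would look like `prod == cons`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "ring size must be a power of two");

struct ring {
	uint16_t prod;              /* free-running count of entries produced */
	uint16_t cons;              /* free-running count of entries consumed */
	uint32_t slot[RING_SIZE];
};

/* Entries in flight; modular 16-bit subtraction stays correct across wraparound. */
static uint16_t ring_used(const struct ring *r)
{
	return (uint16_t)(r->prod - r->cons);
}

static bool ring_push(struct ring *r, uint32_t v)
{
	if (ring_used(r) == RING_SIZE)
		return false;                        /* full */
	/* Masking stays consistent across index wrap: 65536 is a multiple of RING_SIZE. */
	r->slot[r->prod & (RING_SIZE - 1)] = v;
	r->prod++;
	return true;
}

static bool ring_pop(struct ring *r, uint32_t *v)
{
	if (ring_used(r) == 0)
		return false;                        /* empty */
	*v = r->slot[r->cons & (RING_SIZE - 1)];
	r->cons++;
	return true;
}
```

Starting both counters near 65535 exercises the wrap. After eight pushes from 65530, `prod` has wrapped to 2, while `prod - cons` still reads 8.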
[PATCH 8/8] kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully if ioeventfds unavailable
PPC KVM doesn't yet support ioeventfds, so don't bomb out/die. virtio-pci is able to function if it instead uses normal IO port notification. Signed-off-by: Matt Evans --- tools/kvm/include/kvm/ioeventfd.h |3 ++- tools/kvm/ioeventfd.c | 12 +--- tools/kvm/virtio/pci.c| 11 ++- 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h index df01750..5e458be 100644 --- a/tools/kvm/include/kvm/ioeventfd.h +++ b/tools/kvm/include/kvm/ioeventfd.h @@ -4,6 +4,7 @@ #include #include #include +#include struct kvm; @@ -21,7 +22,7 @@ struct ioevent { void ioeventfd__init(void); void ioeventfd__start(void); -void ioeventfd__add_event(struct ioevent *ioevent); +bool ioeventfd__add_event(struct ioevent *ioevent); void ioeventfd__del_event(u64 addr, u64 datamatch); #endif diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c index 3a240e4..37f9a63 100644 --- a/tools/kvm/ioeventfd.c +++ b/tools/kvm/ioeventfd.c @@ -26,7 +26,7 @@ void ioeventfd__init(void) die("Failed creating epoll fd"); } -void ioeventfd__add_event(struct ioevent *ioevent) +bool ioeventfd__add_event(struct ioevent *ioevent) { struct kvm_ioeventfd kvm_ioevent; struct epoll_event epoll_event; @@ -48,8 +48,13 @@ void ioeventfd__add_event(struct ioevent *ioevent) .flags = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH, }; - if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0) - die("Failed creating new ioeventfd"); + if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0) { + /* Not all KVM implementations may support KVM_IOEVENTFD, +* so be graceful. 
+*/ + free(new_ioevent); + return false; + } epoll_event = (struct epoll_event) { .events = EPOLLIN, @@ -60,6 +65,7 @@ void ioeventfd__add_event(struct ioevent *ioevent) die("Failed assigning new event to the epoll fd"); list_add_tail(&new_ioevent->list, &used_ioevents); + return true; } void ioeventfd__del_event(u64 addr, u64 datamatch) diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c index ffa3768..06d3b79 100644 --- a/tools/kvm/virtio/pci.c +++ b/tools/kvm/virtio/pci.c @@ -50,7 +50,16 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_trans *vtra .fd = eventfd(0, 0), }; - ioeventfd__add_event(&ioevent); + if (!ioeventfd__add_event(&ioevent)) { +#ifndef CONFIG_PPC + /* PPC64 doesn't have kvm ioevents yet, so we expect this to +* fail -- don't need to be verbose about it! For virtio-pci, +* this is fine. It catches the IO accesses anyway, so +* still works (but slower). +*/ + pr_warning("Failed creating new ioeventfd"); +#endif + } if (vtrans->virtio_ops->notify_vq_eventfd) vtrans->virtio_ops->notify_vq_eventfd(kvm, vpci->dev, vq, ioevent.fd);
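The change to `ioeventfd__add_event()` follows a common pattern. The helper reports failure instead of calling `die()`, releases what it allocated for the failed registration, and lets the caller carry on through the slower path (here, trapped I/O port accesses). The sketch below shows that control flow in self-contained form. `try_register_fast_notify()` is a hypothetical stub standing in for the `KVM_IOEVENTFD` ioctl, and the other names are likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum notify_path { NOTIFY_EVENTFD, NOTIFY_TRAP };

struct event {
	int fd;
};

/* Hypothetical stub for the KVM_IOEVENTFD ioctl; 'supported' simulates host support. */
static bool try_register_fast_notify(bool supported, struct event *ev)
{
	(void)ev;
	return supported;
}

/* Like the patched ioeventfd__add_event(): return false rather than dying,
 * and free the bookkeeping entry so the fallback path does not leak it. */
static bool add_event(bool supported, struct event **out)
{
	struct event *ev;

	assert(out != NULL);
	*out = NULL;
	ev = malloc(sizeof(*ev));
	if (!ev)
		return false;
	ev->fd = -1;
	if (!try_register_fast_notify(supported, ev)) {
		free(ev);
		return false;
	}
	*out = ev;
	return true;
}

/* Caller: trapped I/O still delivers notifications, only more slowly. */
static enum notify_path setup_queue_notify(bool supported, struct event **ev)
{
	return add_event(supported, ev) ? NOTIFY_EVENTFD : NOTIFY_TRAP;
}
```

Returning `bool` leaves the reporting policy to the caller. The patch uses this to stay quiet on PPC and warn elsewhere (the `#ifndef CONFIG_PPC` block).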
[PATCH 7/8] kvm tools: Add PPC64 kvm_cpu__emulate_io()
This is the final piece of the puzzle for PPC SPAPR PCI; this function splits MMIO accesses into the two PHB windows & directs things to MMIO/IO emulation as appropriate. Signed-off-by: Matt Evans --- tools/kvm/Makefile |1 + tools/kvm/powerpc/kvm-cpu.c | 34 ++ 2 files changed, 35 insertions(+), 0 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 6c8..9b875dd 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/spapr_hcall.o OBJS+= powerpc/spapr_rtas.o OBJS+= powerpc/spapr_hvcons.o + OBJS+= powerpc/spapr_pci.o OBJS+= powerpc/xics.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c index 63cd106..0cf4dc8 100644 --- a/tools/kvm/powerpc/kvm-cpu.c +++ b/tools/kvm/powerpc/kvm-cpu.c @@ -24,6 +24,7 @@ #include #include #include +#include static int debug_fd; @@ -177,6 +178,39 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) return ret; } +bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run) +{ + bool ret = false; + u64 phys_addr; + + /* We'll never get KVM_EXIT_IO, it's x86-specific. All IO is MM! :P +* So, look at our windows here & split addresses into I/O or MMIO. +*/ + assert(kvm_run->exit_reason == KVM_EXIT_MMIO); + + phys_addr = cpu->kvm_run->mmio.phys_addr; + if ((phys_addr >= SPAPR_PCI_IO_WIN_ADDR) && + (phys_addr < SPAPR_PCI_IO_WIN_ADDR + SPAPR_PCI_IO_WIN_SIZE)) { + ret = kvm__emulate_io(cpu->kvm, phys_addr - SPAPR_PCI_IO_WIN_ADDR, + cpu->kvm_run->mmio.data, + cpu->kvm_run->mmio.is_write ? 
+ KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN, + cpu->kvm_run->mmio.len, 1); + } else if ((phys_addr >= SPAPR_PCI_MEM_WIN_ADDR) && + (phys_addr < SPAPR_PCI_MEM_WIN_ADDR + SPAPR_PCI_MEM_WIN_SIZE)) { + ret = kvm__emulate_mmio(cpu->kvm, + cpu->kvm_run->mmio.phys_addr - SPAPR_PCI_MEM_WIN_ADDR, + cpu->kvm_run->mmio.data, + cpu->kvm_run->mmio.len, + cpu->kvm_run->mmio.is_write); + } else { + pr_warning("MMIO %s unknown address %lx (size %d)!\n", + cpu->kvm_run->mmio.is_write ? "write to" : "read from", + phys_addr, cpu->kvm_run->mmio.len); + } + return ret; +} + #define CONDSTR_BIT(m, b) (((m) & MSR_##b) ? #b" " : "") void kvm_cpu__show_registers(struct kvm_cpu *vcpu)
[PATCH 6/8] kvm tools: Add PPC64 PCI Host Bridge
This provides the PCI bridge, definitions for the address layout of the windows and wires in IRQs. Once PCI devices are all registered, they are enumerated and DT nodes generated for each. Signed-off-by: Matt Evans --- tools/kvm/powerpc/include/kvm/kvm-arch.h |3 + tools/kvm/powerpc/irq.c | 17 +- tools/kvm/powerpc/kvm.c | 11 + tools/kvm/powerpc/spapr.h|8 + tools/kvm/powerpc/spapr_pci.c| 429 ++ tools/kvm/powerpc/spapr_pci.h| 38 +++ 6 files changed, 504 insertions(+), 2 deletions(-) create mode 100644 tools/kvm/powerpc/spapr_pci.c create mode 100644 tools/kvm/powerpc/spapr_pci.h diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h index ae811e9..ba374f5 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h @@ -40,6 +40,8 @@ */ #define KVM_PCI_MMIO_AREA 0x100 +struct spapr_phb; + struct kvm { int sys_fd; /* For system ioctls(), i.e. /dev/kvm */ int vm_fd; /* For VM ioctls() */ @@ -66,6 +68,7 @@ struct kvm { unsigned long initrd_size; const char *name; struct icp_state*icp; + struct spapr_phb*phb; }; #endif /* KVM__KVM_ARCH_H */ diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c index 80c972a..134db8f 100644 --- a/tools/kvm/powerpc/irq.c +++ b/tools/kvm/powerpc/irq.c @@ -21,14 +21,27 @@ #include #include +#include "kvm/pci.h" + #include "xics.h" +#include "spapr_pci.h" #define XICS_IRQS 1024 +static int pci_devs = 0; + int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) { - fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n", - dev, *num, *pin, *line); + if (pci_devs >= PCI_MAX_DEVICES) + die("Hit PCI device limit!\n"); + + *num = pci_devs++; + + *pin = 1; + /* Have I said how nasty I find this? Line should be dontcare... PHB +* should determine which CPU/XICS IRQ to fire. 
+*/ + *line = xics_alloc_irqnum(); return 0; } diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index bfd7c3a..353c667 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -16,6 +16,7 @@ #include "spapr.h" #include "spapr_hvcons.h" +#include "spapr_pci.h" #include @@ -166,6 +167,11 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ register_core_rtas(); /* Now that hypercalls are initialised, register a couple for the console: */ spapr_hvcons_init(); + spapr_create_phb(kvm, "pci", SPAPR_PCI_BUID, +SPAPR_PCI_MEM_WIN_ADDR, +SPAPR_PCI_MEM_WIN_SIZE, +SPAPR_PCI_IO_WIN_ADDR, +SPAPR_PCI_IO_WIN_SIZE); } void kvm__irq_trigger(struct kvm *kvm, int irq) @@ -420,6 +426,11 @@ static void setup_fdt(struct kvm *kvm) _FDT(fdt_finish(fdt)); _FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE)); + + /* PCI */ + if (spapr_populate_pci_devices(kvm, PHANDLE_XICP, fdt_dest)) + die("Fail populating PCI device nodes"); + _FDT(fdt_add_mem_rsv(fdt_dest, kvm->rtas_gra, kvm->rtas_size)); _FDT(fdt_pack(fdt_dest)); } diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h index 4e5d7bd..902496d 100644 --- a/tools/kvm/powerpc/spapr.h +++ b/tools/kvm/powerpc/spapr.h @@ -305,4 +305,12 @@ target_ulong spapr_rtas_call(struct kvm_cpu *vcpu, uint32_t token, uint32_t nargs, target_ulong args, uint32_t nret, target_ulong rets); +#define SPAPR_PCI_BUID 0x8002001ULL +#define SPAPR_PCI_MEM_WIN_ADDR (KVM_MMIO_START + 0xA000) +#define SPAPR_PCI_MEM_WIN_SIZE 0x2000 +#define SPAPR_PCI_IO_WIN_ADDR (KVM_MMIO_START + 0x8000) +/* This, to me, is odd... 32MB of I/O? Some PHBs are set up like this. + * Anything ever use > 64K? 
:P */ +#define SPAPR_PCI_IO_WIN_SIZE 0x2000000 + #endif /* !defined (__HW_SPAPR_H__) */ diff --git a/tools/kvm/powerpc/spapr_pci.c b/tools/kvm/powerpc/spapr_pci.c new file mode 100644 index 000..233c42c --- /dev/null +++ b/tools/kvm/powerpc/spapr_pci.c @@ -0,0 +1,429 @@ +/* + * SPAPR PHB emulation, RTAS interface to PCI config space, device tree nodes + * for enumerated devices. + * + * Borrowed heavily from QEMU's spapr_pci.c, + * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation. + * Copyright (c) 2011 David Gibson, IBM Corporation. + * + * Modifications copyright 2011 Matt Evans , IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by t
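`irq__register_device()` in this patch does three things: it hands out PCI device numbers sequentially, always reports pin 1 (INTA), and takes the interrupt line from the XICS allocator. The sketch below models that bookkeeping in self-contained form. The device limit and first IRQ number are hypothetical values, and the names are illustrative. Where the patch calls `die()`, the sketch returns an error instead.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEVICES   32u /* hypothetical stand-in for PCI_MAX_DEVICES */
#define FIRST_EXT_IRQ 16u /* hypothetical first allocatable XICS source */

struct irq_alloc {
	unsigned int next_dev;
	unsigned int next_irq;
};

/* Bump allocator for interrupt source numbers, modelled on xics_alloc_irqnum(). */
static unsigned int alloc_irqnum(struct irq_alloc *a)
{
	return a->next_irq++;
}

/* Returns 0 on success, -1 once the device limit has been reached. */
static int register_device(struct irq_alloc *a, uint8_t *num, uint8_t *pin, uint8_t *line)
{
	unsigned int irq;

	if (a->next_dev >= MAX_DEVICES)
		return -1;
	irq = alloc_irqnum(a);
	/* The callback's 'line' is a u8, so larger IRQ numbers would not fit. */
	assert(irq <= UINT8_MAX);
	*num = (uint8_t)a->next_dev++;
	*pin = 1; /* INTA */
	*line = (uint8_t)irq;
	return 0;
}
```

In the callback's signature, `line` is a `u8`. The XICS setup in this series, however, sizes the controller for 1024 sources (`XICS_IRQS`). A model like this makes that limit explicit.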
[PATCH 5/8] kvm tools: Add PPC64 XICS interrupt controller support
This patch adds XICS emulation code (heavily borrowed from QEMU), and wires this into kvm_cpu__irq() to fire a CPU IRQ via KVM. A device tree entry is also added. IPIs work, xics_alloc_irqnum() is added to allocate an external IRQ (which will later be used by the PHB PCI code) and finally, kvm__irq_line() can be called to raise an IRQ on XICS. Signed-off-by: Matt Evans --- tools/kvm/Makefile |1 + tools/kvm/powerpc/include/kvm/kvm-arch.h |1 + tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |2 + tools/kvm/powerpc/irq.c | 11 +- tools/kvm/powerpc/kvm-cpu.c | 10 + tools/kvm/powerpc/kvm.c | 25 +- tools/kvm/powerpc/xics.c | 529 ++ tools/kvm/powerpc/xics.h | 23 ++ 8 files changed, 596 insertions(+), 6 deletions(-) create mode 100644 tools/kvm/powerpc/xics.c create mode 100644 tools/kvm/powerpc/xics.h diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 76cce3a..6c8 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/spapr_hcall.o OBJS+= powerpc/spapr_rtas.o OBJS+= powerpc/spapr_hvcons.o + OBJS+= powerpc/xics.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 LIBS+= -lfdt diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h index 722d01c..ae811e9 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h @@ -65,6 +65,7 @@ struct kvm { unsigned long initrd_gra; unsigned long initrd_size; const char *name; + struct icp_state*icp; }; #endif /* KVM__KVM_ARCH_H */ diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h index dbabc57..551307e 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h @@ -17,6 +17,8 @@ #include +#define POWER7_EXT_IRQ 0 + struct kvm; struct kvm_cpu { diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c index 46aa64f..80c972a 100644 --- a/tools/kvm/powerpc/irq.c +++ b/tools/kvm/powerpc/irq.c 
@@ -21,6 +21,10 @@ #include #include +#include "xics.h" + +#define XICS_IRQS 1024 + int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) { fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n", @@ -30,7 +34,12 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) void irq__init(struct kvm *kvm) { - fprintf(stderr, __func__); + /* kvm->nr_cpus is now valid; for /now/, pass +* this to xics_system_init(), which assumes servers +* are numbered 0..nrcpus. This may not really be true, +* but it is OK currently. +*/ + kvm->icp = xics_system_init(XICS_IRQS, kvm->nrcpus); } int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg) diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c index 71c648e..63cd106 100644 --- a/tools/kvm/powerpc/kvm-cpu.c +++ b/tools/kvm/powerpc/kvm-cpu.c @@ -15,6 +15,7 @@ #include "kvm/kvm.h" #include "spapr.h" +#include "xics.h" #include #include @@ -107,6 +108,9 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id) */ vcpu->is_running = true; + /* Register with IRQ controller */ + xics_cpu_register(vcpu); + return vcpu; } @@ -151,6 +155,12 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu) /* kvm_cpu__irq - set KVM's IRQ flag on this vcpu */ void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level) { + unsigned int virq = level ? 
KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET; + + if (pin != POWER7_EXT_IRQ) + return; + if (ioctl(vcpu->vcpu_fd, KVM_INTERRUPT, &virq) < 0) + pr_warning("Could not KVM_INTERRUPT."); } bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index 8614538..bfd7c3a 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -41,9 +41,13 @@ #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/" +#define PHANDLE_XICP 0x + static char kern_cmdline[2048]; struct kvm_ext kvm_req_ext[] = { + { DEFINE_KVM_EXT(KVM_CAP_PPC_UNSET_IRQ) }, + { DEFINE_KVM_EXT(KVM_CAP_PPC_IRQ_LEVEL) }, { 0, 0 } }; @@ -164,11 +168,6 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ spapr_hvcons_init(); } -void kvm__irq_line(struct kvm *kvm, int irq, int level) -{ - fprintf(stderr, "irq_line(%d, %d)\n", irq, level); -} - void kvm__irq_trigger(struct kvm *kvm, int irq) { kvm__irq_line(kvm, irq, 1); @@ -384,6 +383,22 @@ static void setup_fdt(str
[PATCH 4/8] kvm tools: Add SPAPR PPC64 HV console
This adds the console code, plus VIO HV terminal nodes are added to the device tree so the guest kernel will pick it up. Signed-off-by: Matt Evans --- tools/kvm/Makefile |1 + tools/kvm/powerpc/kvm.c | 31 tools/kvm/powerpc/spapr_hvcons.c | 101 ++ tools/kvm/powerpc/spapr_hvcons.h | 19 +++ 4 files changed, 152 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/powerpc/spapr_hvcons.c create mode 100644 tools/kvm/powerpc/spapr_hvcons.h diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 0f24104..76cce3a 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -130,6 +130,7 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/kvm-cpu.o OBJS+= powerpc/spapr_hcall.o OBJS+= powerpc/spapr_rtas.o + OBJS+= powerpc/spapr_hvcons.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 LIBS+= -lfdt diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index 2f0a921..8614538 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -15,6 +15,7 @@ #include "kvm/util.h" #include "spapr.h" +#include "spapr_hvcons.h" #include @@ -159,6 +160,8 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ /* Do these before FDT setup, IRQ setup, etc. */ hypercall_init(); register_core_rtas(); + /* Now that hypercalls are initialised, register a couple for the console: */ + spapr_hvcons_init(); } void kvm__irq_line(struct kvm *kvm, int irq, int level) @@ -172,6 +175,11 @@ void kvm__irq_trigger(struct kvm *kvm, int irq) kvm__irq_line(kvm, irq, 0); } +void kvm__arch_periodic_poll(struct kvm *kvm) +{ + spapr_hvcons_poll(kvm); +} + int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline) { void *p; @@ -297,6 +305,13 @@ static void setup_fdt(struct kvm *kvm) &ird_end_prop, sizeof(ird_end_prop))); } + /* stdout-path: This is assuming we're using the HV console. Also, the +* address is hardwired until we do a VIO bus. 
+*/ + _FDT(fdt_property_string(fdt, "linux,stdout-path", +"/vdevice/vty@3000")); + _FDT(fdt_end_node(fdt)); + /* Memory: We don't alloc. a separate RMA yet. If we ever need to * (CAP_PPC_RMA == 2) then have one memory node for 0->RMAsize, and * another RMAsize->endOfMem. @@ -369,6 +384,22 @@ static void setup_fdt(struct kvm *kvm) } _FDT(fdt_end_node(fdt)); + /* VIO: See comment in linux,stdout-path; we don't yet represent a VIO +* bus/address allocation so addresses are hardwired here. +*/ + _FDT(fdt_begin_node(fdt, "vdevice")); + _FDT(fdt_property_cell(fdt, "#address-cells", 0x1)); + _FDT(fdt_property_cell(fdt, "#size-cells", 0x0)); + _FDT(fdt_property_string(fdt, "device_type", "vdevice")); + _FDT(fdt_property_string(fdt, "compatible", "IBM,vdevice")); + _FDT(fdt_begin_node(fdt, "vty@3000")); + _FDT(fdt_property_string(fdt, "name", "vty")); + _FDT(fdt_property_string(fdt, "device_type", "serial")); + _FDT(fdt_property_string(fdt, "compatible", "hvterm1")); + _FDT(fdt_property_cell(fdt, "reg", 0x3000)); + _FDT(fdt_end_node(fdt)); + _FDT(fdt_end_node(fdt)); + /* Finalise: */ _FDT(fdt_end_node(fdt)); /* Root node */ _FDT(fdt_finish(fdt)); diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c new file mode 100644 index 000..97902ac --- /dev/null +++ b/tools/kvm/powerpc/spapr_hvcons.c @@ -0,0 +1,101 @@ +/* + * SPAPR HV console + * + * Borrowed lightly from QEMU's spapr_vty.c, Copyright (c) 2010 David Gibson, + * IBM Corporation. + * + * Copyright (c) 2011 Matt Evans , IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. 
+ */ + +#include "kvm/term.h" +#include "kvm/kvm.h" +#include "kvm/kvm-cpu.h" +#include "kvm/util.h" +#include "spapr.h" +#include "spapr_hvcons.h" + +#include +#include +#include + +#include + +union hv_chario { + struct { + uint64_t char0_7; + uint64_t char8_15; + } a; + uint8_t buf[16]; +}; + +static unsigned long h_put_term_char(struct kvm_cpu *vcpu, unsigned long opcode, unsigned long *args) +{ + /* To do: Read register from args[0], and check it. */ + unsigned long len = args[1]; + union hv_chario data; + struct iovec iov; + + if (len > 16) { + return H_PARAMETER; + } + data.a.char0_7 = cpu_to_be64(args[2]); + data.a.char8_15 = cpu_to_be64(args[3]); + + iov.iov_base = data.buf; + iov.iov_
[PATCH 3/8] kvm tools: Add SPAPR PPC64 hcall & rtascall structure
This patch adds the basic structure for HV calls, their registration and some of the simpler calls. A similar layout for RTAS calls is also added, again with some of the simpler RTAS calls used by the guest. The SPAPR RTAS stub is generated inline. Also, nodes for RTAS are added to the device tree. Signed-off-by: Matt Evans --- tools/kvm/Makefile |2 + tools/kvm/powerpc/kvm-cpu.c |5 + tools/kvm/powerpc/kvm.c | 39 +- tools/kvm/powerpc/spapr.h | 308 +++ tools/kvm/powerpc/spapr_hcall.c | 151 +++ tools/kvm/powerpc/spapr_rtas.c | 226 6 files changed, 730 insertions(+), 1 deletions(-) create mode 100644 tools/kvm/powerpc/spapr.h create mode 100644 tools/kvm/powerpc/spapr_hcall.c create mode 100644 tools/kvm/powerpc/spapr_rtas.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index dc18959..0f24104 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -128,6 +128,8 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/irq.o OBJS+= powerpc/kvm.o OBJS+= powerpc/kvm-cpu.o + OBJS+= powerpc/spapr_hcall.o + OBJS+= powerpc/spapr_rtas.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 LIBS+= -lfdt diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c index 79422ff..71c648e 100644 --- a/tools/kvm/powerpc/kvm-cpu.c +++ b/tools/kvm/powerpc/kvm-cpu.c @@ -14,6 +14,8 @@ #include "kvm/util.h" #include "kvm/kvm.h" +#include "spapr.h" + #include #include #include @@ -156,6 +158,9 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) bool ret = true; struct kvm_run *run = vcpu->kvm_run; switch(run->exit_reason) { + case KVM_EXIT_PAPR_HCALL: + run->papr_hcall.ret = spapr_hypercall(vcpu, run->papr_hcall.nr, run->papr_hcall.args); + break; default: ret = false; } diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index d792bee..2f0a921 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -14,6 +14,8 @@ #include "kvm/kvm.h" #include "kvm/util.h" +#include "spapr.h" + #include #include @@ -153,6 +155,10 @@ void kvm__arch_init(struct kvm *kvm, const char 
*kvm_dev, const char *hugetlbfs_ cap_ppc_rma = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA); if (cap_ppc_rma == 2) die("Need contiguous RMA allocation on this hardware, which is not yet supported."); + + /* Do these before FDT setup, IRQ setup, etc. */ + hypercall_init(); + register_core_rtas(); } void kvm__irq_line(struct kvm *kvm, int irq, int level) @@ -262,6 +268,20 @@ static void setup_fdt(struct kvm *kvm) _FDT(fdt_property_cell(fdt, "#address-cells", 0x2)); _FDT(fdt_property_cell(fdt, "#size-cells", 0x2)); + /* RTAS */ + _FDT(fdt_begin_node(fdt, "rtas")); + /* This is what the kernel uses to switch 'We're an LPAR'! */ +_FDT(fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop_kvm, + sizeof(hypertas_prop_kvm))); + _FDT(fdt_property_cell(fdt, "linux,rtas-base", kvm->rtas_gra)); + _FDT(fdt_property_cell(fdt, "linux,rtas-entry", kvm->rtas_gra)); + _FDT(fdt_property_cell(fdt, "rtas-size", kvm->rtas_size)); + /* Now add properties for all RTAS tokens: */ + if (spapr_rtas_fdt_setup(kvm, fdt)) + die("Couldn't create RTAS FDT properties\n"); + + _FDT(fdt_end_node(fdt)); + /* /chosen */ _FDT(fdt_begin_node(fdt, "chosen")); /* cmdline */ @@ -363,7 +383,24 @@ static void setup_fdt(struct kvm *kvm) */ void kvm__arch_setup_firmware(struct kvm *kvm) { - /* Load RTAS */ + /* Set up RTAS stub. 
All it is is a single hypercall: + 0: 7c 64 1b 78 mr r4,r3 + 4: 3c 60 00 00 lis r3,0 + 8: 60 63 f0 00 ori r3,r3,61440 + c: 44 00 00 22 sc 1 + 10: 4e 80 00 20 blr + */ + uint32_t *rtas = guest_flat_to_host(kvm, kvm->rtas_gra); + + rtas[0] = 0x7c641b78; + rtas[1] = 0x3c600000; + rtas[2] = 0x6063f000; + rtas[3] = 0x44000022; + rtas[4] = 0x4e800020; + kvm->rtas_size = 20; + + pr_info("Set up %ld bytes of RTAS at 0x%lx\n", + kvm->rtas_size, kvm->rtas_gra); /* Load SLOF */ diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h new file mode 100644 index 000..4e5d7bd --- /dev/null +++ b/tools/kvm/powerpc/spapr.h @@ -0,0 +1,308 @@ +/* + * SPAPR definitions and declarations + * + * Borrowed heavily from QEMU's spapr.h, + * Copyright (c) 2010 David Gibson, IBM Corporation. + * + * Modifications by Matt Evans , IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of t
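One way to picture the hypercall "registration" structure is a table indexed by opcode. PAPR hcall numbers are multiples of 4, so a compact table can be indexed by `opcode / 4`, and unregistered numbers are answered with `H_FUNCTION`. The sketch below works under those assumptions, with a hypothetical table limit and illustrative names. spapr_hcall.c itself is not shown in this excerpt, so the sketch should not be read as its layout. `H_SUCCESS` and `H_FUNCTION` take their values from Linux's `asm/hvcall.h`.

```c
#include <assert.h>
#include <stddef.h>

#define H_SUCCESS  0
#define H_FUNCTION (-2)
#define MAX_HCALL_OPCODE 0x3FCul /* hypothetical upper bound for this table */

typedef long (*hcall_fn)(unsigned long opcode, unsigned long *args);

static hcall_fn hcall_table[MAX_HCALL_OPCODE / 4 + 1];

/* Example handler that simply succeeds. */
static long h_example(unsigned long opcode, unsigned long *args)
{
	(void)opcode;
	(void)args;
	return H_SUCCESS;
}

static void register_hcall(unsigned long opcode, hcall_fn fn)
{
	assert(opcode % 4 == 0 && opcode <= MAX_HCALL_OPCODE);
	assert(hcall_table[opcode / 4] == NULL); /* no double registration */
	hcall_table[opcode / 4] = fn;
}

/* Look up and run a handler; unknown or malformed opcodes get H_FUNCTION. */
static long dispatch_hcall(unsigned long opcode, unsigned long *args)
{
	if (opcode % 4 != 0 || opcode > MAX_HCALL_OPCODE || hcall_table[opcode / 4] == NULL)
		return H_FUNCTION;
	return hcall_table[opcode / 4](opcode, args);
}
```

The RTAS stub above places 0xF000 (61440) in r3 before `sc 1`, which is outside a table sized like this one. A scheme of this shape would therefore need a second, small table for that KVM-private range.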
[PATCH 0/8] kvm tools SPAPR PPC64 support
Hi,

This set of patches builds upon the prep-work of the previous set and adds support to kvmtool for PPC64 SPAPR-based guests, i.e. an environment akin to an LPAR on IBM's pSeries machines. This support is not yet fully-featured but, in a basic state, works well. The guests have a functional but no-frills experience, with:

- SMP guests
- HV console (or RTAS console, for udbg)
- Net, block over virtio-pci
- No PAPR VIO/VSCSI/VNET yet
- No fancy features like migration yet

Though minimal, guests are quite stable. There are obvious areas for future improvement:

- Non-VRMA RMAs aren't supported, meaning POWER7-only for the moment
- Other CPU-specific details are currently assumed (e.g. available page sizes); work is required to determine host capabilities and pass these up.
- Support SLOF
- Maybe support VIO
- Some hypercalls used by partition firmware/SLOF (not the kernel) are unimplemented
- Fancy PCI (e.g. passthrough)
- Currently KVM_NR_CPUs is arbitrarily fixed at 255, and could be higher. Guests with this many CPUs boot fine.

Some PPC KVM kernel-side features aren't implemented yet and have required kvmtool workarounds: mmio coalescing isn't supported, and the lack of ioeventfds requires virtio to fall back gracefully when it fails to register one.
Cheers,

Matt

Matt Evans (8):
  kvm tools: Add initial SPAPR PPC64 architecture support
  kvm tools: Generate SPAPR PPC64 guest device tree
  kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  kvm tools: Add SPAPR PPC64 HV console
  kvm tools: Add PPC64 XICS interrupt controller support
  kvm tools: Add PPC64 PCI Host Bridge
  kvm tools: Add PPC64 kvm_cpu__emulate_io()
  kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully if ioeventfds unavailable

 tools/kvm/Makefile                           |   16 +
 tools/kvm/include/kvm/ioeventfd.h            |    3 +-
 tools/kvm/ioeventfd.c                        |   12 +-
 tools/kvm/kvm.c                              |    3 +
 tools/kvm/powerpc/include/kvm/barrier.h      |    6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h     |   74
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   48 +++
 tools/kvm/powerpc/ioport.c                   |   18 +
 tools/kvm/powerpc/irq.c                      |   62 +++
 tools/kvm/powerpc/kvm-cpu.c                  |  281 ++
 tools/kvm/powerpc/kvm.c                      |  466 +++
 tools/kvm/powerpc/spapr.h                    |  316 +++
 tools/kvm/powerpc/spapr_hcall.c              |  151
 tools/kvm/powerpc/spapr_hvcons.c             |  101 +
 tools/kvm/powerpc/spapr_hvcons.h             |   19 +
 tools/kvm/powerpc/spapr_pci.c                |  429 +
 tools/kvm/powerpc/spapr_pci.h                |   38 ++
 tools/kvm/powerpc/spapr_rtas.c               |  226 +++
 tools/kvm/powerpc/xics.c                     |  529 ++
 tools/kvm/powerpc/xics.h                     |   23 ++
 tools/kvm/virtio/pci.c                       |   11 +-
 21 files changed, 2827 insertions(+), 5 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h
[PATCH 2/8] kvm tools: Generate SPAPR PPC64 guest device tree
The generated DT is the bare minimum structure required for SPAPR (on which subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory. Some aspects are currently hardwired for simplicity, for example advertised page sizes, HPT size, SLB size, VMX/DFP, etc. Future support of a variety of POWER CPUs should acquire this info from the host and encode appropriately. This requires a 64-bit libfdt. Signed-off-by: Matt Evans --- tools/kvm/Makefile |3 +- tools/kvm/powerpc/kvm.c | 141 +++ 2 files changed, 143 insertions(+), 1 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 58815a2..dc18959 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -129,7 +129,8 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/kvm.o OBJS+= powerpc/kvm-cpu.o ARCH_INCLUDE := powerpc/include - CFLAGS += -m64 + CFLAGS += -m64 + LIBS+= -lfdt endif ### diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index 036bfc0..d792bee 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -3,6 +3,9 @@ * * Copyright 2011 Matt Evans , IBM Corporation. * + * Portions of FDT setup borrowed from QEMU, copyright 2010 David Gibson, IBM + * Corporation. + * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 as published * by the Free Software Foundation. 
@@ -28,8 +31,11 @@ #include #include +#include #include +#define HPT_ORDER 24 + #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/" static char kern_cmdline[2048]; @@ -212,9 +218,144 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel, return false; } +#define SMT_THREADS 4 + +#define _FDT(exp) \ + do {\ + int ret = (exp);\ + if (ret < 0) { \ + die("Error creating device tree: %s: %s\n", \ + #exp, fdt_strerror(ret)); \ + } \ + } while (0) + +static uint32_t mfpvr(void) +{ + uint32_t r; + asm volatile ("mfpvr %0" : "=r"(r)); + return r; +} + static void setup_fdt(struct kvm *kvm) { + uint64_tmem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) }; + int smp_cpus = kvm->nrcpus; + uint32_tinterrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)}; + charhypertas_prop_kvm[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt" + "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk"; + int i, j; + charcpu_name[30]; + u8 staging_fdt[FDT_MAX_SIZE]; + uint32_tpvr = mfpvr(); + + /* Generate an appropriate DT at kvm->fdt_gra */ + void *fdt_dest = guest_flat_to_host(kvm, kvm->fdt_gra); + void *fdt = staging_fdt; + + _FDT(fdt_create(fdt, FDT_MAX_SIZE)); + _FDT(fdt_finish_reservemap(fdt)); + + _FDT(fdt_begin_node(fdt, "")); + + _FDT(fdt_property_string(fdt, "device_type", "chrp")); + _FDT(fdt_property_string(fdt, "model", "IBM pSeries (emulated by kvmtool)")); + _FDT(fdt_property_cell(fdt, "#address-cells", 0x2)); + _FDT(fdt_property_cell(fdt, "#size-cells", 0x2)); + + /* /chosen */ + _FDT(fdt_begin_node(fdt, "chosen")); + /* cmdline */ + _FDT(fdt_property_string(fdt, "bootargs", kern_cmdline)); + /* Initrd */ + if (kvm->initrd_size != 0) { + uint32_t ird_st_prop = cpu_to_be32(kvm->initrd_gra); + uint32_t ird_end_prop = cpu_to_be32(kvm->initrd_gra + + kvm->initrd_size); + _FDT(fdt_property(fdt, "linux,initrd-start", + &ird_st_prop, sizeof(ird_st_prop))); + _FDT(fdt_property(fdt, "linux,initrd-end", + &ird_end_prop, sizeof(ird_end_prop))); + } + + /* Memory: We don't 
alloc. a separate RMA yet. If we ever need to +* (CAP_PPC_RMA == 2) then have one memory node for 0->RMAsize, and +* another RMAsize->endOfMem. +*/ + _FDT(fdt_begin_node(fdt, "memory@0")); + _FDT(fdt_property_string(fdt, "device_type", "memory")); + _FDT(fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))); + _FDT(fdt_end_node(fdt)); + + /* CPUs */ + _FDT(fdt_begin_node(fdt, "cpus")); + _FDT(fdt_property_cell(fdt, "#address-cells", 0x1)); + _FDT(fdt_property_cell(fdt, "#size-cells", 0x0)); + + for (i = 0; i < smp_cpus; i += SMT_THREADS) {
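Because the root node sets both #address-cells and #size-cells to 2, the memory node's "reg" property is one 64-bit base followed by one 64-bit size, each stored big-endian; that is why mem_reg_property[] is { 0, cpu_to_be64(kvm->ram_size) }. A minimal standalone sketch of that encoding without libfdt; put_be64() and encode_mem_reg() are hypothetical helpers, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v into out[0..7], most-significant byte first (FDT cells are big-endian). */
static void put_be64(uint8_t *out, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Hypothetical helper: build the 16-byte "reg" value of a memory node
 * when #address-cells = 2 and #size-cells = 2, i.e. a 64-bit base
 * followed by a 64-bit size.  Returns the number of bytes written.
 */
static size_t encode_mem_reg(uint8_t out[16], uint64_t base, uint64_t size)
{
	put_be64(out, base);
	put_be64(out + 8, size);
	return 16;
}
```

For a 1 GiB guest (size 0x40000000) the property bytes are eight zero bytes for the base, then 00 00 00 00 40 00 00 00 for the size.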
[PATCH 1/8] kvm tools: Add initial SPAPR PPC64 architecture support
This patch adds a new arch directory, powerpc, basic file structure, register setup and where necessary stubs out arch-specific functions (e.g. interrupts, runloop exits) that later patches will provide. The target is an SPAPR-compliant PPC64 machine (i.e. pSeries); there is no support for PPC32 or 'bare metal' PPC64 guests as yet. Subsequent patches implement the hcalls and RTAS required to boot SPAPR pSeries kernels. Memory is mapped from hugetlbfs (as that is currently required by upstream PPC64 HV-mode KVM). The mapping of a VRMA region is yet to be implemented; this is only necessary on processors that don't support VRMA, e.g. <= P6. Work is therefore needed to get this going on pre-P7 CPUs. Processor state is set up as a guest kernel would expect (both primary and secondaries), and SMP is fully supported. Finally, support is added for simply loading flat binary kernels (plus initrd). (bzImages are not used on PPC, and this series does not add zImage support or an ELF loader.) The intention is to later support loading firmware such as SLOF. 
Signed-off-by: Matt Evans --- tools/kvm/Makefile | 10 + tools/kvm/kvm.c |3 + tools/kvm/powerpc/include/kvm/barrier.h |6 + tools/kvm/powerpc/include/kvm/kvm-arch.h | 70 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h | 46 + tools/kvm/powerpc/ioport.c | 18 ++ tools/kvm/powerpc/irq.c | 40 + tools/kvm/powerpc/kvm-cpu.c | 232 ++ tools/kvm/powerpc/kvm.c | 231 + 9 files changed, 656 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h create mode 100644 tools/kvm/powerpc/ioport.c create mode 100644 tools/kvm/powerpc/irq.c create mode 100644 tools/kvm/powerpc/kvm-cpu.c create mode 100644 tools/kvm/powerpc/kvm.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 57dc521..58815a2 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -121,6 +121,16 @@ ifeq ($(ARCH),x86) OTHEROBJS += x86/bios/bios-rom.o ARCH_INCLUDE := x86/include endif +# POWER/ppc: Actually only support ppc64 currently. 
+ifeq ($(uname_M), ppc64) + DEFINES += -DCONFIG_PPC + OBJS+= powerpc/ioport.o + OBJS+= powerpc/irq.o + OBJS+= powerpc/kvm.o + OBJS+= powerpc/kvm-cpu.o + ARCH_INCLUDE := powerpc/include + CFLAGS += -m64 +endif ### diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index 503ceae..d716ede 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -49,6 +49,9 @@ const char *kvm_exit_reasons[] = { DEFINE_KVM_EXIT_REASON(KVM_EXIT_DCR), DEFINE_KVM_EXIT_REASON(KVM_EXIT_NMI), DEFINE_KVM_EXIT_REASON(KVM_EXIT_INTERNAL_ERROR), +#ifdef CONFIG_PPC64 + DEFINE_KVM_EXIT_REASON(KVM_EXIT_PAPR_HCALL), +#endif }; extern struct kvm *kvm; diff --git a/tools/kvm/powerpc/include/kvm/barrier.h b/tools/kvm/powerpc/include/kvm/barrier.h new file mode 100644 index 000..bc7d179 --- /dev/null +++ b/tools/kvm/powerpc/include/kvm/barrier.h @@ -0,0 +1,6 @@ +#ifndef _KVM_BARRIER_H_ +#define _KVM_BARRIER_H_ + +#include + +#endif /* _KVM_BARRIER_H_ */ diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h new file mode 100644 index 000..722d01c --- /dev/null +++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h @@ -0,0 +1,70 @@ +/* + * PPC64 architecture-specific definitions + * + * Copyright 2011 Matt Evans , IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#ifndef KVM__KVM_ARCH_H +#define KVM__KVM_ARCH_H + +#include +#include +#include + +#define KVM_NR_CPUS(255) + +/* MMIO lives after RAM, but it'd be nice if it didn't constantly move. + * Choose a suitably high address, e.g. 63T... This limits RAM size. 
+ */ +#define PPC_MMIO_START 0x3F00UL +#define PPC_MMIO_SIZE 0x0100UL + +#define KERNEL_LOAD_ADDR 0x +#define KERNEL_START_ADDR 0x +#define KERNEL_SECONDARY_START_ADDR 0x0060 +#define INITRD_LOAD_ADDR 0x0280 + +#define FDT_MAX_SIZE 0x1 +#define RTAS_MAX_SIZE 0x1 + +#define TIMEBASE_FREQ 51200ULL + +#define KVM_MMIO_START PPC_MMIO_START + +/* This is the address that pci_get_io_space_block() starts allocating + * from. Note that this is a PCI bus address. + */ +#define KVM_PCI_MMIO_AREA 0x100 + +struct kvm { + int
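The kvm-arch.h comment notes that placing MMIO at a fixed high address "limits RAM size": guest RAM occupies [0, ram_size) and must end at or below the MMIO window. The literal PPC_MMIO_START value is not reliably reproduced in this archive, so the sketch below assumes the comment's "63T" means 63 TiB; SKETCH_MMIO_START and ram_fits_below_mmio() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MMIO starts at 63 TiB, reading the comment's "63T" as 63 * 2^40. */
#define SKETCH_MMIO_START (63ULL << 40)

/* Guest RAM spans [0, ram_size); it must not run into the MMIO window. */
static bool ram_fits_below_mmio(uint64_t ram_size)
{
	return ram_size <= SKETCH_MMIO_START;
}
```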
[PATCH 28/28] kvm tools: Create arch-specific kvm_cpu__emulate_io()
Different architectures will deal with MMIO exits differently. For example, KVM_EXIT_IO is x86-specific, and on other architectures I/O cycles are often synthesised by steering accesses into windows in PCI bridges. This patch moves the IO/MMIO exit code from the main runloop into x86/kvm-cpu.c. Signed-off-by: Matt Evans --- tools/kvm/include/kvm/kvm-cpu.h |1 + tools/kvm/kvm-cpu.c | 37 + tools/kvm/x86/kvm-cpu.c | 37 + 3 files changed, 43 insertions(+), 32 deletions(-) diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h index 15618f1..6f38c0c 100644 --- a/tools/kvm/include/kvm/kvm-cpu.h +++ b/tools/kvm/include/kvm/kvm-cpu.h @@ -13,6 +13,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu); void kvm_cpu__reboot(void); int kvm_cpu__start(struct kvm_cpu *cpu); bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu); +bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run); int kvm_cpu__get_debug_fd(void); void kvm_cpu__set_debug_fd(int fd); diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c index 884a89f..c9fbc81 100644 --- a/tools/kvm/kvm-cpu.c +++ b/tools/kvm/kvm-cpu.c @@ -103,49 +103,22 @@ int kvm_cpu__start(struct kvm_cpu *cpu) kvm_cpu__show_registers(cpu); kvm_cpu__show_code(cpu); break; - case KVM_EXIT_IO: { - bool ret; - - ret = kvm__emulate_io(cpu->kvm, - cpu->kvm_run->io.port, - (u8 *)cpu->kvm_run + - cpu->kvm_run->io.data_offset, - cpu->kvm_run->io.direction, - cpu->kvm_run->io.size, - cpu->kvm_run->io.count); - - if (!ret) + case KVM_EXIT_IO: + case KVM_EXIT_MMIO: + if (!kvm_cpu__emulate_io(cpu, cpu->kvm_run)) goto panic_kvm; break; - } - case KVM_EXIT_MMIO: { - bool ret; - - ret = kvm__emulate_mmio(cpu->kvm, - cpu->kvm_run->mmio.phys_addr, - cpu->kvm_run->mmio.data, - cpu->kvm_run->mmio.len, - cpu->kvm_run->mmio.is_write); - - if (!ret) - goto panic_kvm; - break; - } case KVM_EXIT_INTR: if (cpu->is_running) break; goto exit_kvm; case KVM_EXIT_SHUTDOWN: goto exit_kvm; - default: { - bool ret; - - ret = kvm_cpu__handle_exit(cpu); - if
(!ret) + default: + if (!kvm_cpu__handle_exit(cpu)) goto panic_kvm; break; } - } kvm_cpu__handle_coalesced_mmio(cpu); } diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c index a0d10cc..665d742 100644 --- a/tools/kvm/x86/kvm-cpu.c +++ b/tools/kvm/x86/kvm-cpu.c @@ -217,6 +217,43 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) return false; } +bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run) +{ + bool ret; + switch (kvm_run->exit_reason) { + case KVM_EXIT_IO: { + ret = kvm__emulate_io(cpu->kvm, + cpu->kvm_run->io.port, + (u8 *)cpu->kvm_run + + cpu->kvm_run->io.data_offset, + cpu->kvm_run->io.direction, + cpu->kvm_run->io.size, + cpu->kvm_run->io.count); + + if (!ret) + goto panic_kvm; + break; + } + case KVM_EXIT_MMIO: { + ret = kvm__emulate_mmio(cpu->kvm, + cpu->kvm_run->mmio.phys_addr, + cpu->kvm_run->mmio.data, + cpu->kvm_run->mmio.len, + cpu->kvm_run->mmio.is_write); + + if (!ret) + goto panic_kvm; + break; + } + default: + pr_warning("Unknown exit reason %d in %s\n", kvm_run->exit_reason, __FUNCTION__); + return false; + } + return true; +panic_kvm: + return false; +} + st
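The shape of the split can be sketched in isolation: the generic runloop routes both IO and MMIO exits to one arch hook that returns bool, and each architecture accepts only the exits it can emulate. The enum, struct and function names below are hypothetical stand-ins, not kvmtool's types; the "PPC-like" hook handles MMIO only, since port I/O exits are x86-specific.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the KVM exit reasons involved. */
enum sketch_exit { SKETCH_EXIT_IO, SKETCH_EXIT_MMIO, SKETCH_EXIT_HCALL };

struct sketch_run {
	enum sketch_exit exit_reason;
};

/* Arch hook for a platform without port I/O: only MMIO is emulated. */
static bool arch_emulate_io(const struct sketch_run *run)
{
	switch (run->exit_reason) {
	case SKETCH_EXIT_MMIO:
		return true;	/* would forward to the MMIO emulation here */
	default:
		fprintf(stderr, "Unknown exit reason %d in %s\n",
			(int)run->exit_reason, __func__);
		return false;
	}
}

/* Generic runloop step: IO and MMIO share one arch-specific path. */
static bool handle_exit(const struct sketch_run *run)
{
	switch (run->exit_reason) {
	case SKETCH_EXIT_IO:
	case SKETCH_EXIT_MMIO:
		return arch_emulate_io(run);
	default:
		return false;	/* other exits are not modelled in this sketch */
	}
}
```

One small observation on the x86 version in the patch: it reads through cpu->kvm_run rather than its kvm_run argument. The caller passes cpu->kvm_run, so both refer to the same structure. Also, its `goto panic_kvm` only does `return false`.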
[PATCH 27/28] kvm tools: Arch-specific define for PCI MMIO allocation area
pci_get_io_space_block() used to grab addresses from KVM_32BIT_GAP_START + 0x100, which is x86-specific. Create a new define, KVM_PCI_MMIO_AREA, to specify a bus address these allocations can come from. Signed-off-by: Matt Evans --- tools/kvm/pci.c |8 ++-- tools/kvm/x86/include/kvm/kvm-arch.h |5 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c index 8282e23..045c1c5 100644 --- a/tools/kvm/pci.c +++ b/tools/kvm/pci.c @@ -11,8 +11,12 @@ static struct pci_device_header *pci_devices[PCI_MAX_DEVICES]; static union pci_config_addresspci_config_address; -/* This is within our PCI gap - in an unused area */ -static u32 io_space_blocks = KVM_32BIT_GAP_START + 0x100; +/* This is within our PCI gap - in an unused area. + * Note this is a PCI *bus address*, is used to assign BARs etc.! + * (That's why it can still 32bit even with 64bit guests-- 64bit + * PCI isn't currently supported.) + */ +static u32 io_space_blocks = KVM_PCI_MMIO_AREA; u32 pci_get_io_space_block(u32 size) { diff --git a/tools/kvm/x86/include/kvm/kvm-arch.h b/tools/kvm/x86/include/kvm/kvm-arch.h index 02aa8b9..686b1b8 100644 --- a/tools/kvm/x86/include/kvm/kvm-arch.h +++ b/tools/kvm/x86/include/kvm/kvm-arch.h @@ -18,6 +18,11 @@ #define KVM_MMIO_START KVM_32BIT_GAP_START +/* This is the address that pci_get_io_space_block() starts allocating + * from. Note that this is a PCI bus address (though same on x86). + */ +#define KVM_PCI_MMIO_AREA (KVM_MMIO_START + 0x100) + struct kvm { int sys_fd; /* For system ioctls(), i.e. /dev/kvm */ int vm_fd; /* For VM ioctls() */
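The body of pci_get_io_space_block() is not shown in this excerpt; a hypothetical bump allocator that starts at the per-arch bus address and hands out consecutive blocks illustrates the idea. The base value here is arbitrary, and the 32-bit counter mirrors the comment that 64-bit PCI isn't currently supported.

```c
#include <stdint.h>

/* Arbitrary example base; kvmtool would use the arch's KVM_PCI_MMIO_AREA. */
#define SKETCH_PCI_MMIO_AREA 0x1000000u

/* Next free PCI *bus* address (32-bit, since 64-bit BARs aren't used). */
static uint32_t io_space_blocks = SKETCH_PCI_MMIO_AREA;

/* Hypothetical bump allocator: consecutive blocks, never freed. */
static uint32_t sketch_get_io_space_block(uint32_t size)
{
	uint32_t block = io_space_blocks;

	io_space_blocks += size;
	return block;
}
```

One caveat with any allocator of this kind: PCI BARs must be naturally aligned to their (power-of-two) size. A plain bump pointer keeps that property only if the requested sizes are allocated in a suitable order.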
[PATCH 26/28] kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config register addressing
This allows config space access in a more natural manner than clunky x86 IO ports, and is useful for other architectures. Furthermore, the actual registers were only accessed in 32bit chunks; other systems (e.g. PPC) allow smaller accesses so that, for example, the 16-bit config field can be read directly. This patch allows this sort of addressing. Signed-off-by: Matt Evans --- tools/kvm/include/kvm/pci.h |5 +++ tools/kvm/pci.c | 63 +++--- 2 files changed, 45 insertions(+), 23 deletions(-) diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h index 88e92dc..be2b0bc 100644 --- a/tools/kvm/include/kvm/pci.h +++ b/tools/kvm/include/kvm/pci.h @@ -7,6 +7,8 @@ #include #include +#include "kvm/kvm.h" + #define PCI_MAX_DEVICES256 /* * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1. @@ -82,6 +84,9 @@ struct pci_device_header { void pci__init(void); void pci__register(struct pci_device_header *dev, u8 dev_num); +struct pci_device_header *pci__find_dev(u8 dev_num); u32 pci_get_io_space_block(u32 size); +void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size); +void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size); #endif /* KVM__PCI_H */ diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c index 5bbcbc7..8282e23 100644 --- a/tools/kvm/pci.c +++ b/tools/kvm/pci.c @@ -77,7 +77,6 @@ static bool pci_device_exists(u8 bus_number, u8 device_number, u8 function_numbe static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size) { unsigned long start; - u8 dev_num; /* * If someone accesses PCI configuration space offsets that are not @@ -85,12 +84,41 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port */ start = port - PCI_CONFIG_DATA; - dev_num = pci_config_address.device_number; + pci__config_wr(kvm, pci_config_address, data, size); + + return true; +} + +static bool pci_config_data_in(struct ioport *ioport, struct 
kvm *kvm, u16 port, void *data, int size) +{ + unsigned long start; + + /* +* If someone accesses PCI configuration space offsets that are not +* aligned to 4 bytes, it uses ioports to signify that. +*/ + start = port - PCI_CONFIG_DATA; + + pci__config_rd(kvm, pci_config_address, data, size); + + return true; +} + +static struct ioport_operations pci_config_data_ops = { + .io_in = pci_config_data_in, + .io_out = pci_config_data_out, +}; + +void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size) +{ + u8 dev_num; + + dev_num = addr.device_number; if (pci_device_exists(0, dev_num, 0)) { unsigned long offset; - offset = start + (pci_config_address.register_number << 2); + offset = addr.w & 0xff; if (offset < sizeof(struct pci_device_header)) { void *p = pci_devices[dev_num]; u8 bar = (offset - PCI_BAR_OFFSET(0)) / (sizeof(u32)); @@ -116,27 +144,18 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port } } } - - return true; } -static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size) +void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size) { - unsigned long start; u8 dev_num; - /* -* If someone accesses PCI configuration space offsets that are not -* aligned to 4 bytes, it uses ioports to signify that. 
-*/ - start = port - PCI_CONFIG_DATA; - - dev_num = pci_config_address.device_number; + dev_num = addr.device_number; if (pci_device_exists(0, dev_num, 0)) { unsigned long offset; - offset = start + (pci_config_address.register_number << 2); + offset = addr.w & 0xff; if (offset < sizeof(struct pci_device_header)) { void *p = pci_devices[dev_num]; @@ -145,22 +164,20 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 port, memset(data, 0x00, size); } else memset(data, 0xff, size); - - return true; } -static struct ioport_operations pci_config_data_ops = { - .io_in = pci_config_data_in, - .io_out = pci_config_data_out, -}; - void pci__register(struct pci_device_header *dev, u8 dev_num) { assert(dev_num < PCI_MAX_DEVICES); - pci_devices[dev_num]= dev; } +struct pci_device_header *pci__find_dev(u8 dev_num) +{ + assert(dev_num < PCI_MA
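For reference, a PCI Configuration Mechanism #1 address packs an enable bit (31), bus (23:16), device (15:11), function (10:8) and register offset (7:0). A minimal sketch that decodes it with shifts and masks instead of a bitfield union, so the result does not depend on host bit or byte order. The struct and function names are hypothetical. Keeping all of bits 7:0, as the patch's `addr.w & 0xff` does, preserves sub-dword offsets such as 0x02 for a 16-bit read of the device ID.

```c
#include <stdint.h>

/* Fields of a PCI Configuration Mechanism #1 address. */
struct cfg_addr {
	unsigned enable;	/* bit 31 */
	unsigned bus;		/* bits 23:16 */
	unsigned device;	/* bits 15:11 */
	unsigned function;	/* bits 10:8 */
	unsigned offset;	/* bits 7:0, byte offset into config space */
};

/* Decode with shifts/masks so host endianness and bitfield layout don't matter. */
static struct cfg_addr decode_cfg_addr(uint32_t w)
{
	struct cfg_addr a = {
		.enable   = (w >> 31) & 0x1,
		.bus      = (w >> 16) & 0xff,
		.device   = (w >> 11) & 0x1f,
		.function = (w >> 8) & 0x7,
		.offset   = w & 0xff,
	};
	return a;
}
```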
[PATCH 25/28] kvm tools: Correctly set virtio-pci bar_size and remove hardwired address
The BAR addresses were already set up correctly, but the bar_size[] array was missed; it is now updated correspondingly. Use PCI_IO_SIZE instead of a hardwired '0x100'. Signed-off-by: Matt Evans --- tools/kvm/virtio/pci.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c index 6b27ff8..ffa3768 100644 --- a/tools/kvm/virtio/pci.c +++ b/tools/kvm/virtio/pci.c @@ -293,8 +293,8 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev, vpci->msix_pba_block = pci_get_io_space_block(PCI_IO_SIZE); vpci->base_addr = ioport__register(IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vtrans); - kvm__register_mmio(kvm, vpci->msix_io_block, 0x100, callback_mmio_table, vpci); - kvm__register_mmio(kvm, vpci->msix_pba_block, 0x100, callback_mmio_pba, vpci); + kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE, callback_mmio_table, vpci); + kvm__register_mmio(kvm, vpci->msix_pba_block, PCI_IO_SIZE, callback_mmio_pba, vpci); vpci->pci_hdr = (struct pci_device_header) { .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), @@ -313,6 +313,9 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev, | PCI_BASE_ADDRESS_MEM_TYPE_64), .status = cpu_to_le16(PCI_STATUS_CAP_LIST), .capabilities = (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr, + .bar_size[0]= IOPORT_SIZE, + .bar_size[1]= PCI_IO_SIZE, + .bar_size[3]= PCI_IO_SIZE, }; vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;
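Why bar_size[] matters: under the standard PCI sizing protocol, a guest writes all-ones to a BAR and reads it back. The address bits below the region's size read as zero, and the read-only type bits (low 4 bits for memory BARs, low 2 for I/O BARs) are kept. The sketch below shows the generic PCI rule only; this excerpt does not show exactly how kvmtool's pci__config_wr() answers the probe. Note that a size of zero yields a response with no address bits set, which a guest treats as an unimplemented BAR.

```c
#include <stdint.h>

#define BAR_SPACE_IO 0x1u	/* bit 0 set: I/O space BAR */

/* Value read back after the guest writes 0xffffffff to a BAR of `size` bytes. */
static uint32_t bar_sizing_response(uint32_t size, uint32_t flags)
{
	uint32_t flag_mask = (flags & BAR_SPACE_IO) ? 0x3u : 0xfu;

	return (~(size - 1) & ~flag_mask) | (flags & flag_mask);
}

/* What the guest computes from that response to recover the size. */
static uint32_t bar_size_from_response(uint32_t resp)
{
	uint32_t mask = (resp & BAR_SPACE_IO) ? ~0x3u : ~0xfu;

	return ~(resp & mask) + 1;
}
```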
[PATCH 24/28] kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM
The field size is currently wrong: the value is written as a 32-bit word instead of a 16-bit one. This causes trouble on big-endian systems. Signed-off-by: Matt Evans --- tools/kvm/virtio/pci.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c index 0ae93fb..6b27ff8 100644 --- a/tools/kvm/virtio/pci.c +++ b/tools/kvm/virtio/pci.c @@ -116,8 +116,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm *kvm, u16 port, break; case VIRTIO_PCI_QUEUE_NUM: val = vtrans->virtio_ops->get_size_vq(kvm, vpci->dev, vpci->queue_selector); - ioport__write32(data, val); - break; + ioport__write16(data, val); break; case VIRTIO_PCI_STATUS: ioport__write8(data, vpci->status);
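Here is why a 32-bit store into a 16-bit field only breaks on big-endian systems. With a little-endian layout, the first two bytes of a 32-bit value are its low half, so a 16-bit read happens to see the right number. With a big-endian layout they are the high half, which is zero for any queue size below 65536. The sketch spells out big-endian stores explicitly so its result is host-independent; the helpers are hypothetical and assume the buffer is laid out in big-endian order, as a plain store would leave it on a BE host.

```c
#include <stdint.h>

/* Store v at p in big-endian byte order. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static void store_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/* A 16-bit read of the register only looks at the first two bytes. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}
```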
[PATCH 23/28] kvm tools: Endian-sanitise pci.h and PCI device setup
vesa, pci-shmem and virtio-pci devices need to set up config space with little-endian conversions (as config space is LE). The pci_config_address bitfield also needs to be reversed when building on BE systems. Signed-off-by: Matt Evans --- tools/kvm/hw/pci-shmem.c | 23 +++-- tools/kvm/hw/vesa.c| 15 +++-- tools/kvm/include/kvm/ioport.h | 11 + tools/kvm/include/kvm/pci.h| 24 +- tools/kvm/pci.c|4 +- tools/kvm/virtio/pci.c | 41 +-- 6 files changed, 68 insertions(+), 50 deletions(-) diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c index 780a377..fd954c5 100644 --- a/tools/kvm/hw/pci-shmem.c +++ b/tools/kvm/hw/pci-shmem.c @@ -8,21 +8,22 @@ #include "kvm/ioeventfd.h" #include +#include #include #include #include static struct pci_device_header pci_shmem_pci_device = { - .vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET, - .device_id = 0x1110, + .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), + .device_id = cpu_to_le16(0x1110), .header_type= PCI_HEADER_TYPE_NORMAL, - .class = 0xFF, /* misc pci device */ - .status = PCI_STATUS_CAP_LIST, + .class[2] = 0xFF, /* misc pci device */ + .status = cpu_to_le16(PCI_STATUS_CAP_LIST), .capabilities = (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device, .msix.cap = PCI_CAP_ID_MSIX, - .msix.ctrl = 1, - .msix.table_offset = 1, /* Use BAR 1 */ - .msix.pba_offset = 0x1001, /* Use BAR 1 */ + .msix.ctrl = cpu_to_le16(1), + .msix.table_offset = cpu_to_le32(1),/* Use BAR 1 */ + .msix.pba_offset = cpu_to_le32(0x1001), /* Use BAR 1 */ }; /* registers for the Inter-VM shared memory device */ @@ -123,7 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm) if (fd < 0) return fd; - if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) { + if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) { gsi = irq__add_msix_route(kvm, &msix_table[0].msg); } else { gsi = pci_shmem_pci_device.irq_line; @@ -241,11 +242,11 @@ int pci_shmem__init(struct kvm *kvm) * 1 - MSI-X MMIO space * 2 - Shared memory 
block */ - pci_shmem_pci_device.bar[0] = ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO; + pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO); pci_shmem_pci_device.bar_size[0] = shmem_region->size; - pci_shmem_pci_device.bar[1] = msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY; + pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY); pci_shmem_pci_device.bar_size[1] = 0x1010; - pci_shmem_pci_device.bar[2] = shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY; + pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY); pci_shmem_pci_device.bar_size[2] = shmem_region->size; pci__register(&pci_shmem_pci_device, dev); diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c index 22b1652..63f1082 100644 --- a/tools/kvm/hw/vesa.c +++ b/tools/kvm/hw/vesa.c @@ -8,6 +8,7 @@ #include "kvm/irq.h" #include "kvm/kvm.h" #include "kvm/pci.h" +#include #include #include @@ -31,14 +32,14 @@ static struct ioport_operations vesa_io_ops = { }; static struct pci_device_header vesa_pci_device = { - .vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET, - .device_id = PCI_DEVICE_ID_VESA, + .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), + .device_id = cpu_to_le16(PCI_DEVICE_ID_VESA), .header_type= PCI_HEADER_TYPE_NORMAL, .revision_id= 0, - .class = 0x03, - .subsys_vendor_id = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET, - .subsys_id = PCI_SUBSYSTEM_ID_VESA, - .bar[1] = VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY, + .class[2] = 0x03, + .subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET), + .subsys_id = cpu_to_le16(PCI_SUBSYSTEM_ID_VESA), + .bar[1] = cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY), .bar_size[1]= VESA_MEM_SIZE, }; @@ -56,7 +57,7 @@ struct framebuffer *vesa__init(struct kvm *kvm) vesa_pci_device.irq_pin = pin; vesa_pci_device.irq_line= line; vesa_base_addr = ioport__register(IOPORT_EMPTY, &vesa_io_ops, IOPORT_SIZE, NULL); - vesa_
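Config space is little-endian, so multi-byte header fields must be stored with cpu_to_le16()/cpu_to_le32() to be byte-correct on BE hosts. The minimal stand-in below is not the header kvmtool includes; it relies on the GCC/Clang predefined __BYTE_ORDER__ macros and __builtin_bswap16(). It places the Red Hat/Qumranet vendor ID 0x1af4 so that its bytes land as f4 1a regardless of host order.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for cpu_to_le16(): identity on LE hosts, byte swap on BE hosts. */
static inline uint16_t sketch_cpu_to_le16(uint16_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap16(v);
#else
	return v;
#endif
}

/* Place a 16-bit config-space field so its bytes are little-endian in memory. */
static void put_cfg16(uint8_t *cfg, unsigned offset, uint16_t v)
{
	uint16_t le = sketch_cpu_to_le16(v);

	memcpy(cfg + offset, &le, sizeof(le));
}
```

The pci-shmem hunk applies the same idea when testing flags: `ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)` converts the constant instead of the stored field, which gives the same answer for a bit test.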
[PATCH 22/28] kvm tools: Move PCI_MAX_DEVICES to pci.h
Other pieces of kvmtool may be interested in PCI_MAX_DEVICES. Signed-off-by: Matt Evans --- tools/kvm/include/kvm/pci.h |1 + tools/kvm/pci.c |1 - 2 files changed, 1 insertions(+), 1 deletions(-) diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h index f71af0b..b578ad7 100644 --- a/tools/kvm/include/kvm/pci.h +++ b/tools/kvm/include/kvm/pci.h @@ -6,6 +6,7 @@ #include #include +#define PCI_MAX_DEVICES256 /* * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1. * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c index d1afc05..920e13e 100644 --- a/tools/kvm/pci.c +++ b/tools/kvm/pci.c @@ -5,7 +5,6 @@ #include -#define PCI_MAX_DEVICES256 #define PCI_BAR_OFFSET(b) (offsetof(struct pci_device_header, bar[b])) static struct pci_device_header*pci_devices[PCI_MAX_DEVICES];
[PATCH 21/28] kvm tools: Add --hugetlbfs option to specify memory path
Some architectures may want to use hugetlbfs to mmap() their guest memory, so allow a path to be specified on the commandline and pass it to kvm__arch_init(). Signed-off-by: Matt Evans --- tools/kvm/builtin-run.c |4 +++- tools/kvm/include/kvm/kvm.h |4 ++-- tools/kvm/kvm.c |4 ++-- tools/kvm/x86/kvm.c |2 +- 4 files changed, 8 insertions(+), 6 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 84aa931..4c88169 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -84,6 +84,7 @@ static const char *guest_mac; static const char *host_mac; static const char *script; static const char *guest_name; +static const char *hugetlbfs_path; static struct virtio_net_params *net_params; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; @@ -422,6 +423,7 @@ static const struct option options[] = { OPT_CALLBACK('\0', "tty", NULL, "tty id", "Remap guest TTY into a pty on the host", tty_parser), + OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -808,7 +810,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) guest_name = default_name; } - kvm = kvm__init(dev, ram_size, guest_name); + kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name); kvm->single_step = single_step; diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index 5fe6e75..7159952 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -30,7 +30,7 @@ struct kvm_ext { void kvm__set_dir(const char *fmt, ...); const char *kvm__get_dir(void); -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name); +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name); int kvm__recommended_cpus(struct kvm *kvm); int kvm__max_cpus(struct kvm *kvm); void kvm__init_ram(struct kvm *kvm); @@ -54,7 +54,7 @@ int kvm__enumerate_instances(int 
(*callback)(const char *name, int pid)); void kvm__remove_socket(const char *name); void kvm__arch_set_cmdline(char *cmdline, bool video); -void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name); +void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name); void kvm__arch_setup_firmware(struct kvm *kvm); bool kvm__arch_cpu_supports_vm(void); void kvm__arch_periodic_poll(struct kvm *kvm); diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index 6f33e1a..503ceae 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -272,7 +272,7 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg) pr_warning("Failed sending PID"); } -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name) +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name) { struct kvm *kvm; int ret; @@ -305,7 +305,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name) if (kvm__check_extensions(kvm)) die("A required KVM extention is not supported by OS"); - kvm__arch_init(kvm, kvm_dev, ram_size, name); + kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name); kvm->name = name; diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c index 4ac21c0..76f805f 100644 --- a/tools/kvm/x86/kvm.c +++ b/tools/kvm/x86/kvm.c @@ -161,7 +161,7 @@ void kvm__arch_set_cmdline(char *cmdline, bool video) } /* Architecture-specific KVM init */ -void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name) +void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name) { struct kvm_pit_config pit_config = { .flags = 0, }; int ret;
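This patch only plumbs the --hugetlbfs path through to kvm__arch_init(); how an architecture uses it is not shown here. A hypothetical helper, map_guest_ram(), sketches a common pattern: create a temporary file in the given directory, unlink it so that only the mapping keeps it alive, size it with ftruncate() and mmap() it shared. On a real hugetlbfs mount the size must be a multiple of the huge page size (16 MB, judging by the pagesize-16MB path in the PPC code); the sketch works on any writable directory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: back guest RAM with an unlinked file created in
 * `dir` (e.g. a hugetlbfs mount).  Returns MAP_FAILED on any error.
 */
static void *map_guest_ram(const char *dir, size_t size)
{
	char path[4096];
	void *addr;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/kvmtool-ram-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return MAP_FAILED;

	fd = mkstemp(path);
	if (fd < 0)
		return MAP_FAILED;
	unlink(path);	/* the file lives on only through fd/mapping */

	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return MAP_FAILED;
	}

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping holds its own reference */
	return addr;
}
```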
[PATCH 20/28] kvm tools: Init IRQs after determining nrcpus
IRQ init may involve per-CPU setup/allocation of resources, so make sure kvm->nrcpus is initialised before calling irq__init(). Signed-off-by: Matt Evans --- tools/kvm/builtin-run.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 576dcfa..84aa931 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -810,8 +810,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm = kvm__init(dev, ram_size, guest_name); - irq__init(kvm); - kvm->single_step = single_step; ioeventfd__init(); @@ -829,6 +827,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm->nrcpus = nrcpus; + irq__init(kvm); + pci__init(); /*
[PATCH 19/28] kvm tools: Perform CPU and firmware setup after devices are added
Currently some devices (in this case kbd, fb and vesa) are initialised after CPU/firmware setup. On some platforms (e.g. PPC) kvm__arch_setup_firmware() may build a device tree, so any devices added after this point will be missed! This is a small refactor of builtin-run.c that moves timer start, firmware setup and CPU init to the end. Signed-off-by: Matt Evans --- tools/kvm/builtin-run.c | 24 ++-- 1 files changed, 14 insertions(+), 10 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 32e19e7..576dcfa 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -933,16 +933,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) virtio_net__init(&net_params); } - kvm__start_timer(kvm); - - kvm__arch_setup_firmware(kvm); - - for (i = 0; i < nrcpus; i++) { - kvm_cpus[i] = kvm_cpu__init(kvm, i); - if (!kvm_cpus[i]) - die("unable to initialize KVM VCPU"); - } - kvm__init_ram(kvm); #ifdef CONFIG_X86 @@ -966,6 +956,20 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) fb__start(); + /* Device init all done; firmware init must +* come after this (it may set up device trees etc.) +*/ + + kvm__start_timer(kvm); + + kvm__arch_setup_firmware(kvm); + + for (i = 0; i < nrcpus; i++) { + kvm_cpus[i] = kvm_cpu__init(kvm, i); + if (!kvm_cpus[i]) + die("unable to initialize KVM VCPU"); + } + thread_pool__init(nr_online_cpus); ioeventfd__start();
[PATCH 18/28] kvm tools: Initialise PCI before devices start getting registered with PCI
Re-arrange pci__init() in builtin-run such that it comes before devices are
initialised.

Signed-off-by: Matt Evans
---
 tools/kvm/builtin-run.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index aaa5132..32e19e7 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -829,6 +829,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm->nrcpus = nrcpus;
 
+	pci__init();
+
 	/*
 	 * vidmode should be either specified
 	 * either set by default
@@ -896,8 +898,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	serial8250__init(kvm);
 
-	pci__init();
-
 	if (active_console == CONSOLE_VIRTIO)
 		virtio_console__init(kvm);
 
[PATCH 17/28] kvm tools: Only call symbol__init() if we have BFD
CONFIG_HAS_BFD is optional, and symbol.c is only built when it is set, so make
the symbol__init() call dependent on CONFIG_HAS_BFD as well.

Signed-off-by: Matt Evans
---
 tools/kvm/builtin-run.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 1257c90..aaa5132 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -798,8 +798,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	if (!script)
 		script = DEFAULT_SCRIPT;
 
+#ifdef CONFIG_HAS_BFD
 	symbol__init(vmlinux_filename);
-
+#endif
 	term_init();
 
 	if (!guest_name) {
[PATCH 16/28] kvm tools: Allow load_flat_binary() to load an initrd alongside
This patch passes the initrd fd and commandline to load_flat_binary(), which
may be used to load both the kernel & an initrd (stashing or inserting the
commandline as appropriate) in the same way that load_bzimage() does. This is
especially useful when load_bzimage() is unused for a particular
architecture. :-)

Signed-off-by: Matt Evans
---
 tools/kvm/include/kvm/kvm.h |    2 +-
 tools/kvm/kvm.c             |   10 ++++++----
 tools/kvm/x86/kvm.c         |   12 +++++++++---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index fae2ba9..5fe6e75 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -59,7 +59,7 @@ void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
 
-int load_flat_binary(struct kvm *kvm, int fd);
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
 
 /*
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 457de1a..6f33e1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -354,23 +354,25 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
 
 	ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode);
 
-	if (initrd_filename)
-		close(fd_initrd);
-
 	if (ret)
 		goto found_kernel;
 
 	pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
 
-	ret = load_flat_binary(kvm, fd_kernel);
+	ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
+
 	if (ret)
 		goto found_kernel;
 
+	if (initrd_filename)
+		close(fd_initrd);
 	close(fd_kernel);
 
 	die("%s is not a valid bzImage or flat binary", kernel_filename);
 
 found_kernel:
+	if (initrd_filename)
+		close(fd_initrd);
 	close(fd_kernel);
 
 	return ret;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 7071dc6..4ac21c0 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -227,17 +227,23 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 #define BOOT_PROTOCOL_REQUIRED	0x206
 #define LOAD_HIGH		0x01
 
-int load_flat_binary(struct kvm *kvm, int fd)
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
 {
 	void *p;
 	int nr;
 
-	if (lseek(fd, 0, SEEK_SET) < 0)
+	/* Some architectures may support loading an initrd alongside the flat kernel,
+	 * but we do not.
+	 */
+	if (fd_initrd != -1)
+		pr_warning("Loading initrd with flat binary not supported.");
+
+	if (lseek(fd_kernel, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
 	p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-	while ((nr = read(fd, p, 65536)) > 0)
+	while ((nr = read(fd_kernel, p, 65536)) > 0)
 		p += nr;
 
 	kvm->boot_selector = BOOT_LOADER_SELECTOR;
[PATCH 15/28] kvm tools: Allow initrd_check() to match a cpio
cpios are valid as initrds too, so allow them through the check.

Signed-off-by: Matt Evans
---
 tools/kvm/kvm.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 33243f1..457de1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -317,10 +317,11 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 /* RFC 1952 */
 #define GZIP_ID1		0x1f
 #define GZIP_ID2		0x8b
-
+#define CPIO_MAGIC "0707"
+/* initrd may be gzipped, or a plain cpio */
 static bool initrd_check(int fd)
 {
-	unsigned char id[2];
+	unsigned char id[4];
 
 	if (read_in_full(fd, id, ARRAY_SIZE(id)) < 0)
 		return false;
@@ -328,7 +329,8 @@ static bool initrd_check(int fd)
 	if (lseek(fd, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
-	return id[0] == GZIP_ID1 && id[1] == GZIP_ID2;
+	return (id[0] == GZIP_ID1 && id[1] == GZIP_ID2) ||
+		!memcmp(id, CPIO_MAGIC, 4);
 }
 
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
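For illustration, the same magic-number test can be written against an in-memory buffer instead of a file descriptor. This is only a sketch: `initrd_header_ok()` is a hypothetical name, not a kvmtool function, and unlike the patch it checks how many bytes are available before comparing. ASCII cpio archives start with "070701" (newc), "070702" (newc with CRC) or "070707" (odc), which is why matching the 4-byte prefix "0707" is enough to accept them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GZIP_ID1	0x1f	/* RFC 1952 */
#define GZIP_ID2	0x8b
#define CPIO_MAGIC	"0707"	/* common prefix of the ASCII cpio magics */

/*
 * Hypothetical buffer-based variant of initrd_check(): accept a gzip stream
 * or a plain ASCII cpio archive, and reject buffers too short to tell.
 */
static bool initrd_header_ok(const unsigned char *id, size_t len)
{
	assert(id != NULL);

	if (len >= 2 && id[0] == GZIP_ID1 && id[1] == GZIP_ID2)
		return true;

	return len >= 4 && memcmp(id, CPIO_MAGIC, 4) == 0;
}
```

With this, a gzip header (`1f 8b ...`) or a buffer starting "070701" is accepted, while an ELF image (`7f 'E' 'L' 'F'`) is rejected.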
[PATCH 14/28] kvm tools: Fix term_getc(), term_getc_iov() endian bugs
term_getc()'s int c has one byte written into it (at its lowest address) by
read_in_full(). This is expected to be the least significant byte, but that
isn't the case on big-endian (BE) hosts! Use the correct type, unsigned char.
A similar issue exists in term_getc_iov(), which needs to write a char to the
iov rather than an int.

Signed-off-by: Matt Evans
---
 tools/kvm/term.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index fb5d71c..440884e 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -30,11 +30,10 @@ int term_fds[4][2];
 
 int term_getc(int who, int term)
 {
-	int c;
+	unsigned char c;
 
 	if (who != active_console)
 		return -1;
-
 	if (read_in_full(term_fds[term][TERM_FD_IN], &c, 1) < 0)
 		return -1;
 
@@ -84,7 +83,7 @@ int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term)
 	if (c < 0)
 		return 0;
 
-	*((int *)iov[TERM_FD_IN].iov_base)	= c;
+	*((char *)iov[TERM_FD_IN].iov_base)	= (char)c;
 
 	return sizeof(char);
 }
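The bug is easy to reproduce outside kvmtool. In the sketch below the function names are hypothetical and `memcpy()` stands in for `read_in_full(fd, &c, 1)`: one byte is copied to the lowest address of an `int` and of an `unsigned char`. On a little-endian host both return the byte unchanged; on a big-endian host with a 4-byte `int`, the `int` version returns the byte in the most significant position (0x41 becomes 0x41000000).

```c
#include <assert.h>
#include <string.h>

/* Old behaviour: one byte lands at the lowest address of an int. */
static int byte_via_int(unsigned char byte)
{
	int c = 0;

	assert(sizeof(int) == 4);	/* this demo assumes a 4-byte int */
	memcpy(&c, &byte, 1);		/* stands in for read_in_full(fd, &c, 1) */
	return c;			/* equals 'byte' only on little-endian hosts */
}

/* Fixed behaviour: read into a type that really is one byte wide. */
static int byte_via_uchar(unsigned char byte)
{
	unsigned char c;

	memcpy(&c, &byte, 1);
	return c;			/* always equals 'byte' (0..255) */
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	unsigned int one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

The iov half of the patch follows the same reasoning: storing through `(int *)` writes `sizeof(int)` bytes into a buffer that the caller is told holds only `sizeof(char)`.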
[PATCH 13/28] kvm tools: Add CONSOLE_HV term type and allow it to be selected
This patch paves the way for adding a hypervisor console, useful on systems
that support one out of the box yet don't have either serial port or virtio
console support (e.g. kernels expecting POWER SPAPR).

Signed-off-by: Matt Evans
---
 tools/kvm/builtin-run.c      |    8 ++++++--
 tools/kvm/include/kvm/term.h |    1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index a67bd8c..1257c90 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -416,7 +416,7 @@ static const struct option options[] = {
 	OPT_BOOLEAN('\0', "rng", &virtio_rng, "Enable virtio Random Number Generator"),
 	OPT_CALLBACK('\0', "9p", NULL, "dir_to_share,tag_name",
 		     "Enable virtio 9p to share files between host and guest", virtio_9p_rootdir_parser),
-	OPT_STRING('\0', "console", &console, "serial or virtio",
+	OPT_STRING('\0', "console", &console, "serial, virtio or hv",
 			"Console to use"),
 	OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"),
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
@@ -776,8 +776,12 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	if (!strncmp(console, "virtio", 6))
 		active_console  = CONSOLE_VIRTIO;
-	else
+	else if (!strncmp(console, "serial", 6))
 		active_console  = CONSOLE_8250;
+	else if (!strncmp(console, "hv", 2))
+		active_console  = CONSOLE_HV;
+	else
+		pr_warning("No console!");
 
 	if (!host_ip)
 		host_ip = DEFAULT_HOST_ADDR;
diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 938c26f..a6a9822 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -6,6 +6,7 @@
 
 #define CONSOLE_8250	1
 #define CONSOLE_VIRTIO	2
+#define CONSOLE_HV	3
 
 int term_putc_iov(int who, struct iovec *iov, int iovcnt, int term);
 int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term);
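The console selection added to kvm_cmd_run() can be isolated into a small function for testing. This is a sketch, not kvmtool code: `parse_console()` and its `0` "no console" return value are hypothetical (the patch itself only warns and leaves `active_console` unchanged). Because each comparison uses strncmp() with the keyword's length, these are prefix matches, so for example "hvc" also selects CONSOLE_HV.

```c
#include <assert.h>
#include <string.h>

#define CONSOLE_8250	1
#define CONSOLE_VIRTIO	2
#define CONSOLE_HV	3

/*
 * Hypothetical helper: map a --console argument to a console type, in the
 * same order and with the same strncmp() prefix lengths as the patch.
 * Returns 0 when nothing matches (the caller would warn "No console!").
 */
static int parse_console(const char *console)
{
	assert(console != NULL);

	if (!strncmp(console, "virtio", 6))
		return CONSOLE_VIRTIO;
	if (!strncmp(console, "serial", 6))
		return CONSOLE_8250;
	if (!strncmp(console, "hv", 2))
		return CONSOLE_HV;

	return 0;
}
```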
[PATCH 12/28] kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()
Different systems will want different base kernel commandlines, e.g. non-x86
systems probably don't need noapic, i8042.* etc., so set the commandline up in
arch-specific code. Then, if the resulting commandline is empty, don't strcat
a space onto the front.

Signed-off-by: Matt Evans
---
 tools/kvm/builtin-run.c     |   12 +++++-------
 tools/kvm/include/kvm/kvm.h |    1 +
 tools/kvm/x86/kvm.c         |   11 +++++++++++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9ef331e..a67bd8c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -835,13 +835,11 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		vidmode = 0;
 
 	memset(real_cmdline, 0, sizeof(real_cmdline));
-	strcpy(real_cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
-				"i8042.dumbkbd=1 i8042.nopnp=1");
-	if (vnc || sdl) {
-		strcat(real_cmdline, " video=vesafb console=tty0");
-	} else
-		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
-	strcat(real_cmdline, " ");
+	kvm__arch_set_cmdline(real_cmdline, vnc || sdl);
+
+	if (strlen(real_cmdline) > 0)
+		strcat(real_cmdline, " ");
+
 	if (kernel_cmdline)
 		strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 60842d5..fae2ba9 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -53,6 +53,7 @@ int kvm__get_sock_by_instance(const char *name);
 int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
+void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 45dcb77..7071dc6 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -149,6 +149,17 @@ void kvm__init_ram(struct kvm *kvm)
 	}
 }
 
+/* Arch-specific commandline setup */
+void kvm__arch_set_cmdline(char *cmdline, bool video)
+{
+	strcpy(cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
+				"i8042.dumbkbd=1 i8042.nopnp=1");
+	if (video) {
+		strcat(cmdline, " video=vesafb console=tty0");
+	} else
+		strcat(cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
+}
+
 /* Architecture-specific KVM init */
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
 {
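The joining rule introduced here (base arguments from the arch hook, a separating space only when that base is non-empty, then the user's arguments) can be sketched as a standalone helper. `build_cmdline()` is a hypothetical name; it uses snprintf() instead of the strcpy()/strcat()/strlcat() sequence in builtin-run.c so that the sketch is bounded and portable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring builtin-run.c after this patch: 'base' is
 * what kvm__arch_set_cmdline() produced, 'user' is the optional
 * kernel_cmdline (may be NULL). Output is truncated to 'size' bytes.
 */
static void build_cmdline(char *out, size_t size, const char *base, const char *user)
{
	assert(out != NULL && size > 0 && base != NULL);

	/* Only add a separator when the arch supplied something. */
	snprintf(out, size, "%s%s%s", base, base[0] ? " " : "", user ? user : "");
}
```

With an empty base (an architecture that needs no defaults), "root=/dev/vda" comes out without the leading space the old code would have added.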
[PATCH 11/28] kvm tools: kvm.c needs to include sys/stat.h for mkdir
Fix a missing include.

Signed-off-by: Matt Evans
---
 tools/kvm/kvm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index e526483..33243f1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
[PATCH 10/28] kvm tools: term.h needs to include stdbool.h
Fix a missing include.

Signed-off-by: Matt Evans
---
 tools/kvm/include/kvm/term.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 37ec731..938c26f 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -2,6 +2,7 @@
 #define KVM__TERM_H
 
 #include
+#include
 
 #define CONSOLE_8250	1
 #define CONSOLE_VIRTIO	2
[PATCH 09/28] kvm tools: Add kvm__arch_periodic_poll()
Currently, the SIGALRM handler calls device poll functions (for serial, virtio
console) directly. Which devices are present and which require polling is a
system-specific decision, so create a new function called from common code &
move the x86-specific poll calls into it.

Signed-off-by: Matt Evans
---
 tools/kvm/builtin-run.c     |    3 +--
 tools/kvm/include/kvm/kvm.h |    1 +
 tools/kvm/x86/kvm.c         |    8 ++++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7cf208d..9ef331e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -522,8 +522,7 @@ static void handle_debug(int fd, u32 type, u32 len, u8 *msg)
 
 static void handle_sigalrm(int sig)
 {
-	serial8250__inject_interrupt(kvm);
-	virtio_console__inject_interrupt(kvm);
+	kvm__arch_periodic_poll(kvm);
 }
 
 static void handle_stop(int fd, u32 type, u32 len, u8 *msg)
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index ca1acc0..60842d5 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -56,6 +56,7 @@ void kvm__remove_socket(const char *name);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
+void kvm__arch_periodic_poll(struct kvm *kvm);
 
 int load_flat_binary(struct kvm *kvm, int fd);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 75e4a52..45dcb77 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -4,6 +4,8 @@
 #include "kvm/interrupt.h"
 #include "kvm/mptable.h"
 #include "kvm/util.h"
+#include "kvm/8250-serial.h"
+#include "kvm/virtio-console.h"
 
 #include
 #include
@@ -358,3 +360,9 @@ void kvm__arch_setup_firmware(struct kvm *kvm)
 	/* MP table */
 	mptable_setup(kvm, kvm->nrcpus);
 }
+
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+	serial8250__inject_interrupt(kvm);
+	virtio_console__inject_interrupt(kvm);
+}
[PATCH 08/28] kvm tools: Fix KVM_RUN exit code check
kvm_cpu__run() currently die()s if KVM_RUN returns non-zero. Some
architectures may return positive values in non-error cases, whereas real
errors are always negative return values. Check for those instead.

Signed-off-by: Matt Evans
---
 tools/kvm/kvm-cpu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 9bc0796..884a89f 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -30,7 +30,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu)
 	int err;
 
 	err = ioctl(vcpu->vcpu_fd, KVM_RUN, 0);
-	if (err && (errno != EINTR && errno != EAGAIN))
+	if (err < 0 && (errno != EINTR && errno != EAGAIN))
 		die_perror("KVM_RUN failed");
 }
 
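The corrected condition can be expressed as a pure predicate. This is an illustrative sketch (`kvm_run_failed()` is a hypothetical name): it takes the return value and `errno` as captured immediately after ioctl(KVM_RUN), treats any non-negative return as success, and tolerates the same two transient errors the patch keeps ignoring.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: should this KVM_RUN result be treated as fatal? */
static bool kvm_run_failed(int ret, int saved_errno)
{
	if (ret >= 0)
		return false;	/* 0, or an arch-specific positive value */

	/* Interrupted or temporarily unavailable: just re-enter the guest. */
	return saved_errno != EINTR && saved_errno != EAGAIN;
}
```

Passing `errno` in explicitly keeps the decision correct even if other library calls run between the ioctl() and the check.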
[PATCH 07/28] kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
Architectures can recommend/count/determine the number of CPUs differently, so
move this out of generic code.

Signed-off-by: Matt Evans
---
 tools/kvm/kvm.c     |   30 ------------------------------
 tools/kvm/x86/kvm.c |   30 ++++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 7ce1640..e526483 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -259,17 +259,6 @@ void kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspac
 		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
-int kvm__recommended_cpus(struct kvm *kvm)
-{
-	int ret;
-
-	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
-	if (ret <= 0)
-		die_perror("KVM_CAP_NR_VCPUS");
-
-	return ret;
-}
-
 static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 {
 	pid_t pid = getpid();
@@ -282,25 +271,6 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 		pr_warning("Failed sending PID");
 }
 
-/*
- * The following hack should be removed once 'x86: Raise the hard
- * VCPU count limit' makes it's way into the mainline.
- */
-#ifndef KVM_CAP_MAX_VCPUS
-#define KVM_CAP_MAX_VCPUS 66
-#endif
-
-int kvm__max_cpus(struct kvm *kvm)
-{
-	int ret;
-
-	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
-	if (ret <= 0)
-		ret = kvm__recommended_cpus(kvm);
-
-	return ret;
-}
-
 struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 {
 	struct kvm *kvm;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index ac6c91e..75e4a52 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -76,6 +76,36 @@ bool kvm__arch_cpu_supports_vm(void)
 	return regs.ecx & (1 << feature);
 }
 
+int kvm__recommended_cpus(struct kvm *kvm)
+{
+	int ret;
+
+	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
+	if (ret <= 0)
+		die_perror("KVM_CAP_NR_VCPUS");
+
+	return ret;
+}
+
+/*
+ * The following hack should be removed once 'x86: Raise the hard
+ * VCPU count limit' makes it's way into the mainline.
+ */
+#ifndef KVM_CAP_MAX_VCPUS
+#define KVM_CAP_MAX_VCPUS 66
+#endif
+
+int kvm__max_cpus(struct kvm *kvm)
+{
+	int ret;
+
+	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
+	if (ret <= 0)
+		ret = kvm__recommended_cpus(kvm);
+
+	return ret;
+}
+
 /*
  * Allocating RAM size bigger than 4GB requires us to leave a gap
  * in the RAM which is used for PCI MMIO, hotplug, and unconfigured
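The fallback policy that moves into x86/kvm.c is simple enough to show without the ioctls. In this sketch `choose_max_cpus()` is hypothetical, and its two arguments stand in for the KVM_CHECK_EXTENSION results for KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS; unlike the real code, it receives both values up front rather than querying the recommended count lazily.

```c
#include <assert.h>

/*
 * Hypothetical pure form of kvm__max_cpus(): prefer the hard limit reported
 * for KVM_CAP_MAX_VCPUS; if the kernel reports none (<= 0), fall back to the
 * recommended count from KVM_CAP_NR_VCPUS.
 */
static int choose_max_cpus(int max_vcpus_cap, int nr_vcpus_cap)
{
	/* kvm__recommended_cpus() dies if this is not positive. */
	assert(nr_vcpus_cap > 0);

	return max_vcpus_cap > 0 ? max_vcpus_cap : nr_vcpus_cap;
}
```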
[PATCH 06/28] kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()
This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.

Signed-off-by: Matt Evans
---
 tools/kvm/include/kvm/kvm-cpu.h |    2 ++
 tools/kvm/kvm-cpu.c             |   10 ++++++++--
 tools/kvm/x86/kvm-cpu.c         |    5 +++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 719e286..15618f1 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -2,6 +2,7 @@
 #define KVM__KVM_CPU_H
 
 #include "kvm/kvm-cpu-arch.h"
+#include
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
 void kvm_cpu__delete(struct kvm_cpu *vcpu);
@@ -11,6 +12,7 @@ void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu);
 void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 5aba3bb..9bc0796 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -137,8 +137,14 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 			goto exit_kvm;
 		case KVM_EXIT_SHUTDOWN:
 			goto exit_kvm;
-		default:
-			goto panic_kvm;
+		default: {
+			bool ret;
+
+			ret = kvm_cpu__handle_exit(cpu);
+			if (!ret)
+				goto panic_kvm;
+			break;
+		}
 		}
 		kvm_cpu__handle_coalesced_mmio(cpu);
 	}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index b26b208..a0d10cc 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -212,6 +212,11 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 	kvm_cpu__setup_msrs(vcpu);
 }
 
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
+{
+	return false;
+}
+
 static void print_dtable(const char *name, struct kvm_dtable *dtable)
 {
 	dprintf(debug_fd, " %s %016llx %08hx\n",
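The shape of the new dispatch (common code handles the exits it knows, and anything else is offered to an arch hook before giving up) can be sketched with stand-in types. The enum values and function names below are hypothetical and do not correspond to the KVM_EXIT_* constants in <linux/kvm.h>; note that the x86 hook in this patch simply returns false, so x86 behaviour is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM exit reasons. */
enum exit_reason { EXIT_IO, EXIT_MMIO, EXIT_SHUTDOWN, EXIT_ARCH_SPECIFIC };

enum run_result { RUN_CONTINUE, RUN_EXIT, RUN_PANIC };

/* Hypothetical arch hook: returns true if it understood the exit. */
static bool arch_handle_exit(enum exit_reason reason)
{
	return reason == EXIT_ARCH_SPECIFIC;
}

/* Common runloop step: generic exits first, then defer to the arch. */
static enum run_result handle_one_exit(enum exit_reason reason)
{
	switch (reason) {
	case EXIT_IO:
	case EXIT_MMIO:
		return RUN_CONTINUE;	/* emulated by common code */
	case EXIT_SHUTDOWN:
		return RUN_EXIT;
	default:
		/* Unknown to common code: only panic if the arch declines too. */
		return arch_handle_exit(reason) ? RUN_CONTINUE : RUN_PANIC;
	}
}
```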
[PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately
On LP64 systems our u64s are just longs; remove the %llx'es in favour of
PRIx64 etc.

This patch also adds CFLAGS to the final link, so that any -m64 is obeyed
when linking, too.

Signed-off-by: Matt Evans
---
 tools/kvm/Makefile       |    2 +-
 tools/kvm/builtin-run.c  |   14 ++++++++------
 tools/kvm/builtin-stat.c |    4 +++-
 tools/kvm/disk/core.c    |    4 +++-
 tools/kvm/mmio.c         |    4 +++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 009a6ba..57dc521 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -218,7 +218,7 @@ KVMTOOLS-VERSION-FILE:
 
 $(PROGRAM): $(DEPS) $(OBJS)
 	$(E) " LINK" $@
-	$(Q) $(CC) $(OBJS) $(LIBS) -o $@
+	$(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
 
 $(GUEST_INIT): guest/init.c
 	$(E) " LINK" $@
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index e4aa87e..7cf208d 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -42,6 +42,8 @@
 #include
 #include
 #include
+#define __STDC_FORMAT_MACROS
+#include
 
 #include
 #include
@@ -383,8 +385,8 @@ static int shmem_parser(const struct option *opt, const char *arg, int unset)
 			strcpy(handle, default_handle);
 	}
 	if (verbose) {
-		pr_info("shmem: phys_addr = %llx", phys_addr);
-		pr_info("shmem: size = %llx", size);
+		pr_info("shmem: phys_addr = %"PRIx64, phys_addr);
+		pr_info("shmem: size = %"PRIx64, size);
 		pr_info("shmem: handle	= %s", handle);
 		pr_info("shmem: create	= %d", create);
 	}
@@ -545,7 +547,7 @@ panic_kvm:
 		current_kvm_cpu->kvm_run->exit_reason,
 		kvm_exit_reasons[current_kvm_cpu->kvm_run->exit_reason]);
 	if (current_kvm_cpu->kvm_run->exit_reason == KVM_EXIT_UNKNOWN)
-		fprintf(stderr, "KVM exit code: 0x%Lu\n",
+		fprintf(stderr, "KVM exit code: 0x%"PRIx64"\n",
 			current_kvm_cpu->kvm_run->hw.hardware_exit_reason);
 
 	kvm_cpu__set_debug_fd(STDOUT_FILENO);
@@ -760,10 +762,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	ram_size	= get_ram_size(nrcpus);
 
 	if (ram_size < MIN_RAM_SIZE_MB)
-		die("Not enough memory specified: %lluMB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
+		die("Not enough memory specified: %"PRIu64"MB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
 
 	if (ram_size > host_ram_size())
-		pr_warning("Guest memory size %lluMB exceeds host physical RAM size %lluMB", ram_size, host_ram_size());
+		pr_warning("Guest memory size %"PRIu64"MB exceeds host physical RAM size %"PRIu64"MB", ram_size, host_ram_size());
 
 	ram_size <<= MB_SHIFT;
 
@@ -878,7 +880,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		virtio_blk__init_all(kvm);
 	}
 
-	printf(" # kvm run -k %s -m %Lu -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
+	printf(" # kvm run -k %s -m %"PRId64" -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
 
 	if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename, real_cmdline, vidmode))
diff --git a/tools/kvm/builtin-stat.c b/tools/kvm/builtin-stat.c
index e28eb5b..c1f2605 100644
--- a/tools/kvm/builtin-stat.c
+++ b/tools/kvm/builtin-stat.c
@@ -9,6 +9,8 @@
 #include
 #include
 #include
+#define __STDC_FORMAT_MACROS
+#include
 
 #include
 
@@ -97,7 +99,7 @@ static int do_memstat(const char *name, int sock)
 			printf("The total amount of memory available (in bytes):");
 			break;
 		}
-		printf("%llu\n", stats[i].val);
+		printf("%"PRId64"\n", stats[i].val);
 	}
 	printf("\n");
 
diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index 4915efd..a135851 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -4,6 +4,8 @@
 
 #include
 #include
+#define __STDC_FORMAT_MACROS
+#include
 
 #define AIO_MAX 32
 
@@ -232,7 +234,7 @@ ssize_t disk_image__get_serial(struct disk_image *disk, void *buffer, ssize_t *l
 	if (fstat(disk->fd, &st) != 0)
 		return 0;
 
-	*len = snprintf(buffer, *len, "%llu%llu%llu", (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
+	*len = snprintf(buffer, *len, "%"PRId64"%"PRId64"%"PRId64, (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
 	return *len;
 }
 
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index de7320f..1158bff 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -9,6 +9,8 @@
 #include
 #include
 #include
+#define __STDC_FORMAT_MACROS
+#include
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
@@ -124,7 +126,7 @@ bool kvm__emulate_mmio(struct kvm *k
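A small standalone illustration of the format-macro approach, assuming the value being printed is a `uint64_t` (kvmtool's own `u64` typedef may map to `unsigned long` or `unsigned long long` depending on the host). `format_u64()` is a hypothetical helper; the point is that `PRIx64` and `PRIu64` from <inttypes.h> expand to the right length modifier for the host ABI, so neither `%llx` nor `%lx` is hard-coded.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a 64-bit value in hex and decimal, portably. */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%" PRIx64 " (%" PRIu64 ")", val, val);
}
```

In C the `PRI*` macros are always provided by <inttypes.h>; the `__STDC_FORMAT_MACROS` define only matters to older C++ compilers. For unsigned values `PRIu64` is the matching macro, whereas `PRId64` interprets the argument as signed.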
[PATCH 04/28] kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs
The checks for optional libraries build test programs, so they should respect
certain CFLAGS, in particular -m64, so that we check for 64-bit libraries when
those are required.

Signed-off-by: Matt Evans
---
 tools/kvm/Makefile |   86 ++++++++++++++++++++++---------------------
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f85a154..009a6ba 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,48 +85,6 @@ OBJS += hw/vesa.o
 OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 
-FLAGS_BFD := $(CFLAGS) -lbfd
-has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
-ifeq ($(has_bfd),y)
-	CFLAGS	+= -DCONFIG_HAS_BFD
-	OBJS	+= symbol.o
-	LIBS	+= -lbfd
-endif
-
-FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
-has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
-ifeq ($(has_vncserver),y)
-	OBJS	+= ui/vnc.o
-	CFLAGS	+= -DCONFIG_HAS_VNCSERVER
-	LIBS	+= -lvncserver
-endif
-
-FLAGS_SDL := $(CFLAGS) -lSDL
-has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
-ifeq ($(has_SDL),y)
-	OBJS	+= ui/sdl.o
-	CFLAGS	+= -DCONFIG_HAS_SDL
-	LIBS	+= -lSDL
-endif
-
-FLAGS_ZLIB := $(CFLAGS) -lz
-has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
-ifeq ($(has_ZLIB),y)
-	CFLAGS	+= -DCONFIG_HAS_ZLIB
-	LIBS	+= -lz
-endif
-
-FLAGS_AIO := $(CFLAGS) -laio
-has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
-ifeq ($(has_AIO),y)
-	CFLAGS	+= -DCONFIG_HAS_AIO
-	LIBS	+= -laio
-endif
-
-LIBS	+= -lrt
-LIBS	+= -lpthread
-LIBS	+= -lutil
-
 # Additional ARCH settings for x86
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
                   -e s/arm.*/arm/ -e s/sa110/arm/ \
@@ -172,6 +130,50 @@ else
 	UNSUPP_ERR =
 endif
 
+
+FLAGS_BFD := $(CFLAGS) -lbfd
+has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
+ifeq ($(has_bfd),y)
+	CFLAGS	+= -DCONFIG_HAS_BFD
+	OBJS	+= symbol.o
+	LIBS	+= -lbfd
+endif
+
+FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
+has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
+ifeq ($(has_vncserver),y)
+	OBJS	+= ui/vnc.o
+	CFLAGS	+= -DCONFIG_HAS_VNCSERVER
+	LIBS	+= -lvncserver
+endif
+
+FLAGS_SDL := $(CFLAGS) -lSDL
+has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
+ifeq ($(has_SDL),y)
+	OBJS	+= ui/sdl.o
+	CFLAGS	+= -DCONFIG_HAS_SDL
+	LIBS	+= -lSDL
+endif
+
+FLAGS_ZLIB := $(CFLAGS) -lz
+has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
+ifeq ($(has_ZLIB),y)
+	CFLAGS	+= -DCONFIG_HAS_ZLIB
+	LIBS	+= -lz
+endif
+
+FLAGS_AIO := $(CFLAGS) -laio
+has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
+ifeq ($(has_AIO),y)
+	CFLAGS	+= -DCONFIG_HAS_AIO
+	LIBS	+= -laio
+endif
+
+LIBS	+= -lrt
+LIBS	+= -lpthread
+LIBS	+= -lutil
+
+
 DEPS	:= $(patsubst %.o,%.d,$(OBJS))
 OBJS	+= $(OTHEROBJS)
 
[PATCH 03/28] kvm tools: Add Makefile parameter for kernel include path
This patch adds an 'I' parameter to override the location of the kernel tree
whose headers are used. By default this is '../..', i.e. headers come from
'../../include' and '../../arch/$(ARCH)/include'.

Signed-off-by: Matt Evans
---
 tools/kvm/Makefile |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f58a1d8..f85a154 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -9,7 +9,12 @@ else
 	E = @\#
 	Q =
 endif
-export E Q
+ifneq ($(I), )
+	KINCL_PATH=$(I)
+else
+	KINCL_PATH=../..
+endif
+export E Q KINCL_PATH
 
 include config/utilities.mak
 include config/feature-tests.mak
@@ -176,7 +181,7 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
 DEFINES	+= -DBUILD_ARCH='"$(ARCH)"'
 
 KVM_INCLUDE := include
-CFLAGS	+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I../../include -I../../arch/$(ARCH)/include/ -Os -g
+CFLAGS	+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I$(KINCL_PATH)/include -I$(KINCL_PATH)/arch/$(ARCH)/include/ -Os -g
 
 ifneq ($(WERROR),0)
 	WARNINGS += -Werror
[PATCH 02/28] kvm tools: Only build/init i8042 on x86
Not every architecture has an i8042 kbd controller, so only use this when
building for x86.

Signed-off-by: Matt Evans
---
 tools/kvm/Makefile      |    2 +-
 tools/kvm/builtin-run.c |    2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 243886e..f58a1d8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -77,7 +77,6 @@ OBJS += util/strbuf.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
 OBJS	+= hw/vesa.o
-OBJS	+= hw/i8042.o
 OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 
@@ -153,6 +152,7 @@ ifeq ($(ARCH),x86)
 	OBJS	+= x86/kvm.o
 	OBJS	+= x86/kvm-cpu.o
 	OBJS	+= x86/mptable.o
+	OBJS	+= hw/i8042.o
 # Exclude BIOS object files from header dependencies.
 	OTHEROBJS	+= x86/bios.o
 	OTHEROBJS	+= x86/bios/bios-rom.o
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9148d83..e4aa87e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -941,7 +941,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm__init_ram(kvm);
 
+#ifdef CONFIG_X86
 	kbd__init(kvm);
+#endif
 
 	pci_shmem__init(kvm);
 
[PATCH 00/28] kvm tools: Prepare kvmtool for another architecture
Hi,

This patch series rearranges and tidies various parts of kvmtool to pave the
way for the addition of support for another architecture -- SPAPR PPC64. A
second patch series will follow to present the PPC64 support.

kvmtool is extremely x86-specific, so a fair chunk of refactoring into "common
code" vs "architecture-specific code" is performed in this set. It also fixes
a (refreshingly small) set of endian bugs, plus some assumptions about the
hardware presented to the guest.

I've started the series with the main meat: moving/renaming things like bios,
CPU setup, guest address space layout, interrupts, ioports etc. into a new x86/
directory. The Makefile determines an architecture and builds the appropriate
dir, devices, etc.

Follow-on patches change some of the mechanics, for example modifying the loop
around ioctl(KVM_RUN) so that, whilst it stays generic, it calls into
arch-specific code to handle specific exit reasons, MMIO etc. The builtin-run
initialisation path is rationalised so that PCI & IRQs are initialised before
devices, and all of this happens before arch-specific code is given the chance
to initialise any firmware and generate any device trees.

Most of this series is fairly trivial: moving code, making definitions
arch-local or available via a header, and endian sanitisation. The PCI code
changes are probably the most 'interesting', in that I have made the config
space accesses available to those not using the PC ioport access method, and
wrapped initialisations of config space with cpu_to_leXX accesses.

If there's anything in this series that'll cause the world to end, or stain,
do let me know. :)

Cheers,

Matt

Matt Evans (28):
  kvm tools: Split x86 arch-specific bits into x86/
  kvm tools: Only build/init i8042 on x86
  kvm tools: Add Makefile parameter for kernel include path
  kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs
  kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately
  kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()
  kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
  kvm tools: Fix KVM_RUN exit code check
  kvm tools: Add kvm__arch_periodic_poll()
  kvm tools: term.h needs to include stdbool.h
  kvm tools: kvm.c needs to include sys/stat.h for mkdir
  kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()
  kvm tools: Add CONSOLE_HV term type and allow it to be selected
  kvm tools: Fix term_getc(), term_getc_iov() endian bugs
  kvm tools: Allow initrd_check() to match a cpio
  kvm tools: Allow load_flat_binary() to load an initrd alongside
  kvm tools: Only call symbol__init() if we have BFD
  kvm tools: Initialise PCI before devices start getting registered with PCI
  kvm tools: Perform CPU and firmware setup after devices are added
  kvm tools: Init IRQs after determining nrcpus
  kvm tools: Add --hugetlbfs option to specify memory path
  kvm tools: Move PCI_MAX_DEVICES to pci.h
  kvm tools: Endian-sanitise pci.h and PCI device setup
  kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM
  kvm tools: Correctly set virtio-pci bar_size and remove hardwired address
  kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config register addressing
  kvm tools: Arch-specific define for PCI MMIO allocation area
  kvm tools: Create arch-specific kvm_cpu__emulate_io()

 tools/kvm/Makefile              |  139 +---
 tools/kvm/builtin-run.c         |   82 +++--
 tools/kvm/builtin-stat.c        |    4 +-
 tools/kvm/disk/core.c           |    4 +-
 tools/kvm/hw/pci-shmem.c        |   23 +-
 tools/kvm/hw/vesa.c             |   15 +-
 tools/kvm/include/kvm/ioport.h  |   13 +-
 tools/kvm/include/kvm/kvm-cpu.h |   30 +--
 tools/kvm/include/kvm/kvm.h     |
62 +--- tools/kvm/include/kvm/pci.h | 30 ++- tools/kvm/include/kvm/term.h|2 + tools/kvm/ioport.c | 54 --- tools/kvm/kvm-cpu.c | 407 +- tools/kvm/kvm.c | 374 +--- tools/kvm/mmio.c|4 +- tools/kvm/pci.c | 76 +++-- tools/kvm/term.c|5 +- tools/kvm/virtio/pci.c | 51 ++-- tools/kvm/{ => x86}/bios.c |0 tools/kvm/{ => x86}/bios/.gitignore |0 tools/kvm/{ => x86}/bios/bios-rom.S |2 +- tools/kvm/{ => x86}/bios/e820.c |0 tools/kvm/{ => x86}/bios/entry.S|0 tools/kvm/{ => x86}/bios/gen-offsets.sh |0 tools/kvm/{ => x86}/bios/int10.c|0 tools/kvm/{ => x86}/bios/
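The cover letter mentions wrapping PCI config-space initialisation in cpu_to_leXX accessors. PCI config space is little-endian whatever the host byte order, so a big-endian PPC64 host must not write fields in its native order.

Below is a rough userspace illustration, not kvmtool's code. The helper names `put_le16`/`put_le32`/`get_le16` and the 64-byte header buffer are assumptions. Storing each field byte by byte produces the same layout on x86 and on big-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: store a value in little-endian byte order,
 * independent of the host CPU's endianness. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xffff));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fill the start of a (hypothetical) 64-byte type 0 config header. */
static void init_config_header(uint8_t cfg[64], uint16_t vendor,
                               uint16_t device, uint32_t bar0)
{
    memset(cfg, 0, 64);
    put_le16(cfg + 0x00, vendor); /* Vendor ID */
    put_le16(cfg + 0x02, device); /* Device ID */
    put_le32(cfg + 0x10, bar0);   /* BAR0 */
}
```

With vendor 0x1af4, byte 0 of the header is 0xf4 and byte 1 is 0x1a on every host. That is the property the cpu_to_leXX wrapping is meant to guarantee.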
Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On Wed, Nov 30, 2011 at 02:30:38PM +0530, Raghavendra K T wrote: > This patch extends Linux guests running on KVM hypervisor to support > pv-ticketlocks. > During smp_boot_cpus paravirtualied KVM guest detects if the hypervisor has > required feature (KVM_FEATURE_KICK_VCPU) to support pv-ticketlocks. If so, > support for pv-ticketlocks is registered via pv_lock_ops. > > Signed-off-by: Srivatsa Vaddagiri > Signed-off-by: Suzuki Poulose > Signed-off-by: Raghavendra K T > --- > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h > index 8b1d65d..7e419ad 100644 > --- a/arch/x86/include/asm/kvm_para.h > +++ b/arch/x86/include/asm/kvm_para.h > @@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token); > void kvm_async_pf_task_wake(u32 token); > u32 kvm_read_and_reset_pf_reason(void); > extern void kvm_disable_steal_time(void); > -#else > -#define kvm_guest_init() do { } while (0) > + > +#ifdef CONFIG_PARAVIRT_SPINLOCKS > +void __init kvm_spinlock_init(void); > +#else /* CONFIG_PARAVIRT_SPINLOCKS */ > +static void kvm_spinlock_init(void) > +{ > +} > +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ > + > +#else /* CONFIG_KVM_GUEST */ > +#define kvm_guest_init() do {} while (0) > #define kvm_async_pf_task_wait(T) do {} while(0) > #define kvm_async_pf_task_wake(T) do {} while(0) > +#define kvm_spinlock_init() do {} while (0) > + > static inline u32 kvm_read_and_reset_pf_reason(void) > { > return 0; > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index a9c2116..dffeea3 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -33,6 +33,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void) > #endif > kvm_guest_cpu_init(); > native_smp_prepare_boot_cpu(); > + kvm_spinlock_init(); > } > > static void __cpuinit kvm_guest_cpu_online(void *dummy) > @@ -627,3 +629,248 @@ static __init int activate_jump_labels(void) > return 0; > } > 
arch_initcall(activate_jump_labels); > + > +#ifdef CONFIG_PARAVIRT_SPINLOCKS > + > +enum kvm_contention_stat { > + TAKEN_SLOW, > + TAKEN_SLOW_PICKUP, > + RELEASED_SLOW, > + RELEASED_SLOW_KICKED, > + NR_CONTENTION_STATS > +}; > + > +#ifdef CONFIG_KVM_DEBUG_FS > + > +static struct kvm_spinlock_stats > +{ > + u32 contention_stats[NR_CONTENTION_STATS]; > + > +#define HISTO_BUCKETS 30 > + u32 histo_spin_blocked[HISTO_BUCKETS+1]; > + > + u64 time_blocked; > +} spinlock_stats; > + > +static u8 zero_stats; > + > +static inline void check_zero(void) > +{ > + u8 ret; > + u8 old = ACCESS_ONCE(zero_stats); > + if (unlikely(old)) { > + ret = cmpxchg(&zero_stats, old, 0); > + /* This ensures only one fellow resets the stat */ > + if (ret == old) > + memset(&spinlock_stats, 0, sizeof(spinlock_stats)); > + } > +} > + > +static inline void add_stats(enum kvm_contention_stat var, int val) You probably want 'int val' to be 'u32 val' as that is the type in contention_stats.
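For reference, the Xen spinlock statistics code that this patch mirrors accumulates into its u32 counters roughly as shown below. This is a userspace approximation: C11 `<stdatomic.h>` stands in for the kernel's ACCESS_ONCE()/cmpxchg(). It takes `val` as `uint32_t`, as the review suggests, so the addition matches the counter type with no signed-to-unsigned conversion.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

enum kvm_contention_stat {
    TAKEN_SLOW,
    TAKEN_SLOW_PICKUP,
    RELEASED_SLOW,
    RELEASED_SLOW_KICKED,
    NR_CONTENTION_STATS
};

#define HISTO_BUCKETS 30

static struct kvm_spinlock_stats {
    uint32_t contention_stats[NR_CONTENTION_STATS];
    uint32_t histo_spin_blocked[HISTO_BUCKETS + 1];
    uint64_t time_blocked;
} spinlock_stats;

/* Set non-zero (e.g. from a debugfs write) to request a reset. */
static _Atomic uint8_t zero_stats;

static void check_zero(void)
{
    uint8_t old = atomic_load(&zero_stats);

    /* Only the caller whose compare-and-swap succeeds clears the stats. */
    if (old && atomic_compare_exchange_strong(&zero_stats, &old, 0))
        memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}

static inline void add_stats(enum kvm_contention_stat var, uint32_t val)
{
    check_zero();
    spinlock_stats.contention_stats[var] += val;
}
```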
Re: [Xen-devel] [PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs
On Wed, Nov 30, 2011 at 02:29:39PM +0530, Raghavendra K T wrote: > Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to > debugfs > to make the code common for other users as well. > > Signed-off-by: Srivatsa Vaddagiri > Signed-off-by: Suzuki Poulose > Signed-off-by: Raghavendra K T Looks good to me. > --- > diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c > index 7c0fedd..c8377fb 100644 > --- a/arch/x86/xen/debugfs.c > +++ b/arch/x86/xen/debugfs.c > @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void) > return d_xen_debug; > } > > -struct array_data > -{ > - void *array; > - unsigned elements; > -}; > - > -static int u32_array_open(struct inode *inode, struct file *file) > -{ > - file->private_data = NULL; > - return nonseekable_open(inode, file); > -} > - > -static size_t format_array(char *buf, size_t bufsize, const char *fmt, > -u32 *array, unsigned array_size) > -{ > - size_t ret = 0; > - unsigned i; > - > - for(i = 0; i < array_size; i++) { > - size_t len; > - > - len = snprintf(buf, bufsize, fmt, array[i]); > - len++; /* ' ' or '\n' */ > - ret += len; > - > - if (buf) { > - buf += len; > - bufsize -= len; > - buf[-1] = (i == array_size-1) ? 
'\n' : ' '; > - } > - } > - > - ret++; /* \0 */ > - if (buf) > - *buf = '\0'; > - > - return ret; > -} > - > -static char *format_array_alloc(const char *fmt, u32 *array, unsigned > array_size) > -{ > - size_t len = format_array(NULL, 0, fmt, array, array_size); > - char *ret; > - > - ret = kmalloc(len, GFP_KERNEL); > - if (ret == NULL) > - return NULL; > - > - format_array(ret, len, fmt, array, array_size); > - return ret; > -} > - > -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t > len, > - loff_t *ppos) > -{ > - struct inode *inode = file->f_path.dentry->d_inode; > - struct array_data *data = inode->i_private; > - size_t size; > - > - if (*ppos == 0) { > - if (file->private_data) { > - kfree(file->private_data); > - file->private_data = NULL; > - } > - > - file->private_data = format_array_alloc("%u", data->array, > data->elements); > - } > - > - size = 0; > - if (file->private_data) > - size = strlen(file->private_data); > - > - return simple_read_from_buffer(buf, len, ppos, file->private_data, > size); > -} > - > -static int xen_array_release(struct inode *inode, struct file *file) > -{ > - kfree(file->private_data); > - > - return 0; > -} > - > -static const struct file_operations u32_array_fops = { > - .owner = THIS_MODULE, > - .open = u32_array_open, > - .release= xen_array_release, > - .read = u32_array_read, > - .llseek = no_llseek, > -}; > - > -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode, > - struct dentry *parent, > - u32 *array, unsigned elements) > -{ > - struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL); > - > - if (data == NULL) > - return NULL; > - > - data->array = array; > - data->elements = elements; > - > - return debugfs_create_file(name, mode, parent, data, &u32_array_fops); > -} > diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h > index e281320..12ebf33 100644 > --- a/arch/x86/xen/debugfs.h > +++ b/arch/x86/xen/debugfs.h > @@ -3,8 +3,4 @@ > > struct dentry * 
__init xen_init_debugfs(void); > > -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode, > - struct dentry *parent, > - u32 *array, unsigned elements); > - > #endif /* _XEN_DEBUGFS_H */ > diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c > index fc506e6..14a8961 100644 > --- a/arch/x86/xen/spinlock.c > +++ b/arch/x86/xen/spinlock.c > @@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void) > debugfs_create_u64("time_blocked", 0444, d_spin_debug, > &spinlock_stats.time_blocked); > > - xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, > + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, >spinlock_stats.histo_spin_blocked, > HISTO_BUCKETS + 1); > > return 0; > diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c > index 90f7657..df44ccf 100644 > --- a/fs/debugfs/file.c > +++ b/fs/debugfs/file.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > static ssize_t default_read_file(struct file *file, char __user *buf, >
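The helper being moved into debugfs works in two passes. The first pass runs over a NULL buffer and only computes the output size. The second pass formats into an allocation of exactly that size.

The port below follows the removed format_array()/format_array_alloc() logic. It is userspace C, with malloc() in place of kmalloc().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* With buf == NULL, only computes the size needed (including the NUL). */
static size_t format_array(char *buf, size_t bufsize, const char *fmt,
                           const uint32_t *array, unsigned array_size)
{
    size_t ret = 0;
    unsigned i;

    for (i = 0; i < array_size; i++) {
        size_t len = (size_t)snprintf(buf, bufsize, fmt, array[i]);

        len++; /* room for the ' ' or '\n' separator */
        ret += len;

        if (buf) {
            buf += len;
            bufsize -= len;
            /* overwrite snprintf's NUL with the separator */
            buf[-1] = (i == array_size - 1) ? '\n' : ' ';
        }
    }

    ret++; /* trailing NUL */
    if (buf)
        *buf = '\0';

    return ret;
}

static char *format_array_alloc(const char *fmt, const uint32_t *array,
                                unsigned array_size)
{
    size_t len = format_array(NULL, 0, fmt, array, array_size);
    char *ret = malloc(len);

    if (ret == NULL)
        return NULL;

    format_array(ret, len, fmt, array, array_size);
    return ret;
}
```

For the array {1, 22, 333} and "%u", the sizing pass returns 10. The second pass then produces "1 22 333\n".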
kvm deadlock
Hello, I am struggling with repeatable full hardware lockups when running 8-12 KVM VMs. At some point before the hard lockup I get an "inconsistent lock state" warning. An example of this can be found here: http://pastebin.com/8wKhgE2C After that, the server continues to run for a while and then starts its death spiral. When it reaches that point it fails to log anything further to disk, but by attaching a console I have been able to capture a stack trace documenting the final implosion: http://pastebin.com/PbcN76bd All of the cores end up hung and the server stops responding to all input, including SysRq commands. I have seen this behavior on two machines (dual E5606, running Fedora 16); both passed cpuburnin testing and memtest86 scans without error. I have reproduced the crash and stack traces with a Fedora debugging kernel (3.1.2-1) and with a vanilla 3.1.4 kernel. Nate Custer QA Analyst cPanel Inc
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote: > In order to let the packets of a flow to be passed to the desired > guest cpu, we can co-operate with devices through programming the flow > director which was just a hash to queue table. > > This kinds of co-operation is done through the accelerate RFS support, > a device specific flow sterring method virtnet_fd() is used to modify > the flow director based on rfs mapping. The desired queue were > calculated through reverse mapping of the irq affinity table. In order > to parallelize the ingress path, irq affinity of rx queue were also > provides by the driver. > > In addition to accelerate RFS, we can also use the guest scheduler to > balance the load of TX and reduce the lock contention on egress path, > so the processor_id() were used to tx queue selection. [...] > +#ifdef CONFIG_RFS_ACCEL > + > +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb, > +u16 rxq_index, u32 flow_id) > +{ > + struct virtnet_info *vi = netdev_priv(net_dev); > + u16 *table = NULL; > + > + if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash) > + return -EPROTONOSUPPORT; Why only IPv4? > + table = kmap_atomic(vi->fd_page); > + table[skb->rxhash & TAP_HASH_MASK] = rxq_index; > + kunmap_atomic(table); > + > + return 0; > +} > +#endif This is not a proper implementation of ndo_rx_flow_steer. If you steer a flow by changing the RSS table this can easily cause packet reordering in other flows. The filtering should be more precise, ideally matching exactly a single flow by e.g. VID and IP 5-tuple. I think you need to add a second hash table which records exactly which flow is supposed to be steered. Also, you must call rps_may_expire_flow() to check whether an entry in this table may be replaced; otherwise you can cause packet reordering in the flow that was previously being steered. Finally, this function must return the table index it assigned, so that rps_may_expire_flow() works. 
> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb) > +{ > + int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : > +smp_processor_id(); > + > + /* As we make use of the accelerate rfs which let the scheduler to > + * balance the load, it make sense to choose the tx queue also based on > + * theprocessor id? > + */ > + while (unlikely(txq >= dev->real_num_tx_queues)) > + txq -= dev->real_num_tx_queues; > + return txq; > +} [...] Don't do this, let XPS handle it. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked.
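The review of virtnet_fd() asks for exact-match filtering: a second table that records which single flow (VLAN ID plus IP 5-tuple) is steered. It also asks for a check before an entry owned by another flow is replaced, and for the function to return the index it used.

Below is a minimal userspace sketch of that bookkeeping. Everything in it is hypothetical: the names, the 8-slot table, and the `may_expire` callback, which stands in for the kernel's `rps_may_expire_flow(dev, rxq_index, flow_id, filter_id)`. It is not the virtio-net implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FILTER_SLOTS 8 /* hypothetical filter table size */

/* Exact flow identity: VLAN ID plus the IP 5-tuple. */
struct flow_key {
    uint16_t vid;
    uint8_t proto;
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct flow_filter {
    bool in_use;
    struct flow_key key;
    uint16_t rxq;
    uint32_t flow_id;
};

static struct flow_filter filters[FILTER_SLOTS];

/* Stand-in for rps_may_expire_flow(): may the flow that currently owns
 * filter_id be evicted without reordering its packets? */
typedef bool (*may_expire_fn)(uint16_t rxq, uint32_t flow_id,
                              uint16_t filter_id);

static bool flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->vid == b->vid && a->proto == b->proto &&
           a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Steer one flow to rxq. Returns the filter index used (the caller keeps
 * it for later expiry checks), or -EBUSY if the slot must not be reused. */
static int steer_flow(const struct flow_key *key, uint32_t rxhash,
                      uint16_t rxq, uint32_t flow_id, may_expire_fn may_expire)
{
    uint16_t idx = (uint16_t)(rxhash % FILTER_SLOTS);
    struct flow_filter *f = &filters[idx];

    /* Never silently take over a slot still owned by another live flow. */
    if (f->in_use && !flow_key_equal(&f->key, key) &&
        !may_expire(f->rxq, f->flow_id, idx))
        return -EBUSY;

    f->in_use = true;
    f->key = *key;
    f->rxq = rxq;
    f->flow_id = flow_id;
    return idx;
}

/* Example expiry policies for trying the sketch out. */
static bool never_expire(uint16_t rxq, uint32_t flow_id, uint16_t filter_id)
{
    (void)rxq; (void)flow_id; (void)filter_id;
    return false;
}

static bool always_expire(uint16_t rxq, uint32_t flow_id, uint16_t filter_id)
{
    (void)rxq; (void)flow_id; (void)filter_id;
    return true;
}
```

Consider two flows whose hashes collide (5 and 13 with 8 slots). Re-steering the owner only updates its queue. The second flow gets -EBUSY until the expiry check allows the eviction, so neither flow is moved as a side effect of the other.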
[PATCH 3/5 V5] Add ioctl for KVMCLOCK_GUEST_STOPPED
Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl. Signed-off-by: Eric B Munson Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- Changes from V4: Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED Add new ioctl description to api.txt Documentation/virtual/kvm/api.txt | 12 arch/x86/include/asm/kvm_host.h |2 ++ arch/x86/kvm/x86.c| 20 include/linux/kvm.h |2 ++ 4 files changed, 36 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7945b0b..0f7dd99 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1450,6 +1450,18 @@ is supported; 2 if the processor requires all virtual machines to have an RMA, or 1 if the processor can use an RMA but doesn't require it, because it supports the Virtual RMA (VRMA) facility. +4.64 KVMCLOCK_GUEST_PAUSED + +Capability: basic +Architechtures: Any that implement pvclocks (currently x86 only) +Type: vcpu ioctl +Parameters: None +Returns: 0 on success, -1 on error + +This signals to the host kernel that the specified guest is being paused by +userspace. The host will set a flag in the pvclock structure that is checked +from the soft lockup watchdog. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b4973f4..beb94c6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -672,6 +672,8 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, gpa_t addr, unsigned long *ret); u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); +int kvm_set_guest_paused(struct kvm_vcpu *vcpu); + extern bool tdp_enabled; u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c38efd7..1dab5fd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3295,6 +3295,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp, goto out; } + case KVMCLOCK_GUEST_PAUSED: { + r = kvm_set_guest_paused(vcpu); + break; + } default: r = -EINVAL; } @@ -6117,6 +6121,22 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, } EXPORT_SYMBOL_GPL(kvm_task_switch); +/* + * kvm_set_guest_paused() indicates to the guest kernel that it has been + * stopped by the hypervisor. This function will be called from the host only. + * EINVAL is returned when the host attempts to set the flag for a guest that + * does not support pv clocks. 
+ */ +int kvm_set_guest_paused(struct kvm_vcpu *vcpu) +{ + struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock; + if (!vcpu->arch.time_page) + return -EINVAL; + src->flags |= PVCLOCK_GUEST_STOPPED; + return 0; +} +EXPORT_SYMBOL_GPL(kvm_set_guest_paused); + int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) { diff --git a/include/linux/kvm.h b/include/linux/kvm.h index c3892fc..1d1ddef 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -762,6 +762,8 @@ struct kvm_clock_data { #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce) /* Available with KVM_CAP_RMA */ #define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma) +/* VM is being stopped by host */ +#define KVMCLOCK_GUEST_PAUSED _IO(KVMIO, 0xaa) #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) -- 1.7.5.4
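From userspace, the new vcpu ioctl takes no argument. Per the api.txt text, it returns 0 on success and -1 on error.

Below is a minimal caller sketch. It assumes `vcpu_fd` came from KVM_CREATE_VCPU. The request number is copied from this revision of the patch and may change before merging, so real code should take it from the installed `<linux/kvm.h>`. The function name `kvmclock_notify_paused` is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#ifndef KVMIO
#define KVMIO 0xAE
#endif

/* Request number as proposed in this revision of the patch (assumption). */
#ifndef KVMCLOCK_GUEST_PAUSED
#define KVMCLOCK_GUEST_PAUSED _IO(KVMIO, 0xaa)
#endif

/* Tell KVM that the guest behind vcpu_fd is being paused.
 * Returns 0 on success or a negative errno value. -EINVAL means either
 * the guest never registered a kvmclock page or the host kernel does not
 * know this ioctl. */
static int kvmclock_notify_paused(int vcpu_fd)
{
    if (ioctl(vcpu_fd, KVMCLOCK_GUEST_PAUSED, 0) < 0)
        return -errno;
    return 0;
}
```

The QEMU side of the series calls the equivalent of this once per vCPU and treats -EINVAL as "guest does not use kvmclock" rather than as a hard error.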
[PATCH 4/5 V5] Add generic stubs for kvm stop check functions
Signed-off-by: Eric B Munson Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- include/asm-generic/kvm_para.h | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) create mode 100644 include/asm-generic/kvm_para.h diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h new file mode 100644 index 000..177e1eb --- /dev/null +++ b/include/asm-generic/kvm_para.h @@ -0,0 +1,14 @@ +#ifndef _ASM_GENERIC_KVM_PARA_H +#define _ASM_GENERIC_KVM_PARA_H + + +/* + * This function is used by architectures that support kvm to avoid issuing + * false soft lockup messages. + */ +static inline bool kvm_check_and_clear_guest_paused(int cpu) +{ + return false; +} + +#endif -- 1.7.5.4
Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout
On Mon, Dec 05, 2011 at 11:16:05AM -0800, Jesse Barnes wrote: > On Mon, 14 Nov 2011 20:18:55 +0200 > "Michael S. Tsirkin" wrote: > > > Add a flexible mechanism to specify virtio configuration layout, using > > pci vendor-specific capability. A separate capability is used for each > > of common, device specific and data-path accesses. > > > > Warning: compiled only. > > This patch also needs to be split up, pci_iomap changes > > also need arch updates for non-x86. > > There might also be more spec changes. > > > > Posting here for early feedback, and to allow Sasha to > > proceed with his "kvm tool" work. > > > > Changes from v1: > > Updated to match v3 of the spec, see: > > Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout > > Message-ID: <2010122436.ga13...@redhat.com> > > In-Reply-To: <2009195901.ga28...@redhat.com> > > Looks like this conflicts with your other iomap changes... I didn't > check your latest tree; do you just add another patch on top for the > virtio changes now? > > Thanks, Yes. Rusty asked for more changes so that isn't yet pushed. > -- > Jesse Barnes, Intel Open Source Technology Center
Re: winXP "Standard PC" HAL and qemu-kvm >= 0.15
On 05.12.2011 17:28, Avi Kivity wrote: [] >> I haven't debugged further yet, -- because it were >> not easy to find out what was causing the regression >> and how to reproduce it, and also because I don't think >> it is the right HAL for qemu-kvm guest anyway. > > It's not, but the regression indicates we broke something. It would be > good to know what that is. So today I gave it a chance with git bisect, and here's what it found: First bad commit ef390067a72fe09977bb4ac8211313e1503302ea Merge: c7b3e90 0fd542f Author: Avi Kivity Date: Sun May 15 04:48:05 2011 -0400 Merge commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df' into upstream-merge * commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df': cpu: add set_memory flag to request dirty logging piix_pci: load path clean up piix_pci: optimize set irq path piix_pci: eliminate PIIX3State::pci_irq_levels pci: add accessor function to get irq levels cirrus_vga: remove unneeded reset Conflicts: exec.c Signed-off-by: Avi Kivity And just like with the 32/64bit lockup issue, this is a merge commit, which is not exactly useful. Any guesses? :) The problem is that so far, there's no known way to change to use proper hal type in winXP (except of reinstalling the guest), and there's no known workaround on the kvm side, so users are stuck with older versions. >> So, if anybody have some thoughts about this issue, >> and especially if you know a way to switch winXP HAL >> type to some ACPI variant without reinstalling, please >> speak up.. ;) > > I remember doing it somewhere in device manager, perhaps in the > processor entry. But it was years since I last did this. As I already mentioned, changing HAL type works from anything to "Standard PC", but not back. I'll try to investigate. >> Debian bugreport for a reference: http://bugs.debian.org/647312 >> >> Reproducer: install a winXP guest on kvm with -no-acpi so >> it chooses an "Uniprocessor with MPS" HAL. 
Switch it to >> "Standard PC" in device manager, reboot -- in 0.15+ it does >> not work anymore, while in 0.14 it continues to work fine. > > Most likely non-ACPI interrupt routing. The commit it bisected to talks about piix -- could it be related? Thanks, /mjt
[PATCH 5/5 V5] Add check for suspended vm in softlockup detector
A suspended VM can cause spurious soft lockup warnings. To avoid these, the watchdog now checks if the kernel knows it was stopped by the host and skips the warning if so. When the watchdog is reset successfully, clear the guest paused flag. Signed-off-by: Eric B Munson Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- Changes from V3: Clear the PAUSED flag when the watchdog is reset kernel/watchdog.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 1d7bca7..7c62919 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -25,6 +25,7 @@ #include #include +#include #include int watchdog_enabled = 1; @@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) __this_cpu_write(softlockup_touch_sync, false); sched_clock_tick(); } + + /* Clear the guest paused flag on watchdog reset */ + kvm_check_and_clear_guest_paused(smp_processor_id()); __touch_watchdog(); return HRTIMER_RESTART; } @@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + /* +* If a virtual machine is stopped by the host it can look to +* the watchdog like a soft lockup, check to see if the host +* stopped the vm before we issue the warning +*/ + if (kvm_check_and_clear_guest_paused(smp_processor_id())) + return HRTIMER_RESTART; + /* only warn once */ if (__this_cpu_read(soft_watchdog_warn) == true) return HRTIMER_RESTART; -- 1.7.5.4
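The decision the patched watchdog makes on each timer tick can be modelled in a few lines. This is a userspace simulation, not kernel code.

- `pvclock_flags` stands in for this CPU's `hv_clock.flags`.
- The bit value matches the PVCLOCK_GUEST_STOPPED definition in patch 1/5.
- `touched` models the reset path.
- `stalled` models a non-zero is_softlockup() result.
- The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PVCLOCK_GUEST_STOPPED (1 << 1)

enum wd_action { WD_OK, WD_SKIP_PAUSED, WD_WARN };

/* Stand-in for this CPU's hv_clock.flags; the host sets the bit on pause. */
static uint8_t pvclock_flags;

static bool check_and_clear_guest_paused(void)
{
    if (pvclock_flags & PVCLOCK_GUEST_STOPPED) {
        pvclock_flags &= (uint8_t)~PVCLOCK_GUEST_STOPPED;
        return true;
    }
    return false;
}

/* One watchdog timer tick. */
static enum wd_action watchdog_tick(bool touched, bool stalled)
{
    if (touched) {
        /* Reset path: drop any stale paused flag. */
        check_and_clear_guest_paused();
        return WD_OK;
    }
    if (stalled) {
        /* The host stopped the guest: the apparent lockup is not real. */
        if (check_and_clear_guest_paused())
            return WD_SKIP_PAUSED;
        return WD_WARN;
    }
    return WD_OK;
}
```

Because the flag is cleared when it suppresses a warning, only the first stall after a pause is forgiven. A genuine lockup on the next tick still warns.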
[PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host
Changes from V4: Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt Changes from V3: Include CC's on patch 3 Drop clear flag ioctl and have the watchdog clear the flag when it is reset Changes from V2: New kvm functions defined in kvm_para.h; the only change to pvclock is the initial flag definition Changes from V1: (Thanks Marcelo) Host code has all been moved to arch/x86/kvm/x86.c KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED When a guest kernel is stopped by the host hypervisor it can look like a soft lockup to the guest kernel. This false warning can mask later soft lockup warnings which may be real. This patch series adds a method for a host hypervisor to communicate to a guest kernel that it is being stopped. The final patch in the series has the watchdog check this flag when it goes to issue a soft lockup warning and skip the warning if the guest knows it was stopped. An attempt was made to solve this in QEMU, but the side effects of saving and restoring the clock and TSC for each vCPU put the guest's wall clock behind by the duration of the pause. This forces the guest to run ntp in order to keep its wall clock accurate.
Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org Eric B Munson (5): Add flag to indicate that a vm was stopped by the host Add functions to check if the host has stopped the vm Add ioctl for KVMCLOCK_GUEST_STOPPED Add generic stubs for kvm stop check functions Add check for suspended vm in softlockup detector Documentation/virtual/kvm/api.txt | 12 arch/x86/include/asm/kvm_host.h|2 ++ arch/x86/include/asm/kvm_para.h|1 + arch/x86/include/asm/pvclock-abi.h |1 + arch/x86/kernel/kvmclock.c | 21 + arch/x86/kvm/x86.c | 20 include/asm-generic/kvm_para.h | 14 ++ include/linux/kvm.h|2 ++ kernel/watchdog.c | 12 9 files changed, 85 insertions(+), 0 deletions(-) create mode 100644 include/asm-generic/kvm_para.h -- 1.7.5.4
[PATCH 2/5 V5] Add functions to check if the host has stopped the vm
When a host stops or suspends a VM it will set a flag to show this. The watchdog will use these functions to determine if a softlockup is real, or the result of a suspended VM. Signed-off-by: Eric B Munson Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- arch/x86/include/asm/kvm_para.h |1 + arch/x86/kernel/kvmclock.c | 21 + 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 734c376..e9d63a6 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -95,6 +95,7 @@ struct kvm_vcpu_pv_apf_data { extern void kvmclock_init(void); extern int kvm_register_clock(char *txt); +bool kvm_check_and_clear_guest_paused(int cpu); /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 44842d7..f0c0599 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void) preset_lpj = lpj; } +bool kvm_check_and_clear_guest_paused(int cpu) +{ + bool ret = false; + struct pvclock_vcpu_time_info *src; + + /* +* per_cpu() is safe here because this function is only called from +* timer functions where preemption is already disabled. 
+*/ + WARN_ON(!in_atomic()); + src = &per_cpu(hv_clock, cpu); + if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) { + src->flags = src->flags & (~PVCLOCK_GUEST_STOPPED); + ret = true; + } + + return ret; +} +EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused); + static struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_get_cycles, -- 1.7.5.4
[PATCH 1/5 V5] Add flag to indicate that a vm was stopped by the host
This flag will be used to check if the vm was stopped by the host when a soft lockup was detected. The host will set the flag when it stops the guest. On resume, the guest will check this flag if a soft lockup is detected and skip issuing the warning. Signed-off-by: Eric B Munson Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@arndb.de Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: jeremy.fitzhardi...@citrix.com Cc: levinsasha...@gmail.com Cc: Jan Kiszka Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- arch/x86/include/asm/pvclock-abi.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 35f2d19..6167fd7 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -40,5 +40,6 @@ struct pvclock_wall_clock { } __attribute__((__packed__)); #define PVCLOCK_TSC_STABLE_BIT (1 << 0) +#define PVCLOCK_GUEST_STOPPED (1 << 1) #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PVCLOCK_ABI_H */ -- 1.7.5.4
[PATCH V4] Guest stop notification
Often when a guest is stopped from the qemu console, it will report spurious soft lockup warnings on resume. There are kernel patches being discussed that will give the host the ability to tell the guest that it is being stopped and should ignore the resulting soft lockup warning. This patch uses the qemu Notifier system to tell the guest it is about to be stopped. Signed-off-by: Eric B Munson Cc: Avi Kivity Cc: Marcelo Tosatti Cc: Jan Kiszka Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: kvm@vger.kernel.org --- Changes from V3: Collapse new state change notification function into existing function. Correct whitespace issues Change ioctl name to KVMCLOCK_GUEST_PAUSED Use for loop to iterate vCPUs Changes from V2: Move ioctl into hw/kvmclock.c so that other arches can use it as support is implemented Changes from V1: Remove unnecessary encapsulating function hw/kvmclock.c | 15 +++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/hw/kvmclock.c b/hw/kvmclock.c index 5388bc4..fa11dd7 100644 --- a/hw/kvmclock.c +++ b/hw/kvmclock.c @@ -16,6 +16,7 @@ #include "sysbus.h" #include "kvm.h" #include "kvmclock.h" +#include "cpu-all.h" #include #include @@ -62,10 +63,24 @@ static int kvmclock_post_load(void *opaque, int version_id) static void kvmclock_vm_state_change(void *opaque, int running, RunState state) { +int ret; +CPUState *penv = first_cpu; KVMClockState *s = opaque; if (running) { s->clock_valid = false; + +for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) { +ret = kvm_vcpu_ioctl(penv, KVMCLOCK_GUEST_PAUSED, 0); +if (ret) { +if (ret != -EINVAL) { +fprintf(stderr, +"kvmclock_vm_state_change: %s\n", +strerror(-ret)); +} +return; +} +} } } -- 1.7.5.4
Re: [net-next RFC PATCH 3/5] macvtap: flow director support
Similarly, macvtap should implement the ethtool {get,set}_rxfh_indir operations. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked.
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support
On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote: > This patch adds a simple flow director to the tun/tap device. It is just a > page that contains the hash-to-queue mapping, which can be changed by > user space. The backend (tap/macvtap) queries this table to get > the desired queue of a packet when it sends packets to user space. This is just flow hashing (RSS), not flow steering. > The page address is set through a new ioctl, TUNSETFD, and > is pinned until device exit or until another page is specified. [...] You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked.
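Ben's point is that a single page mapping hashes to queues is an RSS indirection table rather than per-flow steering. A minimal sketch of such a table, assuming a hypothetical 128-entry size and a round-robin default fill:

```c
#include <assert.h>
#include <stdint.h>

#define INDIR_SIZE 128 /* hypothetical table size; must be a power of two */

/* Hash-to-queue indirection table, like the per-device page in the patch. */
struct rxfh_indir_sketch {
	uint16_t queue[INDIR_SIZE];
};

/* Spread the table slots round-robin over the available queues. */
static inline void indir_fill_default(struct rxfh_indir_sketch *t, uint16_t n_queues)
{
	assert(n_queues > 0);
	for (uint32_t i = 0; i < INDIR_SIZE; i++)
		t->queue[i] = (uint16_t)(i % n_queues);
}

/* Pick the queue for a packet from its flow hash (RSS-style lookup). */
static inline uint16_t indir_select_queue(const struct rxfh_indir_sketch *t, uint32_t rxhash)
{
	return t->queue[rxhash & (INDIR_SIZE - 1)];
}
```

Rewriting one slot moves every flow whose hash lands in it, not just the flow of interest, which is why this mechanism cannot steer an individual flow.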
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
2011/12/2 Cam Macdonell : > 2011/11/30 Cam Macdonell : >> 2011/11/30 Zang Hongyong : >>> Can this bug fix patch be applied yet? >> >> Sorry for not replying yet. I'll test your patch within the next day. > > Have you confirmed the proper receipt of interrupts in the receiving guests? > > I can confirm the bug occurs with ioeventfd enabled and that the > patch fixes it, but sometime after 15.1, I no longer see interrupts > (MSI or regular) being delivered in the guest. > > I will bisect tomorrow. With Michael's help we debugged MSI-X interrupt delivery. With that fix in place, this patch fixes ioeventfd in ivshmem. > > Cam > >> >>> With this bug, the guest OS cannot successfully boot with ioeventfd. >>> Thus the new PIO DoorBell patch cannot be posted. >> >> Well, you can certainly post the new patch, just clarify that it's >> dependent on this patch. >> >> Sincerely, >> Cam >> >>> >>> Thanks, >>> Hongyong >>> >>> On Thursday, 2011/11/24 18:05, zanghongy...@huawei.com wrote: From: Hongyong Zang When a guest boots with ioeventfd, an error (caught with gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j < s->peers[i].nb_eventfds; j++) { The bug is due to accessing s->peers, which is NULL. This patch uses the memory region API to replace the old kvm_set_ioeventfd_mmio_long(). It also makes ivshmem_read() call memory_region_add_eventfd() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory.
* So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s->peers[posn].nb_eventfds; for (i = 0; i < guest_curr_max; i++) { -kvm_set_ioeventfd_mmio_long(s->peers[posn].eventfds[i], -s->mmio_addr + DOORBELL, (posn << 16) | i, 0); +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +memory_region_del_eventfd(&s->ivshmem_mmio, + DOORBELL, + 4, + true, + (posn << 16) | i, + s->peers[posn].eventfds[i]); +} close(s->peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s->peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i <= s->max_peer; i++) { -for (j = 0; j < s->peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(&s->ivshmem_mmio, - DOORBELL, - 4, - true, - (i << 16) | j, - s->peers[i].eventfds[j]); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s->mmio_addr + DOORBELL, -(incoming_posn << 16) | guest_max_eventfd, 1) < 0) { -fprintf(stderr, "ivshmem: ioeventfd not available\n"); -} +memory_region_add_eventfd(&s->ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn << 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) m
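The patch registers one ioeventfd per (peer, vector) pair on the 4-byte DOORBELL register, with (posn << 16) | i as the datamatch value, so a guest write only signals the eventfd whose value matches exactly. A sketch of that encoding, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Doorbell value used as the ioeventfd datamatch in the patch:
 * bits 31..16 = target peer ID (posn), bits 15..0 = vector index. */
static inline uint32_t doorbell_encode(uint16_t peer, uint16_t vector)
{
	return ((uint32_t)peer << 16) | vector;
}

/* Inverse helpers, e.g. for logging which eventfd a write should hit. */
static inline uint16_t doorbell_peer(uint32_t val)
{
	return (uint16_t)(val >> 16);
}

static inline uint16_t doorbell_vector(uint32_t val)
{
	return (uint16_t)(val & 0xffff);
}

/* Round-trip check for a given (peer, vector) pair. */
static inline void doorbell_check(uint16_t peer, uint16_t vector)
{
	uint32_t v = doorbell_encode(peer, vector);

	assert(doorbell_peer(v) == peer && doorbell_vector(v) == vector);
}
```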
Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout
On Mon, 14 Nov 2011 20:18:55 +0200 "Michael S. Tsirkin" wrote: > Add a flexible mechanism to specify virtio configuration layout, using > pci vendor-specific capability. A separate capability is used for each > of common, device specific and data-path accesses. > > Warning: compiled only. > This patch also needs to be split up, pci_iomap changes > also need arch updates for non-x86. > There might also be more spec changes. > > Posting here for early feedback, and to allow Sasha to > proceed with his "kvm tool" work. > > Changes from v1: > Updated to match v3 of the spec, see: > Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout > Message-ID: <2010122436.ga13...@redhat.com> > In-Reply-To: <2009195901.ga28...@redhat.com> Looks like this conflicts with your other iomap changes... I didn't check your latest tree; do you just add another patch on top for the virtio changes now? Thanks, -- Jesse Barnes, Intel Open Source Technology Center
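The flexible layout places each virtio structure (common, device-specific, data path) behind its own vendor-specific PCI capability, so a driver locates them by walking the standard capability list. A minimal sketch over a raw 256-byte config-space buffer; the register offsets are the standard PCI ones, while the contents of the virtio capability body follow the spec draft referenced above and are not modeled here. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config-space offsets and IDs (values as in linux/pci_regs.h). */
#define PCI_STATUS          0x06 /* status register, low byte */
#define PCI_STATUS_CAP_LIST 0x10 /* capability list present */
#define PCI_CAPABILITY_LIST 0x34 /* pointer to the first capability */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability */

/* Store the offsets of all vendor-specific capabilities in out[] (up to max
 * entries) and return how many were stored. */
static inline size_t find_vendor_caps(const uint8_t cfg[256], uint8_t *out, size_t max)
{
	size_t found = 0;
	int budget = 48; /* stop on malformed lists that loop */
	uint8_t pos;

	assert(out != NULL || max == 0);
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = (uint8_t)(cfg[PCI_CAPABILITY_LIST] & ~3u); /* low 2 bits reserved */
	while (pos >= 0x40 && budget-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_VNDR && found < max)
			out[found++] = pos;
		pos = (uint8_t)(cfg[pos + 1] & ~3u); /* next-capability pointer */
	}
	return found;
}
```

Each capability starts with its ID byte followed by the offset of the next one; an offset below 0x40 (normally 0) ends the list.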
Re: [PATCH v2 1/3] pci: Rework config space blocking services
On Fri, 4 Nov 2011 09:45:59 +0100 Jan Kiszka wrote: > pci_block_user_cfg_access was designed for the use case that a single > context, the IPR driver, temporarily delays user space accesses to the > config space via sysfs. This assumption became invalid by the time > pci_dev_reset was added as locking instance. Today, if you run two loops > in parallel that reset the same device via sysfs, you end up with a > kernel BUG as pci_block_user_cfg_access detects the broken assumption. > > This reworks the pci_block_user_cfg_access to a sleeping service > pci_cfg_access_lock and an atomic-compatible variant called > pci_cfg_access_trylock. The former not only blocks user space access as > before but also waits if access was already locked. The latter service > just returns false in this case, allowing the caller to resolve the > conflict instead of raising a BUG. > > Adaptations of the ipr driver were originally written by Brian King. Applied this series to linux-next, thanks. -- Jesse Barnes, Intel Open Source Technology Center
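The difference between the two services can be modeled in user space. This sketch uses pthreads rather than the kernel's own primitives, and the cfg_access_* names are hypothetical; it shows the sleeping lock waiting for the current holder while the trylock variant reports the conflict to its caller instead of hitting a BUG.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model of a per-device "config space access locked" flag. */
struct cfg_block {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	bool locked;
};

#define CFG_BLOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Sleeping variant: waits until the current holder unlocks, then locks. */
static inline void cfg_access_lock(struct cfg_block *b)
{
	pthread_mutex_lock(&b->lock);
	while (b->locked)
		pthread_cond_wait(&b->wait, &b->lock);
	b->locked = true;
	pthread_mutex_unlock(&b->lock);
}

/* Non-sleeping variant: reports a conflict instead of waiting. */
static inline bool cfg_access_trylock(struct cfg_block *b)
{
	bool ok;

	pthread_mutex_lock(&b->lock);
	ok = !b->locked;
	if (ok)
		b->locked = true;
	pthread_mutex_unlock(&b->lock);
	return ok;
}

static inline void cfg_access_unlock(struct cfg_block *b)
{
	pthread_mutex_lock(&b->lock);
	assert(b->locked); /* unlocking an unlocked device is a caller bug */
	b->locked = false;
	pthread_cond_broadcast(&b->wait);
	pthread_mutex_unlock(&b->lock);
}
```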
Re: [PATCH 3/5 V4] Add ioctl for KVM_GUEST_STOPPED
On Sat, 03 Dec 2011, Sasha Levin wrote: > On Tue, 2011-11-29 at 16:35 -0500, Eric B Munson wrote: > > > > Now that we have a flag that will tell the guest it was suspended, > > create an interface for that communication using a KVM ioctl. > > > > Signed-off-by: Eric B Munson > > Can it be documented in api.txt as well? > > -- > > Sasha. > Thanks for the review, will do for V5. Eric
[PATCH 5/5] kvm tools: Add 'kvm sandbox'
This patch adds 'kvm sandbox', a wrapper on top of 'kvm run' that allows the user to easily specify a sandboxed command to run in a custom rootfs guest. Example usage: kvm sandbox -d test_guest -k some_kernel -- do_something_in_guest Suggested-by: Pekka Enberg Signed-off-by: Sasha Levin --- tools/kvm/Documentation/kvm-sandbox.txt | 16 ++ tools/kvm/Makefile |1 + tools/kvm/builtin-run.c | 49 +- tools/kvm/builtin-sandbox.c |9 ++ tools/kvm/command-list.txt |1 + tools/kvm/include/kvm/builtin-run.h |2 + tools/kvm/include/kvm/builtin-sandbox.h |6 tools/kvm/kvm-cmd.c |2 + 8 files changed, 84 insertions(+), 2 deletions(-) create mode 100644 tools/kvm/Documentation/kvm-sandbox.txt create mode 100644 tools/kvm/builtin-sandbox.c create mode 100644 tools/kvm/include/kvm/builtin-sandbox.h diff --git a/tools/kvm/Documentation/kvm-sandbox.txt b/tools/kvm/Documentation/kvm-sandbox.txt new file mode 100644 index 000..8f24fc7 --- /dev/null +++ b/tools/kvm/Documentation/kvm-sandbox.txt @@ -0,0 +1,16 @@ +kvm-sandbox(1) + + +NAME + +kvm-sandbox - Run a command in a sandboxed guest + +SYNOPSIS + +[verse] +'kvm sandbox ['kvm run' arguments] -- [sandboxed command]' + +DESCRIPTION +--- +The sandboxed command will run in a guest as part of it's init +command.
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index ece3306..24af1d0 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -85,6 +85,7 @@ OBJS += hw/vesa.o OBJS += hw/i8042.o OBJS += hw/pci-shmem.o OBJS += kvm-ipc.o +OBJS += builtin-sandbox.o FLAGS_BFD := $(CFLAGS) -lbfd has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD)) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 5db6995..7a57b5c 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -53,6 +53,7 @@ #define DEFAULT_GUEST_MAC "02:15:15:15:15:15" #define DEFAULT_HOST_MAC "02:01:01:01:01:01" #define DEFAULT_SCRIPT "none" +const char *DEFAULT_SANDBOX_FILENAME = "guest/sandbox.sh"; #define MB_SHIFT (20) #define KB_SHIFT (10) @@ -94,6 +95,7 @@ static bool custom_rootfs; static bool no_net; static bool no_dhcp; extern bool ioport_debug; +static int kvm_run_wrapper; extern int active_console; extern int debug_iodelay; @@ -107,6 +109,15 @@ static const char * const run_usage[] = { NULL }; +enum { + KVM_RUN_SANDBOX, +}; + +void kvm_run_set_wrapper_sandbox(void) +{ + kvm_run_wrapper = KVM_RUN_SANDBOX; +} + static int img_name_parser(const struct option *opt, const char *arg, int unset) { char *sep; @@ -755,6 +766,35 @@ static int kvm_run_set_sandbox(void) return symlink(script, path); } +static void kvm_run_write_sandbox_cmd(const char **argv, int argc) +{ + const char script_hdr[] = "#! 
/bin/bash\n\n"; + int fd; + + remove(sandbox); + + fd = open(sandbox, O_RDWR | O_CREAT, 0777); + if (fd < 0) + die("Failed creating sandbox script"); + + if (write(fd, script_hdr, sizeof(script_hdr) - 1) <= 0) + die("Failed writing sandbox script"); + + while (argc) { + if (write(fd, argv[0], strlen(argv[0])) <= 0) + die("Failed writing sandbox script"); + if (argc - 1) + if (write(fd, " ", 1) <= 0) + die("Failed writing sandbox script"); + argv++; + argc--; + } + if (write(fd, "\n", 1) <= 0) + die("Failed writing sandbox script"); + + close(fd); +} + int kvm_cmd_run(int argc, const char **argv, const char *prefix) { static char real_cmdline[2048], default_name[20]; @@ -780,8 +820,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) PARSE_OPT_KEEP_DASHDASH); if (argc != 0) { /* Cusrom options, should have been handled elsewhere */ - if (strcmp(argv[0], "--") == 0) - break; + if (strcmp(argv[0], "--") == 0) { + if (kvm_run_wrapper == KVM_RUN_SANDBOX) { + sandbox = DEFAULT_SANDBOX_FILENAME; + kvm_run_write_sandbox_cmd(argv+1, argc-1); + break; + } + } if (kernel_filename) { fprintf(stderr, "Cannot handle parameter: " diff --git a/tools/kvm/builtin-sandbox.c b/tools/kvm/builtin-sandbox.c new file mode 100644 index 000..433f536 --- /dev/null +++ b/tools/kvm/builtin-sandbox.c @@ -0,0 +1,9 @@ +#include
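The script writer boils down to a bash shebang followed by the arguments after "--" joined by single spaces. A sketch that builds the same text in memory (the build_sandbox_script() and script_append() names are hypothetical), which makes the format easy to check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to buf (capacity size, current length *len),
 * keeping buf NUL-terminated. Returns -1 if it would not fit. */
static inline int script_append(char *buf, size_t size, size_t *len, const char *s, size_t n)
{
	if (*len + n >= size) /* keep room for the terminating NUL */
		return -1;
	memcpy(buf + *len, s, n);
	*len += n;
	buf[*len] = '\0';
	return 0;
}

/* Build the script text the patch writes: "#! /bin/bash", a blank line,
 * then argv joined by single spaces and a final newline. Returns the
 * length written, or -1 if buf is too small. Arguments are not quoted. */
static inline int build_sandbox_script(char *buf, size_t size, const char **argv, int argc)
{
	static const char hdr[] = "#! /bin/bash\n\n";
	size_t len = 0;

	assert(buf != NULL && size > 0 && argc >= 0);
	buf[0] = '\0';
	if (script_append(buf, size, &len, hdr, sizeof(hdr) - 1))
		return -1;
	for (int i = 0; i < argc; i++) {
		if (i > 0 && script_append(buf, size, &len, " ", 1))
			return -1;
		if (script_append(buf, size, &len, argv[i], strlen(argv[i])))
			return -1;
	}
	if (script_append(buf, size, &len, "\n", 1))
		return -1;
	return (int)len;
}
```

Like the patch, this does no shell quoting, so an argument containing spaces or shell metacharacters is re-interpreted by bash inside the guest.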
[PATCH 4/5] kvm tools: Ignore parameters after dashdash in 'kvm run'
This allows other commands to wrap 'kvm run' and use the parameters the user provides after a dash-dash for their own use.

Signed-off-by: Sasha Levin
---
 tools/kvm/builtin-run.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index cd14159..5db6995 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -776,8 +776,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	while (argc != 0) {
 		argc = parse_options(argc, argv, options, run_usage,
-				PARSE_OPT_STOP_AT_NON_OPTION);
+				PARSE_OPT_STOP_AT_NON_OPTION |
+				PARSE_OPT_KEEP_DASHDASH);
 		if (argc != 0) {
+			/* Cusrom options, should have been handled elsewhere */
+			if (strcmp(argv[0], "--") == 0)
+				break;
+
 			if (kernel_filename) {
 				fprintf(stderr, "Cannot handle parameter: "
 						"%s\n", argv[0]);
-- 
1.7.8
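As the patch uses it, PARSE_OPT_KEEP_DASHDASH leaves the "--" in argv for 'kvm run' to see, and everything after it belongs to the wrapping command. A sketch of that split with a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the first argument after "--", or -1 if there is no
 * "--". Arguments before it belong to 'kvm run'; the rest are passed
 * through untouched to the wrapping command. */
static inline int dashdash_tail(int argc, const char **argv)
{
	assert(argc >= 0);
	for (int i = 0; i < argc; i++)
		if (strcmp(argv[i], "--") == 0)
			return i + 1;
	return -1;
}
```

A returned index equal to argc means "--" was the last argument and the wrapped command is empty.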
[PATCH 3/5] kvm tools: Allow easily sandboxing applications within a guest
This patch adds a '--sandbox' argument. When used in conjunction with a custom rootfs, it allows running a script or an executable in the guest environment using executables and other files from the host. This is useful when testing code that might cause problems on the host, or for automating kernel testing, since it is now easy to link a kvm tools test script with 'git bisect run'. Suggested-by: Ingo Molnar Signed-off-by: Sasha Levin --- tools/kvm/builtin-run.c | 31 +++ tools/kvm/guest/init_stage2.c | 13 - 2 files changed, 43 insertions(+), 1 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index de3001e..cd14159 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -82,6 +82,7 @@ static const char *guest_mac; static const char *host_mac; static const char *script; static const char *guest_name; +static const char *sandbox; static struct virtio_net_params *net_params; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; @@ -420,6 +421,8 @@ static const struct option options[] = { OPT_CALLBACK('\0', "tty", NULL, "tty id", "Remap guest TTY into a pty on the host", tty_parser), + OPT_STRING('\0', "sandbox", &sandbox, "script", + "Run this script when booting into custom rootfs"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -727,6 +730,31 @@ static int kvm_custom_stage2(void) return r; } +static int kvm_run_set_sandbox(void) +{ + const char *guestfs_name = "default"; + char path[PATH_MAX], script[PATH_MAX], *tmp; + + if (image_filename[0]) + guestfs_name = image_filename[0]; + + snprintf(path, PATH_MAX, "%s%s/virt/sandbox.sh", kvm__get_dir(), guestfs_name); + + remove(path); + + if (sandbox == NULL) + return 0; + + tmp = realpath(sandbox, NULL); + if (tmp == NULL) + return -ENOMEM; + + snprintf(script, PATH_MAX, "/host/%s", tmp); + free(tmp); + + return symlink(script, path); +} + int kvm_cmd_run(int argc, const char **argv, const char *prefix) { static char
real_cmdline[2048], default_name[20]; @@ -886,7 +914,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) if (using_rootfs) { strcat(real_cmdline, " root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p"); if (custom_rootfs) { + kvm_run_set_sandbox(); + strcat(real_cmdline, " init=/virt/init"); + if (!no_dhcp) strcat(real_cmdline, " ip=dhcp"); if (kvm_custom_stage2()) diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c index af615a0..6489fee 100644 --- a/tools/kvm/guest/init_stage2.c +++ b/tools/kvm/guest/init_stage2.c @@ -16,6 +16,14 @@ static int run_process(char *filename) return execve(filename, new_argv, new_env); } +static int run_process_sandbox(char *filename) +{ + char *new_argv[] = { filename, "/virt/sandbox.sh", NULL }; + char *new_env[] = { "TERM=linux", NULL }; + + return execve(filename, new_argv, new_env); +} + int main(int argc, char *argv[]) { /* get session leader */ @@ -26,7 +34,10 @@ int main(int argc, char *argv[]) puts("Starting '/bin/sh'..."); - run_process("/bin/sh"); + if (access("/virt/sandbox.sh", R_OK) == 0) + run_process_sandbox("/bin/sh"); + else + run_process("/bin/sh"); printf("Init failed: %s\n", strerror(errno)); -- 1.7.8
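Stage 2 decides between the two argv layouts by checking whether /virt/sandbox.sh is readable. A sketch of that decision factored into a testable helper (the stage2_argv() name is hypothetical); the real init passes the result straight to execve():

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Fill argv for stage 2: run the sandbox script through the shell if it is
 * readable, otherwise start a plain shell. Returns the argv length. */
static inline int stage2_argv(const char *shell, const char *script, const char *argv[3])
{
	assert(shell != NULL && script != NULL);
	argv[0] = shell;
	if (access(script, R_OK) == 0) {
		argv[1] = script; /* e.g. /bin/sh /virt/sandbox.sh */
		argv[2] = NULL;
		return 2;
	}
	argv[1] = NULL; /* interactive /bin/sh */
	return 1;
}
```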
[PATCH 2/5] kvm tools: Remove double 'init=' kernel param
Signed-off-by: Sasha Levin
---
 tools/kvm/builtin-run.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9635c82..de3001e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -881,9 +881,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		if (virtio_9p__register(kvm, "/", "hostfs") < 0)
 			die("Unable to initialize virtio 9p");
 		using_rootfs = custom_rootfs = 1;
-
-		if (!strstr(real_cmdline, "init="))
-			strlcat(real_cmdline, " init=/bin/sh ", sizeof(real_cmdline));
 	}
 
 	if (using_rootfs) {
-- 
1.7.8
[PATCH 1/5] kvm tools: Split custom rootfs init into two stages
Currently the custom rootfs init is built along with the main KVM tools executable and is copied into custom rootfs directories when they are created with 'kvm setup'. The problem is that if the init code changes, it has to be copied manually into every custom rootfs directory. Instead, this patch splits the init process into two parts. Stage 1 simply handles the mounts and then hands over to stage 2. Stage 2 stays in the code tree and does all the heavy lifting. This allows us to make init changes in the code tree and have them picked up automatically by custom rootfs guests without having to copy files over manua Signed-off-by: Sasha Levin --- tools/kvm/Makefile|9 +++-- tools/kvm/builtin-run.c | 27 +++ tools/kvm/guest/init.c| 14 +++--- tools/kvm/guest/init_stage2.c | 34 ++ 4 files changed, 71 insertions(+), 13 deletions(-) create mode 100644 tools/kvm/guest/init_stage2.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index bb5f6b0..ece3306 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -21,6 +21,7 @@ TAGS := ctags PROGRAM:= kvm GUEST_INIT := guest/init +GUEST_INIT_S2 := guest/init_stage2 OBJS += builtin-balloon.o OBJS += builtin-debug.o @@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings CFLAGS += $(WARNINGS) -all: $(PROGRAM) $(GUEST_INIT) +all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2) KVMTOOLS-VERSION-FILE: @$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT) @@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c $(E) " LINK" $@ $(Q) $(CC) -static guest/init.c -o $@ +$(GUEST_INIT_S2): guest/init_stage2.c + $(E) " LINK" $@ + $(Q) $(CC) -static guest/init_stage2.c -o $@ + $(DEPS): %.d: %.c @@ -269,7 +274,7 @@ clean: $(Q) rm -f bios/bios-rom.h $(Q) rm -f tests/boot/boot_test.iso $(Q) rm -rf tests/boot/rootfs/ - $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) + $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2) $(Q) rm -f cscope.* $(Q) rm -f $(KVM_INCLUDE)/common-cmds.h $(Q) rm -f KVMTOOLS-VERSION-FILE diff --git
a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 43cf2c4..9635c82 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -702,6 +702,31 @@ void kvm_run_help(void) usage_with_options(run_usage, options); } +static int kvm_custom_stage2(void) +{ + char tmp[PATH_MAX], dst[PATH_MAX], *src; + const char *rootfs; + int r; + + src = realpath("guest/init_stage2", NULL); + if (src == NULL) + return -ENOMEM; + + if (image_filename[0] == NULL) + rootfs = "default"; + else + rootfs = image_filename[0]; + + snprintf(tmp, PATH_MAX, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs); + remove(tmp); + + snprintf(dst, PATH_MAX, "/host/%s", src); + r = symlink(dst, tmp); + free(src); + + return r; +} + int kvm_cmd_run(int argc, const char **argv, const char *prefix) { static char real_cmdline[2048], default_name[20]; @@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) strcat(real_cmdline, " init=/virt/init"); if (!no_dhcp) strcat(real_cmdline, " ip=dhcp"); + if (kvm_custom_stage2()) + die("Failed linking stage 2 of init."); } } else if (!strstr(real_cmdline, "root=")) { strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline)); diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c index 8975023..032a261 100644 --- a/tools/kvm/guest/init.c +++ b/tools/kvm/guest/init.c @@ -1,6 +1,6 @@ /* - * This is a simple init for shared rootfs guests. It brings up critical - * mountpoints and then launches /bin/sh. + * This is a simple init for shared rootfs guests. This part should be limited + * to doing mounts and running stage 2 of the init process. 
*/ #include #include @@ -30,15 +30,7 @@ int main(int argc, char *argv[]) do_mounts(); -/* get session leader */ -setsid(); - -/* set controlling terminal */ -ioctl (0, TIOCSCTTY, 1); - - puts("Starting '/bin/sh'..."); - - run_process("/bin/sh"); + run_process("/virt/init_stage2"); printf("Init failed: %s\n", strerror(errno)); diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c new file mode 100644 index 000..af615a0 --- /dev/null +++ b/tools/kvm/guest/init_stage2.c @@ -0,0 +1,34 @@ +/* + * This is a stage 2 of the init. This part should do all the heavy + * lifting such as setting up the console and calling /bin/sh. + */ +#include +#include +#include +#include +#include + +static int run_proces
KVM call agenda for 12/6 (Tuesday) @ 10am US/Eastern
Hi, please send in any agenda items you are interested in covering. Proposal (from Anthony): > 1. A short introduction to each of the guest agents, what guests they > support, and what verbs they support. > 2. A short description of key requirements from each party (oVirt, > libvirt, QEMU) for a guest agent > 3. An open discussion about possible ways to collaborate/converge. Note that guest integration will take more than one week (Anthony's estimate as well). For libvirt and oVirt folks, please contact me or Chris for details of the call. Thanks, Juan.
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 14:36, Avi Kivity wrote: > On 12/05/2011 03:29 PM, Jan Kiszka wrote: >> On 2011-12-05 14:14, Avi Kivity wrote: >>> On 12/05/2011 02:47 PM, Jan Kiszka wrote: > > (the memory API added unstable names, hopefully the QOM can take over > the stable ones and we'll have a good way to denote the unstable ones). > OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property. >>> >>> What do you mean by separate models? You share all the code you can, >>> and don't share the code you can't. To me, single model == single name. >> >> But different configuration. > > Right, just like IDE with different backends. Except that there is a comparably large infrastructure to manage those backends. > >>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired&configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice. >>> >>> It's like qcow2 and raw/native IO are wired differently, or virtio-net >>> and vhost-net. But it's the same IDE device or virtio NIC. >> >> That would mean introducing a backend/frontend concept for irqchips. > > We could do it, have one ioapic model with ioapic_ops->eoi_broadcast(). > Most of the interfaces already dispatch dynamically (qdev gpio/irq) so > there wouldn't be much more there. The problem is configuration. Just by setting ioapic.backend=xxx, we cannot pass down parameters that are backend-specific. We could ignore this issue and make all specific parameters visible via the frontend. That would be slightly ugly. > > To me, how it's actually implemented is not important. What is > important is that save/restore, the monitor, and the guest don't notice > any changes.
I largely agree, except that differentiation (or backend awareness) has to be preserved in the monitor. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH V3] Guest stop notification
On 2011-12-05 14:35, Marcelo Tosatti wrote: > On Sat, Dec 03, 2011 at 12:45:51PM +0100, Jan Kiszka wrote: I was referring to the relation between the IOCTL and kvmclock, not IOCTL vs. kvm_run. Jan >>> >>> Ah, OK. Yes, we better characterize it as KVMCLOCK specific (a generic >>> "guest is paused" command is not in the scope of this patch). >>> >>> So prefixing the ioctl definitions with KVMCLOCK_ would make that more >>> explicit. >> >> IMHO, that would move things in the wrong direction. The IOCTL in itself >> has _nothing_ to do with kvmclock. It's just that its x86 backend is >> implemented on top of that infrastructure. For me the IOCTL is pretty >> generic, can be backed by kvmclock, but need not be on all future archs. >> >> Jan > > I do not see the need to lift this infrastructure to arch independent > status at the moment, without clear semantics on that arch independent > level. > > So I am fine with the current GUEST_PAUSED naming (which can later be > extended with GUEST_RESUMED etc, if necessary, for use by other archs > for example), and implementation in hw/kvmclock.c. > Yes, let's keep it as suggested last (addition of kvmclock, unchanged IOCTL interface). Jan
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 03:29 PM, Jan Kiszka wrote: > On 2011-12-05 14:14, Avi Kivity wrote: > > On 12/05/2011 02:47 PM, Jan Kiszka wrote: > >>> > >>> (the memory API added unstable names, hopefully the QOM can take over > >>> the stable ones and we'll have a good way to denote the unstable ones). > >>> > >> > >> OK, maybe - or likely - we should make those device models have the same > >> names in QOM once instantiated. But I'm still convinced they should > >> remain separated models in contrast to a single model with a property. > > > > What do you mean by separate models? You share all the code you can, > > and don't share the code you can't. To me, single model == single name. > > But different configuration. Right, just like IDE with different backends. > > > >> The kvm ioapic, e.g., requires an additional property (gsi_base) that is > >> meaningless for user space devices. And its interrupts have to be > >> wired&configured differently at board model level. So, from the QEMU > >> POV, it is a very different device. Just the guest does not notice. > > > > It's like qcow2 and raw/native IO are wired differently, or virtio-net > > and vhost-net. But it's the same IDE device or virtio NIC. > > That would mean introducing a backend/frontend concept for irqchips. We could do it: have one ioapic model with ioapic_ops->eoi_broadcast(). Most of the interfaces already dispatch dynamically (qdev gpio/irq) so there wouldn't be much more there. To me, how it's actually implemented is not important. What is important is that save/restore, the monitor, and the guest don't notice any changes. -- error compiling committee.c: too many arguments to function
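Avi's suggestion of a single ioapic model dispatching through ioapic_ops->eoi_broadcast() could look roughly like the sketch below. Everything except the eoi_broadcast hook name is hypothetical, and the gsi_base field illustrates Jan's objection: a backend-specific parameter ends up visible on the shared frontend.

```c
#include <assert.h>
#include <stddef.h>

struct ioapic_sketch;

/* Backend hooks. Only the eoi_broadcast name comes from the thread. */
struct ioapic_ops {
	const char *name;
	void (*eoi_broadcast)(struct ioapic_sketch *s, int vector);
};

/* One guest-visible model; the backend is picked at creation time. */
struct ioapic_sketch {
	const struct ioapic_ops *ops;
	unsigned int gsi_base; /* only meaningful for the in-kernel backend */
	int user_eois;         /* counters that make the dispatch observable */
	int kvm_eois;
};

static inline void ioapic_user_eoi(struct ioapic_sketch *s, int vector)
{
	(void)vector; /* would update the emulated redirection table */
	s->user_eois++;
}

static inline void ioapic_kvm_eoi(struct ioapic_sketch *s, int vector)
{
	(void)vector; /* a KVM-backed model would leave this to the kernel */
	s->kvm_eois++;
}

static const struct ioapic_ops ioapic_user_ops = { "user", ioapic_user_eoi };
static const struct ioapic_ops ioapic_kvm_ops = { "kvm", ioapic_kvm_eoi };

/* Frontend entry point: callers see a single device type. */
static inline void ioapic_eoi(struct ioapic_sketch *s, int vector)
{
	assert(s != NULL && s->ops != NULL && s->ops->eoi_broadcast != NULL);
	s->ops->eoi_broadcast(s, vector);
}
```

Save/restore and the guest would only ever see struct ioapic_sketch, which is Avi's requirement; the monitor could still report ops->name, which addresses Jan's wish to keep backend awareness there.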
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 14:14, Avi Kivity wrote: > On 12/05/2011 02:47 PM, Jan Kiszka wrote: >>> >>> (the memory API added unstable names, hopefully the QOM can take over >>> the stable ones and we'll have a good way to denote the unstable ones). >>> >> >> OK, maybe - or likely - we should make those device models have the same >> names in QOM once instantiated. But I'm still convinced they should >> remain separated models in contrast to a single model with a property. > > What do you mean by separate models? You share all the code you can, > and don't share the code you can't. To me, single model == single name. But different configuration. > >> The kvm ioapic, e.g., requires an additional property (gsi_base) that is >> meaningless for user space devices. And its interrupts have to be >> wired&configured differently at board model level. So, from the QEMU >> POV, it is a very different device. Just the guest does not notice. > > It's like qcow2 and raw/native IO are wired differently, or virtio-net > and vhost-net. But it's the same IDE device or virtio NIC. That would mean introducing a backend/frontend concept for irqchips. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: winXP "Standard PC" HAL and qemu-kvm >= 0.15
On 12/05/2011 11:21 AM, Michael Tokarev wrote: > As it turned out, a Windows XP machine does not work in > qemu-kvm >= 0.15 (it loses network and USB entirely) > if it is using the "Standard PC" HAL. In 0.14 it worked > fine, but not in 0.15 (I haven't tried any in-between > versions yet). > > There are several HAL types available in winXP: these > are "Uniprocessor PC with MPS" (or Multiprocessor), > also two ACPI types, and "Standard PC". All the other > HAL types appear to work fine, but not "Standard PC". > > I haven't debugged further yet, because it was > not easy to find out what was causing the regression > and how to reproduce it, and also because I don't think > it is the right HAL for a qemu-kvm guest anyway. It's not, but the regression indicates we broke something. It would be good to know what that is. > So, if anybody has some thoughts about this issue, > and especially if you know a way to switch the winXP HAL > type to some ACPI variant without reinstalling, please > speak up. ;) I remember doing it somewhere in device manager, perhaps in the processor entry. But it has been years since I last did this. > Debian bug report for reference: http://bugs.debian.org/647312 > > Reproducer: install a winXP guest on kvm with -no-acpi so > it chooses a "Uniprocessor with MPS" HAL. Switch it to > "Standard PC" in device manager, reboot -- in 0.15+ it does > not work anymore, while in 0.14 it continues to work fine. Most likely non-ACPI interrupt routing. -- error compiling committee.c: too many arguments to function
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 02:47 PM, Jan Kiszka wrote: > > > > (the memory API added unstable names, hopefully the QOM can take over > > the stable ones and we'll have a good way to denote the unstable ones). > > > > OK, maybe - or likely - we should make those device models have the same > names in QOM once instantiated. But I'm still convinced they should > remain separated models in contrast to a single model with a property. What do you mean by separate models? You share all the code you can, and don't share the code you can't. To me, single model == single name. > The kvm ioapic, e.g., requires an additional property (gsi_base) that is > meaningless for user space devices. And its interrupts have to be > wired&configured differently at board model level. So, from the QEMU > POV, it is a very different device. Just the guest does not notice. It's like qcow2 and raw/native IO are wired differently, or virtio-net and vhost-net. But it's the same IDE device or virtio NIC. -- error compiling committee.c: too many arguments to function
Re: [PATCH V3] Guest stop notification
On Sat, 03 Dec 2011, Jan Kiszka wrote:

> On 2011-12-02 22:27, Eric B Munson wrote:
> > On Fri, 02 Dec 2011, Jan Kiszka wrote:
> >
> >> On 2011-12-02 20:19, Eric B Munson wrote:
> >>> Often when a guest is stopped from the qemu console, it will report spurious
> >>> soft lockup warnings on resume. There are kernel patches being discussed that
> >>> will give the host the ability to tell the guest that it is being stopped and
> >>> should ignore the soft lockup warning that generates.
> >>>
> >>> Signed-off-by: Eric B Munson
> >>> Cc: Avi Kivity
> >>> Cc: Marcelo Tosatti
> >>> Cc: Jan Kiszka
> >>> Cc: ry...@linux.vnet.ibm.com
> >>> Cc: aligu...@us.ibm.com
> >>> Cc: kvm@vger.kernel.org
> >>>
> >>> ---
> >>> Changes from V2:
> >>> Move ioctl into hw/kvmclock.c so as other arches can use it as it is implemented
> >>>
> >>> Changes from V1:
> >>> Remove unnecessary encapsulating function
> >>>
> >>> hw/kvmclock.c | 24
> >>> 1 files changed, 24 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/hw/kvmclock.c b/hw/kvmclock.c
> >>> index 5388bc4..756839f 100644
> >>> --- a/hw/kvmclock.c
> >>> +++ b/hw/kvmclock.c
> >>> @@ -16,6 +16,7 @@
> >>> #include "sysbus.h"
> >>> #include "kvm.h"
> >>> #include "kvmclock.h"
> >>> +#include "cpu-all.h"
> >>>
> >>> #include
> >>> #include
> >>> @@ -69,11 +70,34 @@ static void kvmclock_vm_state_change(void *opaque, int running,
> >>> }
> >>> }
> >>>
> >>> +static void kvmclock_vm_state_change_vcpu(void *opaque, int running,
> >>> + RunState state)
> >>> +{
> >>> +int ret;
> >>> +CPUState *penv = first_cpu;
> >>> +
> >>> +if (running) {
> >>> + while (penv) {
> >>
> >> or: for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
> >>
> >
> > Functionally equivalent and I see both in the code, is there a standard?
>
> Not really. I once tried to introduce an iterator macro, but it was
> refused. The above is just more compact.
>
> But this is only a minor nit.
>

Fair enough, since there will be a V4 I will switch to the for loop.

> >
> >>> +ret = kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> >>> +if (ret) {
> >>> +if (ret != ENOSYS) {
> >>> +fprintf(stderr,
> >>> +"kvmclock_vm_state_change_vcpu: %s\n",
> >>> +strerror(-ret));
> >>> +}
> >>> +return;
> >>> +}
> >>> +penv = (CPUState *)penv->next_cpu;
> >>
> >> Unneeded cast.
> >>
> >
> > Also following an example seen elsewhere.
>
> Generally, we try to avoid those pointless casts.
>

Will remove for V4.

> >
> >>> +}
> >>> +}
> >>> +}
> >>> +
> >>
> >> Again: please use checkpatch.pl.
> >>
> >
> > Sorry, tough to get used to hitting space bar that many times...
> >
> >>> static int kvmclock_init(SysBusDevice *dev)
> >>> {
> >>> KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
> >>>
> >>> qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> >>> +qemu_add_vm_change_state_handler(kvmclock_vm_state_change_vcpu, NULL);
> >>> return 0;
> >>> }
> >>>
> >>
> >> Why not extend the existing handler?
> >
> > Because the new handler doesn't touch the KVMClockState object. If this is
> > preferred, I have no objection.
>
> The separate registration looks strange to me. And the fact that you
> don't need to object doesn't justify a callback of its own.
>

I think you misunderstood me: I meant I have no objection to doing it your
way if you have a strong opinion (as it seems you do).

> >
> >>
> >> I still wonder if the IOCTL interface is actually kvmclock specific. But
> >> Marcelo asked for this, and we could still change it when some arch
> >> comes around that provides it independent of kvmclock.
> >
> > The flag itself is stored in the pvclock_vcpu_time_info structure, and anything
> > else that touches that structure uses ioctls.
>
> That's the host-guest interface. But I'm talking about the kvm-qemu
> interface here which has no relation to how the "was paused" information
> is transferred to the guest.
>
> Jan
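A minimal, self-contained sketch of the vCPU loop along the lines the review asks for: a `for` loop over the CPU list and no cast on `next_cpu`. It also compares the error against `-ENOSYS`, on the assumption that the ioctl wrapper returns a negative errno, which is what the quoted patch's own `strerror(-ret)` implies. `VCpu`, `fake_guest_paused_ioctl()` and `notify_guest_paused()` are hypothetical stand-ins, not QEMU's `CPUState` or `kvm_vcpu_ioctl()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's CPUState: an id plus the list link. */
typedef struct VCpu {
    int index;
    int supports_paused;        /* hypothetical: does the host know the ioctl? */
    struct VCpu *next_cpu;      /* already the right type, so no cast needed */
} VCpu;

/* Hypothetical stand-in for kvm_vcpu_ioctl(cpu, KVM_GUEST_PAUSED, 0):
 * returns 0 on success or a negative errno on failure. */
static int fake_guest_paused_ioctl(const VCpu *cpu)
{
    return cpu->supports_paused ? 0 : -ENOSYS;
}

/* On resume, tell every vCPU it was paused.  Stops at the first failure and
 * stays silent when the host simply lacks the feature (-ENOSYS).
 * Returns the number of vCPUs successfully notified. */
int notify_guest_paused(VCpu *first_cpu, int running)
{
    int notified = 0;

    if (!running) {
        return 0;
    }
    for (VCpu *cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
        int ret = fake_guest_paused_ioctl(cpu);
        if (ret) {
            if (ret != -ENOSYS) {       /* errno values arrive negated */
                fprintf(stderr, "notify_guest_paused: vcpu %d: %s\n",
                        cpu->index, strerror(-ret));
            }
            return notified;
        }
        notified++;
    }
    return notified;
}
```

Folding this loop into the existing `kvmclock_vm_state_change()` handler, as Jan suggests, would only change where it is called from, not its shape.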
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 13:36, Avi Kivity wrote:
> On 12/05/2011 01:37 PM, Jan Kiszka wrote:
>> On 2011-12-05 11:01, Avi Kivity wrote:
>>> On 12/04/2011 11:38 PM, Jan Kiszka wrote:
>>>>>
>>>>> It should be also possible to migrate from non-KVM device to KVM
>>>>> version, different names would prevent that for ever.
>>>>
>>>> It is (theoretically) possible with these patches as the vmstate names
>>>> are the same. KVM to TCG migration does not work right now, so I was
>>>> only able to test in-kernel <-> user space irqchip model migrations.
>>>
>>> btw, for the next-gen migration protocol, we'd probably be using QOM
>>> paths, not vmstate names; the QOM paths would include the device name?
>>
>> That would be a very bad idea IMHO. Every refactoring of your device
>> tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
>> risk to create a migration crack.
>
> At some point, something has to be stable. We can't have an infinite
> number of layers giving names to things. I propose we have just one layer.
>
>> At least we would need some stable
>> naming and/or alias concept then.
>
> We should be able to transform a path to backward compatible names,
> yes. But if something has an unstable name, let's omit it in the first
> place.
>
> (the memory API added unstable names, hopefully the QOM can take over
> the stable ones and we'll have a good way to denote the unstable ones).
>

OK, maybe - or likely - we should make those device models have the same
names in QOM once instantiated. But I'm still convinced they should
remain separated models in contrast to a single model with a property.

The kvm ioapic, e.g., requires an additional property (gsi_base) that is
meaningless for user space devices. And its interrupts have to be
wired&configured differently at board model level. So, from the QEMU
POV, it is a very different device. Just the guest does not notice.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 01:37 PM, Jan Kiszka wrote:
> On 2011-12-05 11:01, Avi Kivity wrote:
> > On 12/04/2011 11:38 PM, Jan Kiszka wrote:
> >>>
> >>> It should be also possible to migrate from non-KVM device to KVM
> >>> version, different names would prevent that for ever.
> >>
> >> It is (theoretically) possible with these patches as the vmstate names
> >> are the same. KVM to TCG migration does not work right now, so I was
> >> only able to test in-kernel <-> user space irqchip model migrations.
> >
> > btw, for the next-gen migration protocol, we'd probably be using QOM
> > paths, not vmstate names; the QOM paths would include the device name?
>
> That would be a very bad idea IMHO. Every refactoring of your device
> tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
> risk to create a migration crack.

At some point, something has to be stable. We can't have an infinite
number of layers giving names to things. I propose we have just one layer.

> At least we would need some stable
> naming and/or alias concept then.

We should be able to transform a path to backward compatible names,
yes. But if something has an unstable name, let's omit it in the first
place.

(the memory API added unstable names, hopefully the QOM can take over
the stable ones and we'll have a good way to denote the unstable ones).
[PATCH 1/3] [autotest] client.tests.cgroup: Replace LoadPerCpu() by get_load_per_cpu
* Move LoadPerCpu into cgroup_common.py (cgroup-kvm will need it too) * [FIX] Use etraceback * Code cleanup --- client/tests/cgroup/cgroup.py| 79 ++ client/tests/cgroup/cgroup_common.py | 22 + 2 files changed, 35 insertions(+), 66 deletions(-) diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py index 207a0d7..000e562 100755 --- a/client/tests/cgroup/cgroup.py +++ b/client/tests/cgroup/cgroup.py @@ -12,9 +12,7 @@ from tempfile import NamedTemporaryFile from autotest_lib.client.bin import test, utils from autotest_lib.client.common_lib import error -from cgroup_common import Cgroup as CG -from cgroup_common import CgroupModules -from cgroup_common import _traceback +from cgroup_common import Cgroup, CgroupModules, get_load_per_cpu class cgroup(test.test): """ @@ -48,7 +46,7 @@ class cgroup(test.test): logging.info("---< 'test_%s' FAILED >---", subtest) except Exception: err += "%s, " % subtest -tb = _traceback("test_%s" % subtest, sys.exc_info()) +tb = utils.etraceback("test_%s" % subtest, sys.exc_info()) logging.error("test_%s: FAILED%s", subtest, tb) logging.info("---< 'test_%s' FAILED >---", subtest) @@ -75,7 +73,6 @@ class cgroup(test.test): def cleanup(self): """ Cleanup """ logging.debug('cgroup_test cleanup') -print "Cleanup" del (self.modules) @@ -102,7 +99,7 @@ class cgroup(test.test): raise error.TestFail("Some parts of cleanup failed%s" % err) # Preparation -item = CG('memory', self._client) +item = Cgroup('memory', self._client) item.initialize(self.modules) item.smoke_test() pwd = item.mk_cgroup() @@ -116,8 +113,8 @@ class cgroup(test.test): mem = min(int(mem.split()[1])/1024, 1024) mem = max(mem, 100) # at least 100M try: -memsw_limit_bytes = item.get_property("memory.memsw.limit_in_bytes") -except error.TestFail: +item.get_property("memory.memsw.limit_in_bytes") +except error.TestError: # Doesn't support memsw limitation -> disabling logging.info("System does not support 'memsw'") utils.system("swapoff -a") @@ -222,7 +219,8 @@ 
class cgroup(test.test): logging.debug("test_memory: Memfill mem + swap limit") ps = item.test("memfill %d %s" % (mem, outf.name)) item.set_cgroup(ps.pid, pwd) -item.set_property_h("memory.memsw.limit_in_bytes", "%dM"%(mem/2), pwd) +item.set_property_h("memory.memsw.limit_in_bytes", "%dM"%(mem/2), +pwd) ps.stdin.write('\n') i = 0 while ps.poll() == None: @@ -266,56 +264,6 @@ class cgroup(test.test): Cpuset test 1) Initiate CPU load on CPU0, than spread into CPU* - CPU0 """ -class LoadPerCpu: -""" -Handles the LoadPerCpu stats -self.values [cpus, cpu0, cpu1, ...] -""" -def __init__(self): -""" -Init -""" -self.values = [] -self.stat = open('/proc/stat', 'r') -line = self.stat.readline() -while line: -if line.startswith('cpu'): -self.values.append(int(line.split()[1])) -else: -break -line = self.stat.readline() - -def reload(self): -""" -Reload current values -""" -self.values = self.get() - -def get(self): -""" -Get the current values -@return vals: array of current values [cpus, cpu0, cpu1..] -""" -self.stat.seek(0) -self.stat.flush() -vals = [] -for _ in range(len(self.values)): -vals.append(int(self.stat.readline().split()[1])) -return vals - -def tick(self): -""" -Reload values and returns the load between the last tick/reload -@return vals: array of load between ticks/reloads - values [cpus, cpu0, cpu1..] -""" -vals = self.get() -ret = [] -for i in range(len(self.values)): -ret.append(vals[i] - self.values[i]) -self.values = vals -return ret - def cleanup(supress=False): """ cleanup """ logging.debug("test_cpuset: Cleanup") @@ -341,7 +289,7 @@ class cgroup(test.test): raise error.TestFail("Some par
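The removed `LoadPerCpu` class sampled the first counter (user time) of every leading `cpu` line in `/proc/stat` and, on each `tick()`, returned the difference from the previous sample. The body of the new `get_load_per_cpu()` is not part of the quoted hunk, so here is a hedged C sketch of the same idea. `parse_user_ticks()`, `load_delta()` and `read_proc_stat()` are hypothetical names, and like the original class this only looks at the user-time column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse /proc/stat-formatted text.  For each leading "cpu"/"cpuN" line store
 * its first counter (user time, in clock ticks): out[0] is the aggregate
 * "cpu" line, out[1..] are cpu0, cpu1, ...  Stops at the first line that
 * does not start with "cpu".  Returns how many values were stored. */
size_t parse_user_ticks(const char *text, unsigned long long *out, size_t max)
{
    size_t n = 0;
    const char *line = text;

    assert(text != NULL && out != NULL);
    while (line != NULL && n < max && strncmp(line, "cpu", 3) == 0) {
        const char *p = line;
        while (*p != '\0' && *p != ' ' && *p != '\n') {
            p++;                        /* skip the "cpuN" label */
        }
        out[n++] = strtoull(p, NULL, 10);  /* strtoull skips the spaces */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;                     /* start of the next line */
        }
    }
    return n;
}

/* Load between two samples: delta[i] = now[i] - before[i]. */
void load_delta(const unsigned long long *before,
                const unsigned long long *now,
                unsigned long long *delta, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        delta[i] = now[i] - before[i];
    }
}

/* Sample the live /proc/stat (Linux only); returns 0 if it cannot be read. */
size_t read_proc_stat(unsigned long long *out, size_t max)
{
    static char buf[65536];
    FILE *f = fopen("/proc/stat", "r");
    size_t len;

    if (f == NULL) {
        return 0;
    }
    len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return parse_user_ticks(buf, out, max);
}
```

Take one sample with `read_proc_stat()`, let the load run, take a second sample, and `load_delta()` then yields the per-CPU user ticks spent in between, which is what `LoadPerCpu.tick()` returned.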
[PATCH 3/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpusSwitching
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs while switching between cgroups with different setting. Signed-off-by: Lukas Doktor --- client/tests/cgroup/cgroup_common.py |4 + client/tests/kvm/tests/cgroup.py | 108 +- 2 files changed, 109 insertions(+), 3 deletions(-) diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py index fe1601b..56856c0 100755 --- a/client/tests/cgroup/cgroup_common.py +++ b/client/tests/cgroup/cgroup_common.py @@ -105,6 +105,8 @@ class Cgroup(object): @param pwd: cgroup directory @return: 0 when is 'pwd' member """ +if isinstance(pwd, int): +pwd = self.cgroups[pwd] if open(pwd + '/tasks').readlines().count("%d\n" % pid) > 0: return 0 else: @@ -126,6 +128,8 @@ class Cgroup(object): @param pid: pid of the process @param pwd: cgroup directory """ +if isinstance(pwd, int): +pwd = self.cgroups[pwd] try: open(pwd+'/tasks', 'w').write(str(pid)) except Exception, inst: diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py index 23ae622..2e18ef7 100644 --- a/client/tests/kvm/tests/cgroup.py +++ b/client/tests/kvm/tests/cgroup.py @@ -51,13 +51,12 @@ def run_cgroup(test, params, env): @param cgroup: cgroup handler @param pwd: desired cgroup's pwd, cgroup index or None for root cgroup """ -if isinstance(pwd, int): -pwd = cgroup.cgroups[pwd] cgroup.set_cgroup(vm.get_shell_pid(), pwd) for pid in utils.get_children_pids(vm.get_shell_pid()): cgroup.set_cgroup(int(pid), pwd) + def distance(actual, reference): """ Absolute value of relative distance of two numbers @@ -1341,7 +1340,7 @@ def run_cgroup(test, params, env): except Exception, failure_detail: err += "\nCan't remove Cgroup: %s" % failure_detail -self.sessions[0].sendline('rm -f /tmp/cgroup-cpu-lock') +self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock') for i in range(len(self.sessions)): try: self.sessions[i].close() @@ -1381,6 +1380,7 @@ def run_cgroup(test, params, env): 
self.sessions.append(self.vm.wait_for_login(timeout=30)) self.sessions[i].cmd("touch /tmp/cgroup-cpu-lock") self.sessions[i].sendline(cmd) +self.sessions.append(self.vm.wait_for_login(timeout=30)) # cleanup def run(self): @@ -1485,8 +1485,109 @@ def run_cgroup(test, params, env): logging.error(err) raise error.TestFail(err) +logging.info("Test passed successfully") return ("All clear") + +class TestCpusetCpusSwitching: +""" +Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs +while switching between cgroups with different setting. +""" +def __init__(self, vms, modules): +""" +Initialization +@param vms: list of vms +@param modules: initialized cgroup module class +""" +self.vm = vms[0] # Virt machines +self.modules = modules # cgroup module handler +self.cgroup = Cgroup('cpuset', '') # cgroup handler +self.sessions = [] + + +def cleanup(self): +""" Cleanup """ +err = "" +try: +del(self.cgroup) +except Exception, failure_detail: +err += "\nCan't remove Cgroup: %s" % failure_detail + +self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock') +for i in range(len(self.sessions)): +try: +self.sessions[i].close() +except Exception, failure_detail: +err += ("\nCan't close the %dst ssh connection" % i) + +if err: +logging.error("Some cleanup operations failed: %s", err) +raise error.TestError("Some cleanup operations failed: %s" % + err) + + +def init(self): +""" +Prepares cgroup, moves VM into it and execute stressers. +""" +self.cgroup.initialize(self.modules) +vm_cpus = int(params.get('smp', 1)) +all_cpus = self.cgroup.get_property("cpuset.cpus")[0] +if all_cpus == "0": +raise error.TestFail("This test needs at least 2 CPUs on " + "host, cpuset=%s" % all_cpus) +try: +last_cpu = int(all_cpus.split('-')[1]) +except Exception: +raise error.TestFail("Failed to get #CPU from root cgroup.") + +# Comments ar
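The quoted `init()` reads the root `cpuset.cpus` value and takes the number after the `-`, which assumes a single range such as `0-7`. The cpuset list format also allows comma-separated CPUs and ranges (for example `0-3,6`). A minimal C sketch of a parser for that format follows, assuming the value comes straight from the kernel; `parse_cpu_list()` is a hypothetical helper, not part of autotest.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a cpuset list such as "0-7" or "0-3,6,8-9" (a trailing newline is
 * allowed).  On success store the number of CPUs in *count and the highest
 * CPU number in *last and return 0; return -1 on malformed or empty input. */
int parse_cpu_list(const char *s, int *count, int *last)
{
    int n = 0;
    int hi = -1;

    assert(s != NULL && count != NULL && last != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long top = lo;

        if (end == s || lo < 0) {
            return -1;                  /* not a CPU number */
        }
        if (*end == '-') {              /* a range "lo-top" */
            s = end + 1;
            top = strtol(s, &end, 10);
            if (end == s || top < lo) {
                return -1;
            }
        }
        n += (int)(top - lo + 1);
        if (top > hi) {
            hi = (int)top;
        }
        s = end;
        if (*s == ',') {
            s++;                        /* next element of the list */
        } else if (*s != '\0' && *s != '\n') {
            return -1;                  /* unexpected character */
        }
    }
    if (n == 0) {
        return -1;
    }
    *count = n;
    *last = hi;
    return 0;
}
```

With this helper a single-CPU host (`0`) yields a count of 1, which is exactly the case the test rejects as needing at least 2 CPUs.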
[PATCH 2/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpus test
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs and changes the CPU affinity. Verifies correct behaviour. * Add TestCpusetCpus test * import cleanup * private function names cleanup --- client/tests/cgroup/cgroup_common.py |2 + client/tests/kvm/tests/cgroup.py | 211 +++--- 2 files changed, 194 insertions(+), 19 deletions(-) diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py index 186bf09..fe1601b 100755 --- a/client/tests/cgroup/cgroup_common.py +++ b/client/tests/cgroup/cgroup_common.py @@ -152,6 +152,8 @@ class Cgroup(object): """ if pwd == None: pwd = self.root +if isinstance(pwd, int): +pwd = self.cgroups[pwd] try: # Remove tailing '\n' from each line ret = [_[:-1] for _ in open(pwd+prop, 'r').readlines()] diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py index ee6ef2e..23ae622 100644 --- a/client/tests/kvm/tests/cgroup.py +++ b/client/tests/kvm/tests/cgroup.py @@ -7,8 +7,8 @@ import logging, os, re, sys, tempfile, time from random import random from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils -from autotest_lib.client.tests.cgroup.cgroup_common import Cgroup, CgroupModules -from autotest_lib.client.virt import virt_utils, virt_env_process +from autotest_lib.client.tests.cgroup.cgroup_common import (Cgroup, +CgroupModules, get_load_per_cpu) from autotest_lib.client.virt.aexpect import ExpectTimeoutError from autotest_lib.client.virt.aexpect import ExpectProcessTerminatedError @@ -839,7 +839,7 @@ def run_cgroup(test, params, env): * Freezes the guest and thaws it again couple of times * verifies that guest is frozen and runs when expected """ -def get_stat(pid): +def _get_stat(pid): """ Gather statistics of pid+1st level subprocesses cpu usage @param pid: PID of the desired process @@ -877,9 +877,9 @@ def run_cgroup(test, params, env): _ = cgroup.get_property('freezer.state', cgroup.cgroups[0]) if 'FROZEN' not in _: raise 
error.TestFail("Couldn't freeze the VM: state %s" % _) -stat_ = get_stat(pid) +stat_ = _get_stat(pid) time.sleep(tsttime) -stat = get_stat(pid) +stat = _get_stat(pid) if stat != stat_: raise error.TestFail('Process was running in FROZEN state; ' 'stat=%s, stat_=%s, diff=%s' % @@ -887,9 +887,9 @@ def run_cgroup(test, params, env): logging.info("THAWING (%ss)", tsttime) self.cgroup.set_property('freezer.state', 'THAWED', self.cgroup.cgroups[0]) -stat_ = get_stat(pid) +stat_ = _get_stat(pid) time.sleep(tsttime) -stat = get_stat(pid) +stat = _get_stat(pid) if (stat - stat_) < (90*tsttime): raise error.TestFail('Process was not active in FROZEN' 'state; stat=%s, stat_=%s, diff=%s' % @@ -1186,7 +1186,7 @@ def run_cgroup(test, params, env): Let each of 3 scenerios (described in test specification) stabilize and then measure the CPU utilisation for time_test time. """ -def get_stat(f_stats, _stats=None): +def _get_stat(f_stats, _stats=None): """ Reads CPU times from f_stats[] files and sumarize them. """ if _stats is None: _stats = [] @@ -1218,27 +1218,27 @@ def run_cgroup(test, params, env): for thread_count in range(0, host_cpus): sessions[thread_count].sendline(cmd) time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) time.sleep(time_test) -stats.append(get_stat(f_stats, _stats)) +stats.append(_get_stat(f_stats, _stats)) thread_count += 1 sessions[thread_count].sendline(cmd) if host_cpus % no_speeds == 0 and no_speeds <= host_cpus: time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) time.sleep(time_test) -stats.append(get_stat(f_stats, _stats)) +stats.append(_get_stat(f_stats, _stats)) for i in range(thread_count+1, no_threads): sessions[i].sendline(cmd) time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) for j in r
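The freezer test compares two CPU-time samples of the VM process taken `tsttime` seconds apart: an unchanged value means the process stayed FROZEN, while a THAWED guest under load should accumulate ticks. The body of `_get_stat()` is not shown, so here is a hedged C sketch that reads `utime` + `stime` (fields 14 and 15 of `/proc/<pid>/stat`, in clock ticks) for a single process. Unlike `_get_stat()`, it does not add up first-level children. `stat_cpu_ticks()` and `pid_cpu_ticks()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return utime + stime (fields 14 and 15, in clock ticks) parsed from the
 * text of /proc/<pid>/stat, or -1 if the text cannot be parsed.  Field 2
 * (comm) is in parentheses and may itself contain spaces or ')', so parsing
 * resumes after the LAST ')'. */
long long stat_cpu_ticks(const char *stat_text)
{
    const char *p;
    char *end;
    long long utime, stime;

    assert(stat_text != NULL);
    p = strrchr(stat_text, ')');
    if (p == NULL) {
        return -1;
    }
    p++;
    /* Skip fields 3 (state) through 13 (cmajflt). */
    for (int field = 3; field <= 13; field++) {
        while (*p == ' ') {
            p++;
        }
        if (*p == '\0') {
            return -1;
        }
        while (*p != '\0' && *p != ' ') {
            p++;
        }
    }
    utime = strtoll(p, &end, 10);
    if (end == p) {
        return -1;
    }
    p = end;
    stime = strtoll(p, &end, 10);
    if (end == p) {
        return -1;
    }
    return utime + stime;
}

/* Read /proc/<pid>/stat (Linux only); returns -1 if it cannot be read. */
long long pid_cpu_ticks(int pid)
{
    char path[64];
    char buf[1024];
    size_t len;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return stat_cpu_ticks(buf);
}
```

A frozen process should give two equal samples. The thawed check in the test expects a rise of at least `90*tsttime`, which appears to presume roughly 90% of one CPU at 100 ticks per second (`sysconf(_SC_CLK_TCK)` is commonly 100).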
[kvm-autotest] tests.cgroup: Add 2 new tests of cpuset.cpus cgroup functionality
Hi,

This patchset fixes some issues in the cgroup_common.py library and adds 2 new tests to the cgroup-kvm test. Please find the details in each patch.

Sent upstream as pull request 103: https://github.com/autotest/autotest/pull/103

Regards,
Lukáš
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 11:01, Avi Kivity wrote:
> On 12/04/2011 11:38 PM, Jan Kiszka wrote:
>>>
>>> It should be also possible to migrate from non-KVM device to KVM
>>> version, different names would prevent that for ever.
>>
>> It is (theoretically) possible with these patches as the vmstate names
>> are the same. KVM to TCG migration does not work right now, so I was
>> only able to test in-kernel <-> user space irqchip model migrations.
>
> btw, for the next-gen migration protocol, we'd probably be using QOM
> paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device
tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
risk to create a migration crack. At least we would need some stable
naming and/or alias concept then.

Jan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + struct virtio_device *vdev = vi->vdev;
> +
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
> + vdev->config->set(vdev,
> + offsetof(struct virtio_net_config_fd, addr),
> + &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory.
Mapping a page breaks the virtio abstraction.

Stefan
Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder
On 03.12.2011 23:33, Jan Kiszka wrote:
> On 2011-12-03 20:00, Andreas Färber wrote:
>> On 03.12.2011 12:17, Jan Kiszka wrote:
>>> diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
>>> similarity index 96%
>>> rename from hw/kvmclock.c
>>> rename to hw/kvm/clock.c
>>> index 5388bc4..aa37c5d 100644
>>> --- a/hw/kvmclock.c
>>> +++ b/hw/kvm/clock.c
>>> @@ -11,11 +11,11 @@
>>>  *
>>>  */
>>>
>>> -#include "qemu-common.h"
>>> -#include "sysemu.h"
>>> -#include "sysbus.h"
>>> -#include "kvm.h"
>>> -#include "kvmclock.h"
>>> +#include
>>> +#include
>>> +#include
>>> +#include
>>> +#include
>>>
>>> #include
>>> #include
>>
>> Please don't start using system includes for everything. Rather
>> extend QEMU_CFLAGS to contain the right user include path(s).
>
> No problem - and no need to tweak any CFLAGS

Right, I had recursion into kvm/ in mind - it would've required -I ../..
to be added to CFLAGS.

> ("" only adds . to the header search paths).

By default, that is. -iquote can add further paths. (Unfortunately that
didn't solve the Cocoa Block.h vs. block.h problem, since Objective-C
frameworks use quotes, too.)

> Do we have a convention that every include in <> is considered
> system header? Should probably be documented then (and code should
> be converted gradually).

The convention I perceived was that everything QEMU was in quotes,
whereas POSIX, Linux, zlib, glib, etc. were in angle brackets. I didn't
check for documentation.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg