Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/06/2011 04:42 AM, Ben Hutchings wrote:

On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:

In order to let the packets of a flow be passed to the desired guest
cpu, we can co-operate with devices by programming the flow director,
which is just a hash-to-queue table.

This kind of co-operation is done through the accelerated RFS support:
a device-specific flow steering method, virtnet_fd(), is used to modify
the flow director based on the rfs mapping. The desired queue is
calculated through reverse mapping of the irq affinity table. In order
to parallelize the ingress path, the irq affinity of each rx queue is
also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to
balance the TX load and reduce lock contention on the egress path, so
smp_processor_id() is used for tx queue selection.

[...]

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+  u16 rxq_index, u32 flow_id)
+{
+   struct virtnet_info *vi = netdev_priv(net_dev);
+   u16 *table = NULL;
+
+   if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+   return -EPROTONOSUPPORT;

Why only IPv4?


Oops, IPv6 should work also.

+   table = kmap_atomic(vi->fd_page);
+   table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+   kunmap_atomic(table);
+
+   return 0;
+}
+#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.
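
The scheme described above (an exact-match table, an expiry check
before an entry is replaced, and returning the assigned index) can be
sketched in plain user-space C. This is only an illustration under
stated assumptions: struct flow_key, filter_steer(), filter_lookup(),
the table size and the may_expire_fn callback are hypothetical names.
The callback merely stands in for the role rps_may_expire_flow() plays
in the kernel; it is not the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_SLOTS 8          /* hypothetical table size, power of two */

/* Hypothetical exact-match key: VLAN ID plus IP 5-tuple. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint16_t vid;
    uint8_t  proto;
};

struct filter_entry {
    struct flow_key key;
    uint16_t rxq;
    bool in_use;
};

static struct filter_entry filters[FILTER_SLOTS];

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->vid == b->vid && a->proto == b->proto;
}

/* Stand-in for the expiry check: true if the flow in 'slot' may be evicted. */
typedef bool (*may_expire_fn)(unsigned int slot);

static bool never_expire(unsigned int slot)  { (void)slot; return false; }
static bool always_expire(unsigned int slot) { (void)slot; return true; }

/*
 * Steer one flow to rxq. Returns the slot index on success, or -1 if the
 * slot is held by a different flow that may not be evicted yet.
 */
static int filter_steer(const struct flow_key *key, uint32_t hash,
                        uint16_t rxq, may_expire_fn may_expire)
{
    unsigned int slot = hash & (FILTER_SLOTS - 1);
    struct filter_entry *e = &filters[slot];

    assert(key != NULL && may_expire != NULL);

    if (e->in_use && !key_equal(&e->key, key) && !may_expire(slot))
        return -1;

    e->key = *key;
    e->rxq = rxq;
    e->in_use = true;
    return (int)slot;
}

/* Queue for a packet, or -1 meaning "fall back to plain hashing". */
static int filter_lookup(const struct flow_key *key, uint32_t hash)
{
    const struct filter_entry *e = &filters[hash & (FILTER_SLOTS - 1)];

    return (e->in_use && key_equal(&e->key, key)) ? e->rxq : -1;
}
```

Because the lookup compares the full key, a second flow that happens to
hash to the same slot is not moved along with the steered one, which is
the reordering hazard of rewriting a shared hash bucket.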


Thanks for the explanation. How about documenting this briefly in scaling.txt?

+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+   int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+  smp_processor_id();
+
+   /* As we make use of accelerated RFS, which lets the scheduler
+    * balance the load, does it make sense to also choose the tx queue
+    * based on the processor id?
+    */
+   while (unlikely(txq >= dev->real_num_tx_queues))
+   txq -= dev->real_num_tx_queues;
+   return txq;
+}

[...]

Don't do this, let XPS handle it.

Ben.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support

2011-12-05 Thread Jason Wang

On 12/06/2011 04:09 AM, Ben Hutchings wrote:

On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:

This patch adds a simple flow director to the tun/tap device. It is just
a page that contains the hash-to-queue mapping, which can be changed by
user-space. The backend (tap/macvtap) queries this table to get the
desired queue for a packet when it sends packets to userspace.

This is just flow hashing (RSS), not flow steering.


The page address is set through a new ioctl, TUNSETFD, and the page
stays pinned until the device exits or another page is specified.

[...]

You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.

Ben.



I don't fully understand this. The page belongs to the guest, and the
idea is to let the guest driver easily change any single entry. It looks
like if ethtool_set_rxfh_indir() is used, this kind of change is not
easy, as it needs one copy and can only accept the whole table as its
parameter.
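
For readers following the hashing-versus-steering distinction in this
thread, here is a minimal user-space sketch of a hash-to-queue table.
The names (indir, indir_queue(), indir_set()) and the table size are
assumptions made for illustration, not taken from the patch. It shows
that rewriting one bucket moves every flow whose hash lands in that
bucket, not just the flow that triggered the update.

```c
#include <assert.h>
#include <stdint.h>

#define INDIR_SIZE 128                  /* hypothetical table size */
#define INDIR_MASK (INDIR_SIZE - 1)

static uint16_t indir[INDIR_SIZE];      /* hash bucket -> rx queue */

/* Spread the buckets round-robin over nqueues queues. */
static void indir_init(uint16_t nqueues)
{
    assert(nqueues > 0);
    for (unsigned int i = 0; i < INDIR_SIZE; i++)
        indir[i] = (uint16_t)(i % nqueues);
}

/* Queue chosen for a packet with the given receive hash. */
static uint16_t indir_queue(uint32_t rxhash)
{
    return indir[rxhash & INDIR_MASK];
}

/* Rewrite the single bucket that rxhash maps to. */
static void indir_set(uint32_t rxhash, uint16_t queue)
{
    indir[rxhash & INDIR_MASK] = queue;
}
```

With four queues, hashes 0x1005 and 0x2085 share bucket 5, so calling
indir_set(0x1005, 3) also redirects the flow hashed to 0x2085.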



Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-12-05 Thread Raghavendra K T

On 12/06/2011 08:57 AM, Konrad Rzeszutek Wilk wrote:

+static inline void add_stats(enum kvm_contention_stat var, int val)


You probably want 'int val' to be 'u32 val' as that is the type
in contention_stats.



Yes, thanks for pointing that out, as it's cumulative. It is indeed u32
in the #else branch :). I'll change that.




Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Liu ping fan
On Mon, Dec 5, 2011 at 4:41 PM, Gleb Natapov  wrote:
> On Mon, Dec 05, 2011 at 01:39:37PM +0800, Liu ping fan wrote:
>> On Sun, Dec 4, 2011 at 8:10 PM, Gleb Natapov  wrote:
>> > On Sun, Dec 04, 2011 at 07:53:37PM +0800, Liu ping fan wrote:
>> >> On Sat, Dec 3, 2011 at 2:26 AM, Jan Kiszka  wrote:
>> >> > On 2011-12-02 07:26, Liu Ping Fan wrote:
>> >> >> From: Liu Ping Fan 
>> >> >>
>> >> >> Currently, vcpu can be destructed only when kvm instance destroyed.
>> >> >> Change this to vcpu's destruction taken when its refcnt is zero,
>> >> >> and then vcpu MUST and CAN be destroyed before kvm's destroy.
>> >> >
>> >> > I'm lacking the big picture yet (would be good to have in the change log
>> >> > - at least I'm too lazy to read the code):
>> >> >
>> >> > What increments the refcnt, what decrements it again? IOW, how does user
>> >> > space controls the life-cycle of a vcpu after your changes?
>> >> >
>> >> In local APIC mode, delivering IPI to target APIC, target's refcnt is
>> >> incremented, and decremented when finished. At other times, using RCU to
>> > Why is this needed?
>> >
>> Suppose the following scene:
>>
>> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
>>         for (idx = 0; \
>>              idx < atomic_read(&kvm->online_vcpus) && \
>>              (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>>              idx++)
>>
>> -->
>> Here kvm_vcpu's destruction is called
>>               vcpup->vcpu_id ...  //oops!
>>
>>
> And this is exactly how your code looks. i.e you do not increment
> reference count in most of the loops, you only increment it twice
> (in pic_unlock() and kvm_irq_delivery_to_apic()) because you are using
> vcpu outside of rcu_read_lock() protected section and I do not see why
> not just extend protected section to include kvm_vcpu_kick(). As far as
> I can see this function does not sleep.
>
:-) I just wanted to minimize the RCU critical section, and as you say,
we can extend the protected section to include kvm_vcpu_kick().

> What should protect vcpu from disappearing in your example above is RCU
> itself if you are using it right. But since I do not see any calls to
> rcu_assign_pointer()/rcu_dereference() I doubt you are using it right
> actually.
>
Sorry, but I thought that would not be a problem. Please help me check my reasoning:

struct kvm_vcpu *kvm_vcpu_get(struct kvm_vcpu *vcpu)
{
	if (vcpu == NULL)
		return NULL;
	if (atomic_add_unless(&vcpu->refcount, 1, 0))	/* increment */
		return vcpu;
	return NULL;
}

void kvm_vcpu_put(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm;

	if (atomic_dec_and_test(&vcpu->refcount)) {	/* decrement */
		kvm = vcpu->kvm;
		mutex_lock(&kvm->lock);
		kvm->vcpus[vcpu->vcpu_id] = NULL;
		atomic_dec(&kvm->online_vcpus);
		mutex_unlock(&kvm->lock);
		call_rcu(&vcpu->head, kvm_vcpu_zap);
	}
}

The atomic decrement and increment are protected by the cache coherency
protocol. So once we hold a valid kvm_vcpu pointer through kvm_vcpu_get(),
it stays valid until we release it; only then may the destruction happen.
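
The "take a reference unless the count is already zero" pattern used by
kvm_vcpu_get() can be sketched in user space with C11 atomics. This is
an illustration only: struct obj, obj_get() and obj_put() are
hypothetical names, and the compare-and-swap loop is a stand-in for what
atomic_add_unless() provides in the kernel. Note that the sketch covers
only the counter. The increment itself reads memory inside the object,
so the object must still be allocated when obj_get() runs; that lifetime
question is the part the RCU discussion in this thread is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
    atomic_int refcount;    /* 0 means teardown has begun */
};

/* Take a reference unless the count has already dropped to zero. */
static struct obj *obj_get(struct obj *o)
{
    if (o == NULL)
        return NULL;

    int old = atomic_load(&o->refcount);
    while (old != 0) {
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return o;
        /* the failed CAS reloaded 'old'; retry with the new value */
    }
    return NULL;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
    assert(o != NULL);
    return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

Once the count has reached zero, obj_get() keeps failing, so a late
caller cannot revive an object that is already being torn down.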

Thanks and regards,
ping fan

> --
>                        Gleb.


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang  wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   struct virtio_device *vdev = vi->vdev;
+
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+   vdev->config->set(vdev,
+ offsetof(struct virtio_net_config_fd, addr),
+&pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.


Using the control virtqueue is more suitable, but there are also some problems:

One problem is the interface. If we use the control virtqueue, we need
an interface between the backend and tap/macvtap to change the flow
mapping. But qemu and vhost_net only know about the file descriptor, so
more information or interfaces need to be exposed in order to let
ethtool or ioctl work.


Another problem is the delay introduced by the ctrl vq. As the ctrl vq
would be used in the critical path in the guest and it uses busy waiting
to get the response, the delay is not negligible.



Stefan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[PATCH 13/13] KVM: PPC: Allow for read-only pages backing a Book3S HV guest

2011-12-05 Thread Paul Mackerras
With this, if a guest does an H_ENTER with a read/write HPTE on a page
which is currently read-only, we make the actual HPTE inserted be a
read-only version of the HPTE.  We now intercept protection faults as
well as HPTE not found faults, and for a protection fault we work out
whether it should be reflected to the guest (e.g. because the guest HPTE
didn't allow write access to usermode) or handled by switching to
kernel context and calling kvmppc_book3s_hv_page_fault, which will then
request write access to the page and update the actual HPTE.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   20 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   33 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   32 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |4 +-
 4 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 75a1b42..37755d0 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -115,6 +115,22 @@ static inline unsigned long hpte_rpn(unsigned long ptel, 
unsigned long psize)
return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
 }
 
+static inline int hpte_is_writable(unsigned long ptel)
+{
+   unsigned long pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);
+
+   return pp != PP_RXRX && pp != PP_RXXX;
+}
+
+static inline unsigned long hpte_make_readonly(unsigned long ptel)
+{
+   if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
+   ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
+   else
+   ptel |= PP_RXRX;
+   return ptel;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long 
io_type)
 {
unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -134,7 +150,7 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, 
unsigned long io_type)
  * Lock and read a linux PTE.  If it's present and writable, atomically
  * set dirty and referenced bits and return the PTE, otherwise return 0.
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
 {
pte_t pte, tmp;
 
@@ -152,7 +168,7 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
 
if (pte_present(pte)) {
pte = pte_mkyoung(pte);
-   if (pte_write(pte))
+   if (writing && pte_write(pte))
pte = pte_mkdirty(pte);
}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6919d99..b1b31c7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -502,6 +502,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
struct page *page, *pages[1];
long index, ret, npages;
unsigned long is_io;
+   unsigned int writing, write_ok;
struct vm_area_struct *vma;
 
/*
@@ -552,8 +553,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
pfn = 0;
page = NULL;
pte_size = PAGE_SIZE;
+   writing = (dsisr & DSISR_ISSTORE) != 0;
+   /* If writing != 0, then the HPTE must allow writing, if we get here */
+   write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
-   npages = get_user_pages_fast(hva, 1, 1, pages);
+   npages = get_user_pages_fast(hva, 1, writing, pages);
if (npages < 1) {
/* Check if it's an I/O mapping */
down_read(&current->mm->mmap_sem);
@@ -564,6 +568,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
((hva - vma->vm_start) >> PAGE_SHIFT);
pte_size = psize;
is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+   write_ok = vma->vm_flags & VM_WRITE;
}
up_read(&current->mm->mmap_sem);
if (!pfn)
@@ -574,6 +579,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
page = compound_head(page);
pte_size <<= compound_order(page);
}
+   /* if the guest wants write access, see if that is OK */
+   if (!writing && hpte_is_writable(hpte[2])) {
+   pte_t *ptep, pte;
+
+   ptep = find_linux_pte_or_hugepte(current->mm->pgd,
+hva, NULL);
+   if (ptep && pte_present(*ptep)) {
+   pte = kvmppc_read_update_linux_pte(ptep, 1);
+   if (pte_write(pte))
+   write_ok = 1;
+   }
+   }
pfn = page_to_pfn(pa
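
As a worked example of the two helpers added to kvm_book3s_64.h in this
patch, the following user-space sketch reproduces their logic so the
effect on each protection encoding can be checked. The numeric values of
HPTE_R_PP0, HPTE_R_PP and the PP_* constants are assumptions written out
here for illustration; they are not shown in the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings; the diff above uses these names without values. */
#define HPTE_R_PP0  0x8000000000000000ull
#define HPTE_R_PP   0x0000000000000003ull
#define PP_RWXX     0ull                    /* supervisor RW, user none */
#define PP_RWRX     1ull                    /* supervisor RW, user read */
#define PP_RWRW     2ull                    /* supervisor RW, user RW   */
#define PP_RXRX     3ull                    /* supervisor R,  user read */
#define PP_RXXX     (HPTE_R_PP0 | 2ull)     /* supervisor R,  user none */

static int hpte_is_writable(uint64_t ptel)
{
    uint64_t pp = ptel & (HPTE_R_PP0 | HPTE_R_PP);

    return pp != PP_RXRX && pp != PP_RXXX;
}

static uint64_t hpte_make_readonly(uint64_t ptel)
{
    if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
        ptel = (ptel & ~HPTE_R_PP) | PP_RXXX;
    else
        ptel |= PP_RXRX;

    /* whichever branch ran, the result must no longer allow writes */
    assert(!hpte_is_writable(ptel));
    return ptel;
}
```

Under these assumed encodings, a supervisor-only mapping (PP_RWXX) stays
supervisor-only (PP_RXXX), while PP_RWRX and PP_RWRW both become
PP_RXRX; bits outside the PP field are left untouched.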

[PATCH 12/13] KVM: PPC: Implement MMU notifiers for Book3S HV guests

2011-12-05 Thread Paul Mackerras
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|4 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   31 
 arch/powerpc/include/asm/kvm_host.h  |   16 ++
 arch/powerpc/include/asm/reg.h   |3 +
 arch/powerpc/kvm/Kconfig |1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  268 --
 arch/powerpc/kvm/book3s_hv.c |   25 ++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  140 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   49 ++
 arch/powerpc/kvm/powerpc.c   |3 +
 arch/powerpc/mm/hugetlbpage.c|2 +
 11 files changed, 483 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 5ac53f9..72688d8 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -145,6 +145,10 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct 
kvmppc_bat *bat,
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu 
*vcpu);
 extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+   unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
+   unsigned long pte_index);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9a59b6d..75a1b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -130,6 +130,37 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, 
unsigned long io_type)
return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
 }
 
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p)
+{
+   pte_t pte, tmp;
+
+   /* wait until _PAGE_BUSY is clear then set it atomically */
+   __asm__ __volatile__ (
+   "1: ldarx   %0,0,%3\n"
+   "   andi.   %1,%0,%4\n"
+   "   bne-    1b\n"
+   "   ori     %1,%0,%4\n"
+   "   stdcx.  %1,0,%3\n"
+   "   bne-    1b"
+   : "=&r" (pte), "=&r" (tmp), "=m" (*p)
+   : "r" (p), "i" (_PAGE_BUSY)
+   : "cc");
+
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *p = pte;   /* clears _PAGE_BUSY */
+
+   return pte;
+}
+
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index c9c92f0..eb20ddc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
@@ -43,6 +44,19 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #endif
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+#include 
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+
+struct kvm;
+extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+extern void kvm_set_spte_hva(struct kv

[PATCH 07/13] KVM: PPC: Allow use of small pages to back Book3S HV guests

2011-12-05 Thread Paul Mackerras
This relaxes the requirement that the guest memory be provided as
16MB huge pages, allowing it to be provided as normal memory, i.e.
in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
the kvm->arch.slot_phys[] arrays with a small page index, even if
huge pages are being used, and use the low-order 5 bits of each
entry to store the order of the enclosing page with respect to
normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |8 ++
 arch/powerpc/include/asm/kvm_host.h  |3 +-
 arch/powerpc/include/asm/kvm_ppc.h   |2 +-
 arch/powerpc/include/asm/reg.h   |1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  122 --
 arch/powerpc/kvm/book3s_hv.c |   57 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |6 +-
 7 files changed, 130 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index ab6772e..d55e6b4 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,4 +107,12 @@ static inline unsigned long hpte_page_size(unsigned long 
h, unsigned long l)
return 0;   /* error */
 }
 
+static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
+  unsigned long pagesize)
+{
+   unsigned long mask = (pagesize >> PAGE_SHIFT) - 1;
+
+   return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
+}
+
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2a52bdb..ba1da85 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -176,14 +176,13 @@ struct revmap_entry {
 };
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
+#define KVMPPC_PAGE_ORDER_MASK 0x1f
 #define KVMPPC_GOT_PAGE        0x80
 
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
struct revmap_entry *revmap;
-   unsigned long ram_psize;
-   unsigned long ram_porder;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 111e1b4..a61b5b5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -122,7 +122,7 @@ extern void kvmppc_free_hpt(struct kvm *kvm);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
-   struct kvm_memory_slot *memslot);
+   struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 559da19..4599d12 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -237,6 +237,7 @@
 #define   LPCR_ISL (1ul << (63-2))
 #define   LPCR_VC_SH   (63-2)
 #define   LPCR_DPFD_SH (63-11)
+#define   LPCR_VRMASD  (0x1ful << (63-16))
 #define   LPCR_VRMA_L  (1ul << (63-12))
 #define   LPCR_VRMA_LP0 (1ul << (63-15))
 #define   LPCR_VRMA_LP1 (1ul << (63-16))
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 87016cc..cc18f3d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -34,8 +34,6 @@
 #include 
 #include 
 
-/* Pages in the VRMA are 16MB pages */
-#define VRMA_PAGE_ORDER 24
 #define VRMA_VSID  0x1ffffffUL /* 1TB VSID reserved for VRMA */
 
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
@@ -95,17 +93,31 @@ void kvmppc_free_hpt(struct kvm *kvm)
free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
 }
 
-void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot)
+/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
+{
+   return (pgsize > 0x1000) ? HPTE_V_LARGE : 0;
+}
+
+/* Bits in second HPTE dword for pagesize 4k, 64k or 16M */
+static inline unsigned long hpte1_pgsize_encoding(unsigned long pgsize)
+{
+   return (pgsize == 0x10000) ? 0x1000 : 0;
+}
+
+void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
+unsigned long porder)
 {
-   struct kvm *kvm = vcpu->kvm;
unsigned long i;
unsigned long npages;
unsigned long hp_v, hp_r;
unsigned long addr, hash;
-   unsigned long porder = kvm->arch.ram_porder;
+   unsigned long psize;
+   unsigned long hp0, hp1;
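
The commit message above describes storing the order of the enclosing
page in the low-order 5 bits of each slot_phys[] entry. A minimal
user-space sketch of that packing follows. All function names are
hypothetical, the base page size is assumed to be 4k, and the other
low-order flag bits (such as KVMPPC_GOT_PAGE) are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PAGE_SHIFT 12u     /* assumed 4k base page */
#define ORDER_MASK      0x1fu   /* low-order 5 bits hold the order */

/* Pack a page-aligned physical address with the enclosing page's order. */
static uint64_t entry_pack(uint64_t phys, unsigned int order)
{
    assert((phys & ((1ull << BASE_PAGE_SHIFT) - 1)) == 0);
    assert(order <= ORDER_MASK);
    return phys | order;
}

static unsigned int entry_order(uint64_t entry)
{
    return (unsigned int)(entry & ORDER_MASK);
}

/* Size in bytes of the page that encloses this base page. */
static uint64_t entry_enclosing_size(uint64_t entry)
{
    return 1ull << (BASE_PAGE_SHIFT + entry_order(entry));
}

/* log2(enclosing_size / base page size), for power-of-two sizes. */
static unsigned int order_for(uint64_t enclosing_size)
{
    unsigned int order = 0;

    while ((1ull << (BASE_PAGE_SHIFT + order)) < enclosing_size)
        order++;
    return order;
}
```

With a 4k base page, a 16MB huge page has order 12 and a 64k page has
order 4, both of which fit in 5 bits.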
   

[PATCH 01/13] KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code

2011-12-05 Thread Paul Mackerras
This moves the get/set_one_reg implementation down from powerpc.c into
booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
but more importantly, it fixes a bug on Book3s HV where we were
accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
macro) and corrupting memory, causing random crashes and file corruption.

On Book3s HV we only accept setting the HIOR to zero, since the guest
runs in supervisor mode and its vectors are never offset from zero.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_ppc.h |3 ++
 arch/powerpc/kvm/book3s_hv.c   |   33 ++
 arch/powerpc/kvm/book3s_pr.c   |   33 ++
 arch/powerpc/kvm/booke.c   |   10 +
 arch/powerpc/kvm/powerpc.c |   39 
 5 files changed, 79 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 5192c2e..fc2d696 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -176,6 +176,9 @@ int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct 
kvm_sregs *sregs);
 void kvmppc_get_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+
 void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ecc77fa..5efdd5b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -390,6 +390,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   reg->u.reg64 = 0;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   /* Only allow this to be set to zero */
+   if (reg->u.reg64 == 0)
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index cbb7051..1abe35c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -837,6 +837,39 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   reg->u.reg64 = to_book3s(vcpu)->hior;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   int r = -EINVAL;
+
+   switch (reg->id) {
+   case KVM_ONE_REG_PPC_HIOR:
+   to_book3s(vcpu)->hior = reg->u.reg64;
+   to_book3s(vcpu)->hior_explicit = true;
+   r = 0;
+   break;
+   default:
+   break;
+   }
+
+   return r;
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
return 0;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 9e41f45..ee9e1ee 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -887,6 +887,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return kvmppc_core_set_sregs(vcpu, sregs);
 }
 
+int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   return -EINVAL;
+}
+
+int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+{
+   return -EINVAL;
+}
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
return -ENOTSUPP;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 34515e8..1239c6f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -620,45 +620,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
return r;
 }
 
-static int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
- struct kvm_one_reg *reg)
-{
-   int r = -EINVAL;
-
-   switch (reg->id) {
-#ifdef CONFIG_PPC_BOOK3S
-   case KVM_ONE_REG_PPC_HIOR:
-   reg->u.reg64 = to_book3s(vcpu)->hior;
-   r = 0;
-   

[PATCH 05/13] KVM: PPC: Make the H_ENTER hcall more reliable

2011-12-05 Thread Paul Mackerras
At present, our implementation of H_ENTER only makes one try at locking
each slot that it looks at, and doesn't even retry the ldarx/stdcx.
atomic update sequence that it uses to attempt to lock the slot.  Thus
it can return the H_PTEG_FULL error unnecessarily, particularly when
the H_EXACT flag is set, meaning that the caller wants a specific PTEG
slot.

This improves the situation by making a second pass when no free HPTE
slot is found, where we spin until we succeed in locking each slot in
turn and then check whether it is full while we hold the lock.  If the
second pass fails, then we return H_PTEG_FULL.

This also moves lock_hpte to a header file (since later commits in this
series will need to use it from other source files) and renames it to
try_lock_hpte, which is a somewhat less misleading name.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   25 
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   63 --
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 23bb17e..fe45a81 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,6 +37,31 @@ static inline struct kvmppc_book3s_shadow_vcpu 
*to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK  (HPT_NPTEG - 1)
 #endif
 
+/*
+ * We use a lock bit in HPTE dword 0 to synchronize updates and
+ * accesses to each HPTE, and another bit to indicate non-present
+ * HPTEs.
+ */
+#define HPTE_V_HVLOCK  0x40UL
+
+static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
+{
+   unsigned long tmp, old;
+
+   asm volatile("  ldarx   %0,0,%2\n"
+                "  and.    %1,%0,%3\n"
+                "  bne     2f\n"
+                "  ori     %0,%0,%4\n"
+                "  stdcx.  %0,0,%2\n"
+                "  beq+    2f\n"
+                "  li      %1,%3\n"
+                "2: isync"
+: "=&r" (tmp), "=&r" (old)
+: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
+: "cc", "memory");
+   return old == 0;
+}
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f45ba7..659175f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -56,26 +56,6 @@ static void *real_vmalloc_addr(void *x)
return __va(addr);
 }
 
-#define HPTE_V_HVLOCK  0x40UL
-
-static inline long lock_hpte(unsigned long *hpte, unsigned long bits)
-{
-   unsigned long tmp, old;
-
-   asm volatile("  ldarx   %0,0,%2\n"
-                "  and.    %1,%0,%3\n"
-                "  bne     2f\n"
-                "  ori     %0,%0,%4\n"
-                "  stdcx.  %0,0,%2\n"
-                "  beq+    2f\n"
-                "  li      %1,%3\n"
-                "2: isync"
-: "=&r" (tmp), "=&r" (old)
-: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-: "cc", "memory");
-   return old == 0;
-}
-
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel)
 {
@@ -129,24 +109,49 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long 
flags,
pteh &= ~0x60UL;
ptel &= ~(HPTE_R_PP0 - kvm->arch.ram_psize);
ptel |= pa;
+
if (pte_index >= HPT_NPTE)
return H_PARAMETER;
if (likely((flags & H_EXACT) == 0)) {
pte_index &= ~7UL;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
-   for (i = 0; ; ++i) {
-   if (i == 8)
-   return H_PTEG_FULL;
+   for (i = 0; i < 8; ++i) {
if ((*hpte & HPTE_V_VALID) == 0 &&
-   lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
+   try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
break;
hpte += 2;
}
+   if (i == 8) {
+   /*
+* Since try_lock_hpte doesn't retry (not even stdcx.
+* failures), it could be that there is a free slot
+* but we transiently failed to lock it.  Try again,
+* actually locking each slot and checking it.
+*/
+   hpte -= 16;
+   for (i = 0; i < 8; ++i) {
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   
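
Since the diff is cut off here, the two-pass search described in the
commit message can be sketched in user-space C11 instead. This is an
illustration under assumptions, not the kernel code: the bit values and
the names try_lock_slot(), unlock_slot() and find_free_slot() are
hypothetical, and a single compare-and-swap attempt stands in for the
one-shot ldarx/stdcx. sequence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_VALID  0x01u   /* hypothetical "slot in use" bit */
#define SLOT_LOCK   0x40u   /* hypothetical lock bit */
#define GROUP_SIZE  8

/* One attempt: succeed only if none of 'bits' is set, then set the lock. */
static bool try_lock_slot(_Atomic uint64_t *slot, uint64_t bits)
{
    uint64_t old = atomic_load(slot);

    if (old & bits)
        return false;
    /* a single CAS attempt; a concurrent update makes it fail */
    return atomic_compare_exchange_strong(slot, &old, old | SLOT_LOCK);
}

static void unlock_slot(_Atomic uint64_t *slot)
{
    atomic_fetch_and(slot, ~(uint64_t)SLOT_LOCK);
}

/* Returns the index of a free slot, left locked, or -1 if the group is full. */
static int find_free_slot(_Atomic uint64_t group[GROUP_SIZE])
{
    int i;

    assert(group != NULL);

    /* Pass 1: opportunistic, never waits. */
    for (i = 0; i < GROUP_SIZE; i++)
        if (try_lock_slot(&group[i], SLOT_LOCK | SLOT_VALID))
            return i;

    /* Pass 2: really take each lock, then check the slot under it. */
    for (i = 0; i < GROUP_SIZE; i++) {
        while (!try_lock_slot(&group[i], SLOT_LOCK))
            ;   /* spin; the kernel code calls cpu_relax() here */
        if (!(atomic_load(&group[i]) & SLOT_VALID))
            return i;
        unlock_slot(&group[i]);
    }
    return -1;
}
```

The first pass can miss a free slot when a lock attempt fails
transiently; the second pass cannot, because each slot is inspected only
while its lock is held, so reporting a full group is then justified.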

[PATCH 10/13] KVM: PPC: Implement MMIO emulation support for Book3S HV guests

2011-12-05 Thread Paul Mackerras
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
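
The valid/absent distinction described above can be sketched in a few lines of C. This is only an illustration, not code from the patch: the helper names are hypothetical, and the value of HPTE_V_VALID is an assumption (HPTE_V_ABSENT is the value this series defines).

```c
#include <stdbool.h>
#include <stdint.h>

#define HPTE_V_VALID   0x01ULL  /* assumed: hardware valid bit in HPTE dword 0 */
#define HPTE_V_ABSENT  0x20ULL  /* software-use bit, as defined in this series */

/*
 * Hypothetical helper: guest hcalls (H_ENTER, H_REMOVE, H_READ, ...)
 * treat a slot as occupied if it is either valid or absent.
 */
static bool hpte_in_use(uint64_t v)
{
    return (v & (HPTE_V_VALID | HPTE_V_ABSENT)) != 0;
}

/*
 * Hypothetical helper: only an entry that the hardware could have
 * seen as valid needs a tlbie when it is removed.
 */
static bool hpte_needs_tlbie(uint64_t v)
{
    return (v & HPTE_V_VALID) != 0;
}
```

An absent entry therefore counts as in use for the guest, while removing it needs no TLB invalidation.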

This is based on an earlier patch by Benjamin Herrenschmidt, but has
since been heavily reworked.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|5 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h  |5 +
 arch/powerpc/include/asm/mmu-hash64.h|2 +-
 arch/powerpc/include/asm/ppc-opcode.h|4 +-
 arch/powerpc/include/asm/reg.h   |1 +
 arch/powerpc/kernel/asm-offsets.c|1 +
 arch/powerpc/kernel/exceptions-64s.S |8 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  228 +--
 arch/powerpc/kvm/book3s_hv.c |   21 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  262 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  127 ---
 12 files changed, 607 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 5e7e04b..5ac53f9 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,11 @@ extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu 
*vcpu);
 extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
 extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
 extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
+   struct kvm_vcpu *vcpu, unsigned long addr,
+   unsigned long status);
+extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
+   unsigned long slb_v, unsigned long valid);
 
 extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache 
*pte);
 extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 90e6658..9a59b6d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -37,12 +37,15 @@ static inline struct kvmppc_book3s_shadow_vcpu 
*to_svcpu(struct kvm_vcpu *vcpu)
 #define HPT_HASH_MASK  (HPT_NPTEG - 1)
 #endif
 
+#define VRMA_VSID  0x1ffUL /* 1TB VSID reserved for VRMA */
+
 /*
  * We use a lock bit in HPTE dword 0 to synchronize updates and
  * accesses to each HPTE, and another bit to indicate non-present
  * HPTEs.
  */
 #define HPTE_V_HVLOCK  0x40UL
+#define HPTE_V_ABSENT  0x20UL
 
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
@@ -138,6 +141,29 @@ static inline unsigned long hpte_cache_bits(unsigned long 
pte_val)
 #endif
 }
 
+static inline bool hpte_read_permission(unsigned long pp, unsigned long key)
+{
+   if (key)
+   return PP_RWRX <= pp && pp <= PP_RXRX;
+   return 1;
+}
+
+static inline bool hpte_write_permission(unsigned long pp, unsigned long key)
+{
+   if (key)
+   return pp == PP_RWRW;
+   return pp <= PP_RWRW;
+}
+
+static inline int hpte_get_skey_perm(unsigned long hpte_r, unsigned long amr)
+{
+   unsigned long skey;
+
+   skey = ((hpte_r & HPTE_R_KEY_HI) >> 57) |
+   ((hpte_r & HPTE_R_KEY_LO) >> 9);
+   return (amr >> (62 - 2 * skey)) & 3;
+}
+
 static inline void lock_rmap(unsigned long *rmap)
 {
do {
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index e369d49..c9c92f0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -209,6 +209,7 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
+   unsigned long vrma_slb_v;
int rma_setup_done;
struct list_head spapr_tce_tables;
spinlock_t slot_phys_lock;
@@ -451,6 +452,10 @@ 

[PATCH 08/13] KVM: PPC: Allow I/O mappings in memory slots

2011-12-05 Thread Paul Mackerras
This provides for the case where userspace maps an I/O device into the
address range of a memory slot using a VM_PFNMAP mapping.  In that
case, we work out the pfn from vma->vm_pgoff, and record the cache
enable bits from vma->vm_page_prot in two low-order bits in the
slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
cache bits in the HPTE that the guest wants to insert match the cache
bits in the slot_phys array entry.
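
As a rough standalone illustration of the check described above, the sketch below mirrors the logic of hpte_cache_flags_ok() from this patch outside the kernel. The function name is hypothetical, the CPU feature test is replaced by a plain boolean parameter, and the values of HPTE_R_M and HPTE_R_G are assumptions (HPTE_R_I and HPTE_R_W match the 0x20 and 0x40 noted in the diff).

```c
#include <stdbool.h>

#define HPTE_R_W    0x40UL  /* write-through, per the diff comment */
#define HPTE_R_I    0x20UL  /* cache-inhibited, per the diff comment */
#define HPTE_R_M    0x10UL  /* assumed: memory coherence bit */
#define HPTE_R_G    0x08UL  /* assumed: guarded bit */
#define HPTE_R_WIMG (HPTE_R_W | HPTE_R_I | HPTE_R_M | HPTE_R_G)

/*
 * Returns true if the cache-control bits the guest put in the HPTE
 * agree with what was recorded for the backing page.  io_type is 0
 * for normal memory, otherwise the W/I bits recorded in slot_phys.
 * has_arch_206 stands in for cpu_has_feature(CPU_FTR_ARCH_206).
 */
static bool toy_cache_flags_ok(unsigned long ptel, unsigned long io_type,
                               bool has_arch_206)
{
    unsigned long wimg = ptel & HPTE_R_WIMG;

    /* W=1, I=1, M=1 encodes SAO; treat it as ordinary coherent memory */
    if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) && has_arch_206)
        wimg = HPTE_R_M;

    if (!io_type)
        return wimg == HPTE_R_M;

    return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
}
```

A guest that tries to map a cache-inhibited I/O page as ordinary cacheable memory fails this check.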

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   26 +++
 arch/powerpc/include/asm/kvm_host.h  |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   67 --
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |5 +-
 4 files changed, 76 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index d55e6b4..a98e0f6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,32 @@ static inline unsigned long hpte_page_size(unsigned long 
h, unsigned long l)
return 0;   /* error */
 }
 
+static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long 
io_type)
+{
+   unsigned int wimg = ptel & HPTE_R_WIMG;
+
+   /* Handle SAO */
+   if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
+   cpu_has_feature(CPU_FTR_ARCH_206))
+   wimg = HPTE_R_M;
+
+   if (!io_type)
+   return wimg == HPTE_R_M;
+
+   return (wimg & (HPTE_R_W | HPTE_R_I)) == io_type;
+}
+
+/* Return HPTE cache control bits corresponding to Linux pte bits */
+static inline unsigned long hpte_cache_bits(unsigned long pte_val)
+{
+#if _PAGE_NO_CACHE == HPTE_R_I && _PAGE_WRITETHRU == HPTE_R_W
+   return pte_val & (HPTE_R_W | HPTE_R_I);
+#else
+   return ((pte_val & _PAGE_NO_CACHE) ? HPTE_R_I : 0) +
+   ((pte_val & _PAGE_WRITETHRU) ? HPTE_R_W : 0);
+#endif
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index ba1da85..9b1c247 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -177,6 +177,8 @@ struct revmap_entry {
 
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK 0x1f
+#define KVMPPC_PAGE_NO_CACHE   HPTE_R_I    /* 0x20 */
+#define KVMPPC_PAGE_WRITETHRU  HPTE_R_W    /* 0x40 */
 #define KVMPPC_GOT_PAGE        0x80
 
 struct kvm_arch {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cc18f3d..b904c40 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -199,7 +199,8 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned 
long gfn,
struct page *page, *hpage, *pages[1];
unsigned long s, pgsize;
unsigned long *physp;
-   unsigned int got, pgorder;
+   unsigned int is_io, got, pgorder;
+   struct vm_area_struct *vma;
unsigned long pfn, i, npages;
 
physp = kvm->arch.slot_phys[memslot->id];
@@ -208,34 +209,51 @@ static long kvmppc_get_guest_page(struct kvm *kvm, 
unsigned long gfn,
if (physp[gfn - memslot->base_gfn])
return 0;
 
+   is_io = 0;
+   got = 0;
page = NULL;
pgsize = psize;
+   err = -EINVAL;
start = gfn_to_hva_memslot(memslot, gfn);
 
/* Instantiate and get the page we want access to */
np = get_user_pages_fast(start, 1, 1, pages);
-   if (np != 1)
-   return -EINVAL;
-   page = pages[0];
-   got = KVMPPC_GOT_PAGE;
+   if (np != 1) {
+   /* Look up the vma for the page */
+   down_read(&current->mm->mmap_sem);
+   vma = find_vma(current->mm, start);
+   if (!vma || vma->vm_start > start ||
+   start + psize > vma->vm_end ||
+   !(vma->vm_flags & VM_PFNMAP))
+   goto up_err;
+   is_io = hpte_cache_bits(pgprot_val(vma->vm_page_prot));
+   pfn = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+   /* check alignment of pfn vs. requested page size */
+   if (psize > PAGE_SIZE && (pfn & ((psize >> PAGE_SHIFT) - 1)))
+   goto up_err;
+   up_read(&current->mm->mmap_sem);
 
-   /* See if this is a large page */
-   s = PAGE_SIZE;
-   if (PageHuge(page)) {
-   hpage = compound_head(page);
-   s <<= compound_order(hpage);
-   /* Get the whole large page if slot alignment is ok */
-   if (s > psize && slot_is_aligned(memslot, s) &&
-   !(memslot->userspace_addr & (s - 1))) {
-   start &= ~(s - 1);
-   pgsize = s;
-   

[PATCH 09/13] KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn

2011-12-05 Thread Paul Mackerras
This expands the reverse mapping array to contain two links for each
HPTE which are used to link together HPTEs that correspond to the
same guest logical page.  Each circular list of HPTEs is pointed to
by the rmap array entry for the guest logical page, and the rmap array
is in turn pointed to by the relevant memslot.  Links are 32-bit HPT
entry indexes rather than
full 64-bit pointers, to save space.  We use 3 of the remaining 32
bits in the rmap array entries as a lock bit, a referenced bit and
a present bit (the present bit is needed since HPTE index 0 is valid).
The bit lock for the rmap chain nests inside the HPTE lock bit.
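
A minimal user-space sketch of such an index-linked ring follows. It is not the patch's code: the names are hypothetical, locking and the referenced bit are left out, and the mask values are assumptions derived from the description (present flag in bit 32, index in the bottom 32 bits).

```c
#include <stdint.h>

#define TOY_RMAP_PRESENT 0x100000000ULL  /* assumed: bit 32 */
#define TOY_RMAP_INDEX   0xffffffffULL   /* assumed: bottom 32 bits */

/* One entry per HPTE; links are 32-bit indexes, not pointers */
struct toy_rev {
    uint32_t forw, back;
};

/*
 * Add HPTE number idx to the ring whose head is recorded in *rmap.
 * The new entry goes just before the head, i.e. at the tail.
 */
static void toy_ring_add(struct toy_rev *revmap, uint64_t *rmap, uint32_t idx)
{
    if (*rmap & TOY_RMAP_PRESENT) {
        uint32_t head = (uint32_t)(*rmap & TOY_RMAP_INDEX);
        uint32_t tail = revmap[head].back;

        revmap[idx].forw = head;
        revmap[idx].back = tail;
        revmap[tail].forw = idx;
        revmap[head].back = idx;
    } else {
        /* first entry: a ring of one that points at itself */
        revmap[idx].forw = revmap[idx].back = idx;
        *rmap = idx | TOY_RMAP_PRESENT;
    }
}
```

Adding HPTE index 0 to an empty ring leaves the index field at zero, which is why a separate present bit is needed to tell that state apart from an empty rmap entry.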

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_host.h  |   17 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   84 +-
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index a98e0f6..90e6658 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -107,6 +107,11 @@ static inline unsigned long hpte_page_size(unsigned long 
h, unsigned long l)
return 0;   /* error */
 }
 
+static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
+{
+   return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
+}
+
 static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long 
io_type)
 {
unsigned int wimg = ptel & HPTE_R_WIMG;
@@ -133,6 +138,19 @@ static inline unsigned long hpte_cache_bits(unsigned long 
pte_val)
 #endif
 }
 
+static inline void lock_rmap(unsigned long *rmap)
+{
+   do {
+   while (test_bit(KVMPPC_RMAP_LOCK_BIT, rmap))
+   cpu_relax();
+   } while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmap));
+}
+
+static inline void unlock_rmap(unsigned long *rmap)
+{
+   __clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmap);
+}
+
 static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
   unsigned long pagesize)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 9b1c247..e369d49 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -169,12 +169,27 @@ struct kvmppc_rma_info {
 /*
  * The reverse mapping array has one entry for each HPTE,
  * which stores the guest's view of the second word of the HPTE
- * (including the guest physical address of the mapping).
+ * (including the guest physical address of the mapping),
+ * plus forward and backward pointers in a doubly-linked ring
+ * of HPTEs that map the same host page.  The pointers in this
+ * ring are 32-bit HPTE indexes, to save space.
  */
 struct revmap_entry {
unsigned long guest_rpte;
+   unsigned int forw, back;
 };
 
+/*
+ * We use the top bit of each memslot->rmap entry as a lock bit,
+ * and bit 32 as a present flag.  The bottom 32 bits are the
+ * index in the guest HPT of a HPTE that points to the page.
+ */
+#define KVMPPC_RMAP_LOCK_BIT   63
+#define KVMPPC_RMAP_REF_BIT    33
+#define KVMPPC_RMAP_REFERENCED (1ul << KVMPPC_RMAP_REF_BIT)
+#define KVMPPC_RMAP_PRESENT    0x100000000ul
+#define KVMPPC_RMAP_INDEX  0xfffffffful
+
 /* Low-order bits in kvm->arch.slot_phys[][] */
 #define KVMPPC_PAGE_ORDER_MASK 0x1f
 #define KVMPPC_PAGE_NO_CACHE   HPTE_R_I    /* 0x20 */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 88d2add..b600f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -57,6 +57,70 @@ static void *real_vmalloc_addr(void *x)
return __va(addr);
 }
 
+/*
+ * Add this HPTE into the chain for the real page.
+ * Must be called with the chain locked; it unlocks the chain.
+ */
+static void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
+unsigned long *rmap, long pte_index, int realmode)
+{
+   struct revmap_entry *head, *tail;
+   unsigned long i;
+
+   if (*rmap & KVMPPC_RMAP_PRESENT) {
+   i = *rmap & KVMPPC_RMAP_INDEX;
+   head = &kvm->arch.revmap[i];
+   if (realmode)
+   head = real_vmalloc_addr(head);
+   tail = &kvm->arch.revmap[head->back];
+   if (realmode)
+   tail = real_vmalloc_addr(tail);
+   rev->forw = i;
+   rev->back = head->back;
+   tail->forw = pte_index;
+   head->back = pte_index;
+   } else {
+   rev->forw = rev->back = pte_index;
+   i = pte_index;
+   }
+   smp_wmb();
+   *rmap = i | KVMPPC_RMAP_REFERENCED | KVMPPC_RMAP_PRESENT; /* unlock */
+}
+
+/* Remove this HPTE from the chain for a real page */
+static void remove_revmap_chain(struct kvm *kvm, long pte_index,
+  

[PATCH 04/13] KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests

2011-12-05 Thread Paul Mackerras
This adds two new functions, kvmppc_pin_guest_page() and
kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
the guest has registered areas of memory for the hypervisor to update
(i.e. the per-cpu virtual processor areas, SLB shadow buffers and
dispatch trace logs) and then unpin them when they are no longer
required.

Although it is not strictly necessary to pin the pages at this point,
since all guest pages are already pinned, later commits in this series
will mean that guest pages aren't all pinned.
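
The address arithmetic that kvmppc_pin_guest_page() performs once it has found the backing page can be sketched on its own. The helper name below is hypothetical, and the sketch assumes the backing page size is a power of two, as ram_psize is in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a guest physical address and the size of
 * the backing page, return the byte offset of the address within that
 * page.  If nb_ret is non-NULL, also store how many bytes remain from
 * that offset to the end of the page, which is what the caller can
 * safely access through the returned pointer.
 */
static uint64_t toy_pin_offset(uint64_t gpa, uint64_t psize, uint64_t *nb_ret)
{
    uint64_t offset;

    assert(psize != 0 && (psize & (psize - 1)) == 0); /* power of two */

    offset = gpa & (psize - 1);
    if (nb_ret)
        *nb_ret = psize - offset;
    return offset;
}
```

With 16MB backing pages, an area registered 0x80 bytes into a page has 0xffff80 bytes available before the page ends.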

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h |3 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   38 ++
 arch/powerpc/kvm/book3s_hv.c  |   67 ++---
 3 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index deb8a4e..16db48c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -140,6 +140,9 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct 
kvmppc_bat *bat,
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu 
*vcpu);
 extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
+   unsigned long *nb_ret);
+extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index e4c6069..dcd39dc 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -184,6 +184,44 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
return -ENOENT;
 }
 
+void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long *nb_ret)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long gfn = gpa >> PAGE_SHIFT;
+   struct page *page;
+   unsigned long offset;
+   unsigned long pfn, pa;
+   unsigned long *physp;
+
+   memslot = gfn_to_memslot(kvm, gfn);
+   if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
+   return NULL;
+   physp = kvm->arch.slot_phys[memslot->id];
+   if (!physp)
+   return NULL;
+   physp += (gfn - memslot->base_gfn) >>
+   (kvm->arch.ram_porder - PAGE_SHIFT);
+   pa = *physp;
+   if (!pa)
+   return NULL;
+   pfn = pa >> PAGE_SHIFT;
+   page = pfn_to_page(pfn);
+   get_page(page);
+   offset = gpa & (kvm->arch.ram_psize - 1);
+   if (nb_ret)
+   *nb_ret = kvm->arch.ram_psize - offset;
+   return page_address(page) + offset;
+}
+
+void kvmppc_unpin_guest_page(struct kvm *kvm, void *va)
+{
+   struct page *page = virt_to_page(va);
+
+   page = compound_head(page);
+   put_page(page);
+}
+
 void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
 {
struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c2ee5a7..6e94af8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -137,12 +137,10 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
   unsigned long vcpuid, unsigned long vpa)
 {
struct kvm *kvm = vcpu->kvm;
-   unsigned long gfn, pg_index, ra, len;
-   unsigned long pg_offset;
+   unsigned long len, nb;
void *va;
struct kvm_vcpu *tvcpu;
-   struct kvm_memory_slot *memslot;
-   unsigned long *physp;
+   int err = H_PARAMETER;
 
tvcpu = kvmppc_find_vcpu(kvm, vcpuid);
if (!tvcpu)
@@ -155,51 +153,41 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
if (flags < 4) {
if (vpa & 0x7f)
return H_PARAMETER;
+   if (flags >= 2 && !tvcpu->arch.vpa)
+   return H_RESOURCE;
/* registering new area; convert logical addr to real */
-   gfn = vpa >> PAGE_SHIFT;
-   memslot = gfn_to_memslot(kvm, gfn);
-   if (!memslot || !(memslot->flags & KVM_MEMSLOT_INVALID))
-   return H_PARAMETER;
-   physp = kvm->arch.slot_phys[memslot->id];
-   if (!physp)
-   return H_PARAMETER;
-   pg_index = (gfn - memslot->base_gfn) >>
-   (kvm->arch.ram_porder - PAGE_SHIFT);
-   pg_offset = vpa & (kvm->arch.ram_psize - 1);
-   ra = physp[pg_index];
-   if (!ra)
+   va = kvmppc_pin_guest_page(kvm, vpa, &nb);
+   if (va == NULL)
return H_PARAMETER;
-

[PATCH 03/13] KVM: PPC: Keep page physical addresses in per-slot arrays

2011-12-05 Thread Paul Mackerras
This allocates an array for each memory slot that is added to store
the physical addresses of the pages in the slot.  This array is
vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
This allows us to remove the ram_pginfo field from the kvm_arch
struct, and removes the 64GB guest RAM limit that we had.

We use the low-order bits of the array entries to store a flag
indicating that we have done get_page on the corresponding page,
and therefore need to call put_page when we are finished with the
page.  Currently this is set for all pages except those in our
special RMO regions.
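
Because the stored physical addresses are page-aligned, their low-order bits are free to carry flags. The sketch below shows the packing idea in isolation. The helper names are hypothetical and a 4kB PAGE_SHIFT is assumed purely for illustration; KVMPPC_GOT_PAGE is the value from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12                       /* assumed: 4kB pages */
#define TOY_PAGE_MASK   (~((1ULL << TOY_PAGE_SHIFT) - 1))
#define KVMPPC_GOT_PAGE 0x80ULL                  /* flag value from this patch */

/* Combine a page-aligned physical address with the "got page" flag */
static uint64_t toy_slot_phys_pack(uint64_t pa, bool got_page)
{
    return (pa & TOY_PAGE_MASK) | (got_page ? KVMPPC_GOT_PAGE : 0);
}

/* Recover the physical address, discarding the flag bits */
static uint64_t toy_slot_phys_addr(uint64_t entry)
{
    return entry & TOY_PAGE_MASK;
}

/* True if get_page was done, so put_page is owed at teardown */
static bool toy_slot_phys_needs_put(uint64_t entry)
{
    return (entry & KVMPPC_GOT_PAGE) != 0;
}
```

An entry of zero still means "no page recorded yet", since any real entry carries a non-zero address.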

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |8 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   18 +++---
 arch/powerpc/kvm/book3s_hv.c|  114 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   44 -
 4 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 629df2e..cf6b4d7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -175,25 +175,27 @@ struct revmap_entry {
unsigned long guest_rpte;
 };
 
+/* Low-order bits in kvm->arch.slot_phys[][] */
+#define KVMPPC_GOT_PAGE        0x80
+
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
struct revmap_entry *revmap;
-   unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
-   struct kvmppc_pginfo *ram_pginfo;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
unsigned long sdr1;
unsigned long host_sdr1;
int tlbie_lock;
-   int n_rma_pages;
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
struct list_head spapr_tce_tables;
+   unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
+   int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 80ece8d..e4c6069 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -98,16 +98,16 @@ void kvmppc_free_hpt(struct kvm *kvm)
 void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
 {
unsigned long i;
-   unsigned long npages = kvm->arch.ram_npages;
-   unsigned long pfn;
+   unsigned long npages;
+   unsigned long pa;
unsigned long *hpte;
unsigned long hash;
unsigned long porder = kvm->arch.ram_porder;
struct revmap_entry *rev;
-   struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
+   unsigned long *physp;
 
-   if (!pginfo)
-   return;
+   physp = kvm->arch.slot_phys[mem->slot];
+   npages = kvm->arch.slot_npages[mem->slot];
 
/* VRMA can't be > 1TB */
if (npages > 1ul << (40 - porder))
@@ -117,9 +117,10 @@ void kvmppc_map_vrma(struct kvm *kvm, struct 
kvm_userspace_memory_region *mem)
npages = HPT_NPTEG;
 
for (i = 0; i < npages; ++i) {
-   pfn = pginfo[i].pfn;
-   if (!pfn)
+   pa = physp[i];
+   if (!pa)
break;
+   pa &= PAGE_MASK;
/* can't use hpt_hash since va > 64 bits */
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
/*
@@ -131,8 +132,7 @@ void kvmppc_map_vrma(struct kvm *kvm, struct 
kvm_userspace_memory_region *mem)
hash = (hash << 3) + 7;
hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 4));
/* HPTE low word - RPN, protection, etc. */
-   hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C |
-   HPTE_R_M | PP_RWXX;
+   hpte[1] = pa | HPTE_R_R | HPTE_R_C | HPTE_R_M | PP_RWXX;
smp_wmb();
hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
(i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED |
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5efdd5b..c2ee5a7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -48,14 +48,6 @@
 #include 
 #include 
 
-/*
- * For now, limit memory to 64GB and require it to be large pages.
- * This value is chosen because it makes the ram_pginfo array be
- * 64kB in size, which is about as large as we want to be trying
- * to allocate with kmalloc.
- */
-#define MAX_MEM_ORDER  36
-
 #define LARGE_PAGE_ORDER   24  /* 16MB pages */
 
 /* #define EXIT_DEBUG */
@@ -145,10 +137,12 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
  

[PATCH 02/13] KVM: PPC: Keep a record of HV guest view of hashed page table entries

2011-12-05 Thread Paul Mackerras
This adds an array that parallels the guest hashed page table (HPT),
that is, it has one entry per HPTE, used to store the guest's view
of the second doubleword of the corresponding HPTE.  The first
doubleword in the HPTE is the same as the guest's idea of it, so we
don't need to store a copy, but the second doubleword in the HPTE has
the real page number rather than the guest's logical page number.
This allows us to remove the back_translate() and reverse_xlate()
functions.

This "reverse mapping" array is vmalloc'd, meaning that to access it
in real mode we have to walk the kernel's page tables explicitly.
That is done by the new real_vmalloc_addr() function.  (In fact this
returns an address in the linear mapping, so the result is usable
both in real mode and in virtual mode.)

There are also some minor cleanups here: moving the definitions of
HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3.
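
To give a sense of the sizes involved, the sketch below works through the arithmetic for the fixed 16MB HPT. It assumes an 8-byte revmap entry (one 64-bit guest_rpte field, as on a 64-bit host); the function name is hypothetical.

```c
#include <stdint.h>

#define HPT_ORDER  24                            /* 16MB hashed page table */
#define HPT_NPTEG  (1ULL << (HPT_ORDER - 7))     /* 128 bytes per PTEG */
#define HPT_NPTE   (HPT_NPTEG << 3)              /* 8 PTEs per PTEG */

/* Stand-in for the patch's struct, with a fixed-width field */
struct toy_revmap_entry {
    uint64_t guest_rpte;
};

/* Bytes needed for a reverse map with one entry per HPTE */
static uint64_t toy_revmap_bytes(void)
{
    return sizeof(struct toy_revmap_entry) * HPT_NPTE;
}
```

That gives 131072 PTEGs, 1048576 HPTEs and an 8MB reverse map under these assumptions, which is consistent with the array being vmalloc'd rather than allocated with kmalloc.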

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |8 +++
 arch/powerpc/include/asm/kvm_host.h  |   10 
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |   44 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   87 ++
 4 files changed, 103 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index d0ac94f..23bb17e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -29,6 +29,14 @@ static inline struct kvmppc_book3s_shadow_vcpu 
*to_svcpu(struct kvm_vcpu *vcpu)
 
 #define SPAPR_TCE_SHIFT        12
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/* For now use fixed-size 16MB page table */
+#define HPT_ORDER  24
+#define HPT_NPTEG  (1ul << (HPT_ORDER - 7))    /* 128B per pteg */
+#define HPT_NPTE   (HPT_NPTEG << 3)            /* 8 PTEs per PTEG */
+#define HPT_HASH_MASK  (HPT_NPTEG - 1)
+#endif
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 unsigned long pte_index)
 {
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 66c75cd..629df2e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -166,9 +166,19 @@ struct kvmppc_rma_info {
atomic_t use_count;
 };
 
+/*
+ * The reverse mapping array has one entry for each HPTE,
+ * which stores the guest's view of the second word of the HPTE
+ * (including the guest physical address of the mapping).
+ */
+struct revmap_entry {
+   unsigned long guest_rpte;
+};
+
 struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
+   struct revmap_entry *revmap;
unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index bc3a2ea..80ece8d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -33,11 +34,6 @@
 #include 
 #include 
 
-/* For now use fixed-size 16MB page table */
-#define HPT_ORDER  24
-#define HPT_NPTEG  (1ul << (HPT_ORDER - 7))    /* 128B per pteg */
-#define HPT_HASH_MASK  (HPT_NPTEG - 1)
-
 /* Pages in the VRMA are 16MB pages */
 #define VRMA_PAGE_ORDER        24
 #define VRMA_VSID  0x1ffUL /* 1TB VSID reserved for VRMA */
@@ -51,7 +47,9 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
 {
unsigned long hpt;
unsigned long lpid;
+   struct revmap_entry *rev;
 
+   /* Allocate guest's hashed page table */
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
   HPT_ORDER - PAGE_SHIFT);
if (!hpt) {
@@ -60,12 +58,20 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
}
kvm->arch.hpt_virt = hpt;
 
+   /* Allocate reverse map array */
+   rev = vmalloc(sizeof(struct revmap_entry) * HPT_NPTE);
+   if (!rev) {
+   pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
+   goto out_freehpt;
+   }
+   kvm->arch.revmap = rev;
+
+   /* Allocate the guest's logical partition ID */
do {
lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
if (lpid >= NR_LPIDS) {
pr_err("kvm_alloc_hpt: No LPIDs free\n");
-   free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
-   return -ENOMEM;
+   goto out_freeboth;
}
} while (test_and_set_bit(lpid, lpid_inuse));
 
@@ -74,11 +80,18 @@ long kvmppc_alloc_hpt(struct kvm *kvm)
 
pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
return 0;
+
+ out_freeboth:
+   vfree(rev);
+ out_freehpt:
+   free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
+

[PATCH 06/13] KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()

2011-12-05 Thread Paul Mackerras
This removes the code from kvmppc_core_prepare_memory_region() that
looked up the VMA for the region being added and called hva_to_page
to get the pfns for the memory.  We have no guarantee that there will
be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
ioctl call; userspace can do that ioctl and then map memory into the
region later.

Instead we defer looking up the pfn for each memory page until it is
needed, which generally means when the guest does an H_ENTER hcall on
the page.  Since we can't call get_user_pages in real mode, if we don't
already have the pfn for the page, kvmppc_h_enter() will return
H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
insertion.
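
The retry flow described above can be modelled with a toy example. Everything here is a stand-in: the toy_ functions and the struct are hypothetical, the hcall return values are assumptions, and the real-mode restriction is simulated simply by never filling in a missing entry on the fast path.

```c
#include <stddef.h>

#define H_SUCCESS    0L
#define H_PARAMETER  (-4L)   /* assumed value */
#define H_TOO_HARD   9999L   /* assumed value */

#define TOY_NPAGES 4

struct toy_slot {
    unsigned long phys[TOY_NPAGES]; /* 0 means "pfn not looked up yet" */
    int lookups;                    /* how often the slow lookup ran */
};

/* Fast path: may only use what is already recorded */
static long toy_rm_h_enter(struct toy_slot *s, size_t gfn)
{
    if (gfn >= TOY_NPAGES)
        return H_PARAMETER;
    if (!s->phys[gfn])
        return H_TOO_HARD;      /* punt to kernel context */
    return H_SUCCESS;           /* HPTE insertion would happen here */
}

/* Stand-in for kvmppc_get_guest_page(): records a fake address */
static void toy_get_guest_page(struct toy_slot *s, size_t gfn)
{
    s->phys[gfn] = 0x1000UL * (gfn + 1);
    s->lookups++;
}

/* Slow path: look the page up, then redo the insertion */
static long toy_virtmode_h_enter(struct toy_slot *s, size_t gfn)
{
    toy_get_guest_page(s, gfn);
    return toy_rm_h_enter(s, gfn);
}

/* Dispatcher: try the fast path first, fall back only when told to */
static long toy_h_enter(struct toy_slot *s, size_t gfn)
{
    long ret = toy_rm_h_enter(s, gfn);

    if (ret == H_TOO_HARD)
        ret = toy_virtmode_h_enter(s, gfn);
    return ret;
}
```

The first H_ENTER on a page takes the slow path once; later calls for the same page complete on the fast path because the entry is already recorded.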

When the first vcpu starts executing, we need to have the RMO or VRMA
region mapped so that the guest's real mode accesses will work.  Thus
we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
starting at guest physical 0 now has RMO memory mapped there; if so it
sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
The function that does that, kvmppc_map_vrma, is now a bit simpler,
as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.

Since we are now potentially updating entries in the slot_phys[]
arrays from multiple vcpu threads, we now have a spinlock protecting
those updates to ensure that we don't lose track of any references
to pages.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|4 +
 arch/powerpc/include/asm/kvm_book3s_64.h |   12 ++
 arch/powerpc/include/asm/kvm_host.h  |2 +
 arch/powerpc/include/asm/kvm_ppc.h   |4 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  130 +---
 arch/powerpc/kvm/book3s_hv.c |  244 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   56 
 7 files changed, 291 insertions(+), 161 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 16db48c..5e7e04b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -143,6 +143,10 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, 
gfn_t gfn);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
+extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index fe45a81..ab6772e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -95,4 +95,16 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
return rb;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   /* only handle 4k, 64k and 16M pages for now */
+   if (!(h & HPTE_V_LARGE))
+   return 1ul << 12;   /* 4k page */
+   if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206))
+   return 1ul << 16;   /* 64k page */
+   if ((l & 0xff000) == 0)
+   return 1ul << 24;   /* 16M page */
+   return 0;   /* error */
+}
+
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index cf6b4d7..2a52bdb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -193,7 +193,9 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
+   int rma_setup_done;
struct list_head spapr_tce_tables;
+   spinlock_t slot_phys_lock;
unsigned long *slot_phys[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
int slot_npages[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS];
unsigned short last_vcpu[NR_CPUS];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index fc2d696..111e1b4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -121,8 +121,8 @@ extern long kvmppc_alloc_hpt(struct kvm *kvm);
 extern void kvmppc_free_hpt(struct kvm *kvm);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
-

[PATCH 0/13] KVM: PPC: Update Book3S HV memory handling

2011-12-05 Thread Paul Mackerras
This series of patches updates the Book3S-HV KVM code that manages the
guest hashed page table (HPT) to enable several things:

* MMIO emulation and MMIO pass-through

* Use of small pages (4kB or 64kB, depending on config) to back the
  guest memory

* Pageable guest memory - i.e. backing pages can be removed from the
  guest and reinstated on demand, using the MMU notifier mechanism.

* Guests can be given read-only access to pages even though they think
  they have mapped them read/write.  When they try to write to them
  their access is upgraded to read/write.  This allows KSM to share
  pages between guests.

On PPC970 we have no way to get DSIs and ISIs to come to the
hypervisor, so we can't do MMIO emulation or pageable guest memory.
On POWER7 we set the VPM1 bit in the LPCR to make all DSIs and ISIs
come to the hypervisor (host) as HDSIs or HISIs.

This code is working well in my tests.  The sporadic crashes that I
was seeing earlier are fixed by the first patch in the series.
Somewhat to my surprise, when I implemented the last patch in the
series I started to see KSM coalescing pages without any further
effort on my part -- my tests were on a machine with Fedora 16
installed, and it has ksmtuned running by default.

This series is on top of Alex Graf's kvm-ppc-next branch, although the
last patch on that branch ("KVM: PPC: booke: Improve timer register
emulation") is causing the decrementer not to work properly in Book3S
HV guests, for reasons that I haven't fully determined yet.

These patches only touch arch/powerpc except for patch 11, which adds
a couple of barriers to allow mmu_notifier_retry() to be used outside
of the kvm->mmu_lock.

Unlike the previous version of these patches, we don't look at what's
mapped in the user address space at the time that
kvmppc_core_prepare_memory_region or kvmppc_core_commit_memory_region
gets called; we look up pages only when they are needed, either
because the guest wants to map them with an H_ENTER hypercall, or for
the pages needed for the virtual real-mode area (VRMA), at the time of
the first VCPU_RUN ioctl.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/13] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly

2011-12-05 Thread Paul Mackerras
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
the correct answer when called without kvm->mmu_lock being held.
PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
a single global spinlock in order to improve the scalability of updates
to the guest MMU hashed page table, and so needs this.
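
The ordering argument above can be sketched in portable C11.  This is only
an illustration, not the kernel code: struct notifier_state and the three
function names are hypothetical stand-ins, and a release fence and an
acquire fence play the roles of smp_wmb() and smp_rmb().  A lockless
caller then either sees the non-zero count or sees the new seq.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two struct kvm fields discussed above. */
struct notifier_state {
	atomic_ulong seq;	/* bumped each time an invalidation completes */
	atomic_long count;	/* non-zero while an invalidation is in flight */
};

/* Writer side: an invalidation begins. */
static void invalidate_range_start(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->count, 1, memory_order_relaxed);
}

/* Writer side: publish the new seq before the count is seen to drop. */
static void invalidate_range_end(struct notifier_state *s)
{
	long old;

	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	old = atomic_fetch_sub_explicit(&s->count, 1, memory_order_relaxed);
	assert(old > 0);
	(void)old;
}

/* Reader side, called without any global lock: 1 means "retry". */
static int notifier_retry(struct notifier_state *s, unsigned long snap)
{
	if (atomic_load_explicit(&s->count, memory_order_relaxed))
		return 1;
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	if (atomic_load_explicit(&s->seq, memory_order_relaxed) != snap)
		return 1;
	return 0;
}
```

A caller would snapshot seq, do its page lookup, take its per-page lock and
then call notifier_retry() with the snapshot before installing the mapping.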

Signed-off-by: Paul Mackerras 
---
 include/linux/kvm_host.h |   14 +-
 virt/kvm/kvm_main.c  |6 +++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6a2ec9..17319ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -672,12 +672,16 @@ static inline int mmu_notifier_retry(struct kvm_vcpu 
*vcpu, unsigned long mmu_se
if (unlikely(vcpu->kvm->mmu_notifier_count))
return 1;
/*
-* Both reads happen under the mmu_lock and both values are
-* modified under mmu_lock, so there's no need of smb_rmb()
-* here in between, otherwise mmu_notifier_count should be
-* read before mmu_notifier_seq, see
-* mmu_notifier_invalidate_range_end write side.
+* Ensure the read of mmu_notifier_count happens before the read
+* of mmu_notifier_seq.  This interacts with the smp_wmb() in
+* mmu_notifier_invalidate_range_end to make sure that the caller
+* either sees the old (non-zero) value of mmu_notifier_count or
+* the new (incremented) value of mmu_notifier_seq.
+* PowerPC Book3s HV KVM calls this under a per-page lock
+* rather than under kvm->mmu_lock, for scalability, so
+* can't rely on kvm->mmu_lock to keep things ordered.
 */
+   smp_rmb();
if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
return 1;
return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..ad2a912 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -357,11 +357,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct 
mmu_notifier *mn,
 * been freed.
 */
kvm->mmu_notifier_seq++;
+   smp_wmb();
/*
 * The above sequence increase must be visible before the
-* below count decrease but both values are read by the kvm
-* page fault under mmu_lock spinlock so we don't need to add
-* a smb_wmb() here in between the two.
+* below count decrease, which is ensured by the smp_wmb above
+* in conjunction with the smp_rmb in mmu_notifier_retry().
 */
kvm->mmu_notifier_count--;
spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4



Re: [PATCH 00/28] kvm tools: Prepare kvmtool for another architecture

2011-12-05 Thread Matt Evans
On 06/12/11 14:35, Matt Evans wrote:

> This patch series rearranges and tidies various parts of kvmtool to pave the
> way for the addition of support for another architecture -- SPAPR PPC64.  A
> second patch series will follow to present the PPC64 support.

I forgot to mention, of course, that these two sets apply on top of 
git://github.com/penberg/linux-kvm.git master as of d5e6b9fa.

Also, I have been testing PPC64 kvmtool using the book3s_hv KVM mode.


Matt


Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-05 Thread Rusty Russell
On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity  wrote:
> On 12/05/2011 02:10 AM, Rusty Russell wrote:
> > On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity  wrote:
> > > On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > > > There's also the used ring, but that's a
> > > > > mistake if you have out of order completion.  We should have used 
> > > > > copying.
> > > >
> > > > Seems unrelated... unless you want used to be written into
> > > > descriptor ring itself?
> > > 
> > > The avail/used rings are in addition to the regular ring, no?  If you
> > > copy descriptors, then it goes away.
> >
> > There were two ideas which drove the current design:
> >
> > 1) The Van-Jacobson style "no two writers to same cacheline makes rings
> >fast" idea.  Empirically, this doesn't show any winnage.
> 
> Write/write is the same as write/read or read/write.  Both cases have to
> send a probe and wait for the result.  What we really need is to
> minimize cache line ping ponging, and the descriptor pool fails that
> with ooo completion.  I doubt it's measurable though except with the
> very fastest storage providers.

The claim was that going exclusive->shared->exclusive was cheaper than
exclusive->invalid->exclusive.  When VJ said it, it seemed convincing :)

> > 2) Allowing a generic inter-guest copy mechanism, so we could have
> >genuinely untrusted driver domains.  Yet no one ever did this so it's
> >hardly a killer feature :(
> 
> It's still a goal, though not an important one.  But we have to
> translate rings anyway, don't we, since buffers are in guest physical
> addresses, and we're moving into an address space that doesn't map those.

Yes, but the hypervisor/trusted party would simply have to do the copy;
the rings themselves would be shared.  A would say "copy this to/from B's
ring entry N" and you know that A can't have changed B's entry.

> I thought of having a vhost-copy driver that could do ring translation,
> using a dma engine for the copy.

As long as we get the length of data written from the vhost-copy driver
(ie. not just the network header).  Otherwise a malicious other guest
can send short packets, and a local process can read uninitialized
memory.  And pre-zeroing the buffers for this corner case sucks.

> > So if we're going to revisit and drop those requirements, I'd say:
> >
> > 1) Shared device/driver rings like Xen.  Xen uses device-specific ring
> >contents, I'd be tempted to stick to our pre-headers, and a 'u64
> >addr; u64 len_and_flags; u64 cookie;' generic style.  Then use
> >the same ring for responses.  That's a slight space-win, since
> >we're 24 bytes vs 26 bytes now.
> 
> Let's cheat and have inline contents.  Take three bits from
> len_and_flags to specify additional descriptors as inline data.

Nice, I like this optimization.

> Also, stuff the cookie into len_and_flags as well.

Every driver really wants to put a pointer in there.  We have an array
to map desc. numbers to cookies inside the virtio core.

We really want 64 bits.
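
As a rough sketch of the layout being discussed: the three u64 fields are
from the proposal above and give the 24 bytes (the 26 being compared
against is presumably the current 16-byte descriptor plus a 2-byte avail
entry and an 8-byte used entry).  Everything else here is an assumption
made for illustration: the struct name, the helper names, and the choice to
take the three inline-descriptor bits from the bottom of len_and_flags.
The cookie stays a full 64 bits so a driver can store a pointer in it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic descriptor: 'u64 addr; u64 len_and_flags; u64 cookie;' */
struct ring_desc {
	uint64_t addr;
	uint64_t len_and_flags;
	uint64_t cookie;	/* full 64 bits, wide enough for a pointer */
};

_Static_assert(sizeof(struct ring_desc) == 24, "descriptor must be 24 bytes");

/* Assumption: the low three bits count the following inline descriptors. */
#define DESC_INLINE_BITS 3u
#define DESC_INLINE_MASK ((UINT64_C(1) << DESC_INLINE_BITS) - 1)

static uint64_t desc_pack(uint64_t len, unsigned int inline_descs)
{
	assert(inline_descs <= DESC_INLINE_MASK);
	assert(len <= (UINT64_MAX >> DESC_INLINE_BITS));
	return (len << DESC_INLINE_BITS) | inline_descs;
}

static uint64_t desc_len(uint64_t len_and_flags)
{
	return len_and_flags >> DESC_INLINE_BITS;
}

static unsigned int desc_inline_count(uint64_t len_and_flags)
{
	return (unsigned int)(len_and_flags & DESC_INLINE_MASK);
}
```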

> > 2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
> >Makes the indexing harder, but that -1 lets us stash the indices in
> >the first entry and makes the ring a nice 2^n size.
> 
> Allocate at least a cache line for those.  The 2^n size is not really
> material, a division is never necessary.

We free-run our indices, so we *do* a division (by truncation).  If we
limit indices to ringsize, then we have to handle empty/full confusion.
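
To make the free-running point concrete, here is a minimal sketch.  It is
not the vring code; the ring type, RING_SIZE and the function names are
made up for the example.  The indices only ever increment and wrap at 2^16,
the slot is found by truncation, and full and empty stay distinguishable
because the difference of the two indices can reach RING_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical; must be a power of two */

_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^n");

struct ring {
	uint16_t prod;	/* free-running producer index, wraps at 2^16 */
	uint16_t cons;	/* free-running consumer index, wraps at 2^16 */
	int slots[RING_SIZE];
};

/* Entries in flight; correct across the 2^16 wrap thanks to the cast. */
static unsigned int ring_used(const struct ring *r)
{
	return (uint16_t)(r->prod - r->cons);
}

static int ring_push(struct ring *r, int v)
{
	if (ring_used(r) == RING_SIZE)
		return 0;			/* full */
	r->slots[r->prod % RING_SIZE] = v;	/* the "division by truncation" */
	r->prod++;
	return 1;
}

static int ring_pop(struct ring *r, int *v)
{
	assert(v != NULL);
	if (ring_used(r) == 0)
		return 0;			/* empty */
	*v = r->slots[r->cons % RING_SIZE];
	r->cons++;
	return 1;
}
```

If the indices were instead kept in the range 0..RING_SIZE-1, prod == cons
would mean both empty and full, which is the confusion mentioned above.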

It's nice for simple OSes if things pack nicely into pages, but it's not
a killer feature IMHO.

> > > > > 16kB worth of descriptors is 1024 entries.  With 4kB buffers, that's
> > > > > 4MB worth of data, or 4 ms at 10GbE line speed.  With 1500 byte
> > > > > buffers it's just 1.5 ms.  In any case I think it's sufficient.
> > > >
> > > > Right. So I think that without indirect, we waste about 3 entries
> > > > per packet for virtio header and transport etc headers.
> > > 
> > > That does suck.  Are there issues in increasing the ring size?  Or
> > > making it discontiguous?
> >
> > Because the qemu implementation is broken.  
> 
> I was talking about something else, but this is more important.  Every
> time we make a simplifying assumption, it turns around and bites us, and
> the code becomes twice as complicated as it would have been in the first
> place, and the test matrix explodes.

True, though we seem to be improving.  But this is why I don't want
optional features in the spec; I want us always to exercise all of it.

> > We can often put the virtio
> > header at the head of the packet.  In practice, the qemu implementation
> > insists the header be a single descriptor.
> >
> > (At least, it used to, perhaps it has now been fixed.  We need a
> > VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).
> 
> We'll run out of bits in no time.

We had one already: VIRTIO_F_BAD_FEATURE.  We haven't used it in a long
time (if ever), and I ju

[PATCH 8/8] kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully if ioeventfds unavailable

2011-12-05 Thread Matt Evans
PPC KVM doesn't yet support ioeventfds, so don't bomb out/die.  virtio-pci is
able to function if it instead uses normal IO port notification.

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/ioeventfd.h |3 ++-
 tools/kvm/ioeventfd.c |   12 +---
 tools/kvm/virtio/pci.c|   11 ++-
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/include/kvm/ioeventfd.h 
b/tools/kvm/include/kvm/ioeventfd.h
index df01750..5e458be 100644
--- a/tools/kvm/include/kvm/ioeventfd.h
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct kvm;
 
@@ -21,7 +22,7 @@ struct ioevent {
 
 void ioeventfd__init(void);
 void ioeventfd__start(void);
-void ioeventfd__add_event(struct ioevent *ioevent);
+bool ioeventfd__add_event(struct ioevent *ioevent);
 void ioeventfd__del_event(u64 addr, u64 datamatch);
 
 #endif
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index 3a240e4..37f9a63 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -26,7 +26,7 @@ void ioeventfd__init(void)
die("Failed creating epoll fd");
 }
 
-void ioeventfd__add_event(struct ioevent *ioevent)
+bool ioeventfd__add_event(struct ioevent *ioevent)
 {
struct kvm_ioeventfd kvm_ioevent;
struct epoll_event epoll_event;
@@ -48,8 +48,13 @@ void ioeventfd__add_event(struct ioevent *ioevent)
.flags  = KVM_IOEVENTFD_FLAG_PIO | 
KVM_IOEVENTFD_FLAG_DATAMATCH,
};
 
-   if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0)
-   die("Failed creating new ioeventfd");
+   if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0) {
+   /* Not all KVM implementations may support KVM_IOEVENTFD,
+* so be graceful.
+*/
+   free(new_ioevent);
+   return false;
+   }
 
epoll_event = (struct epoll_event) {
.events = EPOLLIN,
@@ -60,6 +65,7 @@ void ioeventfd__add_event(struct ioevent *ioevent)
die("Failed assigning new event to the epoll fd");
 
list_add_tail(&new_ioevent->list, &used_ioevents);
+   return true;
 }
 
 void ioeventfd__del_event(u64 addr, u64 datamatch)
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index ffa3768..06d3b79 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -50,7 +50,16 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, 
struct virtio_trans *vtra
.fd = eventfd(0, 0),
};
 
-   ioeventfd__add_event(&ioevent);
+   if (!ioeventfd__add_event(&ioevent)) {
+#ifndef CONFIG_PPC
+   /* PPC64 doesn't have kvm ioevents yet, so we expect this to
+* fail -- don't need to be verbose about it!  For virtio-pci,
+* this is fine.  It catches the IO accesses anyway, so
+* still works (but slower).
+*/
+   pr_warning("Failed creating new ioeventfd");
+#endif
+   }
 
if (vtrans->virtio_ops->notify_vq_eventfd)
vtrans->virtio_ops->notify_vq_eventfd(kvm, vpci->dev, vq, 
ioevent.fd);


[PATCH 7/8] kvm tools: Add PPC64 kvm_cpu__emulate_io()

2011-12-05 Thread Matt Evans
This is the final piece of the puzzle for PPC SPAPR PCI; this
function splits MMIO accesses into the two PHB windows & directs
things to MMIO/IO emulation as appropriate.
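
The window split can be sketched on its own as below.  This is an
illustration only: the window addresses and sizes are hypothetical
placeholders rather than the SPAPR_PCI_* values from this series, and
classify() and struct win_hit are names invented for the example.  The
result carries the offset into the window, which is what the I/O or MMIO
emulation would then be handed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window layout, chosen only for this example. */
#define IO_WIN_ADDR	UINT64_C(0x80000000)
#define IO_WIN_SIZE	UINT64_C(0x02000000)
#define MEM_WIN_ADDR	UINT64_C(0xA0000000)
#define MEM_WIN_SIZE	UINT64_C(0x20000000)

enum win_kind { WIN_NONE, WIN_IO, WIN_MEM };

struct win_hit {
	enum win_kind kind;
	uint64_t offset;	/* address relative to the start of the window */
};

/* Overflow-safe range check: never computes base + size. */
static int in_window(uint64_t addr, uint64_t base, uint64_t size)
{
	return addr >= base && addr - base < size;
}

static struct win_hit classify(uint64_t phys_addr)
{
	struct win_hit hit = { WIN_NONE, 0 };

	if (in_window(phys_addr, IO_WIN_ADDR, IO_WIN_SIZE)) {
		hit.kind = WIN_IO;
		hit.offset = phys_addr - IO_WIN_ADDR;
	} else if (in_window(phys_addr, MEM_WIN_ADDR, MEM_WIN_SIZE)) {
		hit.kind = WIN_MEM;
		hit.offset = phys_addr - MEM_WIN_ADDR;
	}
	assert(hit.kind != WIN_NONE || hit.offset == 0);
	return hit;
}
```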

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile  |1 +
 tools/kvm/powerpc/kvm-cpu.c |   34 ++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6c8..9b875dd 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
OBJS+= powerpc/spapr_hvcons.o
+   OBJS+= powerpc/spapr_pci.o
OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 63cd106..0cf4dc8 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int debug_fd;
 
@@ -177,6 +178,39 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return ret;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+   bool ret = false;
+   u64 phys_addr;
+
+   /* We'll never get KVM_EXIT_IO, it's x86-specific.  All IO is MM! :P
+* So, look at our windows here & split addresses into I/O or MMIO.
+*/
+   assert(kvm_run->exit_reason == KVM_EXIT_MMIO);
+
+   phys_addr = cpu->kvm_run->mmio.phys_addr;
+   if ((phys_addr >= SPAPR_PCI_IO_WIN_ADDR) &&
+   (phys_addr < SPAPR_PCI_IO_WIN_ADDR + SPAPR_PCI_IO_WIN_SIZE)) {
+   ret = kvm__emulate_io(cpu->kvm, phys_addr - 
SPAPR_PCI_IO_WIN_ADDR,
+ cpu->kvm_run->mmio.data,
+ cpu->kvm_run->mmio.is_write ?
+ KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN,
+ cpu->kvm_run->mmio.len, 1);
+   } else if ((phys_addr >= SPAPR_PCI_MEM_WIN_ADDR) &&
+  (phys_addr < SPAPR_PCI_MEM_WIN_ADDR + 
SPAPR_PCI_MEM_WIN_SIZE)) {
+   ret = kvm__emulate_mmio(cpu->kvm,
+   cpu->kvm_run->mmio.phys_addr - 
SPAPR_PCI_MEM_WIN_ADDR,
+   cpu->kvm_run->mmio.data,
+   cpu->kvm_run->mmio.len,
+   cpu->kvm_run->mmio.is_write);
+   } else {
+   pr_warning("MMIO %s unknown address %lx (size %d)!\n",
+  cpu->kvm_run->mmio.is_write ? "write to" : "read 
from",
+  phys_addr, cpu->kvm_run->mmio.len);
+   }
+   return ret;
+}
+
 #define CONDSTR_BIT(m, b) (((m) & MSR_##b) ? #b" " : "")
 
 void kvm_cpu__show_registers(struct kvm_cpu *vcpu)


[PATCH 6/8] kvm tools: Add PPC64 PCI Host Bridge

2011-12-05 Thread Matt Evans
This provides the PCI bridge, definitions for the address layout of the windows
and wires in IRQs.  Once PCI devices are all registered, they are enumerated and
DT nodes generated for each.

Signed-off-by: Matt Evans 
---
 tools/kvm/powerpc/include/kvm/kvm-arch.h |3 +
 tools/kvm/powerpc/irq.c  |   17 +-
 tools/kvm/powerpc/kvm.c  |   11 +
 tools/kvm/powerpc/spapr.h|8 +
 tools/kvm/powerpc/spapr_pci.c|  429 ++
 tools/kvm/powerpc/spapr_pci.h|   38 +++
 6 files changed, 504 insertions(+), 2 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h

diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index ae811e9..ba374f5 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -40,6 +40,8 @@
  */
 #define KVM_PCI_MMIO_AREA  0x100
 
+struct spapr_phb;
+
 struct kvm {
int sys_fd; /* For system ioctls(), i.e. 
/dev/kvm */
int vm_fd;  /* For VM ioctls() */
@@ -66,6 +68,7 @@ struct kvm {
unsigned long   initrd_size;
const char  *name;
struct icp_state*icp;
+   struct spapr_phb*phb;
 };
 
 #endif /* KVM__KVM_ARCH_H */
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 80c972a..134db8f 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,14 +21,27 @@
 #include 
 #include 
 
+#include "kvm/pci.h"
+
 #include "xics.h"
+#include "spapr_pci.h"
 
 #define XICS_IRQS   1024
 
+static int pci_devs = 0;
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
-   fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n",
-   dev, *num, *pin, *line);
+   if (pci_devs >= PCI_MAX_DEVICES)
+   die("Hit PCI device limit!\n");
+
+   *num = pci_devs++;
+
+   *pin = 1;
+   /* Have I said how nasty I find this?  Line should be dontcare... PHB
+* should determine which CPU/XICS IRQ to fire.
+*/
+   *line = xics_alloc_irqnum();
return 0;
 }
 
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index bfd7c3a..353c667 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -16,6 +16,7 @@
 
 #include "spapr.h"
 #include "spapr_hvcons.h"
+#include "spapr_pci.h"
 
 #include 
 
@@ -166,6 +167,11 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
register_core_rtas();
/* Now that hypercalls are initialised, register a couple for the 
console: */
spapr_hvcons_init();
+   spapr_create_phb(kvm, "pci", SPAPR_PCI_BUID,
+SPAPR_PCI_MEM_WIN_ADDR,
+SPAPR_PCI_MEM_WIN_SIZE,
+SPAPR_PCI_IO_WIN_ADDR,
+SPAPR_PCI_IO_WIN_SIZE);
 }
 
 void kvm__irq_trigger(struct kvm *kvm, int irq)
@@ -420,6 +426,11 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_finish(fdt));
 
_FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE));
+
+   /* PCI */
+   if (spapr_populate_pci_devices(kvm, PHANDLE_XICP, fdt_dest))
+   die("Fail populating PCI device nodes");
+
_FDT(fdt_add_mem_rsv(fdt_dest, kvm->rtas_gra, kvm->rtas_size));
_FDT(fdt_pack(fdt_dest));
 }
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
index 4e5d7bd..902496d 100644
--- a/tools/kvm/powerpc/spapr.h
+++ b/tools/kvm/powerpc/spapr.h
@@ -305,4 +305,12 @@ target_ulong spapr_rtas_call(struct kvm_cpu *vcpu,
  uint32_t token, uint32_t nargs, target_ulong args,
  uint32_t nret, target_ulong rets);
 
+#define SPAPR_PCI_BUID  0x8002001ULL
+#define SPAPR_PCI_MEM_WIN_ADDR  (KVM_MMIO_START + 0xA000)
+#define SPAPR_PCI_MEM_WIN_SIZE  0x2000
+#define SPAPR_PCI_IO_WIN_ADDR   (KVM_MMIO_START + 0x8000)
+/* This, to me, is odd... 32MB of I/O?  Some PHBs are set up like this.
+ * Anything ever use > 64K? :P */
+#define SPAPR_PCI_IO_WIN_SIZE  0x200
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/tools/kvm/powerpc/spapr_pci.c b/tools/kvm/powerpc/spapr_pci.c
new file mode 100644
index 000..233c42c
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_pci.c
@@ -0,0 +1,429 @@
+/*
+ * SPAPR PHB emulation, RTAS interface to PCI config space, device tree nodes
+ * for enumerated devices.
+ *
+ * Borrowed heavily from QEMU's spapr_pci.c,
+ * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation.
+ * Copyright (c) 2011 David Gibson, IBM Corporation.
+ *
+ * Modifications copyright 2011 Matt Evans , IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by t

[PATCH 5/8] kvm tools: Add PPC64 XICS interrupt controller support

2011-12-05 Thread Matt Evans
This patch adds XICS emulation code (heavily borrowed from QEMU) and wires
it into kvm_cpu__irq() to fire a CPU IRQ via KVM.  A device tree entry is
also added.  IPIs work; xics_alloc_irqnum() is added to allocate an external
IRQ (which will later be used by the PHB PCI code); and kvm__irq_line() can
now be called to raise an IRQ on XICS.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile   |1 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |1 +
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |2 +
 tools/kvm/powerpc/irq.c  |   11 +-
 tools/kvm/powerpc/kvm-cpu.c  |   10 +
 tools/kvm/powerpc/kvm.c  |   25 +-
 tools/kvm/powerpc/xics.c |  529 ++
 tools/kvm/powerpc/xics.h |   23 ++
 8 files changed, 596 insertions(+), 6 deletions(-)
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 76cce3a..6c8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
OBJS+= powerpc/spapr_hvcons.o
+   OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index 722d01c..ae811e9 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -65,6 +65,7 @@ struct kvm {
unsigned long   initrd_gra;
unsigned long   initrd_size;
const char  *name;
+   struct icp_state*icp;
 };
 
 #endif /* KVM__KVM_ARCH_H */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
index dbabc57..551307e 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
@@ -17,6 +17,8 @@
 
 #include 
 
+#define POWER7_EXT_IRQ 0
+
 struct kvm;
 
 struct kvm_cpu {
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 46aa64f..80c972a 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,6 +21,10 @@
 #include 
 #include 
 
+#include "xics.h"
+
+#define XICS_IRQS   1024
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n",
@@ -30,7 +34,12 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 
 void irq__init(struct kvm *kvm)
 {
-   fprintf(stderr, __func__);
+   /* kvm->nrcpus is now valid; for /now/, pass
+* this to xics_system_init(), which assumes servers
+* are numbered 0..nrcpus.  This may not really be true,
+* but it is OK currently.
+*/
+   kvm->icp = xics_system_init(XICS_IRQS, kvm->nrcpus);
 }
 
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 71c648e..63cd106 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -15,6 +15,7 @@
 #include "kvm/kvm.h"
 
 #include "spapr.h"
+#include "xics.h"
 
 #include 
 #include 
@@ -107,6 +108,9 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned 
long cpu_id)
 */
vcpu->is_running = true;
 
+   /* Register with IRQ controller */
+   xics_cpu_register(vcpu);
+
return vcpu;
 }
 
@@ -151,6 +155,12 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 /* kvm_cpu__irq - set KVM's IRQ flag on this vcpu */
 void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level)
 {
+   unsigned int virq = level ? KVM_INTERRUPT_SET_LEVEL : 
KVM_INTERRUPT_UNSET;
+
+   if (pin != POWER7_EXT_IRQ)
+   return;
+   if (ioctl(vcpu->vcpu_fd, KVM_INTERRUPT, &virq) < 0)
+   pr_warning("Could not KVM_INTERRUPT.");
 }
 
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 8614538..bfd7c3a 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -41,9 +41,13 @@
 
 #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/"
 
+#define PHANDLE_XICP   0x
+
 static char kern_cmdline[2048];
 
 struct kvm_ext kvm_req_ext[] = {
+   { DEFINE_KVM_EXT(KVM_CAP_PPC_UNSET_IRQ) },
+   { DEFINE_KVM_EXT(KVM_CAP_PPC_IRQ_LEVEL) },
{ 0, 0 }
 };
 
@@ -164,11 +168,6 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
spapr_hvcons_init();
 }
 
-void kvm__irq_line(struct kvm *kvm, int irq, int level)
-{
-   fprintf(stderr, "irq_line(%d, %d)\n", irq, level);
-}
-
 void kvm__irq_trigger(struct kvm *kvm, int irq)
 {
kvm__irq_line(kvm, irq, 1);
@@ -384,6 +383,22 @@ static void setup_fdt(str

[PATCH 4/8] kvm tools: Add SPAPR PPC64 HV console

2011-12-05 Thread Matt Evans
This adds the console code and adds VIO HV terminal nodes to the device
tree so that the guest kernel will pick the console up.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile   |1 +
 tools/kvm/powerpc/kvm.c  |   31 
 tools/kvm/powerpc/spapr_hvcons.c |  101 ++
 tools/kvm/powerpc/spapr_hvcons.h |   19 +++
 4 files changed, 152 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 0f24104..76cce3a 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -130,6 +130,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/kvm-cpu.o
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
+   OBJS+= powerpc/spapr_hvcons.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 2f0a921..8614538 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -15,6 +15,7 @@
 #include "kvm/util.h"
 
 #include "spapr.h"
+#include "spapr_hvcons.h"
 
 #include 
 
@@ -159,6 +160,8 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
/* Do these before FDT setup, IRQ setup, etc. */
hypercall_init();
register_core_rtas();
+   /* Now that hypercalls are initialised, register a couple for the 
console: */
+   spapr_hvcons_init();
 }
 
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
@@ -172,6 +175,11 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
kvm__irq_line(kvm, irq, 0);
 }
 
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+   spapr_hvcons_poll(kvm);
+}
+
 int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char 
*kernel_cmdline)
 {
void *p;
@@ -297,6 +305,13 @@ static void setup_fdt(struct kvm *kvm)
   &ird_end_prop, sizeof(ird_end_prop)));
}
 
+   /* stdout-path: This is assuming we're using the HV console.  Also, the
+* address is hardwired until we do a VIO bus.
+*/
+   _FDT(fdt_property_string(fdt, "linux,stdout-path",
+"/vdevice/vty@3000"));
+   _FDT(fdt_end_node(fdt));
+
/* Memory: We don't alloc. a separate RMA yet.  If we ever need to
 * (CAP_PPC_RMA == 2) then have one memory node for 0->RMAsize, and
 * another RMAsize->endOfMem.
@@ -369,6 +384,22 @@ static void setup_fdt(struct kvm *kvm)
}
_FDT(fdt_end_node(fdt));
 
+   /* VIO: See comment in linux,stdout-path; we don't yet represent a VIO
+* bus/address allocation so addresses are hardwired here.
+*/
+   _FDT(fdt_begin_node(fdt, "vdevice"));
+   _FDT(fdt_property_cell(fdt, "#address-cells", 0x1));
+   _FDT(fdt_property_cell(fdt, "#size-cells", 0x0));
+   _FDT(fdt_property_string(fdt, "device_type", "vdevice"));
+   _FDT(fdt_property_string(fdt, "compatible", "IBM,vdevice"));
+   _FDT(fdt_begin_node(fdt, "vty@3000"));
+   _FDT(fdt_property_string(fdt, "name", "vty"));
+   _FDT(fdt_property_string(fdt, "device_type", "serial"));
+   _FDT(fdt_property_string(fdt, "compatible", "hvterm1"));
+   _FDT(fdt_property_cell(fdt, "reg", 0x3000));
+   _FDT(fdt_end_node(fdt));
+   _FDT(fdt_end_node(fdt));
+
/* Finalise: */
_FDT(fdt_end_node(fdt)); /* Root node */
_FDT(fdt_finish(fdt));
diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c
new file mode 100644
index 000..97902ac
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_hvcons.c
@@ -0,0 +1,101 @@
+/*
+ * SPAPR HV console
+ *
+ * Borrowed lightly from QEMU's spapr_vty.c, Copyright (c) 2010 David Gibson,
+ * IBM Corporation.
+ *
+ * Copyright (c) 2011 Matt Evans , IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "kvm/term.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/util.h"
+#include "spapr.h"
+#include "spapr_hvcons.h"
+
+#include 
+#include 
+#include 
+
+#include 
+
+union hv_chario {
+   struct {
+   uint64_t char0_7;
+   uint64_t char8_15;
+   } a;
+   uint8_t buf[16];
+};
+
+static unsigned long h_put_term_char(struct kvm_cpu *vcpu, unsigned long 
opcode, unsigned long *args)
+{
+   /* To do: Read register from args[0], and check it. */
+   unsigned long len = args[1];
+   union hv_chario data;
+   struct iovec iov;
+
+   if (len > 16) {
+   return H_PARAMETER;
+   }
+   data.a.char0_7 = cpu_to_be64(args[2]);
+   data.a.char8_15 = cpu_to_be64(args[3]);
+
+   iov.iov_base = data.buf;
+   iov.iov_

[PATCH 3/8] kvm tools: Add SPAPR PPC64 hcall & rtascall structure

2011-12-05 Thread Matt Evans
This patch adds the basic structure for HV calls, their registration and some of
the simpler calls.  A similar layout for RTAS calls is also added, again with
some of the simpler RTAS calls used by the guest.  The SPAPR RTAS stub is
generated inline.  Also, nodes for RTAS are added to the device tree.
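
For readers unfamiliar with the shape of such a registration scheme, a
minimal sketch follows.  It is not the code in this patch: the names, the
table bound, the return codes and the index-by-opcode/4 layout are all
assumptions made for the example, and the handler signature drops the vcpu
argument that the real handlers take.

```c
#include <assert.h>
#include <stddef.h>

/* All names and values below are hypothetical, for illustration only. */
#define SKETCH_MAX_OPCODE	0x400ul
#define SKETCH_H_SUCCESS	0ul
#define SKETCH_H_FUNCTION	((unsigned long)-2L)	/* "no such call" */

typedef unsigned long (*hcall_fn)(unsigned long opcode, unsigned long *args);

/* Assumes opcodes are multiples of 4, so the table is indexed by opcode / 4. */
static hcall_fn hcall_table[SKETCH_MAX_OPCODE / 4 + 1];

static void hcall_register(unsigned long opcode, hcall_fn fn)
{
	assert(opcode <= SKETCH_MAX_OPCODE && (opcode & 3) == 0);
	assert(hcall_table[opcode / 4] == NULL);	/* no double registration */
	hcall_table[opcode / 4] = fn;
}

static unsigned long hcall_dispatch(unsigned long opcode, unsigned long *args)
{
	if (opcode <= SKETCH_MAX_OPCODE && (opcode & 3) == 0) {
		hcall_fn fn = hcall_table[opcode / 4];

		if (fn)
			return fn(opcode, args);
	}
	return SKETCH_H_FUNCTION;	/* unregistered or malformed opcode */
}

/* Toy handler: returns args[0] + args[1] in args[0]. */
static unsigned long demo_add(unsigned long opcode, unsigned long *args)
{
	(void)opcode;
	args[0] = args[0] + args[1];
	return SKETCH_H_SUCCESS;
}
```

The exit handler would then only need to call the dispatcher with the
opcode and argument array taken from the run structure and store the
result as the call's return value.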

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile  |2 +
 tools/kvm/powerpc/kvm-cpu.c |5 +
 tools/kvm/powerpc/kvm.c |   39 +-
 tools/kvm/powerpc/spapr.h   |  308 +++
 tools/kvm/powerpc/spapr_hcall.c |  151 +++
 tools/kvm/powerpc/spapr_rtas.c  |  226 
 6 files changed, 730 insertions(+), 1 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index dc18959..0f24104 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -128,6 +128,8 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/irq.o
OBJS+= powerpc/kvm.o
OBJS+= powerpc/kvm-cpu.o
+   OBJS+= powerpc/spapr_hcall.o
+   OBJS+= powerpc/spapr_rtas.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 79422ff..71c648e 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -14,6 +14,8 @@
 #include "kvm/util.h"
 #include "kvm/kvm.h"
 
+#include "spapr.h"
+
 #include 
 #include 
 #include 
@@ -156,6 +158,9 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
bool ret = true;
struct kvm_run *run = vcpu->kvm_run;
switch(run->exit_reason) {
+   case KVM_EXIT_PAPR_HCALL:
+   run->papr_hcall.ret = spapr_hypercall(vcpu, run->papr_hcall.nr, 
run->papr_hcall.args);
+   break;
default:
ret = false;
}
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index d792bee..2f0a921 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -14,6 +14,8 @@
 #include "kvm/kvm.h"
 #include "kvm/util.h"
 
+#include "spapr.h"
+
 #include 
 
 #include 
@@ -153,6 +155,10 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
cap_ppc_rma = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA);
if (cap_ppc_rma == 2)
die("Need contiguous RMA allocation on this hardware, which is not yet supported.");
+
+   /* Do these before FDT setup, IRQ setup, etc. */
+   hypercall_init();
+   register_core_rtas();
 }
 
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
@@ -262,6 +268,20 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 
+   /* RTAS */
+   _FDT(fdt_begin_node(fdt, "rtas"));
+   /* This is what the kernel uses to switch 'We're an LPAR'! */
+_FDT(fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop_kvm,
+   sizeof(hypertas_prop_kvm)));
+   _FDT(fdt_property_cell(fdt, "linux,rtas-base", kvm->rtas_gra));
+   _FDT(fdt_property_cell(fdt, "linux,rtas-entry", kvm->rtas_gra));
+   _FDT(fdt_property_cell(fdt, "rtas-size", kvm->rtas_size));
+   /* Now add properties for all RTAS tokens: */
+   if (spapr_rtas_fdt_setup(kvm, fdt))
+   die("Couldn't create RTAS FDT properties\n");
+
+   _FDT(fdt_end_node(fdt));
+
/* /chosen */
_FDT(fdt_begin_node(fdt, "chosen"));
/* cmdline */
@@ -363,7 +383,24 @@ static void setup_fdt(struct kvm *kvm)
  */
 void kvm__arch_setup_firmware(struct kvm *kvm)
 {
-   /* Load RTAS */
+   /* Set up RTAS stub.  All it is is a single hypercall:
+  0:   7c 64 1b 78 mr  r4,r3
+  4:   3c 60 00 00 lis r3,0
+  8:   60 63 f0 00 ori r3,r3,61440
+  c:   44 00 00 22 sc  1
+ 10:   4e 80 00 20 blr
+   */
+   uint32_t *rtas = guest_flat_to_host(kvm, kvm->rtas_gra);
+
+   rtas[0] = 0x7c641b78;
+   rtas[1] = 0x3c600000;
+   rtas[2] = 0x6063f000;
+   rtas[3] = 0x44000022;
+   rtas[4] = 0x4e800020;
+   kvm->rtas_size = 20;
+
+   pr_info("Set up %ld bytes of RTAS at 0x%lx\n",
+   kvm->rtas_size, kvm->rtas_gra);
 
/* Load SLOF */
 
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
new file mode 100644
index 000..4e5d7bd
--- /dev/null
+++ b/tools/kvm/powerpc/spapr.h
@@ -0,0 +1,308 @@
+/*
+ * SPAPR definitions and declarations
+ *
+ * Borrowed heavily from QEMU's spapr.h,
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Modifications by Matt Evans , IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of t

[PATCH 0/8] kvm tools SPAPR PPC64 support

2011-12-05 Thread Matt Evans
Hi,

This set of patches builds upon the prep-work of the previous set and adds
support to kvmtool for PPC64 SPAPR-based guests, i.e. an environment akin to an
LPAR on IBM's pSeries machines.

This support is not yet fully-featured but, in a basic state, works well.
The guests have a functional but no-frills experience, with:

- SMP guests
- HV console (or RTAS console, for udbg)
- Net, block over virtio-pci
- No PAPR VIO/VSCSI/VNET yet
- No fancy features like migration yet

Though minimal, guests are quite stable.

There are obvious areas for future improvement:

- Non-VRMA RMAs aren't supported, meaning POWER7-only for the moment
- Other CPU-specific details are currently assumed (e.g. available page sizes);
  work is required to determine host capabilities and pass these up.
- Support SLOF
- Maybe support VIO
- Some hypercalls used by partition firmware/SLOF (not the kernel) are
  unimplemented
- Fancy PCI (e.g. passthrough)
- Currently KVM_NR_CPUS is arbitrarily fixed at 255, and could be higher.
  Guests with this many CPUs boot fine.

Some PPC KVM kernel-side features aren't implemented yet and have required
kvmtool workarounds; mmio coalescing isn't supported and lack of ioeventfds
requires virtio to gracefully fall back when it fails to register one.


Cheers,


Matt



Matt Evans (8):
  kvm tools: Add initial SPAPR PPC64 architecture support
  kvm tools: Generate SPAPR PPC64 guest device tree
  kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  kvm tools: Add SPAPR PPC64 HV console
  kvm tools: Add PPC64 XICS interrupt controller support
  kvm tools: Add PPC64 PCI Host Bridge
  kvm tools: Add PPC64 kvm_cpu__emulate_io()
  kvm tools: Make virtio-pci's ioeventfd__add_event() fall back
gracefully if ioeventfds unavailable

 tools/kvm/Makefile   |   16 +
 tools/kvm/include/kvm/ioeventfd.h|3 +-
 tools/kvm/ioeventfd.c|   12 +-
 tools/kvm/kvm.c  |3 +
 tools/kvm/powerpc/include/kvm/barrier.h  |6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |   74 
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   48 +++
 tools/kvm/powerpc/ioport.c   |   18 +
 tools/kvm/powerpc/irq.c  |   62 +++
 tools/kvm/powerpc/kvm-cpu.c  |  281 ++
 tools/kvm/powerpc/kvm.c  |  466 +++
 tools/kvm/powerpc/spapr.h|  316 +++
 tools/kvm/powerpc/spapr_hcall.c  |  151 
 tools/kvm/powerpc/spapr_hvcons.c |  101 +
 tools/kvm/powerpc/spapr_hvcons.h |   19 +
 tools/kvm/powerpc/spapr_pci.c|  429 +
 tools/kvm/powerpc/spapr_pci.h|   38 ++
 tools/kvm/powerpc/spapr_rtas.c   |  226 +++
 tools/kvm/powerpc/xics.c |  529 ++
 tools/kvm/powerpc/xics.h |   23 ++
 tools/kvm/virtio/pci.c   |   11 +-
 21 files changed, 2827 insertions(+), 5 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] kvm tools: Generate SPAPR PPC64 guest device tree

2011-12-05 Thread Matt Evans
The generated DT is the bare minimum structure required for SPAPR (on which
subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory.

Some aspects are currently hardwired for simplicity, for example advertised
page sizes, HPT size, SLB size, VMX/DFP, etc.  Future support of a variety
of POWER CPUs should acquire this info from the host and encode appropriately.

This requires a 64-bit libfdt.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile  |3 +-
 tools/kvm/powerpc/kvm.c |  141 +++
 2 files changed, 143 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 58815a2..dc18959 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -129,7 +129,8 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/kvm.o
OBJS+= powerpc/kvm-cpu.o
ARCH_INCLUDE := powerpc/include
-   CFLAGS += -m64
+   CFLAGS  += -m64
+   LIBS+= -lfdt
 endif
 
 ###
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 036bfc0..d792bee 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -3,6 +3,9 @@
  *
  * Copyright 2011 Matt Evans , IBM Corporation.
  *
+ * Portions of FDT setup borrowed from QEMU, copyright 2010 David Gibson, IBM
+ * Corporation.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
  * by the Free Software Foundation.
@@ -28,8 +31,11 @@
 #include 
 #include 
 
+#include 
 #include 
 
+#define HPT_ORDER 24
+
 #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/"
 
 static char kern_cmdline[2048];
@@ -212,9 +218,144 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
return false;
 }
 
+#define SMT_THREADS 4
+
+#define _FDT(exp)  \
+   do {\
+   int ret = (exp);\
+   if (ret < 0) {  \
+   die("Error creating device tree: %s: %s\n", \
+   #exp, fdt_strerror(ret));   \
+   }   \
+   } while (0)
+
+static uint32_t mfpvr(void)
+{
+   uint32_t r;
+   asm volatile ("mfpvr %0" : "=r"(r));
+   return r;
+}
+
 static void setup_fdt(struct kvm *kvm)
 {
+   uint64_t mem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
+   int smp_cpus = kvm->nrcpus;
+   uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
+   char hypertas_prop_kvm[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+   "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
+   int i, j;
+   char cpu_name[30];
+   u8  staging_fdt[FDT_MAX_SIZE];
+   uint32_t pvr = mfpvr();
+
+   /* Generate an appropriate DT at kvm->fdt_gra */
+   void *fdt_dest = guest_flat_to_host(kvm, kvm->fdt_gra);
+   void *fdt = staging_fdt;
+
+   _FDT(fdt_create(fdt, FDT_MAX_SIZE));
+   _FDT(fdt_finish_reservemap(fdt));
+
+   _FDT(fdt_begin_node(fdt, ""));
+
+   _FDT(fdt_property_string(fdt, "device_type", "chrp"));
+   _FDT(fdt_property_string(fdt, "model", "IBM pSeries (emulated by kvmtool)"));
+   _FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
+   _FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
+
+   /* /chosen */
+   _FDT(fdt_begin_node(fdt, "chosen"));
+   /* cmdline */
+   _FDT(fdt_property_string(fdt, "bootargs", kern_cmdline));
+   /* Initrd */
+   if (kvm->initrd_size != 0) {
+   uint32_t ird_st_prop = cpu_to_be32(kvm->initrd_gra);
+   uint32_t ird_end_prop = cpu_to_be32(kvm->initrd_gra +
+   kvm->initrd_size);
+   _FDT(fdt_property(fdt, "linux,initrd-start",
+  &ird_st_prop, sizeof(ird_st_prop)));
+   _FDT(fdt_property(fdt, "linux,initrd-end",
+  &ird_end_prop, sizeof(ird_end_prop)));
+   }
+
+   /* Memory: We don't alloc. a separate RMA yet.  If we ever need to
+* (CAP_PPC_RMA == 2) then have one memory node for 0->RMAsize, and
+* another RMAsize->endOfMem.
+*/
+   _FDT(fdt_begin_node(fdt, "memory@0"));
+   _FDT(fdt_property_string(fdt, "device_type", "memory"));
+   _FDT(fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property)));
+   _FDT(fdt_end_node(fdt));
+
+   /* CPUs */
+   _FDT(fdt_begin_node(fdt, "cpus"));
+   _FDT(fdt_property_cell(fdt, "#address-cells", 0x1));
+   _FDT(fdt_property_cell(fdt, "#size-cells", 0x0));
+
+   for (i = 0; i < smp_cpus; i += SMT_THREADS) {

[PATCH 1/8] kvm tools: Add initial SPAPR PPC64 architecture support

2011-12-05 Thread Matt Evans
This patch adds a new arch directory, powerpc, basic file structure, register
setup and where necessary stubs out arch-specific functions (e.g. interrupts,
runloop exits) that later patches will provide.  The target is an
SPAPR-compliant PPC64 machine (i.e. pSeries); there is no support for PPC32 or
'bare metal' PPC64 guests as yet.  Subsequent patches implement the hcalls and
RTAS required to boot SPAPR pSeries kernels.

Memory is mapped from hugetlbfs (as that is currently required by upstream PPC64
HV-mode KVM).  The mapping of a non-VRMA RMA region is yet to be implemented;
this is only necessary on processors that don't support VRMA, e.g. <= P6.  Work
is therefore needed to get this going on pre-P7 CPUs.

Processor state is set up as a guest kernel would expect (both primary and
secondaries), and SMP is fully supported.

Finally, support is added for simply loading flat binary kernels (plus initrd).
(bzImages are not used on PPC, and this series does not add zImage support or an
ELF loader.)  The intention is to later support loading firmware such as SLOF.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile   |   10 +
 tools/kvm/kvm.c  |3 +
 tools/kvm/powerpc/include/kvm/barrier.h  |6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |   70 
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   46 +
 tools/kvm/powerpc/ioport.c   |   18 ++
 tools/kvm/powerpc/irq.c  |   40 +
 tools/kvm/powerpc/kvm-cpu.c  |  232 ++
 tools/kvm/powerpc/kvm.c  |  231 +
 9 files changed, 656 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 57dc521..58815a2 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -121,6 +121,16 @@ ifeq ($(ARCH),x86)
OTHEROBJS   += x86/bios/bios-rom.o
ARCH_INCLUDE := x86/include
 endif
+# POWER/ppc:  Actually only support ppc64 currently.
+ifeq ($(uname_M), ppc64)
+   DEFINES += -DCONFIG_PPC
+   OBJS+= powerpc/ioport.o
+   OBJS+= powerpc/irq.o
+   OBJS+= powerpc/kvm.o
+   OBJS+= powerpc/kvm-cpu.o
+   ARCH_INCLUDE := powerpc/include
+   CFLAGS += -m64
+endif
 
 ###
 
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 503ceae..d716ede 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -49,6 +49,9 @@ const char *kvm_exit_reasons[] = {
DEFINE_KVM_EXIT_REASON(KVM_EXIT_DCR),
DEFINE_KVM_EXIT_REASON(KVM_EXIT_NMI),
DEFINE_KVM_EXIT_REASON(KVM_EXIT_INTERNAL_ERROR),
+#ifdef CONFIG_PPC64
+   DEFINE_KVM_EXIT_REASON(KVM_EXIT_PAPR_HCALL),
+#endif
 };
 
 extern struct kvm *kvm;
diff --git a/tools/kvm/powerpc/include/kvm/barrier.h 
b/tools/kvm/powerpc/include/kvm/barrier.h
new file mode 100644
index 000..bc7d179
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/barrier.h
@@ -0,0 +1,6 @@
+#ifndef _KVM_BARRIER_H_
+#define _KVM_BARRIER_H_
+
+#include 
+
+#endif /* _KVM_BARRIER_H_ */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
new file mode 100644
index 000..722d01c
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -0,0 +1,70 @@
+/*
+ * PPC64 architecture-specific definitions
+ *
+ * Copyright 2011 Matt Evans , IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef KVM__KVM_ARCH_H
+#define KVM__KVM_ARCH_H
+
+#include 
+#include 
+#include 
+
+#define KVM_NR_CPUS (255)
+
+/* MMIO lives after RAM, but it'd be nice if it didn't constantly move.
+ * Choose a suitably high address, e.g. 63T...  This limits RAM size.
+ */
+#define PPC_MMIO_START 0x3F00UL
+#define PPC_MMIO_SIZE  0x0100UL
+
+#define KERNEL_LOAD_ADDR   0x
+#define KERNEL_START_ADDR  0x
+#define KERNEL_SECONDARY_START_ADDR 0x0060
+#define INITRD_LOAD_ADDR   0x0280
+
+#define FDT_MAX_SIZE   0x1
+#define RTAS_MAX_SIZE  0x1
+
+#define TIMEBASE_FREQ  51200ULL
+
+#define KVM_MMIO_START PPC_MMIO_START
+
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from.  Note that this is a PCI bus address.
+ */
+#define KVM_PCI_MMIO_AREA  0x100
+
+struct kvm {
+   int  

[PATCH 28/28] kvm tools: Create arch-specific kvm_cpu__emulate_io()

2011-12-05 Thread Matt Evans
Different architectures will deal with MMIO exits differently.  For example,
KVM_EXIT_IO is x86-specific, and I/O cycles are often synthesised by steering
into windows in PCI bridges on other architectures.

This patch moves the IO/MMIO exit code from the main runloop into x86/kvm-cpu.c

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/kvm-cpu.h |1 +
 tools/kvm/kvm-cpu.c |   37 +
 tools/kvm/x86/kvm-cpu.c |   37 +
 3 files changed, 43 insertions(+), 32 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 15618f1..6f38c0c 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -13,6 +13,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 884a89f..c9fbc81 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -103,49 +103,22 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
kvm_cpu__show_registers(cpu);
kvm_cpu__show_code(cpu);
break;
-   case KVM_EXIT_IO: {
-   bool ret;
-
-   ret = kvm__emulate_io(cpu->kvm,
-   cpu->kvm_run->io.port,
-   (u8 *)cpu->kvm_run +
-   cpu->kvm_run->io.data_offset,
-   cpu->kvm_run->io.direction,
-   cpu->kvm_run->io.size,
-   cpu->kvm_run->io.count);
-
-   if (!ret)
+   case KVM_EXIT_IO:
+   case KVM_EXIT_MMIO:
+   if (!kvm_cpu__emulate_io(cpu, cpu->kvm_run))
goto panic_kvm;
break;
-   }
-   case KVM_EXIT_MMIO: {
-   bool ret;
-
-   ret = kvm__emulate_mmio(cpu->kvm,
-   cpu->kvm_run->mmio.phys_addr,
-   cpu->kvm_run->mmio.data,
-   cpu->kvm_run->mmio.len,
-   cpu->kvm_run->mmio.is_write);
-
-   if (!ret)
-   goto panic_kvm;
-   break;
-   }
case KVM_EXIT_INTR:
if (cpu->is_running)
break;
goto exit_kvm;
case KVM_EXIT_SHUTDOWN:
goto exit_kvm;
-   default: {
-   bool ret;
-
-   ret = kvm_cpu__handle_exit(cpu);
-   if (!ret)
+   default:
+   if (!kvm_cpu__handle_exit(cpu))
goto panic_kvm;
break;
}
-   }
kvm_cpu__handle_coalesced_mmio(cpu);
}
 
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index a0d10cc..665d742 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -217,6 +217,43 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return false;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+   bool ret;
+   switch (kvm_run->exit_reason) {
+   case KVM_EXIT_IO: {
+   ret = kvm__emulate_io(cpu->kvm,
+ cpu->kvm_run->io.port,
+ (u8 *)cpu->kvm_run +
+ cpu->kvm_run->io.data_offset,
+ cpu->kvm_run->io.direction,
+ cpu->kvm_run->io.size,
+ cpu->kvm_run->io.count);
+
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
+   case KVM_EXIT_MMIO: {
+   ret = kvm__emulate_mmio(cpu->kvm,
+   cpu->kvm_run->mmio.phys_addr,
+   cpu->kvm_run->mmio.data,
+   cpu->kvm_run->mmio.len,
+   cpu->kvm_run->mmio.is_write);
+
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
+   default:
+   pr_warning("Unknown exit reason %d in %s\n", kvm_run->exit_reason, __FUNCTION__);
+   return false;
+   }
+   return true;
+panic_kvm:
+   return false;
+}
+
 st

[PATCH 27/28] kvm tools: Arch-specific define for PCI MMIO allocation area

2011-12-05 Thread Matt Evans
pci_get_io_space_block() used to grab addresses from
KVM_32BIT_GAP_START + 0x100, which is x86-specific.  Create a new define,
KVM_PCI_MMIO_AREA, to specify a bus address these allocations can come from.
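
For illustration, the allocation this define feeds can be sketched as a simple
bump allocator: each call hands out the current address and advances by the
requested size. The names and the base address below are hypothetical (the
real base is whatever each architecture defines), and the overflow check is an
addition of the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base; each architecture would supply its own value. */
#define SKETCH_PCI_MMIO_AREA 0xd0000000u

/* Next free PCI bus address; only ever grows. */
static uint32_t sketch_io_space_blocks = SKETCH_PCI_MMIO_AREA;

/* Hand out `size` bytes of PCI bus address space and advance the cursor. */
uint32_t sketch_get_io_space_block(uint32_t size)
{
	uint32_t block = sketch_io_space_blocks;

	assert(size <= UINT32_MAX - block);	/* refuse to wrap around */
	sketch_io_space_blocks += size;
	return block;
}
```

Because the cursor is a 32-bit bus address, the same code works unchanged for
64-bit guests as long as BARs are assigned below 4GB.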

Signed-off-by: Matt Evans 
---
 tools/kvm/pci.c  |8 ++--
 tools/kvm/x86/include/kvm/kvm-arch.h |5 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 8282e23..045c1c5 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -11,8 +11,12 @@ static struct pci_device_header  
*pci_devices[PCI_MAX_DEVICES];
 
 static union pci_config_address pci_config_address;
 
-/* This is within our PCI gap - in an unused area */
-static u32 io_space_blocks = KVM_32BIT_GAP_START + 0x100;
+/* This is within our PCI gap - in an unused area.
+ * Note this is a PCI *bus address*, is used to assign BARs etc.!
+ * (That's why it can still be 32bit even with 64bit guests-- 64bit
+ * PCI isn't currently supported.)
+ */
+static u32 io_space_blocks = KVM_PCI_MMIO_AREA;
 
 u32 pci_get_io_space_block(u32 size)
 {
diff --git a/tools/kvm/x86/include/kvm/kvm-arch.h 
b/tools/kvm/x86/include/kvm/kvm-arch.h
index 02aa8b9..686b1b8 100644
--- a/tools/kvm/x86/include/kvm/kvm-arch.h
+++ b/tools/kvm/x86/include/kvm/kvm-arch.h
@@ -18,6 +18,11 @@
 
 #define KVM_MMIO_START KVM_32BIT_GAP_START
 
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from.  Note that this is a PCI bus address (though same on x86).
+ */
+#define KVM_PCI_MMIO_AREA  (KVM_MMIO_START + 0x100)
+
 struct kvm {
int sys_fd; /* For system ioctls(), i.e. 
/dev/kvm */
int vm_fd;  /* For VM ioctls() */


[PATCH 26/28] kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config register addressing

2011-12-05 Thread Matt Evans
This allows config space access in a more natural manner than clunky x86 IO
ports, and is useful for other architectures.

Furthermore, the actual registers were only accessed in 32bit chunks; other
systems (e.g. PPC) allow smaller accesses so that, for example, the 16-bit
config field can be read directly.  This patch allows this sort of addressing.
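
A minimal sketch of the addressing idea, assuming the conventional layout of
the configuration address word (bus in bits 23:16, device in 15:11, function
in 10:8) and taking the low 8 bits as a byte offset so that accesses need not
be 4-byte aligned. The struct and function names are hypothetical and are not
the ones in pci.c.

```c
#include <stdint.h>
#include <string.h>

/* Decoded fields of a configuration address word (hypothetical type). */
struct sketch_cfg_addr {
	unsigned bus;
	unsigned dev;
	unsigned fn;
	unsigned offset;	/* byte offset into config space */
};

/* Split the address word; the low 8 bits give a byte-granular offset. */
struct sketch_cfg_addr sketch_decode(uint32_t w)
{
	struct sketch_cfg_addr a;

	a.bus    = (w >> 16) & 0xff;
	a.dev    = (w >> 11) & 0x1f;
	a.fn     = (w >> 8) & 0x7;
	a.offset = w & 0xff;
	return a;
}

/* Read `size` bytes at `offset`; out-of-range reads return all-ones. */
void sketch_cfg_read(const uint8_t *hdr, size_t hdr_len, unsigned offset,
		     void *data, size_t size)
{
	if (offset <= hdr_len && size <= hdr_len - offset)
		memcpy(data, hdr + offset, size);
	else
		memset(data, 0xff, size);
}
```

With a byte offset, a 2-byte read at offset 6 (the status field) returns just
that field instead of requiring a 4-byte read at offset 4.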

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/pci.h |5 +++
 tools/kvm/pci.c |   63 +++---
 2 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index 88e92dc..be2b0bc 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -7,6 +7,8 @@
 #include 
 #include 
 
+#include "kvm/kvm.h"
+
 #define PCI_MAX_DEVICES 256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -82,6 +84,9 @@ struct pci_device_header {
 
 void pci__init(void);
 void pci__register(struct pci_device_header *dev, u8 dev_num);
+struct pci_device_header *pci__find_dev(u8 dev_num);
 u32 pci_get_io_space_block(u32 size);
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 
 #endif /* KVM__PCI_H */
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 5bbcbc7..8282e23 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -77,7 +77,6 @@ static bool pci_device_exists(u8 bus_number, u8 
device_number, u8 function_numbe
 static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
 {
unsigned long start;
-   u8 dev_num;
 
/*
 * If someone accesses PCI configuration space offsets that are not
@@ -85,12 +84,41 @@ static bool pci_config_data_out(struct ioport *ioport, 
struct kvm *kvm, u16 port
 */
start = port - PCI_CONFIG_DATA;
 
-   dev_num = pci_config_address.device_number;
+   pci__config_wr(kvm, pci_config_address, data, size);
+
+   return true;
+}
+
+static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
+{
+   unsigned long start;
+
+   /*
+* If someone accesses PCI configuration space offsets that are not
+* aligned to 4 bytes, it uses ioports to signify that.
+*/
+   start = port - PCI_CONFIG_DATA;
+
+   pci__config_rd(kvm, pci_config_address, data, size);
+
+   return true;
+}
+
+static struct ioport_operations pci_config_data_ops = {
+   .io_in  = pci_config_data_in,
+   .io_out = pci_config_data_out,
+};
+
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void 
*data, int size)
+{
+   u8 dev_num;
+
+   dev_num = addr.device_number;
 
if (pci_device_exists(0, dev_num, 0)) {
unsigned long offset;
 
-   offset = start + (pci_config_address.register_number << 2);
+   offset = addr.w & 0xff;
if (offset < sizeof(struct pci_device_header)) {
void *p = pci_devices[dev_num];
u8 bar = (offset - PCI_BAR_OFFSET(0)) / (sizeof(u32));
@@ -116,27 +144,18 @@ static bool pci_config_data_out(struct ioport *ioport, 
struct kvm *kvm, u16 port
}
}
}
-
-   return true;
 }
 
-static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void 
*data, int size)
 {
-   unsigned long start;
u8 dev_num;
 
-   /*
-* If someone accesses PCI configuration space offsets that are not
-* aligned to 4 bytes, it uses ioports to signify that.
-*/
-   start = port - PCI_CONFIG_DATA;
-
-   dev_num = pci_config_address.device_number;
+   dev_num = addr.device_number;
 
if (pci_device_exists(0, dev_num, 0)) {
unsigned long offset;
 
-   offset = start + (pci_config_address.register_number << 2);
+   offset = addr.w & 0xff;
if (offset < sizeof(struct pci_device_header)) {
void *p = pci_devices[dev_num];
 
@@ -145,22 +164,20 @@ static bool pci_config_data_in(struct ioport *ioport, 
struct kvm *kvm, u16 port,
memset(data, 0x00, size);
} else
memset(data, 0xff, size);
-
-   return true;
 }
 
-static struct ioport_operations pci_config_data_ops = {
-   .io_in  = pci_config_data_in,
-   .io_out = pci_config_data_out,
-};
-
 void pci__register(struct pci_device_header *dev, u8 dev_num)
 {
assert(dev_num < PCI_MAX_DEVICES);
-
pci_devices[dev_num]= dev;
 }
 
+struct pci_device_header *pci__find_dev(u8 dev_num)
+{
+   assert(dev_num < PCI_MA

[PATCH 25/28] kvm tools: Correctly set virtio-pci bar_size and remove hardwired address

2011-12-05 Thread Matt Evans
The BAR addresses are set up correctly, but the bar_size[] array was missed;
it is now updated to match.

Use PCI_IO_SIZE instead of '0x100'.

Signed-off-by: Matt Evans 
---
 tools/kvm/virtio/pci.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 6b27ff8..ffa3768 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -293,8 +293,8 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans 
*vtrans, void *dev,
vpci->msix_pba_block = pci_get_io_space_block(PCI_IO_SIZE);
 
vpci->base_addr = ioport__register(IOPORT_EMPTY, &virtio_pci__io_ops, 
IOPORT_SIZE, vtrans);
-   kvm__register_mmio(kvm, vpci->msix_io_block, 0x100, 
callback_mmio_table, vpci);
-   kvm__register_mmio(kvm, vpci->msix_pba_block, 0x100, callback_mmio_pba, 
vpci);
+   kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE, 
callback_mmio_table, vpci);
+   kvm__register_mmio(kvm, vpci->msix_pba_block, PCI_IO_SIZE, 
callback_mmio_pba, vpci);
 
vpci->pci_hdr = (struct pci_device_header) {
.vendor_id  = 
cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -313,6 +313,9 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans 
*vtrans, void *dev,
  | 
PCI_BASE_ADDRESS_MEM_TYPE_64),
.status = cpu_to_le16(PCI_STATUS_CAP_LIST),
.capabilities   = (void *)&vpci->pci_hdr.msix - (void 
*)&vpci->pci_hdr,
+   .bar_size[0]= IOPORT_SIZE,
+   .bar_size[1]= PCI_IO_SIZE,
+   .bar_size[3]= PCI_IO_SIZE,
};
 
vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;


[PATCH 24/28] kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM

2011-12-05 Thread Matt Evans
The field size is currently wrong: the value is written as a 32-bit word
instead of a 16-bit one.  This causes trouble on big-endian hosts.
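
To see why the width matters, here is a simplified model (not kvmtool code;
all names are hypothetical) in which a value is stored natively into a buffer
and the guest then consumes only the first two bytes, as a 16-bit access
would. The host byte order is passed as a parameter so the sketch behaves the
same on any machine.

```c
#include <stdint.h>

/* Model a native 32-bit store into `p` for a host of the given byte order. */
void sketch_store32(uint8_t *p, uint32_t val, int big_endian)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(val >> (big_endian ? 8 * (3 - i) : 8 * i));
}

/* Model a native 16-bit store into `p`. */
void sketch_store16(uint8_t *p, uint16_t val, int big_endian)
{
	p[0] = (uint8_t)(big_endian ? val >> 8 : val & 0xff);
	p[1] = (uint8_t)(big_endian ? val & 0xff : val >> 8);
}

/* Model a native 16-bit load from the first two bytes of `p`. */
uint16_t sketch_load16(const uint8_t *p, int big_endian)
{
	return big_endian ? (uint16_t)((p[0] << 8) | p[1])
			  : (uint16_t)((p[1] << 8) | p[0]);
}
```

On a little-endian host the low half of a 32-bit store lands in the first two
bytes, so the bug is invisible; on a big-endian host those bytes hold the high
half, and a queue size such as 256 reads back as 0.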

Signed-off-by: Matt Evans 
---
 tools/kvm/virtio/pci.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 0ae93fb..6b27ff8 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -116,8 +116,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct 
kvm *kvm, u16 port,
break;
case VIRTIO_PCI_QUEUE_NUM:
val = vtrans->virtio_ops->get_size_vq(kvm, vpci->dev, 
vpci->queue_selector);
-   ioport__write32(data, val);
-   break;
+   ioport__write16(data, val);
break;
case VIRTIO_PCI_STATUS:
ioport__write8(data, vpci->status);


[PATCH 23/28] kvm tools: Endian-sanitise pci.h and PCI device setup

2011-12-05 Thread Matt Evans
vesa, pci-shmem and virtio-pci devices need to set up config space with
little-endian conversions (as config space is LE).  The pci_config_address
bitfield also needs to be reversed when building on BE systems.
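
As a sketch of what the conversions do, the helpers below produce a value
whose in-memory bytes are little-endian regardless of host byte order. These
are illustrative stand-ins with hypothetical names, not the cpu_to_le16() and
cpu_to_le32() definitions the patch pulls in from the endian headers.

```c
#include <stdint.h>
#include <string.h>

/* Return a value whose in-memory representation is `v` in LE byte order. */
uint16_t sketch_cpu_to_le16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* 32-bit variant of the same idea. */
uint32_t sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)(v >> 24),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}
```

On a little-endian host both helpers are no-ops; on a big-endian host they
swap bytes, which is why config-space fields such as vendor_id must be wrapped
when the header struct is initialised.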

Signed-off-by: Matt Evans 
---
 tools/kvm/hw/pci-shmem.c   |   23 +++--
 tools/kvm/hw/vesa.c|   15 +++--
 tools/kvm/include/kvm/ioport.h |   11 +
 tools/kvm/include/kvm/pci.h|   24 +-
 tools/kvm/pci.c|4 +-
 tools/kvm/virtio/pci.c |   41 +--
 6 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 780a377..fd954c5 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -8,21 +8,22 @@
 #include "kvm/ioeventfd.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 static struct pci_device_header pci_shmem_pci_device = {
-   .vendor_id  = PCI_VENDOR_ID_REDHAT_QUMRANET,
-   .device_id  = 0x1110,
+   .vendor_id  = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+   .device_id  = cpu_to_le16(0x1110),
.header_type= PCI_HEADER_TYPE_NORMAL,
-   .class  = 0xFF, /* misc pci device */
-   .status = PCI_STATUS_CAP_LIST,
+   .class[2]   = 0xFF, /* misc pci device */
+   .status = cpu_to_le16(PCI_STATUS_CAP_LIST),
.capabilities   = (void *)&pci_shmem_pci_device.msix - (void 
*)&pci_shmem_pci_device,
.msix.cap   = PCI_CAP_ID_MSIX,
-   .msix.ctrl  = 1,
-   .msix.table_offset = 1, /* Use BAR 1 */
-   .msix.pba_offset = 0x1001,  /* Use BAR 1 */
+   .msix.ctrl  = cpu_to_le16(1),
+   .msix.table_offset = cpu_to_le32(1),/* Use BAR 1 */
+   .msix.pba_offset = cpu_to_le32(0x1001), /* Use BAR 1 */
 };
 
 /* registers for the Inter-VM shared memory device */
@@ -123,7 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
if (fd < 0)
return fd;
 
-   if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) {
+   if (pci_shmem_pci_device.msix.ctrl & 
cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
gsi = irq__add_msix_route(kvm, &msix_table[0].msg);
} else {
gsi = pci_shmem_pci_device.irq_line;
@@ -241,11 +242,11 @@ int pci_shmem__init(struct kvm *kvm)
 * 1 - MSI-X MMIO space
 * 2 - Shared memory block
 */
-   pci_shmem_pci_device.bar[0] = ivshmem_registers | 
PCI_BASE_ADDRESS_SPACE_IO;
+   pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | 
PCI_BASE_ADDRESS_SPACE_IO);
pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-   pci_shmem_pci_device.bar[1] = msix_block | 
PCI_BASE_ADDRESS_SPACE_MEMORY;
+   pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | 
PCI_BASE_ADDRESS_SPACE_MEMORY);
pci_shmem_pci_device.bar_size[1] = 0x1010;
-   pci_shmem_pci_device.bar[2] = shmem_region->phys_addr | 
PCI_BASE_ADDRESS_SPACE_MEMORY;
+   pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | 
PCI_BASE_ADDRESS_SPACE_MEMORY);
pci_shmem_pci_device.bar_size[2] = shmem_region->size;
 
pci__register(&pci_shmem_pci_device, dev);
diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c
index 22b1652..63f1082 100644
--- a/tools/kvm/hw/vesa.c
+++ b/tools/kvm/hw/vesa.c
@@ -8,6 +8,7 @@
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include 
 #include 
 
 #include 
@@ -31,14 +32,14 @@ static struct ioport_operations vesa_io_ops = {
 };
 
 static struct pci_device_header vesa_pci_device = {
-   .vendor_id  = PCI_VENDOR_ID_REDHAT_QUMRANET,
-   .device_id  = PCI_DEVICE_ID_VESA,
+   .vendor_id  = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+   .device_id  = cpu_to_le16(PCI_DEVICE_ID_VESA),
.header_type= PCI_HEADER_TYPE_NORMAL,
.revision_id= 0,
-   .class  = 0x03,
-   .subsys_vendor_id   = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET,
-   .subsys_id  = PCI_SUBSYSTEM_ID_VESA,
-   .bar[1] = VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY,
+   .class[2]   = 0x03,
+   .subsys_vendor_id   = 
cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+   .subsys_id  = cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+   .bar[1] = cpu_to_le32(VESA_MEM_ADDR | 
PCI_BASE_ADDRESS_SPACE_MEMORY),
.bar_size[1]= VESA_MEM_SIZE,
 };
 
@@ -56,7 +57,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
vesa_pci_device.irq_pin = pin;
vesa_pci_device.irq_line= line;
vesa_base_addr  = ioport__register(IOPORT_EMPTY, 
&vesa_io_ops, IOPORT_SIZE, NULL);
-   vesa_

[PATCH 22/28] kvm tools: Move PCI_MAX_DEVICES to pci.h

2011-12-05 Thread Matt Evans
Other pieces of kvmtool may be interested in PCI_MAX_DEVICES.

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/pci.h |1 +
 tools/kvm/pci.c |1 -
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index f71af0b..b578ad7 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 
+#define PCI_MAX_DEVICES256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index d1afc05..920e13e 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -5,7 +5,6 @@
 
 #include 
 
-#define PCI_MAX_DEVICES256
 #define PCI_BAR_OFFSET(b)  (offsetof(struct pci_device_header, bar[b]))
 
 static struct pci_device_header*pci_devices[PCI_MAX_DEVICES];
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/28] kvm tools: Add --hugetlbfs option to specify memory path

2011-12-05 Thread Matt Evans
Some architectures may want to use hugetlbfs to mmap() their guest memory, so
allow a path to be specified on the commandline and pass it to kvm__arch_init().

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |4 +++-
 tools/kvm/include/kvm/kvm.h |4 ++--
 tools/kvm/kvm.c |4 ++--
 tools/kvm/x86/kvm.c |2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 84aa931..4c88169 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -84,6 +84,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *hugetlbfs_path;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -422,6 +423,7 @@ static const struct option options[] = {
OPT_CALLBACK('\0', "tty", NULL, "tty id",
 "Remap guest TTY into a pty on the host",
 tty_parser),
+   OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"),
 
OPT_GROUP("Kernel options:"),
OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -808,7 +810,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
guest_name = default_name;
}
 
-   kvm = kvm__init(dev, ram_size, guest_name);
+   kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name);
 
kvm->single_step = single_step;
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5fe6e75..7159952 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -30,7 +30,7 @@ struct kvm_ext {
 void kvm__set_dir(const char *fmt, ...);
 const char *kvm__get_dir(void);
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 int kvm__recommended_cpus(struct kvm *kvm);
 int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
@@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
 void kvm__arch_set_cmdline(char *cmdline, bool video);
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 6f33e1a..503ceae 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -272,7 +272,7 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
pr_warning("Failed sending PID");
 }
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name)
 {
struct kvm *kvm;
int ret;
@@ -305,7 +305,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
if (kvm__check_extensions(kvm))
die("A required KVM extention is not supported by OS");
 
-   kvm__arch_init(kvm, kvm_dev, ram_size, name);
+   kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name);
 
kvm->name = name;
 
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 4ac21c0..76f805f 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -161,7 +161,7 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 }
 
 /* Architecture-specific KVM init */
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name)
 {
struct kvm_pit_config pit_config = { .flags = 0, };
int ret;


[PATCH 20/28] kvm tools: Init IRQs after determining nrcpus

2011-12-05 Thread Matt Evans
IRQ init may involve per-CPU setup/allocation of resources, so make sure
kvm->nrcpus is initialised before calling irq__init().

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 576dcfa..84aa931 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -810,8 +810,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
kvm = kvm__init(dev, ram_size, guest_name);
 
-   irq__init(kvm);
-
kvm->single_step = single_step;
 
ioeventfd__init();
@@ -829,6 +827,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
kvm->nrcpus = nrcpus;
 
+   irq__init(kvm);
+
pci__init();
 
/*


[PATCH 19/28] kvm tools: Perform CPU and firmware setup after devices are added

2011-12-05 Thread Matt Evans
Currently some devices (in this case kbd, fb, vesa) are initialised after
CPU/firmware setup.  On some platforms (e.g. PPC) kvm__arch_setup_firmware() may
be making a device tree.  Any devices added after this point will be missed!

This is a small refactor of builtin-run.c that moves timer start, firmware
setup and CPU init so that they occur last.

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |   24 ++--
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 32e19e7..576dcfa 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -933,16 +933,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
virtio_net__init(&net_params);
}
 
-   kvm__start_timer(kvm);
-
-   kvm__arch_setup_firmware(kvm);
-
-   for (i = 0; i < nrcpus; i++) {
-   kvm_cpus[i] = kvm_cpu__init(kvm, i);
-   if (!kvm_cpus[i])
-   die("unable to initialize KVM VCPU");
-   }
-
kvm__init_ram(kvm);
 
 #ifdef CONFIG_X86
@@ -966,6 +956,20 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
fb__start();
 
+   /* Device init all done; firmware init must
+* come after this (it may set up device trees etc.)
+*/
+
+   kvm__start_timer(kvm);
+
+   kvm__arch_setup_firmware(kvm);
+
+   for (i = 0; i < nrcpus; i++) {
+   kvm_cpus[i] = kvm_cpu__init(kvm, i);
+   if (!kvm_cpus[i])
+   die("unable to initialize KVM VCPU");
+   }
+
thread_pool__init(nr_online_cpus);
ioeventfd__start();
 


[PATCH 18/28] kvm tools: Initialise PCI before devices start getting registered with PCI

2011-12-05 Thread Matt Evans
Re-arrange pci__init() in builtin-run such that it comes before devices are
initialised.

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index aaa5132..32e19e7 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -829,6 +829,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
kvm->nrcpus = nrcpus;
 
+   pci__init();
+
/*
 * vidmode should be either specified
 * either set by default
@@ -896,8 +898,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
serial8250__init(kvm);
 
-   pci__init();
-
if (active_console == CONSOLE_VIRTIO)
virtio_console__init(kvm);
 


[PATCH 17/28] kvm tools: Only call symbol__init() if we have BFD

2011-12-05 Thread Matt Evans
CONFIG_HAS_BFD is optional, and so is the inclusion of symbol.c, so make the
symbol__init() call dependent on CONFIG_HAS_BFD.

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 1257c90..aaa5132 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -798,8 +798,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
if (!script)
script = DEFAULT_SCRIPT;
 
+#ifdef CONFIG_HAS_BFD
symbol__init(vmlinux_filename);
-
+#endif
term_init();
 
if (!guest_name) {


[PATCH 16/28] kvm tools: Allow load_flat_binary() to load an initrd alongside

2011-12-05 Thread Matt Evans
This patch passes the initrd fd and commandline to load_flat_binary(), which may
be used to load both the kernel & an initrd (stashing or inserting the
commandline as appropriate) in the same way that load_bzimage() does.  This is
especially useful when load_bzimage() is unused for a particular
architecture. :-)

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/kvm.h |2 +-
 tools/kvm/kvm.c |   10 ++
 tools/kvm/x86/kvm.c |   12 +---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index fae2ba9..5fe6e75 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -59,7 +59,7 @@ void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
 
-int load_flat_binary(struct kvm *kvm, int fd);
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
 
 /*
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 457de1a..6f33e1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -354,23 +354,25 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
 
ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode);
 
-   if (initrd_filename)
-   close(fd_initrd);
-
if (ret)
goto found_kernel;
 
pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
 
-   ret = load_flat_binary(kvm, fd_kernel);
+   ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
+
if (ret)
goto found_kernel;
 
+   if (initrd_filename)
+   close(fd_initrd);
close(fd_kernel);
 
die("%s is not a valid bzImage or flat binary", kernel_filename);
 
 found_kernel:
+   if (initrd_filename)
+   close(fd_initrd);
close(fd_kernel);
 
return ret;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 7071dc6..4ac21c0 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -227,17 +227,23 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 #define BOOT_PROTOCOL_REQUIRED 0x206
 #define LOAD_HIGH  0x01
 
-int load_flat_binary(struct kvm *kvm, int fd)
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
 {
void *p;
int nr;
 
-   if (lseek(fd, 0, SEEK_SET) < 0)
+   /* Some architectures may support loading an initrd alongside the flat kernel,
+* but we do not.
+*/
+   if (fd_initrd != -1)
+   pr_warning("Loading initrd with flat binary not supported.");
+
+   if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-   while ((nr = read(fd, p, 65536)) > 0)
+   while ((nr = read(fd_kernel, p, 65536)) > 0)
p += nr;
 
kvm->boot_selector  = BOOT_LOADER_SELECTOR;


[PATCH 15/28] kvm tools: Allow initrd_check() to match a cpio

2011-12-05 Thread Matt Evans
cpio archives are valid initrds too, so allow them through the check.
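
A minimal sketch of the same check applied to an in-memory buffer rather
than a file descriptor.  The function name initrd_magic_ok() is hypothetical;
the constants mirror the ones in the patch.  The ASCII cpio variants
("070701" newc, "070702" crc, "070707" odc) all begin with "0707", which is
why a four-byte compare is enough for them; the old binary cpio format is
not matched by this test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 1952 gzip magic, as in the patch */
#define GZIP_ID1   0x1f
#define GZIP_ID2   0x8b
/* Common prefix of the ASCII cpio magics */
#define CPIO_MAGIC "0707"

/*
 * Hypothetical buffer-based variant of initrd_check(): returns true if
 * the first four bytes look like a gzip stream or an ASCII cpio archive.
 */
bool initrd_magic_ok(const unsigned char *buf, size_t len)
{
	if (len < 4)
		return false;

	if (buf[0] == GZIP_ID1 && buf[1] == GZIP_ID2)
		return true;

	return memcmp(buf, CPIO_MAGIC, 4) == 0;
}
```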

Signed-off-by: Matt Evans 
---
 tools/kvm/kvm.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 33243f1..457de1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -317,10 +317,11 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 /* RFC 1952 */
 #define GZIP_ID1   0x1f
 #define GZIP_ID2   0x8b
-
+#define CPIO_MAGIC "0707"
+/* initrd may be gzipped, or a plain cpio */
 static bool initrd_check(int fd)
 {
-   unsigned char id[2];
+   unsigned char id[4];
 
if (read_in_full(fd, id, ARRAY_SIZE(id)) < 0)
return false;
@@ -328,7 +329,8 @@ static bool initrd_check(int fd)
if (lseek(fd, 0, SEEK_SET) < 0)
die_perror("lseek");
 
-   return id[0] == GZIP_ID1 && id[1] == GZIP_ID2;
+   return (id[0] == GZIP_ID1 && id[1] == GZIP_ID2) ||
+   !memcmp(id, CPIO_MAGIC, 4);
 }
 
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,


[PATCH 14/28] kvm tools: Fix term_getc(), term_getc_iov() endian bugs

2011-12-05 Thread Matt Evans
term_getc()'s int c has one byte written into it (at its lowest address) by
read_in_full().  This is expected to be the least significant byte, but that
isn't the case on BE!  Use the correct type, unsigned char.  A similar issue exists
in term_getc_iov(), which needs to write a char to the iov rather than an int.
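
A minimal sketch of the problem, with memcpy() standing in for the one-byte
read_in_full() call.  Both function names are hypothetical.  The int is
zero-initialised here only so that the result is well defined; the point is
where the single byte lands.

```c
#include <string.h>

/*
 * Problematic pattern: one byte is stored at the lowest address of an
 * int.  On a little-endian host that is the least significant byte, so
 * the result equals 'byte'.  On a big-endian host with 32-bit int it is
 * the most significant byte, so the result is 'byte' shifted left by 24.
 */
int getc_into_int(unsigned char byte)
{
	int c = 0;

	memcpy(&c, &byte, 1);	/* stands in for read_in_full(fd, &c, 1) */
	return c;
}

/*
 * Fixed pattern: read into an unsigned char and let the normal integer
 * promotion produce the return value.  Endianness no longer matters.
 */
int getc_into_uchar(unsigned char byte)
{
	unsigned char c;

	memcpy(&c, &byte, 1);	/* stands in for read_in_full(fd, &c, 1) */
	return c;
}
```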

Signed-off-by: Matt Evans 
---
 tools/kvm/term.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index fb5d71c..440884e 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -30,11 +30,10 @@ int term_fds[4][2];
 
 int term_getc(int who, int term)
 {
-   int c;
+   unsigned char c;
 
if (who != active_console)
return -1;
-
if (read_in_full(term_fds[term][TERM_FD_IN], &c, 1) < 0)
return -1;
 
@@ -84,7 +83,7 @@ int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term)
if (c < 0)
return 0;
 
-   *((int *)iov[TERM_FD_IN].iov_base)  = c;
+   *((char *)iov[TERM_FD_IN].iov_base) = (char)c;
 
return sizeof(char);
 }


[PATCH 13/28] kvm tools: Add CONSOLE_HV term type and allow it to be selected

2011-12-05 Thread Matt Evans
This patch paves the way for adding a hypervisor console, useful on systems that
support one out of the box yet don't have either serial port or virtio console
support (e.g. kernels expecting POWER SPAPR).
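
A minimal sketch of the selection logic as a standalone function.  The name
console_from_name() is hypothetical, and unlike the patch, which uses
strncmp() prefix matches, this version requires an exact match and reports
an unknown name by returning 0 instead of printing a warning.

```c
#include <string.h>

#define CONSOLE_8250   1
#define CONSOLE_VIRTIO 2
#define CONSOLE_HV     3

/*
 * Hypothetical helper: map the --console argument to a console type.
 * Returns 0 when the name is not recognised.
 */
int console_from_name(const char *name)
{
	if (!strcmp(name, "virtio"))
		return CONSOLE_VIRTIO;
	if (!strcmp(name, "serial"))
		return CONSOLE_8250;
	if (!strcmp(name, "hv"))
		return CONSOLE_HV;

	return 0;
}
```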

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c  |8 ++--
 tools/kvm/include/kvm/term.h |1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index a67bd8c..1257c90 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -416,7 +416,7 @@ static const struct option options[] = {
OPT_BOOLEAN('\0', "rng", &virtio_rng, "Enable virtio Random Number Generator"),
OPT_CALLBACK('\0', "9p", NULL, "dir_to_share,tag_name",
 "Enable virtio 9p to share files between host and guest", virtio_9p_rootdir_parser),
-   OPT_STRING('\0', "console", &console, "serial or virtio",
+   OPT_STRING('\0', "console", &console, "serial, virtio or hv",
"Console to use"),
OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"),
OPT_CALLBACK('\0', "tty", NULL, "tty id",
@@ -776,8 +776,12 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
if (!strncmp(console, "virtio", 6))
active_console  = CONSOLE_VIRTIO;
-   else
+   else if (!strncmp(console, "serial", 6))
active_console  = CONSOLE_8250;
+   else if (!strncmp(console, "hv", 2))
+   active_console = CONSOLE_HV;
+   else
+   pr_warning("No console!");
 
if (!host_ip)
host_ip = DEFAULT_HOST_ADDR;
diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 938c26f..a6a9822 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -6,6 +6,7 @@
 
 #define CONSOLE_8250   1
 #define CONSOLE_VIRTIO 2
+#define CONSOLE_HV 3
 
 int term_putc_iov(int who, struct iovec *iov, int iovcnt, int term);
 int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term);


[PATCH 12/28] kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()

2011-12-05 Thread Matt Evans
Different systems will want different base kernel commandlines, e.g. non-x86
systems probably don't need noapic, i8042.* etc., so set the commandline up in
arch-specific code.  Then, if the resulting commandline is empty, don't strcat a
space onto the front.
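
A minimal sketch of the joining rule described above.  The helper
build_cmdline() is hypothetical and bounded by an explicit buffer size; it
differs slightly from the patch in that it adds the separating space only
when both the arch-specific part and the user part are non-empty.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: combine an arch-specific base command line with
 * the user-supplied one.  'outsz' must be greater than zero.  Output is
 * truncated if it does not fit.
 */
void build_cmdline(char *out, size_t outsz,
		   const char *arch_part, const char *user_part)
{
	size_t len;

	/* Start with the arch-specific part, which may be empty. */
	snprintf(out, outsz, "%s", arch_part ? arch_part : "");
	len = strlen(out);

	/* Append the user part, separated by a space only if needed. */
	if (user_part && *user_part)
		snprintf(out + len, outsz - len, "%s%s",
			 len ? " " : "", user_part);
}
```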

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |   12 +---
 tools/kvm/include/kvm/kvm.h |1 +
 tools/kvm/x86/kvm.c |   11 +++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9ef331e..a67bd8c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -835,13 +835,11 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
vidmode = 0;
 
memset(real_cmdline, 0, sizeof(real_cmdline));
-   strcpy(real_cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
-   "i8042.dumbkbd=1 i8042.nopnp=1");
-   if (vnc || sdl) {
-   strcat(real_cmdline, " video=vesafb console=tty0");
-   } else
-   strcat(real_cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
-   strcat(real_cmdline, " ");
+   kvm__arch_set_cmdline(real_cmdline, vnc || sdl);
+
+   if (strlen(real_cmdline) > 0)
+   strcat(real_cmdline, " ");
+
if (kernel_cmdline)
strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 60842d5..fae2ba9 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -53,6 +53,7 @@ int kvm__get_sock_by_instance(const char *name);
 int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
+void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 45dcb77..7071dc6 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -149,6 +149,17 @@ void kvm__init_ram(struct kvm *kvm)
}
 }
 
+/* Arch-specific commandline setup */
+void kvm__arch_set_cmdline(char *cmdline, bool video)
+{
+   strcpy(cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
+   "i8042.dumbkbd=1 i8042.nopnp=1");
+   if (video) {
+   strcat(cmdline, " video=vesafb console=tty0");
+   } else
+   strcat(cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
+}
+
 /* Architecture-specific KVM init */
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
 {


[PATCH 11/28] kvm tools: kvm.c needs to include sys/stat.h for mkdir

2011-12-05 Thread Matt Evans
Fix a missing include.

Signed-off-by: Matt Evans 
---
 tools/kvm/kvm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index e526483..33243f1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -8,6 +8,7 @@
 #include 
 
 #include 
+#include <sys/stat.h>
 #include 
 #include 
 #include 


[PATCH 10/28] kvm tools: term.h needs to include stdbool.h

2011-12-05 Thread Matt Evans
Fix a missing include.

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/term.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 37ec731..938c26f 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -2,6 +2,7 @@
 #define KVM__TERM_H
 
 #include 
+#include <stdbool.h>
 
 #define CONSOLE_8250   1
 #define CONSOLE_VIRTIO 2


[PATCH 09/28] kvm tools: Add kvm__arch_periodic_poll()

2011-12-05 Thread Matt Evans
Currently, the SIGALRM handler calls device poll functions (for serial, virtio
console) directly.  Which devices are present and which require polling is a
system-specific decision, so create a new function called from common code &
move the x86-specific poll calls into it.

Signed-off-by: Matt Evans 
---
 tools/kvm/builtin-run.c |3 +--
 tools/kvm/include/kvm/kvm.h |1 +
 tools/kvm/x86/kvm.c |8 
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7cf208d..9ef331e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -522,8 +522,7 @@ static void handle_debug(int fd, u32 type, u32 len, u8 *msg)
 
 static void handle_sigalrm(int sig)
 {
-   serial8250__inject_interrupt(kvm);
-   virtio_console__inject_interrupt(kvm);
+   kvm__arch_periodic_poll(kvm);
 }
 
 static void handle_stop(int fd, u32 type, u32 len, u8 *msg)
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index ca1acc0..60842d5 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -56,6 +56,7 @@ void kvm__remove_socket(const char *name);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
+void kvm__arch_periodic_poll(struct kvm *kvm);
 
 int load_flat_binary(struct kvm *kvm, int fd);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 75e4a52..45dcb77 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -4,6 +4,8 @@
 #include "kvm/interrupt.h"
 #include "kvm/mptable.h"
 #include "kvm/util.h"
+#include "kvm/8250-serial.h"
+#include "kvm/virtio-console.h"
 
 #include 
 #include 
@@ -358,3 +360,9 @@ void kvm__arch_setup_firmware(struct kvm *kvm)
/* MP table */
mptable_setup(kvm, kvm->nrcpus);
 }
+
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+   serial8250__inject_interrupt(kvm);
+   virtio_console__inject_interrupt(kvm);
+}


[PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-05 Thread Matt Evans
kvm_cpu__run() currently die()s if KVM_RUN returns non-zero.  Some architectures
may return positive values in non-error cases, whereas real errors are always
negative return values.  Check for those instead.
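
A minimal sketch of the check as a pure predicate, so the three cases
(success, positive non-error value, interrupted call) can be seen side by
side.  The name kvm_run_failed() is hypothetical; it takes the ioctl()
return value and the errno observed right after the call.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: decide whether a KVM_RUN ioctl() result is a
 * real failure.  Zero and positive values are not errors; a negative
 * value is an error unless the call was merely interrupted (EINTR) or
 * should be retried (EAGAIN).
 */
bool kvm_run_failed(int ret, int err)
{
	if (ret >= 0)
		return false;

	return err != EINTR && err != EAGAIN;
}
```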

Signed-off-by: Matt Evans 
---
 tools/kvm/kvm-cpu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 9bc0796..884a89f 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -30,7 +30,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu)
int err;
 
err = ioctl(vcpu->vcpu_fd, KVM_RUN, 0);
-   if (err && (errno != EINTR && errno != EAGAIN))
+   if (err < 0 && (errno != EINTR && errno != EAGAIN))
die_perror("KVM_RUN failed");
 }
 


[PATCH 07/28] kvm tools: Move 'kvm__recommended_cpus' to arch-specific code

2011-12-05 Thread Matt Evans
Architectures can recommend/count/determine the number of CPUs differently, so
move this out of generic code.

Signed-off-by: Matt Evans 
---
 tools/kvm/kvm.c |   30 --
 tools/kvm/x86/kvm.c |   30 ++
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 7ce1640..e526483 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -259,17 +259,6 @@ void kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspac
die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
-int kvm__recommended_cpus(struct kvm *kvm)
-{
-   int ret;
-
-   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
-   if (ret <= 0)
-   die_perror("KVM_CAP_NR_VCPUS");
-
-   return ret;
-}
-
 static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 {
pid_t pid = getpid();
@@ -282,25 +271,6 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
pr_warning("Failed sending PID");
 }
 
-/*
- * The following hack should be removed once 'x86: Raise the hard
- * VCPU count limit' makes it's way into the mainline.
- */
-#ifndef KVM_CAP_MAX_VCPUS
-#define KVM_CAP_MAX_VCPUS 66
-#endif
-
-int kvm__max_cpus(struct kvm *kvm)
-{
-   int ret;
-
-   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
-   if (ret <= 0)
-   ret = kvm__recommended_cpus(kvm);
-
-   return ret;
-}
-
 struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 {
struct kvm *kvm;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index ac6c91e..75e4a52 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -76,6 +76,36 @@ bool kvm__arch_cpu_supports_vm(void)
return regs.ecx & (1 << feature);
 }
 
+int kvm__recommended_cpus(struct kvm *kvm)
+{
+   int ret;
+
+   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
+   if (ret <= 0)
+   die_perror("KVM_CAP_NR_VCPUS");
+
+   return ret;
+}
+
+/*
+ * The following hack should be removed once 'x86: Raise the hard
+ * VCPU count limit' makes it's way into the mainline.
+ */
+#ifndef KVM_CAP_MAX_VCPUS
+#define KVM_CAP_MAX_VCPUS 66
+#endif
+
+int kvm__max_cpus(struct kvm *kvm)
+{
+   int ret;
+
+   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
+   if (ret <= 0)
+   ret = kvm__recommended_cpus(kvm);
+
+   return ret;
+}
+
 /*
  * Allocating RAM size bigger than 4GB requires us to leave a gap
  * in the RAM which is used for PCI MMIO, hotplug, and unconfigured


[PATCH 06/28] kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()

2011-12-05 Thread Matt Evans
This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.
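
A minimal sketch of the dispatch pattern, using made-up exit reasons rather
than the real KVM_EXIT_* values.  All names here are hypothetical.  The
common loop handles what it knows and hands everything else to an arch hook,
which returns false when it does not recognise the reason either.

```c
#include <stdbool.h>

/* Illustrative exit reasons, not the real KVM_EXIT_* constants. */
enum exit_reason { EXIT_IO, EXIT_SHUTDOWN, EXIT_ARCH_SPECIFIC, EXIT_BOGUS };
enum loop_action { LOOP_CONTINUE, LOOP_EXIT, LOOP_PANIC };

/*
 * Stand-in for an arch hook: returns true if the exit was handled.
 * The x86 version in the patch handles nothing and returns false.
 */
bool arch_handle_exit(enum exit_reason reason)
{
	return reason == EXIT_ARCH_SPECIFIC;
}

/* Stand-in for the switch in the common run loop. */
enum loop_action dispatch_exit(enum exit_reason reason)
{
	switch (reason) {
	case EXIT_IO:
		return LOOP_CONTINUE;
	case EXIT_SHUTDOWN:
		return LOOP_EXIT;
	default:
		/* Unknown to common code: give the arch a chance first. */
		return arch_handle_exit(reason) ? LOOP_CONTINUE : LOOP_PANIC;
	}
}
```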

Signed-off-by: Matt Evans 
---
 tools/kvm/include/kvm/kvm-cpu.h |2 ++
 tools/kvm/kvm-cpu.c |   10 --
 tools/kvm/x86/kvm-cpu.c |5 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 719e286..15618f1 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -2,6 +2,7 @@
 #define KVM__KVM_CPU_H
 
 #include "kvm/kvm-cpu-arch.h"
+#include <stdbool.h>
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
 void kvm_cpu__delete(struct kvm_cpu *vcpu);
@@ -11,6 +12,7 @@ void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu);
 void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 5aba3bb..9bc0796 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -137,8 +137,14 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
goto exit_kvm;
case KVM_EXIT_SHUTDOWN:
goto exit_kvm;
-   default:
-   goto panic_kvm;
+   default: {
+   bool ret;
+
+   ret = kvm_cpu__handle_exit(cpu);
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
}
kvm_cpu__handle_coalesced_mmio(cpu);
}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index b26b208..a0d10cc 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -212,6 +212,11 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
kvm_cpu__setup_msrs(vcpu);
 }
 
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
+{
+   return false;
+}
+
 static void print_dtable(const char *name, struct kvm_dtable *dtable)
 {
dprintf(debug_fd, " %s %016llx  %08hx\n",


[PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-05 Thread Matt Evans
On LP64 systems our u64s are just longs; remove the %llx'es in favour of PRIx64
etc.
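
A minimal sketch of the portable form, assuming a fixed-width uint64_t in
place of the tool's u64 typedef.  The helper format_shmem_addr() is
hypothetical.  PRIx64 expands to the right length modifier for the platform
("lx" where a 64-bit integer is a long, "llx" where it is a long long), so
the same format string is correct on both.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit physical address in hex without
 * assuming that a 64-bit integer is 'unsigned long long'.
 * Returns the number of characters that snprintf() wanted to write.
 */
int format_shmem_addr(char *buf, size_t len, uint64_t phys_addr)
{
	return snprintf(buf, len, "shmem: phys_addr = %" PRIx64, phys_addr);
}
```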

This patch also adds CFLAGS to the final link, so that any -m64 is obeyed when
linking, too.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile   |2 +-
 tools/kvm/builtin-run.c  |   14 --
 tools/kvm/builtin-stat.c |4 +++-
 tools/kvm/disk/core.c|4 +++-
 tools/kvm/mmio.c |4 +++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 009a6ba..57dc521 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -218,7 +218,7 @@ KVMTOOLS-VERSION-FILE:
 
 $(PROGRAM): $(DEPS) $(OBJS)
$(E) "  LINK" $@
-   $(Q) $(CC) $(OBJS) $(LIBS) -o $@
+   $(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
 
 $(GUEST_INIT): guest/init.c
$(E) "  LINK" $@
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index e4aa87e..7cf208d 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 #include 
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 #include 
 #include 
 
@@ -383,8 +385,8 @@ static int shmem_parser(const struct option *opt, const char *arg, int unset)
strcpy(handle, default_handle);
}
if (verbose) {
-   pr_info("shmem: phys_addr = %llx", phys_addr);
-   pr_info("shmem: size  = %llx", size);
+   pr_info("shmem: phys_addr = %"PRIx64, phys_addr);
+   pr_info("shmem: size  = %"PRIx64, size);
pr_info("shmem: handle= %s", handle);
pr_info("shmem: create= %d", create);
}
@@ -545,7 +547,7 @@ panic_kvm:
current_kvm_cpu->kvm_run->exit_reason,
kvm_exit_reasons[current_kvm_cpu->kvm_run->exit_reason]);
if (current_kvm_cpu->kvm_run->exit_reason == KVM_EXIT_UNKNOWN)
-   fprintf(stderr, "KVM exit code: 0x%Lu\n",
+   fprintf(stderr, "KVM exit code: 0x%"PRIx64"\n",
current_kvm_cpu->kvm_run->hw.hardware_exit_reason);
 
kvm_cpu__set_debug_fd(STDOUT_FILENO);
@@ -760,10 +762,10 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
ram_size= get_ram_size(nrcpus);
 
if (ram_size < MIN_RAM_SIZE_MB)
-   die("Not enough memory specified: %lluMB (min %lluMB)", 
ram_size, MIN_RAM_SIZE_MB);
+   die("Not enough memory specified: %"PRIu64"MB (min %lluMB)", 
ram_size, MIN_RAM_SIZE_MB);
 
if (ram_size > host_ram_size())
-   pr_warning("Guest memory size %lluMB exceeds host physical RAM 
size %lluMB", ram_size, host_ram_size());
+   pr_warning("Guest memory size %"PRIu64"MB exceeds host physical 
RAM size %"PRIu64"MB", ram_size, host_ram_size());
 
ram_size <<= MB_SHIFT;
 
@@ -878,7 +880,7 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
virtio_blk__init_all(kvm);
}
 
-   printf("  # kvm run -k %s -m %Lu -c %d --name %s\n", kernel_filename, 
ram_size / 1024 / 1024, nrcpus, guest_name);
+   printf("  # kvm run -k %s -m %"PRId64" -c %d --name %s\n", 
kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
 
if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename,
real_cmdline, vidmode))
diff --git a/tools/kvm/builtin-stat.c b/tools/kvm/builtin-stat.c
index e28eb5b..c1f2605 100644
--- a/tools/kvm/builtin-stat.c
+++ b/tools/kvm/builtin-stat.c
@@ -9,6 +9,8 @@
 #include 
 #include 
 #include 
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #include 
 
@@ -97,7 +99,7 @@ static int do_memstat(const char *name, int sock)
printf("The total amount of memory available (in 
bytes):");
break;
}
-   printf("%llu\n", stats[i].val);
+   printf("%"PRId64"\n", stats[i].val);
}
printf("\n");
 
diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index 4915efd..a135851 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -4,6 +4,8 @@
 
 #include 
 #include 
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #define AIO_MAX 32
 
@@ -232,7 +234,7 @@ ssize_t disk_image__get_serial(struct disk_image *disk, 
void *buffer, ssize_t *l
if (fstat(disk->fd, &st) != 0)
return 0;
 
-   *len = snprintf(buffer, *len, "%llu%llu%llu", (u64)st.st_dev, 
(u64)st.st_rdev, (u64)st.st_ino);
+   *len = snprintf(buffer, *len, "%"PRId64"%"PRId64"%"PRId64, 
(u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
return *len;
 }
 
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index de7320f..1158bff 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -9,6 +9,8 @@
 #include 
 #include 
 #include 
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
@@ -124,7 +126,7 @@ bool kvm__emulate_mmio(struct kvm *k

[PATCH 04/28] kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs

2011-12-05 Thread Matt Evans
The checks for optional libraries build code to perform the tests, so should
respect certain CFLAGS -- in particular, -m64 so we check for 64bit libraries if
they're required.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile |   86 ++-
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f85a154..009a6ba 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,48 +85,6 @@ OBJS += hw/vesa.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
 
-FLAGS_BFD := $(CFLAGS) -lbfd
-has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
-ifeq ($(has_bfd),y)
-   CFLAGS  += -DCONFIG_HAS_BFD
-   OBJS+= symbol.o
-   LIBS+= -lbfd
-endif
-
-FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
-has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
-ifeq ($(has_vncserver),y)
-   OBJS+= ui/vnc.o
-   CFLAGS  += -DCONFIG_HAS_VNCSERVER
-   LIBS+= -lvncserver
-endif
-
-FLAGS_SDL := $(CFLAGS) -lSDL
-has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
-ifeq ($(has_SDL),y)
-   OBJS+= ui/sdl.o
-   CFLAGS  += -DCONFIG_HAS_SDL
-   LIBS+= -lSDL
-endif
-
-FLAGS_ZLIB := $(CFLAGS) -lz
-has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
-ifeq ($(has_ZLIB),y)
-   CFLAGS  += -DCONFIG_HAS_ZLIB
-   LIBS+= -lz
-endif
-
-FLAGS_AIO := $(CFLAGS) -laio
-has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
-ifeq ($(has_AIO),y)
-   CFLAGS  += -DCONFIG_HAS_AIO
-   LIBS+= -laio
-endif
-
-LIBS   += -lrt
-LIBS   += -lpthread
-LIBS   += -lutil
-
 # Additional ARCH settings for x86
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
   -e s/arm.*/arm/ -e s/sa110/arm/ \
@@ -172,6 +130,50 @@ else
UNSUPP_ERR =
 endif
 
+
+FLAGS_BFD := $(CFLAGS) -lbfd
+has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
+ifeq ($(has_bfd),y)
+   CFLAGS  += -DCONFIG_HAS_BFD
+   OBJS+= symbol.o
+   LIBS+= -lbfd
+endif
+
+FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
+has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
+ifeq ($(has_vncserver),y)
+   OBJS+= ui/vnc.o
+   CFLAGS  += -DCONFIG_HAS_VNCSERVER
+   LIBS+= -lvncserver
+endif
+
+FLAGS_SDL := $(CFLAGS) -lSDL
+has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
+ifeq ($(has_SDL),y)
+   OBJS+= ui/sdl.o
+   CFLAGS  += -DCONFIG_HAS_SDL
+   LIBS+= -lSDL
+endif
+
+FLAGS_ZLIB := $(CFLAGS) -lz
+has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
+ifeq ($(has_ZLIB),y)
+   CFLAGS  += -DCONFIG_HAS_ZLIB
+   LIBS+= -lz
+endif
+
+FLAGS_AIO := $(CFLAGS) -laio
+has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
+ifeq ($(has_AIO),y)
+   CFLAGS  += -DCONFIG_HAS_AIO
+   LIBS+= -laio
+endif
+
+LIBS   += -lrt
+LIBS   += -lpthread
+LIBS   += -lutil
+
+
 DEPS   := $(patsubst %.o,%.d,$(OBJS))
 OBJS   += $(OTHEROBJS)
 


[PATCH 03/28] kvm tools: Add Makefile parameter for kernel include path

2011-12-05 Thread Matt Evans
This patch adds an 'I' parameter to override the default kernel include path of
'../../include'.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f58a1d8..f85a154 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -9,7 +9,12 @@ else
E = @\#
Q =
 endif
-export E Q
+ifneq ($(I), )
+   KINCL_PATH=$(I)
+else
+   KINCL_PATH=../..
+endif
+export E Q KINCL_PATH
 
 include config/utilities.mak
 include config/feature-tests.mak
@@ -176,7 +181,7 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
 DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
 
 KVM_INCLUDE := include
-CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) 
-I../../include -I../../arch/$(ARCH)/include/ -Os -g
+CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) 
-I$(KINCL_PATH)/include -I$(KINCL_PATH)/arch/$(ARCH)/include/ -Os -g
 
 ifneq ($(WERROR),0)
WARNINGS += -Werror


[PATCH 02/28] kvm tools: Only build/init i8042 on x86

2011-12-05 Thread Matt Evans
Not every architecture has an i8042 kbd controller, so only use this when
building for x86.

Signed-off-by: Matt Evans 
---
 tools/kvm/Makefile  |2 +-
 tools/kvm/builtin-run.c |2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 243886e..f58a1d8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -77,7 +77,6 @@ OBJS  += util/strbuf.o
 OBJS   += virtio/9p.o
 OBJS   += virtio/9p-pdu.o
 OBJS   += hw/vesa.o
-OBJS   += hw/i8042.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
 
@@ -153,6 +152,7 @@ ifeq ($(ARCH),x86)
OBJS+= x86/kvm.o
OBJS+= x86/kvm-cpu.o
OBJS+= x86/mptable.o
+   OBJS+= hw/i8042.o
 # Exclude BIOS object files from header dependencies.
OTHEROBJS   += x86/bios.o
OTHEROBJS   += x86/bios/bios-rom.o
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9148d83..e4aa87e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -941,7 +941,9 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
kvm__init_ram(kvm);
 
+#ifdef CONFIG_X86
kbd__init(kvm);
+#endif
 
pci_shmem__init(kvm);
 


[PATCH 00/28] kvm tools: Prepare kvmtool for another architecture

2011-12-05 Thread Matt Evans
Hi,


This patch series rearranges and tidies various parts of kvmtool to pave the way
for the addition of support for another architecture -- SPAPR PPC64.  A second
patch series will follow to present the PPC64 support.

kvmtool is extremely x86-specific, so a fair chunk of refactoring into "common
code" vs "architecture-specific code" is performed in this set.  It also has a
(refreshingly small) set of endian bugs that are fixed, plus assumptions about
the hardware presented to the guest.

I've started the series with the main meat-- moving/renaming things like bios,
CPU setup, guest address space layout, interrupts, ioports etc., into a new x86/
directory.  The Makefile determines an architecture and builds the appropriate
dir, devices, etc.

Follow-on patches change some of the mechanics, for example modifying the loop
around ioctl(KVM_RUN) so that whilst it stays generic, it calls into
arch-specific code to handle specific exit reasons, MMIO etc.  The builtin-run
initialisation path is rationalised so that PCI & IRQs are initialised before
devices, and all of this happens before arch-specific code is given the chance
to initialise any firmware and generate any device trees.

Most of this series is fairly trivial, in moving code, making definitions
arch-local or available via a header, endian sanitisation.  The PCI code changes
are probably most 'interesting', in that I have made the config space accesses
available to those not using the PC ioport access method, plus wrapped
initialisations of config space with cpu_to_leXX accesses.
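
For readers unfamiliar with the endian point above, here is a minimal sketch,
assuming nothing about kvmtool's actual helpers: PCI config space is
little-endian, so a big-endian host has to store multi-byte fields byte by byte
(or through a cpu_to_leXX-style conversion) rather than with a native store.
The function names put_le16() and get_le16() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value in little-endian byte order,
 * independent of the host's endianness. */
void put_le16(uint8_t *dst, uint16_t val)
{
	dst[0] = (uint8_t)(val & 0xff);   /* least significant byte first */
	dst[1] = (uint8_t)(val >> 8);
}

/* Hypothetical helper: read a little-endian 16-bit value back. */
uint16_t get_le16(const uint8_t *src)
{
	return (uint16_t)(src[0] | (src[1] << 8));
}
```

Storing a vendor ID such as 0x1af4 this way yields the bytes f4 1a in the
config space image on any host.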

If there's anything in this series that'll cause the world to end, or stain, do
let me know. :)


Cheers,


Matt



Matt Evans (28):
  kvm tools: Split x86 arch-specific bits into x86/
  kvm tools: Only build/init i8042 on x86
  kvm tools: Add Makefile parameter for kernel include path
  kvm tools: Re-arrange Makefile to heed CFLAGS before checking for
optional libs
  kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link
appropriately
  kvm tools: Add arch-specific KVM_RUN exit handling via
kvm_cpu__handle_exit()
  kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
  kvm tools: Fix KVM_RUN exit code check
  kvm tools: Add kvm__arch_periodic_poll()
  kvm tools: term.h needs to include stdbool.h
  kvm tools: kvm.c needs to include sys/stat.h for mkdir
  kvm tools: Move arch-specific cmdline init into
kvm__arch_set_cmdline()
  kvm tools: Add CONSOLE_HV term type and allow it to be selected
  kvm tools: Fix term_getc(), term_getc_iov() endian bugs
  kvm tools: Allow initrd_check() to match a cpio
  kvm tools: Allow load_flat_binary() to load an initrd alongside
  kvm tools: Only call symbol__init() if we have BFD
  kvm tools: Initialise PCI before devices start getting registered
with PCI
  kvm tools: Perform CPU and firmware setup after devices are added
  kvm tools: Init IRQs after determining nrcpus
  kvm tools: Add --hugetlbfs option to specify memory path
  kvm tools: Move PCI_MAX_DEVICES to pci.h
  kvm tools: Endian-sanitise pci.h and PCI device setup
  kvm tools: Fix virtio-pci endian bug when reading
VIRTIO_PCI_QUEUE_NUM
  kvm tools: Correctly set virtio-pci bar_size and remove hardwired
address
  kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI
config register addressing
  kvm tools: Arch-specific define for PCI MMIO allocation area
  kvm tools: Create arch-specific kvm_cpu__emulate_io()

 tools/kvm/Makefile  |  139 +---
 tools/kvm/builtin-run.c |   82 +++--
 tools/kvm/builtin-stat.c|4 +-
 tools/kvm/disk/core.c   |4 +-
 tools/kvm/hw/pci-shmem.c|   23 +-
 tools/kvm/hw/vesa.c |   15 +-
 tools/kvm/include/kvm/ioport.h  |   13 +-
 tools/kvm/include/kvm/kvm-cpu.h |   30 +--
 tools/kvm/include/kvm/kvm.h |   62 +---
 tools/kvm/include/kvm/pci.h |   30 ++-
 tools/kvm/include/kvm/term.h|2 +
 tools/kvm/ioport.c  |   54 ---
 tools/kvm/kvm-cpu.c |  407 +-
 tools/kvm/kvm.c |  374 +---
 tools/kvm/mmio.c|4 +-
 tools/kvm/pci.c |   76 +++--
 tools/kvm/term.c|5 +-
 tools/kvm/virtio/pci.c  |   51 ++--
 tools/kvm/{ => x86}/bios.c  |0
 tools/kvm/{ => x86}/bios/.gitignore |0
 tools/kvm/{ => x86}/bios/bios-rom.S |2 +-
 tools/kvm/{ => x86}/bios/e820.c |0
 tools/kvm/{ => x86}/bios/entry.S|0
 tools/kvm/{ => x86}/bios/gen-offsets.sh |0
 tools/kvm/{ => x86}/bios/int10.c|0
 tools/kvm/{ => x86}/bios/

Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-12-05 Thread Konrad Rzeszutek Wilk
On Wed, Nov 30, 2011 at 02:30:38PM +0530, Raghavendra K T wrote:
> This patch extends Linux guests running on KVM hypervisor to support
> pv-ticketlocks. 
> During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
> required feature (KVM_FEATURE_KICK_VCPU) to support pv-ticketlocks. If so,
>  support for pv-ticketlocks is registered via pv_lock_ops.
> 
> Signed-off-by: Srivatsa Vaddagiri 
> Signed-off-by: Suzuki Poulose 
> Signed-off-by: Raghavendra K T 
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 8b1d65d..7e419ad 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
>  extern void kvm_disable_steal_time(void);
> -#else
> -#define kvm_guest_init() do { } while (0)
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +void __init kvm_spinlock_init(void);
> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
> +static void kvm_spinlock_init(void)
> +{
> +}
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
> +
> +#else /* CONFIG_KVM_GUEST */
> +#define kvm_guest_init() do {} while (0)
>  #define kvm_async_pf_task_wait(T) do {} while(0)
>  #define kvm_async_pf_task_wake(T) do {} while(0)
> +#define kvm_spinlock_init() do {} while (0)
> +
>  static inline u32 kvm_read_and_reset_pf_reason(void)
>  {
>   return 0;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index a9c2116..dffeea3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
>  #endif
>   kvm_guest_cpu_init();
>   native_smp_prepare_boot_cpu();
> + kvm_spinlock_init();
>  }
>  
>  static void __cpuinit kvm_guest_cpu_online(void *dummy)
> @@ -627,3 +629,248 @@ static __init int activate_jump_labels(void)
>   return 0;
>  }
>  arch_initcall(activate_jump_labels);
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +
> +enum kvm_contention_stat {
> + TAKEN_SLOW,
> + TAKEN_SLOW_PICKUP,
> + RELEASED_SLOW,
> + RELEASED_SLOW_KICKED,
> + NR_CONTENTION_STATS
> +};
> +
> +#ifdef CONFIG_KVM_DEBUG_FS
> +
> +static struct kvm_spinlock_stats
> +{
> + u32 contention_stats[NR_CONTENTION_STATS];
> +
> +#define HISTO_BUCKETS	30
> + u32 histo_spin_blocked[HISTO_BUCKETS+1];
> +
> + u64 time_blocked;
> +} spinlock_stats;
> +
> +static u8 zero_stats;
> +
> +static inline void check_zero(void)
> +{
> + u8 ret;
> + u8 old = ACCESS_ONCE(zero_stats);
> + if (unlikely(old)) {
> + ret = cmpxchg(&zero_stats, old, 0);
> + /* This ensures only one fellow resets the stat */
> + if (ret == old)
> + memset(&spinlock_stats, 0, sizeof(spinlock_stats));
> + }
> +}
> +
> +static inline void add_stats(enum kvm_contention_stat var, int val)

You probably want 'int val' to be 'u32 val' as that is the type
in contention_stats.


[no subject]

2011-12-05 Thread Cao,Bing Bu

subscribe kvm



Re: [Xen-devel] [PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs

2011-12-05 Thread Konrad Rzeszutek Wilk
On Wed, Nov 30, 2011 at 02:29:39PM +0530, Raghavendra K T wrote:
> Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to 
> debugfs
> to make the code common for other users as well.
> 
> Signed-off-by: Srivatsa Vaddagiri 
> Signed-off-by: Suzuki Poulose 
> Signed-off-by: Raghavendra K T 

Looks good to me.
> ---
> diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
> index 7c0fedd..c8377fb 100644
> --- a/arch/x86/xen/debugfs.c
> +++ b/arch/x86/xen/debugfs.c
> @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
>   return d_xen_debug;
>  }
>  
> -struct array_data
> -{
> - void *array;
> - unsigned elements;
> -};
> -
> -static int u32_array_open(struct inode *inode, struct file *file)
> -{
> - file->private_data = NULL;
> - return nonseekable_open(inode, file);
> -}
> -
> -static size_t format_array(char *buf, size_t bufsize, const char *fmt,
> -u32 *array, unsigned array_size)
> -{
> - size_t ret = 0;
> - unsigned i;
> -
> - for(i = 0; i < array_size; i++) {
> - size_t len;
> -
> - len = snprintf(buf, bufsize, fmt, array[i]);
> - len++;  /* ' ' or '\n' */
> - ret += len;
> -
> - if (buf) {
> - buf += len;
> - bufsize -= len;
> - buf[-1] = (i == array_size-1) ? '\n' : ' ';
> - }
> - }
> -
> - ret++;  /* \0 */
> - if (buf)
> - *buf = '\0';
> -
> - return ret;
> -}
> -
> -static char *format_array_alloc(const char *fmt, u32 *array, unsigned 
> array_size)
> -{
> - size_t len = format_array(NULL, 0, fmt, array, array_size);
> - char *ret;
> -
> - ret = kmalloc(len, GFP_KERNEL);
> - if (ret == NULL)
> - return NULL;
> -
> - format_array(ret, len, fmt, array, array_size);
> - return ret;
> -}
> -
> -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t 
> len,
> -   loff_t *ppos)
> -{
> - struct inode *inode = file->f_path.dentry->d_inode;
> - struct array_data *data = inode->i_private;
> - size_t size;
> -
> - if (*ppos == 0) {
> - if (file->private_data) {
> - kfree(file->private_data);
> - file->private_data = NULL;
> - }
> -
> - file->private_data = format_array_alloc("%u", data->array, 
> data->elements);
> - }
> -
> - size = 0;
> - if (file->private_data)
> - size = strlen(file->private_data);
> -
> - return simple_read_from_buffer(buf, len, ppos, file->private_data, 
> size);
> -}
> -
> -static int xen_array_release(struct inode *inode, struct file *file)
> -{
> - kfree(file->private_data);
> -
> - return 0;
> -}
> -
> -static const struct file_operations u32_array_fops = {
> - .owner  = THIS_MODULE,
> - .open   = u32_array_open,
> - .release= xen_array_release,
> - .read   = u32_array_read,
> - .llseek = no_llseek,
> -};
> -
> -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
> - struct dentry *parent,
> - u32 *array, unsigned elements)
> -{
> - struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
> -
> - if (data == NULL)
> - return NULL;
> -
> - data->array = array;
> - data->elements = elements;
> -
> - return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
> -}
> diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
> index e281320..12ebf33 100644
> --- a/arch/x86/xen/debugfs.h
> +++ b/arch/x86/xen/debugfs.h
> @@ -3,8 +3,4 @@
>  
>  struct dentry * __init xen_init_debugfs(void);
>  
> -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
> - struct dentry *parent,
> - u32 *array, unsigned elements);
> -
>  #endif /* _XEN_DEBUGFS_H */
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index fc506e6..14a8961 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
>   debugfs_create_u64("time_blocked", 0444, d_spin_debug,
>  &spinlock_stats.time_blocked);
>  
> - xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
>spinlock_stats.histo_spin_blocked, 
> HISTO_BUCKETS + 1);
>  
>   return 0;
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index 90f7657..df44ccf 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static ssize_t default_read_file(struct file *file, char __user *buf,
> 

kvm deadlock

2011-12-05 Thread Nate Custer
Hello,

I am struggling with repeatable full hardware locks when running 8-12 KVM VMs.
At some point before the hard lock I get an inconsistent lock state warning. An
example of this can be found here:

http://pastebin.com/8wKhgE2C

After that the server continues to run for a while and then starts its death 
spiral. When it reaches that point it fails to log anything further to the 
disk, but by attaching a console I have been able to get a stack trace 
documenting the final implosion:

http://pastebin.com/PbcN76bd

All of the cores end up hung and the server stops responding to all input, 
including SysRq commands. 

I have seen this behavior on two machines (dual E5606 running Fedora 16); both
passed cpuburnin testing and memtest86 scans without error.

I have reproduced the crash and stack traces with a Fedora debugging kernel
(3.1.2-1) and with a vanilla 3.1.4 kernel.

Nate Custer
QA Analyst
cPanel Inc


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Ben Hutchings
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
> In order to let the packets of a flow to be passed to the desired
> guest cpu, we can co-operate with devices through programming the flow
> director which was just a hash to queue table.
> 
> This kinds of co-operation is done through the accelerate RFS support,
> a device specific flow sterring method virtnet_fd() is used to modify
> the flow director based on rfs mapping. The desired queue were
> calculated through reverse mapping of the irq affinity table. In order
> to parallelize the ingress path, irq affinity of rx queue were also
> provides by the driver.
> 
> In addition to accelerate RFS, we can also use the guest scheduler to
> balance the load of TX and reduce the lock contention on egress path,
> so the processor_id() were used to tx queue selection.
[...]
> +#ifdef CONFIG_RFS_ACCEL
> +
> +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
> +u16 rxq_index, u32 flow_id)
> +{
> + struct virtnet_info *vi = netdev_priv(net_dev);
> + u16 *table = NULL;
> +
> + if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
> + return -EPROTONOSUPPORT;

Why only IPv4?

> + table = kmap_atomic(vi->fd_page);
> + table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
> + kunmap_atomic(table);
> +
> + return 0;
> +}
> +#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.
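
The three points above can be sketched in user-space C. This is only an
illustration under stated assumptions, not the virtio-net implementation:
struct flow_key, struct flow_table, flow_table_steer() and the two example
callbacks are hypothetical names, and the may_expire callback is a simplified
stand-in for rps_may_expire_flow() (the real function also takes the
net_device). The sketch keeps one exact-match entry per hash slot, refuses to
replace a different live flow, and returns the slot index as the filter ID.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOW_TABLE_SIZE 256u	/* hypothetical size; must be a power of two */

struct flow_key {		/* exact match: VLAN ID plus IPv4 5-tuple */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint16_t vid;
	uint8_t  proto;
};

struct flow_entry {
	struct flow_key key;
	uint32_t flow_id;
	uint16_t rxq;
	bool     used;
};

struct flow_table {
	struct flow_entry slot[FLOW_TABLE_SIZE];
};

/* Simplified stand-in for rps_may_expire_flow(). */
typedef bool (*may_expire_fn)(uint16_t rxq, uint32_t flow_id, uint16_t filter_id);

/* Example policies, for demonstration only. */
bool flow_never_expires(uint16_t rxq, uint32_t flow_id, uint16_t filter_id)
{
	(void)rxq; (void)flow_id; (void)filter_id;
	return false;
}

bool flow_always_expires(uint16_t rxq, uint32_t flow_id, uint16_t filter_id)
{
	(void)rxq; (void)flow_id; (void)filter_id;
	return true;
}

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
	/* Compare field by field; memcmp() would also compare padding. */
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->vid == b->vid && a->proto == b->proto;
}

/* Returns the table index (filter ID) on success, or -EBUSY if the slot
 * holds a different flow that may not be expired yet. */
int flow_table_steer(struct flow_table *t, const struct flow_key *key,
		     uint32_t hash, uint16_t rxq, uint32_t flow_id,
		     may_expire_fn may_expire)
{
	uint16_t idx = (uint16_t)(hash & (FLOW_TABLE_SIZE - 1));
	struct flow_entry *e = &t->slot[idx];

	if (e->used && !key_equal(&e->key, key) &&
	    !may_expire(e->rxq, e->flow_id, idx))
		return -EBUSY;

	e->key = *key;
	e->flow_id = flow_id;
	e->rxq = rxq;
	e->used = true;
	return idx;
}
```

A real driver would additionally program the device with the exact-match
filter and use the returned index when it later asks whether the entry may be
expired.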

> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> +{
> + int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> +smp_processor_id();
> +
> + /* As we make use of the accelerate rfs which let the scheduler to
> +  * balance the load, it make sense to choose the tx queue also based on
> +  * theprocessor id?
> +  */
> + while (unlikely(txq >= dev->real_num_tx_queues))
> + txq -= dev->real_num_tx_queues;
> + return txq;
> +}
[...]

Don't do this, let XPS handle it.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



[PATCH 3/5 V5] Add ioctl for KVMCLOCK_GUEST_STOPPED

2011-12-05 Thread Eric B Munson
Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson 

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add new ioctl description to api.txt

 Documentation/virtual/kvm/api.txt |   12 
 arch/x86/include/asm/kvm_host.h   |2 ++
 arch/x86/kvm/x86.c|   20 
 include/linux/kvm.h   |2 ++
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7945b0b..0f7dd99 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1450,6 +1450,18 @@ is supported; 2 if the processor requires all virtual 
machines to have
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.
 
+4.64 KVMCLOCK_GUEST_PAUSED
+
+Capability: basic
+Architectures: Any that implement pvclocks (currently x86 only)
+Type: vcpu ioctl
+Parameters: None
+Returns: 0 on success, -1 on error
+
+This signals to the host kernel that the specified guest is being paused by
+userspace.  The host will set a flag in the pvclock structure that is checked
+from the soft lockup watchdog.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4973f4..beb94c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -672,6 +672,8 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long 
bytes,
  gpa_t addr, unsigned long *ret);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
+int kvm_set_guest_paused(struct kvm_vcpu *vcpu);
+
 extern bool tdp_enabled;
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..1dab5fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3295,6 +3295,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
goto out;
}
+   case KVMCLOCK_GUEST_PAUSED: {
+   r = kvm_set_guest_paused(vcpu);
+   break;
+   }
default:
r = -EINVAL;
}
@@ -6117,6 +6121,22 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
tss_selector, int reason,
 }
 EXPORT_SYMBOL_GPL(kvm_task_switch);
 
+/*
+ * kvm_set_guest_paused() indicates to the guest kernel that it has been
+ * stopped by the hypervisor.  This function will be called from the host only.
+ * EINVAL is returned when the host attempts to set the flag for a guest that
+ * does not support pv clocks.
+ */
+int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
+{
+   struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;
+   if (!vcpu->arch.time_page)
+   return -EINVAL;
+   src->flags |= PVCLOCK_GUEST_STOPPED;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_set_guest_paused);
+
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
  struct kvm_sregs *sregs)
 {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c3892fc..1d1ddef 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -762,6 +762,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct 
kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* VM is being stopped by host */
+#define KVMCLOCK_GUEST_PAUSED	_IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 
-- 
1.7.5.4



[PATCH 4/5 V5] Add generic stubs for kvm stop check functions

2011-12-05 Thread Eric B Munson
Signed-off-by: Eric B Munson 
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 include/asm-generic/kvm_para.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h
new file mode 100644
index 000..177e1eb
--- /dev/null
+++ b/include/asm-generic/kvm_para.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_KVM_PARA_H
+#define _ASM_GENERIC_KVM_PARA_H
+
+
+/*
+ * This function is used by architectures that support kvm to avoid issuing
+ * false soft lockup messages.
+ */
+static inline bool kvm_check_and_clear_guest_paused(int cpu)
+{
+   return false;
+}
+
+#endif
-- 
1.7.5.4



Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout

2011-12-05 Thread Michael S. Tsirkin
On Mon, Dec 05, 2011 at 11:16:05AM -0800, Jesse Barnes wrote:
> On Mon, 14 Nov 2011 20:18:55 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > Add a flexible mechanism to specify virtio configuration layout, using
> > pci vendor-specific capability.  A separate capability is used for each
> > of common, device specific and data-path accesses.
> > 
> > Warning: compiled only.
> > This patch also needs to be split up, pci_iomap changes
> > also need arch updates for non-x86.
> > There might also be more spec changes.
> > 
> > Posting here for early feedback, and to allow Sasha to
> > proceed with his "kvm tool" work.
> > 
> > Changes from v1:
> > Updated to match v3 of the spec, see:
> > Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
> > Message-ID: <2010122436.ga13...@redhat.com>
> > In-Reply-To: <2009195901.ga28...@redhat.com>
> 
> Looks like this conflicts with your other iomap changes... I didn't
> check your latest tree; do you just add another patch on top for the
> virtio changes now?
> 
> Thanks,

Yes. Rusty asked for more changes so that isn't yet pushed.

> -- 
> Jesse Barnes, Intel Open Source Technology Center




Re: winXP "Standard PC" HAL and qemu-kvm >= 0.15

2011-12-05 Thread Michael Tokarev
On 05.12.2011 17:28, Avi Kivity wrote:
[]
>> I haven't debugged further yet, -- because it were
>> not easy to find out what was causing the regression
>> and how to reproduce it, and also because I don't think
>> it is the right HAL for qemu-kvm guest anyway.
> 
> It's not, but the regression indicates we broke something.  It would be
> good to know what that is.

So today I gave it a chance with git bisect, and here's what it found:

First bad commit ef390067a72fe09977bb4ac8211313e1503302ea
Merge: c7b3e90 0fd542f
Author: Avi Kivity 
Date:   Sun May 15 04:48:05 2011 -0400

Merge commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df' into upstream-merge

* commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df':
  cpu: add set_memory flag to request dirty logging
  piix_pci: load path clean up
  piix_pci: optimize set irq path
  piix_pci: eliminate PIIX3State::pci_irq_levels
  pci: add accessor function to get irq levels
  cirrus_vga: remove unneeded reset

Conflicts:
exec.c

Signed-off-by: Avi Kivity 

And just like with the 32/64bit lockup issue, this is a merge
commit, which is not exactly useful.

Any guesses? :)

The problem is that so far, there's no known way to switch to
the proper HAL type in winXP (other than reinstalling the guest),
and there's no known workaround on the kvm side, so users are
stuck with older versions.

>> So, if anybody has some thoughts about this issue,
>> and especially if you know a way to switch winXP HAL
>> type to some ACPI variant without reinstalling, please
>> speak up.. ;)
> 
> I remember doing it somewhere in device manager, perhaps in the
> processor entry.  But it was years since I last did this.

As I already mentioned, changing HAL type works from anything to
"Standard PC", but not back.  I'll try to investigate.

>> Debian bugreport for a reference: http://bugs.debian.org/647312
>>
>> Reproducer: install a winXP guest on kvm with -no-acpi so
>> it chooses an "Uniprocessor with MPS" HAL.  Switch it to
>> "Standard PC" in device manager, reboot -- in 0.15+ it does
>> not work anymore, while in 0.14 it continues to work fine.
> 
> Most likely non-ACPI interrupt routing.

The commit it bisected to talks about piix -- could it be related?

Thanks,

/mjt


[PATCH 5/5 V5] Add check for suspended vm in softlockup detector

2011-12-05 Thread Eric B Munson
A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.

Signed-off-by: Eric B Munson 
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V3:
 Clear the PAUSED flag when the watchdog is reset

 kernel/watchdog.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1d7bca7..7c62919 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 int watchdog_enabled = 1;
@@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
__this_cpu_write(softlockup_touch_sync, false);
sched_clock_tick();
}
+
+   /* Clear the guest paused flag on watchdog reset */
+   kvm_check_and_clear_guest_paused(smp_processor_id());
__touch_watchdog();
return HRTIMER_RESTART;
}
@@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 */
duration = is_softlockup(touch_ts);
if (unlikely(duration)) {
+   /*
+* If a virtual machine is stopped by the host it can look to
+* the watchdog like a soft lockup, check to see if the host
+* stopped the vm before we issue the warning
+*/
+   if (kvm_check_and_clear_guest_paused(smp_processor_id()))
+   return HRTIMER_RESTART;
+
/* only warn once */
if (__this_cpu_read(soft_watchdog_warn) == true)
return HRTIMER_RESTART;
-- 
1.7.5.4
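
For readers following along, the decision this patch adds to
watchdog_timer_fn() can be modelled in user space roughly as below.  This is
only a sketch under assumptions, not kernel code: guest_paused,
check_and_clear_guest_paused() and should_warn() are hypothetical names that
stand in for the PVCLOCK_GUEST_STOPPED flag,
kvm_check_and_clear_guest_paused() and the soft lockup branch of the timer
function.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-cpu "host stopped this guest" flag. */
static bool guest_paused;

/* Model of kvm_check_and_clear_guest_paused(): report and reset the flag. */
static bool check_and_clear_guest_paused(void)
{
	bool was_paused = guest_paused;

	guest_paused = false;
	return was_paused;
}

/*
 * Model of the soft lockup branch: 'duration' is non-zero when the
 * watchdog thinks the cpu has been stuck.  Returns true only when a
 * warning should actually be printed.
 */
static bool should_warn(unsigned int duration)
{
	if (!duration)
		return false;
	/* A host-side pause looks like a lockup; swallow it once. */
	if (check_and_clear_guest_paused())
		return false;
	return true;
}
```

Because the flag is cleared when it is consumed, a second stall that happens
without a new host-side pause is still reported.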



[PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host

2011-12-05 Thread Eric B Munson
Changes from V4:
Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
Include CC's on patch 3
Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
New kvm functions are defined in kvm_para.h; the only change to pvclock is the
initial flag definition

Changes from V1:
(Thanks Marcelo)
Host code has all been moved to arch/x86/kvm/x86.c
KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel.  This false warning can mask later soft lockup
warnings which may be real.  This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped.  The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.

An attempt was made to solve this in Qemu, but the side effects of saving and
restoring the clock and tsc for each vcpu put the wall clock of the guest behind
by the amount of time of the pause.  This forces a guest to have ntp running
in order to keep the wall clock accurate.

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org

Eric B Munson (5):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add ioctl for KVMCLOCK_GUEST_STOPPED
  Add generic stubs for kvm stop check functions
  Add check for suspended vm in softlockup detector

 Documentation/virtual/kvm/api.txt  |   12 
 arch/x86/include/asm/kvm_host.h|2 ++
 arch/x86/include/asm/kvm_para.h|1 +
 arch/x86/include/asm/pvclock-abi.h |1 +
 arch/x86/kernel/kvmclock.c |   21 +
 arch/x86/kvm/x86.c |   20 
 include/asm-generic/kvm_para.h |   14 ++
 include/linux/kvm.h|2 ++
 kernel/watchdog.c  |   12 
 9 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

-- 
1.7.5.4



[PATCH 2/5 V5] Add functions to check if the host has stopped the vm

2011-12-05 Thread Eric B Munson
When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.

Signed-off-by: Eric B Munson 
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvmclock.c  |   21 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..e9d63a6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -95,6 +95,7 @@ struct kvm_vcpu_pv_apf_data {
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
 
+bool kvm_check_and_clear_guest_paused(int cpu);
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 44842d7..f0c0599 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
preset_lpj = lpj;
 }
 
+bool kvm_check_and_clear_guest_paused(int cpu)
+{
+   bool ret = false;
+   struct pvclock_vcpu_time_info *src;
+
+   /*
+* per_cpu() is safe here because this function is only called from
+* timer functions where preemption is already disabled.
+*/
+   WARN_ON(!in_atomic());
+   src = &per_cpu(hv_clock, cpu);
+   if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
+   src->flags = src->flags & (~PVCLOCK_GUEST_STOPPED);
+   ret = true;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);
+
 static struct clocksource kvm_clock = {
.name = "kvm-clock",
.read = kvm_clock_get_cycles,
-- 
1.7.5.4
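
The test-and-clear on the flags byte can be tried outside the kernel with
something like the following.  It is a minimal sketch: the two macros mirror
PVCLOCK_TSC_STABLE_BIT and PVCLOCK_GUEST_STOPPED from this series, while
test_and_clear_stopped() and the assumption of an 8-bit flags field are
illustrative only, and no per-cpu handling is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSC_STABLE_BIT  (1u << 0)  /* mirrors PVCLOCK_TSC_STABLE_BIT */
#define GUEST_STOPPED   (1u << 1)  /* mirrors PVCLOCK_GUEST_STOPPED */

/* Returns true if the stopped bit was set, and clears only that bit. */
static bool test_and_clear_stopped(uint8_t *flags)
{
	if ((*flags & GUEST_STOPPED) == 0)
		return false;
	*flags &= (uint8_t)~GUEST_STOPPED;
	return true;
}
```

The other bits in the byte, such as the TSC stable bit, are left untouched.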



[PATCH 1/5 V5] Add flag to indicate that a vm was stopped by the host

2011-12-05 Thread Eric B Munson
This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.  The host will set the flag when it stops the guest.  On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.

Signed-off-by: Eric B Munson 
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka 
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/pvclock-abi.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..6167fd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,5 +40,6 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
-- 
1.7.5.4



[PATCH V4] Guest stop notification

2011-12-05 Thread Eric B Munson
Often when a guest is stopped from the qemu console, it will report spurious
soft lockup warnings on resume.  There are kernel patches being discussed that
will give the host the ability to tell the guest that it is being stopped and
should ignore the soft lockup warning that the stop generates.  This patch uses
the qemu Notifier system to tell the guest it is about to be stopped.

Signed-off-by: Eric B Munson 
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: Jan Kiszka 
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: kvm@vger.kernel.org

---
Changes from V3:
 Collapse new state change notification function into existing function.
 Correct whitespace issues
 Change ioctl name to KVMCLOCK_GUEST_PAUSED
 Use for loop to iterate vcpus

Changes from V2:
 Move ioctl into hw/kvmclock.c so that other arches can use it once it is
implemented

Changes from V1:
 Remove unnecessary encapsulating function

 hw/kvmclock.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 5388bc4..fa11dd7 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -16,6 +16,7 @@
 #include "sysbus.h"
 #include "kvm.h"
 #include "kvmclock.h"
+#include "cpu-all.h"
 
 #include 
 #include 
@@ -62,10 +63,24 @@ static int kvmclock_post_load(void *opaque, int version_id)
 static void kvmclock_vm_state_change(void *opaque, int running,
  RunState state)
 {
+int ret;
+CPUState *penv = first_cpu;
 KVMClockState *s = opaque;
 
 if (running) {
 s->clock_valid = false;
+
+for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
+ret = kvm_vcpu_ioctl(penv, KVMCLOCK_GUEST_PAUSED, 0);
+if (ret) {
+if (ret != -EINVAL) {
+fprintf(stderr,
+"kvmclock_vm_state_change: %s\n",
+strerror(-ret));
+}
+return;
+}
+}
 }
 }
 
-- 
1.7.5.4
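
The per-vcpu loop above stops at the first failing ioctl and stays quiet on
-EINVAL.  A stand-alone sketch of that pattern follows.  Everything in it is
hypothetical scaffolding: struct vcpu replaces CPUState, notify_fn replaces
kvm_vcpu_ioctl(), and the three notify_* callbacks exist only to exercise the
loop.  Treating -EINVAL as "the kernel does not know this ioctl" is an
assumption on my part.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState: a singly linked list of vcpus. */
struct vcpu {
	int id;
	struct vcpu *next;
};

/* Hypothetical stand-in for kvm_vcpu_ioctl(): returns 0 or -errno. */
typedef int (*notify_fn)(struct vcpu *cpu);

/*
 * Notify every vcpu in turn.  Stops at the first failure and returns
 * the number of vcpus notified; *err receives the error, with -EINVAL
 * treated as "nothing to report".
 */
static int notify_all(struct vcpu *first, notify_fn notify, int *err)
{
	int done = 0;

	*err = 0;
	for (struct vcpu *cpu = first; cpu != NULL; cpu = cpu->next) {
		int ret = notify(cpu);

		if (ret) {
			if (ret != -EINVAL)
				*err = ret;
			return done;
		}
		done++;
	}
	return done;
}

/* Demo callbacks for trying the loop out. */
static int notify_ok(struct vcpu *cpu) { (void)cpu; return 0; }
static int notify_unsupported(struct vcpu *cpu) { (void)cpu; return -EINVAL; }
static int notify_fail_second(struct vcpu *cpu) { return cpu->id == 1 ? -EIO : 0; }
```

Note that, as in the patch, a failure on one vcpu leaves the remaining vcpus
unnotified.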



Re: [net-next RFC PATCH 3/5] macvtap: flow director support

2011-12-05 Thread Ben Hutchings
Similarly, macvtap should implement the ethtool {get,set}_rxfh_indir
operations.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support

2011-12-05 Thread Ben Hutchings
On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:
> This patch adds a simple flow director to tun/tap device. It is just a
> page that contains the hash to queue mapping which could be changed by
> user-space. The backend (tap/macvtap) would query this table to get
> the desired queue of a packet when it sends packets to userspace.

This is just flow hashing (RSS), not flow steering.

> The page address is set through a new kind of ioctl - TUNSETFD and
> is pinned until device exit or until another page is specified.
[...]

You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
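
To illustrate the distinction drawn above between flow hashing and flow
steering, here is a small sketch of a hash-to-queue indirection table.  The
table size and the names indir, queue_for_hash() and set_bucket() are
assumptions made for the example; they are not taken from the tun/tap patch.

```c
#include <stdint.h>

#define TABLE_SIZE 256u              /* assumed power-of-two table size */
#define TABLE_MASK (TABLE_SIZE - 1u)

static uint16_t indir[TABLE_SIZE];   /* bucket -> rx queue index */

/* Flow hashing: the queue depends only on the hash bucket. */
static uint16_t queue_for_hash(uint32_t rxhash)
{
	return indir[rxhash & TABLE_MASK];
}

/* "Steering" by rewriting a bucket moves every flow in that bucket. */
static void set_bucket(uint32_t rxhash, uint16_t queue)
{
	indir[rxhash & TABLE_MASK] = queue;
}
```

Two flows whose hashes differ only above the mask land in the same bucket, so
retargeting one of them also moves the other.  That is the reordering hazard
an exact-match filter would avoid.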



Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-12-05 Thread Cam Macdonell
2011/12/2 Cam Macdonell :
> 2011/11/30 Cam Macdonell :
>> 2011/11/30 Zang Hongyong :
>>> Can this bug fix patch be applied yet?
>>
>> Sorry, for not replying yet.  I'll test your patch within the next day.
>
> Have you confirmed the proper receipt of interrupts in the receiving guests?
>
> I can confirm the bug occurs with ioeventfd enabled and that the
> patch fixes it, but sometime after 15.1, I no longer see interrupts
> (MSI or regular) being delivered in the guest.
>
> I will bisect tomorrow.

With Michael's help we debugged msi-x interrupt delivery.  With that
fix in place, this patch fixes ioeventfd in ivshmem.

>
> Cam
>
>>
>>> With this bug, guest os cannot successfully boot with ioeventfd.
>>> Thus the new PIO DoorBell patch cannot be posted.
>>
>> Well, you can certainly post the new patch, just clarify that it's
>> dependent on this patch.
>>
>> Sincerely,
>> Cam
>>
>>>
>>> Thanks,
>>> Hongyong
>>>
>>> On Thursday, 2011/11/24 18:05, zanghongy...@huawei.com wrote:
 From: Hongyong Zang 

 When a guest boots with ioeventfd, an error (by gdb) occurs:
   Program received signal SIGSEGV, Segmentation fault.
   0x006009cc in setup_ioeventfds (s=0x171dc40)
   at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
   363 for (j = 0; j < s->peers[i].nb_eventfds; j++) {
 The bug is due to accessing s->peers which is NULL.

 This patch replaces the old kvm_set_ioeventfd_mmio_long() with the
 memory region API.
 It also calls memory_region_add_eventfd() in ivshmem_read() when qemu
 receives eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang 
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
  CharDriverState *server_chr;
  MemoryRegion ivshmem_mmio;

 -pcibus_t mmio_addr;
  /* We might need to register the BAR before we actually have the 
 memory.
   * So prepare a container MemoryRegion for the BAR immediately and
   * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  guest_curr_max = s->peers[posn].nb_eventfds;

  for (i = 0; i < guest_curr_max; i++) {
 -kvm_set_ioeventfd_mmio_long(s->peers[posn].eventfds[i],
 -s->mmio_addr + DOORBELL, (posn << 16) | i, 0);
 +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +memory_region_del_eventfd(&s->ivshmem_mmio,
 + DOORBELL,
 + 4,
 + true,
 + (posn << 16) | i,
 + s->peers[posn].eventfds[i]);
 +}
  close(s->peers[posn].eventfds[i]);
  }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  s->peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -int i, j;
 -
 -for (i = 0; i <= s->max_peer; i++) {
 -for (j = 0; j < s->peers[i].nb_eventfds; j++) {
 -memory_region_add_eventfd(&s->ivshmem_mmio,
 -  DOORBELL,
 -  4,
 -  true,
 -  (i << 16) | j,
 -  s->peers[i].eventfds[j]);
 -}
 -}
 -}
 -
  /* this function increase the dynamic storage need to store data about 
 other
   * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t 
 * buf, int flags)
  }

  if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s->mmio_addr + 
 DOORBELL,
 -(incoming_posn << 16) | guest_max_eventfd, 1) < 
 0) {
 -fprintf(stderr, "ivshmem: ioeventfd not available\n");
 -}
 +memory_region_add_eventfd(&s->ivshmem_mmio,
 +  DOORBELL,
 +  4,
 +  true,
 +  (incoming_posn << 16) | 
 guest_max_eventfd,
 +  incoming_fd);
  }

  return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
  m

Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout

2011-12-05 Thread Jesse Barnes
On Mon, 14 Nov 2011 20:18:55 +0200
"Michael S. Tsirkin"  wrote:

> Add a flexible mechanism to specify virtio configuration layout, using
> pci vendor-specific capability.  A separate capability is used for each
> of common, device specific and data-path accesses.
> 
> Warning: compiled only.
> This patch also needs to be split up, pci_iomap changes
> also need arch updates for non-x86.
> There might also be more spec changes.
> 
> Posting here for early feedback, and to allow Sasha to
> proceed with his "kvm tool" work.
> 
> Changes from v1:
> Updated to match v3 of the spec, see:
>   Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
>   Message-ID: <2010122436.ga13...@redhat.com>
>   In-Reply-To: <2009195901.ga28...@redhat.com>

Looks like this conflicts with your other iomap changes... I didn't
check your latest tree; do you just add another patch on top for the
virtio changes now?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center




Re: [PATCH v2 1/3] pci: Rework config space blocking services

2011-12-05 Thread Jesse Barnes
On Fri,  4 Nov 2011 09:45:59 +0100
Jan Kiszka  wrote:

> pci_block_user_cfg_access was designed for the use case that a single
> context, the IPR driver, temporarily delays user space accesses to the
> config space via sysfs. This assumption became invalid by the time
> pci_dev_reset was added as a locking instance. Today, if you run two loops
> in parallel that reset the same device via sysfs, you end up with a
> kernel BUG as pci_block_user_cfg_access detects the broken assumption.
> 
> This reworks the pci_block_user_cfg_access to a sleeping service
> pci_cfg_access_lock and an atomic-compatible variant called
> pci_cfg_access_trylock. The former not only blocks user space access as
> before but also waits if access was already locked. The latter service
> just returns false in this case, allowing the caller to resolve the
> conflict instead of raising a BUG.
> 
> Adaptions of the ipr driver were originally written by Brian King.

Applied this series to linux-next, thanks.

-- 
Jesse Barnes, Intel Open Source Technology Center
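
The difference between the two services described in the quoted changelog (a
lock that waits, and a trylock that reports the conflict to the caller) can
be shown with a user-space analogy.  This is only a sketch: struct cfg_access
and the cfg_access_* functions are made-up names, C11 atomics replace the
kernel's locking, and the blocking variant yields in a loop rather than
sleeping on a wait queue.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device "config access blocked" state. */
struct cfg_access {
	atomic_flag locked;
};

/* Non-blocking variant: report a conflict instead of treating it as a bug. */
static bool cfg_access_trylock(struct cfg_access *c)
{
	/* test_and_set returns the previous value: true means already held. */
	return !atomic_flag_test_and_set(&c->locked);
}

/* Blocking variant: wait until the current holder releases the lock. */
static void cfg_access_lock(struct cfg_access *c)
{
	while (!cfg_access_trylock(c))
		sched_yield();   /* the kernel version sleeps; this sketch yields */
}

static void cfg_access_unlock(struct cfg_access *c)
{
	atomic_flag_clear(&c->locked);
}
```

A caller in atomic context would use the trylock form and handle a false
return itself; a caller that may sleep would use the blocking form.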




Re: [PATCH 3/5 V4] Add ioctl for KVM_GUEST_STOPPED

2011-12-05 Thread Eric B Munson
On Sat, 03 Dec 2011, Sasha Levin wrote:

> On Tue, 2011-11-29 at 16:35 -0500, Eric B Munson wrote:
> > 
> > Now that we have a flag that will tell the guest it was suspended,
> > create an interface for that communication using a KVM ioctl.
> > 
> > Signed-off-by: Eric B Munson  
> 
> Can it be documented in api.txt as well?
> 
> -- 
> 
> Sasha.
> 

Thanks for the review, will do for V5.

Eric




[PATCH 5/5] kvm tools: Add 'kvm sandbox'

2011-12-05 Thread Sasha Levin
This patch adds 'kvm sandbox' which is a wrapper on top of 'kvm run' which
allows the user to easily specify a sandboxed command to run in a custom
rootfs guest.

Example usage:

kvm sandbox -d test_guest -k some_kernel -- do_something_in_guest

Suggested-by: Pekka Enberg 
Signed-off-by: Sasha Levin 
---
 tools/kvm/Documentation/kvm-sandbox.txt |   16 ++
 tools/kvm/Makefile  |1 +
 tools/kvm/builtin-run.c |   49 +-
 tools/kvm/builtin-sandbox.c |9 ++
 tools/kvm/command-list.txt  |1 +
 tools/kvm/include/kvm/builtin-run.h |2 +
 tools/kvm/include/kvm/builtin-sandbox.h |6 
 tools/kvm/kvm-cmd.c |2 +
 8 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 tools/kvm/Documentation/kvm-sandbox.txt
 create mode 100644 tools/kvm/builtin-sandbox.c
 create mode 100644 tools/kvm/include/kvm/builtin-sandbox.h

diff --git a/tools/kvm/Documentation/kvm-sandbox.txt b/tools/kvm/Documentation/kvm-sandbox.txt
new file mode 100644
index 000..8f24fc7
--- /dev/null
+++ b/tools/kvm/Documentation/kvm-sandbox.txt
@@ -0,0 +1,16 @@
+kvm-sandbox(1)
+
+
+NAME
+
+kvm-sandbox - Run a command in a sandboxed guest
+
+SYNOPSIS
+
+[verse]
+'kvm sandbox ['kvm run' arguments] -- [sandboxed command]'
+
+DESCRIPTION
+---
+The sandboxed command will run in a guest as part of its init
+command.
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index ece3306..24af1d0 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,6 +85,7 @@ OBJS  += hw/vesa.o
 OBJS   += hw/i8042.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
+OBJS   += builtin-sandbox.o
 
 FLAGS_BFD := $(CFLAGS) -lbfd
 has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 5db6995..7a57b5c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -53,6 +53,7 @@
 #define DEFAULT_GUEST_MAC  "02:15:15:15:15:15"
 #define DEFAULT_HOST_MAC   "02:01:01:01:01:01"
 #define DEFAULT_SCRIPT "none"
+const char *DEFAULT_SANDBOX_FILENAME = "guest/sandbox.sh";
 
 #define MB_SHIFT   (20)
 #define KB_SHIFT   (10)
@@ -94,6 +95,7 @@ static bool custom_rootfs;
 static bool no_net;
 static bool no_dhcp;
 extern bool ioport_debug;
+static int  kvm_run_wrapper;
 extern int  active_console;
 extern int  debug_iodelay;
 
@@ -107,6 +109,15 @@ static const char * const run_usage[] = {
NULL
 };
 
+enum {
+   KVM_RUN_SANDBOX,
+};
+
+void kvm_run_set_wrapper_sandbox(void)
+{
+   kvm_run_wrapper = KVM_RUN_SANDBOX;
+}
+
 static int img_name_parser(const struct option *opt, const char *arg, int unset)
 {
char *sep;
@@ -755,6 +766,35 @@ static int kvm_run_set_sandbox(void)
return symlink(script, path);
 }
 
+static void kvm_run_write_sandbox_cmd(const char **argv, int argc)
+{
+   const char script_hdr[] = "#! /bin/bash\n\n";
+   int fd;
+
+   remove(sandbox);
+
+   fd = open(sandbox, O_RDWR | O_CREAT, 0777);
+   if (fd < 0)
+   die("Failed creating sandbox script");
+
+   if (write(fd, script_hdr, sizeof(script_hdr) - 1) <= 0)
+   die("Failed writing sandbox script");
+
+   while (argc) {
+   if (write(fd, argv[0], strlen(argv[0])) <= 0)
+   die("Failed writing sandbox script");
+   if (argc - 1)
+   if (write(fd, " ", 1) <= 0)
+   die("Failed writing sandbox script");
+   argv++;
+   argc--;
+   }
+   if (write(fd, "\n", 1) <= 0)
+   die("Failed writing sandbox script");
+
+   close(fd);
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -780,8 +820,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
PARSE_OPT_KEEP_DASHDASH);
if (argc != 0) {
/* Cusrom options, should have been handled elsewhere */
-   if (strcmp(argv[0], "--") == 0)
-   break;
+   if (strcmp(argv[0], "--") == 0) {
+   if (kvm_run_wrapper == KVM_RUN_SANDBOX) {
+   sandbox = DEFAULT_SANDBOX_FILENAME;
+   kvm_run_write_sandbox_cmd(argv+1, argc-1);
+   break;
+   }
+   }
 
if (kernel_filename) {
fprintf(stderr, "Cannot handle parameter: "
diff --git a/tools/kvm/builtin-sandbox.c b/tools/kvm/builtin-sandbox.c
new file mode 100644
index 000..433f536
--- /dev/null
+++ b/tools/kvm/builtin-sandbox.c
@@ -0,0 +1,9 @@
+#include 

[PATCH 4/5] kvm tools: Ignore parameters after dashdash in 'kvm run'

2011-12-05 Thread Sasha Levin
This allows other commands to wrap 'kvm run' and use the parameters the user
provides after a dash-dash for their own use.

Signed-off-by: Sasha Levin 
---
 tools/kvm/builtin-run.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index cd14159..5db6995 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -776,8 +776,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
while (argc != 0) {
argc = parse_options(argc, argv, options, run_usage,
-   PARSE_OPT_STOP_AT_NON_OPTION);
+   PARSE_OPT_STOP_AT_NON_OPTION |
+   PARSE_OPT_KEEP_DASHDASH);
if (argc != 0) {
+   /* Cusrom options, should have been handled elsewhere */
+   if (strcmp(argv[0], "--") == 0)
+   break;
+
if (kernel_filename) {
fprintf(stderr, "Cannot handle parameter: "
"%s\n", argv[0]);
-- 
1.7.8
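
The idea of handing everything after "--" to a wrapper can be shown without
parse_options().  The helper below is a hypothetical illustration and not how
the option parser in the tree works; it only locates the separator so the
caller can split the argument vector.

```c
#include <string.h>

/*
 * Return the index of the first "--" in argv, or argc if there is none.
 * Everything before the index belongs to the wrapped command ('kvm run'
 * in the patch); everything after it belongs to the wrapper.
 */
static int find_dashdash(int argc, const char **argv)
{
	for (int i = 0; i < argc; i++)
		if (strcmp(argv[i], "--") == 0)
			return i;
	return argc;
}
```

With the arguments from the 'kvm sandbox' example, the sandboxed command
starts one slot after the returned index.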



[PATCH 3/5] kvm tools: Allow easily sandboxing applications within a guest

2011-12-05 Thread Sasha Levin
This patch adds a '--sandbox' argument. When used in conjunction with a custom
rootfs, it allows running a script or an executable in the guest environment
by using executables and other files from the host.

This is useful when testing code that might cause problems on the host, or
to automate kernel testing since it's now easy to link a kvm tools test
script with 'git bisect run'.

Suggested-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 tools/kvm/builtin-run.c   |   31 +++
 tools/kvm/guest/init_stage2.c |   13 -
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index de3001e..cd14159 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -82,6 +82,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *sandbox;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -420,6 +421,8 @@ static const struct option options[] = {
OPT_CALLBACK('\0', "tty", NULL, "tty id",
 "Remap guest TTY into a pty on the host",
 tty_parser),
+   OPT_STRING('\0', "sandbox", &sandbox, "script",
+   "Run this script when booting into custom rootfs"),
 
OPT_GROUP("Kernel options:"),
OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -727,6 +730,31 @@ static int kvm_custom_stage2(void)
return r;
 }
 
+static int kvm_run_set_sandbox(void)
+{
+   const char *guestfs_name = "default";
+   char path[PATH_MAX], script[PATH_MAX], *tmp;
+
+   if (image_filename[0])
+   guestfs_name = image_filename[0];
+
+   snprintf(path, PATH_MAX, "%s%s/virt/sandbox.sh", kvm__get_dir(), guestfs_name);
+
+   remove(path);
+
+   if (sandbox == NULL)
+   return 0;
+
+   tmp = realpath(sandbox, NULL);
+   if (tmp == NULL)
+   return -ENOMEM;
+
+   snprintf(script, PATH_MAX, "/host/%s", tmp);
+   free(tmp);
+
+   return symlink(script, path);
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -886,7 +914,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
if (using_rootfs) {
strcat(real_cmdline, " root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p");
if (custom_rootfs) {
+   kvm_run_set_sandbox();
+
strcat(real_cmdline, " init=/virt/init");
+
if (!no_dhcp)
strcat(real_cmdline, "  ip=dhcp");
if (kvm_custom_stage2())
diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c
index af615a0..6489fee 100644
--- a/tools/kvm/guest/init_stage2.c
+++ b/tools/kvm/guest/init_stage2.c
@@ -16,6 +16,14 @@ static int run_process(char *filename)
return execve(filename, new_argv, new_env);
 }
 
+static int run_process_sandbox(char *filename)
+{
+   char *new_argv[] = { filename, "/virt/sandbox.sh", NULL };
+   char *new_env[] = { "TERM=linux", NULL };
+
+   return execve(filename, new_argv, new_env);
+}
+
 int main(int argc, char *argv[])
 {
/* get session leader */
@@ -26,7 +34,10 @@ int main(int argc, char *argv[])
 
puts("Starting '/bin/sh'...");
 
-   run_process("/bin/sh");
+   if (access("/virt/sandbox.sh", R_OK) == 0)
+   run_process_sandbox("/bin/sh");
+   else
+   run_process("/bin/sh");
 
printf("Init failed: %s\n", strerror(errno));
 
-- 
1.7.8



[PATCH 2/5] kvm tools: Remove double 'init=' kernel param

2011-12-05 Thread Sasha Levin
Signed-off-by: Sasha Levin 
---
 tools/kvm/builtin-run.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9635c82..de3001e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -881,9 +881,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
if (virtio_9p__register(kvm, "/", "hostfs") < 0)
die("Unable to initialize virtio 9p");
using_rootfs = custom_rootfs = 1;
-
-   if (!strstr(real_cmdline, "init="))
-   strlcat(real_cmdline, " init=/bin/sh ", sizeof(real_cmdline));
}
 
if (using_rootfs) {
-- 
1.7.8



[PATCH 1/5] kvm tools: Split custom rootfs init into two stages

2011-12-05 Thread Sasha Levin
Currently the custom rootfs init is built along with the main KVM tools
executable and is copied into custom rootfs directories when they are created
with 'kvm setup'. The problem is that if the init code changes, it has to be
manually copied to the custom rootfs directories.

Instead, this patch splits the init process into two parts. The first part
simply handles mounts, and then passes control to stage 2 of the init.

Stage 2 lives in the code tree, and does all the heavy lifting.

This allows us to make init changes in the code tree and have them automatically
picked up by custom rootfs guests without having to copy files over manually.

Signed-off-by: Sasha Levin 
---
 tools/kvm/Makefile|9 +++--
 tools/kvm/builtin-run.c   |   27 +++
 tools/kvm/guest/init.c|   14 +++---
 tools/kvm/guest/init_stage2.c |   34 ++
 4 files changed, 71 insertions(+), 13 deletions(-)
 create mode 100644 tools/kvm/guest/init_stage2.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index bb5f6b0..ece3306 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -21,6 +21,7 @@ TAGS  := ctags
 PROGRAM:= kvm
 
 GUEST_INIT := guest/init
+GUEST_INIT_S2 := guest/init_stage2
 
 OBJS   += builtin-balloon.o
 OBJS   += builtin-debug.o
@@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS += $(WARNINGS)
 
-all: $(PROGRAM) $(GUEST_INIT)
+all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 KVMTOOLS-VERSION-FILE:
@$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT)
@@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c
$(E) "  LINK" $@
$(Q) $(CC) -static guest/init.c -o $@
 
+$(GUEST_INIT_S2): guest/init_stage2.c
+   $(E) "  LINK" $@
+   $(Q) $(CC) -static guest/init_stage2.c -o $@
+
 $(DEPS):
 
 %.d: %.c
@@ -269,7 +274,7 @@ clean:
$(Q) rm -f bios/bios-rom.h
$(Q) rm -f tests/boot/boot_test.iso
$(Q) rm -rf tests/boot/rootfs/
-   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT)
+   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
$(Q) rm -f cscope.*
$(Q) rm -f $(KVM_INCLUDE)/common-cmds.h
$(Q) rm -f KVMTOOLS-VERSION-FILE
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 43cf2c4..9635c82 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -702,6 +702,31 @@ void kvm_run_help(void)
usage_with_options(run_usage, options);
 }
 
+static int kvm_custom_stage2(void)
+{
+   char tmp[PATH_MAX], dst[PATH_MAX], *src;
+   const char *rootfs;
+   int r;
+
+   src = realpath("guest/init_stage2", NULL);
+   if (src == NULL)
+   return -ENOMEM;
+
+   if (image_filename[0] == NULL)
+   rootfs = "default";
+   else
+   rootfs = image_filename[0];
+
+   snprintf(tmp, PATH_MAX, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
+   remove(tmp);
+
+   snprintf(dst, PATH_MAX, "/host/%s", src);
+   r = symlink(dst, tmp);
+   free(src);
+
+   return r;
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
strcat(real_cmdline, " init=/virt/init");
if (!no_dhcp)
strcat(real_cmdline, "  ip=dhcp");
+   if (kvm_custom_stage2())
+   die("Failed linking stage 2 of init.");
}
} else if (!strstr(real_cmdline, "root=")) {
strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline));
diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c
index 8975023..032a261 100644
--- a/tools/kvm/guest/init.c
+++ b/tools/kvm/guest/init.c
@@ -1,6 +1,6 @@
 /*
- * This is a simple init for shared rootfs guests. It brings up critical
- * mountpoints and then launches /bin/sh.
+ * This is a simple init for shared rootfs guests. This part should be limited
+ * to doing mounts and running stage 2 of the init process.
  */
 #include 
 #include 
@@ -30,15 +30,7 @@ int main(int argc, char *argv[])
 
do_mounts();
 
-/* get session leader */
-setsid();
-
-/* set controlling terminal */
-ioctl (0, TIOCSCTTY, 1);
-
-   puts("Starting '/bin/sh'...");
-
-   run_process("/bin/sh");
+   run_process("/virt/init_stage2");
 
printf("Init failed: %s\n", strerror(errno));
 
diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c
new file mode 100644
index 000..af615a0
--- /dev/null
+++ b/tools/kvm/guest/init_stage2.c
@@ -0,0 +1,34 @@
+/*
+ * This is a stage 2 of the init. This part should do all the heavy
+ * lifting such as setting up the console and calling /bin/sh.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int run_proces
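
The linking step performed by kvm_custom_stage2() in the patch above can be sketched in Python as follows. This is only an illustration, not the kvm tools code: link_stage2, rootfs_dir and host_prefix are hypothetical names, and the sketch assumes, as the patch does, that the host filesystem is visible under /host inside the guest. Unlike realpath(3), os.path.realpath() does not fail for a missing file, so the error path of the C version is not modelled.

```python
import os
import tempfile


def link_stage2(src_path, rootfs_dir, host_prefix="/host"):
    """Point <rootfs_dir>/virt/init_stage2 at the in-tree stage 2 binary.

    Hypothetical helper mirroring the steps of kvm_custom_stage2():
    resolve the source path, drop any stale link, create a new symlink.
    """
    # Absolute, symlink-free path of the stage 2 binary on the host.
    src = os.path.realpath(src_path)
    link = os.path.join(rootfs_dir, "virt", "init_stage2")
    # src is absolute, so plain concatenation yields "/host/...".
    target = host_prefix + src

    # Remove a link left over from a previous run, if there is one.
    try:
        os.remove(link)
    except FileNotFoundError:
        pass

    os.symlink(target, link)
    return link, target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "virt"))
        link, _ = link_stage2("guest/init_stage2", tmp)
        print(link, "->", os.readlink(link))
```

Because the link is recreated on every run, a rebuilt stage 2 binary is picked up without copying anything into the rootfs directory, which is the point of the split.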

KVM call agenda for 12/6 (Tuesday) @ 10am US/Eastern

2011-12-05 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Proposal (from Anthony):

> 1. A short introduction to each of the guest agents, what guests they
> support, and what verbs they support.

> 2. A short description of key requirements from each party (oVirt,
> libvirt, QEMU) for a guest agent

> 3. An open discussion about possible ways to collaborate/converge.

Note that guest integration will take more than one week (Anthony's
estimate as well).

For libvirt and ovirt folks, please contact me or Chris for details of
the call.


Thanks, Juan.



Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka
On 2011-12-05 14:36, Avi Kivity wrote:
> On 12/05/2011 03:29 PM, Jan Kiszka wrote:
>> On 2011-12-05 14:14, Avi Kivity wrote:
>>> On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>>>>>
>>>>> (the memory API added unstable names, hopefully the QOM can take over
>>>>> the stable ones and we'll have a good way to denote the unstable ones).
>>>>>
>>>>
>>>> OK, maybe - or likely - we should make those device models have the same
>>>> names in QOM once instantiated. But I'm still convinced they should
>>>> remain separated models in contrast to a single model with a property.
>>>
>>> What do you mean by separate models?  You share all the code you can,
>>> and don't share the code you can't.  To me, single model == single name.
>>
>> But different configuration.
> 
> Right, just like IDE with different backends.

Except that there is a comparably large infrastructure to manage those
backends.

> 
>>>
>>>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is
>>>> meaningless for user space devices. And its interrupts have to be
>>>> wired&configured differently at board model level. So, from the QEMU
>>>> POV, it is a very different device. Just the guest does not notice.
>>>
>>> It's like qcow2 and raw/native IO are wired differently, or virtio-net
>>> and vhost-net.  But it's the same IDE device or virtio NIC.
>>
>> That would mean introducing a backend/frontend concept for irqchips.
> 
> We could do it, have one ioapic model with ioapic_ops->eoi_broadcast(). 
> Most of the interfaces already dispatch dynamically (qdev gpio/irq) so
> there wouldn't be much more there.

The problem is configuration. Just by setting ioapic.backend=xxx, we
cannot pass down parameters that are backend-specific. We could ignore
this issue and make all specific parameters visible via the frontend.
Would be slightly ugly.

> 
> To me, how it's actually implemented is not important.  What is
> important is that save/restore, the monitor, and the guest don't notice
> any changes.

I largely agree, except that differentiation (or backend awareness) has
to be preserved in the monitor.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
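
To make the frontend/backend idea discussed in this thread concrete, here is a toy Python sketch. It is not QEMU code: the class names and the eoi_broadcast() signature are assumptions for illustration, and gsi_base merely stands in for a backend-specific parameter. It shows one "ioapic" model dispatching through a backend object, and that a backend-only parameter still has to be supplied somewhere when the backend is chosen.

```python
class UserSpaceBackend:
    """Hypothetical user space irqchip backend; needs no extra parameters."""

    def __init__(self):
        self.log = []

    def eoi_broadcast(self, vector):
        self.log.append(("user", vector))


class KernelBackend:
    """Hypothetical in-kernel backend; gsi_base is meaningful only here."""

    def __init__(self, gsi_base):
        self.gsi_base = gsi_base
        self.log = []

    def eoi_broadcast(self, vector):
        self.log.append(("kernel", self.gsi_base, vector))


class IoApic:
    """Single model with a single name, whatever backend is plugged in."""

    name = "ioapic"

    def __init__(self, backend):
        self.backend = backend

    def eoi_broadcast(self, vector):
        # Dynamic dispatch: the frontend does not care which backend it has.
        self.backend.eoi_broadcast(vector)


if __name__ == "__main__":
    for backend in (UserSpaceBackend(), KernelBackend(gsi_base=24)):
        dev = IoApic(backend)
        dev.eoi_broadcast(0x31)
        print(dev.name, backend.log)
```

Both instances report the same model name, which is what save/restore and the guest would see. The constructor call is where the configuration difference raised in the thread shows up.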


Re: [Qemu-devel] [PATCH V3] Guest stop notification

2011-12-05 Thread Jan Kiszka
On 2011-12-05 14:35, Marcelo Tosatti wrote:
> On Sat, Dec 03, 2011 at 12:45:51PM +0100, Jan Kiszka wrote:
>>>> I was referring to the relation between the IOCTL and kvmclock, but
>>>> IOCTL vs. kvm_run.
>>>>
>>>> Jan
>>>
>>> Ah, OK. Yes, we better characterize it as KVMCLOCK specific (a generic
>>> "guest is paused" command is not the scope of this patch).
>>>
>>> So appending KVMCLOCK_ to the ioctl definitions would make that more
>>> explicit.
>>
>> IMHO, that would move things in the wrong direction. The IOCTL in itself
>> has _nothing_ to do with kvmclock. It's just that its x86 backend is
>> implemented on top of that infrastructure. For me the IOCTL is pretty
>> generic, can be backed by kvmclock, but need not be on all future archs.
>>
>> Jan
> 
> I do not see the need to lift this infrastructure to arch independent
> status at the moment, without clear semantics on that arch independent
> level.
> 
> So I am fine with the current GUEST_PAUSED naming (which can later be
> extended with GUEST_RESUMED etc, if necessary, for use by other archs
> for example), and implementation in hw/kvmclock.c.
> 

Yes, let's keep it as suggested last (addition of kvmclock, unchanged
IOCTL interface).

Jan





Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity
On 12/05/2011 03:29 PM, Jan Kiszka wrote:
> On 2011-12-05 14:14, Avi Kivity wrote:
> > On 12/05/2011 02:47 PM, Jan Kiszka wrote:
> >>>
> >>> (the memory API added unstable names, hopefully the QOM can take over
> >>> the stable ones and we'll have a good way to denote the unstable ones).
> >>>
> >>
> >> OK, maybe - or likely - we should make those device models have the same
> >> names in QOM once instantiated. But I'm still convinced they should
> >> remain separated models in contrast to a single model with a property.
> > 
> > What do you mean by separate models?  You share all the code you can,
> > and don't share the code you can't.  To me, single model == single name.
>
> But different configuration.

Right, just like IDE with different backends.

> > 
> >> The kvm ioapic, e.g., requires an additional property (gsi_base) that is
> >> meaningless for user space devices. And its interrupts have to be
> >> wired&configured differently at board model level. So, from the QEMU
> >> POV, it is a very different device. Just the guest does not notice.
> > 
> > It's like qcow2 and raw/native IO are wired differently, or virtio-net
> > and vhost-net.  But it's the same IDE device or virtio NIC.
>
> That would mean introducing a backend/frontend concept for irqchips.

We could do it, have one ioapic model with ioapic_ops->eoi_broadcast(). 
Most of the interfaces already dispatch dynamically (qdev gpio/irq) so
there wouldn't be much more there.

To me, how it's actually implemented is not important.  What is
important is that save/restore, the monitor, and the guest don't notice
any changes.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka
On 2011-12-05 14:14, Avi Kivity wrote:
> On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>>>
>>> (the memory API added unstable names, hopefully the QOM can take over
>>> the stable ones and we'll have a good way to denote the unstable ones).
>>>
>>
>> OK, maybe - or likely - we should make those device models have the same
>> names in QOM once instantiated. But I'm still convinced they should
>> remain separated models in contrast to a single model with a property.
> 
> What do you mean by separate models?  You share all the code you can,
> and don't share the code you can't.  To me, single model == single name.

But different configuration.

> 
>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is
>> meaningless for user space devices. And its interrupts have to be
>> wired&configured differently at board model level. So, from the QEMU
>> POV, it is a very different device. Just the guest does not notice.
> 
> It's like qcow2 and raw/native IO are wired differently, or virtio-net
> and vhost-net.  But it's the same IDE device or virtio NIC.

That would mean introducing a backend/frontend concept for irqchips.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: winXP "Standard PC" HAL and qemu-kvm >= 0.15

2011-12-05 Thread Avi Kivity
On 12/05/2011 11:21 AM, Michael Tokarev wrote:
> As it turned out, a windowsXP machine does not work in
> qemu-kvm >= 0.15 (it loses network and USB entirely)
> if it is using "Standard PC" HAL.  In 0.14 it worked
> fine, but not in 0.15 (I haven't tried any in-between
> versions yet).
>
> There are several HAL types available in winXP: these
> are "Uniprocessor PC with MPS" (or Multiprocessor),
> also two ACPI types, and "Standard PC".  All the other
> HAL types appear to work fine, but not "Standard PC".
>
> I haven't debugged further yet, because it was
> not easy to find out what was causing the regression
> and how to reproduce it, and also because I don't think
> it is the right HAL for qemu-kvm guest anyway.

It's not, but the regression indicates we broke something.  It would be
good to know what that is.

> So, if anybody has some thoughts about this issue,
> and especially if you know a way to switch winXP HAL
> type to some ACPI variant without reinstalling, please
> speak up.. ;)

I remember doing it somewhere in device manager, perhaps in the
processor entry.  But it was years since I last did this.

> Debian bugreport for a reference: http://bugs.debian.org/647312
>
> Reproducer: install a winXP guest on kvm with -no-acpi so
> it chooses an "Uniprocessor with MPS" HAL.  Switch it to
> "Standard PC" in device manager, reboot -- in 0.15+ it does
> not work anymore, while in 0.14 it continues to work fine.

Most likely non-ACPI interrupt routing.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity
On 12/05/2011 02:47 PM, Jan Kiszka wrote:
> > 
> > (the memory API added unstable names, hopefully the QOM can take over
> > the stable ones and we'll have a good way to denote the unstable ones).
> > 
>
> OK, maybe - or likely - we should make those device models have the same
> names in QOM once instantiated. But I'm still convinced they should
> remain separated models in contrast to a single model with a property.

What do you mean by separate models?  You share all the code you can,
and don't share the code you can't.  To me, single model == single name.

> The kvm ioapic, e.g., requires an additional property (gsi_base) that is
> meaningless for user space devices. And its interrupts have to be
> wired&configured differently at board model level. So, from the QEMU
> POV, it is a very different device. Just the guest does not notice.

It's like qcow2 and raw/native IO are wired differently, or virtio-net
and vhost-net.  But it's the same IDE device or virtio NIC.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH V3] Guest stop notification

2011-12-05 Thread Eric B Munson
On Sat, 03 Dec 2011, Jan Kiszka wrote:

> On 2011-12-02 22:27, Eric B Munson wrote:
> > On Fri, 02 Dec 2011, Jan Kiszka wrote:
> > 
> >> On 2011-12-02 20:19, Eric B Munson wrote:
> >>> Often when a guest is stopped from the qemu console, it will report spurious
> >>> soft lockup warnings on resume.  There are kernel patches being discussed that
> >>> will give the host the ability to tell the guest that it is being stopped and
> >>> should ignore the soft lockup warning that generates.
> >>>
> >>> Signed-off-by: Eric B Munson 
> >>> Cc: Avi Kivity 
> >>> Cc: Marcelo Tosatti 
> >>> Cc: Jan Kiszka 
> >>> Cc: ry...@linux.vnet.ibm.com
> >>> Cc: aligu...@us.ibm.com
> >>> Cc: kvm@vger.kernel.org
> >>>
> >>> ---
> >>> Changes from V2:
> >>>  Move ioctl into hw/kvmclock.c so as other arches can use it as it is
> >>> implemented
> >>>
> >>> Changes from V1:
> >>>  Remove unnecessary encapsulating function
> >>>
> >>>  hw/kvmclock.c |   24 
> >>>  1 files changed, 24 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/hw/kvmclock.c b/hw/kvmclock.c
> >>> index 5388bc4..756839f 100644
> >>> --- a/hw/kvmclock.c
> >>> +++ b/hw/kvmclock.c
> >>> @@ -16,6 +16,7 @@
> >>>  #include "sysbus.h"
> >>>  #include "kvm.h"
> >>>  #include "kvmclock.h"
> >>> +#include "cpu-all.h"
> >>>  
> >>>  #include 
> >>>  #include 
> >>> @@ -69,11 +70,34 @@ static void kvmclock_vm_state_change(void *opaque, int running,
> >>>  }
> >>>  }
> >>>  
> >>> +static void kvmclock_vm_state_change_vcpu(void *opaque, int running,
> >>> +  RunState state)
> >>> +{
> >>> +int ret;
> >>> +CPUState *penv = first_cpu;
> >>> +
> >>> +if (running) {
> >>> + while (penv) {
> >>
> >> or: for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
> >>
> > 
> > Functionally equivalent and I see both in the code, is there a standard?
> 
> Not really. I once tried to introduce an iterator macro, but it was
> refused. The above is just more compact.
> 
> But this is only a minor nit.
> 

Fair enough, since there will be a V4 I will switch to the for loop.

> > 
> >>> +ret = kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> >>> +if (ret) {
> >>> +if (ret != ENOSYS) {
> >>> +fprintf(stderr,
> >>> +"kvmclock_vm_state_change_vcpu: %s\n",
> >>> +strerror(-ret));
> >>> +}
> >>> +return;
> >>> +}
> >>> +penv = (CPUState *)penv->next_cpu;
> >>
> >> Unneeded cast.
> >>
> > 
> > Also following an example seen elsewhere.
> 
> Generally, we try to avoid those pointless casts.
> 

Will remove for V4.

> > 
> >>> +}
> >>> +}
> >>> +}
> >>> +
> >>
> >> Again: please use checkpatch.pl.
> >>
> > 
> > Sorry, tough to get used to hitting space bar that many times...
> > 
> >>>  static int kvmclock_init(SysBusDevice *dev)
> >>>  {
> >>>  KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
> >>>  
> >>>  qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> >>> +qemu_add_vm_change_state_handler(kvmclock_vm_state_change_vcpu, NULL);
> >>>  return 0;
> >>>  }
> >>>  
> >>
> >> Why not extend the existing handler?
> > 
> > Because the new handler doesn't touch the KVMClockState object.  If this is
> > preferred, I have no objection.
> 
> The separate registration looks strange to me. And the fact that you
> don't need to object doesn't justify a callback of its own.
> 

I think you misunderstood me; I meant I have no objection to doing it your way
if you have a strong opinion (as it seems you do).

> > 
> >>
> >> I still wonder if the IOCTL interface is actually kvmclock specific. But
> >> Marcelo asked for this, and we could still change it when some arch
> >> comes around that provides it independent of kvmclock.
> > 
> > The flag itself is stored in the pvclock_vcpu_time_info structure, and anything
> > else that touches that structure uses ioctls.
> 
> That's the host-guest interface. But I'm talking about the kvm-qemu
> interface here which has no relation to how the "was paused" information
> is transferred to the guest.
> 
> Jan
> 
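
A side note on the error check in the quoted hunk: it prints strerror(-ret), which suggests that kvm_vcpu_ioctl() reports failures as negative errno values. Under that assumption, a comparison against the positive ENOSYS constant never matches. The Python sketch below only illustrates the sign convention; should_report and describe are hypothetical helpers, not part of QEMU.

```python
import errno
import os


def should_report(ret):
    """Return True if a negative-errno style result deserves an error message.

    0 means success. -ENOSYS is treated as "not supported here" and is
    ignored, which appears to be the intent of the quoted hunk.
    """
    return ret < 0 and ret != -errno.ENOSYS


def describe(ret):
    """Human-readable text for a negative-errno result."""
    return os.strerror(-ret)


if __name__ == "__main__":
    for ret in (0, -errno.ENOSYS, -errno.EINVAL):
        if should_report(ret):
            print("kvmclock_vm_state_change_vcpu:", describe(ret))
```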






Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka
On 2011-12-05 13:36, Avi Kivity wrote:
> On 12/05/2011 01:37 PM, Jan Kiszka wrote:
>> On 2011-12-05 11:01, Avi Kivity wrote:
>>> On 12/04/2011 11:38 PM, Jan Kiszka wrote:
>>>>>
>>>>> It should be also possible to migrate from non-KVM device to KVM
>>>>> version, different names would prevent that for ever.
>>>>
>>>> It is (theoretically) possible with these patches as the vmstate names
>>>> are the same. KVM to TCG migration does not work right now, so I was
>>>> only able to test in-kernel <-> user space irqchip model migrations.
>>>
>>> btw, for the next-gen migration protocol, we'd probably be using QOM
>>> paths, not vmstate names; the QOM paths would include the device name?
>>
>> That would be a very bad idea IMHO. Every refactoring of your device
>> tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
>> risk to create a migration crack.
> 
> At some point, something has to be stable.  We can't have an infinite
> number of layers giving names to things.  I propose we have just one layer.
> 
>>  At least we would need some stable
>> naming and/or alias concept then.
> 
> We should be able to transform a path to backward compatible names,
> yes.  But if something has an unstable name, let's omit it in the first
> place.
> 
> (the memory API added unstable names, hopefully the QOM can take over
> the stable ones and we'll have a good way to denote the unstable ones).
> 

OK, maybe - or likely - we should make those device models have the same
names in QOM once instantiated. But I'm still convinced they should
remain separated models in contrast to a single model with a property.

The kvm ioapic, e.g., requires an additional property (gsi_base) that is
meaningless for user space devices. And its interrupts have to be
wired&configured differently at board model level. So, from the QEMU
POV, it is a very different device. Just the guest does not notice.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity
On 12/05/2011 01:37 PM, Jan Kiszka wrote:
> On 2011-12-05 11:01, Avi Kivity wrote:
> > On 12/04/2011 11:38 PM, Jan Kiszka wrote:
> >>>
> >>> It should be also possible to migrate from non-KVM device to KVM
> >>> version, different names would prevent that for ever.
> >>
> >> It is (theoretically) possible with these patches as the vmstate names
> >> are the same. KVM to TCG migration does not work right now, so I was
> >> only able to test in-kernel <-> user space irqchip model migrations.
> > 
> > btw, for the next-gen migration protocol, we'd probably be using QOM
> > paths, not vmstate names; the QOM paths would include the device name?
>
> That would be a very bad idea IMHO. Every refactoring of your device
> tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
> risk to create a migration crack.

At some point, something has to be stable.  We can't have an infinite
number of layers giving names to things.  I propose we have just one layer.

>  At least we would need some stable
> naming and/or alias concept then.

We should be able to transform a path to backward compatible names,
yes.  But if something has an unstable name, let's omit it in the first
place.

(the memory API added unstable names, hopefully the QOM can take over
the stable ones and we'll have a good way to denote the unstable ones).

-- 
error compiling committee.c: too many arguments to function



[PATCH 1/3] [autotest] client.tests.cgroup: Replace LoadPerCpu() by get_load_per_cpu

2011-12-05 Thread Lukas Doktor
* Move LoadPerCpu into cgroup_common.py (cgroup-kvm will need it too)
* [FIX] Use etraceback
* Code cleanup
---
 client/tests/cgroup/cgroup.py|   79 ++
 client/tests/cgroup/cgroup_common.py |   22 +
 2 files changed, 35 insertions(+), 66 deletions(-)

diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py
index 207a0d7..000e562 100755
--- a/client/tests/cgroup/cgroup.py
+++ b/client/tests/cgroup/cgroup.py
@@ -12,9 +12,7 @@ from tempfile import NamedTemporaryFile
 
 from autotest_lib.client.bin import test, utils
 from autotest_lib.client.common_lib import error
-from cgroup_common import Cgroup as CG
-from cgroup_common import CgroupModules
-from cgroup_common import _traceback
+from cgroup_common import Cgroup, CgroupModules, get_load_per_cpu
 
 class cgroup(test.test):
 """
@@ -48,7 +46,7 @@ class cgroup(test.test):
 logging.info("---< 'test_%s' FAILED >---", subtest)
 except Exception:
 err += "%s, " % subtest
-tb = _traceback("test_%s" % subtest, sys.exc_info())
+tb = utils.etraceback("test_%s" % subtest, sys.exc_info())
 logging.error("test_%s: FAILED%s", subtest, tb)
 logging.info("---< 'test_%s' FAILED >---", subtest)
 
@@ -75,7 +73,6 @@ class cgroup(test.test):
 def cleanup(self):
 """ Cleanup """
 logging.debug('cgroup_test cleanup')
-print "Cleanup"
 del (self.modules)
 
 
@@ -102,7 +99,7 @@ class cgroup(test.test):
raise error.TestFail("Some parts of cleanup failed%s" % err)
 
 # Preparation
-item = CG('memory', self._client)
+item = Cgroup('memory', self._client)
 item.initialize(self.modules)
 item.smoke_test()
 pwd = item.mk_cgroup()
@@ -116,8 +113,8 @@ class cgroup(test.test):
 mem = min(int(mem.split()[1])/1024, 1024)
 mem = max(mem, 100) # at least 100M
 try:
-memsw_limit_bytes = item.get_property("memory.memsw.limit_in_bytes")
-except error.TestFail:
+item.get_property("memory.memsw.limit_in_bytes")
+except error.TestError:
 # Doesn't support memsw limitation -> disabling
 logging.info("System does not support 'memsw'")
 utils.system("swapoff -a")
@@ -222,7 +219,8 @@ class cgroup(test.test):
 logging.debug("test_memory: Memfill mem + swap limit")
 ps = item.test("memfill %d %s" % (mem, outf.name))
 item.set_cgroup(ps.pid, pwd)
-item.set_property_h("memory.memsw.limit_in_bytes", "%dM"%(mem/2), pwd)
+item.set_property_h("memory.memsw.limit_in_bytes", "%dM"%(mem/2),
+pwd)
 ps.stdin.write('\n')
 i = 0
 while ps.poll() == None:
@@ -266,56 +264,6 @@ class cgroup(test.test):
 Cpuset test
 1) Initiate CPU load on CPU0, than spread into CPU* - CPU0
 """
-class LoadPerCpu:
-"""
-Handles the LoadPerCpu stats
-self.values [cpus, cpu0, cpu1, ...]
-"""
-def __init__(self):
-"""
-Init
-"""
-self.values = []
-self.stat = open('/proc/stat', 'r')
-line = self.stat.readline()
-while line:
-if line.startswith('cpu'):
-self.values.append(int(line.split()[1]))
-else:
-break
-line = self.stat.readline()
-
-def reload(self):
-"""
-Reload current values
-"""
-self.values = self.get()
-
-def get(self):
-"""
-Get the current values
-@return vals: array of current values [cpus, cpu0, cpu1..]
-"""
-self.stat.seek(0)
-self.stat.flush()
-vals = []
-for _ in range(len(self.values)):
-vals.append(int(self.stat.readline().split()[1]))
-return vals
-
-def tick(self):
-"""
-Reload values and returns the load between the last tick/reload
-@return vals: array of load between ticks/reloads
-  values [cpus, cpu0, cpu1..]
-"""
-vals = self.get()
-ret = []
-for i in range(len(self.values)):
-ret.append(vals[i] - self.values[i])
-self.values = vals
-return ret
-
 def cleanup(supress=False):
 """ cleanup """
 logging.debug("test_cpuset: Cleanup")
@@ -341,7 +289,7 @@ class cgroup(test.test):
 raise error.TestFail("Some par
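
For reference, the per-CPU accounting done by the removed LoadPerCpu class can be sketched as two small functions. This is an illustration under assumptions, not the get_load_per_cpu() that the patch adds to cgroup_common.py (its body is not shown in this message); the function names and the sample text are made up. Like the removed class, it reads the first numeric column of each leading "cpu" line of /proc/stat and stops at the first line that is not a "cpu" line.

```python
def parse_cpu_ticks(stat_text):
    """Return [cpus, cpu0, cpu1, ...] from the text of /proc/stat.

    Only the first numeric column of each leading 'cpu' line is used,
    as in the removed LoadPerCpu class.
    """
    values = []
    for line in stat_text.splitlines():
        if not line.startswith("cpu"):
            break
        values.append(int(line.split()[1]))
    return values


def load_between(previous, current):
    """Per-CPU difference between two snapshots of parse_cpu_ticks()."""
    return [cur - prev for prev, cur in zip(previous, current)]


if __name__ == "__main__":
    # Made-up snapshots; on a real host the text would come from /proc/stat.
    before = "cpu  100 0 50 900\ncpu0 60 0 30 450\ncpu1 40 0 20 450\nintr 12345\n"
    after = "cpu  130 0 55 950\ncpu0 85 0 32 470\ncpu1 45 0 23 480\nintr 12399\n"
    print(load_between(parse_cpu_ticks(before), parse_cpu_ticks(after)))
```

Keeping the parsing separate from the file access makes the logic easy to exercise with canned input, which a class holding an open file handle does not allow.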

[PATCH 3/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpusSwitching

2011-12-05 Thread Lukas Doktor
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
while switching between cgroups with different settings.

Signed-off-by: Lukas Doktor 
---
 client/tests/cgroup/cgroup_common.py |4 +
 client/tests/kvm/tests/cgroup.py |  108 +-
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py
index fe1601b..56856c0 100755
--- a/client/tests/cgroup/cgroup_common.py
+++ b/client/tests/cgroup/cgroup_common.py
@@ -105,6 +105,8 @@ class Cgroup(object):
 @param pwd: cgroup directory
 @return: 0 when is 'pwd' member
 """
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 if open(pwd + '/tasks').readlines().count("%d\n" % pid) > 0:
 return 0
 else:
@@ -126,6 +128,8 @@ class Cgroup(object):
 @param pid: pid of the process
 @param pwd: cgroup directory
 """
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 try:
 open(pwd+'/tasks', 'w').write(str(pid))
 except Exception, inst:
diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
index 23ae622..2e18ef7 100644
--- a/client/tests/kvm/tests/cgroup.py
+++ b/client/tests/kvm/tests/cgroup.py
@@ -51,13 +51,12 @@ def run_cgroup(test, params, env):
 @param cgroup: cgroup handler
 @param pwd: desired cgroup's pwd, cgroup index or None for root cgroup
 """
-if isinstance(pwd, int):
-pwd = cgroup.cgroups[pwd]
 cgroup.set_cgroup(vm.get_shell_pid(), pwd)
 for pid in utils.get_children_pids(vm.get_shell_pid()):
 cgroup.set_cgroup(int(pid), pwd)
 
 
+
 def distance(actual, reference):
 """
 Absolute value of relative distance of two numbers
@@ -1341,7 +1340,7 @@ def run_cgroup(test, params, env):
 except Exception, failure_detail:
 err += "\nCan't remove Cgroup: %s" % failure_detail
 
-self.sessions[0].sendline('rm -f /tmp/cgroup-cpu-lock')
+self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock')
 for i in range(len(self.sessions)):
 try:
 self.sessions[i].close()
@@ -1381,6 +1380,7 @@ def run_cgroup(test, params, env):
 self.sessions.append(self.vm.wait_for_login(timeout=30))
 self.sessions[i].cmd("touch /tmp/cgroup-cpu-lock")
 self.sessions[i].sendline(cmd)
+self.sessions.append(self.vm.wait_for_login(timeout=30))   # cleanup
 
 
 def run(self):
@@ -1485,8 +1485,109 @@ def run_cgroup(test, params, env):
 logging.error(err)
 raise error.TestFail(err)
 
+logging.info("Test passed successfully")
 return ("All clear")
 
+
+class TestCpusetCpusSwitching:
+"""
+Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
+while switching between cgroups with different setting.
+"""
+def __init__(self, vms, modules):
+"""
+Initialization
+@param vms: list of vms
+@param modules: initialized cgroup module class
+"""
+self.vm = vms[0]  # Virt machines
+self.modules = modules  # cgroup module handler
+self.cgroup = Cgroup('cpuset', '')   # cgroup handler
+self.sessions = []
+
+
+def cleanup(self):
+""" Cleanup """
+err = ""
+try:
+del(self.cgroup)
+except Exception, failure_detail:
+err += "\nCan't remove Cgroup: %s" % failure_detail
+
+self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock')
+for i in range(len(self.sessions)):
+try:
+self.sessions[i].close()
+except Exception, failure_detail:
+err += ("\nCan't close the %dst ssh connection" % i)
+
+if err:
+logging.error("Some cleanup operations failed: %s", err)
+raise error.TestError("Some cleanup operations failed: %s" %
+  err)
+
+
+def init(self):
+"""
+Prepares cgroup, moves VM into it and execute stressers.
+"""
+self.cgroup.initialize(self.modules)
+vm_cpus = int(params.get('smp', 1))
+all_cpus = self.cgroup.get_property("cpuset.cpus")[0]
+if all_cpus == "0":
+raise error.TestFail("This test needs at least 2 CPUs on "
+ "host, cpuset=%s" % all_cpus)
+try:
+last_cpu = int(all_cpus.split('-')[1])
+except Exception:
+raise error.TestFail("Failed to get #CPU from root cgroup.")
+
+# Comments ar
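
The isinstance(pwd, int) checks that this series adds to cgroup_common.py let callers pass either a cgroup directory or an index into the handler's list of cgroups. A minimal sketch of that resolution step follows. resolve_pwd and its arguments are hypothetical names (the real methods do this inline), and the None case follows the get_property() hunk in patch 2/3.

```python
def resolve_pwd(pwd, cgroups, root):
    """Map a cgroup reference to a directory path.

    pwd may be None (use the root cgroup), an int (index into cgroups)
    or a string that is already a directory path.
    """
    if pwd is None:
        return root
    if isinstance(pwd, int):
        return cgroups[pwd]
    return pwd


if __name__ == "__main__":
    # Made-up paths for illustration only.
    known = ["/cgroup/cpuset/test-0", "/cgroup/cpuset/test-1"]
    for ref in (None, 1, "/cgroup/cpuset/other"):
        print(repr(ref), "->", resolve_pwd(ref, known, "/cgroup/cpuset"))
```

Doing the lookup inside the Cgroup methods, as the patch does, removes the need for each test to repeat the check before calling set_cgroup().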

[PATCH 2/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpus test

2011-12-05 Thread Lukas Doktor
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
and changes the CPU affinity. Verifies correct behaviour.

* Add TestCpusetCpus test
* import cleanup
* private function names cleanup
---
 client/tests/cgroup/cgroup_common.py |2 +
 client/tests/kvm/tests/cgroup.py |  211 +++---
 2 files changed, 194 insertions(+), 19 deletions(-)

diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py
index 186bf09..fe1601b 100755
--- a/client/tests/cgroup/cgroup_common.py
+++ b/client/tests/cgroup/cgroup_common.py
@@ -152,6 +152,8 @@ class Cgroup(object):
 """
 if pwd == None:
 pwd = self.root
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 try:
 # Remove tailing '\n' from each line
 ret = [_[:-1] for _ in open(pwd+prop, 'r').readlines()]
diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
index ee6ef2e..23ae622 100644
--- a/client/tests/kvm/tests/cgroup.py
+++ b/client/tests/kvm/tests/cgroup.py
@@ -7,8 +7,8 @@ import logging, os, re, sys, tempfile, time
 from random import random
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
-from autotest_lib.client.tests.cgroup.cgroup_common import Cgroup, CgroupModules
-from autotest_lib.client.virt import virt_utils, virt_env_process
+from autotest_lib.client.tests.cgroup.cgroup_common import (Cgroup,
+CgroupModules, get_load_per_cpu)
 from autotest_lib.client.virt.aexpect import ExpectTimeoutError
 from autotest_lib.client.virt.aexpect import ExpectProcessTerminatedError
 
@@ -839,7 +839,7 @@ def run_cgroup(test, params, env):
  * Freezes the guest and thaws it again couple of times
  * verifies that guest is frozen and runs when expected
 """
-def get_stat(pid):
+def _get_stat(pid):
 """
 Gather statistics of pid+1st level subprocesses cpu usage
 @param pid: PID of the desired process
@@ -877,9 +877,9 @@ def run_cgroup(test, params, env):
 _ = cgroup.get_property('freezer.state', cgroup.cgroups[0])
 if 'FROZEN' not in _:
 raise error.TestFail("Couldn't freeze the VM: state %s" % _)
-stat_ = get_stat(pid)
+stat_ = _get_stat(pid)
 time.sleep(tsttime)
-stat = get_stat(pid)
+stat = _get_stat(pid)
 if stat != stat_:
 raise error.TestFail('Process was running in FROZEN state; '
  'stat=%s, stat_=%s, diff=%s' %
@@ -887,9 +887,9 @@ def run_cgroup(test, params, env):
 logging.info("THAWING (%ss)", tsttime)
 self.cgroup.set_property('freezer.state', 'THAWED',
  self.cgroup.cgroups[0])
-stat_ = get_stat(pid)
+stat_ = _get_stat(pid)
 time.sleep(tsttime)
-stat = get_stat(pid)
+stat = _get_stat(pid)
 if (stat - stat_) < (90*tsttime):
 raise error.TestFail('Process was not active in FROZEN'
  'state; stat=%s, stat_=%s, diff=%s' %
@@ -1186,7 +1186,7 @@ def run_cgroup(test, params, env):
 Let each of 3 scenerios (described in test specification) stabilize
 and then measure the CPU utilisation for time_test time.
 """
-def get_stat(f_stats, _stats=None):
+def _get_stat(f_stats, _stats=None):
 """ Reads CPU times from f_stats[] files and sumarize them. """
 if _stats is None:
 _stats = []
@@ -1218,27 +1218,27 @@ def run_cgroup(test, params, env):
 for thread_count in range(0, host_cpus):
 sessions[thread_count].sendline(cmd)
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 time.sleep(time_test)
-stats.append(get_stat(f_stats, _stats))
+stats.append(_get_stat(f_stats, _stats))
 
 thread_count += 1
 sessions[thread_count].sendline(cmd)
 if host_cpus % no_speeds == 0 and no_speeds <= host_cpus:
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 time.sleep(time_test)
-stats.append(get_stat(f_stats, _stats))
+stats.append(_get_stat(f_stats, _stats))
 
 for i in range(thread_count+1, no_threads):
 sessions[i].sendline(cmd)
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 for j in r

[kvm-autotest] tests.cgroup: Add 2 new tests of cpuset.cpus cgroup functionality

2011-12-05 Thread Lukas Doktor

Hi,

This patchset fixes some issues in the cgroup_common.py library and adds 2 new
tests to the cgroup-kvm test.

Please find the details in each patch.

Sent upstream as pull request 103:
https://github.com/autotest/autotest/pull/103

Regards,
Lukáš

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka
On 2011-12-05 11:01, Avi Kivity wrote:
> On 12/04/2011 11:38 PM, Jan Kiszka wrote:
>>>
>>> It should also be possible to migrate from the non-KVM device to the KVM
>>> version; different names would prevent that forever.
>>
>> It is (theoretically) possible with these patches as the vmstate names
>> are the same. KVM to TCG migration does not work right now, so I was
>> only able to test in-kernel <-> user space irqchip model migrations.
> 
> btw, for the next-gen migration protocol, we'd probably be using QOM
> paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device
tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
risk creating a migration crack. At the very least we would then need
some stable naming and/or alias concept.
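
A rough sketch of what such an alias concept could look like, in Python
for brevity. Everything here is hypothetical (the paths, the ALIASES
table, the resolve() helper); it is not an existing QEMU interface.

```python
# Hypothetical: stable migration names stay fixed while device-tree paths move.
ALIASES = {
    "/machine/i8259[0]": "i8259-master",          # path before a refactoring
    "/machine/icc-bus/i8259[0]": "i8259-master",  # path after a refactoring
}


def resolve(path, aliases=ALIASES):
    """Map a device path found in a migration stream to its stable name."""
    try:
        return aliases[path]
    except KeyError:
        raise LookupError("no stable migration name for %r" % path)
```

Both the old and the new path resolve to the same name, so a stream
written before the refactoring would still find its device afterwards.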

Jan





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Stefan Hajnoczi
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
> +{
> +       struct virtnet_info *vi = netdev_priv(dev);
> +       struct virtio_device *vdev = vi->vdev;
> +
> +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
> +               vdev->config->set(vdev,
> +                                 offsetof(struct virtio_net_config_fd, addr),
> +                                 &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.

Stefan


Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder

2011-12-05 Thread Andreas Färber

Am 03.12.2011 23:33, schrieb Jan Kiszka:
> On 2011-12-03 20:00, Andreas Färber wrote:
>> Am 03.12.2011 12:17, schrieb Jan Kiszka:
>>> diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
>>> similarity index 96%
>>> rename from hw/kvmclock.c
>>> rename to hw/kvm/clock.c
>>> index 5388bc4..aa37c5d 100644
>>> --- a/hw/kvmclock.c
>>> +++ b/hw/kvm/clock.c
>>> @@ -11,11 +11,11 @@
>>>  *
>>>  */
>>>
>>> -#include "qemu-common.h"
>>> -#include "sysemu.h"
>>> -#include "sysbus.h"
>>> -#include "kvm.h"
>>> -#include "kvmclock.h"
>>> +#include
>>> +#include
>>> +#include
>>> +#include
>>> +#include
>>>
>>> #include
>>> #include
>> 
>> Please don't start using system includes for everything. Rather
>> extend QEMU_CFLAGS to contain the right user include path(s).
> 
> No problem - and no need to tweak any CFLAGS

Right, I had recursion into kvm/ in mind - would've required -I ../..
to be added to CFLAGS.

> ("" only adds . to the header search paths).

By default, that is. -iquote can add further paths. (Unfortunately that
didn't solve the Cocoa Block.h vs. block.h problem since Objective-C
frameworks use quotes, too.)

> Do we have a convention that every include in <> is considered
> system header? Should probably be documented then (and code should
> be converted gradually).

The convention I perceived was that everything QEMU was in quotes
whereas POSIX, Linux, zlib, glib, etc. were in angle brackets. Didn't
check for documentation.
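
As an illustration only, that perceived convention could be checked with a
small script along these lines. The function name and the project_headers
set are assumptions made for this sketch; nothing like it is claimed to
exist in the QEMU tree.

```python
import re

# Matches '#include <x>' or '#include "x"' and captures the delimited name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(<[^>]+>|"[^"]+")')


def misquoted_includes(source, project_headers):
    """Return (line_no, header) pairs that break the quotes/brackets convention.

    Project headers are expected in quotes, everything else in angle brackets.
    """
    problems = []
    for line_no, line in enumerate(source.splitlines(), 1):
        match = INCLUDE_RE.match(line)
        if not match:
            continue
        token = match.group(1)
        header = token[1:-1]
        uses_brackets = token.startswith('<')
        is_project = header in project_headers
        # Wrong if a project header uses <> or a non-project header uses "".
        if is_project == uses_brackets:
            problems.append((line_no, header))
    return problems
```

Which headers count as project headers would have to come from the tree
itself, e.g. a list of the .h files it ships.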

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

