Re: [PATCH 2/6] KVM: PPC: E500: Explicitly mark shadow maps invalid

2013-01-17 Thread Scott Wood

On 01/17/2013 08:34:53 PM, Alexander Graf wrote:

When we invalidate shadow TLB maps on the host, we don't mark them
as not valid. But we should.

Fix this by removing the E500_TLB_VALID from their flags when
invalidating.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_tlb.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index d38ad63..8efb2ac 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -204,9 +204,13 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500,
 {
struct kvm_book3e_206_tlb_entry *gtlbe =
get_entry(vcpu_e500, tlbsel, esel);
+   struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[tlbsel][esel].ref;

-   if (tlbsel == 1 &&
-   vcpu_e500->gtlb_priv[1][esel].ref.flags & E500_TLB_BITMAP) {
+   /* Don't bother with unmapped entries */
+   if (!(ref->flags & E500_TLB_VALID))
+   return;
+
+   if (tlbsel == 1 && ref->flags & E500_TLB_BITMAP) {
u64 tmp = vcpu_e500->g2h_tlb1_map[esel];
int hw_tlb_indx;
unsigned long flags;
@@ -224,7 +228,7 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500,
	}
mb();
vcpu_e500->g2h_tlb1_map[esel] = 0;
-		vcpu_e500->gtlb_priv[1][esel].ref.flags &= ~E500_TLB_BITMAP;
+   ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
local_irq_restore(flags);

return;
@@ -232,6 +236,9 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500,

 	/* Guest tlbe is backed by at most one host tlbe per shadow pid. */
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
+
+   /* Mark the TLB as not backed by the host anymore */
+   ref->flags &= ~E500_TLB_VALID;
 }


Invalidation paths that call kvmppc_e500_tlbil_all(), such as MMUCSR0  
and tlbivax, need a call to clear_tlb_refs() in order to get the valid  
bits cleared.


In looking this up, I also saw that tlbilxlpid (T=0) seems to be  
broken; it compares PID/TID as if it were T=1.  Don't be fooled by the  
"lpid" in the name; it's still relevant (and different from T=1) in the  
absence of E.HV, and should be treated as "tlbilx all".  Once  
implemented, that will also presumably use kvmppc_e500_tlbil_all().
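
To make the first point concrete, a rough sketch (not something posted in
this thread; kvmppc_e500_tlbil_all() and clear_tlb_refs() are the helper
names already used in e500_tlb.c, the wrapper itself is purely illustrative):

static void e500_flush_all_shadow(struct kvmppc_vcpu_e500 *vcpu_e500)
{
	/* Illustrative only: drop every shadow entry this vcpu owns in
	 * the host TLB... */
	kvmppc_e500_tlbil_all(vcpu_e500);
	/* ...and forget the per-entry refs, which clears E500_TLB_VALID,
	 * so MMUCSR0, tlbivax and an eventual "tlbilx all" stay consistent
	 * with the explicit-invalidate scheme in the patch above. */
	clear_tlb_refs(vcpu_e500);
}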


-Scott


Re: [QEMU PATCH v2] virtio-net: introduce a new macaddr control

2013-01-17 Thread Amos Kong
On Thu, Jan 17, 2013 at 02:13:08PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
> > ak...@redhat.com writes:
> > > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, 
> > > uint8_t cmd,
> > >  {
> > >  struct virtio_net_ctrl_mac mac_data;
> > >  
> > > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
> > > +elem->out_sg[1].iov_len == ETH_ALEN) {
> > > +/* Set MAC address */
> > > +memcpy(n->mac, elem->out_sg[1].iov_base, 
> > > elem->out_sg[1].iov_len);
> > > +qemu_format_nic_info_str(&n->nic->nc, n->mac);
> > > +return VIRTIO_NET_OK;
> > > +}
> > 
> > Does the rest of the net device still rely on the layout of descriptors?
> > If so, OK, we'll fix them all together.  If not, this introduces a new
> > one.
> > 
> > Cheers,
> > Rusty.
> 
> The following fixes all existing users.
> Got to deal with some urgent stuff so did not test yet -
> Amos, would you like to include this in your patchset
> and build on it, test it all together?

No problem.

> If not I'll get to it next week.
> 
> --->
> 
> virtio-net: remove layout assumptions for ctrl vq
> 
> Signed-off-by: Michael S. Tsirkin 

...
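
The body of that patch is elided above. Purely for illustration, a
layout-independent handler for the new command could look roughly like the
sketch below, using QEMU's generic iov_to_buf() helper instead of indexing
out_sg[1] directly; the function name and the include path are assumptions,
not MST's actual change:

#include "qemu/iov.h"   /* iov_to_buf(); header path assumed per the then-current tree */

/* Sketch only: copy the MAC out of the control element without assuming
 * the address lands exactly in out_sg[1]. */
static int virtio_net_mac_addr_set(VirtIONet *n, const struct iovec *iov,
                                   unsigned int iov_cnt)
{
    uint8_t mac[ETH_ALEN];

    if (iov_to_buf(iov, iov_cnt, 0, mac, sizeof(mac)) != sizeof(mac))
        return VIRTIO_NET_ERR;

    memcpy(n->mac, mac, sizeof(mac));
    qemu_format_nic_info_str(&n->nic->nc, n->mac);
    return VIRTIO_NET_OK;
}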


[PATCH v2 0/6] KVM: PPC: e500: Shadow TLB Improvements v2

2013-01-17 Thread Alexander Graf
This patch set improves the shadow TLB handling of our e500
target.

The really important bit here is that with these patches applied,
we can map guest TLB1 entries into the host's TLB0. This gives a
significant performance improvement as you can see below.

Alex

v1 -> v2:

  - new patch: Move write_stlbe higher
  - new patch: Explicitly mark shadow maps invalid
  - new patch: Propagate errors when shadow mapping
  - remove force init logic, it's handled by the explicit invalidate now
  - leave stlbe uninitialized
  - allow mixing of TLB0 and TLB1_BITMAP backing of TLB1 pages

---

without patch, using 4k backed memory:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done
real	0m12.947s
user	0m1.076s
sys	0m9.720s


with hugetlbfs:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done

real	0m3.262s
user	0m0.464s
sys	0m0.236s


with patches applied, using 4k backed memory:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done

real	0m4.446s
user	0m0.380s
sys	0m0.644s

Alexander Graf (6):
  KVM: PPC: E500: Move write_stlbe higher
  KVM: PPC: E500: Explicitly mark shadow maps invalid
  KVM: PPC: E500: Propagate errors when shadow mapping
  KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
  KVM: PPC: E500: Split host and guest MMU parts
  KVM: PPC: e500: Implement TLB1-in-TLB0 mapping

 arch/powerpc/kvm/Makefile|9 +-
 arch/powerpc/kvm/e500.h  |1 +
 arch/powerpc/kvm/e500_mmu.c  |  811 +
 arch/powerpc/kvm/e500_mmu_host.c |  699 +++
 arch/powerpc/kvm/e500_mmu_host.h |   20 +
 arch/powerpc/kvm/e500_tlb.c  | 1430 --
 6 files changed, 1537 insertions(+), 1433 deletions(-)
 create mode 100644 arch/powerpc/kvm/e500_mmu.c
 create mode 100644 arch/powerpc/kvm/e500_mmu_host.c
 create mode 100644 arch/powerpc/kvm/e500_mmu_host.h
 delete mode 100644 arch/powerpc/kvm/e500_tlb.c



[PATCH 6/6] KVM: PPC: e500: Implement TLB1-in-TLB0 mapping

2013-01-17 Thread Alexander Graf
When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - rebase
  - allow mixing of TLB0 and TLB1_BITMAP backing of TLB1 pages
---
 arch/powerpc/kvm/e500.h  |1 +
 arch/powerpc/kvm/e500_mmu_host.c |   65 +++---
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index c70d37e..41cefd4 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -28,6 +28,7 @@
 
 #define E500_TLB_VALID 1
 #define E500_TLB_BITMAP 2
+#define E500_TLB_TLB0  (1 << 2)
 
 struct tlbe_ref {
pfn_t pfn;
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 4c32d65..9a150bc 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -216,10 +216,21 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500, int tlbsel,
vcpu_e500->g2h_tlb1_map[esel] = 0;
ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
local_irq_restore(flags);
+   }
 
-   return;
+   if (tlbsel == 1 && ref->flags & E500_TLB_TLB0) {
+   /*
+* TLB1 entry is backed by 4k pages. This should happen
+* rarely and is not worth optimizing. Invalidate everything.
+*/
+   kvmppc_e500_tlbil_all(vcpu_e500);
+   ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
}
 
+   /* Already invalidated in between */
+   if (!(ref->flags & E500_TLB_VALID))
+   return;
+
/* Guest tlbe is backed by at most one host tlbe per shadow pid. */
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
 
@@ -487,38 +498,54 @@ static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 
*vcpu_e500, int esel,
return 0;
 }
 
+static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
+struct tlbe_ref *ref,
+int esel)
+{
+   unsigned int sesel = vcpu_e500->host_tlb1_nv++;
+
+   if (unlikely(vcpu_e500->host_tlb1_nv >= tlb1_max_shadow_size()))
+   vcpu_e500->host_tlb1_nv = 0;
+
+   vcpu_e500->tlb_refs[1][sesel] = *ref;
+   vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
+   vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
+   if (vcpu_e500->h2g_tlb1_rmap[sesel]) {
+   unsigned int idx = vcpu_e500->h2g_tlb1_rmap[sesel];
+   vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
+   }
+   vcpu_e500->h2g_tlb1_rmap[sesel] = esel;
+
+   return sesel;
+}
+
 /* Caller must ensure that the specified guest TLB entry is safe to insert into
  * the shadow TLB. */
-/* XXX for both one-one and one-to-many , for now use TLB1 */
+/* For both one-one and one-to-many */
 static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
struct kvm_book3e_206_tlb_entry *stlbe, int esel)
 {
-   struct tlbe_ref *ref;
-   unsigned int sesel;
+   struct tlbe_ref ref;
+   int sesel;
int r;
-   int stlbsel = 1;
-
-   sesel = vcpu_e500->host_tlb1_nv++;
-
-   if (unlikely(vcpu_e500->host_tlb1_nv >= tlb1_max_shadow_size()))
-   vcpu_e500->host_tlb1_nv = 0;
 
-   ref = &vcpu_e500->tlb_refs[1][sesel];
+   ref.flags = 0;
r = kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe,
-  ref);
+  &ref);
if (r)
return r;
 
-   vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
-   vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
-   if (vcpu_e500->h2g_tlb1_rmap[sesel]) {
-   unsigned int idx = vcpu_e500->h2g_tlb1_rmap[sesel];
-   vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
+   /* Use TLB0 when we can only map a page with 4k */
+   if (get_tlb_tsize(stlbe) == BOOK3E_PAGESZ_4K) {
+   vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_TLB0;
+   write_stlbe(vcpu_e500, gtlbe, stlbe, 0, 0);
+   return 0;
}
-   vcpu_e500->h2g_tlb1_rmap[sesel] = esel;
 
-   write_stlbe(vcpu_e500, gtlbe, stlbe, stlbsel, sesel);
+   /* Otherwise map into TLB1 */
+   sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, &ref, esel);
+   write_stlbe(vcpu_e500, gtlbe, stlbe, 1, sesel);
 
return 0;
 }
-- 
1.6.0.2


[PATCH 2/6] KVM: PPC: E500: Explicitly mark shadow maps invalid

2013-01-17 Thread Alexander Graf
When we invalidate shadow TLB maps on the host, we don't mark them
as not valid. But we should.

Fix this by removing the E500_TLB_VALID from their flags when
invalidating.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_tlb.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index d38ad63..8efb2ac 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -204,9 +204,13 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500,
 {
struct kvm_book3e_206_tlb_entry *gtlbe =
get_entry(vcpu_e500, tlbsel, esel);
+   struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[tlbsel][esel].ref;
 
-   if (tlbsel == 1 &&
-   vcpu_e500->gtlb_priv[1][esel].ref.flags & E500_TLB_BITMAP) {
+   /* Don't bother with unmapped entries */
+   if (!(ref->flags & E500_TLB_VALID))
+   return;
+
+   if (tlbsel == 1 && ref->flags & E500_TLB_BITMAP) {
u64 tmp = vcpu_e500->g2h_tlb1_map[esel];
int hw_tlb_indx;
unsigned long flags;
@@ -224,7 +228,7 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500,
}
mb();
vcpu_e500->g2h_tlb1_map[esel] = 0;
-   vcpu_e500->gtlb_priv[1][esel].ref.flags &= ~E500_TLB_BITMAP;
+   ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
local_irq_restore(flags);
 
return;
@@ -232,6 +236,9 @@ static void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500,
 
/* Guest tlbe is backed by at most one host tlbe per shadow pid. */
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
+
+   /* Mark the TLB as not backed by the host anymore */
+   ref->flags &= ~E500_TLB_VALID;
 }
 
 static int tlb0_set_base(gva_t addr, int sets, int ways)
-- 
1.6.0.2



[PATCH 3/6] KVM: PPC: E500: Propagate errors when shadow mapping

2013-01-17 Thread Alexander Graf
When shadow mapping a page, mapping this page can fail. In that case we
don't have a shadow map.

Take this case into account, otherwise we might end up writing bogus TLB
entries into the host TLB.

While at it, also move the write_stlbe() calls into the respective TLBn
handlers.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_tlb.c |   69 +-
 1 files changed, 41 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 8efb2ac..3777167 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -432,7 +432,7 @@ static inline void kvmppc_e500_setup_stlbe(
 #endif
 }
 
-static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
+static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
int tlbsel, struct kvm_book3e_206_tlb_entry *stlbe,
struct tlbe_ref *ref)
@@ -551,7 +551,7 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
if (is_error_noslot_pfn(pfn)) {
printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
(long)gfn);
-   return;
+   return -EINVAL;
}
 
/* Align guest and physical address to page map boundaries */
@@ -571,22 +571,33 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
 
/* Drop refcount on page, so that mmu notifiers can clear it */
kvm_release_pfn_clean(pfn);
+
+   return 0;
 }
 
 /* XXX only map the one-one case, for now use TLB0 */
-static void kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500,
-int esel,
-struct kvm_book3e_206_tlb_entry *stlbe)
+static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500,
+   int esel,
+   struct kvm_book3e_206_tlb_entry *stlbe)
 {
struct kvm_book3e_206_tlb_entry *gtlbe;
struct tlbe_ref *ref;
+   int stlbsel = 0;
+   int sesel = 0;
+   int r;
 
gtlbe = get_entry(vcpu_e500, 0, esel);
ref = &vcpu_e500->gtlb_priv[0][esel].ref;
 
-   kvmppc_e500_shadow_map(vcpu_e500, get_tlb_eaddr(gtlbe),
+   r = kvmppc_e500_shadow_map(vcpu_e500, get_tlb_eaddr(gtlbe),
get_tlb_raddr(gtlbe) >> PAGE_SHIFT,
gtlbe, 0, stlbe, ref);
+   if (r)
+   return r;
+
+   write_stlbe(vcpu_e500, gtlbe, stlbe, stlbsel, sesel);
+
+   return 0;
 }
 
 /* Caller must ensure that the specified guest TLB entry is safe to insert into
@@ -597,25 +608,32 @@ static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 
*vcpu_e500,
struct kvm_book3e_206_tlb_entry *stlbe, int esel)
 {
struct tlbe_ref *ref;
-   unsigned int victim;
+   unsigned int sesel;
+   int r;
+   int stlbsel = 1;
 
-   victim = vcpu_e500->host_tlb1_nv++;
+   sesel = vcpu_e500->host_tlb1_nv++;
 
if (unlikely(vcpu_e500->host_tlb1_nv >= tlb1_max_shadow_size()))
vcpu_e500->host_tlb1_nv = 0;
 
-   ref = &vcpu_e500->tlb_refs[1][victim];
-   kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe, ref);
+   ref = &vcpu_e500->tlb_refs[1][sesel];
+   r = kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe,
+  ref);
+   if (r)
+   return r;
 
-   vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << victim;
+   vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
-   if (vcpu_e500->h2g_tlb1_rmap[victim]) {
-   unsigned int idx = vcpu_e500->h2g_tlb1_rmap[victim];
-   vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << victim);
+   if (vcpu_e500->h2g_tlb1_rmap[sesel]) {
+   unsigned int idx = vcpu_e500->h2g_tlb1_rmap[sesel];
+   vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
}
-   vcpu_e500->h2g_tlb1_rmap[victim] = esel;
+   vcpu_e500->h2g_tlb1_rmap[sesel] = esel;
 
-   return victim;
+   write_stlbe(vcpu_e500, gtlbe, stlbe, stlbsel, sesel);
+
+   return 0;
 }
 
 static void kvmppc_recalc_tlb1map_range(struct kvmppc_vcpu_e500 *vcpu_e500)
@@ -1034,30 +1052,27 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
gpa_t gpaddr,
struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
int tlbsel = tlbsel_of(index);
int esel = esel_of(index);
-   int stlbsel, sesel;
 
gtlbe = get_entry(vcpu_e500, tlbsel, esel);
 
switch (tlbsel) {
case 0:
-   stlbsel = 0;
-   sesel = 0; /* unused */
priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
 
-   /* Only tri

[PATCH 4/6] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf
When emulating tlbwe, we want to automatically map the entry that just got
written in our shadow TLB map, because chances are quite high that it's
going to be used very soon.

Today this happens explicitly, duplicating all the logic that is in
kvmppc_mmu_map() already. Just call that one instead.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - leave stlbe uninitialized
  - remove force init logic, it's handled by the explicit invalidate now
---
 arch/powerpc/kvm/e500_tlb.c |   38 +++---
 1 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 3777167..48d1a4f 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -878,8 +878,8 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea)
 int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
-   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
-   int tlbsel, esel, stlbsel, sesel;
+   struct kvm_book3e_206_tlb_entry *gtlbe;
+   int tlbsel, esel;
int recal = 0;
 
tlbsel = get_tlb_tlbsel(vcpu);
@@ -917,40 +917,16 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 
/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
if (tlbe_is_host_safe(vcpu, gtlbe)) {
-   u64 eaddr;
-   u64 raddr;
+   u64 eaddr = get_tlb_eaddr(gtlbe);
+   u64 raddr = get_tlb_raddr(gtlbe);
 
-   switch (tlbsel) {
-   case 0:
-   /* TLB0 */
+   if (tlbsel == 0) {
gtlbe->mas1 &= ~MAS1_TSIZE(~0);
gtlbe->mas1 |= MAS1_TSIZE(BOOK3E_PAGESZ_4K);
-
-   stlbsel = 0;
-   kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
-   sesel = 0; /* unused */
-
-   break;
-
-   case 1:
-   /* TLB1 */
-   eaddr = get_tlb_eaddr(gtlbe);
-   raddr = get_tlb_raddr(gtlbe);
-
-   /* Create a 4KB mapping on the host.
-* If the guest wanted a large page,
-* only the first 4KB is mapped here and the rest
-* are mapped on the fly. */
-   stlbsel = 1;
-   sesel = kvmppc_e500_tlb1_map(vcpu_e500, eaddr,
-   raddr >> PAGE_SHIFT, gtlbe, &stlbe, esel);
-   break;
-
-   default:
-   BUG();
}
 
-   write_stlbe(vcpu_e500, gtlbe, &stlbe, stlbsel, sesel);
+   /* Premap the faulting page */
+   kvmppc_mmu_map(vcpu, eaddr, raddr, index_of(tlbsel, esel));
}
 
kvmppc_set_exit_type(vcpu, EMULATED_TLBWE_EXITS);
-- 
1.6.0.2



[PATCH 1/6] KVM: PPC: E500: Move write_stlbe higher

2013-01-17 Thread Alexander Graf
Later patches want to call the function and it doesn't have
dependencies on anything below write_host_tlbe.

Move it higher up in the file.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_tlb.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index cf3f180..d38ad63 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -156,6 +156,22 @@ static inline void write_host_tlbe(struct kvmppc_vcpu_e500 
*vcpu_e500,
}
 }
 
+/* sesel is for tlb1 only */
+static void write_stlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
+   struct kvm_book3e_206_tlb_entry *gtlbe,
+   struct kvm_book3e_206_tlb_entry *stlbe,
+   int stlbsel, int sesel)
+{
+   int stid;
+
+   preempt_disable();
+   stid = kvmppc_e500_get_tlb_stid(&vcpu_e500->vcpu, gtlbe);
+
+   stlbe->mas1 |= MAS1_TID(stid);
+   write_host_tlbe(vcpu_e500, stlbsel, sesel, stlbe);
+   preempt_enable();
+}
+
 #ifdef CONFIG_KVM_E500V2
 void kvmppc_map_magic(struct kvm_vcpu *vcpu)
 {
@@ -834,22 +850,6 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea)
return EMULATE_DONE;
 }
 
-/* sesel is for tlb1 only */
-static void write_stlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
-   struct kvm_book3e_206_tlb_entry *gtlbe,
-   struct kvm_book3e_206_tlb_entry *stlbe,
-   int stlbsel, int sesel)
-{
-   int stid;
-
-   preempt_disable();
-   stid = kvmppc_e500_get_tlb_stid(&vcpu_e500->vcpu, gtlbe);
-
-   stlbe->mas1 |= MAS1_TID(stid);
-   write_host_tlbe(vcpu_e500, stlbsel, sesel, stlbe);
-   preempt_enable();
-}
-
 int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
-- 
1.6.0.2



RE: [PATCH v11 2/3] x86, apicv: add virtual x2apic support

2013-01-17 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-01-17:
> On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
>> From: Yang Zhang 
>> @@ -6103,6 +6206,53 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>  	vmcs_write32(TPR_THRESHOLD, irr);
>>  }
>> +static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>> +{
>> +	u32 exec_control, sec_exec_control;
>> +	int msr;
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	/* There is not point to enable virtualize x2apic without enable
>> +	 * apicv*/
>> +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
>> +		return;
>> +
>> +	if (set) {
>> +		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>> +		/* virtualize x2apic mode relies on tpr shadow */
>> +		if (!(exec_control & CPU_BASED_TPR_SHADOW))
>> +			return;
>> +	}
>> +
>> +	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +
>> +	if (set) {
>> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> +		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>> +	} else {
>> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>> +		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
>> +			sec_exec_control |=
>> +				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> +	}
>> +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
>> +
>> +	if (set) {
>> +		for (msr = 0x800; msr <= 0x8ff; msr++)
>> +			vmx_intercept_for_msr_read_x2apic(msr, false);
>> +
>> +		/* According SDM, in x2apic mode, the whole id reg is used.
>> +		 * But in KVM, it only use the highest eight bits. Need to
>> +		 * intercept it */
>> +		vmx_intercept_for_msr_read_x2apic(0x802, true);
>> +		/* TMCCT */
>> +		vmx_intercept_for_msr_read_x2apic(0x839, true);
>> +		/* TPR */
>> +		vmx_intercept_for_msr_write_x2apic(0x808, false);
>> +	}
> Do it during vmx init, not here. Here you only need to call
> vmx_set_msr_bitmap().
Sure. It is more reasonable. 

>> +vmx_set_msr_bitmap(vcpu);
>> +}
>> +
>>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>  {
>>  	u32 exit_intr_info;
>> @@ -7366,6 +7516,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>  	.enable_nmi_window = enable_nmi_window,
>>  	.enable_irq_window = enable_irq_window,
>>  	.update_cr8_intercept = update_cr8_intercept,
>> +	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
>>  
>>  	.set_tss_addr = vmx_set_tss_addr,
>>  	.get_tdp_level = get_ept_level,
>> @@ -7419,11 +7570,19 @@ static int __init vmx_init(void)
>>  	if (!vmx_msr_bitmap_legacy)
>>  		goto out1;
>> +	vmx_msr_bitmap_legacy_x2apic =
>> +		(unsigned long *)__get_free_page(GFP_KERNEL);
>> +	if (!vmx_msr_bitmap_legacy_x2apic)
>> +		goto out2;
>>  
>>  	vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
>>  	if (!vmx_msr_bitmap_longmode)
>> -		goto out2;
>> +		goto out3;
>>  
>> +	vmx_msr_bitmap_longmode_x2apic =
>> +		(unsigned long *)__get_free_page(GFP_KERNEL);
>> +	if (!vmx_msr_bitmap_longmode_x2apic)
>> +		goto out4;
>>  
>>  	/*
>>  	 * Allow direct access to the PC debug port (it is often used for I/O
>> @@ -7456,6 +7615,11 @@ static int __init vmx_init(void)
>>  vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
>>  vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
>> +memcpy(vmx_msr_bitmap_legacy_x2apic,
>> +vmx_msr_bitmap_legacy, PAGE_SIZE);
>> +memcpy(vmx_msr_bitmap_longmode_x2apic,
>> +vmx_msr_bitmap_longmode, PAGE_SIZE);
>> +
>>  if (enable_ept) {
>>  kvm_mmu_set_mask_ptes(0ull,
>>  (enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
>> @@ -7468,8 +7632,10 @@ static int __init vmx_init(void)
>> 
>>  return 0;
>> -out3:
>> +out4:
>>  free_page((unsigned long)vmx_msr_bitmap_longmode);
>> +out3:
>> +free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
>>  out2:
>>  free_page((unsigned long)vmx_msr_bitmap_legacy);
>>  out1:
>> @@ -7481,6 +7647,8 @@ out:
>> 
>>  static void __exit vmx_exit(void)
>>  {
>> +free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
>> +free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
>>  free_page((unsigned long)vmx_msr_bitmap_legacy);
>>  free_page((unsigned long)vmx_msr_bitmap_longmode);
>>  free_page((unsigned long)vmx_io_bitmap_b);
>> --
>> 1.7.1
> 
> --
>   Gleb.


Best regards,
Yang



Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf

On 18.01.2013, at 01:47, Scott Wood wrote:

> On 01/17/2013 06:20:03 PM, Alexander Graf wrote:
>> On 18.01.2013, at 01:11, Scott Wood wrote:
>> > On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
>> >> @@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 
>> >> eaddr, gpa_t gpaddr,
>> >> {
>> >>   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>> >>   struct tlbe_priv *priv;
>> >> - struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
>> >> + struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
>> >
>> > Is there a code path in which stlbe gets used but not fully filled in
>> > without this?
>> I am hoping not, but when I wrote this patch gcc suddenly jumped at me 
>> claiming that the whole struct can get used uninitialized:
>> arch/powerpc/kvm/e500_mmu_host.c: In function ‘kvmppc_mmu_map’:
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas1’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas2’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas7_3’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas8’ may be used 
>> uninitialized in this function
>> If you have any idea where this could come from, please let me know :).
> 
> I can't reproduce with either GCC 4.5.1 or GCC 4.7.2.  Maybe from a non-final 
> version of the patch?  It would be nice to not have this, and have GCC be 
> able to detect if we're actually missing something rather than have it get 
> zeroed.

Bleks - the warning was actually genuine. I have no idea why it never triggered 
before, but when kvmppc_e500_shadow_map errored out with if 
(is_error_noslot_pfn(pfn)) then the masX fields were never set in stlbe.

I've added proper error passing now. Let's see if I can still manage to split 
all of this into patches...


Alex



Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf

On 18.01.2013, at 01:47, Scott Wood wrote:

> On 01/17/2013 06:20:03 PM, Alexander Graf wrote:
>> On 18.01.2013, at 01:11, Scott Wood wrote:
>> > On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
>> >> @@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 
>> >> eaddr, gpa_t gpaddr,
>> >> {
>> >>   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>> >>   struct tlbe_priv *priv;
>> >> - struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
>> >> + struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
>> >
>> > Is there a code path in which stlbe gets used but not fully filled in
>> > without this?
>> I am hoping not, but when I wrote this patch gcc suddenly jumped at me 
>> claiming that the whole struct can get used uninitialized:
>> arch/powerpc/kvm/e500_mmu_host.c: In function ‘kvmppc_mmu_map’:
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas1’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas2’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas7_3’ may be used 
>> uninitialized in this function
>> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas8’ may be used 
>> uninitialized in this function
>> If you have any idea where this could come from, please let me know :).
> 
> I can't reproduce with either GCC 4.5.1 or GCC 4.7.2.  Maybe from a non-final 
> version of the patch?  It would be nice to not have this, and have GCC be 
> able to detect if we're actually missing something rather than have it get 
> zeroed.

Ok, I'll try without again :).

> BTW, it's "stlbe = {}" in this patch but after the file split, somehow come 
> out as "stlbe = { }".  Was that patch supposed to be just a simple cut and 
> paste of part of the file?

It started off as that, but constant rebasing to fix issues I encountered might 
have slipped in :).


Alex



Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Scott Wood

On 01/17/2013 06:20:03 PM, Alexander Graf wrote:


On 18.01.2013, at 01:11, Scott Wood wrote:

> On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
>> @@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>> {
>>struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>struct tlbe_priv *priv;
>> -  struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
>> +  struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
>
> Is there a code path in which stlbe gets used but not fully filled in
> without this?

I am hoping not, but when I wrote this patch gcc suddenly jumped at  
me claiming that the whole struct can get used uninitialized:


arch/powerpc/kvm/e500_mmu_host.c: In function ‘kvmppc_mmu_map’:
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas1’ may be used  
uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas2’ may be used  
uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas7_3’ may be  
used uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas8’ may be used  
uninitialized in this function


If you have any idea where this could come from, please let me know  
:).


I can't reproduce with either GCC 4.5.1 or GCC 4.7.2.  Maybe from a  
non-final version of the patch?  It would be nice to not have this, and  
have GCC be able to detect if we're actually missing something rather  
than have it get zeroed.


BTW, it's "stlbe = {}" in this patch but after the file split, somehow  
come out as "stlbe = { }".  Was that patch supposed to be just a simple  
cut and paste of part of the file?


-Scott


Re: [PATCH] tcm_vhost: Use llist for cmd completion list

2013-01-17 Thread Asias He
On 01/18/2013 04:35 AM, Nicholas A. Bellinger wrote:
> Hi Asias!
> 
> On Sun, 2013-01-06 at 14:36 +0800, Asias He wrote:
>> This drops the cmd completion list spin lock and makes the cmd
>> completion queue lock-less.
>>
>> Signed-off-by: Asias He 
>> ---
> 
> Apologies for the long delay to get back to this patch.

No problem.

> 
> After some initial testing, I'm seeing about a ~5K IOPs
> performance increase to single RAMDISK_MCP (~110K to ~115K) on the heavy
> mixed 75/25 4k randrw fio workload.

> That said, I think it's fine as a for-3.9 item.

Okay.

> Acked-by: Nicholas Bellinger 
> 
> MST, do you want to take this via your vhost tree, or shall I merge via
> target-pending/for-next..?
> 
> Thanks,
> 
> --nab
> 
>>  drivers/vhost/tcm_vhost.c | 46 
>> +-
>>  drivers/vhost/tcm_vhost.h |  2 +-
>>  2 files changed, 14 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>> index b20df5c..3720604 100644
>> --- a/drivers/vhost/tcm_vhost.c
>> +++ b/drivers/vhost/tcm_vhost.c
>> @@ -47,6 +47,7 @@
>>  #include 
>>  #include  /* TODO vhost.h currently depends on this */
>>  #include 
>> +#include 
>>  
>>  #include "vhost.c"
>>  #include "vhost.h"
>> @@ -64,8 +65,7 @@ struct vhost_scsi {
>>  struct vhost_virtqueue vqs[3];
>>  
>>  struct vhost_work vs_completion_work; /* cmd completion work item */
>> -struct list_head vs_completion_list;  /* cmd completion queue */
>> -spinlock_t vs_completion_lock;/* protects s_completion_list */
>> +struct llist_head vs_completion_list; /* cmd completion queue */
>>  };
>>  
>>  /* Local pointer to allocated TCM configfs fabric module */
>> @@ -301,9 +301,7 @@ static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd 
>> *tv_cmd)
>>  {
>>  struct vhost_scsi *vs = tv_cmd->tvc_vhost;
>>  
>> -spin_lock_bh(&vs->vs_completion_lock);
>> -list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
>> -spin_unlock_bh(&vs->vs_completion_lock);
>> +llist_add(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
>>  
>>  vhost_work_queue(&vs->dev, &vs->vs_completion_work);
>>  }
>> @@ -347,27 +345,6 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd 
>> *tv_cmd)
>>  kfree(tv_cmd);
>>  }
>>  
>> -/* Dequeue a command from the completion list */
>> -static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(
>> -struct vhost_scsi *vs)
>> -{
>> -struct tcm_vhost_cmd *tv_cmd = NULL;
>> -
>> -spin_lock_bh(&vs->vs_completion_lock);
>> -if (list_empty(&vs->vs_completion_list)) {
>> -spin_unlock_bh(&vs->vs_completion_lock);
>> -return NULL;
>> -}
>> -
>> -list_for_each_entry(tv_cmd, &vs->vs_completion_list,
>> -tvc_completion_list) {
>> -list_del(&tv_cmd->tvc_completion_list);
>> -break;
>> -}
>> -spin_unlock_bh(&vs->vs_completion_lock);
>> -return tv_cmd;
>> -}
>> -
>>  /* Fill in status and signal that we are done processing this command
>>   *
>>   * This is scheduled in the vhost work queue so we are called with the owner
>> @@ -377,12 +354,18 @@ static void vhost_scsi_complete_cmd_work(struct 
>> vhost_work *work)
>>  {
>>  struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
>>  vs_completion_work);
>> +struct virtio_scsi_cmd_resp v_rsp;
>>  struct tcm_vhost_cmd *tv_cmd;
>> +struct llist_node *llnode;
>> +struct se_cmd *se_cmd;
>> +int ret;
>>  
>> -while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs))) {
>> -struct virtio_scsi_cmd_resp v_rsp;
>> -struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
>> -int ret;
>> +llnode = llist_del_all(&vs->vs_completion_list);
>> +while (llnode) {
>> +tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
>> + tvc_completion_list);
>> +llnode = llist_next(llnode);
>> +se_cmd = &tv_cmd->tvc_se_cmd;
>>  
>>  pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
>>  tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
>> @@ -426,7 +409,6 @@ static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
>>  pr_err("Unable to allocate struct tcm_vhost_cmd\n");
>>  return ERR_PTR(-ENOMEM);
>>  }
>> -INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
>>  tv_cmd->tvc_tag = v_req->tag;
>>  tv_cmd->tvc_task_attr = v_req->task_attr;
>>  tv_cmd->tvc_exp_data_len = exp_data_len;
>> @@ -859,8 +841,6 @@ static int vhost_scsi_open(struct inode *inode, struct 
>> file *f)
>>  return -ENOMEM;
>>  
>>  vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
>> -INIT_LIST_HEAD(&s->vs_completion_list);
>> -spin_lock_init(&s->vs_completion_lock);
>>  
>>  s->vqs[VHOST_SCSI_VQ_CTL].handle_kick = vhost_scsi

Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Scott Wood

On 01/17/2013 06:29:56 PM, Alexander Graf wrote:


On 18.01.2013, at 01:20, Alexander Graf wrote:

>
> On 18.01.2013, at 01:11, Scott Wood wrote:
>
>> It also seems like it would be cleaner to just invalidate the old entry
>> in tlbwe, and then this function doesn't need to change at all.  I am a
>> bit confused by how invalidation is currently operating

Well, this bit is obvious. It's done by kvmppc_e500_shadow_map when  
it calls kvmppc_e500_ref_release(), right?


Why it's not done explicitly though is a really good question :).


Yeah, I saw that but considered it (in combination with not reaching  
this function if the gtlbe is invalid, and with always preloading the  
host entry when the guest does tlbwe) to be why we're getting away with  
it rather than why it was done that way.


-Scott


Re: [PATCH 3/3] KVM: PPC: e500: Implement TLB1-in-TLB0 mapping

2013-01-17 Thread Scott Wood

On 01/17/2013 04:50:41 PM, Alexander Graf wrote:

When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500.h  |1 +
 arch/powerpc/kvm/e500_mmu_host.c |   58 +
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 00f96d8..d32e6a8 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -28,6 +28,7 @@

 #define E500_TLB_VALID 1
 #define E500_TLB_BITMAP 2
+#define E500_TLB_TLB0  (1 << 2)

 struct tlbe_ref {
pfn_t pfn;
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 3bb2154..cbb6cf8 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -198,6 +198,11 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
local_irq_restore(flags);

return;
+   } else if (tlbsel == 1 &&
+		   vcpu_e500->gtlb_priv[1][esel].ref.flags & E500_TLB_TLB0) {
+   /* This is a slow path, so just invalidate everything */
+   kvmppc_e500_tlbil_all(vcpu_e500);
+		vcpu_e500->gtlb_priv[1][esel].ref.flags &= ~E500_TLB_TLB0;
}


What if the guest TLB1 entry is backed by a mix of TLB0 and TLB1  
entries on the host?  I don't see checks elsewhere that would prevent  
this situation.


@@ -529,9 +556,14 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
case 1: {
gfn_t gfn = gpaddr >> PAGE_SHIFT;

-   stlbsel = 1;
sesel = kvmppc_e500_tlb1_map(vcpu_e500, eaddr, gfn,
 gtlbe, &stlbe, esel);
+   if (sesel < 0) {
+   /* TLB0 mapping */
+   sesel = 0;
+   stlbsel = 0;
+   } else
+   stlbsel = 1;
break;
}


Maybe push the call to write_tlbe() into the tlb0/1_map functions,  
getting rid of the need to pass sesel/stlbsel/stlbe back?


-Scott


Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf

On 18.01.2013, at 01:20, Alexander Graf wrote:

> 
> On 18.01.2013, at 01:11, Scott Wood wrote:
> 
>> On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
>>> @@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 
>>> eaddr, gpa_t gpaddr,
>>> {
>>> struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>> struct tlbe_priv *priv;
>>> -   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
>>> +   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
>> 
>> Is there a code path in which stlbe gets used but not fully filled in
>> without this?
> 
> I am hoping not, but when I wrote this patch gcc suddenly jumped at me 
> claiming that the whole struct can get used uninitialized:
> 
> arch/powerpc/kvm/e500_mmu_host.c: In function ‘kvmppc_mmu_map’:
> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas1’ may be used 
> uninitialized in this function
> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas2’ may be used 
> uninitialized in this function
> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas7_3’ may be used 
> uninitialized in this function
> arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas8’ may be used 
> uninitialized in this function
> 
> If you have any idea where this could come from, please let me know :).
> 
>> 
>>> int tlbsel = tlbsel_of(index);
>>> int esel = esel_of(index);
>>> +   /* Needed for initial map, where we can't use the cached value */
>>> +   int force_map = index & KVM_E500_INDEX_FORCE_MAP;
>>> int stlbsel, sesel;
>>> gtlbe = get_entry(vcpu_e500, tlbsel, esel);
>>> @@ -1038,7 +1017,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
>>> gpa_t gpaddr,
>>> priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
>>> /* Only triggers after clear_tlb_refs */
>>> -   if (unlikely(!(priv->ref.flags & E500_TLB_VALID)))
>>> +   if (force_map || unlikely(!(priv->ref.flags & E500_TLB_VALID)))
>>> kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
>>> else
>>> kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
>> 
>> It seems a bit odd to overload index rather than adding a flags
>> parameter...
> 
> Yeah, I mostly wanted to refrain from touching 440 code. But if you prefer 
> that, I can certainly do so :).
> 
>> It also seems like it would be cleaner to just invalidate the old entry
>> in tlbwe, and then this function doesn't need to change at all.  I am a
>> bit confused by how invalidation is currently operating

Well, this bit is obvious. It's done by kvmppc_e500_shadow_map when it calls 
kvmppc_e500_ref_release(), right?

Why it's not done explicitly though is a really good question :).


Alex



Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf

On 18.01.2013, at 01:11, Scott Wood wrote:

> On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
>> @@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
>> gpa_t gpaddr,
>> {
>>  struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>  struct tlbe_priv *priv;
>> -struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
>> +struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
> 
> Is there a code path in which stlbe gets used but not fully filled in
> without this?

I am hoping not, but when I wrote this patch gcc suddenly jumped at me claiming 
that the whole struct can get used uninitialized:

arch/powerpc/kvm/e500_mmu_host.c: In function ‘kvmppc_mmu_map’:
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas1’ may be used 
uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas2’ may be used 
uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas7_3’ may be used 
uninitialized in this function
arch/powerpc/kvm/e500_mmu_host.c:533: error: ‘stlbe.mas8’ may be used 
uninitialized in this function

If you have any idea where this could come from, please let me know :).

> 
>>  int tlbsel = tlbsel_of(index);
>>  int esel = esel_of(index);
>> +/* Needed for initial map, where we can't use the cached value */
>> +int force_map = index & KVM_E500_INDEX_FORCE_MAP;
>>  int stlbsel, sesel;
>>  gtlbe = get_entry(vcpu_e500, tlbsel, esel);
>> @@ -1038,7 +1017,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
>> gpa_t gpaddr,
>>  priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
>>  /* Only triggers after clear_tlb_refs */
>> -if (unlikely(!(priv->ref.flags & E500_TLB_VALID)))
>> +if (force_map || unlikely(!(priv->ref.flags & E500_TLB_VALID)))
>>  kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
>>  else
>>  kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
> 
> It seems a bit odd to overload index rather than adding a flags
> parameter...

Yeah, I mostly wanted to refrain from touching 440 code. But if you prefer 
that, I can certainly do so :).

> It also seems like it would be cleaner to just invalidate the old entry
> in tlbwe, and then this function doesn't need to change at all.  I am a
> bit confused by how invalidation is currently operating -- why is
> E500_TLB_VALID not cleared on invalidations (except for MMU API stuff and
> MMU notifiers)?

Consider me as confused as you are.


Alex



Re: [PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Scott Wood

On 01/17/2013 04:50:39 PM, Alexander Graf wrote:
@@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
struct tlbe_priv *priv;
-   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
+   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};


Is there a code path in which stlbe gets used but not fully filled in
without this?


int tlbsel = tlbsel_of(index);
int esel = esel_of(index);
+	/* Needed for initial map, where we can't use the cached value */
+   int force_map = index & KVM_E500_INDEX_FORCE_MAP;
int stlbsel, sesel;

gtlbe = get_entry(vcpu_e500, tlbsel, esel);
@@ -1038,7 +1017,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
priv = &vcpu_e500->gtlb_priv[tlbsel][esel];

/* Only triggers after clear_tlb_refs */
-   if (unlikely(!(priv->ref.flags & E500_TLB_VALID)))
+		if (force_map || unlikely(!(priv->ref.flags &  
E500_TLB_VALID)))

kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
else
 			kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,

It seems a bit odd to overload index rather than adding a flags
parameter...

It also seems like it would be cleaner to just invalidate the old entry
in tlbwe, and then this function doesn't need to change at all.  I am a
bit confused by how invalidation is currently operating -- why is
E500_TLB_VALID not cleared on invalidations (except for MMU API stuff and
MMU notifiers)?

-Scott


[PATCH 1/1] KVM: PPC: Emulate dcbf

2013-01-17 Thread Alexander Graf
Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
incoherent MMIO, just do nothing and move on.

Reported-by: Ben Collins 
Signed-off-by: Alexander Graf 
Tested-by: Ben Collins 
CC: sta...@vger.kernel.org
---
 arch/powerpc/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index b0855e5..9d9cddc 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -39,6 +39,7 @@
 #define OP_31_XOP_TRAP  4
 #define OP_31_XOP_LWZX  23
 #define OP_31_XOP_TRAP_64   68
+#define OP_31_XOP_DCBF  86
 #define OP_31_XOP_LBZX  87
 #define OP_31_XOP_STWX  151
 #define OP_31_XOP_STBX  215
@@ -374,6 +375,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
emulated = kvmppc_emulate_mtspr(vcpu, sprn, rs);
break;
 
+   case OP_31_XOP_DCBF:
case OP_31_XOP_DCBI:
/* Do nothing. The guest is performing dcbi because
 * hardware DMA is not snooped by the dcache, but
-- 
1.6.0.2



[PULL 3.8 0/1] ppc patch queue 2013-01-18 for 3.8

2013-01-17 Thread Alexander Graf
Hi Marcelo / Gleb,

This is my current patch queue for ppc against 3.8.  Please pull.

It contains a bug fix for an issue that Ben Collins ran into, where
a guest would just abort because it traps during an unknown instruction.

Alex


The following changes since commit dfdebc24837ed0a1d6ad73b108a10d3c88d1b6e8:
  Linus Torvalds (1):
Merge tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git kvm-ppc-3.8

Alexander Graf (1):
  KVM: PPC: Emulate dcbf

 arch/powerpc/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)


[PATCH] KVM: PPC: Emulate dcbf

2013-01-17 Thread Alexander Graf
Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
incoherent MMIO, just do nothing and move on.

Reported-by: Ben Collins 
Signed-off-by: Alexander Graf 
Tested-by: Ben Collins 
---
 arch/powerpc/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 71abcf4..7a73b6f 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -39,6 +39,7 @@
 #define OP_31_XOP_TRAP  4
 #define OP_31_XOP_LWZX  23
 #define OP_31_XOP_TRAP_64   68
+#define OP_31_XOP_DCBF  86
 #define OP_31_XOP_LBZX  87
 #define OP_31_XOP_STWX  151
 #define OP_31_XOP_STBX  215
@@ -369,6 +370,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
emulated = kvmppc_emulate_mtspr(vcpu, sprn, rs);
break;
 
+   case OP_31_XOP_DCBF:
case OP_31_XOP_DCBI:
/* Do nothing. The guest is performing dcbi because
 * hardware DMA is not snooped by the dcache, but
-- 
1.6.0.2



[PATCH 1/3] KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping

2013-01-17 Thread Alexander Graf
When emulating tlbwe, we want to automatically map the entry that just got
written in our shadow TLB map, because chances are quite high that it's
going to be used very soon.

Today this happens explicitly, duplicating all the logic that is in
kvmppc_mmu_map() already. Just call that one instead.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500.h |4 ++-
 arch/powerpc/kvm/e500_tlb.c |   45 +++---
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index c70d37e..00f96d8 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -113,8 +113,10 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct 
kvm_vcpu *vcpu)
 #define KVM_E500_TLB0_SIZE  (KVM_E500_TLB0_WAY_SIZE * KVM_E500_TLB0_WAY_NUM)
 #define KVM_E500_TLB1_SIZE  16
 
+#define KVM_E500_INDEX_FORCE_MAP 0x8000
+
 #define index_of(tlbsel, esel) (((tlbsel) << 16) | ((esel) & 0x))
-#define tlbsel_of(index)   ((index) >> 16)
+#define tlbsel_of(index)   (((index) >> 16) & 0x3)
 #define esel_of(index) ((index) & 0x)
 
 #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index cf3f180..eda7be1 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -853,8 +853,8 @@ static void write_stlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
 int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
-   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
-   int tlbsel, esel, stlbsel, sesel;
+   struct kvm_book3e_206_tlb_entry *gtlbe;
+   int tlbsel, esel;
int recal = 0;
 
tlbsel = get_tlb_tlbsel(vcpu);
@@ -892,40 +892,17 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 
/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
if (tlbe_is_host_safe(vcpu, gtlbe)) {
-   u64 eaddr;
-   u64 raddr;
+   u64 eaddr = get_tlb_eaddr(gtlbe);
+   u64 raddr = get_tlb_raddr(gtlbe);
 
-   switch (tlbsel) {
-   case 0:
-   /* TLB0 */
+   if (tlbsel == 0) {
gtlbe->mas1 &= ~MAS1_TSIZE(~0);
gtlbe->mas1 |= MAS1_TSIZE(BOOK3E_PAGESZ_4K);
-
-   stlbsel = 0;
-   kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
-   sesel = 0; /* unused */
-
-   break;
-
-   case 1:
-   /* TLB1 */
-   eaddr = get_tlb_eaddr(gtlbe);
-   raddr = get_tlb_raddr(gtlbe);
-
-   /* Create a 4KB mapping on the host.
-* If the guest wanted a large page,
-* only the first 4KB is mapped here and the rest
-* are mapped on the fly. */
-   stlbsel = 1;
-   sesel = kvmppc_e500_tlb1_map(vcpu_e500, eaddr,
-   raddr >> PAGE_SHIFT, gtlbe, &stlbe, esel);
-   break;
-
-   default:
-   BUG();
}
 
-   write_stlbe(vcpu_e500, gtlbe, &stlbe, stlbsel, sesel);
+   /* Premap the faulting page */
+   kvmppc_mmu_map(vcpu, eaddr, raddr,
+   index_of(tlbsel, esel) | KVM_E500_INDEX_FORCE_MAP);
}
 
kvmppc_set_exit_type(vcpu, EMULATED_TLBWE_EXITS);
@@ -1024,9 +1001,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
gpa_t gpaddr,
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
struct tlbe_priv *priv;
-   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe;
+   struct kvm_book3e_206_tlb_entry *gtlbe, stlbe = {};
int tlbsel = tlbsel_of(index);
int esel = esel_of(index);
+   /* Needed for initial map, where we can't use the cached value */
+   int force_map = index & KVM_E500_INDEX_FORCE_MAP;
int stlbsel, sesel;
 
gtlbe = get_entry(vcpu_e500, tlbsel, esel);
@@ -1038,7 +1017,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
gpa_t gpaddr,
priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
 
/* Only triggers after clear_tlb_refs */
-   if (unlikely(!(priv->ref.flags & E500_TLB_VALID)))
+   if (force_map || unlikely(!(priv->ref.flags & E500_TLB_VALID)))
kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
else
kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
-- 
1.6.0.2



[PATCH 3/3] KVM: PPC: e500: Implement TLB1-in-TLB0 mapping

2013-01-17 Thread Alexander Graf
When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500.h  |1 +
 arch/powerpc/kvm/e500_mmu_host.c |   58 +
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 00f96d8..d32e6a8 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -28,6 +28,7 @@
 
 #define E500_TLB_VALID 1
 #define E500_TLB_BITMAP 2
+#define E500_TLB_TLB0  (1 << 2)
 
 struct tlbe_ref {
pfn_t pfn;
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 3bb2154..cbb6cf8 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -198,6 +198,11 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500, int tlbsel,
local_irq_restore(flags);
 
return;
+   } else if (tlbsel == 1 &&
+  vcpu_e500->gtlb_priv[1][esel].ref.flags & E500_TLB_TLB0) {
+   /* This is a slow path, so just invalidate everything */
+   kvmppc_e500_tlbil_all(vcpu_e500);
+   vcpu_e500->gtlb_priv[1][esel].ref.flags &= ~E500_TLB_TLB0;
}
 
/* Guest tlbe is backed by at most one host tlbe per shadow pid. */
@@ -453,24 +458,27 @@ static void kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 
*vcpu_e500, int esel,
gtlbe, 0, stlbe, ref);
 }
 
-/* Caller must ensure that the specified guest TLB entry is safe to insert into
- * the shadow TLB. */
-/* XXX for both one-one and one-to-many , for now use TLB1 */
-static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 *vcpu_e500,
-   u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
-   struct kvm_book3e_206_tlb_entry *stlbe, int esel)
+static int kvmppc_e500_tlb1_map_tlb0(struct kvmppc_vcpu_e500 *vcpu_e500,
+struct tlbe_ref *ref,
+int esel)
 {
-   struct tlbe_ref *ref;
-   unsigned int victim;
+   /* Indicate that we're backing this TLB1 entry with TLB0 entries */
+   vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_TLB0;
 
-   victim = vcpu_e500->host_tlb1_nv++;
+   /* Indicate that we're not using TLB1 at all */
+   return -1;
+}
+
+static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
+struct tlbe_ref *ref,
+int esel)
+{
+   unsigned int victim = vcpu_e500->host_tlb1_nv++;
 
if (unlikely(vcpu_e500->host_tlb1_nv >= tlb1_max_shadow_size()))
vcpu_e500->host_tlb1_nv = 0;
 
-   ref = &vcpu_e500->tlb_refs[1][victim];
-   kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe, ref);
-
+   vcpu_e500->tlb_refs[1][victim] = *ref;
vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << victim;
vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
if (vcpu_e500->h2g_tlb1_rmap[victim]) {
@@ -482,6 +490,25 @@ static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 
*vcpu_e500,
return victim;
 }
 
+/* Caller must ensure that the specified guest TLB entry is safe to insert into
+ * the shadow TLB. */
+/* For both one-one and one-to-many */
+static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 *vcpu_e500,
+   u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
+   struct kvm_book3e_206_tlb_entry *stlbe, int esel)
+{
+   struct tlbe_ref ref;
+
+   ref.flags = 0;
+   kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe, &ref);
+
+   /* Use TLB0 when we can only map a page with 4k */
+   if (get_tlb_tsize(stlbe) == BOOK3E_PAGESZ_4K)
+   return kvmppc_e500_tlb1_map_tlb0(vcpu_e500, &ref, esel);
+   /* Otherwise map into TLB1 */
+   return kvmppc_e500_tlb1_map_tlb1(vcpu_e500, &ref, esel);
+}
+
 /* sesel is for tlb1 only */
 static void write_stlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
struct kvm_book3e_206_tlb_entry *gtlbe,
@@ -529,9 +556,14 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
gpa_t gpaddr,
case 1: {
gfn_t gfn = gpaddr >> PAGE_SHIFT;
 
-   stlbsel = 1;
sesel = kvmppc_e500_tlb1_map(vcpu_e500, eaddr, gfn,
 gtlbe, &stlbe, esel);
+   if (sesel < 0) {
+   /* TLB0 mapping */
+   sesel = 0;
+   stlbsel = 0;
+   } else
+   stlbsel = 1;
break;
}
 
-- 

[PATCH 0/3] KVM: PPC: e500: Shadow TLB Improvements

2013-01-17 Thread Alexander Graf
This patch set improves the shadow TLB handling of our e500
target.

The really important bit here is that with these patches applied,
we can map guest TLB1 entries into the host's TLB0. This gives a
significant performance improvement as you can see below.

Alex

---

without patch, using 4k backed memory:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done
real0m12.947s
user0m1.076s
sys 0m9.720s


with hugetlbfs:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done

real0m3.262s
user0m0.464s
sys 0m0.236s


with patches applied, using 4k backed memory:

$ time for i in {1..1000}; do /bin/echo > /dev/null; done

real0m4.446s
user0m0.380s
sys 0m0.644s


Alexander Graf (3):
  KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
  KVM: PPC: E500: Split host and guest MMU parts
  KVM: PPC: e500: Implement TLB1-in-TLB0 mapping

 arch/powerpc/kvm/Makefile|9 +-
 arch/powerpc/kvm/e500.h  |5 +-
 arch/powerpc/kvm/e500_mmu.c  |  812 +
 arch/powerpc/kvm/e500_mmu_host.c |  686 ++
 arch/powerpc/kvm/e500_mmu_host.h |   20 +
 arch/powerpc/kvm/e500_tlb.c  | 1430 --
 6 files changed, 1528 insertions(+), 1434 deletions(-)
 create mode 100644 arch/powerpc/kvm/e500_mmu.c
 create mode 100644 arch/powerpc/kvm/e500_mmu_host.c
 create mode 100644 arch/powerpc/kvm/e500_mmu_host.h
 delete mode 100644 arch/powerpc/kvm/e500_tlb.c

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 1/3] KVM: x86: clean up reexecute_instruction

2013-01-17 Thread Marcelo Tosatti
On Sun, Jan 13, 2013 at 11:44:12PM +0800, Xiao Guangrong wrote:
> Little cleanup for reexecute_instruction, also use gpa_to_gfn in
> retry_instruction
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/x86.c |   13 ++---
>  1 files changed, 6 insertions(+), 7 deletions(-)

Reviewed-by: Marcelo Tosatti 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-17 Thread Nicholas A. Bellinger
Hi MST & Co,

On Thu, 2013-01-17 at 18:43 +0200, Michael S. Tsirkin wrote:
> On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger 
> > 
> > Hello Anthony & Co,
> > 
> > This is the fourth installment to add host virtualized target support for
> > the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU 1.3.0-rc.
> > 
> > The series is available directly from the following git branch:
> > 
> >git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git vhost-scsi-for-1.3
> > 
> > Note the code is cut against yesterday's QEMU head, and despite the name
> > of the tree is based upon mainline qemu.org git code + has thus far been
> > running overnight with > 100K IOPs small block 4k workloads using v3.6-rc2+
> > based target code with RAMDISK_DR backstores.
> > 
> > Other than some minor fuzz between jumping from QEMU 1.2.0 -> 1.2.50, this
> > series is functionally identical to what's been posted for vhost-scsi RFC-v3
> > to qemu-devel.
> > 
> > Please consider applying these patches for an initial vhost-scsi merge into
> > QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed for
> > this series in order to merge.
> > 
> > Thank you!
> > 
> > --nab
> 
> OK what's the status here?
> We missed 1.3 but let's try not to miss 1.4?
> 

Unfortunately, I've not been able to get back to the conversion
requested by Paolo for a standalone vhost-scsi PCI device.

At this point my hands are still full with iSER-target for-3.9 kernel
code over the next weeks.  

What's the v1.4 feature cut-off looking like at this point..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tcm_vhost: Use llist for cmd completion list

2013-01-17 Thread Nicholas A. Bellinger
Hi Asias!

On Sun, 2013-01-06 at 14:36 +0800, Asias He wrote:
> This drops the cmd completion list spin lock and makes the cmd
> completion queue lock-less.
> 
> Signed-off-by: Asias He 
> ---

Apologies for the long delay to get back to this patch.

After some initial testing, I'm seeing about a ~5K IOPs
performance increase to a single RAMDISK_MCP (~110K to ~115K) on the heavy
mixed 75/25 4k randrw fio workload.

That said, I think it's fine as a for-3.9 item.

Acked-by: Nicholas Bellinger 

MST, do you want to take this via your vhost tree, or shall I merge via
target-pending/for-next..?

Thanks,

--nab
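
For readers unfamiliar with the pattern the quoted patch adopts, here is a
minimal sketch of the lock-less completion queue it switches to.  The names
are illustrative, not the tcm_vhost ones, but the llist calls are the same
API the patch uses:

        #include <linux/llist.h>
        #include <linux/slab.h>

        struct completion_item {
                struct llist_node node;
                int status;
        };

        static LLIST_HEAD(completion_list);

        /* Producer: safe from any number of concurrent contexts, no lock. */
        static void queue_completion(struct completion_item *item)
        {
                llist_add(&item->node, &completion_list);
        }

        /* Consumer: atomically grab the whole list, then walk it without any
         * lock.  Note that entries come back in reverse insertion order. */
        static void drain_completions(void)
        {
                struct llist_node *llnode = llist_del_all(&completion_list);
                struct completion_item *item;

                while (llnode) {
                        item = llist_entry(llnode, struct completion_item, node);
                        llnode = llist_next(llnode);
                        /* ... handle item->status, then free ... */
                        kfree(item);
                }
        }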

>  drivers/vhost/tcm_vhost.c | 46 +-
>  drivers/vhost/tcm_vhost.h |  2 +-
>  2 files changed, 14 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
> index b20df5c..3720604 100644
> --- a/drivers/vhost/tcm_vhost.c
> +++ b/drivers/vhost/tcm_vhost.c
> @@ -47,6 +47,7 @@
>  #include 
>  #include  /* TODO vhost.h currently depends on this */
>  #include 
> +#include 
>  
>  #include "vhost.c"
>  #include "vhost.h"
> @@ -64,8 +65,7 @@ struct vhost_scsi {
>   struct vhost_virtqueue vqs[3];
>  
>   struct vhost_work vs_completion_work; /* cmd completion work item */
> - struct list_head vs_completion_list;  /* cmd completion queue */
> - spinlock_t vs_completion_lock;/* protects s_completion_list */
> + struct llist_head vs_completion_list; /* cmd completion queue */
>  };
>  
>  /* Local pointer to allocated TCM configfs fabric module */
> @@ -301,9 +301,7 @@ static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd 
> *tv_cmd)
>  {
>   struct vhost_scsi *vs = tv_cmd->tvc_vhost;
>  
> - spin_lock_bh(&vs->vs_completion_lock);
> - list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
> - spin_unlock_bh(&vs->vs_completion_lock);
> + llist_add(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
>  
>   vhost_work_queue(&vs->dev, &vs->vs_completion_work);
>  }
> @@ -347,27 +345,6 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd 
> *tv_cmd)
>   kfree(tv_cmd);
>  }
>  
> -/* Dequeue a command from the completion list */
> -static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(
> - struct vhost_scsi *vs)
> -{
> - struct tcm_vhost_cmd *tv_cmd = NULL;
> -
> - spin_lock_bh(&vs->vs_completion_lock);
> - if (list_empty(&vs->vs_completion_list)) {
> - spin_unlock_bh(&vs->vs_completion_lock);
> - return NULL;
> - }
> -
> - list_for_each_entry(tv_cmd, &vs->vs_completion_list,
> - tvc_completion_list) {
> - list_del(&tv_cmd->tvc_completion_list);
> - break;
> - }
> - spin_unlock_bh(&vs->vs_completion_lock);
> - return tv_cmd;
> -}
> -
>  /* Fill in status and signal that we are done processing this command
>   *
>   * This is scheduled in the vhost work queue so we are called with the owner
> @@ -377,12 +354,18 @@ static void vhost_scsi_complete_cmd_work(struct 
> vhost_work *work)
>  {
>   struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
>   vs_completion_work);
> + struct virtio_scsi_cmd_resp v_rsp;
>   struct tcm_vhost_cmd *tv_cmd;
> + struct llist_node *llnode;
> + struct se_cmd *se_cmd;
> + int ret;
>  
> - while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs))) {
> - struct virtio_scsi_cmd_resp v_rsp;
> - struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
> - int ret;
> + llnode = llist_del_all(&vs->vs_completion_list);
> + while (llnode) {
> + tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
> +  tvc_completion_list);
> + llnode = llist_next(llnode);
> + se_cmd = &tv_cmd->tvc_se_cmd;
>  
>   pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
>   tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
> @@ -426,7 +409,6 @@ static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
>   pr_err("Unable to allocate struct tcm_vhost_cmd\n");
>   return ERR_PTR(-ENOMEM);
>   }
> - INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
>   tv_cmd->tvc_tag = v_req->tag;
>   tv_cmd->tvc_task_attr = v_req->task_attr;
>   tv_cmd->tvc_exp_data_len = exp_data_len;
> @@ -859,8 +841,6 @@ static int vhost_scsi_open(struct inode *inode, struct 
> file *f)
>   return -ENOMEM;
>  
>   vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
> - INIT_LIST_HEAD(&s->vs_completion_list);
> - spin_lock_init(&s->vs_completion_lock);
>  
>   s->vqs[VHOST_SCSI_VQ_CTL].handle_kick = vhost_scsi_ctl_handle_kick;
>   s->vqs[VHOST_SCSI_VQ_EVT].handle_kick = vhost_scsi_evt_handle_kick;
> diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h
> index

Re: [Qemu-devel] [RFC 00/12] target-i386: Fix APIC-ID-based topology (v4)

2013-01-17 Thread Eduardo Habkost
On Wed, Jan 09, 2013 at 04:53:40PM -0200, Eduardo Habkost wrote:
> This series uses a much simpler approach than the previous versions:
> 
>  - The APIC ID calculation code is now inside cpu.c
>  - It doesn't require touching the PC CPU creation code at all
>  - It simply uses a static variable to enable the compat behavior on pc-1.3
>and older
>- I considered making the compat-apic-id setting a global property for the
>  X86CPU objects, the only problem is that the fw_cfg initialization code
>  on pc.c also depends on the compat behavior
> 
> I am sending this as RFC because it depends on two not-included-yet series:
>  - Andreas' qom-cpu-7 branch
>  - My cpu-enforce fixes series
> 
> I hope to be able to get this bug fix into QEMU 1.4. I don't know if we should
> try to get this before soft freeze, or we can include it after that.
> 

Andreas, Anthony: I still would like to include this fix in 1.4. Do you
think it would be reasonable?

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Alex Williamson
On Thu, 2013-01-17 at 19:58 +0200, Gleb Natapov wrote:
> On Thu, Jan 17, 2013 at 10:30:05AM -0700, Alex Williamson wrote:
> > On Thu, 2013-01-17 at 19:20 +0200, Gleb Natapov wrote:
> > > On Thu, Jan 17, 2013 at 10:12:06AM -0700, Alex Williamson wrote:
> > > > On Thu, 2013-01-17 at 09:29 +0200, Gleb Natapov wrote:
> > > > > On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> > > > > > On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > > > > > > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> > > > > > >> Patches 1 to 3 are trivial.
> > > > > > >>
> > > > > > >> Patch 4 is the main cause of the increased lines, but I think 
> > > > > > >> the new
> > > > > > >> code makes it easier to understand why each condition in
> > > > > > >> __kvm_set_memory_region() is there.
> > > > > > >>
> > > > > > >> If you don't agree with patch 4, please consider taking the rest 
> > > > > > >> of the
> > > > > > >> series at this time.
> > > > > > >>
> > > > > > >> Takuya Yoshikawa (4):
> > > > > > >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> > > > > > >>   KVM: set_memory_region: Don't check for overlaps unless we 
> > > > > > >> create or move a slot
> > > > > > >>   KVM: set_memory_region: Remove unnecessary variable memslot
> > > > > > >>   KVM: set_memory_region: Identify the requested change 
> > > > > > >> explicitly
> > > > > > >>
> > > > > > >>  virt/kvm/kvm_main.c |   94 
> > > > > > >> ---
> > > > > > >>  1 files changed, 59 insertions(+), 35 deletions(-)
> > > > > > >>
> > > > > > >> -- 
> > > > > > >> 1.7.5.4
> > > > > > > 
> > > > > > > Reviewed-by: Marcelo Tosatti 
> > > > > > > 
> > > > > > > BTW, while at it, its probably worthwhile to restrict flags
> > > > > > > modifications: change from flags = 0 to flags = read-only is 
> > > > > > > incomplete. Xiao, should it be allowed only during creation?
> > > > > > 
> > > > > > Will Readonly memory be used for VM-mem-sharing in the future?
> > > > > > I remember you mentioned about mem-sharing things before. ;)
> > > > > > 
> > > > > > Actually, It is safe on KVM MMU because all the gfns in the slot
> > > > > > can become readonly after calling __kvm_set_memory_region. It is
> > > > > > unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> > > > > > when the flag is changed.
> > > > > 
> > > > > Which means if we allow changing flags from 0 to read_only or back we
> > > > > need to call kvm_iommu_map_pages() when flags change. BTW as far as I
> > > > > see currently IOMMU maps everything as read/write not matter what 
> > > > > flags
> > > > > say. Looks like a bug. Alex?
> > > > 
> > > > That's correct, legacy device assignment is always r/w.  vfio handles
> > > > this correctly.  Thanks,
> > > > 
> > > Does this mean that device can write into read only memory?
> > 
> > For legacy assignment, I would expect yes.  On bare metal, DMA from an
> > I/O device can obviously circumvent processor based memory attributes,
> > but here that might also include things like ROMs too.  Thanks,
> > 
> Why dropping IOMMU_WRITE from flags if slot is read only in
> kvm_iommu_map_pages() will not work?

Something like:

--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -76,7 +76,9 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slo
gfn = slot->base_gfn;
end_gfn = gfn + slot->npages;
 
-   flags = IOMMU_READ | IOMMU_WRITE;
+   flags = IOMMU_READ;
+   if (!(slot->flags & KVM_MEM_READONLY))
+   flags |= IOMMU_WRITE;
if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY)
flags |= IOMMU_CACHE;
 
We'd still need the unmap/remap when flags change.  I'll do some testing
on the above and send as a proper patch.  Thanks,

Alex
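
For completeness, a rough sketch of the unmap/remap on a flags change
mentioned above.  This is hand-waved: kvm_iommu_unmap_pages() here stands in
for whatever unmap helper the eventual patch uses, and the hook point in
__kvm_set_memory_region() is not shown:

        /* Sketch only, not a proposed patch: redo the IOMMU mapping when the
         * read-only flag of an existing slot changes. */
        static int sketch_iommu_update_flags(struct kvm *kvm,
                                             struct kvm_memory_slot *old,
                                             struct kvm_memory_slot *new)
        {
                if (!((old->flags ^ new->flags) & KVM_MEM_READONLY))
                        return 0;

                kvm_iommu_unmap_pages(kvm, old);        /* assumed helper */
                return kvm_iommu_map_pages(kvm, new);
        }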

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Gleb Natapov
On Thu, Jan 17, 2013 at 10:30:05AM -0700, Alex Williamson wrote:
> On Thu, 2013-01-17 at 19:20 +0200, Gleb Natapov wrote:
> > On Thu, Jan 17, 2013 at 10:12:06AM -0700, Alex Williamson wrote:
> > > On Thu, 2013-01-17 at 09:29 +0200, Gleb Natapov wrote:
> > > > On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> > > > > On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > > > > > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> > > > > >> Patches 1 to 3 are trivial.
> > > > > >>
> > > > > >> Patch 4 is the main cause of the increased lines, but I think the 
> > > > > >> new
> > > > > >> code makes it easier to understand why each condition in
> > > > > >> __kvm_set_memory_region() is there.
> > > > > >>
> > > > > >> If you don't agree with patch 4, please consider taking the rest 
> > > > > >> of the
> > > > > >> series at this time.
> > > > > >>
> > > > > >> Takuya Yoshikawa (4):
> > > > > >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> > > > > >>   KVM: set_memory_region: Don't check for overlaps unless we 
> > > > > >> create or move a slot
> > > > > >>   KVM: set_memory_region: Remove unnecessary variable memslot
> > > > > >>   KVM: set_memory_region: Identify the requested change explicitly
> > > > > >>
> > > > > >>  virt/kvm/kvm_main.c |   94 
> > > > > >> ---
> > > > > >>  1 files changed, 59 insertions(+), 35 deletions(-)
> > > > > >>
> > > > > >> -- 
> > > > > >> 1.7.5.4
> > > > > > 
> > > > > > Reviewed-by: Marcelo Tosatti 
> > > > > > 
> > > > > > BTW, while at it, its probably worthwhile to restrict flags
> > > > > > modifications: change from flags = 0 to flags = read-only is 
> > > > > > incomplete. Xiao, should it be allowed only during creation?
> > > > > 
> > > > > Will Readonly memory be used for VM-mem-sharing in the future?
> > > > > I remember you mentioned about mem-sharing things before. ;)
> > > > > 
> > > > > Actually, It is safe on KVM MMU because all the gfns in the slot
> > > > > can become readonly after calling __kvm_set_memory_region. It is
> > > > > unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> > > > > when the flag is changed.
> > > > 
> > > > Which means if we allow changing flags from 0 to read_only or back we
> > > > need to call kvm_iommu_map_pages() when flags change. BTW as far as I
> > > > see currently IOMMU maps everything as read/write not matter what flags
> > > > say. Looks like a bug. Alex?
> > > 
> > > That's correct, legacy device assignment is always r/w.  vfio handles
> > > this correctly.  Thanks,
> > > 
> > Does this mean that device can write into read only memory?
> 
> For legacy assignment, I would expect yes.  On bare metal, DMA from an
> I/O device can obviously circumvent processor based memory attributes,
> but here that might also include things like ROMs too.  Thanks,
> 
Why would dropping IOMMU_WRITE from the flags in kvm_iommu_map_pages()
when the slot is read-only not work?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Alex Williamson
On Thu, 2013-01-17 at 19:20 +0200, Gleb Natapov wrote:
> On Thu, Jan 17, 2013 at 10:12:06AM -0700, Alex Williamson wrote:
> > On Thu, 2013-01-17 at 09:29 +0200, Gleb Natapov wrote:
> > > On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> > > > On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > > > > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> > > > >> Patches 1 to 3 are trivial.
> > > > >>
> > > > >> Patch 4 is the main cause of the increased lines, but I think the new
> > > > >> code makes it easier to understand why each condition in
> > > > >> __kvm_set_memory_region() is there.
> > > > >>
> > > > >> If you don't agree with patch 4, please consider taking the rest of 
> > > > >> the
> > > > >> series at this time.
> > > > >>
> > > > >> Takuya Yoshikawa (4):
> > > > >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> > > > >>   KVM: set_memory_region: Don't check for overlaps unless we create 
> > > > >> or move a slot
> > > > >>   KVM: set_memory_region: Remove unnecessary variable memslot
> > > > >>   KVM: set_memory_region: Identify the requested change explicitly
> > > > >>
> > > > >>  virt/kvm/kvm_main.c |   94 
> > > > >> ---
> > > > >>  1 files changed, 59 insertions(+), 35 deletions(-)
> > > > >>
> > > > >> -- 
> > > > >> 1.7.5.4
> > > > > 
> > > > > Reviewed-by: Marcelo Tosatti 
> > > > > 
> > > > > BTW, while at it, its probably worthwhile to restrict flags
> > > > > modifications: change from flags = 0 to flags = read-only is 
> > > > > incomplete. Xiao, should it be allowed only during creation?
> > > > 
> > > > Will Readonly memory be used for VM-mem-sharing in the future?
> > > > I remember you mentioned about mem-sharing things before. ;)
> > > > 
> > > > Actually, It is safe on KVM MMU because all the gfns in the slot
> > > > can become readonly after calling __kvm_set_memory_region. It is
> > > > unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> > > > when the flag is changed.
> > > 
> > > Which means if we allow changing flags from 0 to read_only or back we
> > > need to call kvm_iommu_map_pages() when flags change. BTW as far as I
> > > see currently IOMMU maps everything as read/write not matter what flags
> > > say. Looks like a bug. Alex?
> > 
> > That's correct, legacy device assignment is always r/w.  vfio handles
> > this correctly.  Thanks,
> > 
> Does this mean that device can write into read only memory?

For legacy assignment, I would expect yes.  On bare metal, DMA from an
I/O device can obviously circumvent processor based memory attributes,
but here that might also include things like ROMs too.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvmarm] [PATCH v6 02/13] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-17 Thread Peter Maydell
On 16 January 2013 18:00, Christoffer Dall
 wrote:
> KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

Patch subject needs updating with new name of this ioctl
(KVM_ARM_SET_DEVICE_ADDR)...

> On ARM (and possibly other architectures) some bits are specific to the
> model being emulated for the guest and user space needs a way to tell
> the kernel about those bits.  An example is mmio device base addresses,
> where KVM must know the base address for a given device to properly
> emulate mmio accesses within a certain address range or directly map a
> device with virtualiation extensions into the guest address space.

"virtualization", while I'm here.

> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -65,6 +65,19 @@ struct kvm_regs {
>  #define KVM_ARM_TARGET_CORTEX_A15  0
>  #define KVM_ARM_NUM_TARGETS1
>
> +/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */
> +#define KVM_DEVICE_TYPE_SHIFT  0
> +#define KVM_DEVICE_TYPE_MASK   (0xffff << KVM_DEVICE_TYPE_SHIFT)
> +#define KVM_DEVICE_ID_SHIFT    16
> +#define KVM_DEVICE_ID_MASK     (0xffff << KVM_DEVICE_ID_SHIFT)

...and this comment and I guess these constant names presumably
should have "ARM" in them?
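
As an aside, userspace would compose the ioctl id from these fields roughly
like so (illustrative only; assumes the shift definitions from the patched
asm/kvm.h, using the pre-rename constant names under review here):

        #include <stdint.h>

        /* Sketch: pack a device type and a device instance id into the low
         * bits of the id passed alongside the address. */
        static inline uint64_t make_device_addr_id(uint16_t type, uint16_t dev_id)
        {
                return ((uint64_t)dev_id << KVM_DEVICE_ID_SHIFT) |
                       ((uint64_t)type << KVM_DEVICE_TYPE_SHIFT);
        }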

> +/* Available with KVM_CAP_SET_DEVICE_ADDR */

KVM_CAP_ARM_SET_DEVICE_ADDR.

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Gleb Natapov
On Thu, Jan 17, 2013 at 10:12:06AM -0700, Alex Williamson wrote:
> On Thu, 2013-01-17 at 09:29 +0200, Gleb Natapov wrote:
> > On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> > > On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > > > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> > > >> Patches 1 to 3 are trivial.
> > > >>
> > > >> Patch 4 is the main cause of the increased lines, but I think the new
> > > >> code makes it easier to understand why each condition in
> > > >> __kvm_set_memory_region() is there.
> > > >>
> > > >> If you don't agree with patch 4, please consider taking the rest of the
> > > >> series at this time.
> > > >>
> > > >> Takuya Yoshikawa (4):
> > > >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> > > >>   KVM: set_memory_region: Don't check for overlaps unless we create or 
> > > >> move a slot
> > > >>   KVM: set_memory_region: Remove unnecessary variable memslot
> > > >>   KVM: set_memory_region: Identify the requested change explicitly
> > > >>
> > > >>  virt/kvm/kvm_main.c |   94 
> > > >> ---
> > > >>  1 files changed, 59 insertions(+), 35 deletions(-)
> > > >>
> > > >> -- 
> > > >> 1.7.5.4
> > > > 
> > > > Reviewed-by: Marcelo Tosatti 
> > > > 
> > > > BTW, while at it, its probably worthwhile to restrict flags
> > > > modifications: change from flags = 0 to flags = read-only is 
> > > > incomplete. Xiao, should it be allowed only during creation?
> > > 
> > > Will Readonly memory be used for VM-mem-sharing in the future?
> > > I remember you mentioned about mem-sharing things before. ;)
> > > 
> > > Actually, It is safe on KVM MMU because all the gfns in the slot
> > > can become readonly after calling __kvm_set_memory_region. It is
> > > unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> > > when the flag is changed.
> > 
> > Which means if we allow changing flags from 0 to read_only or back we
> > need to call kvm_iommu_map_pages() when flags change. BTW as far as I
> > see currently IOMMU maps everything as read/write not matter what flags
> > say. Looks like a bug. Alex?
> 
> That's correct, legacy device assignment is always r/w.  vfio handles
> this correctly.  Thanks,
> 
Does this mean that a device can write into read-only memory?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 13/15] KVM: ARM: Handle I/O aborts

2013-01-17 Thread Christoffer Dall
On Thu, Jan 17, 2013 at 11:37 AM, Marc Zyngier  wrote:
> On 16/01/13 17:59, Christoffer Dall wrote:
>> When the guest accesses I/O memory this will create data abort
>> exceptions and they are handled by decoding the HSR information
>> (physical address, read/write, length, register) and forwarding reads
>> and writes to QEMU which performs the device emulation.
>>
>> Certain classes of load/store operations do not support the syndrome
>> information provided in the HSR.  We don't support decoding these (patches
>> are available elsewhere), so we report an error to user space in this case.
>>
>> This requires changing the general flow somewhat since new calls to run
>> the VCPU must check if there's a pending MMIO load and perform the write
>> after userspace has made the data available.
>>
>> Reviewed-by: Will Deacon 
>> Reviewed-by: Marcelo Tosatti 
>> Signed-off-by: Rusty Russell 
>> Signed-off-by: Marc Zyngier 
>> Signed-off-by: Christoffer Dall 
>
> [...]
>
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 04a9705..702743e 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -192,6 +192,44 @@ after_vfp_restore:
>> mov r0, r1  @ Return the return code
>> bx  lr  @ return to IOCTL
>>
>> +
>> +/
>> + * Translate VA to PA
>> + *
>> + * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
>> + *
>> + * Arguments:
>> + *  r0: pointer to vcpu struct
>> + *  r1: virtual address to map (rounded to page)
>> + *  r2: 1 = P1 (read) mapping, 0 = P0 (read) mapping.
>> + * Returns 64 bit PAR value.
>> + */
>> +ENTRY(__kvm_va_to_pa)
>> +   push{r4-r12}
>> +
>> +   @ Fold flag into r1, easier than using stack.
>> +   cmp r2, #0
>> +   movne   r2, #1
>> +   orr r1, r1, r2
>> +
>> +   @ This swaps too many registers, but we're in the slow path anyway.
>> +   read_cp15_state store_to_vcpu = 0
>> +   write_cp15_state read_from_vcpu = 1
>> +
>> +   andsr2, r1, #1
>> +   bic r1, r1, r2
>> +   mcrne   p15, 0, r1, c7, c8, 0   @ VA to PA, ATS1CPR
>> +   mcreq   p15, 0, r1, c7, c8, 2   @ VA to PA, ATS1CUR
>> +   isb
>> +
>> +   @ Restore host state.
>> +   read_cp15_state store_to_vcpu = 1
>> +   write_cp15_state read_from_vcpu = 0
>> +
>> +   mrrcp15, 0, r0, r1, c7  @ PAR
>> +   pop {r4-r12}
>> +   bx  lr
>> +
>
> Do we still need this function? Now that the MMIO emulation is gone,
> there should be no need to perform a manual translation.
>
> I can't even find a caller for it either.
>
I'm trying to leave bits and pieces of the emulation code in here in
hope that it will spawn itself into a glorious unified solution that
decodes everything and that everybody will be happy with :)

I'll remove it. Thanks.
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Alex Williamson
On Thu, 2013-01-17 at 09:29 +0200, Gleb Natapov wrote:
> On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> > On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> > >> Patches 1 to 3 are trivial.
> > >>
> > >> Patch 4 is the main cause of the increased lines, but I think the new
> > >> code makes it easier to understand why each condition in
> > >> __kvm_set_memory_region() is there.
> > >>
> > >> If you don't agree with patch 4, please consider taking the rest of the
> > >> series at this time.
> > >>
> > >> Takuya Yoshikawa (4):
> > >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> > >>   KVM: set_memory_region: Don't check for overlaps unless we create or 
> > >> move a slot
> > >>   KVM: set_memory_region: Remove unnecessary variable memslot
> > >>   KVM: set_memory_region: Identify the requested change explicitly
> > >>
> > >>  virt/kvm/kvm_main.c |   94 
> > >> ---
> > >>  1 files changed, 59 insertions(+), 35 deletions(-)
> > >>
> > >> -- 
> > >> 1.7.5.4
> > > 
> > > Reviewed-by: Marcelo Tosatti 
> > > 
> > > BTW, while at it, its probably worthwhile to restrict flags
> > > modifications: change from flags = 0 to flags = read-only is 
> > > incomplete. Xiao, should it be allowed only during creation?
> > 
> > Will Readonly memory be used for VM-mem-sharing in the future?
> > I remember you mentioned about mem-sharing things before. ;)
> > 
> > Actually, It is safe on KVM MMU because all the gfns in the slot
> > can become readonly after calling __kvm_set_memory_region. It is
> > unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> > when the flag is changed.
> 
> Which means if we allow changing flags from 0 to read_only or back we
> need to call kvm_iommu_map_pages() when flags change. BTW as far as I
> see currently IOMMU maps everything as read/write not matter what flags
> say. Looks like a bug. Alex?

That's correct, legacy device assignment is always r/w.  vfio handles
this correctly.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 04/12] Update linux headers.

2013-01-17 Thread Alexander Graf

On 17.01.2013, at 17:20, Cornelia Huck wrote:

> On Thu, 17 Jan 2013 15:05:46 +
> Peter Maydell  wrote:
> 
>> On 17 January 2013 14:23, Cornelia Huck  wrote:
>>> Base is kvm-next as of 2013/01/16.
>> 
>>> --- a/linux-headers/asm-powerpc/kvm_para.h
>>> +++ b/linux-headers/asm-powerpc/kvm_para.h
>>> @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
>>> 
>>> #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
>>> 
>>> -#include 
>>> +#include 
>> 
>> This is reintroducing a bug, isn't it? cf
>> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02807.html
>> 
>> (maybe this just means the sync needs to be against some
>> different set of kernel headers?)
>> 
>> -- PMM
>> 
> 
> Hm, I seem to have some stale copy here - will check.

No, the problem is that the fix went into master, but you were probably basing 
off of next. This is the way Linus envisions Linux development. Next is always 
a broken tree.


Alex


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-17 Thread Michael S. Tsirkin
On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> Hello Anthony & Co,
> 
> This is the fourth installment to add host virtualized target support for
> the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU 1.3.0-rc.
> 
> The series is available directly from the following git branch:
> 
>git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git vhost-scsi-for-1.3
> 
> Note the code is cut against yesterday's QEMU head, and despite the name
> of the tree is based upon mainline qemu.org git code + has thus far been
> running overnight with > 100K IOPs small block 4k workloads using v3.6-rc2+
> based target code with RAMDISK_DR backstores.
> 
> Other than some minor fuzz between jumping from QEMU 1.2.0 -> 1.2.50, this
> series is functionally identical to what's been posted for vhost-scsi RFC-v3
> to qemu-devel.
> 
> Please consider applying these patches for an initial vhost-scsi merge into
> QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed for
> this series in order to merge.
> 
> Thank you!
> 
> --nab

OK what's the status here?
We missed 1.3 but let's try not to miss 1.4?

> Nicholas Bellinger (2):
>   monitor: Rename+move net_handle_fd_param -> monitor_handle_fd_param
>   virtio-scsi: Set max_target=0 during vhost-scsi operation
> 
> Stefan Hajnoczi (3):
>   vhost: Pass device path to vhost_dev_init()
>   vhost-scsi: add -vhost-scsi host device for use with tcm-vhost
>   virtio-scsi: Add start/stop functionality for vhost-scsi
> 
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/qdev-properties.c |   41 +++
>  hw/vhost-scsi.c  |  190 
> ++
>  hw/vhost-scsi.h  |   62 
>  hw/vhost.c   |5 +-
>  hw/vhost.h   |3 +-
>  hw/vhost_net.c   |2 +-
>  hw/virtio-pci.c  |2 +
>  hw/virtio-scsi.c |   55 ++-
>  hw/virtio-scsi.h |1 +
>  monitor.c|   18 +
>  monitor.h|1 +
>  net.c|   18 -
>  net.h|2 -
>  net/socket.c |2 +-
>  net/tap.c|4 +-
>  qemu-common.h|1 +
>  qemu-config.c|   19 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  21 files changed, 431 insertions(+), 28 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
> 
> -- 
> 1.7.2.5
> 
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 13/15] KVM: ARM: Handle I/O aborts

2013-01-17 Thread Marc Zyngier
On 16/01/13 17:59, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
> 
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR.  We don't support decoding these (patches
> are available elsewhere), so we report an error to user space in this case.
> 
> This requires changing the general flow somewhat since new calls to run
> the VCPU must check if there's a pending MMIO load and perform the write
> after userspace has made the data available.
> 
> Reviewed-by: Will Deacon 
> Reviewed-by: Marcelo Tosatti 
> Signed-off-by: Rusty Russell 
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 

[...]

> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 04a9705..702743e 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -192,6 +192,44 @@ after_vfp_restore:
> mov r0, r1  @ Return the return code
> bx  lr  @ return to IOCTL
> 
> +
> +/
> + * Translate VA to PA
> + *
> + * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
> + *
> + * Arguments:
> + *  r0: pointer to vcpu struct
> + *  r1: virtual address to map (rounded to page)
> + *  r2: 1 = P1 (read) mapping, 0 = P0 (read) mapping.
> + * Returns 64 bit PAR value.
> + */
> +ENTRY(__kvm_va_to_pa)
> +   push{r4-r12}
> +
> +   @ Fold flag into r1, easier than using stack.
> +   cmp r2, #0
> +   movne   r2, #1
> +   orr r1, r1, r2
> +
> +   @ This swaps too many registers, but we're in the slow path anyway.
> +   read_cp15_state store_to_vcpu = 0
> +   write_cp15_state read_from_vcpu = 1
> +
> +   andsr2, r1, #1
> +   bic r1, r1, r2
> +   mcrne   p15, 0, r1, c7, c8, 0   @ VA to PA, ATS1CPR
> +   mcreq   p15, 0, r1, c7, c8, 2   @ VA to PA, ATS1CUR
> +   isb
> +
> +   @ Restore host state.
> +   read_cp15_state store_to_vcpu = 1
> +   write_cp15_state read_from_vcpu = 0
> +
> +   mrrcp15, 0, r0, r1, c7  @ PAR
> +   pop {r4-r12}
> +   bx  lr
> +

Do we still need this function? Now that the MMIO emulation is gone,
there should be no need to perform a manual translation.

I can't even find a caller for it either.

M.
-- 
Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Gleb Natapov
On Thu, Jan 17, 2013 at 11:26:53PM +0900, Takuya Yoshikawa wrote:
> On Thu, 17 Jan 2013 14:42:02 +0200
> Gleb Natapov  wrote:
> 
> > Applied 1-3. It is not clear whether kvm_iommu_map_pages() should be called
> > when flags change, so not applying 4 for now.
> 
> Thank you.
> 
> Although I confirmed that kvm_iommu_map_pages() does nothing with
> the flags, it needs to be confirmed by the author.
> 
Yes, it may be a bug. I asked Alex in other email.

> Another thing I'm not 100% sure about the API is whether we can
> change the flags when we move the base: flags + move.  The current
> implementation allows it, but not sure if it's a good thing.
> 
>   Takuya

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 15/15] KVM: ARM: Add maintainer entry for KVM/ARM

2013-01-17 Thread Will Deacon
On Wed, Jan 16, 2013 at 05:59:25PM +, Christoffer Dall wrote:
> Add an entry in the MAINTAINERS file for KVM/ARM.
> 
> Cc: Russell King 
> Reviewed-by: Will Deacon 
> Signed-off-by: Christoffer Dall 
> ---
>  MAINTAINERS |8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 915564e..70e5a10 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4474,6 +4474,14 @@ F: arch/s390/include/asm/kvm*
>  F:   arch/s390/kvm/
>  F:   drivers/s390/kvm/
>  
> +KERNEL VIRTUAL MACHINE (KVM) FOR ARM
> +M:   Christoffer Dall 
> +L:   kvm...@lists.cs.columbia.edu
> +W:   http://http://systems.cs.columbia.edu/projects/kvm-arm

Oops, I missed that this URL has two http:// prefixes.

> +S:   Supported

This means that you're paid to look after KVM. If so, using a gmail address
seems a little odd...

> +F:   arch/arm/include/asm/kvm*

uapi?

> +F:   arch/arm/kvm/

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 04/12] Update linux headers.

2013-01-17 Thread Cornelia Huck
On Thu, 17 Jan 2013 15:05:46 +
Peter Maydell  wrote:

> On 17 January 2013 14:23, Cornelia Huck  wrote:
> > Base is kvm-next as of 2013/01/16.
> 
> > --- a/linux-headers/asm-powerpc/kvm_para.h
> > +++ b/linux-headers/asm-powerpc/kvm_para.h
> > @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
> >
> >  #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
> >
> > -#include 
> > +#include 
> 
> This is reintroducing a bug, isn't it? cf
> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02807.html
> 
> (maybe this just means the sync needs to be against some
> different set of kernel headers?)
> 
> -- PMM
> 

Hm, I seem to have some stale copy here - will check.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 14/15] KVM: ARM: Power State Coordination Interface implementation

2013-01-17 Thread Marc Zyngier
On 16/01/13 17:59, Christoffer Dall wrote:
> From: Marc Zyngier 
> 
> Implement the PSCI specification (ARM DEN 0022A) to control
> virtual CPUs being "powered" on or off.
> 
> PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.
> 
> A virtual CPU can now be initialized in a "powered off" state,
> using the KVM_ARM_VCPU_POWER_OFF feature flag.
> 
> The guest can use either SMC or HVC to execute a PSCI function.
> 
> Reviewed-by: Will Deacon 
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 

A few bits went wrong when you reworked this patch. See below.
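
For orientation, userspace would opt a secondary VCPU into the powered-off
state roughly like this (a sketch assuming the uapi definitions quoted below;
error handling omitted):

        #include <string.h>
        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        /* Sketch: create a vcpu in the powered-off state so the guest has to
         * bring it up via PSCI CPU_ON. */
        static int init_powered_off_vcpu(int vcpu_fd)
        {
                struct kvm_vcpu_init init;

                memset(&init, 0, sizeof(init));
                init.target = KVM_ARM_TARGET_CORTEX_A15;
                init.features[0] = 1 << KVM_ARM_VCPU_POWER_OFF;

                return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
        }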

> ---
>  Documentation/virtual/kvm/api.txt  |4 +
>  arch/arm/include/asm/kvm_emulate.h |5 ++
>  arch/arm/include/asm/kvm_host.h|5 +-
>  arch/arm/include/asm/kvm_psci.h|   23 
>  arch/arm/include/uapi/asm/kvm.h|   16 +
>  arch/arm/kvm/Makefile  |2 -
>  arch/arm/kvm/arm.c |   28 +-
>  arch/arm/kvm/psci.c|  105 
> 
>  include/uapi/linux/kvm.h   |1 
>  9 files changed, 184 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm/include/asm/kvm_psci.h
>  create mode 100644 arch/arm/kvm/psci.c
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 38066a7a..c25439a 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2185,6 +2185,10 @@ return ENOEXEC for that vcpu.
>  Note that because some registers reflect machine topology, all vcpus
>  should be created before this ioctl is invoked.
>  
> +Possible features:
> + - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
> +   Depends on KVM_CAP_ARM_PSCI.
> +
>  
>  4.78 KVM_GET_REG_LIST
>  
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index 4c1a073..ba07de9 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -32,6 +32,11 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
> +{
> + return 1;
> +}
> +
>  static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
>  {
>   return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index e65fc96..c9ba918 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -30,7 +30,7 @@
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  #define KVM_HAVE_ONE_REG
>  
> -#define KVM_VCPU_MAX_FEATURES 0
> +#define KVM_VCPU_MAX_FEATURES 1
>  
>  /* We don't currently support large pages. */
>  #define KVM_HPAGE_GFN_SHIFT(x)   0
> @@ -100,6 +100,9 @@ struct kvm_vcpu_arch {
>   int last_pcpu;
>   cpumask_t require_dcache_flush;
>  
> + /* Don't run the guest: see copy_current_insn() */

Now that you made this field PSCI-specific, how about rewording the comment?

> + bool pause;
> +
>   /* IO related fields */
>   struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/include/asm/kvm_psci.h b/arch/arm/include/asm/kvm_psci.h
> new file mode 100644
> index 000..9a83d98
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_psci.h
> @@ -0,0 +1,23 @@
> +/*
> + * Copyright (C) 2012 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#ifndef __ARM_KVM_PSCI_H__
> +#define __ARM_KVM_PSCI_H__
> +
> +bool kvm_psci_call(struct kvm_vcpu *vcpu);
> +
> +#endif /* __ARM_KVM_PSCI_H__ */
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index bbb6b23..3303ff5 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -65,6 +65,8 @@ struct kvm_regs {
>  #define KVM_ARM_TARGET_CORTEX_A15  0
>  #define KVM_ARM_NUM_TARGETS  1
>  
> +#define KVM_ARM_VCPU_POWER_OFF   0 /* CPU is started in OFF 
> state */
> +
>  struct kvm_vcpu_init {
>   __u32 target;
>   __u32 features[7];
> @@ -145,4 +147,18 @@ struct kvm_arch_memory_slot {
>  /* Highest supported SPI, from VGIC_NR_IRQS */
>  #define KVM_ARM_IRQ_GIC_MAX  127
>  
> +/* PSCI interface */
> +#define KVM_PSCI_FN_BASE 0x95c1ba5e
> +#define KVM_PSCI_F

Re: [Qemu-devel] [PATCH 04/12] Update linux headers.

2013-01-17 Thread Peter Maydell
On 17 January 2013 14:23, Cornelia Huck  wrote:
> Base is kvm-next as of 2013/01/16.

> --- a/linux-headers/asm-powerpc/kvm_para.h
> +++ b/linux-headers/asm-powerpc/kvm_para.h
> @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
>
>  #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
>
> -#include 
> +#include 

This is reintroducing a bug, isn't it? cf
http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02807.html

(maybe this just means the sync needs to be against some
different set of kernel headers?)

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kbuild] [vfio:vfio-vga 3/5] drivers/vfio/pci/vfio_pci_rdwr.c:169 vfio_pci_bar_rw() warn: always true condition '(done >= 0) => (0-u32max >= 0)'

2013-01-17 Thread Fengguang Wu
On Wed, Jan 16, 2013 at 08:47:57PM -0700, Alex Williamson wrote:
> On Thu, 2013-01-17 at 09:21 +0800, Fengguang Wu wrote:
> > Hi Alex,
> > 
> > FYI, there are new smatch warnings show up in
> > 
> > tree:   git://github.com/awilliam/linux-vfio.git vfio-vga
> 
> Thanks for the bug report.  I never would have expected bug reports from
> an unadvertised branch but it's a really cool service.  Thanks for
> providing it!

My pleasure! :)

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Takuya Yoshikawa
On Thu, 17 Jan 2013 14:42:02 +0200
Gleb Natapov  wrote:

> Applied 1-3. It is not clear whether kvm_iommu_map_pages() should be called
> when flags change, so not applying 4 for now.

Thank you.

Although I confirmed that kvm_iommu_map_pages() does nothing with
the flags, it needs to be confirmed by the author.

Another thing I'm not 100% sure about the API is whether we can
change the flags when we move the base: flags + move.  The current
implementation allows it, but not sure if it's a good thing.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/12] s390: Lowcore mapping helper.

2013-01-17 Thread Cornelia Huck
Create a lowcore mapping helper that includes a check for sufficient
length.

Signed-off-by: Cornelia Huck 
---
 target-s390x/helper.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/target-s390x/helper.c b/target-s390x/helper.c
index 9a132e6..bf2b4d3 100644
--- a/target-s390x/helper.c
+++ b/target-s390x/helper.c
@@ -471,13 +471,32 @@ static uint64_t get_psw_mask(CPUS390XState *env)
 return r;
 }
 
+static LowCore *cpu_map_lowcore(CPUS390XState *env, hwaddr *len)
+{
+LowCore *lowcore;
+
+if (*len < sizeof(LowCore)) {
+cpu_abort(env, "Insufficient length %d for mapping lowcore\n",
+  (int) *len);
+}
+
+lowcore = cpu_physical_memory_map(env->psa, len, 1);
+
+return lowcore;
+}
+
+static void cpu_unmap_lowcore(LowCore *lowcore, hwaddr len)
+{
+cpu_physical_memory_unmap(lowcore, len, 1, len);
+}
+
 static void do_svc_interrupt(CPUS390XState *env)
 {
 uint64_t mask, addr;
 LowCore *lowcore;
 hwaddr len = TARGET_PAGE_SIZE;
 
-lowcore = cpu_physical_memory_map(env->psa, &len, 1);
+lowcore = cpu_map_lowcore(env, &len);
 
 lowcore->svc_code = cpu_to_be16(env->int_svc_code);
 lowcore->svc_ilen = cpu_to_be16(env->int_svc_ilen);
@@ -486,7 +505,7 @@ static void do_svc_interrupt(CPUS390XState *env)
 mask = be64_to_cpu(lowcore->svc_new_psw.mask);
 addr = be64_to_cpu(lowcore->svc_new_psw.addr);
 
-cpu_physical_memory_unmap(lowcore, len, 1, len);
+cpu_unmap_lowcore(lowcore, len);
 
 load_psw(env, mask, addr);
 }
@@ -513,7 +532,7 @@ static void do_program_interrupt(CPUS390XState *env)
 qemu_log_mask(CPU_LOG_INT, "%s: code=0x%x ilen=%d\n",
   __func__, env->int_pgm_code, ilen);
 
-lowcore = cpu_physical_memory_map(env->psa, &len, 1);
+lowcore = cpu_map_lowcore(env, &len);
 
 lowcore->pgm_ilen = cpu_to_be16(ilen);
 lowcore->pgm_code = cpu_to_be16(env->int_pgm_code);
@@ -522,7 +541,7 @@ static void do_program_interrupt(CPUS390XState *env)
 mask = be64_to_cpu(lowcore->program_new_psw.mask);
 addr = be64_to_cpu(lowcore->program_new_psw.addr);
 
-cpu_physical_memory_unmap(lowcore, len, 1, len);
+cpu_unmap_lowcore(lowcore, len);
 
 DPRINTF("%s: %x %x %" PRIx64 " %" PRIx64 "\n", __func__,
 env->int_pgm_code, ilen, env->psw.mask,
@@ -549,7 +568,7 @@ static void do_ext_interrupt(CPUS390XState *env)
 }
 
 q = &env->ext_queue[env->ext_index];
-lowcore = cpu_physical_memory_map(env->psa, &len, 1);
+lowcore = cpu_map_lowcore(env, &len);
 
 lowcore->ext_int_code = cpu_to_be16(q->code);
 lowcore->ext_params = cpu_to_be32(q->param);
@@ -560,7 +579,7 @@ static void do_ext_interrupt(CPUS390XState *env)
 mask = be64_to_cpu(lowcore->external_new_psw.mask);
 addr = be64_to_cpu(lowcore->external_new_psw.addr);
 
-cpu_physical_memory_unmap(lowcore, len, 1, len);
+cpu_unmap_lowcore(lowcore, len);
 
 env->ext_index--;
 if (env->ext_index == -1) {
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/12] s390: Virtual channel subsystem support.

2013-01-17 Thread Cornelia Huck
Provide a mechanism for qemu to provide fully virtual subchannels to
the guest.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/Makefile.objs |1 +
 hw/s390x/css.c | 1131 
 hw/s390x/css.h |   92 
 target-s390x/cpu.h |   65 +++
 trace-events   |8 +
 5 files changed, 1297 insertions(+)
 create mode 100644 hw/s390x/css.c
 create mode 100644 hw/s390x/css.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index ae87a12..029a0b2 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -5,3 +5,4 @@ obj-y += s390-virtio-hcall.o
 obj-y += sclp.o
 obj-y += event-facility.o
 obj-y += sclpquiesce.o sclpconsole.o
+obj-y += css.o
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
new file mode 100644
index 000..60372f1
--- /dev/null
+++ b/hw/s390x/css.c
@@ -0,0 +1,1131 @@
+/*
+ * Channel subsystem base support.
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+#include "qemu/bitops.h"
+#include 
+#include "cpu.h"
+#include "ioinst.h"
+#include "css.h"
+#include "trace.h"
+
+typedef struct CrwContainer {
+CRW crw;
+QTAILQ_ENTRY(CrwContainer) sibling;
+} CrwContainer;
+
+typedef struct ChpInfo {
+uint8_t in_use;
+uint8_t type;
+uint8_t is_virtual;
+} ChpInfo;
+
+typedef struct SubchSet {
+SubchDev *sch[MAX_SCHID + 1];
+unsigned long schids_used[BITS_TO_LONGS(MAX_SCHID + 1)];
+unsigned long devnos_used[BITS_TO_LONGS(MAX_SCHID + 1)];
+} SubchSet;
+
+typedef struct CssImage {
+SubchSet *sch_set[MAX_SSID + 1];
+ChpInfo chpids[MAX_CHPID + 1];
+} CssImage;
+
+typedef struct ChannelSubSys {
+QTAILQ_HEAD(, CrwContainer) pending_crws;
+bool do_crw_mchk;
+bool crws_lost;
+uint8_t max_cssid;
+uint8_t max_ssid;
+bool chnmon_active;
+uint64_t chnmon_area;
+CssImage *css[MAX_CSSID + 1];
+uint8_t default_cssid;
+} ChannelSubSys;
+
+static ChannelSubSys *channel_subsys;
+
+int css_create_css_image(uint8_t cssid, bool default_image)
+{
+trace_css_new_image(cssid, default_image ? "(default)" : "");
+if (cssid > MAX_CSSID) {
+return -EINVAL;
+}
+if (channel_subsys->css[cssid]) {
+return -EBUSY;
+}
+channel_subsys->css[cssid] = g_try_malloc0(sizeof(CssImage));
+if (!channel_subsys->css[cssid]) {
+return -ENOMEM;
+}
+if (default_image) {
+channel_subsys->default_cssid = cssid;
+}
+return 0;
+}
+
+static void css_inject_io_interrupt(SubchDev *sch)
+{
+S390CPU *cpu = s390_cpu_addr2state(0);
+uint8_t isc = (sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_ISC) >> 11;
+
+trace_css_io_interrupt(sch->cssid, sch->ssid, sch->schid,
+   sch->curr_status.pmcw.intparm, isc, "");
+s390_io_interrupt(cpu,
+  channel_subsys->max_cssid > 0 ?
+  (sch->cssid << 8) | (1 << 3) | (sch->ssid << 1) | 1 :
+  (sch->ssid << 1) | 1,
+  sch->schid,
+  sch->curr_status.pmcw.intparm,
+  (0x80 >> isc) << 24);
+}
+
+void css_conditional_io_interrupt(SubchDev *sch)
+{
+/*
+ * If the subchannel is not currently status pending, make it pending
+ * with alert status.
+ */
+if (sch && !(sch->curr_status.scsw.ctrl & SCSW_STCTL_STATUS_PEND)) {
+S390CPU *cpu = s390_cpu_addr2state(0);
+uint8_t isc = (sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_ISC) >> 
11;
+
+trace_css_io_interrupt(sch->cssid, sch->ssid, sch->schid,
+   sch->curr_status.pmcw.intparm, isc,
+   "(unsolicited)");
+sch->curr_status.scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+sch->curr_status.scsw.ctrl |=
+SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
+/* Inject an I/O interrupt. */
+s390_io_interrupt(cpu,
+  channel_subsys->max_cssid > 0 ?
+  (sch->cssid << 8) | (1 << 3) | (sch->ssid << 1) | 1 :
+  (sch->ssid << 1) | 1,
+  sch->schid,
+  sch->curr_status.pmcw.intparm,
+  (0x80 >> isc) << 24);
+}
+}
+
+static void sch_handle_clear_func(SubchDev *sch)
+{
+PMCW *p = &sch->curr_status.pmcw;
+SCSW *s = &sch->curr_status.scsw;
+int path;
+
+/* Path management: In our simple css, we always choose the only path. */
+path = 0x80;
+
+/* Reset values prior to 'issuing the clear signal'. */
+p->lpum = 0;
+p->pom = 0xff;
+s->flags &= ~SCSW_FLAGS_MASK_PNO;
+
+/* We always 'attempt to issue the clear signal', and we always succeed. */
+sch->orb = NULL;
+sch->channel_prog = NULL;
+sch->last_cmd = NULL;

[PATCH 05/12] s390: Channel I/O basic definitions.

2013-01-17 Thread Cornelia Huck
Basic channel I/O structures and helper function.

Signed-off-by: Cornelia Huck 
---
 target-s390x/Makefile.objs |   2 +-
 target-s390x/cpu.h |   5 ++
 target-s390x/ioinst.c  |  36 
 target-s390x/ioinst.h  | 207 +
 4 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 target-s390x/ioinst.c
 create mode 100644 target-s390x/ioinst.h

diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
index e728abf..3afb0b7 100644
--- a/target-s390x/Makefile.objs
+++ b/target-s390x/Makefile.objs
@@ -1,4 +1,4 @@
 obj-y += translate.o helper.o cpu.o interrupt.o
 obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
-obj-$(CONFIG_SOFTMMU) += machine.o
+obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index cd729d3..931ed4d 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -352,6 +352,11 @@ static inline unsigned s390_del_running_cpu(CPUS390XState 
*env)
 void cpu_lock(void);
 void cpu_unlock(void);
 
+typedef struct SCHIB SCHIB;
+typedef struct ORB ORB;
+typedef struct IRB IRB;
+typedef struct CRW CRW;
+
 static inline void cpu_set_tls(CPUS390XState *env, target_ulong newtls)
 {
 env->aregs[0] = newtls >> 32;
diff --git a/target-s390x/ioinst.c b/target-s390x/ioinst.c
new file mode 100644
index 000..06a16ee
--- /dev/null
+++ b/target-s390x/ioinst.c
@@ -0,0 +1,36 @@
+/*
+ * I/O instructions for S/390
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+
+#include "cpu.h"
+#include "ioinst.h"
+
+int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
+ int *schid)
+{
+if (!IOINST_SCHID_ONE(value)) {
+return -EINVAL;
+}
+if (!IOINST_SCHID_M(value)) {
+if (IOINST_SCHID_CSSID(value)) {
+return -EINVAL;
+}
+*cssid = 0;
+*m = 0;
+} else {
+*cssid = IOINST_SCHID_CSSID(value);
+*m = 1;
+}
+*ssid = IOINST_SCHID_SSID(value);
+*schid = IOINST_SCHID_NR(value);
+return 0;
+}
diff --git a/target-s390x/ioinst.h b/target-s390x/ioinst.h
new file mode 100644
index 000..5580d91
--- /dev/null
+++ b/target-s390x/ioinst.h
@@ -0,0 +1,207 @@
+/*
+ * S/390 channel I/O instructions
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+*/
+
+#ifndef IOINST_S390X_H
+#define IOINST_S390X_H
+/*
+ * Channel I/O related definitions, as defined in the Principles
+ * Of Operation (and taken from the Linux implementation).
+ */
+
+/* subchannel status word (command mode only) */
+typedef struct SCSW {
+uint16_t flags;
+uint16_t ctrl;
+uint32_t cpa;
+uint8_t dstat;
+uint8_t cstat;
+uint16_t count;
+} QEMU_PACKED SCSW;
+
+#define SCSW_FLAGS_MASK_KEY 0xf000
+#define SCSW_FLAGS_MASK_SCTL 0x0800
+#define SCSW_FLAGS_MASK_ESWF 0x0400
+#define SCSW_FLAGS_MASK_CC 0x0300
+#define SCSW_FLAGS_MASK_FMT 0x0080
+#define SCSW_FLAGS_MASK_PFCH 0x0040
+#define SCSW_FLAGS_MASK_ISIC 0x0020
+#define SCSW_FLAGS_MASK_ALCC 0x0010
+#define SCSW_FLAGS_MASK_SSI 0x0008
+#define SCSW_FLAGS_MASK_ZCC 0x0004
+#define SCSW_FLAGS_MASK_ECTL 0x0002
+#define SCSW_FLAGS_MASK_PNO 0x0001
+
+#define SCSW_CTRL_MASK_FCTL 0x7000
+#define SCSW_CTRL_MASK_ACTL 0x0fe0
+#define SCSW_CTRL_MASK_STCTL 0x001f
+
+#define SCSW_FCTL_CLEAR_FUNC 0x1000
+#define SCSW_FCTL_HALT_FUNC 0x2000
+#define SCSW_FCTL_START_FUNC 0x4000
+
+#define SCSW_ACTL_SUSP 0x0020
+#define SCSW_ACTL_DEVICE_ACTIVE 0x0040
+#define SCSW_ACTL_SUBCH_ACTIVE 0x0080
+#define SCSW_ACTL_CLEAR_PEND 0x0100
+#define SCSW_ACTL_HALT_PEND  0x0200
+#define SCSW_ACTL_START_PEND 0x0400
+#define SCSW_ACTL_RESUME_PEND 0x0800
+
+#define SCSW_STCTL_STATUS_PEND 0x0001
+#define SCSW_STCTL_SECONDARY 0x0002
+#define SCSW_STCTL_PRIMARY 0x0004
+#define SCSW_STCTL_INTERMEDIATE 0x0008
+#define SCSW_STCTL_ALERT 0x0010
+
+#define SCSW_DSTAT_ATTENTION 0x80
+#define SCSW_DSTAT_STAT_MOD  0x40
+#define SCSW_DSTAT_CU_END0x20
+#define SCSW_DSTAT_BUSY  0x10
+#define SCSW_DSTAT_CHANNEL_END   0x08
+#define SCSW_DSTAT_DEVICE_END0x04
+#define SCSW_DSTAT_UNIT_CHECK0x02
+#define SCSW_DSTAT_UNIT_EXCEP0x01
+
+#define SCSW_CSTAT_PCI   0x80
+#define SCSW_CSTAT_INCORR_LEN0x40
+#define SCSW_CSTAT_PROG_CHECK0x20
+#define SCSW_CSTAT_PROT_CHECK0x10
+#define SCSW_CSTAT_DATA_CHECK0x08
+#define SCSW_CSTAT_CHN_CTRL_CHK  0x04
+#define SCSW_CSTAT_INTF_CTRL_CHK 0x02
+#define SCSW_CSTAT_CHAIN_CHECK   0x01
+
+/* path management control word */
+typedef struct PMCW {
+uint32_t intparm;
+ 

[PATCH 04/12] Update linux headers.

2013-01-17 Thread Cornelia Huck
Base is kvm-next as of 2013/01/16.

Signed-off-by: Cornelia Huck 
---
 linux-headers/asm-generic/kvm_para.h |  4 
 linux-headers/asm-powerpc/kvm_para.h |  2 +-
 linux-headers/linux/kvm.h| 21 +
 3 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/asm-generic/kvm_para.h
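
As a worked example (not part of the patch; the parameter values are arbitrary), the KVM_S390_INT_IO macro added below composes an I/O interrupt type word like this:

    /* ai = 1, cssid = 0, ssid = 0, schid = 0x0001 */
    __u32 type = KVM_S390_INT_IO(1, 0, 0, 0x0001);
    /* = 0x0001 | (0 << 16) | (0 << 18) | (1 << 26) = 0x04000001,
     * i.e. inside the range reserved for I/O interrupts below 0xfffe0000u */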

diff --git a/linux-headers/asm-generic/kvm_para.h 
b/linux-headers/asm-generic/kvm_para.h
new file mode 100644
index 000..486f0af
--- /dev/null
+++ b/linux-headers/asm-generic/kvm_para.h
@@ -0,0 +1,4 @@
+/*
+ * There isn't anything here, but the file must not be empty or patch
+ * will delete it.
+ */
diff --git a/linux-headers/asm-powerpc/kvm_para.h 
b/linux-headers/asm-powerpc/kvm_para.h
index 7e64f57..484bcaa 100644
--- a/linux-headers/asm-powerpc/kvm_para.h
+++ b/linux-headers/asm-powerpc/kvm_para.h
@@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
 
 #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
 
-#include 
+#include 
 
 #define KVM_FEATURE_MAGIC_PAGE 1
 
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bfdbf4d..2602437 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -168,6 +168,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_PAPR_HCALL  19
 #define KVM_EXIT_S390_UCONTROL   20
 #define KVM_EXIT_WATCHDOG 21
+#define KVM_EXIT_S390_TSCH        22
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -285,6 +286,15 @@ struct kvm_run {
__u64 ret;
__u64 args[9];
} papr_hcall;
+   /* KVM_EXIT_S390_TSCH */
+   struct {
+   __u16 subchannel_id;
+   __u16 subchannel_nr;
+   __u32 io_int_parm;
+   __u32 io_int_word;
+   __u32 ipb;
+   __u8 dequeued;
+   } s390_tsch;
/* Fix the size of the union. */
char padding[256];
};
@@ -397,10 +407,20 @@ struct kvm_s390_psw {
 #define KVM_S390_PROGRAM_INT   0xfffe0001u
 #define KVM_S390_SIGP_SET_PREFIX   0xfffe0002u
 #define KVM_S390_RESTART   0xfffe0003u
+#define KVM_S390_MCHK  0xfffe1000u
 #define KVM_S390_INT_VIRTIO        0xffff2603u
 #define KVM_S390_INT_SERVICE       0xffff2401u
 #define KVM_S390_INT_EMERGENCY     0xffff1201u
 #define KVM_S390_INT_EXTERNAL_CALL 0xffff1202u
+/* Anything below 0xfffe0000u is taken by INT_IO */
+#define KVM_S390_INT_IO(ai,cssid,ssid,schid)   \
+   (((schid)) |   \
+((ssid) << 16) |  \
+((cssid) << 18) | \
+((ai) << 26))
+#define KVM_S390_INT_IO_MIN        0x00000000u
+#define KVM_S390_INT_IO_MAX        0xfffdffffu
+
 
 struct kvm_s390_interrupt {
__u32 type;
@@ -635,6 +655,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IRQFD_RESAMPLE 82
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
 #define KVM_CAP_PPC_HTAB_FD 84
+#define KVM_CAP_S390_CSS_SUPPORT 85
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.7.12.4



[PATCH 11/12] s390-virtio: Factor out some initialization code.

2013-01-17 Thread Cornelia Huck
Some of the machine initialization for s390-virtio will be reused
by virtio-ccw.

Signed-off-by: Cornelia Huck 
---
 hw/s390-virtio.c | 155 ++-
 hw/s390-virtio.h |   5 ++
 2 files changed, 91 insertions(+), 69 deletions(-)

diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index bded30b..603f6b0 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -155,62 +155,10 @@ unsigned s390_del_running_cpu(CPUS390XState *env)
 return s390_running_cpus;
 }
 
-/* PC hardware initialisation */
-static void s390_init(QEMUMachineInitArgs *args)
+void s390_init_cpus(const char *cpu_model, uint8_t *storage_keys)
 {
-ram_addr_t my_ram_size = args->ram_size;
-const char *cpu_model = args->cpu_model;
-const char *kernel_filename = args->kernel_filename;
-const char *kernel_cmdline = args->kernel_cmdline;
-const char *initrd_filename = args->initrd_filename;
-CPUS390XState *env = NULL;
-MemoryRegion *sysmem = get_system_memory();
-MemoryRegion *ram = g_new(MemoryRegion, 1);
-ram_addr_t kernel_size = 0;
-ram_addr_t initrd_offset;
-ram_addr_t initrd_size = 0;
-int shift = 0;
-uint8_t *storage_keys;
-void *virtio_region;
-hwaddr virtio_region_len;
-hwaddr virtio_region_start;
 int i;
 
-/* s390x ram size detection needs a 16bit multiplier + an increment. So
-   guests > 64GB can be specified in 2MB steps etc. */
-while ((my_ram_size >> (20 + shift)) > 65535) {
-shift++;
-}
-my_ram_size = my_ram_size >> (20 + shift) << (20 + shift);
-
-/* lets propagate the changed ram size into the global variable. */
-ram_size = my_ram_size;
-
-/* get a BUS */
-s390_bus = s390_virtio_bus_init(&my_ram_size);
-s390_sclp_init();
-
-/* register hypercalls */
-s390_virtio_register_hcalls();
-
-/* allocate RAM */
-memory_region_init_ram(ram, "s390.ram", my_ram_size);
-vmstate_register_ram_global(ram);
-memory_region_add_subregion(sysmem, 0, ram);
-
-/* clear virtio region */
-virtio_region_len = my_ram_size - ram_size;
-virtio_region_start = ram_size;
-virtio_region = cpu_physical_memory_map(virtio_region_start,
-&virtio_region_len, true);
-memset(virtio_region, 0, virtio_region_len);
-cpu_physical_memory_unmap(virtio_region, virtio_region_len, 1,
-  virtio_region_len);
-
-/* allocate storage keys */
-storage_keys = g_malloc0(my_ram_size / TARGET_PAGE_SIZE);
-
-/* init CPUs */
 if (cpu_model == NULL) {
 cpu_model = "host";
 }
@@ -219,21 +167,27 @@ static void s390_init(QEMUMachineInitArgs *args)
 
 for (i = 0; i < smp_cpus; i++) {
 S390CPU *cpu;
-CPUS390XState *tmp_env;
 
 cpu = cpu_s390x_init(cpu_model);
-tmp_env = &cpu->env;
-if (!env) {
-env = tmp_env;
-}
+
 ipi_states[i] = cpu;
-tmp_env->halted = 1;
-tmp_env->exception_index = EXCP_HLT;
-tmp_env->storage_keys = storage_keys;
+cpu->env.halted = 1;
+cpu->env.exception_index = EXCP_HLT;
+cpu->env.storage_keys = storage_keys;
 }
+}
+
+void s390_set_up_kernel(const char *kernel_filename,
+const char *kernel_cmdline,
+const char *initrd_filename)
+{
+ram_addr_t kernel_size = 0;
+ram_addr_t initrd_offset;
+ram_addr_t initrd_size = 0;
+S390CPU *cpu = s390_cpu_addr2state(0);
 
 /* One CPU has to run */
-s390_add_running_cpu(env);
+s390_add_running_cpu(&cpu->env);
 
 if (kernel_filename) {
 
@@ -252,8 +206,8 @@ static void s390_init(QEMUMachineInitArgs *args)
  * value was 0x800 (the SALIPL loader) and it won't work. For
  * all (Linux) cases 0x10000 (KERN_IMAGE_START) should be fine.
  */
-env->psw.addr = KERN_IMAGE_START;
-env->psw.mask = 0x0000000180000000ULL;
+cpu->env.psw.addr = KERN_IMAGE_START;
+cpu->env.psw.mask = 0x0000000180000000ULL;
 } else {
 ram_addr_t bios_size = 0;
 char *bios_filename;
@@ -275,8 +229,8 @@ static void s390_init(QEMUMachineInitArgs *args)
 hw_error("stage1 bootloader is > 4k\n");
 }
 
-env->psw.addr = ZIPL_START;
-env->psw.mask = 0x0000000180000000ULL;
+cpu->env.psw.addr = ZIPL_START;
+cpu->env.psw.mask = 0x0000000180000000ULL;
 }
 
 if (initrd_filename) {
@@ -302,9 +256,13 @@ static void s390_init(QEMUMachineInitArgs *args)
 memcpy(rom_ptr(KERN_PARM_AREA), kernel_cmdline,
strlen(kernel_cmdline) + 1);
 }
+}
 
-/* Create VirtIO network adapters */
-for(i = 0; i < nb_nics; i++) {
+void s390_create_virtio_net(BusState *bus, const char *name)
+{
+int i;
+
+for (i = 0; i < nb_nics; i++) {
 NICInfo *nd = &nd_table[i];
 DeviceState *dev;
 
@@ -317,12 +275,71 @@ static void s39

[PATCH 10/12] s390: Add new channel I/O based virtio transport.

2013-01-17 Thread Cornelia Huck
Add a new virtio transport that uses channel commands to perform
virtio operations.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/Makefile.objs |   1 +
 hw/s390x/virtio-ccw.c  | 906 +
 hw/s390x/virtio-ccw.h  |  79 +
 trace-events   |   4 +
 4 files changed, 990 insertions(+)
 create mode 100644 hw/s390x/virtio-ccw.c
 create mode 100644 hw/s390x/virtio-ccw.h
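
To illustrate the communication blocks defined below (a sketch only, not part of the patch; the field values are made up), this is the kind of block a guest would pass with the channel command that places a virtqueue:

    VqInfoBlock info = {
        .queue = 0x12340000, /* guest-physical address of the ring        */
        .align = 4096,       /* the handler below insists on 4K alignment */
        .index = 0,          /* virtqueue number                          */
        .num   = 128,        /* ring size offered by the guest            */
    };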

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 029a0b2..71ad255 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -6,3 +6,4 @@ obj-y += sclp.o
 obj-y += event-facility.o
 obj-y += sclpquiesce.o sclpconsole.o
 obj-y += css.o
+obj-y += virtio-ccw.o
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
new file mode 100644
index 000..cb15965
--- /dev/null
+++ b/hw/s390x/virtio-ccw.c
@@ -0,0 +1,906 @@
+/*
+ * virtio ccw target implementation
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+#include "block/block.h"
+#include "sysemu/blockdev.h"
+#include "sysemu/sysemu.h"
+#include "net/net.h"
+#include "monitor/monitor.h"
+#include "hw/virtio.h"
+#include "hw/virtio-serial.h"
+#include "hw/virtio-net.h"
+#include "hw/sysbus.h"
+#include "qemu/bitops.h"
+
+#include "ioinst.h"
+#include "css.h"
+#include "virtio-ccw.h"
+#include "trace.h"
+
+static const TypeInfo virtio_ccw_bus_info = {
+.name = TYPE_VIRTIO_CCW_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(VirtioCcwBus),
+};
+
+static const VirtIOBindings virtio_ccw_bindings;
+
+VirtIODevice *virtio_ccw_get_vdev(SubchDev *sch)
+{
+VirtIODevice *vdev = NULL;
+
+if (sch->driver_data) {
+vdev = ((VirtioCcwData *)sch->driver_data)->vdev;
+}
+return vdev;
+}
+
+static void virtio_ccw_reset_subchannels(void *opaque)
+{
+VirtioCcwBus *bus = opaque;
+BusChild *kid;
+VirtioCcwData *data;
+BusState *parent = BUS(bus);
+
+QTAILQ_FOREACH(kid, &parent->children, sibling) {
+data = (VirtioCcwData *)kid->child;
+virtio_reset(data->vdev);
+css_reset_sch(data->sch);
+}
+css_reset();
+}
+
+VirtioCcwBus *virtio_ccw_bus_init(void)
+{
+VirtioCcwBus *cbus;
+BusState *bus;
+DeviceState *dev;
+
+/* Create bridge device */
+dev = qdev_create(NULL, "virtio-ccw-bridge");
+qdev_init_nofail(dev);
+
+/* Create bus on bridge device */
+bus = qbus_create(TYPE_VIRTIO_CCW_BUS, dev, "virtio-ccw");
+cbus = VIRTIO_CCW_BUS(bus);
+
+/* Enable hotplugging */
+bus->allow_hotplug = 1;
+
+qemu_register_reset(virtio_ccw_reset_subchannels, cbus);
+return cbus;
+}
+
+/* Communication blocks used by several channel commands. */
+typedef struct VqInfoBlock {
+uint64_t queue;
+uint32_t align;
+uint16_t index;
+uint16_t num;
+} QEMU_PACKED VqInfoBlock;
+
+typedef struct VqConfigBlock {
+uint16_t index;
+uint16_t num_max;
+} QEMU_PACKED VqConfigBlock;
+
+typedef struct VirtioFeatDesc {
+uint32_t features;
+uint8_t index;
+} QEMU_PACKED VirtioFeatDesc;
+
+/* Specify where the virtqueues for the subchannel are in guest memory. */
+static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
+  uint16_t index, uint16_t num)
+{
+VirtioCcwData *data = sch->driver_data;
+
+if (index > VIRTIO_PCI_QUEUE_MAX) {
+return -EINVAL;
+}
+
+/* Current code in virtio.c relies on 4K alignment. */
+if (addr && (align != 4096)) {
+return -EINVAL;
+}
+
+if (!data) {
+return -EINVAL;
+}
+
+virtio_queue_set_addr(data->vdev, index, addr);
+if (!addr) {
+virtio_queue_set_vector(data->vdev, index, 0);
+} else {
+/* Fail if we don't have a big enough queue. */
+/* TODO: Add interface to handle vring.num changing */
+if (virtio_queue_get_num(data->vdev, index) > num) {
+return -EINVAL;
+}
+virtio_queue_set_vector(data->vdev, index, index);
+}
+/* tell notify handler in case of config change */
+data->vdev->config_vector = VIRTIO_PCI_QUEUE_MAX;
+return 0;
+}
+
+static int virtio_ccw_cb(SubchDev *sch, CCW1 *ccw)
+{
+int ret;
+VqInfoBlock info;
+uint8_t status;
+VirtioFeatDesc features;
+void *config;
+hwaddr indicators;
+VqConfigBlock vq_config;
+VirtioCcwData *data = sch->driver_data;
+bool check_len;
+int len;
+
+if (!ccw) {
+return -EIO;
+}
+
+if (!data) {
+return -EINVAL;
+}
+
+trace_virtio_ccw_interpret_ccw(sch->cssid, sch->ssid, sch->schid,
+   ccw->cmd_code);
+check_len = !((ccw->flags & CCW_FLAG_SLI) && !(ccw->flags & CCW_FLAG_DC));
+
+/* Look at the command. */
+switch (ccw->cmd_code) {
+case CCW_CM

[PATCH 06/12] s390: I/O interrupt and machine check injection.

2013-01-17 Thread Cornelia Huck
I/O interrupts are queued per isc. Only crw pending machine checks
are supported.

Signed-off-by: Cornelia Huck 
---
 target-s390x/cpu.h|  69 +++-
 target-s390x/helper.c | 143 ++
 2 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 931ed4d..97f70f3 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -47,6 +47,11 @@
 #define MMU_USER_IDX 1
 
 #define MAX_EXT_QUEUE 16
+#define MAX_IO_QUEUE 16
+#define MAX_MCHK_QUEUE 16
+
+#define PSW_MCHK_MASK 0x0004000000000000
+#define PSW_IO_MASK 0x0200000000000000
 
 typedef struct PSW {
 uint64_t mask;
@@ -59,6 +64,17 @@ typedef struct ExtQueue {
 uint32_t param64;
 } ExtQueue;
 
+typedef struct IOIntQueue {
+uint16_t id;
+uint16_t nr;
+uint32_t parm;
+uint32_t word;
+} IOIntQueue;
+
+typedef struct MchkQueue {
+uint16_t type;
+} MchkQueue;
+
 typedef struct CPUS390XState {
 uint64_t regs[16]; /* GP registers */
 CPU_DoubleU fregs[16]; /* FP registers */
@@ -90,9 +106,17 @@ typedef struct CPUS390XState {
 uint64_t cregs[16]; /* control registers */
 
 ExtQueue ext_queue[MAX_EXT_QUEUE];
-int pending_int;
+IOIntQueue io_queue[MAX_IO_QUEUE][8];
+MchkQueue mchk_queue[MAX_MCHK_QUEUE];
 
+int pending_int;
 int ext_index;
+int io_index[8];
+int mchk_index;
+
+uint64_t ckc;
+uint64_t cputm;
+uint32_t todpr;
 
 CPU_COMMON
 
@@ -373,10 +397,14 @@ static inline void cpu_set_tls(CPUS390XState *env, 
target_ulong newtls)
 #define EXCP_EXT 1 /* external interrupt */
 #define EXCP_SVC 2 /* supervisor call (syscall) */
 #define EXCP_PGM 3 /* program interruption */
+#define EXCP_IO  7 /* I/O interrupt */
+#define EXCP_MCHK 8 /* machine check */
 
 #define INTERRUPT_EXT(1 << 0)
 #define INTERRUPT_TOD(1 << 1)
 #define INTERRUPT_CPUTIMER   (1 << 2)
+#define INTERRUPT_IO (1 << 3)
+#define INTERRUPT_MCHK   (1 << 4)
 
 /* Program Status Word.  */
 #define S390_PSWM_REGNUM 0
@@ -920,6 +948,45 @@ static inline void cpu_inject_ext(CPUS390XState *env, 
uint32_t code, uint32_t pa
 cpu_interrupt(env, CPU_INTERRUPT_HARD);
 }
 
+static inline void cpu_inject_io(CPUS390XState *env, uint16_t subchannel_id,
+ uint16_t subchannel_number,
+ uint32_t io_int_parm, uint32_t io_int_word)
+{
+int isc = ffs(io_int_word << 2) - 1;
+
+if (env->io_index[isc] == MAX_IO_QUEUE - 1) {
+/* ugh - can't queue anymore. Let's drop. */
+return;
+}
+
+env->io_index[isc]++;
+assert(env->io_index[isc] < MAX_IO_QUEUE);
+
+env->io_queue[env->io_index[isc]][isc].id = subchannel_id;
+env->io_queue[env->io_index[isc]][isc].nr = subchannel_number;
+env->io_queue[env->io_index[isc]][isc].parm = io_int_parm;
+env->io_queue[env->io_index[isc]][isc].word = io_int_word;
+
+env->pending_int |= INTERRUPT_IO;
+cpu_interrupt(env, CPU_INTERRUPT_HARD);
+}
+
+static inline void cpu_inject_crw_mchk(CPUS390XState *env)
+{
+if (env->mchk_index == MAX_MCHK_QUEUE - 1) {
+/* ugh - can't queue anymore. Let's drop. */
+return;
+}
+
+env->mchk_index++;
+assert(env->mchk_index < MAX_MCHK_QUEUE);
+
+env->mchk_queue[env->mchk_index].type = 1;
+
+env->pending_int |= INTERRUPT_MCHK;
+cpu_interrupt(env, CPU_INTERRUPT_HARD);
+}
+
 static inline bool cpu_has_work(CPUState *cpu)
 {
 CPUS390XState *env = &S390_CPU(cpu)->env;
diff --git a/target-s390x/helper.c b/target-s390x/helper.c
index d350f28..6e0a2d2 100644
--- a/target-s390x/helper.c
+++ b/target-s390x/helper.c
@@ -618,12 +618,142 @@ static void do_ext_interrupt(CPUS390XState *env)
 load_psw(env, mask, addr);
 }
 
+static void do_io_interrupt(CPUS390XState *env)
+{
+uint64_t mask, addr;
+LowCore *lowcore;
+hwaddr len = TARGET_PAGE_SIZE;
+IOIntQueue *q;
+uint8_t isc;
+int disable = 1;
+int found = 0;
+
+if (!(env->psw.mask & PSW_MASK_IO)) {
+cpu_abort(env, "I/O int w/o I/O mask\n");
+}
+
+for (isc = 0; isc < 8; isc++) {
+if (env->io_index[isc] < 0) {
+continue;
+}
+if (env->io_index[isc] > MAX_IO_QUEUE) {
+cpu_abort(env, "I/O queue overrun for isc %d: %d\n",
+  isc, env->io_index[isc]);
+}
+
+q = &env->io_queue[env->io_index[isc]][isc];
+if (!(env->cregs[6] & q->word)) {
+disable = 0;
+continue;
+}
+found = 1;
+lowcore = cpu_map_lowcore(env, &len);
+
+lowcore->subchannel_id = cpu_to_be16(q->id);
+lowcore->subchannel_nr = cpu_to_be16(q->nr);
+lowcore->io_int_parm = cpu_to_be32(q->parm);
+lowcore->io_int_word = cpu_to_be32(q->word);
+lowcore->io_old_psw.mask = cpu_to_be64(get_psw_mask(env));
+lowcore->io_old_psw.addr = cpu_to_be64(env->psw.addr);
+

[PATCH 09/12] s390: Wire up channel I/O in kvm.

2013-01-17 Thread Cornelia Huck
Trigger the code for our virtual css in case of instruction
intercepts for I/O instructions.

Handle the tsch exit for the subchannel-related part of tsch.

Signed-off-by: Cornelia Huck 
---
 target-s390x/cpu.h |  11 +++
 target-s390x/kvm.c | 252 ++---
 2 files changed, 250 insertions(+), 13 deletions(-)
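
A short note on how the intercepts below are decoded (illustration only, not part of the patch): the intercepted opcode arrives split into ipa0 (the high byte) and ipa1 (the low byte). SSCH, for example, is opcode 0xb233:

    uint8_t ipa0 = 0xb2; /* matches IPA0_B2 >> 8 */
    uint8_t ipa1 = 0x33; /* PRIV_SSCH, routed to ioinst_handle_ssch() below */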

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index b619edb..4438dc8 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -1136,6 +1136,13 @@ void program_interrupt(CPUS390XState *env, uint32_t 
code, int ilen);
 void QEMU_NORETURN runtime_exception(CPUS390XState *env, int excp,
  uintptr_t retaddr);
 
+#ifdef CONFIG_KVM
+int kvm_s390_io_interrupt(S390CPU *cpu, uint16_t subchannel_id,
+  uint16_t subchannel_nr, uint32_t io_int_parm,
+  uint32_t io_int_word);
+int kvm_s390_crw_mchk(S390CPU *cpu);
+void kvm_s390_enable_css_support(S390CPU *cpu);
+#else
 static inline int kvm_s390_io_interrupt(S390CPU *cpu,
 uint16_t subchannel_id,
 uint16_t subchannel_nr,
@@ -1148,6 +1155,10 @@ static inline int kvm_s390_crw_mchk(S390CPU *cpu)
 {
 return -EOPNOTSUPP;
 }
+static inline void kvm_s390_enable_css_support(S390CPU *cpu)
+{
+}
+#endif
 
 static inline void s390_io_interrupt(S390CPU *cpu,
  uint16_t subchannel_id,
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index ae6ae07..7578880 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -30,6 +30,7 @@
 #include "sysemu/kvm.h"
 #include "cpu.h"
 #include "sysemu/device_tree.h"
+#include "ioinst.h"
 
 /* #define DEBUG_KVM */
 
@@ -43,9 +44,29 @@
 
 #define IPA0_DIAG   0x8300
 #define IPA0_SIGP   0xae00
-#define IPA0_PRIV   0xb200
+#define IPA0_B2 0xb200
+#define IPA0_B9 0xb900
+#define IPA0_EB 0xeb00
 
 #define PRIV_SCLP_CALL  0x20
+#define PRIV_CSCH   0x30
+#define PRIV_HSCH   0x31
+#define PRIV_MSCH   0x32
+#define PRIV_SSCH   0x33
+#define PRIV_STSCH  0x34
+#define PRIV_TSCH   0x35
+#define PRIV_TPI0x36
+#define PRIV_SAL0x37
+#define PRIV_RSCH   0x38
+#define PRIV_STCRW  0x39
+#define PRIV_STCPS  0x3a
+#define PRIV_RCHP   0x3b
+#define PRIV_SCHM   0x3c
+#define PRIV_CHSC   0x5f
+#define PRIV_SIGA   0x74
+#define PRIV_XSCH   0x76
+#define PRIV_SQBS   0x8a
+#define PRIV_EQBS   0x9c
 #define DIAG_KVM_HYPERCALL  0x500
 #define DIAG_KVM_BREAKPOINT 0x501
 
@@ -364,10 +385,123 @@ static int kvm_sclp_service_call(S390CPU *cpu, struct 
kvm_run *run,
 return 0;
 }
 
-static int handle_priv(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
+static int kvm_handle_css_inst(S390CPU *cpu, struct kvm_run *run,
+   uint8_t ipa0, uint8_t ipa1, uint8_t ipb)
+{
+int r = 0;
+int no_cc = 0;
+CPUS390XState *env = &cpu->env;
+
+if (ipa0 != 0xb2) {
+/* Not handled for now. */
+return -1;
+}
+cpu_synchronize_state(env);
+switch (ipa1) {
+case PRIV_XSCH:
+r = ioinst_handle_xsch(env, env->regs[1]);
+break;
+case PRIV_CSCH:
+r = ioinst_handle_csch(env, env->regs[1]);
+break;
+case PRIV_HSCH:
+r = ioinst_handle_hsch(env, env->regs[1]);
+break;
+case PRIV_MSCH:
+r = ioinst_handle_msch(env, env->regs[1], run->s390_sieic.ipb);
+break;
+case PRIV_SSCH:
+r = ioinst_handle_ssch(env, env->regs[1], run->s390_sieic.ipb);
+break;
+case PRIV_STCRW:
+r = ioinst_handle_stcrw(env, run->s390_sieic.ipb);
+break;
+case PRIV_STSCH:
+r = ioinst_handle_stsch(env, env->regs[1], run->s390_sieic.ipb);
+break;
+case PRIV_TSCH:
+/* We should only get tsch via KVM_EXIT_S390_TSCH. */
+fprintf(stderr, "Spurious tsch intercept\n");
+break;
+case PRIV_CHSC:
+r = ioinst_handle_chsc(env, run->s390_sieic.ipb);
+break;
+case PRIV_TPI:
+/* This should have been handled by kvm already. */
+fprintf(stderr, "Spurious tpi intercept\n");
+break;
+case PRIV_SCHM:
+no_cc = 1;
+r = ioinst_handle_schm(env, env->regs[1], env->regs[2],
+   run->s390_sieic.ipb);
+break;
+case PRIV_RSCH:
+r = ioinst_handle_rsch(env, env->regs[1]);
+break;
+case PRIV_RCHP:
+r = ioinst_

[PATCH 07/12] s390: Add channel I/O instructions.

2013-01-17 Thread Cornelia Huck
Provide handlers for (most) channel I/O instructions.

Signed-off-by: Cornelia Huck 
---
 target-s390x/cpu.h| 101 
 target-s390x/ioinst.c | 673 ++
 target-s390x/ioinst.h |  16 ++
 trace-events  |   6 +
 4 files changed, 796 insertions(+)
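
A worked example for the decode_basedisp_s() helper added below (illustration only, not part of the patch; the ipb value is arbitrary):

    uint32_t ipb = 0x2b100000;
    /* base register:  ipb >> 28           == 2
     * displacement:   (ipb >> 16) & 0xfff == 0xb10
     * effective addr: env->regs[2] + 0xb10
     * (a base register field of 0 means "no base", so only the
     * displacement is used) */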

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 97f70f3..cf5334e 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -130,6 +130,8 @@ typedef struct CPUS390XState {
 QEMUTimer *tod_timer;
 
 QEMUTimer *cpu_timer;
+
+uint8_t chsc_page[TARGET_PAGE_SIZE];
 } CPUS390XState;
 
 #include "cpu-qom.h"
@@ -144,6 +146,9 @@ static inline void cpu_clone_regs(CPUS390XState *env, 
target_ulong newsp)
 }
 #endif
 
+/* distinguish between 24 bit and 31 bit addressing */
+#define HIGH_ORDER_BIT 0x80000000UL
+
 /* Interrupt Codes */
 /* Program Interrupts */
 #define PGM_OPERATION   0x0001
@@ -327,6 +332,20 @@ void *s390_cpu_physical_memory_map(CPUS390XState *env, 
hwaddr addr, hwaddr len,
int is_write);
 void s390_cpu_physical_memory_unmap(CPUS390XState *env, void *addr, hwaddr len,
 int is_write);
+static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb)
+{
+hwaddr addr = 0;
+uint8_t reg;
+
+reg = ipb >> 28;
+if (reg > 0) {
+addr = env->regs[reg];
+}
+addr += (ipb >> 16) & 0xfff;
+
+return addr;
+}
+
 void s390x_tod_timer(void *opaque);
 void s390x_cpu_timer(void *opaque);
 
@@ -376,11 +395,93 @@ static inline unsigned s390_del_running_cpu(CPUS390XState 
*env)
 void cpu_lock(void);
 void cpu_unlock(void);
 
+typedef struct SubchDev SubchDev;
 typedef struct SCHIB SCHIB;
 typedef struct ORB ORB;
 typedef struct IRB IRB;
 typedef struct CRW CRW;
 
+static inline SubchDev *css_find_subch(uint8_t m, uint8_t cssid, uint8_t ssid,
+   uint16_t schid)
+{
+return NULL;
+}
+static inline bool css_subch_visible(SubchDev *sch)
+{
+return false;
+}
+static inline void css_conditional_io_interrupt(SubchDev *sch)
+{
+}
+static inline int css_do_stsch(SubchDev *sch, SCHIB *schib)
+{
+return -ENODEV;
+}
+static inline bool css_schid_final(uint8_t cssid, uint8_t ssid, uint16_t schid)
+{
+return true;
+}
+static inline int css_do_msch(SubchDev *sch, SCHIB *schib)
+{
+return -ENODEV;
+}
+static inline int css_do_xsch(SubchDev *sch)
+{
+return -ENODEV;
+}
+static inline int css_do_csch(SubchDev *sch)
+{
+return -ENODEV;
+}
+static inline int css_do_hsch(SubchDev *sch)
+{
+return -ENODEV;
+}
+static inline int css_do_ssch(SubchDev *sch, ORB *orb)
+{
+return -ENODEV;
+}
+static inline int css_do_tsch(SubchDev *sch, IRB *irb)
+{
+return -ENODEV;
+}
+static inline int css_do_stcrw(CRW *crw)
+{
+return 1;
+}
+static inline int css_do_tpi(uint64_t addr, int lowcore)
+{
+return 0;
+}
+static inline int css_collect_chp_desc(int m, uint8_t cssid, uint8_t f_chpid,
+   int rfmt, uint8_t l_chpid, void *buf)
+{
+return 0;
+}
+static inline void css_do_schm(uint8_t mbk, int update, int dct, uint64_t mbo)
+{
+}
+static inline int css_enable_mss(void)
+{
+return -EINVAL;
+}
+static inline int css_enable_mcsse(void)
+{
+return -EINVAL;
+}
+static inline int css_do_rsch(SubchDev *sch)
+{
+return -ENODEV;
+}
+static inline int css_do_rchp(uint8_t cssid, uint8_t chpid)
+{
+return -ENODEV;
+}
+static inline bool css_present(uint8_t cssid)
+{
+return false;
+}
+
 static inline void cpu_set_tls(CPUS390XState *env, target_ulong newtls)
 {
 env->aregs[0] = newtls >> 32;
diff --git a/target-s390x/ioinst.c b/target-s390x/ioinst.c
index 06a16ee..0cb9569 100644
--- a/target-s390x/ioinst.c
+++ b/target-s390x/ioinst.c
@@ -13,6 +13,7 @@
 
 #include "cpu.h"
 #include "ioinst.h"
+#include "trace.h"
 
 int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
  int *schid)
@@ -34,3 +35,675 @@ int ioinst_disassemble_sch_ident(uint32_t value, int *m, 
int *cssid, int *ssid,
 *schid = IOINST_SCHID_NR(value);
 return 0;
 }
+
+int ioinst_handle_xsch(CPUS390XState *env, uint64_t reg1)
+{
+int cssid, ssid, schid, m;
+SubchDev *sch;
+int ret = -ENODEV;
+int cc;
+
+if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
+program_interrupt(env, PGM_OPERAND, 2);
+return -EIO;
+}
+trace_ioinst_sch_id("xsch", cssid, ssid, schid);
+sch = css_find_subch(m, cssid, ssid, schid);
+if (sch && css_subch_visible(sch)) {
+ret = css_do_xsch(sch);
+}
+switch (ret) {
+case -ENODEV:
+cc = 3;
+break;
+case -EBUSY:
+cc = 2;
+break;
+case 0:
+cc = 0;
+break;
+default:
+cc = 1;
+break;
+}
+
+return cc;
+}
+
+int ioinst_handle_csch(CPUS390XState *env, uint64_t reg1)
+{

[PATCH 12/12] s390: Add s390-ccw-virtio machine.

2013-01-17 Thread Cornelia Huck
Add a new machine type, s390-ccw-virtio, making use of the
virtio-ccw transport to present virtio devices as channel
devices, and make it the default on s390.

Signed-off-by: Cornelia Huck 
---
 hw/s390-virtio.c   |   1 -
 hw/s390-virtio.h   |   1 +
 hw/s390x/Makefile.objs |   1 +
 hw/s390x/s390-virtio-ccw.c | 141 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-virtio-ccw.c

diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 603f6b0..3e58bc2 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -353,7 +353,6 @@ static QEMUMachine s390_machine = {
 .no_sdcard = 1,
 .use_virtcon = 1,
 .max_cpus = 255,
-.is_default = 1,
 DEFAULT_MACHINE_OPTIONS,
 };
 
diff --git a/hw/s390-virtio.h b/hw/s390-virtio.h
index aefc99d..a2cad40 100644
--- a/hw/s390-virtio.h
+++ b/hw/s390-virtio.h
@@ -15,6 +15,7 @@
 #define KVM_S390_VIRTIO_NOTIFY  0
 #define KVM_S390_VIRTIO_RESET   1
 #define KVM_S390_VIRTIO_SET_STATUS  2
+#define KVM_S390_VIRTIO_CCW_NOTIFY  3
 
 typedef int (*s390_virtio_fn)(const uint64_t *args);
 void s390_register_virtio_hypercall(uint64_t code, s390_virtio_fn fn);
diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 71ad255..54688b4 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -6,4 +6,5 @@ obj-y += sclp.o
 obj-y += event-facility.o
 obj-y += sclpquiesce.o sclpconsole.o
 obj-y += css.o
+obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
new file mode 100644
index 000..98552d3
--- /dev/null
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -0,0 +1,141 @@
+/*
+ * virtio ccw machine
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "hw/boards.h"
+#include "exec/address-spaces.h"
+#include "hw/s390-virtio.h"
+#include "hw/s390x/sclp.h"
+#include "ioinst.h"
+#include "css.h"
+#include "virtio-ccw.h"
+
+static VirtioCcwBus *ccw_bus;
+
+static int virtio_ccw_hcall_notify(const uint64_t *args)
+{
+uint64_t subch_id = args[0];
+uint64_t queue = args[1];
+SubchDev *sch;
+int cssid, ssid, schid, m;
+
+if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
+return -EINVAL;
+}
+sch = css_find_subch(m, cssid, ssid, schid);
+if (!sch || !css_subch_visible(sch)) {
+return -EINVAL;
+}
+virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
+return 0;
+
+}
+
+static int virtio_ccw_hcall_early_printk(const uint64_t *args)
+{
+uint64_t mem = args[0];
+
+if (mem < ram_size) {
+/* Early printk */
+return 0;
+}
+return -EINVAL;
+}
+
+static void virtio_ccw_register_hcalls(void)
+{
+s390_register_virtio_hypercall(KVM_S390_VIRTIO_CCW_NOTIFY,
+   virtio_ccw_hcall_notify);
+/* Tolerate early printk. */
+s390_register_virtio_hypercall(KVM_S390_VIRTIO_NOTIFY,
+   virtio_ccw_hcall_early_printk);
+}
+
+static void ccw_init(QEMUMachineInitArgs *args)
+{
+ram_addr_t my_ram_size = args->ram_size;
+const char *cpu_model = args->cpu_model;
+const char *kernel_filename = args->kernel_filename;
+const char *kernel_cmdline = args->kernel_cmdline;
+const char *initrd_filename = args->initrd_filename;
+MemoryRegion *sysmem = get_system_memory();
+MemoryRegion *ram = g_new(MemoryRegion, 1);
+int shift = 0;
+uint8_t *storage_keys;
+int ret;
+
+/* s390x ram size detection needs a 16bit multiplier + an increment. So
+   guests > 64GB can be specified in 2MB steps etc. */
+while ((my_ram_size >> (20 + shift)) > 65535) {
+shift++;
+}
+my_ram_size = my_ram_size >> (20 + shift) << (20 + shift);
+
+/* lets propagate the changed ram size into the global variable. */
+ram_size = my_ram_size;
+
+/* get a BUS */
+ccw_bus = virtio_ccw_bus_init();
+s390_sclp_init();
+
+/* register hypercalls */
+virtio_ccw_register_hcalls();
+
+/* allocate RAM */
+memory_region_init_ram(ram, "s390.ram", my_ram_size);
+vmstate_register_ram_global(ram);
+memory_region_add_subregion(sysmem, 0, ram);
+
+/* allocate storage keys */
+storage_keys = g_malloc0(my_ram_size / TARGET_PAGE_SIZE);
+
+/* init CPUs */
+s390_init_cpus(cpu_model, storage_keys);
+
+kvm_s390_enable_css_support(s390_cpu_addr2state(0));
+
+/*
+ * Create virtual css and set it as default so that non mcss-e
+ * enabled guests only see virtio devices.
+ */
+ret = css_create_css_image(VIRTUAL_CSSID, true);
+assert(ret == 0);
+
+
+s390_set_up_kernel(kernel_filename, kernel_cmdline, initrd_filename);
+
+/* Create VirtIO network adapters */
+s39

[PATCH v5 00/12] s390: channel I/O support in qemu.

2013-01-17 Thread Cornelia Huck
Hi,

here's the latest incarnation of my channel I/O and virtio-ccw
patchset for qemu, containing various changes over the last one.

(Note that "s390: Add a hypercall registration interface." has
already been posted: http://marc.info/?l=qemu-devel&m=135834160607372&w=2)

Changes include:
- Add various defines for magic constants.
- Introduce helpers for various mapping stuff and use them.
- Adapt virtio-ccw to QOM conventions.
- Move the new s390-ccw-virtio machine into an extra file (and an
  extra patch).
- Improve cpu handling during machine init (don't pass around env).

Cornelia Huck (12):
  s390: Add a hypercall registration interface.
  s390: Lowcore mapping helper.
  s390: Add mapping helper functions.
  Update linux headers.
   s390: Channel I/O basic definitions.
  s390: I/O interrupt and machine check injection.
  s390: Add channel I/O instructions.
  s390: Virtual channel subsystem support.
  s390: Wire up channel I/O in kvm.
  s390: Add new channel I/O based virtio transport.
  s390-virtio: Factor out some initialization code.
  s390: Add s390-ccw-virtio machine.

 hw/s390-virtio.c |  240 
 hw/s390-virtio.h |   28 +
 hw/s390x/Makefile.objs   |4 +
 hw/s390x/css.c   | 1131 ++
 hw/s390x/css.h   |   92 +++
 hw/s390x/s390-virtio-ccw.c   |  141 +
 hw/s390x/s390-virtio-hcall.c |   36 ++
 hw/s390x/virtio-ccw.c|  906 +++
 hw/s390x/virtio-ccw.h|   79 +++
 linux-headers/asm-generic/kvm_para.h |4 +
 linux-headers/asm-powerpc/kvm_para.h |2 +-
 linux-headers/linux/kvm.h|   21 +
 target-s390x/Makefile.objs   |2 +-
 target-s390x/cpu.h   |  257 +++-
 target-s390x/helper.c|  200 +-
 target-s390x/ioinst.c|  709 +
 target-s390x/ioinst.h|  223 +++
 target-s390x/kvm.c   |  254 +++-
 target-s390x/misc_helper.c   |2 +-
 trace-events |   18 +
 20 files changed, 4215 insertions(+), 134 deletions(-)
 create mode 100644 hw/s390-virtio.h
 create mode 100644 hw/s390x/css.c
 create mode 100644 hw/s390x/css.h
 create mode 100644 hw/s390x/s390-virtio-ccw.c
 create mode 100644 hw/s390x/s390-virtio-hcall.c
 create mode 100644 hw/s390x/virtio-ccw.c
 create mode 100644 hw/s390x/virtio-ccw.h
 create mode 100644 linux-headers/asm-generic/kvm_para.h
 create mode 100644 target-s390x/ioinst.c
 create mode 100644 target-s390x/ioinst.h

-- 
1.7.12.4



[PATCH 01/12] s390: Add a hypercall registration interface.

2013-01-17 Thread Cornelia Huck
Allow virtio machines to register for different diag500 function
codes and convert s390-virtio to use it.

Signed-off-by: Cornelia Huck 
---
 hw/s390-virtio.c | 90 +++-
 hw/s390-virtio.h | 22 +++
 hw/s390x/Makefile.objs   |  1 +
 hw/s390x/s390-virtio-hcall.c | 36 ++
 target-s390x/cpu.h   |  2 +-
 target-s390x/kvm.c   |  2 +-
 target-s390x/misc_helper.c   |  2 +-
 7 files changed, 110 insertions(+), 45 deletions(-)
 create mode 100644 hw/s390-virtio.h
 create mode 100644 hw/s390x/s390-virtio-hcall.c

diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 0e93cc3..bded30b 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -33,6 +33,7 @@
 
 #include "hw/s390-virtio-bus.h"
 #include "hw/s390x/sclp.h"
+#include "hw/s390-virtio.h"
 
 //#define DEBUG_S390
 
@@ -44,10 +45,6 @@
 do { } while (0)
 #endif
 
-#define KVM_S390_VIRTIO_NOTIFY  0
-#define KVM_S390_VIRTIO_RESET   1
-#define KVM_S390_VIRTIO_SET_STATUS  2
-
 #define KERN_IMAGE_START        0x010000UL
 #define KERN_PARM_AREA          0x010480UL
 #define INITRD_START            0x800000UL
@@ -73,56 +70,63 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
 return ipi_states[cpu_addr];
 }
 
-int s390_virtio_hypercall(CPUS390XState *env, uint64_t mem, uint64_t hypercall)
+static int s390_virtio_hcall_notify(const uint64_t *args)
 {
+uint64_t mem = args[0];
 int r = 0, i;
 
-dprintf("KVM hypercall: %ld\n", hypercall);
-switch (hypercall) {
-case KVM_S390_VIRTIO_NOTIFY:
-if (mem > ram_size) {
-VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus,
-   mem, &i);
-if (dev) {
-virtio_queue_notify(dev->vdev, i);
-} else {
-r = -EINVAL;
-}
-} else {
-/* Early printk */
-}
-break;
-case KVM_S390_VIRTIO_RESET:
-{
-VirtIOS390Device *dev;
-
-dev = s390_virtio_bus_find_mem(s390_bus, mem);
-virtio_reset(dev->vdev);
-stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
-s390_virtio_device_sync(dev);
-s390_virtio_reset_idx(dev);
-break;
-}
-case KVM_S390_VIRTIO_SET_STATUS:
-{
-VirtIOS390Device *dev;
-
-dev = s390_virtio_bus_find_mem(s390_bus, mem);
+if (mem > ram_size) {
+VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus, mem, &i);
 if (dev) {
-s390_virtio_device_update_status(dev);
+virtio_queue_notify(dev->vdev, i);
 } else {
 r = -EINVAL;
 }
-break;
+} else {
+/* Early printk */
 }
-default:
+return r;
+}
+
+static int s390_virtio_hcall_reset(const uint64_t *args)
+{
+uint64_t mem = args[0];
+VirtIOS390Device *dev;
+
+dev = s390_virtio_bus_find_mem(s390_bus, mem);
+virtio_reset(dev->vdev);
+stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
+s390_virtio_device_sync(dev);
+s390_virtio_reset_idx(dev);
+
+return 0;
+}
+
+static int s390_virtio_hcall_set_status(const uint64_t *args)
+{
+uint64_t mem = args[0];
+int r = 0;
+VirtIOS390Device *dev;
+
+dev = s390_virtio_bus_find_mem(s390_bus, mem);
+if (dev) {
+s390_virtio_device_update_status(dev);
+} else {
 r = -EINVAL;
-break;
 }
-
 return r;
 }
 
+static void s390_virtio_register_hcalls(void)
+{
+s390_register_virtio_hypercall(KVM_S390_VIRTIO_NOTIFY,
+   s390_virtio_hcall_notify);
+s390_register_virtio_hypercall(KVM_S390_VIRTIO_RESET,
+   s390_virtio_hcall_reset);
+s390_register_virtio_hypercall(KVM_S390_VIRTIO_SET_STATUS,
+   s390_virtio_hcall_set_status);
+}
+
 /*
  * The number of running CPUs. On s390 a shutdown is the state of all CPUs
  * being either stopped or disabled (for interrupts) waiting. We have to
@@ -186,6 +190,9 @@ static void s390_init(QEMUMachineInitArgs *args)
 s390_bus = s390_virtio_bus_init(&my_ram_size);
 s390_sclp_init();
 
+/* register hypercalls */
+s390_virtio_register_hcalls();
+
 /* allocate RAM */
 memory_region_init_ram(ram, "s390.ram", my_ram_size);
 vmstate_register_ram_global(ram);
@@ -339,4 +346,3 @@ static void s390_machine_init(void)
 }
 
 machine_init(s390_machine_init);
-
diff --git a/hw/s390-virtio.h b/hw/s390-virtio.h
new file mode 100644
index 000..25bb610
--- /dev/null
+++ b/hw/s390-virtio.h
@@ -0,0 +1,22 @@
+/*
+ * Virtio interfaces for s390
+ *
+ * Copyright 2012 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_

[PATCH 03/12] s390: Add mapping helper functions.

2013-01-17 Thread Cornelia Huck
Add s390_cpu_physical_memory_{map,unmap} with special handling
for the lowcore.

Signed-off-by: Cornelia Huck 
---
 target-s390x/cpu.h|  4 
 target-s390x/helper.c | 26 ++
 2 files changed, 30 insertions(+)
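
To illustrate the prefix handling in the helper added below (a sketch, not part of the patch; the prefix value and lengths are arbitrary examples):

    /* assuming env->psa == 0x40000 */
    void *p0 = s390_cpu_physical_memory_map(env, 0x0, 0x1000, 0);
    /* guest absolute address 0 is redirected to 0x40000, i.e. the first
     * 8K are read from this CPU's prefix area */
    void *p1 = s390_cpu_physical_memory_map(env, 0x80000, 0x1000, 0);
    /* 0x80000 lies outside both 8K windows and is mapped 1:1 */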

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 6700fe9..cd729d3 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -299,6 +299,10 @@ int cpu_s390x_handle_mmu_fault (CPUS390XState *env, 
target_ulong address, int rw
 
 
 #ifndef CONFIG_USER_ONLY
+void *s390_cpu_physical_memory_map(CPUS390XState *env, hwaddr addr, hwaddr len,
+   int is_write);
+void s390_cpu_physical_memory_unmap(CPUS390XState *env, void *addr, hwaddr len,
+int is_write);
 void s390x_tod_timer(void *opaque);
 void s390x_cpu_timer(void *opaque);
 
diff --git a/target-s390x/helper.c b/target-s390x/helper.c
index bf2b4d3..d350f28 100644
--- a/target-s390x/helper.c
+++ b/target-s390x/helper.c
@@ -490,6 +490,32 @@ static void cpu_unmap_lowcore(LowCore *lowcore, hwaddr len)
 cpu_physical_memory_unmap(lowcore, len, 1, len);
 }
 
+void *s390_cpu_physical_memory_map(CPUS390XState *env, hwaddr addr, hwaddr len,
+   int is_write)
+{
+hwaddr start = addr;
+
+/* Mind the prefix area. */
+if (addr < 8192) {
+start += env->psa;
+} else if ((env->psa <= addr) && (addr < env->psa + 8192)) {
+start -= env->psa;
+}
+
+if ((addr + len <= env->psa) || (addr >= env->psa + 8192)) {
+return cpu_physical_memory_map(start, &len, is_write);
+}
+
+DPRINTF("mapping across lowcore boundaries not yet supported\n");
+return NULL;
+}
+
+void s390_cpu_physical_memory_unmap(CPUS390XState *env, void *addr, hwaddr len,
+int is_write)
+{
+cpu_physical_memory_unmap(addr, len, is_write, len);
+}
+
 static void do_svc_interrupt(CPUS390XState *env)
 {
 uint64_t mask, addr;
-- 
1.7.12.4



Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support

2013-01-17 Thread Gleb Natapov
On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
> From: Yang Zhang 
> 
> basically to benefit from apicv, we need to enable virtualized x2apic mode.
> Currently, we only enable it when guest is really using x2apic.
> 
> Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled 
> x2apic:
> 0x800 - 0x8ff: no read intercept for apicv register virtualization,
>except APIC ID and TMCCT which need software's assistance 
> to
>  get right value.
> 
> Signed-off-by: Kevin Tian 
> Signed-off-by: Yang Zhang 
> ---
>  arch/x86/include/asm/kvm_host.h |1 +
>  arch/x86/include/asm/vmx.h  |1 +
>  arch/x86/kvm/lapic.c|   20 ++--
>  arch/x86/kvm/lapic.h|5 +
>  arch/x86/kvm/svm.c  |6 +
>  arch/x86/kvm/vmx.c  |  204 
> +++
>  6 files changed, 209 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c431b33..35aa8e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -697,6 +697,7 @@ struct kvm_x86_ops {
>   void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>   void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>   void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> + void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>   int (*get_tdp_level)(void);
>   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 44c3f7e..0a54df0 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -139,6 +139,7 @@
>  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001
>  #define SECONDARY_EXEC_ENABLE_EPT   0x0002
>  #define SECONDARY_EXEC_RDTSCP0x0008
> +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x0010
>  #define SECONDARY_EXEC_ENABLE_VPID  0x0020
>  #define SECONDARY_EXEC_WBINVD_EXITING0x0040
>  #define SECONDARY_EXEC_UNRESTRICTED_GUEST0x0080
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0664c13..f39aee3 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>   (LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>  
> -static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> -{
> - return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> -}
> -
>  static inline int kvm_apic_id(struct kvm_lapic *apic)
>  {
>   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> @@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
> value)
>   if (!kvm_vcpu_is_bsp(apic->vcpu))
>   value &= ~MSR_IA32_APICBASE_BSP;
>  
> - vcpu->arch.apic_base = value;
> - if (apic_x2apic_mode(apic)) {
> - u32 id = kvm_apic_id(apic);
> - u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> - kvm_apic_set_ldr(apic, ldr);
> + if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
> + if (value & X2APIC_ENABLE) {
> + u32 id = kvm_apic_id(apic);
> + u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> + kvm_apic_set_ldr(apic, ldr);
> + kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> + } else
> + kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
>   }
> +
> + vcpu->arch.apic_base = value;
>   apic->base_address = apic->vcpu->arch.apic_base &
>MSR_IA32_APICBASE_BASE;
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 9a8ee22..22a5397 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -126,4 +126,9 @@ static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
>   return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic);
>  }
>  
> +static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> +{
> + return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> +}
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d29d3cd..38407e9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu 
> *vcpu, int tpr, int irr)
>   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>  }
>  
> +static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> +{
> + return;
> +}
> +
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>   .enable_nmi_window = enable_nmi_windo

Re: [PATCH v6 1/3] KVM: x86: clean up reexecute_instruction

2013-01-17 Thread Gleb Natapov
For the series:
Reviewed-by: Gleb Natapov 

On Sun, Jan 13, 2013 at 11:44:12PM +0800, Xiao Guangrong wrote:
> Little cleanup for reexecute_instruction, also use gpa_to_gfn in
> retry_instruction
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/x86.c |   13 ++---
>  1 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1c9c834..08cacd9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4761,19 +4761,18 @@ static bool reexecute_instruction(struct kvm_vcpu 
> *vcpu, gva_t gva)
>   if (tdp_enabled)
>   return false;
> 
> + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
> + if (gpa == UNMAPPED_GVA)
> + return true; /* let cpu generate fault */
> +
>   /*
>* if emulation was due to access to shadowed page table
>* and it failed try to unshadow page and re-enter the
>* guest to let CPU execute the instruction.
>*/
> - if (kvm_mmu_unprotect_page_virt(vcpu, gva))
> + if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
>   return true;
> 
> - gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
> -
> - if (gpa == UNMAPPED_GVA)
> - return true; /* let cpu generate fault */
> -
>   /*
>* Do not retry the unhandleable instruction if it faults on the
>* readonly host memory, otherwise it will goto a infinite loop:
> @@ -4828,7 +4827,7 @@ static bool retry_instruction(struct x86_emulate_ctxt 
> *ctxt,
>   if (!vcpu->arch.mmu.direct_map)
>   gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
> 
> - kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
> + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> 
>   return true;
>  }
> -- 
> 1.7.7.6

--
Gleb.


Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Marcelo Tosatti
On Thu, Jan 17, 2013 at 11:20:43AM +0800, Xiao Guangrong wrote:
> On 01/17/2013 03:07 AM, Marcelo Tosatti wrote:
> > On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> >> Patches 1 to 3 are trivial.
> >>
> >> Patch 4 is the main cause of the increased lines, but I think the new
> >> code makes it easier to understand why each condition in
> >> __kvm_set_memory_region() is there.
> >>
> >> If you don't agree with patch 4, please consider taking the rest of the
> >> series at this time.
> >>
> >> Takuya Yoshikawa (4):
> >>   KVM: set_memory_region: Don't jump to out_free unnecessarily
> >>   KVM: set_memory_region: Don't check for overlaps unless we create or 
> >> move a slot
> >>   KVM: set_memory_region: Remove unnecessary variable memslot
> >>   KVM: set_memory_region: Identify the requested change explicitly
> >>
> >>  virt/kvm/kvm_main.c |   94 
> >> ---
> >>  1 files changed, 59 insertions(+), 35 deletions(-)
> >>
> >> -- 
> >> 1.7.5.4
> > 
> > Reviewed-by: Marcelo Tosatti 
> > 
> > BTW, while at it, its probably worthwhile to restrict flags
> > modifications: change from flags = 0 to flags = read-only is 
> > incomplete. Xiao, should it be allowed only during creation?
> 
> Will Readonly memory be used for VM-mem-sharing in the future?
> I remember you mentioned about mem-sharing things before. ;)

It can.

> Actually, It is safe on KVM MMU because all the gfns in the slot
> can become readonly after calling __kvm_set_memory_region. It is
> unsafe on IOMMU, what need to be fixed is unmapping gfns on IOMMU
> when the flag is changed.

Kvm.git next does not clear spte W bit on 0 -> readonly flags change.
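
For illustration only (this sketch is not from the thread): the kind of handling being discussed for a flags-only change that turns a slot read-only could look roughly like the following. The change check and the shadow-flush helper are assumptions; only kvm_iommu_map_pages() is mentioned above.

    if (!(old.flags & KVM_MEM_READONLY) && (new.flags & KVM_MEM_READONLY)) {
        kvm_arch_flush_shadow_memslot(kvm, &old); /* drop writable sptes */
        r = kvm_iommu_map_pages(kvm, &new);       /* refresh the IOMMU mapping */
    }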




Re: [PATCH 0/4] KVM: Clean up and optimize __kvm_set_memory_region() - part1

2013-01-17 Thread Gleb Natapov
On Fri, Jan 11, 2013 at 06:25:18PM +0900, Takuya Yoshikawa wrote:
> Patches 1 to 3 are trivial.
> 
> Patch 4 is the main cause of the increased lines, but I think the new
> code makes it easier to understand why each condition in
> __kvm_set_memory_region() is there.
> 
> If you don't agree with patch 4, please consider taking the rest of the
> series at this time.
> 
> Takuya Yoshikawa (4):
>   KVM: set_memory_region: Don't jump to out_free unnecessarily
>   KVM: set_memory_region: Don't check for overlaps unless we create or move a 
> slot
>   KVM: set_memory_region: Remove unnecessary variable memslot
>   KVM: set_memory_region: Identify the requested change explicitly
> 
>  virt/kvm/kvm_main.c |   94 
> ---
>  1 files changed, 59 insertions(+), 35 deletions(-)
> 
Applied 1-3. It is not clear whether kvm_iommu_map_pages() should be called
when flags change, so not applying 4 for now.

--
Gleb.


Re: [QEMU PATCH v3] virtio-net: introduce a new macaddr control

2013-01-17 Thread Stefan Hajnoczi
On Thu, Jan 17, 2013 at 06:30:46PM +0800, ak...@redhat.com wrote:
> From: Amos Kong 
> 
> In virtio-net guest driver, currently we write MAC address to
> pci config space byte by byte, this means that we have an
> intermediate step where mac is wrong. This patch introduced
> a new control command to set MAC address, it's atomic.
> 
> VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.
> 
> "mac" field will be set to read-only when VIRTIO_NET_F_CTRL_MAC_ADDR
> is acked.
> 
> Signed-off-by: Amos Kong 
> ---
> V2: check guest's iov_len
> V3: fix of migration compatibility
> make mac field in config space read-only when new feature is acked
> ---
>  hw/pc_piix.c|  4 
>  hw/virtio-net.c | 10 +-
>  hw/virtio-net.h | 12 ++--
>  3 files changed, 23 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 


Re: Locking problem in KVM (or MM)

2013-01-17 Thread Marcelo Tosatti

Appears to be related to https://lkml.org/lkml/2013/1/15/542

On Wed, Jan 16, 2013 at 02:41:25PM -0800, Stephen Hemminger wrote:
> Lockdep splat, appears to be KVM related with 3.8-rc1
> 
> [24428.429305] 
> [24428.429308] =
> [24428.429309] [ INFO: possible recursive locking detected ]
> [24428.429312] 3.8.0-rc1-net-next+ #4 Tainted: GW   
> [24428.429312] -
> [24428.429313] kvm/7355 is trying to acquire lock:
> [24428.429314]  (&anon_vma->rwsem){..}, at: [] 
> mm_take_all
> _locks+0x102/0x14e
> [24428.429321] 
> [24428.429321] but task is already holding lock:
> [24428.429322]  (&anon_vma->rwsem){..}, at: [] 
> mm_take_all
> _locks+0x102/0x14e
> [24428.429325] 
> [24428.429325] other info that might help us debug this:
> [24428.429326]  Possible unsafe locking scenario:
> [24428.429326] 
> [24428.429327]CPU0
> [24428.429327]
> [24428.429328]   lock(&anon_vma->rwsem);
> [24428.429329]   lock(&anon_vma->rwsem);
> [24428.429331] 
> [24428.429331]  *** DEADLOCK ***
> [24428.429331] 
> [24428.429332]  May be due to missing lock nesting notation
> [24428.429332] 
> [24428.429333] 4 locks held by kvm/7355:
> [24428.429334]  #0:  (&mm->mmap_sem){++}, at: [] 
> do_mmu_no
> tifier_register+0x65/0x12c
> [24428.429338]  #1:  (mm_all_locks_mutex){+.+...}, at: [] 
> mm_t
> ake_all_locks+0x39/0x14e
> [24428.429341]  #2:  (&mapping->i_mmap_mutex){+.+...}, at: 
> [] 
> mm_take_all_locks+0xa1/0x14e
> [24428.429344]  #3:  (&anon_vma->rwsem){..}, at: [] 
> mm_tak
> e_all_locks+0x102/0x14e
> [24428.429347] 
> [24428.429347] stack backtrace:
> [24428.429348] Pid: 7355, comm: kvm Tainted: GW
> 3.8.0-rc1-net-next+ #
> 4
> [24428.429349] Call Trace:
> [24428.429354]  [] __lock_acquire+0x569/0xe12
> [24428.429356]  [] lock_acquire+0xd7/0x123
> [24428.429358]  [] ? mm_take_all_locks+0x102/0x14e
> [24428.429361]  [] down_write+0x49/0x58
> [24428.429363]  [] ? mm_take_all_locks+0x102/0x14e
> [24428.429365]  [] ? _mutex_lock_nest_lock+0x40/0x45
> [24428.429366]  [] mm_take_all_locks+0x102/0x14e
> [24428.429369]  [] do_mmu_notifier_register+0x6d/0x12c
> [24428.429371]  [] mmu_notifier_register+0x13/0x15
> [24428.429373]  [] kvm_dev_ioctl+0x277/0x3e4
> [24428.429376]  [] vfs_ioctl+0x26/0x39
> [24428.429378]  [] do_vfs_ioctl+0x40f/0x452
> [24428.429381]  [] ? time_hardirqs_off+0x15/0x2a
> [24428.429383]  [] ? error_sti+0x5/0x6
> [24428.429385]  [] ? trace_hardirqs_off_caller+0x3f/0x9e
> [24428.429388]  [] ? fget_light+0x3d/0x9d
> [24428.429389]  [] sys_ioctl+0x57/0x86
> [24428.429393]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [24428.429395]  [] system_call_fastpath+0x16/0x1b


Re: [QEMU PATCH v2] virtio-net: introduce a new macaddr control

2013-01-17 Thread Michael S. Tsirkin
On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
> ak...@redhat.com writes:
> > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
> > cmd,
> >  {
> >  struct virtio_net_ctrl_mac mac_data;
> >  
> > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
> > +elem->out_sg[1].iov_len == ETH_ALEN) {
> > +/* Set MAC address */
> > +memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);
> > +qemu_format_nic_info_str(&n->nic->nc, n->mac);
> > +return VIRTIO_NET_OK;
> > +}
> 
> Does the rest of the net device still rely on the layout of descriptors?
> If so, OK, we'll fix them all together.  If not, this introduces a new
> one.
> 
> Cheers,
> Rusty.

The following fixes all existing users.
Got to deal with some urgent stuff so did not test yet -
Amos, would you like to include this in your patchset
and build on it, test it all together?
If not I'll get to it next week.

--->

virtio-net: remove layout assumptions for ctrl vq

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 4d80a25..5d1e084 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -316,44 +316,44 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
- VirtQueueElement *elem)
+ struct iovec *iov, unsigned int iov_cnt)
 {
 uint8_t on;
+size_t s;
 
-if (elem->out_num != 2 || elem->out_sg[1].iov_len != sizeof(on)) {
-error_report("virtio-net ctrl invalid rx mode command");
-exit(1);
+s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof on);
+if (s != sizeof on) {
+return VIRTIO_NET_ERR;
 }
 
-on = ldub_p(elem->out_sg[1].iov_base);
-
-if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC)
+if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC) {
 n->promisc = on;
-else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI)
+} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI) {
 n->allmulti = on;
-else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI)
+} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI) {
 n->alluni = on;
-else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI)
+} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI) {
 n->nomulti = on;
-else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI)
+} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI) {
 n->nouni = on;
-else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST)
+} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST) {
 n->nobcast = on;
-else
+} else {
 return VIRTIO_NET_ERR;
+}
 
 return VIRTIO_NET_OK;
 }
 
 static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
- VirtQueueElement *elem)
+ struct iovec *iov, unsigned int iov_cnt)
 {
 struct virtio_net_ctrl_mac mac_data;
+size_t s;
 
-if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 ||
-elem->out_sg[1].iov_len < sizeof(mac_data) ||
-elem->out_sg[2].iov_len < sizeof(mac_data))
+if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET) {
 return VIRTIO_NET_ERR;
+}
 
 n->mac_table.in_use = 0;
 n->mac_table.first_multi = 0;
@@ -361,54 +361,64 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 n->mac_table.multi_overflow = 0;
 memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
 
-mac_data.entries = ldl_p(elem->out_sg[1].iov_base);
+s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof mac_data.entries);
+if (s != sizeof mac_data.entries) {
+return VIRTIO_NET_ERR;
+}
+
+iov_discard_front(&iov, &iov_cnt, s);
+assert(s == sizeof mac_data.entries);
 
-if (sizeof(mac_data.entries) +
-(mac_data.entries * ETH_ALEN) > elem->out_sg[1].iov_len)
+if (mac_data.entries * ETH_ALEN > iov_size(iov, iov_cnt)) {
 return VIRTIO_NET_ERR;
+}
 
 if (mac_data.entries <= MAC_TABLE_ENTRIES) {
-memcpy(n->mac_table.macs, elem->out_sg[1].iov_base + sizeof(mac_data),
-   mac_data.entries * ETH_ALEN);
+s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+   mac_data.entries * ETH_ALEN);
 n->mac_table.in_use += mac_data.entries;
 } else {
 n->mac_table.uni_overflow = 1;
 }
 
+iov_discard_front(&iov, &iov_cnt, mac_data.entries * ETH_ALEN);
+
 n->mac_table.first_multi = n->mac_table.in_use;
 
-mac_data.entries = ldl_p(elem->out_sg[2].iov_base);
+s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof mac_data.entries);
+if (s != sizeof mac_data.entries) {
+return VIRTIO_NET_ERR;
+}
+
+iov_discard_front(&iov, &iov_cnt, s);
+assert(s == sizeof mac_data.entries);
 
-if (sizeof(mac_data.entries) +
-(mac_data.entries * ETH_ALEN) > 
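
A minimal standalone sketch (plain C, not QEMU code; the helper names are
simplified stand-ins for QEMU's iov_to_buf()/iov_discard_front()) of the
layout-agnostic parsing idea the patch above applies: pull fixed-size fields
out of the iovec wherever the descriptor boundaries happen to fall.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

/* Simplified stand-in for iov_to_buf(): copy up to 'want' bytes out of the
 * iovec, regardless of how they are split across elements. */
static size_t iov_to_buf_sketch(const struct iovec *iov, unsigned int cnt,
                                void *buf, size_t want)
{
    size_t done = 0;
    for (unsigned int i = 0; i < cnt && done < want; i++) {
        size_t n = iov[i].iov_len < want - done ? iov[i].iov_len : want - done;
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Simplified stand-in for iov_discard_front(): skip 'bytes' of payload. */
static void iov_discard_front_sketch(struct iovec **iov, unsigned int *cnt,
                                     size_t bytes)
{
    while (*cnt && bytes) {
        if (bytes < (*iov)->iov_len) {
            (*iov)->iov_base = (char *)(*iov)->iov_base + bytes;
            (*iov)->iov_len -= bytes;
            return;
        }
        bytes -= (*iov)->iov_len;
        (*iov)++;
        (*cnt)--;
    }
}

int main(void)
{
    /* A 6-byte MAC payload split awkwardly across two elements. */
    uint8_t a[2] = { 0x52, 0x54 }, b[4] = { 0x00, 0x12, 0x34, 0x56 };
    struct iovec elems[2] = { { a, sizeof(a) }, { b, sizeof(b) } };
    struct iovec *iov = elems;
    unsigned int cnt = 2;
    uint8_t mac[6];

    if (iov_to_buf_sketch(iov, cnt, mac, sizeof(mac)) != sizeof(mac)) {
        return 1;  /* short payload: the real handler returns VIRTIO_NET_ERR */
    }
    iov_discard_front_sketch(&iov, &cnt, sizeof(mac));

    printf("mac %02x:%02x:%02x:%02x:%02x:%02x, %u element(s) left\n",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5], cnt);
    return 0;
}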

Re: [PULL 0/8] ppc patch queue 2013-01-10

2013-01-17 Thread Gleb Natapov
On Thu, Jan 17, 2013 at 11:53:38AM +0100, Alexander Graf wrote:
> 
> On 14.01.2013, at 10:03, Gleb Natapov wrote:
> 
> > On Thu, Jan 10, 2013 at 01:45:04PM +0100, Alexander Graf wrote:
> >> Hi Marcelo / Gleb,
> >> 
> >> This is my current patch queue for ppc.  Please pull.
> >> 
> >> Highlights this time:
> >> 
> >>  - Book3S: enable potential sPAPR guest emulation on PR KVM on pHyp
> >>  - BookE: EPR (External Proxy Register) support
> >> 
> >> Alex
> >> 
> > Pulled, thanks.
> 
> Any particular reason I don't see it in kvm/next?
> 
Not pushed yet. Sorry. Autotesting now and will push today.

--
Gleb.


RE: [PATCH 5/8] KVM: PPC: debug stub interface parameter defined

2013-01-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Paul Mackerras [mailto:pau...@samba.org]
> Sent: Thursday, January 17, 2013 12:53 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Bhushan 
> Bharat-
> R65777
> Subject: Re: [PATCH 5/8] KVM: PPC: debug stub interface parameter defined
> 
> On Wed, Jan 16, 2013 at 01:54:42PM +0530, Bharat Bhushan wrote:
> > This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
> > ioctl support. Follow up patches will use this for setting up hardware
> > breakpoints, watchpoints and software breakpoints.
> 
> [snip]
> 
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
> > 453a10f..7d5a51c 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -1483,6 +1483,12 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
> struct kvm_one_reg *reg)
> > return r;
> >  }
> >
> > +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> > +struct kvm_guest_debug *dbg)
> > +{
> > +   return -EINVAL;
> > +}
> > +
> >  int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu
> > *fpu)  {
> > return -ENOTSUPP;
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 934413c..4c94ca9 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -532,12 +532,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > #endif  }
> >
> > -int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> > -struct kvm_guest_debug *dbg)
> > -{
> > -   return -EINVAL;
> > -}
> > -
> 
> This will break the build for non-book E machines, since
> kvm_arch_vcpu_ioctl_set_guest_debug() is referenced from generic code.
> You need to add it to arch/powerpc/kvm/book3s.c as well.

Right, I will correct this.

Thanks
-Bharat
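
For reference, the fix Paul is asking for amounts to keeping an equivalent
stub on the Book3S side as well; a minimal sketch (placement in
arch/powerpc/kvm/book3s.c assumed, mirroring the stub removed from powerpc.c):

int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
					struct kvm_guest_debug *dbg)
{
	/* Guest debug is not supported here yet */
	return -EINVAL;
}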




[PATCH v3 0/2] make mac programming for virtio net more robust

2013-01-17 Thread akong
From: Amos Kong 

Currently the mac is programmed byte by byte. This means that we
have an intermediate step where the mac is wrong.

The second patch introduces a new vq control command to set the mac
address atomically.

V2: check return value of sending command, delay eth_mac_addr()
V3: restore the software address when setting the hardware address fails

Amos Kong (2):
  move virtnet_send_command() above virtnet_set_mac_address()
  virtio-net: introduce a new control to set macaddr

 drivers/net/virtio_net.c| 110 +++-
 include/uapi/linux/virtio_net.h |   8 ++-
 2 files changed, 71 insertions(+), 47 deletions(-)

-- 
1.7.11.7



[PATCH v3 2/2] virtio-net: introduce a new control to set macaddr

2013-01-17 Thread akong
From: Amos Kong 

Currently we write the MAC address to PCI config space byte by byte;
this means that we have an intermediate step where the mac is wrong.
This patch introduces a new control command that sets the MAC address
atomically.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

Signed-off-by: Amos Kong 
---
 drivers/net/virtio_net.c| 21 -
 include/uapi/linux/virtio_net.h |  8 +++-
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 395ab4f..837c978 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -802,14 +802,32 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
int ret;
+   struct scatterlist sg;
+   char save_addr[ETH_ALEN];
+   unsigned char save_aatype;
+
+   memcpy(save_addr, dev->dev_addr, ETH_ALEN);
+   save_aatype = dev->addr_assign_type;
 
ret = eth_mac_addr(dev, p);
if (ret)
return ret;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   sg_init_one(&sg, dev->dev_addr, dev->addr_len);
+   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
+ VIRTIO_NET_CTRL_MAC_ADDR_SET,
+ &sg, 1, 0)) {
+   dev_warn(&vdev->dev,
+"Failed to set mac address by vq command.\n");
+   memcpy(dev->dev_addr, save_addr, ETH_ALEN);
+   dev->addr_assign_type = save_aatype;
+   return -EINVAL;
+   }
+   } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
  dev->dev_addr, dev->addr_len);
+   }
 
return 0;
 }
@@ -1627,6 +1645,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
+   VIRTIO_NET_F_CTRL_MAC_ADDR,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 848e358..a5a8c88 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -53,6 +53,7 @@
 * network */
 #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
 * Steering */
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 #define VIRTIO_NET_S_ANNOUNCE  2   /* Announcement is needed */
@@ -127,7 +128,7 @@ typedef __u8 virtio_net_ctrl_ack;
  #define VIRTIO_NET_CTRL_RX_NOBCAST  5
 
 /*
- * Control the MAC filter table.
+ * Control the MAC
  *
  * The MAC filter table is managed by the hypervisor, the guest should
  * assume the size is infinite.  Filtering should be considered
@@ -140,6 +141,10 @@ typedef __u8 virtio_net_ctrl_ack;
  * first sg list contains unicast addresses, the second is for multicast.
  * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
  * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
__u32 entries;
@@ -148,6 +153,7 @@ struct virtio_net_ctrl_mac {
 
 #define VIRTIO_NET_CTRL_MAC 1
  #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
 
 /*
  * Control VLAN filtering
-- 
1.7.11.7
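
As a usage note: the path above is exercised whenever guest userspace changes
the MAC, e.g. via SIOCSIFHWADDR, which lands in the driver's
.ndo_set_mac_address hook. A minimal sketch (the interface name and MAC value
are assumptions; depending on the kernel the interface may need to be down
unless live address changes are allowed):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <net/if_arp.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

int main(void)
{
    const unsigned char mac[6] = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 };
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);      /* assumed ifname */
    ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
    memcpy(ifr.ifr_hwaddr.sa_data, mac, sizeof(mac));

    /* The kernel calls dev->netdev_ops->ndo_set_mac_address(), i.e.
     * virtnet_set_mac_address() for virtio-net. */
    if (ioctl(fd, SIOCSIFHWADDR, &ifr) < 0) {
        perror("SIOCSIFHWADDR");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}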



[PATCH v3 1/2] move virtnet_send_command() above virtnet_set_mac_address()

2013-01-17 Thread akong
From: Amos Kong 

We will send a vq command to set the mac address in virtnet_set_mac_address().
Also includes a small coding style fix.

Signed-off-by: Amos Kong 
---
 drivers/net/virtio_net.c | 89 
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a6fcf15..395ab4f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -753,6 +753,50 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
 }
 
+/*
+ * Send command via the control virtqueue and check status.  Commands
+ * supported by the hypervisor, as indicated by feature bits, should
+ * never fail unless improperly formated.
+ */
+static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
+struct scatterlist *data, int out, int in)
+{
+   struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
+   struct virtio_net_ctrl_hdr ctrl;
+   virtio_net_ctrl_ack status = ~0;
+   unsigned int tmp;
+   int i;
+
+   /* Caller should know better */
+   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
+   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
+
+   out++; /* Add header */
+   in++; /* Add return status */
+
+   ctrl.class = class;
+   ctrl.cmd = cmd;
+
+   sg_init_table(sg, out + in);
+
+   sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
+   for_each_sg(data, s, out + in - 2, i)
+   sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
+   sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
+
+   BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
+
+   virtqueue_kick(vi->cvq);
+
+   /* Spin for a response, the kick causes an ioport write, trapping
+* into the hypervisor, so the request should be handled immediately.
+*/
+   while (!virtqueue_get_buf(vi->cvq, &tmp))
+   cpu_relax();
+
+   return status == VIRTIO_NET_OK;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -819,51 +863,6 @@ static void virtnet_netpoll(struct net_device *dev)
 }
 #endif
 
-/*
- * Send command via the control virtqueue and check status.  Commands
- * supported by the hypervisor, as indicated by feature bits, should
- * never fail unless improperly formated.
- */
-static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
-struct scatterlist *data, int out, int in)
-{
-   struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
-   struct virtio_net_ctrl_hdr ctrl;
-   virtio_net_ctrl_ack status = ~0;
-   unsigned int tmp;
-   int i;
-
-   /* Caller should know better */
-   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
-   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
-
-   out++; /* Add header */
-   in++; /* Add return status */
-
-   ctrl.class = class;
-   ctrl.cmd = cmd;
-
-   sg_init_table(sg, out + in);
-
-   sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
-   for_each_sg(data, s, out + in - 2, i)
-   sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
-   sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
-
-   BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
-
-   virtqueue_kick(vi->cvq);
-
-   /*
-* Spin for a response, the kick causes an ioport write, trapping
-* into the hypervisor, so the request should be handled immediately.
-*/
-   while (!virtqueue_get_buf(vi->cvq, &tmp))
-   cpu_relax();
-
-   return status == VIRTIO_NET_OK;
-}
-
 static void virtnet_ack_link_announce(struct virtnet_info *vi)
 {
rtnl_lock();
-- 
1.7.11.7



Re: [PULL 0/8] ppc patch queue 2013-01-10

2013-01-17 Thread Alexander Graf

On 14.01.2013, at 10:03, Gleb Natapov wrote:

> On Thu, Jan 10, 2013 at 01:45:04PM +0100, Alexander Graf wrote:
>> Hi Marcelo / Gleb,
>> 
>> This is my current patch queue for ppc.  Please pull.
>> 
>> Highlights this time:
>> 
>>  - Book3S: enable potential sPAPR guest emulation on PR KVM on pHyp
>>  - BookE: EPR (External Proxy Register) support
>> 
>> Alex
>> 
> Pulled, thanks.

Any particular reason I don't see it in kvm/next?


Alex



[QEMU PATCH v3] virtio-net: introduce a new macaddr control

2013-01-17 Thread akong
From: Amos Kong 

In the virtio-net guest driver, we currently write the MAC address to
PCI config space byte by byte; this means that we have an
intermediate step where the mac is wrong. This patch introduces
a new control command that sets the MAC address atomically.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

"mac" field will be set to read-only when VIRTIO_NET_F_CTRL_MAC_ADDR
is acked.

Signed-off-by: Amos Kong 
---
V2: check guest's iov_len
V3: fix of migration compatibility
make mac field in config space read-only when new feature is acked
---
 hw/pc_piix.c|  4 
 hw/virtio-net.c | 10 +-
 hw/virtio-net.h | 12 ++--
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 7268dcd..66606b9 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -295,6 +295,10 @@ static QEMUMachine pc_machine_v1_4 = {
 .driver   = "usb-tablet",\
 .property = "usb_version",\
 .value= stringify(1),\
+},{\
+.driver   = "virtio-net-pci",\
+.property = "ctrl_mac_addr",\
+.value= "off",  \
 }
 
 static QEMUMachine pc_machine_v1_3 = {
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index dc7c6d6..941d782 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -93,7 +93,8 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 
 memcpy(&netcfg, config, sizeof(netcfg));
 
-if (memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
+if (!(n->vdev.guest_features >> VIRTIO_NET_F_CTRL_MAC_ADDR & 1) &&
+memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(&n->nic->nc, n->mac);
 }
@@ -349,6 +350,13 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
 struct virtio_net_ctrl_mac mac_data;
 
+if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
+elem->out_sg[1].iov_len == ETH_ALEN) {
+memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);
+qemu_format_nic_info_str(&n->nic->nc, n->mac);
+return VIRTIO_NET_OK;
+}
+
 if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 ||
 elem->out_sg[1].iov_len < sizeof(mac_data) ||
 elem->out_sg[2].iov_len < sizeof(mac_data))
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index d46fb98..1ec632f 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,8 @@
 #define VIRTIO_NET_F_CTRL_VLAN  19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
 
+#define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
+
 #define VIRTIO_NET_S_LINK_UP 1   /* Link is up */
 
 #define TX_TIMER_INTERVAL 15 /* 150 us */
@@ -106,7 +108,7 @@ typedef uint8_t virtio_net_ctrl_ack;
  #define VIRTIO_NET_CTRL_RX_MODE_NOBCAST  5
 
 /*
- * Control the MAC filter table.
+ * Control the MAC
  *
  * The MAC filter table is managed by the hypervisor, the guest should
  * assume the size is infinite.  Filtering should be considered
@@ -119,6 +121,10 @@ typedef uint8_t virtio_net_ctrl_ack;
  * first sg list contains unicast addresses, the second is for multicast.
  * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
  * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
 uint32_t entries;
@@ -126,6 +132,7 @@ struct virtio_net_ctrl_mac {
 };
 #define VIRTIO_NET_CTRL_MAC 1
  #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
 
 /*
  * Control VLAN filtering
@@ -158,5 +165,6 @@ struct virtio_net_ctrl_mac {
 DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, 
true), \
 DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, 
true), \
 DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, 
true), \
-DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, 
VIRTIO_NET_F_CTRL_RX_EXTRA, true)
+DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, 
VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
+DEFINE_PROP_BIT("ctrl_mac_addr", _state, _field, 
VIRTIO_NET_F_CTRL_MAC_ADDR, true)
 #endif
-- 
1.7.11.7
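
One readability note on the feature test above: (n->vdev.guest_features >>
VIRTIO_NET_F_CTRL_MAC_ADDR & 1) checks whether the guest acked the new
feature. A hypothetical helper, not part of this patch, would spell the
intent out inside hw/virtio-net.c:

/* Hypothetical helper, only for illustration: returns non-zero if the
 * guest acknowledged the given virtio feature bit. */
static inline int virtio_net_guest_acked(VirtIONet *n, unsigned int fbit)
{
    return (n->vdev.guest_features >> fbit) & 1;
}

/* virtio_net_set_config() could then ignore "mac" writes like this:
 *   if (!virtio_net_guest_acked(n, VIRTIO_NET_F_CTRL_MAC_ADDR) && ...)
 */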



Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-17 Thread Michael S. Tsirkin
On Wed, Jan 16, 2013 at 10:14:33AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> >> Jason Wang  writes:
> >> 
> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> >> Jason Wang  writes:
> >> >>
> >> >>> Hello all:
> >> >>>
> >> >>> This seires is an update of last version of multiqueue virtio-net 
> >> >>> support.
> >> >>>
> >> >>> Recently, linux tap gets multiqueue support. This series implements 
> >> >>> basic
> >> >>> support for multiqueue tap, nic and vhost. Then use it as an 
> >> >>> infrastructure to
> >> >>> enable the multiqueue support for virtio-net.
> >> >>>
> >> >>> Both vhost and userspace multiqueue were implemented for virtio-net, 
> >> >>> but
> >> >>> userspace could be get much benefits since dataplane like parallized 
> >> >>> mechanism
> >> >>> were not implemented.
> >> >>>
> >> >>> User could start a multiqueue virtio-net card through adding a "queues"
> >> >>> parameter to tap.
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> >> >>> virtio-net-pci,netdev=hn0
> >> >>>
> >> >>> Management tools such as libvirt can pass multiple pre-created fds 
> >> >>> through
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >> >>> virtio-net-pci,netdev=hn0
> >> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> >> allowed to have two values for the same property.  Better to have a
> >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >> >
> >> > Yes, but this what current a StringList type works for command line.
> >> > Some other parameters such as dnssearch, hostfwd and guestfwd have
> >> > already worked in this way. Looks like your suggestions need some
> >> > extension on QemuOps visitor, maybe we can do this on top.
> >> 
> >> It's a silly syntax and breaks compatibility.  This is valid syntax:
> >> 
> >> -net tap,fd=3,fd=4
> >> 
> >> In this case, it means 'fd=4' because the last fd overwrites the first
> >> one.
> >> 
> >> Now you've changed it to mean something else.  Having one thing mean
> >> something in one context, but something else in another context is
> >> terrible interface design.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Aha so just renaming the field 'fds' would address this issue?
> 
> No, you still have the problem of different meanings.
> 
> -netdev tap,fd=X,fd=Y
> 
> -netdev tap,fds=X,fds=Y
> 
> Would have wildly different behavior.

I think even caring about -net tap,fd=1,fd=2 is a bit silly.  If this
resulted in fd=2 by mistake, I don't think it was ever intentionally
legal.
As Jason points out we have list support and for better or worse
it is currently using repeated options, e.g. with dnssearch, hostfwd and
guestfwd.
Isn't it better to be consistent?

> Just do:
> 
> -netdev tap,fds=X:Y
> 
> And then we're staying consistent wrt the interpretation of multiple
> properties of the same name.
> 
> Regards,
> 
> Anthony Liguori

This introduces : as a special character. However fds can
be fd names passed in with getfd, where : is a legal character.

-- 
MST


[PATCH v2] virtio-spec: set mac address by a new vq command

2013-01-17 Thread akong
From: Amos Kong 

The virtio-net driver currently programs the MAC address byte by byte;
this means that we have an intermediate step where the mac is wrong.
This patch introduces a new control command to set the MAC address
in one operation, and adds a new feature flag VIRTIO_NET_F_CTRL_MAC_ADDR
for this feature.

Signed-off-by: Amos Kong 
---
v2: add more detail about new command (Stefan)
---
 virtio-spec.lyx | 58 -
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 1ba9992..1ec0cd4 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author -1930653948 "Amos Kong" 
 \author -608949062 "Rusty Russell,,," 
 \author -385801441 "Cornelia Huck" cornelia.h...@de.ibm.com
 \author 1112500848 "Rusty Russell" ru...@rustcorp.com.au
@@ -4391,6 +4392,14 @@ VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
 
 \change_inserted 1986246365 1352742808
 VIRTIO_NET_F_MQ(22) Device supports multiqueue with automatic receive steering.
+\change_inserted -1930653948 1358319033
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1930653948 1358319080
+VIRTIO_NET_F_CTRL_MAC_ADDR(23) Set MAC address.
 \change_unchanged
 
 \end_layout
@@ -5284,7 +5293,11 @@ The class VIRTIO_NET_CTRL_RX has two commands: VIRTIO_NET_CTRL_RX_PROMISC
 \end_layout
 
 \begin_layout Subsubsection*
-Setting MAC Address Filtering
+Setting MAC Address
+\change_deleted -1930653948 1358318470
+ Filtering
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
@@ -5324,6 +5337,17 @@ struct virtio_net_ctrl_mac {
 \begin_layout Plain Layout
 
  #define VIRTIO_NET_CTRL_MAC_TABLE_SET0 
+\change_inserted -1930653948 1358318313
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1930653948 1358318331
+
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -5349,6 +5373,38 @@ T_CTRL_MAC_TABLE_SET.
  The command-specific-data is two variable length tables of 6-byte MAC 
addresses.
  The first table contains unicast addresses, and the second contains multicast
  addresses.
+\change_inserted -1930653948 1358318545
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1930653948 1358418243
+The config space 
+\begin_inset Quotes eld
+\end_inset
+
+mac
+\begin_inset Quotes erd
+\end_inset
+
+ field and the command VIRTIO_NET_CTRL_MAC_ADDR_SET both set the default
+ MAC address which rx filtering accepts.
+ The command VIRTIO_NET_CTRL_MAC_ADDR_SET is atomic whereas the config space
+ 
+\begin_inset Quotes eld
+\end_inset
+
+mac
+\begin_inset Quotes erd
+\end_inset
+
+ field is not.
+ Therefore, VIRTIO_NET_CTRL_MAC_ADDR_SET is preferred, especially while
+ the NIC is up.
+ The command-specific-data is a 6-byte MAC address.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsection*
-- 
1.7.11.7
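
To summarize the wire format the spec text above describes, a rough C sketch
of what the driver puts on the control virtqueue for
VIRTIO_NET_CTRL_MAC_ADDR_SET (in practice the header, the 6-byte MAC and the
ack live in separate buffers/descriptors; this only shows the byte contents):

#include <stdint.h>

#define VIRTIO_NET_CTRL_MAC          1
#define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
#define VIRTIO_NET_OK                0

struct virtio_net_ctrl_hdr {
    uint8_t class;                 /* VIRTIO_NET_CTRL_MAC */
    uint8_t cmd;                   /* VIRTIO_NET_CTRL_MAC_ADDR_SET */
};

/* Device-readable part of the request. */
struct virtio_net_mac_addr_set {
    struct virtio_net_ctrl_hdr hdr;
    uint8_t mac[6];                /* command-specific-data: the new MAC */
};

/* Device-writable part: one status byte, VIRTIO_NET_OK on success. */
typedef uint8_t virtio_net_ctrl_ack;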



Re: [Qemu-devel] [QEMU PATCH v2] virtio-net: introduce a new macaddr control

2013-01-17 Thread Michael S. Tsirkin
On Thu, Jan 17, 2013 at 09:39:54AM +0100, Stefan Hajnoczi wrote:
> On Thu, Jan 17, 2013 at 01:45:11PM +0800, Amos Kong wrote:
> > On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
> > > ak...@redhat.com writes:
> > > > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, 
> > > > uint8_t cmd,
> > > >  {
> > > >  struct virtio_net_ctrl_mac mac_data;
> > > >  
> > > > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
> > > > +elem->out_sg[1].iov_len == ETH_ALEN) {
> > > > +/* Set MAC address */
> > > > +memcpy(n->mac, elem->out_sg[1].iov_base, 
> > > > elem->out_sg[1].iov_len);
> > > > +qemu_format_nic_info_str(&n->nic->nc, n->mac);
> > > > +return VIRTIO_NET_OK;
> > > > +}
> > > 
> > > Does the rest of the net device still rely on the layout of descriptors?
> > 
> > No, only info string of net client relies on n->mac
> 
> I think the question is whether the hw/virtio-net.c code makes
> assumptions about virtqueue descriptor layout (e.g. sg[0] is the header,
> sg[1] is the data buffer).
> 
> The answer is yes, the control virtqueue function directly accesses
> iov[n].
> 
> Additional patches would be required to convert the existing
> hw/virtio-net.c code to make no assumptions about virtqueue descriptor
> layout.  It's outside the scope of this series.
> 
> Stefan

It's not hard at all though - the harder part is the data path
processing, and that has been done already. Will send a
patch shortly.


Re: VirtIO id X is not a head!

2013-01-17 Thread Stefan Hajnoczi
On Wed, Jan 16, 2013 at 08:58:50PM +0100, Matthias Leinweber wrote:
> I am trying to implement a virtual device/driver, but I ran into some
> trouble using the virtio api.
> My implementation looks as follows:
> A kthread exposes memory via add_buf, kicks and sleeps. If a callback
> is issued it is woken up and takes the filled buffer back via get_buf.
> (No other kthread, process or whatever works on this vq in the
> kernel.)
> In qemu a qemu_thread waits for some shared memory and tries to pop
> elements from the vq and copies some data into the guest-accessible
> memory. Not all elements are necessarily popped before fill, flush and
> notify are called. If a pop returns < 0 the thread goes to sleep
> until the handler routine for this vq wakes the thread up again.
> 
> From time to time (after several 100k gets, adds and pops) I get: id %u
> is not a head!.
> virtio_ring.c:
> if (unlikely(i >= vq->vring.num)) {
>   BAD_RING(vq, "id %u out of range\n", i);
>   return NULL;
> 
> I have no idea what I am doing wrong. Is synchronization needed
> between add, pop and get, or am I not allowed to use a qemu_thread when
> working on a vq?

Hard to tell exactly what is going on without seeing the code.

QEMU has a global mutex and therefore does not need to do much explicit
locking...except if you spawn your own thread.  The hw/virtio.c code in
QEMU is not thread-safe.  You cannot use it from a thread without
holding the QEMU global mutex.

It's fine to do I/O handling in worker threads, but you must use a BH,
event notifier, or some other mechanism of kicking the QEMU iothread and
process the virtqueue completion in a callback there.

Stefan
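
The pattern described above can be sketched with a plain eventfd and poll
(assumed: Linux; in QEMU the signalling would be a BH or EventNotifier and
the virtqueue work would happen in the fd handler while the global mutex is
held):

#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int efd;

/* Worker thread: never touches the virtqueue, only signals the main loop. */
static void *worker(void *arg)
{
    uint64_t one = 1;
    /* ... produce data for the guest ... */
    write(efd, &one, sizeof(one));
    return NULL;
}

int main(void)
{
    pthread_t t;
    struct pollfd pfd;
    uint64_t cnt;

    efd = eventfd(0, 0);
    pthread_create(&t, NULL, worker, NULL);

    pfd.fd = efd;
    pfd.events = POLLIN;
    poll(&pfd, 1, -1);              /* the "iothread" wakes up ...        */
    read(efd, &cnt, sizeof(cnt));   /* ... clears the notification ...    */
    /* ... and only here, in main-loop context, pops/pushes/notifies the vq. */

    printf("worker signalled %llu time(s)\n", (unsigned long long)cnt);
    pthread_join(t, NULL);
    close(efd);
    return 0;
}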


Re: [PATCH 5/8] KVM: PPC: debug stub interface parameter defined

2013-01-17 Thread Paul Mackerras
On Wed, Jan 16, 2013 at 01:54:42PM +0530, Bharat Bhushan wrote:
> This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
> ioctl support. Follow up patches will use this for setting up
> hardware breakpoints, watchpoints and software breakpoints.

[snip]

> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 453a10f..7d5a51c 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -1483,6 +1483,12 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
> struct kvm_one_reg *reg)
>   return r;
>  }
>  
> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> +  struct kvm_guest_debug *dbg)
> +{
> + return -EINVAL;
> +}
> +
>  int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>  {
>   return -ENOTSUPP;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 934413c..4c94ca9 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -532,12 +532,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  #endif
>  }
>  
> -int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> -struct kvm_guest_debug *dbg)
> -{
> - return -EINVAL;
> -}
> -

This will break the build for non-book E machines, since
kvm_arch_vcpu_ioctl_set_guest_debug() is referenced from generic code.
You need to add it to arch/powerpc/kvm/book3s.c as well.

Paul.


Re: [PATCH 8/8] KVM: x86 emulator: convert a few freestanding emulations to fastop

2013-01-17 Thread Gleb Natapov
Missing Signed-off-by :(

On Sat, Jan 12, 2013 at 04:32:57PM +0200, Avi Kivity wrote:
> ---
>  arch/x86/kvm/emulate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index da2b903..1bb0af2 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2209,7 +2209,7 @@ static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
>   /* Save real source value, then compare EAX against destination. */
>   ctxt->src.orig_val = ctxt->src.val;
>   ctxt->src.val = reg_read(ctxt, VCPU_REGS_RAX);
> - emulate_2op_SrcV(ctxt, "cmp");
> + fastop(ctxt, em_cmp);
>  
>   if (ctxt->eflags & EFLG_ZF) {
>   /* Success: write back to memory. */
> @@ -2977,7 +2977,7 @@ static int em_das(struct x86_emulate_ctxt *ctxt)
>   ctxt->src.type = OP_IMM;
>   ctxt->src.val = 0;
>   ctxt->src.bytes = 1;
> - emulate_2op_SrcV(ctxt, "or");
> + fastop(ctxt, em_or);
>   ctxt->eflags &= ~(X86_EFLAGS_AF | X86_EFLAGS_CF);
>   if (cf)
>   ctxt->eflags |= X86_EFLAGS_CF;
> @@ -4816,7 +4816,7 @@ twobyte_insn:
>   (s16) ctxt->src.val;
>   break;
>   case 0xc0 ... 0xc1: /* xadd */
> - emulate_2op_SrcV(ctxt, "add");
> + fastop(ctxt, em_add);
>   /* Write back the register source. */
>   ctxt->src.val = ctxt->dst.orig_val;
>   write_register_operand(&ctxt->src);
> -- 
> 1.8.0.1

--
Gleb.


Re: [PATCH 3/8] KVM: x86 emulator: covert SETCC to fastop

2013-01-17 Thread Gleb Natapov
On Sat, Jan 12, 2013 at 04:32:52PM +0200, Avi Kivity wrote:
> This is a bit of a special case since we don't have the usual
> byte/word/long/quad switch; instead we switch on the condition code embedded
> in the instruction.
> 
> Signed-off-by: Avi Kivity 
This fails autotest with an f9 32-bit guest, but it looks like the failure is
in the BIOS.

> ---
>  arch/x86/kvm/emulate.c | 60 
> --
>  1 file changed, 29 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index d641178..f6f615e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -499,6 +499,28 @@ static void invalidate_registers(struct x86_emulate_ctxt 
> *ctxt)
>   ON64(FOP3E(op, rax, rbx, cl)) \
>   FOP_END
>  
> +/* Special case for SETcc - 1 instruction per cc */
> +#define FOP_SETCC(op) ".align 4; " #op " %al; ret \n\t"
> +
> +FOP_START(setcc)
> +FOP_SETCC(seto)
> +FOP_SETCC(setc)
> +FOP_SETCC(setz)
> +FOP_SETCC(setbe)
> +FOP_SETCC(sets)
> +FOP_SETCC(setp)
> +FOP_SETCC(setl)
> +FOP_SETCC(setle)
> +FOP_SETCC(setno)
> +FOP_SETCC(setnc)
> +FOP_SETCC(setnz)
> +FOP_SETCC(setnbe)
> +FOP_SETCC(setns)
> +FOP_SETCC(setnp)
> +FOP_SETCC(setnl)
> +FOP_SETCC(setnle)
> +FOP_END;
> +
>  #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex)   
> \
>   do {\
>   unsigned long _tmp; \
> @@ -939,39 +961,15 @@ static int read_descriptor(struct x86_emulate_ctxt 
> *ctxt,
>   return rc;
>  }
>  
> -static int test_cc(unsigned int condition, unsigned int flags)
> +static u8 test_cc(unsigned int condition, unsigned long flags)
>  {
> - int rc = 0;
> + u8 rc;
> + void (*fop)(void) = (void *)em_setcc + 4 * (condition & 0xf);
>  
> - switch ((condition & 15) >> 1) {
> - case 0: /* o */
> - rc |= (flags & EFLG_OF);
> - break;
> - case 1: /* b/c/nae */
> - rc |= (flags & EFLG_CF);
> - break;
> - case 2: /* z/e */
> - rc |= (flags & EFLG_ZF);
> - break;
> - case 3: /* be/na */
> - rc |= (flags & (EFLG_CF|EFLG_ZF));
> - break;
> - case 4: /* s */
> - rc |= (flags & EFLG_SF);
> - break;
> - case 5: /* p/pe */
> - rc |= (flags & EFLG_PF);
> - break;
> - case 7: /* le/ng */
> - rc |= (flags & EFLG_ZF);
> - /* fall through */
> - case 6: /* l/nge */
> - rc |= (!(flags & EFLG_SF) != !(flags & EFLG_OF));
> - break;
> - }
> -
> - /* Odd condition identifiers (lsb == 1) have inverted sense. */
> - return (!!rc ^ (condition & 1));
> + flags = (flags & EFLAGS_MASK) | X86_EFLAGS_IF;
> + asm("pushq %[flags]; popf; call *%[fastop]"
> + : "=a"(rc) : [fastop]"r"(fop), [flags]"r"(flags));
> + return rc;
>  }
>  
>  static void fetch_register_operand(struct operand *op)
> -- 
> 1.8.0.1

--
Gleb.


Re: [PATCH 1/8] KVM: x86 emulator: Streamline SHLD, SHRD

2013-01-17 Thread Gleb Natapov
On Sat, Jan 12, 2013 at 04:32:50PM +0200, Avi Kivity wrote:
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/emulate.c | 33 +
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 619a33d..2189c6a 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -454,6 +454,8 @@ static void invalidate_registers(struct x86_emulate_ctxt 
> *ctxt)
>  #define FOP_END \
>   ".popsection")
>  
> +#define FOPNOP() FOP_ALIGN FOP_RET
> +
>  #define FOP1E(op,  dst) \
>   FOP_ALIGN #op " %" #dst " \n\t" FOP_RET
>  
> @@ -476,6 +478,18 @@ static void invalidate_registers(struct x86_emulate_ctxt 
> *ctxt)
>   ON64(FOP2E(op##q, rax, rbx)) \
>   FOP_END
>  
> +#define FOP3E(op,  dst, src, src2) \
> + FOP_ALIGN #op " %" #src2 ", %" #src ", %" #dst " \n\t" FOP_RET
> +
> +/* 3-operand, word-only, src2=cl */
> +#define FASTOP3WCL(op) \
> + FOP_START(op) \
> + FOPNOP() \
> + FOP3E(op, ax, bx, cl) \
> + FOP3E(op, eax, ebx, cl) \
> + ON64(FOP3E(op, rax, rbx, cl)) \

All other FASTOPs have an explicit operand size. It is not strictly required,
since the assembler can figure it out from the register name, but for
consistency why not put it there too?

--
Gleb.


Re: [Qemu-devel] [QEMU PATCH v2] virtio-net: introduce a new macaddr control

2013-01-17 Thread Stefan Hajnoczi
On Thu, Jan 17, 2013 at 01:45:11PM +0800, Amos Kong wrote:
> On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
> > ak...@redhat.com writes:
> > > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, 
> > > uint8_t cmd,
> > >  {
> > >  struct virtio_net_ctrl_mac mac_data;
> > >  
> > > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
> > > +elem->out_sg[1].iov_len == ETH_ALEN) {
> > > +/* Set MAC address */
> > > +memcpy(n->mac, elem->out_sg[1].iov_base, 
> > > elem->out_sg[1].iov_len);
> > > +qemu_format_nic_info_str(&n->nic->nc, n->mac);
> > > +return VIRTIO_NET_OK;
> > > +}
> > 
> > Does the rest of the net device still rely on the layout of descriptors?
> 
> No, only info string of net client relies on n->mac

I think the question is whether the hw/virtio-net.c code makes
assumptions about virtqueue descriptor layout (e.g. sg[0] is the header,
sg[1] is the data buffer).

The answer is yes, the control virtqueue function directly accesses
iov[n].

Additional patches would be required to convert the existing
hw/virtio-net.c code to make no assumptions about virtqueue descriptor
layout.  It's outside the scope of this series.

Stefan



Re: [Qemu-devel] [QEMU PATCH v2] virtio-net: introduce a new macaddr control

2013-01-17 Thread Amos Kong
On Thu, Jan 17, 2013 at 01:45:11PM +0800, Amos Kong wrote:
> On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
> > ak...@redhat.com writes:
> > > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, 
> > > uint8_t cmd,
> > >  {
> > >  struct virtio_net_ctrl_mac mac_data;
> > >  
> > > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
> > > +elem->out_sg[1].iov_len == ETH_ALEN) {
> > > +/* Set MAC address */
> > > +memcpy(n->mac, elem->out_sg[1].iov_base, 
> > > elem->out_sg[1].iov_len);
> > > +qemu_format_nic_info_str(&n->nic->nc, n->mac);
> > > +return VIRTIO_NET_OK;
> > > +}
> > 
> > Does the rest of the net device still rely on the layout of descriptors?
> 
> No, only info string of net client relies on n->mac

I misunderstood. There is no clear limit on how many descriptors are
used for each vq command, but many commands rely on the layout of
descriptors, e.g.:

virtio-net:
   VIRTIO_NET_CTRL_RX_PROMISC
   VIRTIO_NET_CTRL_RX_ALLMULTI
   VIRTIO_NET_CTRL_MAC_TABLE_SET
   etc
 
> > If so, OK, we'll fix them all together.  If not, this introduces a new
> > one.
> > 
> > Cheers,
> > Rusty.