Re: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Hollis Blanchard
On Thursday 06 November 2008 19:38:40 Zhang, Xiantao wrote:
> 
> Hi, Hollis
>   Currenlty, kvm-qemu only supports the only case which is host page_size 
> = 
qemu's target page size for ia64. Does your patch meets the requirement ? For 
ia64, current linux support 4K, 16K and 64k page size, and 1M 16M 64M or 
bigger page will be supported in future, so if your patch consider the case, 
it should work for ia64.  Thanks! 

Take a look at the patch and you tell me. :) I've hardcoded the PFN shift as 
16, which matches your TARGET_PAGE_BITS, but does it *have* to be the same? I 
don't believe it does.

Out of curiosity, if you had 4K Linux host pages, and you redefined 
TARGET_PAGE_BITS to be 12, would everything just work?

Also, as long as I have you... can we use getpagesize() instead of 65536/4096 
in qemu_vmalloc()?

-- 
Hollis Blanchard
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [PATCH] kvm: powerpc: add exit timing statistics

2008-11-06 Thread Hollis Blanchard
On Tuesday 04 November 2008 06:33:39 Ehrhardt Christian wrote:
> From: Christian Ehrhardt <[EMAIL PROTECTED]>
> 
> Other existing kvm stats are either just counters (kvm_stat) reported for kvm
> generally or trace based aproaches like kvm_trace.
> For kvm on powerpc we had the need to track the timings of the different exit
> types. While this could be achieved parsing data created with a kvm_trace
> extension this adds too muhc overhead (at least on embedded powerpc) slowing
> down the workloads we wanted to measure.
> 
> Therefore this patch adds a in kernel exit timing statistic to the powerpc kvm
> code. These statistic is available per vm&vcpu under the kvm debugfs 
> directory.
> As this statistic is low, but still some overhead it can be enabled via a
> .config entry and should be off by default.

It's not just that kvm_trace is "slow" -- I think the important point is that 
this code is *not* implementing another 
trace mechanism. It's simply a set of counters. The reason it doesn't fit 
naturally into kvm_stat is that 
kvm_stat is designed just to record event frequency, so it aggregates the 
counters from all vcpus. Right?

> Since this patch touched all powerpc kvm_stat code anyway this code is now
> merged and simpliefied together with the exit timing statistic code (still
> working with exit timing disabled in .config).

In general this looks pretty good. I still need to go through it in more 
detail, but just a couple superficial 
comments right now:

> diff --git a/arch/powerpc/include/asm/mmu-44x.h 
> b/arch/powerpc/include/asm/mmu-44x.h
> --- a/arch/powerpc/include/asm/mmu-44x.h
> +++ b/arch/powerpc/include/asm/mmu-44x.h
> @@ -56,6 +56,7 @@
>  #ifndef __ASSEMBLY__
> 
>  extern unsigned int tlb_44x_hwater;
> +extern unsigned int tlb_44x_index;
> 
>  typedef struct {
>   unsigned long id;

This must have accidentally slipped in to your patch.

> @@ -108,6 +258,10 @@
>   kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
>   if (!kvm)
>   return ERR_PTR(-ENOMEM);
> +
> +#ifdef CONFIG_KVM_BOOKE_EXIT_TIMING
> + kvm->arch.vm_id = atomic_inc_return(&vm_count);
> +#endif
> 
>   return kvm;
>  }

This "vm_count" stuff caught my eye as looking strange. Is there no better "ID" 
value you can use? What 
about qemu's PID?

-- 
Hollis Blanchard
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Zhang, Xiantao
Hollis Blanchard wrote:
> On Thu, 2008-11-06 at 08:01 -0600, Anthony Liguori wrote:
>> Hollis Blanchard wrote:
>>> # HG changeset patch
>>> # User Hollis Blanchard <[EMAIL PROTECTED]>
>>> # Date 1225946837 21600
>>> # Node ID 43a111ea61b542d3823e2a11d017e7b06b7ec254
>>> # Parent  b63967268af119e0faa4adc3086cdef857815548
>>> qemu: define and use VIRTIO_PFN_SHIFT
>>> 
>>> The virtio front and back ends must agree about how big a pfn
>>> really is. Since qemu has no idea what "page size" the guest may be
>>> using, it must be independent of TARGET_PAGE_BITS.
>>> 
>>> This patch should have no functional effect on x86 or ia64, but I'd
>>> like an ack from the ia64 guys. 
>>> 
>> 
>> Would be better to add a new header in target-XXX instead of using
>> cpu.h.  Virtio is not part of the CPU ISA.
> 
> OK.
> 
>>> Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
>>> 
>>> diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
>>> --- a/qemu/hw/virtio.c
>>> +++ b/qemu/hw/virtio.c
>>> @@ -56,6 +56,10 @@
>>>   */
>>>  #define wmb() do { } while (0)
>>> 
>>> +#define VRING_PAGE_SIZE (1<<12)
>>> +
>>> +#define ALIGN(x, a)  (((x)+(a)-1) & ~((a)-1))
>>> +
>>>  /* virt queue functions */
>> 
>> Why is VRING_PAGE_SIZE not architecture specific?
> 
> I wanted to make sure people on non-x86 architectures couldn't run
> into vring-size related problems that didn't also appear on x86.

Hi, Hollis
Currenlty, kvm-qemu only supports the only case which is host page_size 
= qemu's target page size for ia64. Does your patch meets the requirement ? For 
ia64, current linux support 4K, 16K and 64k page size, and 1M 16M 64M or bigger 
page will be supported in future, so if your patch consider the case, it should 
work for ia64.  Thanks! 
Thanks
Xiantao

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel status update

2008-11-06 Thread Hollis Blanchard
On Fri, 2008-10-31 at 14:44 -0500, Hollis Blanchard wrote:
> 
>   * Yu's idea about TLB handling was great, and I saw about a 20%
> performance improvement on a couple small workloads on 440. Very
> happy about that one. :)

This patch, which I'll call the "direct TLB" patch, improved performance
for me, but I didn't test it hard enough. Christian saw a 20%
*regression* with it, because although it does slightly reduce exit
handling time across the board, it also dramatically increases the
number of TLB misses. I don't think he's posted his data here yet
though.

I guess this is because we don't make any effort to reload the TLB when
re-entering the guest, so it's far more likely to fault. However, I was
actually just thinking that we may be able to take a hybrid approach:
keep the shadow TLB, but only save/restore it on vcpu_get() and
vcpu_put(). That way, we wouldn't pay the per-exit cost, but we'd also
have a more populated TLB present when we enter the guest. I may try
that as soon as I get out of virtio page size hell.

Anyways, the patch was posted at
http://penguinppc.org/~hollisb/kvm/patches/kvmppc_no_shadow_tlb.diff,
but I'll repost it here in case anybody has any thoughts. (Because it
can degrade performance, I did not include it in my recent push to Avi.)

-

kvm: ppc: insert shadow mappings directly into the MMU

The overhead of checking for TLB updates after every exit handler outweighs any
benefits. Instead of keeping a shadow TLB data structure (which wouldn't work
well for larger TLBs anyways), insert shadow mappings directly into the
hardware TLB.

(This means that host mappings may overwrite shadow mappings, which on a
subsequent fault will cause the shadow mapping to be reinserted elsewhere (and
a corresponding increase to the guest page's refcount). However, we won't lose
the page pointer, so we still properly free it.)

The net is about a 20% performance improvement (!).

Thanks to Liu Yu <[EMAIL PROTECTED]> for the idea.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

diff --git a/arch/powerpc/include/asm/kvm_44x.h 
b/arch/powerpc/include/asm/kvm_44x.h
--- a/arch/powerpc/include/asm/kvm_44x.h
+++ b/arch/powerpc/include/asm/kvm_44x.h
@@ -22,19 +22,25 @@
 
 #include 
 
-/* XXX Can't include mmu-44x.h because it redefines struct mm_context. */
 #define PPC44x_TLB_SIZE 64
+
+/* If the guest is expecting it, this can be as large as we like; we'd just
+ * need to find some way of advertising it. */
+#define KVM44x_GUEST_TLB_SIZE 64
+
+struct kvmppc_44x_shadow_ref {
+   struct page *page;
+   u16 gtlb_index;
+   u8 writeable;
+   u8 tid;
+};
 
 struct kvmppc_vcpu_44x {
/* Unmodified copy of the guest's TLB. */
-   struct kvmppc_44x_tlbe guest_tlb[PPC44x_TLB_SIZE];
-   /* TLB that's actually used when the guest is running. */
-   struct kvmppc_44x_tlbe shadow_tlb[PPC44x_TLB_SIZE];
-   /* Pages which are referenced in the shadow TLB. */
-   struct page *shadow_pages[PPC44x_TLB_SIZE];
+   struct kvmppc_44x_tlbe guest_tlb[KVM44x_GUEST_TLB_SIZE];
 
-   /* Track which TLB entries we've modified in the current exit. */
-   u8 shadow_tlb_mod[PPC44x_TLB_SIZE];
+   /* Shadow pages referenced in the real TLB. */
+   struct kvmppc_44x_shadow_ref tlb_shadow[PPC44x_TLB_SIZE];
 
struct kvm_vcpu vcpu;
 };
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -52,8 +52,8 @@ extern int kvmppc_emulate_mmio(struct kv
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
 extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu);
 
-extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn,
-   u64 asid, u32 flags);
+extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, unsigned int gtlb_idx,
+   gva_t gvaddr, gfn_t gfn, u64 asid, u32 flags);
 extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
 extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -357,12 +357,6 @@ int main(void)
 #ifdef CONFIG_KVM
DEFINE(TLBE_BYTES, sizeof(struct kvmppc_44x_tlbe));
 
-   DEFINE(VCPU_TO_44X, offsetof(struct kvmppc_vcpu_44x, vcpu));
-   DEFINE(VCPU44x_SHADOW_TLB,
-  offsetof(struct kvmppc_vcpu_44x, shadow_tlb));
-   DEFINE(VCPU44x_SHADOW_MOD,
-  offsetof(struct kvmppc_vcpu_44x, shadow_tlb_mod));
-
DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
--- a/arch/powerpc/kvm/44x.c

Re: dynamic virtio page size

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 17:27 -0600, Hollis Blanchard wrote:
> 
> In the patch I sent, they are named VIRTIO_PFN_SHIFT and
> VRING_PAGE_SIZE. In fact, the first could really be named
> VIRTIO_PCI_PFN_SHIFT or even more specific, which hopefully would
> alleviate the confusion.
> 
> Anyways, I've been trying to implement your other idea, which was to
> advertise a "virtio page size" from the host to the guest. Since the
> virtio IO space is only 20 bytes long, and it's full, we need to offer a
> feature bit to notify the guest that the IO space has been expanded. 
> 
>virtio's PCI IO resource
> 
> |20 bytes| |
> 
>virtio-pci   virtio-
> specific  specific
>   data  data
> 
> If guests don't acknowledge the feature, the host pretends the
> VIRTIO_PAGE_SIZE register (which would be the new IO offset 20) doesn't
> exist, and reads to offset 20 instead return the device-specific config
> data.

FWIW, here's my current (broken) patch. Aside from being broken, it also
doesn't seem like this approach will scale well should we need to extend
the IO space again in the future.


diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -45,7 +45,11 @@
  * a read-and-acknowledge. */
 #define VIRTIO_PCI_ISR 19
 
-#define VIRTIO_PCI_CONFIG  20
+/* A 32-bit r/o register specifying the size of a "page", in bytes, in the
+ * virtio interface. */
+#define VIRTIO_PCI_PAGESIZE20
+
+#define VIRTIO_PCI_CONFIG  24
 
 /* Virtio ABI version, if we increment this, we break the guest driver. */
 #define VIRTIO_PCI_ABI_VERSION 0
@@ -55,6 +59,12 @@
  * KVM or if kqemu gets SMP support.
  */
 #define wmb() do { } while (0)
+
+#define VIRTIO_PAGE_SHIFT  12
+#define VIRTIO_PAGE_SIZE   (1vring.avail = p + vq->vring.num * sizeof(VRingDesc);
-vq->vring.used = (void *)TARGET_PAGE_ALIGN((unsigned 
long)&vq->vring.avail->ring[vq->vring.num]);
+vq->vring.used = (void *)VIRTIO_PAGE_ALIGN((unsigned 
long)&vq->vring.avail->ring[vq->vring.num]);
 }
 
 static unsigned virtqueue_next_desc(VirtQueue *vq, unsigned int i)
@@ -234,6 +244,9 @@ static uint32_t virtio_config_readb(void
 
 vdev->get_config(vdev, vdev->config);
 
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
return (uint32_t)-1;
@@ -248,6 +261,9 @@ static uint32_t virtio_config_readw(void
 uint16_t val;
 
 vdev->get_config(vdev, vdev->config);
+
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
@@ -264,6 +280,9 @@ static uint32_t virtio_config_readl(void
 
 vdev->get_config(vdev, vdev->config);
 
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
return (uint32_t)-1;
@@ -276,6 +295,9 @@ static void virtio_config_writeb(void *o
 {
 VirtIODevice *vdev = opaque;
 uint8_t val = data;
+
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
@@ -292,6 +314,9 @@ static void virtio_config_writew(void *o
 VirtIODevice *vdev = opaque;
 uint16_t val = data;
 
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
return;
@@ -306,6 +331,9 @@ static void virtio_config_writel(void *o
 {
 VirtIODevice *vdev = opaque;
 uint32_t val = data;
+
+if (!(vdev->features & VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev->addr + VIRTIO_PCI_CONFIG;
 if (addr > (vdev->config_len - sizeof(val)))
@@ -329,9 +357,10 @@ static void virtio_ioport_write(void *op
if (vdev->set_features)
vdev->set_features(vdev, val);
vdev->features = val;
+   printf("%s: features 0x%x\n", __func__, val);
break;
 case VIRTIO_PCI_QUEUE_PFN:
-   pa = (ram_addr_t)val << TARGET_PAGE_BITS;
+   pa = (ram_addr_t)val << VIRTIO_PAGE_SHIFT;
vdev->vq[vdev->queue_sel].pfn = val;
if (pa == 0) {
 virtio_reset(vdev);
@@ -368,6 +397,7 @@ static uint32_t virtio_ioport_read(void 
 case VIRTIO_PCI_HOST_FEATURES:
ret = vdev->get_features(vdev);
ret |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
+   ret |= (1 << VIRTIO_F_EXPLICIT_PAGE_SIZE);
break;
 case VIRTIO_PCI_GUEST_FEATURES:
ret = vdev->features;
@@ -390,6 +420,16 @@ static uint32_t virtio_ioport_read(void 

Re: dynamic virtio page size

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 14:02 -0600, Anthony Liguori wrote:
> Hollis Blanchard wrote:
> > I wanted to make sure people on non-x86 architectures couldn't run into
> > vring-size related problems that didn't also appear on x86.
> >   
> 
> Having a VRING_SHIFT and a VRING_PAGE_SIZE where VRING_PAGE_SIZE != (1 
> << VRING_SHIFT) is almost certainly going to break things in unexpected 
> ways because it's going against common understanding of the relationship 
> between shift and page_size.

In the patch I sent, they are named VIRTIO_PFN_SHIFT and
VRING_PAGE_SIZE. In fact, the first could really be named
VIRTIO_PCI_PFN_SHIFT or even more specific, which hopefully would
alleviate the confusion.

Anyways, I've been trying to implement your other idea, which was to
advertise a "virtio page size" from the host to the guest. Since the
virtio IO space is only 20 bytes long, and it's full, we need to offer a
feature bit to notify the guest that the IO space has been expanded. 

   virtio's PCI IO resource

|20 bytes| |

   virtio-pci   virtio-
specific  specific
  data  data

If guests don't acknowledge the feature, the host pretends the
VIRTIO_PAGE_SIZE register (which would be the new IO offset 20) doesn't
exist, and reads to offset 20 instead return the device-specific config
data.

Problem: the guest has not yet acknowledged the advertised features
before accessing device-specific data. So in the virtio-blk/virtio-pci
case, virtio_dev_probe() calls virtblk_probe() *before*
vp_finalize_features(). virtblk_probe() uses virtio_config_val(), which
returns the wrong data because the new yet unknown feature flag is still
set. (Of course, we can't just fix the ordering guest-side, because that
breaks backwards compatibility.)

It seems to me that virtio-pci configuration is not very flexible. I
think I'm going to continue revising the hardcoded page size patches I
sent earlier...

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio: Define and use per-architecture "pfn shift" constants

2008-11-06 Thread Anthony Liguori

Hollis Blanchard wrote:

# HG changeset patch
# User Hollis Blanchard <[EMAIL PROTECTED]>
# Date 1225946980 21600
# Node ID f58566dfe20e841604e1377ff41e9e0501c1cf18
# Parent  f776b102380286dd173a3b89f7dc976140812517
virtio: Define and use per-architecture "pfn shift" constants

Both sides of the virtio interface must agree about how big a pfn really is.
This is particularly an issue on architectures where the page size is
configurable (e.g. PowerPC, IA64) -- the interface must be independent of
PAGE_SHIFT.

This patch should have no functional effect on x86 or ia64, but an ack from the 
IA64 guys
would be good because I'm not clear on their page size requirements.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

  


This is missing a fix for virtio-balloon.

Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Anthony Liguori

Hollis Blanchard wrote:

I wanted to make sure people on non-x86 architectures couldn't run into
vring-size related problems that didn't also appear on x86.
  


Having a VRING_SHIFT and a VRING_PAGE_SIZE where VRING_PAGE_SIZE != (1 
<< VRING_SHIFT) is almost certainly going to break things in unexpected 
ways because it's going against common understanding of the relationship 
between shift and page_size.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 08:01 -0600, Anthony Liguori wrote:
> Hollis Blanchard wrote:
> > # HG changeset patch
> > # User Hollis Blanchard <[EMAIL PROTECTED]>
> > # Date 1225946837 21600
> > # Node ID 43a111ea61b542d3823e2a11d017e7b06b7ec254
> > # Parent  b63967268af119e0faa4adc3086cdef857815548
> > qemu: define and use VIRTIO_PFN_SHIFT
> >
> > The virtio front and back ends must agree about how big a pfn really
> is. Since
> > qemu has no idea what "page size" the guest may be using, it must be
> > independent of TARGET_PAGE_BITS.
> >
> > This patch should have no functional effect on x86 or ia64, but I'd
> like an ack from the
> > ia64 guys.
> >   
> 
> Would be better to add a new header in target-XXX instead of using 
> cpu.h.  Virtio is not part of the CPU ISA.

OK.

> > Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
> >
> > diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
> > --- a/qemu/hw/virtio.c
> > +++ b/qemu/hw/virtio.c
> > @@ -56,6 +56,10 @@
> >   */
> >  #define wmb() do { } while (0)
> >  
> > +#define VRING_PAGE_SIZE (1<<12)
> > +
> > +#define ALIGN(x, a)  (((x)+(a)-1) & ~((a)-1))
> > +
> >  /* virt queue functions */
> 
> Why is VRING_PAGE_SIZE not architecture specific?

I wanted to make sure people on non-x86 architectures couldn't run into
vring-size related problems that didn't also appear on x86.

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Exit timing - results online

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 12:25 +0100, Christian Ehrhardt wrote:
> Hi,
> I just added the following page to our wiki based online documentation 
> to show some numbers on kvmppc as of today.
> You can find that page at thew wiki entry:
>   PowerPC_Exittimings 
>  - an overview of 
> workload dependent overhead on non hardware assisted powerpc virtualization
>   (http://kvm.qumranet.com/kvmwiki/PowerPC_Exittimings)

Very interesting; thanks for putting these numbers together Christian.

I'm interested by the instruction emulation, which shows up as the
highest cost in every workload. That suggests that even small
improvements to the emulation path, such as using a better lookup than
nested switch statements, could have a significant effect on throughput.

It's also interesting that the average is really close to the minimum,
but the standard deviation is huge. That suggests that the vast majority
of exits take the minimum amount of time, but a few exits take a *ton*
of time. Looking through the set of instructions we emulate on 440, it
looks like the only even slightly sophisticated ones are rfi, tlbwe, and
tlbsx...

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio: Define and use per-architecture "pfn shift" constants

2008-11-06 Thread Anthony Liguori

Hollis Blanchard wrote:

On Thu, 2008-11-06 at 10:49 +, Mark McLoughlin wrote:
  

But actually, why do we align the size anyway?



I assume it's so that the last page in the ring (containing the "used"
fields) could be safely mapped into another guest's address space,
without fear of exposing other data.

I don't know how valuable that is, but that's not really my concern so I
preserved the behavior.

  

Also might make sense for vring_init() and vring_size() not to take a
pagesize argument and hard-code them to use VRING_PAGE_SIZE.



I think that's a good idea. Anthony mentioned earlier the code was done
this way so it could be copied to userspace, where PAGE_SIZE is
unavailable, but that isn't an issue if we switch to VRING_PAGE_SIZE.
  


Agreed.

Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio: Define and use per-architecture "pfn shift" constants

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 10:49 +, Mark McLoughlin wrote:
> On Wed, 2008-11-05 at 22:49 -0600, Hollis Blanchard wrote:
> 
> > -   info->queue = kzalloc(PAGE_ALIGN(vring_size(num,PAGE_SIZE)), 
> > GFP_KERNEL);
> > +   vring_bytes = PAGE_ALIGN(vring_size(num, VRING_PAGE_SIZE));
> > +   info->queue = kzalloc(vring_bytes, GFP_KERNEL);
> 
> You're still aligning the size to PAGE_SIZE rather than VRING_PAGE_SIZE?

The original code was calling kzalloc with a multiple of PAGE_SIZE, so I
kept that behavior here...

> But actually, why do we align the size anyway?

I assume it's so that the last page in the ring (containing the "used"
fields) could be safely mapped into another guest's address space,
without fear of exposing other data.

I don't know how valuable that is, but that's not really my concern so I
preserved the behavior.

> Also might make sense for vring_init() and vring_size() not to take a
> pagesize argument and hard-code them to use VRING_PAGE_SIZE.

I think that's a good idea. Anthony mentioned earlier the code was done
this way so it could be copied to userspace, where PAGE_SIZE is
unavailable, but that isn't an issue if we switch to VRING_PAGE_SIZE.

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: cross-compiling flags?

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 15:44 +0800, Liu Yu wrote:
> > 
> > When I cross-compile qemu, it's using the cross-toolchain 
> > glibc headers. Since 
> > my toolchain is from crosstool, those headers are 
> > at 
> > /opt/crosstool/gcc-3.4.5-glibc-2.3.6/powerpc-440-linux-gnu/pow
> > erpc-440-linux-gnu
> 
> Yes, me too.
> 
> > 
> > > In my case, the libc headers are already appointed to a
> > > toolchain-combined version in LTIB(assume you know it).
> > > But zlib.h is not existing so I had to use qemu-cflag to 
> > appoint another
> > > dir
> > 
> > If your zlib headers are mixed in with the target libc 
> > headers, then why do 
> > you need any -I at all?
> 
> The zlib headers are mixed in powerpc local libc headers not the toolchain 
> cross libc headers.
> And it seems the local libc headers is unfit for cross building.

Since I'm not familiar with LTIB, I chatted with Kumar Gala about this
briefly, and he suggested that you post your issue to the internal
Freescale LTIB list.

My understanding is that you have a special directory for target
headers, which is used by the cross toolchain. Without understanding the
details, my suggestion would be to build a second copy of zlib from
source, and configure it with --prefix=/target/specific/directory.

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Anthony Liguori

Hollis Blanchard wrote:

# HG changeset patch
# User Hollis Blanchard <[EMAIL PROTECTED]>
# Date 1225946837 21600
# Node ID 43a111ea61b542d3823e2a11d017e7b06b7ec254
# Parent  b63967268af119e0faa4adc3086cdef857815548
qemu: define and use VIRTIO_PFN_SHIFT

The virtio front and back ends must agree about how big a pfn really is. Since
qemu has no idea what "page size" the guest may be using, it must be
independent of TARGET_PAGE_BITS.

This patch should have no functional effect on x86 or ia64, but I'd like an ack 
from the
ia64 guys.
  


Would be better to add a new header in target-XXX instead of using 
cpu.h.  Virtio is not part of the CPU ISA.



Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -56,6 +56,10 @@
  */
 #define wmb() do { } while (0)
 
+#define VRING_PAGE_SIZE (1<<12)

+
+#define ALIGN(x, a)  (((x)+(a)-1) & ~((a)-1))
+
 /* virt queue functions */


Why is VRING_PAGE_SIZE not architecture specific?

Regards,

Anthony Liguori


 
 static void *virtio_map_gpa(target_phys_addr_t addr, size_t size)

@@ -95,8 +99,8 @@ static void *virtio_map_gpa(target_phys_
 
 static size_t virtqueue_size(int num)

 {
-return TARGET_PAGE_ALIGN((sizeof(VRingDesc) * num) +
-(sizeof(VRingAvail) + sizeof(uint16_t) * num)) +
+return ALIGN((sizeof(VRingDesc) * num) + (sizeof(VRingAvail) +
+sizeof(uint16_t) * num), VRING_PAGE_SIZE) +
(sizeof(VRingUsed) + sizeof(VRingUsedElem) * num);
 }
 
@@ -104,7 +108,7 @@ static void virtqueue_init(VirtQueue *vq

 {
 vq->vring.desc = p;
 vq->vring.avail = p + vq->vring.num * sizeof(VRingDesc);
-vq->vring.used = (void *)TARGET_PAGE_ALIGN((unsigned 
long)&vq->vring.avail->ring[vq->vring.num]);
+vq->vring.used = (void *)ALIGN((unsigned 
long)&vq->vring.avail->ring[vq->vring.num], VRING_PAGE_SIZE);
 }
 
 static unsigned virtqueue_next_desc(VirtQueue *vq, unsigned int i)

@@ -241,7 +245,7 @@ static void virtio_ioport_write(void *op
vdev->features = val;
break;
 case VIRTIO_PCI_QUEUE_PFN:
-   pa = (ram_addr_t)val << TARGET_PAGE_BITS;
+   pa = (ram_addr_t)val << VIRTIO_PFN_SHIFT;
vdev->vq[vdev->queue_sel].pfn = val;
if (pa == 0) {
 virtio_reset(vdev);
@@ -519,7 +523,7 @@ void virtio_load(VirtIODevice *vdev, QEM
size_t size;
target_phys_addr_t pa;
 
-	pa = (ram_addr_t)vdev->vq[i].pfn << TARGET_PAGE_BITS;

+   pa = (ram_addr_t)vdev->vq[i].pfn << VIRTIO_PFN_SHIFT;
size = virtqueue_size(vdev->vq[i].vring.num);
virtqueue_init(&vdev->vq[i], virtio_map_gpa(pa, size));
}
diff --git a/qemu/target-i386/cpu.h b/qemu/target-i386/cpu.h
--- a/qemu/target-i386/cpu.h
+++ b/qemu/target-i386/cpu.h
@@ -751,6 +751,8 @@ static inline int cpu_get_time_fast(void
 
 #define TARGET_PAGE_BITS 12
 
+#define VIRTIO_PFN_SHIFT 12

+
 #define CPUState CPUX86State
 #define cpu_init cpu_x86_init
 #define cpu_exec cpu_x86_exec
diff --git a/qemu/target-ia64/cpu.h b/qemu/target-ia64/cpu.h
--- a/qemu/target-ia64/cpu.h
+++ b/qemu/target-ia64/cpu.h
@@ -30,6 +30,8 @@
 #define TARGET_LONG_BITS 64
 
 #define TARGET_PAGE_BITS 16

+
+#define VIRTIO_PFN_SHIFT 16
 
 #define ELF_MACHINE	EM_IA_64
 
diff --git a/qemu/target-ppc/cpu.h b/qemu/target-ppc/cpu.h

--- a/qemu/target-ppc/cpu.h
+++ b/qemu/target-ppc/cpu.h
@@ -54,6 +54,8 @@
 #endif /* defined(TARGET_PPCEMB) */
 
 #endif /* defined (TARGET_PPC64) */

+
+#define VIRTIO_PFN_SHIFT 10
 
 #include "cpu-defs.h"
 
--

To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio: Define and use per-architecture "pfn shift" constants

2008-11-06 Thread Anthony Liguori

Hollis Blanchard wrote:

# HG changeset patch
# User Hollis Blanchard <[EMAIL PROTECTED]>
# Date 1225946980 21600
# Node ID f58566dfe20e841604e1377ff41e9e0501c1cf18
# Parent  f776b102380286dd173a3b89f7dc976140812517
virtio: Define and use per-architecture "pfn shift" constants

Both sides of the virtio interface must agree about how big a pfn really is.
This is particularly an issue on architectures where the page size is
configurable (e.g. PowerPC, IA64) -- the interface must be independent of
PAGE_SHIFT.

This patch should have no functional effect on x86 or ia64, but an ack from the 
IA64 guys
would be good because I'm not clear on their page size requirements.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
  


Acked-by: Anthony Liguori <[EMAIL PROTECTED]>

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Exit timing - results online

2008-11-06 Thread Christian Ehrhardt

Hi,
I just added the following page to our wiki based online documentation 
to show some numbers on kvmppc as of today.

You can find that page at thew wiki entry:
 PowerPC_Exittimings 
 - an overview of 
workload dependent overhead on non hardware assisted powerpc virtualization

 (http://kvm.qumranet.com/kvmwiki/PowerPC_Exittimings)

Currently this page covers a brief introduction, workload description, 
exit timings and an example of paravirtualization improvement.


There is no "interpretation" of that data on the page yet.
I think I add some explanations and theories why you can see which 
effect there once I chatted with a few involved people.

I welcome everyone to discuss with us on the list about that too.

--

GrĂ¼sse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: define and use VIRTIO_PFN_SHIFT

2008-11-06 Thread Mark McLoughlin
On Wed, 2008-11-05 at 22:49 -0600, Hollis Blanchard wrote:

> diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
> --- a/qemu/hw/virtio.c
> +++ b/qemu/hw/virtio.c
> @@ -56,6 +56,10 @@
>   */
>  #define wmb() do { } while (0)
>  
> +#define VRING_PAGE_SIZE (1<<12)
> +
> +#define ALIGN(x, a)  (((x)+(a)-1) & ~((a)-1))

Might be better to do:

#define VRING_PAGE_SIZE (1<<12)
#define VRING_PAGE_MASK ~(VRING_PAGE_SIZE - 1)
#define VRING_PAGE_ALIGN(x) (((x) + VRING_PAGE_SIZE - 1) & VRING_PAGE_MASK)

Cheers,
Mark

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio: Define and use per-architecture "pfn shift" constants

2008-11-06 Thread Mark McLoughlin
On Wed, 2008-11-05 at 22:49 -0600, Hollis Blanchard wrote:

> -   info->queue = kzalloc(PAGE_ALIGN(vring_size(num,PAGE_SIZE)), 
> GFP_KERNEL);
> +   vring_bytes = PAGE_ALIGN(vring_size(num, VRING_PAGE_SIZE));
> +   info->queue = kzalloc(vring_bytes, GFP_KERNEL);

You're still aligning the size to PAGE_SIZE rather than VRING_PAGE_SIZE?

But actually, why do we align the size anyway?

Also might make sense for vring_init() and vring_size() not to take a
pagesize argument and hard-code them to use VRING_PAGE_SIZE.

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00 of 14] Kernel updates for PowerPC KVM

2008-11-06 Thread Avi Kivity

Hollis Blanchard wrote:

(Sorry, somehow I didn't send the first 5 patches in the series last time.)

Hi Avi, these patches have been tested in PowerPC trees for a while. Some of
these reorganize the PowerPC KVM code to make implementations on other cores
possible, but does cause a little code churn. There are also some performance
optimization for 440 from Christian, and a few bugfixes thrown in too.

Please apply.
  


Applied all, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html