Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-08 Thread Avi Kivity

 On 09/08/2010 11:33 PM, Anthony Liguori wrote:

On 09/08/2010 03:05 PM, Arjan Koers wrote:

On 2010-09-08 18:29, Marcelo Tosatti wrote:

qemu-kvm-0.13.0-rc1 is now available. This release is based on the
upstream qemu 0.13.0-rc1, plus kvm-specific enhancements.

This release can be used with the kvm kernel modules provided by your
distribution kernel, or by the modules in the kvm-kmod package, such
as kvm-kmod-2.6.35.

Please help with testing for a stable 0.13.0 release.

The build fails when configure flag --disable-cpu-emulation is used:


That flag needs to go away.



It's perfectly reasonable to want to avoid building the tcg code if you 
aren't going to use it.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 1/4] Add a new API to virtio-pci

2010-09-08 Thread Krishna Kumar2
Rusty Russell  wrote on 09/09/2010 09:19:39 AM:

> On Wed, 8 Sep 2010 04:59:05 pm Krishna Kumar wrote:
> > Add virtio_get_queue_index() to get the queue index of a
> > vq.  This is needed by the cb handler to locate the queue
> > that should be processed.
>
> This seems a bit weird.  I mean, the driver used vdev->config->find_vqs
> to find the queues, which returns them (in order).  So, can't you put this
> into your struct send_queue?

I am saving the vqs in the send_queue, but the cb needs
to locate the device txq from the svq. The only other way
I could think of is to iterate through the send_queues
and compare svq against sq[i]->svq, but cbs happen quite
a bit. Is there a better way?

static void skb_xmit_done(struct virtqueue *svq)
{
	struct virtnet_info *vi = svq->vdev->priv;
	int qnum = virtio_get_queue_index(svq) - 1; /* 0 is RX vq */

	/* Suppress further interrupts. */
	virtqueue_disable_cb(svq);

	/* We were probably waiting for more output buffers. */
	netif_wake_subqueue(vi->dev, qnum);
}
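
For comparison, a rough sketch of the iterate-and-compare lookup mentioned
above (vi->sq[] and vi->numtxqs are placeholder names from this discussion,
not necessarily what the patch uses); the concern is that it would run on
every tx completion callback:

static int txq_index_of(struct virtnet_info *vi, struct virtqueue *svq)
{
	int i;

	/* Linear scan over the per-txq state. */
	for (i = 0; i < vi->numtxqs; i++)
		if (vi->sq[i].svq == svq)
			return i;
	return -1;	/* not a TX vq */
}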

> Also, why define VIRTIO_MAX_TXQS?  If the driver can't handle all of them,
> it should simply not use them...

The main reason was vhost :) Since vhost_net_release
should not fail (__fput can't handle f_op->release()
failure), I needed a maximum number of socks to
clean up:

#define MAX_VQS (1 + VIRTIO_MAX_TXQS)

static int vhost_net_release(struct inode *inode, struct file *f)
{
	struct vhost_net *n = f->private_data;
	struct vhost_dev *dev = &n->dev;
	struct socket *socks[MAX_VQS];
	int i;

	vhost_net_stop(n, socks);
	vhost_net_flush(n);
	vhost_dev_cleanup(dev);

	for (i = n->dev.nvqs - 1; i >= 0; i--)
		if (socks[i])
			fput(socks[i]->file);
	...
}

Thanks,

- KK



Re: [RFC PATCH 1/4] Add a new API to virtio-pci

2010-09-08 Thread Rusty Russell
On Wed, 8 Sep 2010 04:59:05 pm Krishna Kumar wrote:
> Add virtio_get_queue_index() to get the queue index of a
> vq.  This is needed by the cb handler to locate the queue
> that should be processed.

This seems a bit weird.  I mean, the driver used vdev->config->find_vqs
to find the queues, which returns them (in order).  So, can't you put this
into your struct send_queue?

Also, why define VIRTIO_MAX_TXQS?  If the driver can't handle all of them,
it should simply not use them...

Thanks!
Rusty.


vhost not working with version 12.5 with kernel 2.6.35.4

2010-09-08 Thread matthew . r . rohrer
> >> When trying to use vhost I get the error "vhost-net requested but could
> >> not be initialized".  The only thing I have been able to find about this
> >> problem relates to SELinux being turned off, which mine is disabled and
> >> permissive.  Just wondering if there were any other thoughts on this
> >> error? Am I correct that it should work with the .35.4 kernel and
> >> version 12.5 KVM?


> If you mean 0.12.5, no. If you mean 0.12.50 (i.e. a git checkout from some
> point after 0.12.0 was released), then it depends on when the checkout is
> from.

I do mean 0.12.50 checked out from qemu-kvm via git a couple of weeks
ago.  If I can ask, is 0.12.5 just regular qemu and 0.12.50 qemu-kvm?


> >> KVM Host OS: Fedora 12 x86_64
> >> KVM Guest OS: Tiny Core Linux, 2.6.33.3 kernel
> >> Host kernel 2.6.35.4 and qemu-system-x86_64 12.5 compiled from the
> >> qemu-kvm repo.


Sorry! I made an error in my last email!

2010-09-08 Thread Sam L. Carl
Hey,

Sorry, I made an error with the links in my last email. Here is how it should have 
been:

Over the past few months I have taken a lot of my time to research and ask as 
many people as possible what the top 5 money making methods are. 

After weeks and weeks of different answers and even trying over 50 popular 
products and systems I have come to my conclusion and made the top 5 money 
making products online list. 

So here it goes: 

1) The Mobile Monopoly - http://tiny.cc/ndonh

2) Auto Traffic Avalanche - http://tiny.cc/3wsuq

3) Auto Blog System - http://tiny.cc/ytf7r

4) Zero Cost Commissions - http://tiny.cc/ermdw

5) CPA Instruments - http://tiny.cc/ruh9b


So there you have it. The reason I did this is because I am sick of "gurus" 
ripping people off, most of them are scammers! Be careful when buying online, 
only buy from trusted sources. I have checked the 5 sources above and so have 
thousands of other people just like you and me, and they do work. 

The problem with the internet is you don't know who to trust. The joke is you 
can check up reviews on some products and people would give it mixed reviews, 
some will be saying " amazing products, worked for me" others " a scam don't 
buy it" so sometimes you don't know who is telling the truth right? 

Well, I'll let you in on a little secret here: most "gurus" will write reviews 
about their competitors saying how rubbish they are. They do this to wipe off 
competition. Therefore I decided I would take action and maybe get some kind 
of Internet peace award for this :p so I tested and tested numerous products 
and came up with this list. 

So enjoy, it's worth checking these 5 products out, they do work especially the 
first two, the other three still work but are a bit over hyped. 


1) The Mobile Monopoly - http://tiny.cc/ndonh

2) Auto Traffic Avalanche - http://tiny.cc/3wsuq

3) Auto Blog System - http://tiny.cc/ytf7r

4) Zero Cost Commissions - http://tiny.cc/ermdw

5) CPA Instruments - http://tiny.cc/ruh9b


All the best,
Sam L. Carl




Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-08 Thread Anthony Liguori

On 09/08/2010 03:05 PM, Arjan Koers wrote:

On 2010-09-08 18:29, Marcelo Tosatti wrote:
   

qemu-kvm-0.13.0-rc1 is now available. This release is based on the
upstream qemu 0.13.0-rc1, plus kvm-specific enhancements.

This release can be used with the kvm kernel modules provided by your
distribution kernel, or by the modules in the kvm-kmod package, such
as kvm-kmod-2.6.35.

Please help with testing for a stable 0.13.0 release.
 

The build fails when configure flag --disable-cpu-emulation is used:
   


That flag needs to go away.

Regards,

Anthony Liguori


...
  CC    x86_64-softmmu/pcspk.o
  CC    x86_64-softmmu/i8254.o
  CC    x86_64-softmmu/i8254-kvm.o
  CC    x86_64-softmmu/device-assignment.o
   LINK  x86_64-softmmu/qemu-system-x86_64
exec.o: In function `cpu_exec_init_all':
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_ctx'
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_prologue_init'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2


When line 585 'tcg_prologue_init(&tcg_ctx);' is removed, the compilation
succeeds and only one non-fatal warning remains:
/home/kvm/qemu-kvm/target-i386/fake-exec.c:26: warning: no previous
prototype for ‘code_gen_max_block_size’




Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-08 Thread Arjan Koers
On 2010-09-08 18:29, Marcelo Tosatti wrote:
> 
> qemu-kvm-0.13.0-rc1 is now available. This release is based on the
> upstream qemu 0.13.0-rc1, plus kvm-specific enhancements. 
> 
> This release can be used with the kvm kernel modules provided by your
> distribution kernel, or by the modules in the kvm-kmod package, such
> as kvm-kmod-2.6.35.
> 
> Please help with testing for a stable 0.13.0 release.

The build fails when configure flag --disable-cpu-emulation is used:
...
  CC    x86_64-softmmu/pcspk.o
  CC    x86_64-softmmu/i8254.o
  CC    x86_64-softmmu/i8254-kvm.o
  CC    x86_64-softmmu/device-assignment.o
  LINK  x86_64-softmmu/qemu-system-x86_64
exec.o: In function `cpu_exec_init_all':
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_ctx'
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_prologue_init'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2


When line 585 'tcg_prologue_init(&tcg_ctx);' is removed, the compilation
succeeds and only one non-fatal warning remains:
/home/kvm/qemu-kvm/target-i386/fake-exec.c:26: warning: no previous
prototype for ‘code_gen_max_block_size’


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 09/08/2010 01:40:11 PM:

>
___

> > UDP (#numtxqs=8)
> > N#   BW1     BW2     (%)       SD1     SD2     (%)
> > ____________________________________________________
> > 4    29836   56761   (90.24)   67      63      (-5.97)
> > 8    27666   63767   (130.48)  326     265     (-18.71)
> > 16   25452   60665   (138.35)  1396    1269    (-9.09)
> > 32   26172   63491   (142.59)  5617    4202    (-25.19)
> > 48   26146   64629   (147.18)  12813   9316    (-27.29)
> > 64   25575   65448   (155.90)  23063   16346   (-29.12)
> > 128  26454   63772   (141.06)  91054   85051   (-6.59)
> > ____________________________________________________
> > N#: Number of netperf sessions, 90 sec runs
> > BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
> >   SD for original code
> > BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
> >   SD for new code. e.g. BW2=40716 means average BW2 was
> >   20358 mbps.
> >
>
> What happens with a single netperf?
> host -> guest performance with TCP and small packet speed
> are also worth measuring.

Guest -> Host (single netperf):
I am getting a drop of almost 20%. I am trying to figure out
why.

Host -> guest (single netperf):
I am getting an improvement of almost 15%. Again - unexpected.

Guest -> Host TCP_RR: I get an average 7.4% increase in #packets
for runs up to 128 sessions. With fewer netperf sessions (under 8), there
was a drop of 3-7% in #packets, but beyond that, the #packets
improved significantly to give an average improvement of 7.4%.

So it seems that fewer sessions is having negative effect for
some reason on the tx side. The code path in virtio-net has not
changed much, so the drop in some cases is quite unexpected.

Thanks,

- KK



Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
> On Wednesday 08 September 2010, Krishna Kumar2 wrote:
> > > The new guest and qemu code work with old vhost-net, just with reduced
> > > performance, yes?
> >
> > Yes, I have tested new guest/qemu with old vhost but using
> > #numtxqs=1 (or not passing any arguments at all to qemu to
> > enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost,
> > since vhost_net_set_backend in the unmodified vhost checks
> > for boundary overflow.
> >
> > I have also tested running an unmodified guest with new
> > vhost/qemu, but qemu should not specify numtxqs>1.
>
> Can you live migrate a new guest from new-qemu/new-kernel
> to old-qemu/old-kernel, new-qemu/old-kernel and old-qemu/new-kernel?
> If not, do we need to support all those cases?

I have not tried this, though I added some minimal code in
virtio_net_load and virtio_net_save. I don't know what needs
to be done exactly at this time. I forgot to put this in the
"Next steps" list of things to do.

Thanks,

- KK



Re: [PATCH master/stable-0.12/stable-0.13] kvm: reset MSR_IA32_CR_PAT correctly

2010-09-08 Thread Marcelo Tosatti
On Tue, Sep 07, 2010 at 04:21:22PM +0300, Avi Kivity wrote:
> The power-on value of MSR_IA32_CR_PAT is not 0 - that disables caching and
> makes everything dog slow.
> 
> Fix to reset MSR_IA32_CR_PAT to the correct value.
> 
> Signed-off-by: Avi Kivity 
> ---
>  qemu-kvm-x86.c |   11 ++-
>  1 files changed, 10 insertions(+), 1 deletions(-)

Applied, thanks.



[ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-08 Thread Marcelo Tosatti

qemu-kvm-0.13.0-rc1 is now available. This release is based on the
upstream qemu 0.13.0-rc1, plus kvm-specific enhancements. 

This release can be used with the kvm kernel modules provided by your
distribution kernel, or by the modules in the kvm-kmod package, such
as kvm-kmod-2.6.35.

Please help with testing for a stable 0.13.0 release.

http://www.linux-kvm.org



Re: [PATCH 0/2] kvm/e500v2: MMU optimization

2010-09-08 Thread Hollis Blanchard

On 09/08/2010 02:40 AM, Liu Yu wrote:

The patchset aims at mapping guest TLB1 to host TLB0.
And it includes:
[PATCH 1/2] kvm/e500v2: Remove shadow tlb
[PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0

The reason we need patch 1 is that it makes things simple and flexible.
Applying only patch 1 also makes kvm work.


I've always thought the best long-term "optimization" on these cores is 
to share in the host PID allocation (i.e. __init_new_context()). This 
way, the TID in guest mappings would not overlap the TID in host 
mappings, and guest mappings could be demand-faulted rather than swapped 
wholesale. To do that, you would need to track the host PID in KVM data 
structures, I guess in the tlbe_ref structure.


--
Hollis Blanchard
Mentor Graphics, Embedded Systems Division


Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-08 Thread Hollis Blanchard

On 09/08/2010 02:40 AM, Liu Yu wrote:

It is unnecessary to keep a shadow tlb.
First, the shadow tlb keeps fixed values, which makes things inflexible.
Second, removing the shadow tlb saves a lot of memory.

This patch removes the shadow tlb and calculates the shadow tlb entry value
before we write it to hardware.

Also we use a new struct tlbe_ref to track the relation
between a guest tlb entry and its page.


Did you look at the performance impact?

Back in the day, we did essentially the same thing on 440. However, 
rather than discard the whole TLB when context switching away from the 
host (to be demand-faulted when the guest is resumed), we found a 
noticeable performance improvement by preserving a shadow TLB across 
context switches. We only use it in the vcpu_put/vcpu_load path.


Of course, our TLB was much smaller (64 entries), so the use model may 
not be the same at all (e.g. it takes longer to restore a full guest TLB 
working set, but maybe it's not really possible to use all 1024 TLB0 
entries in one timeslice anyways).


--
Hollis Blanchard
Mentor Graphics, Embedded Systems Division


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Arnd Bergmann
On Wednesday 08 September 2010, Krishna Kumar2 wrote:
> > The new guest and qemu code work with old vhost-net, just with reduced
> > performance, yes?
> 
> Yes, I have tested new guest/qemu with old vhost but using
> #numtxqs=1 (or not passing any arguments at all to qemu to
> enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost,
> since vhost_net_set_backend in the unmodified vhost checks
> for boundary overflow.
> 
> I have also tested running an unmodified guest with new
> vhost/qemu, but qemu should not specify numtxqs>1.

Can you live migrate a new guest from new-qemu/new-kernel
to old-qemu/old-kernel, new-qemu/old-kernel and old-qemu/new-kernel?
If not, do we need to support all those cases?

Arnd


Re: Tracing KVM with Systemtap

2010-09-08 Thread Stefan Hajnoczi
On Wed, Sep 8, 2010 at 2:20 PM, Rayson Ho  wrote:
> Hi all,
>
> I am a developer of Systemtap. I am looking into tracing KVM (the kernel
> part and QEMU) and also the KVM guests with Systemtap. I googled and
> found references to Xenprobes and xdt+dtrace, and I was wondering if
> someone is working on the dynamic tracing interface for KVM?
>
> I've read the KVM kernel code and I think some expensive operations
> (things that need to be trapped back to the host kernel - eg. loading of
> control registers on x86/x64) can be interesting spots for adding an SDT
> (static marker), and I/O operations performed for the guests can be
> useful information to collect.
>
> I know that KVM guests run like a userspace process and thus techniques
> for tracing Xen might be overkill, and also gdb can be used to trace
> KVM guests. However, is there anything special I need to be aware of
> before I go further into the development of the Systemtap KVM probes?
>
> (Opinions / Suggestions / Criticisms welcome!)

Hi Rayson,
For the KVM kernel module Linux trace events are already used.  For
example, see arch/x86/kvm/trace.h and check out
/sys/kernel/debug/tracing/events/kvm/*.  There is a set of useful
static trace points for vm_exit/vm_enter, pio, mmio, etc.

For the KVM guest there is perf-kvm(1).  This allows perf(1) to look
up addresses inside the guest (kernel only?).  It produces system-wide
performance profiles including guests.  Perhaps someone can comment on
perf-kvm's full feature set and limitations?

For QEMU userspace Prerna Saxena and I are proposing a static tracing
patchset.  It abstracts the trace backend (SystemTap, LTTng UST,
DTrace, etc) from the actual tracepoints so that portability can be
achieved.  There is a built-in trace backend that has a basic feature
set but isn't as fancy as SystemTap.  I have implemented LTTng
Userspace Tracer support, perhaps you'd like to add SystemTap/DTrace
support with sdt.h?
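
For illustration, a minimal sketch of what an sdt.h-based probe could look
like (the provider/probe names here are made up, not part of the proposed
patchset, and this assumes systemtap's sys/sdt.h is available):

#include <sys/sdt.h>

static void trace_mmio_write(unsigned long addr, unsigned long val)
{
	/* Compiles to a single nop plus an ELF note describing the probe;
	 * a stap or dtrace script can then attach to qemu:mmio_write. */
	DTRACE_PROBE2(qemu, mmio_write, addr, val);
}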

http://www.mail-archive.com/qemu-de...@nongnu.org/msg41323.html
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing_v3

Stefan


[ kvm-Bugs-2353510 ] Fedora 10 and F11 failures

2010-09-08 Thread SourceForge.net
Bugs item #2353510, was opened at 2008-11-27 13:46
Message generated for change (Comment added) made by jessorensen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Works For Me
Priority: 9
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fedora 10 and F11 failures

Initial Comment:

Description:
Fedora 10 fails to install on KVM. (KVM-79)

The DVD version gets stuck near the end of the setup stage, when trying to install
the GRUB bootloader onto the HDD.
It didn't proceed within one hour, which indicates a "stuck" VM.

Sometimes it may get stuck earlier - during init or during early setup.

The Live CD (32-bit) started fine on both Intel and AMD (except for a minor
top-menu rendering bug).

Guest(s): Fedora 10 64-bit
Guest(s): Fedora 10 32-bit
Host(s): Fedora 7 64-bit, Intel, KVM-79
Host(s): Fedora 7 64-bit, AMD, KVM-79

Command: (for DVD)
qemu-kvm -cdrom /isos/linux/Fedora-10-x86_64-DVD.iso -m 512 -hda 
/vm/f10-64.qcow2  -boot d

*and* (for LiveCD)
qemu-kvm -cdrom /isos/linux/F10-i686-Live.iso -m 512

-Alexey, 27.11.2008.

--

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-09-08 15:35

Message:
Tried here with recent KVM / F13 host - installing F11 works just dandy, so
problem has been fixed. 

Closing
Jes


--

Comment By: Technologov (technologov)
Date: 2009-06-11 16:18

Message:
Not only Fedora 10, but also Fedora 11 fails in the same way. Raising bug
priority.

Guest(s): Fedora 10 64-bit DVD

Tested on KVM-86, Intel CPU.

--

Comment By: Technologov (technologov)
Date: 2008-12-02 11:39

Message:

I have opened similar bug against Fedora 10 bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=474116

-Alexey

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599


Tracing KVM with Systemtap

2010-09-08 Thread Rayson Ho
Hi all,

I am a developer of Systemtap. I am looking into tracing KVM (the kernel
part and QEMU) and also the KVM guests with Systemtap. I googled and
found references to Xenprobes and xdt+dtrace, and I was wondering if
someone is working on the dynamic tracing interface for KVM?

I've read the KVM kernel code and I think some expensive operations
(things that need to be trapped back to the host kernel - eg. loading of
control registers on x86/x64) can be interesting spots for adding an SDT
(static marker), and I/O operations performed for the guests can be
useful information to collect.

I know that KVM guests run like a userspace process and thus techniques
for tracing Xen might be overkill, and also gdb can be used to trace
KVM guests. However, is there anything special I need to be aware of
before I go further into the development of the Systemtap KVM probes?

(Opinions / Suggestions / Criticisms welcome!)

Thanks,
Rayson





Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 09/08/2010 04:18:33 PM:

> > > > ______________________________________________________________________
> > > > TCP (#numtxqs=2)
> > > > N#  BW1     BW2     (%)      SD1    SD2    (%)      RSD1    RSD2    (%)
> > > > ______________________________________________________________________
> > > > 4   26387   40716   (54.30)  20     28     (40.00)  86      85      (-1.16)
> > > > 8   24356   41843   (71.79)  88     129    (46.59)  372     362     (-2.68)
> > > > 16  23587   40546   (71.89)  375    564    (50.40)  1558    1519    (-2.50)
> > > > 32  22927   39490   (72.24)  1617   2171   (34.26)  6694    5722    (-14.52)
> > > > 48  23067   39238   (70.10)  3931   5170   (31.51)  15823   13552   (-14.35)
> > > > 64  22927   38750   (69.01)  7142   9914   (38.81)  28972   26173   (-9.66)
> > > > 96  22568   38520   (70.68)  16258  27844  (71.26)  65944   73031   (10.74)
> > >
> > > That's a significant hit in TCP SD. Is it caused by the imbalance between
> > > number of queues for TX and RX? Since you mention RX is complete,
> > > maybe measure with a balanced TX/RX?
> >
> > Yes, I am not sure why it is so high.
>
> Any errors at higher levels? Are any packets reordered?

I haven't seen any messages logged, and retransmission is similar
to non-mq case. Device also has no errors/dropped packets. Anything
else I should look for?

On the host:

# ifconfig vnet0
vnet0 Link encap:Ethernet  HWaddr 9A:9D:99:E1:CA:CE
  inet6 addr: fe80::989d:99ff:fee1:cace/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5090371 errors:0 dropped:0 overruns:0 frame:0
  TX packets:5054616 errors:0 dropped:0 overruns:65 carrier:0
  collisions:0 txqueuelen:500
  RX bytes:237793761392 (221.4 GiB)  TX bytes:333630070 (318.1 MiB)
# netstat -s  |grep -i retrans
1310 segments retransmited
35 times recovered from packet loss due to fast retransmit
1 timeouts after reno fast retransmit
41 fast retransmits
1236 retransmits in slow start

So retransmissions are about 0.025% of total packets received from the guest.

Thanks,

- KK



Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Michael S. Tsirkin
On Wed, Sep 08, 2010 at 02:53:03PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 09/08/2010 01:40:11 PM:
> 
> > > ______________________________________________________________________
> > > TCP (#numtxqs=2)
> > > N#  BW1     BW2     (%)      SD1    SD2    (%)      RSD1    RSD2    (%)
> > > ______________________________________________________________________
> > > 4   26387   40716   (54.30)  20     28     (40.00)  86      85      (-1.16)
> > > 8   24356   41843   (71.79)  88     129    (46.59)  372     362     (-2.68)
> > > 16  23587   40546   (71.89)  375    564    (50.40)  1558    1519    (-2.50)
> > > 32  22927   39490   (72.24)  1617   2171   (34.26)  6694    5722    (-14.52)
> > > 48  23067   39238   (70.10)  3931   5170   (31.51)  15823   13552   (-14.35)
> > > 64  22927   38750   (69.01)  7142   9914   (38.81)  28972   26173   (-9.66)
> > > 96  22568   38520   (70.68)  16258  27844  (71.26)  65944   73031   (10.74)
> >
> > That's a significant hit in TCP SD. Is it caused by the imbalance between
> > number of queues for TX and RX? Since you mention RX is complete,
> > maybe measure with a balanced TX/RX?
> 
> Yes, I am not sure why it is so high.

Any errors at higher levels? Are any packets reordered?

> I found the same with #RX=#TX
> too. As a hack, I tried ixgbe without MQ (set "indices=1" before
> calling alloc_etherdev_mq, not sure if that is entirely correct) -
> here too SD worsened by around 40%. I can't explain it, since the
> virtio-net driver runs lock free once sch_direct_xmit gets
> HARD_TX_LOCK for the specific txq. Maybe the SD calculation is not strictly
> correct since
> more threads are now running parallel and load is higher? Eg, if you
> compare SD between
> #netperfs = 8 vs 16 for original code (cut-n-paste relevant columns
> only) ...
> 
> N#   BW      SD
> 8    24356   88
> 16   23587   375
> 
> ... SD has increased more than 4 times for the same BW.
> 
> > What happens with a single netperf?
> > host -> guest performance with TCP and small packet speed
> > are also worth measuring.
> 
> OK, I will do this and send the results later today.
> 
> > At some level, host/guest communication is easy in that we don't really
> > care which queue is used.  I would like to give some thought (and
> > testing) to how is this going to work with a real NIC card and packet
> > steering at the backend.
> > Any idea?
> 
> I have done a little testing with guest -> remote server both
> using a bridge and with macvtap (mq is required only for rx).
> I didn't understand what you mean by packet steering though,
> is it whether packets go out of the NIC on different queues?
> If so, I verified that is the case by putting a counter and
> displaying through /debug interface on the host. dev_queue_xmit
> on the host handles it by calling dev_pick_tx().
> 
> > > Guest interrupts for a 4 TXQ device after a 5 min test:
> > > # egrep "virtio0|CPU" /proc/interrupts
> > >       CPU0     CPU1     CPU2     CPU3
> > > 40:   0        0        0        0        PCI-MSI-edge  virtio0-config
> > > 41:   126955   126912   126505   126940   PCI-MSI-edge  virtio0-input
> > > 42:   108583   107787   107853   107716   PCI-MSI-edge  virtio0-output.0
> > > 43:   300278   297653   299378   300554   PCI-MSI-edge  virtio0-output.1
> > > 44:   372607   374884   371092   372011   PCI-MSI-edge  virtio0-output.2
> > > 45:   162042   162261   163623   162923   PCI-MSI-edge  virtio0-output.3
> >
> > Does this mean each interrupt is constantly bouncing between CPUs?
> 
> Yes. I didn't do *any* tuning for the tests. The only "tuning"
> was to use 64K IO size with netperf. When I ran default netperf
> (16K), I got a little lesser improvement in BW and worse(!) SD
> than with 64K.
> 
> Thanks,
> 
> - KK


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
Avi Kivity  wrote on 09/08/2010 02:58:21 PM:

> >>> 1. This feature was first implemented with a single vhost.
> >>>  Testing showed 3-8% performance gain for upto 8 netperf
> >>>  sessions (and sometimes 16), but BW dropped with more
> >>>  sessions.  However, implementing per-txq vhost improved
> >>>  BW significantly all the way to 128 sessions.
> >> Why were vhost kernel changes required?  Can't you just instantiate more
> >> vhost queues?
> > I did try using a single thread processing packets from multiple
> > vq's on host, but the BW dropped beyond a certain number of
> > sessions.
>
> Oh - so the interface has not changed (which can be seen from the
> patch).  That was my concern, I remembered that we planned for vhost-net
> to be multiqueue-ready.
>
> The new guest and qemu code work with old vhost-net, just with reduced
> performance, yes?

Yes, I have tested new guest/qemu with old vhost but using
#numtxqs=1 (or not passing any arguments at all to qemu to
enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost,
since vhost_net_set_backend in the unmodified vhost checks
for boundary overflow.

I have also tested running an unmodified guest with new
vhost/qemu, but qemu should not specify numtxqs>1.

> > Are you suggesting this
> > combination:
> >IRQ on guest:
> >   40: CPU0
> >   41: CPU1
> >   42: CPU2
> >   43: CPU3 (all CPUs are on socket #0)
> >vhost:
> >   thread #0:  CPU0
> >   thread #1:  CPU1
> >   thread #2:  CPU2
> >   thread #3:  CPU3
> >qemu:
> >   thread #0:  CPU4
> >   thread #1:  CPU5
> >   thread #2:  CPU6
> >   thread #3:  CPU7 (all CPUs are on socket#1)
>
> May be better to put vcpu threads and vhost threads on the same socket.
>
> Also need to affine host interrupts.
>
> >netperf/netserver:
> >   Run on CPUs 0-4 on both sides
> >
> > The reason I did not optimize anything from user space is because
> > I felt showing the default works reasonably well is important.
>
> Definitely.  Heavy tuning is not a useful path for general end users.
> We need to make sure the scheduler is able to arrive at the optimal
> layout without pinning (but perhaps with hints).

OK, I will see if I can get results with this.

Thanks for your suggestions,

- KK



Re: [PATCH] KVM: x86: fixup set_efer()

2010-09-08 Thread Avi Kivity

 On 09/04/2010 04:29 PM, Hillf Danton wrote:
The second call to kvm_mmu_reset_context() seems unnecessary and is 
removed.



@@ -783,10 +783,6 @@ static int set_efer(struct kvm_vcpu *vcp
 	vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled;
 	kvm_mmu_reset_context(vcpu);
-	/* Update reserved bits */
-	if ((efer ^ old_efer) & EFER_NX)
-		kvm_mmu_reset_context(vcpu);
-
 	return 0;
 }


Hm.  As far as I can tell, it's the first call that is unnecessary.  
I'll look at the history and try to understand why it was introduced.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH 0/2] kvm/e500v2: MMU optimization

2010-09-08 Thread Liu Yu
The patchset aims at mapping guest TLB1 to host TLB0.
And it includes:
[PATCH 1/2] kvm/e500v2: Remove shadow tlb
[PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0

The reason we need patch 1 is that it makes things simple and flexible.
Applying only patch 1 also makes kvm work.



[PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0

2010-09-08 Thread Liu Yu
Currently guest TLB1 is mapped to host TLB1.
As the host kernel only provides 4K non-contiguous pages,
we have to break guest large mappings into 4K shadow mappings.
These 4K shadow mappings are then mapped into host TLB1 on the fly.
As host TLB1 only has 13 free entries, there are serious TLB misses.

Since e500v2 has a large number of TLB0 entries,
it should help to map those 4K shadow mappings to host TLB0.
To achieve this, we need to unlink guest tlb and host tlb,
so that guest TLB1 mappings can route to any host TLB0 entries freely.

Pages/mappings are treated in the same way as host tlb entries.
This patch removes the link between pages and guest tlb entries to do the unlink.
It also keeps host_tlb0_ref in each vcpu to track pages.
Then it's easy to map guest TLB1 to host TLB0.

In a guest ramdisk boot test (where the guest mainly uses TLB1),
with this patch the tlb miss count goes down by 90%.

Signed-off-by: Liu Yu 
---
 arch/powerpc/include/asm/kvm_e500.h |7 +-
 arch/powerpc/kvm/e500.c |4 +
 arch/powerpc/kvm/e500_tlb.c |  280 ---
 arch/powerpc/kvm/e500_tlb.h |1 +
 4 files changed, 104 insertions(+), 188 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_e500.h 
b/arch/powerpc/include/asm/kvm_e500.h
index cb785f9..16c0ed0 100644
--- a/arch/powerpc/include/asm/kvm_e500.h
+++ b/arch/powerpc/include/asm/kvm_e500.h
@@ -37,13 +37,10 @@ struct tlbe_ref {
 struct kvmppc_vcpu_e500 {
/* Unmodified copy of the guest's TLB. */
struct tlbe *guest_tlb[E500_TLB_NUM];
-   /* TLB that's actually used when the guest is running. */
-   struct tlbe *shadow_tlb[E500_TLB_NUM];
-   /* Pages which are referenced in the shadow TLB. */
-   struct tlbe_ref *shadow_refs[E500_TLB_NUM];
+   /* Pages which are referenced in host TLB. */
+   struct tlbe_ref *host_tlb0_ref;
 
unsigned int guest_tlb_size[E500_TLB_NUM];
-   unsigned int shadow_tlb_size[E500_TLB_NUM];
unsigned int guest_tlb_nv[E500_TLB_NUM];
 
u32 host_pid[E500_PID_NUM];
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index e8a00b0..14af6d7 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -146,6 +146,10 @@ static int __init kvmppc_e500_init(void)
if (r)
return r;
 
+   r = kvmppc_e500_mmu_init();
+   if (r)
+   return r;
+
/* copy extra E500 exception handlers */
ivor[0] = mfspr(SPRN_IVOR32);
ivor[1] = mfspr(SPRN_IVOR33);
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 0b657af..a6c2320 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -25,9 +25,15 @@
 #include "e500_tlb.h"
 #include "trace.h"
 
-#define to_htlb1_esel(esel) (tlb1_entry_num - (esel) - 1)
+static unsigned int host_tlb0_entry_num;
+static unsigned int host_tlb0_assoc;
+static unsigned int host_tlb0_assoc_bit;
 
-static unsigned int tlb1_entry_num;
+static inline unsigned int get_tlb0_entry_offset(u32 eaddr, u32 esel)
+{
+   return ((eaddr & 0x7F000) >> (12 - host_tlb0_assoc_bit) |
+   (esel & (host_tlb0_assoc - 1)));
+}
 
 void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
 {
@@ -62,11 +68,6 @@ static inline unsigned int tlb0_get_next_victim(
return victim;
 }
 
-static inline unsigned int tlb1_max_shadow_size(void)
-{
-   return tlb1_entry_num - tlbcam_index;
-}
-
 static inline int tlbe_is_writable(struct tlbe *tlbe)
 {
return tlbe->mas3 & (MAS3_SW|MAS3_UW);
@@ -100,7 +101,7 @@ static inline u32 e500_shadow_mas2_attrib(u32 mas2, int 
usermode)
 /*
  * writing shadow tlb entry to host TLB
  */
-static inline void __write_host_tlbe(struct tlbe *stlbe)
+static inline void __host_tlbe_write(struct tlbe *stlbe)
 {
mtspr(SPRN_MAS1, stlbe->mas1);
mtspr(SPRN_MAS2, stlbe->mas2);
@@ -109,25 +110,22 @@ static inline void __write_host_tlbe(struct tlbe *stlbe)
__asm__ __volatile__ ("tlbwe\n" : : );
 }
 
-static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
-   int tlbsel, int esel, struct tlbe *stlbe)
+static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 *vcpu_e500,
+   u32 gvaddr, struct tlbe *stlbe)
 {
-   local_irq_disable();
-   if (tlbsel == 0) {
-   __write_host_tlbe(stlbe);
-   } else {
-   unsigned register mas0;
+   unsigned register mas0;
 
-   mas0 = mfspr(SPRN_MAS0);
+   local_irq_disable();
 
-   mtspr(SPRN_MAS0, MAS0_TLBSEL(1) | 
MAS0_ESEL(to_htlb1_esel(esel)));
-   __write_host_tlbe(stlbe);
+   mas0 = mfspr(SPRN_MAS0);
+   __host_tlbe_write(stlbe);
 
-   mtspr(SPRN_MAS0, mas0);
-   }
local_irq_enable();
-   trace_kvm_stlb_write(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
+
+   trace_kvm_stlb_write(mas0, stlbe->mas1, stlbe->mas2,
stlbe->mas3, stlbe->mas7);
+
+  

[PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-08 Thread Liu Yu
It is unnecessary to keep a shadow tlb.
First, the shadow tlb keeps fixed values, which makes things inflexible.
Second, removing the shadow tlb saves a lot of memory.

This patch removes the shadow tlb and calculates the shadow tlb entry value
before we write it to hardware.

Also we use a new struct tlbe_ref to track the relation
between a guest tlb entry and its page.

Signed-off-by: Liu Yu 
---
 arch/powerpc/include/asm/kvm_e500.h |7 +-
 arch/powerpc/kvm/e500_tlb.c |  287 +--
 2 files changed, 108 insertions(+), 186 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_e500.h 
b/arch/powerpc/include/asm/kvm_e500.h
index 7fea26f..cb785f9 100644
--- a/arch/powerpc/include/asm/kvm_e500.h
+++ b/arch/powerpc/include/asm/kvm_e500.h
@@ -29,13 +29,18 @@ struct tlbe{
u32 mas7;
 };
 
+struct tlbe_ref {
+   struct page *page;
+   struct tlbe *gtlbe;
+};
+
 struct kvmppc_vcpu_e500 {
/* Unmodified copy of the guest's TLB. */
struct tlbe *guest_tlb[E500_TLB_NUM];
/* TLB that's actually used when the guest is running. */
struct tlbe *shadow_tlb[E500_TLB_NUM];
/* Pages which are referenced in the shadow TLB. */
-   struct page **shadow_pages[E500_TLB_NUM];
+   struct tlbe_ref *shadow_refs[E500_TLB_NUM];
 
unsigned int guest_tlb_size[E500_TLB_NUM];
unsigned int shadow_tlb_size[E500_TLB_NUM];
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index f11ca0f..0b657af 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
+ * Copyright (C) 2008, 2010 Freescale Semiconductor, Inc. All rights reserved.
  *
  * Author: Yu Liu, yu@freescale.com
  *
@@ -48,17 +48,6 @@ void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
tlbe->mas3, tlbe->mas7);
}
}
-
-   for (tlbsel = 0; tlbsel < 2; tlbsel++) {
-   printk("Shadow TLB%d:\n", tlbsel);
-   for (i = 0; i < vcpu_e500->shadow_tlb_size[tlbsel]; i++) {
-   tlbe = &vcpu_e500->shadow_tlb[tlbsel][i];
-   if (tlbe->mas1 & MAS1_VALID)
-   printk(" S[%d][%3d] |  %08X | %08X | %08X | 
%08X |\n",
-   tlbsel, i, tlbe->mas1, tlbe->mas2,
-   tlbe->mas3, tlbe->mas7);
-   }
-   }
 }
 
 static inline unsigned int tlb0_get_next_victim(
@@ -121,10 +110,8 @@ static inline void __write_host_tlbe(struct tlbe *stlbe)
 }
 
 static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
-   int tlbsel, int esel)
+   int tlbsel, int esel, struct tlbe *stlbe)
 {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel];
-
local_irq_disable();
if (tlbsel == 0) {
__write_host_tlbe(stlbe);
@@ -139,28 +126,12 @@ static inline void write_host_tlbe(struct 
kvmppc_vcpu_e500 *vcpu_e500,
mtspr(SPRN_MAS0, mas0);
}
local_irq_enable();
+   trace_kvm_stlb_write(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
+   stlbe->mas3, stlbe->mas7);
 }
 
 void kvmppc_e500_tlb_load(struct kvm_vcpu *vcpu, int cpu)
 {
-   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
-   int i;
-   unsigned register mas0;
-
-   /* Load all valid TLB1 entries to reduce guest tlb miss fault */
-   local_irq_disable();
-   mas0 = mfspr(SPRN_MAS0);
-   for (i = 0; i < tlb1_max_shadow_size(); i++) {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[1][i];
-
-   if (get_tlb_v(stlbe)) {
-   mtspr(SPRN_MAS0, MAS0_TLBSEL(1)
-   | MAS0_ESEL(to_htlb1_esel(i)));
-   __write_host_tlbe(stlbe);
-   }
-   }
-   mtspr(SPRN_MAS0, mas0);
-   local_irq_enable();
 }
 
 void kvmppc_e500_tlb_put(struct kvm_vcpu *vcpu)
@@ -202,16 +173,19 @@ static int kvmppc_e500_tlb_index(struct kvmppc_vcpu_e500 
*vcpu_e500,
 }
 
 static void kvmppc_e500_shadow_release(struct kvmppc_vcpu_e500 *vcpu_e500,
-   int tlbsel, int esel)
+   int stlbsel, int sesel)
 {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel];
-   struct page *page = vcpu_e500->shadow_pages[tlbsel][esel];
+   struct tlbe_ref *ref;
+   struct page *page;
+
+   ref = &vcpu_e500->shadow_refs[stlbsel][sesel];
+   page = ref->page;
 
if (page) {
-   vcpu_e500->shadow_pages[tlbsel][esel] = NULL;
+   ref->page = NULL;
 
-   if (get_tlb_v(stlbe)) {
-   if (tlbe_is_writable(stlbe))
+   if (get_tlb_v(ref->gtlbe)) {
+   if (tlbe_is_writable(ref->gtlbe))
kvm_release_page_dirty(page);
 

Re: [PATCH] KVM: x86: fixup kvm_set_cr4()

2010-09-08 Thread Avi Kivity

 On 09/04/2010 03:43 PM, Hillf Danton wrote:


Subject lines such as "fixup $x" are too general.  Try to make them more 
specific.



X86_CR4_VMXE is checked earlier, since
[1] virtualization is not allowed in guest,


Why does that matter?  Note it may change one day.


[2] load_pdptrs() could be saved.


The common case is that the mov does not fault and we have to call 
load_pdptrs() anyway.


It's a little cleaner to check before doing anything, though.



Signed-off-by: Hillf Danton <dhi...@gmail.com>
---

--- o/linux-2.6.36-rc1/arch/x86/kvm/x86.c 2010-08-16 
08:41:38.0 +0800
+++ m/linux-2.6.36-rc1/arch/x86/kvm/x86.c 2010-09-04 
20:25:04.0 +0800

@@ -539,6 +539,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u
 	if (cr4 & CR4_RESERVED_BITS)
 		return 1;
+	if (cr4 & X86_CR4_VMXE)
+		return 1;
+
 	if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE))
 		return 1;
@@ -550,9 +553,6 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u
 	    && !load_pdptrs(vcpu, vcpu->arch.cr3))
 		return 1;
-	if (cr4 & X86_CR4_VMXE)
-		return 1;
-
 	kvm_x86_ops->set_cr4(vcpu, cr4);
 	if ((cr4 ^ old_cr4) & pdptr_bits)




--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 19/27] KVM: X86: Propagate fetch faults

2010-09-08 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 02:43:16PM -0400, Marcelo Tosatti wrote:
> On Mon, Sep 06, 2010 at 05:55:58PM +0200, Joerg Roedel wrote:
> > r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
> > +   if (r == X86EMUL_PROPAGATE_FAULT)
> > +   goto done;
> > +
> 
> x86_decode_insn returns -1 / 0 ?

Yes. This looks like a left-over from v2 of the patch-set. I'll check
the path again and remove it if not necessary anymore.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Avi Kivity

 On 09/08/2010 12:22 PM, Krishna Kumar2 wrote:

Avi Kivity  wrote on 09/08/2010 01:17:34 PM:


   On 09/08/2010 10:28 AM, Krishna Kumar wrote:

Following patches implement Transmit mq in virtio-net.  Also
included is the user qemu changes.

1. This feature was first implemented with a single vhost.
 Testing showed 3-8% performance gain for upto 8 netperf
 sessions (and sometimes 16), but BW dropped with more
 sessions.  However, implementing per-txq vhost improved
 BW significantly all the way to 128 sessions.

Why were vhost kernel changes required?  Can't you just instantiate more
vhost queues?

I did try using a single thread processing packets from multiple
vq's on host, but the BW dropped beyond a certain number of
sessions.


Oh - so the interface has not changed (which can be seen from the 
patch).  That was my concern, I remembered that we planned for vhost-net 
to be multiqueue-ready.


The new guest and qemu code work with old vhost-net, just with reduced 
performance, yes?



I don't have the code and performance numbers for that
right now since it is a bit ancient, I can try to resuscitate
that if you want.


No need.


Guest interrupts for a 4 TXQ device after a 5 min test:
# egrep "virtio0|CPU" /proc/interrupts
      CPU0     CPU1     CPU2     CPU3
40:   0        0        0        0        PCI-MSI-edge  virtio0-config
41:   126955   126912   126505   126940   PCI-MSI-edge  virtio0-input
42:   108583   107787   107853   107716   PCI-MSI-edge  virtio0-output.0
43:   300278   297653   299378   300554   PCI-MSI-edge  virtio0-output.1
44:   372607   374884   371092   372011   PCI-MSI-edge  virtio0-output.2
45:   162042   162261   163623   162923   PCI-MSI-edge  virtio0-output.3

How are vhost threads and host interrupts distributed?  We need to move
vhost queue threads to be colocated with the related vcpu threads (if no
extra cores are available) or on the same socket (if extra cores are
available).  Similarly, move device interrupts to the same core as the
vhost thread.

All my testing was without any tuning, including binding netperf &
netserver (irqbalance is also off). I assume (maybe wrongly) that
the above might give better results?


I hope so!


Are you suggesting this
combination:
IRQ on guest:
40: CPU0
41: CPU1
42: CPU2
43: CPU3 (all CPUs are on socket #0)
vhost:
thread #0:  CPU0
thread #1:  CPU1
thread #2:  CPU2
thread #3:  CPU3
qemu:
thread #0:  CPU4
thread #1:  CPU5
thread #2:  CPU6
thread #3:  CPU7 (all CPUs are on socket#1)


May be better to put vcpu threads and vhost threads on the same socket.

Also need to affine host interrupts.


netperf/netserver:
Run on CPUs 0-4 on both sides

The reason I did not optimize anything from user space is because
I felt showing the default works reasonably well is important.


Definitely.  Heavy tuning is not a useful path for general end users.  
We need to make sure the scheduler is able to arrive at the optimal 
layout without pinning (but perhaps with hints).
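
For the pinning experiments discussed above, a minimal sketch of pinning a
thread (a vcpu or vhost thread, identified by its host tid) to one CPU;
purely illustrative, and equivalent to running taskset on that tid:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_task_to_cpu(pid_t tid, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* tid 0 means "the calling thread"; otherwise a specific task id. */
	if (sched_setaffinity(tid, sizeof(set), &set) < 0) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}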


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
Hi Michael,

"Michael S. Tsirkin"  wrote on 09/08/2010 01:43:26 PM:

> On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote:
> > 1. mq RX patch is also complete - plan to submit once TX is OK.
>
> It's good that you split patches, I think it would be interesting to see
> the RX patches at least once to complete the picture.
> You could make it a separate patchset, tag them as RFC.

OK, I need to re-do some parts of it, since I started the TX only
branch a couple of weeks earlier and the RX side is outdated. I
will try to send that out in the next couple of days, as you say
it will help to complete the picture. Reasons to send it only TX
now:

- Reduce size of patch and complexity
- I didn't get much improvement on multiple RX patch (netperf from
  host -> guest), so needed some time to figure out the reason and
  fix it.

Thanks,

- KK



Re: [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker

2010-09-08 Thread Roedel, Joerg
On Mon, Sep 06, 2010 at 02:05:35PM -0400, Avi Kivity wrote:
>   On 09/06/2010 06:55 PM, Joerg Roedel wrote:
> > This patch introduces a mmu-callback to translate gpa
> > addresses in the walk_addr code. This is later used to
> > translate l2_gpa addresses into l1_gpa addresses.
> 
> > @@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
> > return (gpa_t)gfn<<  PAGE_SHIFT;
> >   }
> >
> > +static inline gfn_t gpa_to_gfn(gpa_t gpa)
> > +{
> > +   return (gfn_t)gpa>>  PAGE_SHIFT;
> > +}
> > +
> 
> That's a bug - gfn_t may be smaller than gpa_t, so you're truncating 
> just before the shift.  Note the casts in the surrounding functions are 
> widening, not narrowing.
> 
> However, gfn_t is u64 so the bug is only theoretical.

Will fix that in v4 too. Thanks.
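
For reference, a shift-before-narrowing variant along the lines Avi
describes would look like this (just a sketch, not necessarily the exact
change that went into v4):

static inline gfn_t gpa_to_gfn(gpa_t gpa)
{
	/* Shift the 64-bit gpa first, then narrow, so no high bits are lost. */
	return (gfn_t)(gpa >> PAGE_SHIFT);
}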

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 09/08/2010 01:40:11 PM:

> > ______________________________________________________________________
> > TCP (#numtxqs=2)
> > N#  BW1     BW2     (%)      SD1    SD2    (%)      RSD1    RSD2    (%)
> > ______________________________________________________________________
> > 4   26387   40716   (54.30)  20     28     (40.00)  86      85      (-1.16)
> > 8   24356   41843   (71.79)  88     129    (46.59)  372     362     (-2.68)
> > 16  23587   40546   (71.89)  375    564    (50.40)  1558    1519    (-2.50)
> > 32  22927   39490   (72.24)  1617   2171   (34.26)  6694    5722    (-14.52)
> > 48  23067   39238   (70.10)  3931   5170   (31.51)  15823   13552   (-14.35)
> > 64  22927   38750   (69.01)  7142   9914   (38.81)  28972   26173   (-9.66)
> > 96  22568   38520   (70.68)  16258  27844  (71.26)  65944   73031   (10.74)
>
> That's a significant hit in TCP SD. Is it caused by the imbalance between
> number of queues for TX and RX? Since you mention RX is complete,
> maybe measure with a balanced TX/RX?

Yes, I am not sure why it is so high. I found the same with #RX=#TX
too. As a hack, I tried ixgbe without MQ (set "indices=1" before
calling alloc_etherdev_mq, not sure if that is entirely correct) -
here too SD worsened by around 40%. I can't explain it, since the
virtio-net driver runs lock free once sch_direct_xmit gets
HARD_TX_LOCK for the specific txq. Maybe the SD calculation is not strictly
correct since
more threads are now running parallel and load is higher? Eg, if you
compare SD between
#netperfs = 8 vs 16 for original code (cut-n-paste relevant columns
only) ...

N#   BW      SD
8    24356   88
16   23587   375

... SD has increased more than 4 times for the same BW.

> What happens with a single netperf?
> host -> guest performance with TCP and small packet speed
> are also worth measuring.

OK, I will do this and send the results later today.

> At some level, host/guest communication is easy in that we don't really
> care which queue is used.  I would like to give some thought (and
> testing) to how is this going to work with a real NIC card and packet
> steering at the backend.
> Any idea?

I have done a little testing with guest -> remote server both
using a bridge and with macvtap (mq is required only for rx).
I didn't understand what you mean by packet steering though,
is it whether packets go out of the NIC on different queues?
If so, I verified that is the case by putting a counter and
displaying through /debug interface on the host. dev_queue_xmit
on the host handles it by calling dev_pick_tx().

> > Guest interrupts for a 4 TXQ device after a 5 min test:
> > # egrep "virtio0|CPU" /proc/interrupts
> >       CPU0     CPU1     CPU2     CPU3
> > 40:   0        0        0        0        PCI-MSI-edge  virtio0-config
> > 41:   126955   126912   126505   126940   PCI-MSI-edge  virtio0-input
> > 42:   108583   107787   107853   107716   PCI-MSI-edge  virtio0-output.0
> > 43:   300278   297653   299378   300554   PCI-MSI-edge  virtio0-output.1
> > 44:   372607   374884   371092   372011   PCI-MSI-edge  virtio0-output.2
> > 45:   162042   162261   163623   162923   PCI-MSI-edge  virtio0-output.3
>
> Does this mean each interrupt is constantly bouncing between CPUs?

Yes. I didn't do *any* tuning for the tests. The only "tuning"
was to use a 64K I/O size with netperf. When I ran netperf with the
default (16K), I got a slightly smaller improvement in BW and worse(!)
SD than with 64K.

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar2
Avi Kivity  wrote on 09/08/2010 01:17:34 PM:

>   On 09/08/2010 10:28 AM, Krishna Kumar wrote:
> > Following patches implement Transmit mq in virtio-net.  Also
> > included is the user qemu changes.
> >
> > 1. This feature was first implemented with a single vhost.
> > Testing showed 3-8% performance gain for upto 8 netperf
> > sessions (and sometimes 16), but BW dropped with more
> > sessions.  However, implementing per-txq vhost improved
> > BW significantly all the way to 128 sessions.
>
> Why were vhost kernel changes required?  Can't you just instantiate more
> vhost queues?

I did try using a single thread to process packets from multiple
vq's on the host, but the BW dropped beyond a certain number of
sessions. I don't have the code and performance numbers for that
right now since it is a bit ancient, but I can try to resuscitate
it if you want.
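
Roughly, the single-thread variant described above would look like the
sketch below; the vhost_net/vhost_virtqueue layout matches patch 3/4,
but the worker loop and vhost_vq_has_work() are made up for
illustration:

/* Sketch only: one host worker servicing every TX vq in turn instead
 * of one vhost thread per TX vq.  vq 0 is the RX vq and is handled
 * elsewhere; vhost_vq_has_work() is a hypothetical helper. */
#include <linux/kthread.h>
#include <linux/sched.h>

static int vhost_single_tx_worker(void *data)
{
	struct vhost_net *net = data;

	while (!kthread_should_stop()) {
		bool did_work = false;
		int i;

		for (i = 1; i < net->dev.nvqs; i++) {	/* skip RX at index 0 */
			struct vhost_virtqueue *vq = &net->vqs[i];

			if (vhost_vq_has_work(vq)) {
				handle_tx(vq);		/* as in patch 3/4 */
				did_work = true;
			}
		}
		if (!did_work)
			schedule_timeout_interruptible(1);
	}
	return 0;
}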

> > Guest interrupts for a 4 TXQ device after a 5 min test:
> > # egrep "virtio0|CPU" /proc/interrupts
> >       CPU0     CPU1     CPU2    CPU3
> > 40:   0        0        0       0        PCI-MSI-edge  virtio0-config
> > 41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
> > 42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
> > 43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-output.1
> > 44:   372607   374884   371092  372011   PCI-MSI-edge  virtio0-output.2
> > 45:   162042   162261   163623  162923   PCI-MSI-edge  virtio0-output.3
>
> How are vhost threads and host interrupts distributed?  We need to move
> vhost queue threads to be colocated with the related vcpu threads (if no
> extra cores are available) or on the same socket (if extra cores are
> available).  Similarly, move device interrupts to the same core as the
> vhost thread.

All my testing was without any tuning - no binding of netperf &
netserver either (irqbalance is also off). I assume (maybe wrongly) that
the above might give better results? Are you suggesting this
combination:
IRQ on guest:
40: CPU0
41: CPU1
42: CPU2
43: CPU3 (all CPUs are on socket #0)
vhost:
thread #0:  CPU0
thread #1:  CPU1
thread #2:  CPU2
thread #3:  CPU3
qemu:
thread #0:  CPU4
thread #1:  CPU5
thread #2:  CPU6
thread #3:  CPU7 (all CPUs are on socket#1)
netperf/netserver:
Run on CPUs 0-4 on both sides

I did not optimize anything from user space because I felt it was
important to show that the defaults work reasonably well.
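
For reference, pinning a given thread for such an experiment is just an
affinity call per thread id; the sketch below is a hypothetical
standalone helper (not from the patches), and the vhost kthreads and
device IRQs would be placed with taskset and /proc/irq/*/smp_affinity
in the same spirit:

/* Sketch only: pin a thread (by tid) to one CPU, e.g. to place each
 * qemu vcpu/IO thread according to the layout above. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int pin_tid_to_cpu(pid_t tid, int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(tid, sizeof(mask), &mask) < 0) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <tid> <cpu>\n", argv[0]);
		return 1;
	}
	return pin_tid_to_cpu(atoi(argv[1]), atoi(argv[2])) ? 1 : 0;
}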

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function

2010-09-08 Thread Roedel, Joerg
On Wed, Sep 08, 2010 at 03:16:59AM -0400, Avi Kivity wrote:
>   On 09/07/2010 11:39 PM, Marcelo Tosatti wrote:
> >
> >> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
> >>root_gfn = pdptr>>  PAGE_SHIFT;
> >>if (mmu_check_root(vcpu, root_gfn))
> >>return 1;
> >> -  } else if (vcpu->arch.mmu.root_level == 0)
> >> -  root_gfn = 0;
> >> -  if (vcpu->arch.mmu.direct_map) {
> >> -  direct = 1;
> >> -  root_gfn = i<<  30;
> >>}
> >>spin_lock(&vcpu->kvm->mmu_lock);
> >>kvm_mmu_free_some_pages(vcpu);
> >>sp = kvm_mmu_get_page(vcpu, root_gfn, i<<  30,
> >> -PT32_ROOT_LEVEL, direct,
> >> +PT32_ROOT_LEVEL, 0,
> >>  ACC_ALL, NULL);
> > Should not write protect the gfn for nonpaging mode.
> >
> 
> nonpaging mode should have direct_map set, so wouldn't enter this path 
> at all.

Hmm, actually the nonpaging path does not set direct_map. I'll fix this
too in v4. Thanks.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking

2010-09-08 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 01:48:05PM -0400, Marcelo Tosatti wrote:
> On Mon, Sep 06, 2010 at 05:55:53PM +0200, Joerg Roedel wrote:
> > This patch uses kvm_read_guest_page_tdp to make the
> > walk_addr_generic functions suitable for two-level page
> > table walking.
> > 
> > Signed-off-by: Joerg Roedel 
> > ---
> >  arch/x86/kvm/paging_tmpl.h |   27 ---
> >  1 files changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > index cd59af1..a5b5759 100644
> > --- a/arch/x86/kvm/paging_tmpl.h
> > +++ b/arch/x86/kvm/paging_tmpl.h
> > @@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker 
> > *walker,
> > unsigned index, pt_access, uninitialized_var(pte_access);
> > gpa_t pte_gpa;
> > bool eperm, present, rsvd_fault;
> > +   int offset;
> > +   u32 error = 0;
> >  
> > trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
> >  fetch_fault);
> > @@ -153,12 +155,13 @@ walk:
> > index = PT_INDEX(addr, walker->level);
> >  
> > table_gfn = gpte_to_gfn(pte);
> > -   pte_gpa = gfn_to_gpa(table_gfn);
> > -   pte_gpa += index * sizeof(pt_element_t);
> > +   offset= index * sizeof(pt_element_t);
> > +   pte_gpa   = gfn_to_gpa(table_gfn) + offset;
> > walker->table_gfn[walker->level - 1] = table_gfn;
> > walker->pte_gpa[walker->level - 1] = pte_gpa;
> >  
> > -   if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) {
> > +   if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset,
> > +   sizeof(pte), &error)) {
> > present = false;
> > break;
> > }
> 
> If there is failure reading the nested page tables here, you fill
> vcpu->arch.fault. But the nested fault error values will be overwritten
> at the end of walk_addr() by the original fault values?

True. Thanks for pointing that out. I will write a test-case for that
too. The tests I have already implemented show that the error code is
sometimes not reported correctly either, so I decided to do a v4 of
this patch-set with all the issues found so far fixed.

Thanks for your review.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/2] qemu-kvm: use usptream eventfd code

2010-09-08 Thread Avi Kivity

 On 09/07/2010 08:25 PM, Marcelo Tosatti wrote:

On Tue, Sep 07, 2010 at 11:21:32AM +0300, Avi Kivity wrote:

  On 09/06/2010 11:20 PM, Marcelo Tosatti wrote:

Upstream code is equivalent.

Signed-off-by: Marcelo Tosatti

Index: qemu-kvm/cpus.c
===
--- qemu-kvm.orig/cpus.c
+++ qemu-kvm/cpus.c
@@ -290,11 +290,6 @@ void qemu_notify_event(void)
  {
  CPUState *env = cpu_single_env;

-if (kvm_enabled()) {
-qemu_kvm_notify_work();
-return;
-}
-
  qemu_event_increment ();
  if (env) {
  cpu_exit(env);

qemu_event_increment() is indeed equivalent, but what about the
rest?  Are we guaranteed that cpu_single_env == NULL?

No, it's not NULL. But env->current is, so it's fine.


Ok, thanks.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Michael S. Tsirkin
On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote:
> 1. mq RX patch is also complete - plan to submit once TX is OK.

It's good that you split the patches. I think it would be interesting to see
the RX patches at least once to complete the picture.
You could make them a separate patchset, tagged as RFC.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Michael S. Tsirkin
On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote:
> Following patches implement Transmit mq in virtio-net.  Also
> included is the user qemu changes.
> 
> 1. This feature was first implemented with a single vhost.
>Testing showed 3-8% performance gain for upto 8 netperf
>sessions (and sometimes 16), but BW dropped with more
>sessions.  However, implementing per-txq vhost improved
>BW significantly all the way to 128 sessions.
> 2. For this mq TX patch, 1 daemon is created for RX and 'n'
>daemons for the 'n' TXQ's, for a total of (n+1) daemons.
>The (subsequent) RX mq patch changes that to a total of
>'n' daemons, where RX and TX vq's share 1 daemon.
> 3. Service Demand increases for TCP, but significantly
>improves for UDP.
> 4. Interoperability: Many combinations, but not all, of
>qemu, host, guest tested together.
> 
> 
>   Enabling mq on virtio:
>   ---
> 
> When following options are passed to qemu:
> - smp > 1
> - vhost=on
> - mq=on (new option, default:off)
> then #txqueues = #cpus.  The #txqueues can be changed by using
> an optional 'numtxqs' option. e.g.  for a smp=4 guest:
> vhost=on,mq=on ->   #txqueues = 4
> vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
> vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2
> 
> 
>Performance (guest -> local host):
>---
> 
> System configuration:
> Host:  8 Intel Xeon, 8 GB memory
> Guest: 4 cpus, 2 GB memory
> All testing without any tuning, and TCP netperf with 64K I/O
> ___
>    TCP (#numtxqs=2)
> N#  BW1    BW2   (%)       SD1    SD2   (%)       RSD1   RSD2   (%)
> ___
> 4   26387  40716 (54.30)   20     28    (40.00)   86     85     (-1.16)
> 8   24356  41843 (71.79)   88     129   (46.59)   372    362    (-2.68)
> 16  23587  40546 (71.89)   375    564   (50.40)   1558   1519   (-2.50)
> 32  22927  39490 (72.24)   1617   2171  (34.26)   6694   5722   (-14.52)
> 48  23067  39238 (70.10)   3931   5170  (31.51)   15823  13552  (-14.35)
> 64  22927  38750 (69.01)   7142   9914  (38.81)   28972  26173  (-9.66)
> 96  22568  38520 (70.68)   16258  27844 (71.26)   65944  73031  (10.74)

That's a significant hit in TCP SD. Is it caused by the imbalance between
number of queues for TX and RX? Since you mention RX is complete,
maybe measure with a balanced TX/RX?


> ___
>    UDP (#numtxqs=8)
> N#  BW1    BW2   (%)        SD1    SD2    (%)
> __
> 4   29836  56761 (90.24)    67     63     (-5.97)
> 8   27666  63767 (130.48)   326    265    (-18.71)
> 16  25452  60665 (138.35)   1396   1269   (-9.09)
> 32  26172  63491 (142.59)   5617   4202   (-25.19)
> 48  26146  64629 (147.18)   12813  9316   (-27.29)
> 64  25575  65448 (155.90)   23063  16346  (-29.12)
> 128 26454  63772 (141.06)   91054  85051  (-6.59)
> __
> N#: Number of netperf sessions, 90 sec runs
> BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
>   SD for original code
> BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
>   SD for new code. e.g. BW2=40716 means average BW2 was
>   20358 mbps.
> 

What happens with a single netperf?
host -> guest performance with TCP and small packet speed
are also worth measuring.


>Next steps:
>---
> 
> 1. mq RX patch is also complete - plan to submit once TX is OK.
> 2. Cache-align data structures: I didn't see any BW/SD improvement
>after making the sq's (and similarly for vhost) cache-aligned
>statically:
> struct virtnet_info {
> ...
> struct send_queue sq[16] cacheline_aligned_in_smp;
> ...
> };
> 

At some level, host/guest communication is easy in that we don't really
care which queue is used.  I would like to give some thought (and
testing) to how this is going to work with a real NIC card and packet
steering at the backend.
Any idea?

> Guest interrupts for a 4 TXQ device after a 5 min test:
> # egrep "virtio0|CPU" /proc/interrupts 
>       CPU0     CPU1     CPU2    CPU3
> 40:   0        0        0       0        PCI-MSI-edge  virtio0-config
> 41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
> 42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
> 43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-ou

Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Avi Kivity

 On 09/08/2010 10:28 AM, Krishna Kumar wrote:

Following patches implement Transmit mq in virtio-net.  Also
included is the user qemu changes.

1. This feature was first implemented with a single vhost.
Testing showed 3-8% performance gain for upto 8 netperf
sessions (and sometimes 16), but BW dropped with more
sessions.  However, implementing per-txq vhost improved
BW significantly all the way to 128 sessions.


Why were vhost kernel changes required?  Can't you just instantiate more 
vhost queues?



2. For this mq TX patch, 1 daemon is created for RX and 'n'
daemons for the 'n' TXQ's, for a total of (n+1) daemons.
The (subsequent) RX mq patch changes that to a total of
'n' daemons, where RX and TX vq's share 1 daemon.
3. Service Demand increases for TCP, but significantly
improves for UDP.
4. Interoperability: Many combinations, but not all, of
qemu, host, guest tested together.


Please update the virtio-pci spec @ http://ozlabs.org/~rusty/virtio-spec/.



   Enabling mq on virtio:
   ---

When following options are passed to qemu:
 - smp > 1
 - vhost=on
 - mq=on (new option, default:off)
then #txqueues = #cpus.  The #txqueues can be changed by using
an optional 'numtxqs' option. e.g.  for a smp=4 guest:
 vhost=on,mq=on ->   #txqueues = 4
 vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
 vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2


Performance (guest ->  local host):
---

System configuration:
 Host:  8 Intel Xeon, 8 GB memory
 Guest: 4 cpus, 2 GB memory
All testing without any tuning, and TCP netperf with 64K I/O
___
   TCP (#numtxqs=2)
N#  BW1    BW2   (%)       SD1    SD2   (%)       RSD1   RSD2   (%)
___
4   26387  40716 (54.30)   20     28    (40.00)   86     85     (-1.16)
8   24356  41843 (71.79)   88     129   (46.59)   372    362    (-2.68)
16  23587  40546 (71.89)   375    564   (50.40)   1558   1519   (-2.50)
32  22927  39490 (72.24)   1617   2171  (34.26)   6694   5722   (-14.52)
48  23067  39238 (70.10)   3931   5170  (31.51)   15823  13552  (-14.35)
64  22927  38750 (69.01)   7142   9914  (38.81)   28972  26173  (-9.66)
96  22568  38520 (70.68)   16258  27844 (71.26)   65944  73031  (10.74)
___
   UDP (#numtxqs=8)
N#  BW1    BW2   (%)        SD1    SD2    (%)
__
4   29836  56761 (90.24)    67     63     (-5.97)
8   27666  63767 (130.48)   326    265    (-18.71)
16  25452  60665 (138.35)   1396   1269   (-9.09)
32  26172  63491 (142.59)   5617   4202   (-25.19)
48  26146  64629 (147.18)   12813  9316   (-27.29)
64  25575  65448 (155.90)   23063  16346  (-29.12)
128 26454  63772 (141.06)   91054  85051  (-6.59)


Impressive results.


__
N#: Number of netperf sessions, 90 sec runs
BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
   SD for original code
BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
   SD for new code. e.g. BW2=40716 means average BW2 was
   20358 mbps.


Next steps:
---

1. mq RX patch is also complete - plan to submit once TX is OK.
2. Cache-align data structures: I didn't see any BW/SD improvement
after making the sq's (and similarly for vhost) cache-aligned
statically:
 struct virtnet_info {
 ...
 struct send_queue sq[16] cacheline_aligned_in_smp;
 ...
 };

Guest interrupts for a 4 TXQ device after a 5 min test:
# egrep "virtio0|CPU" /proc/interrupts
      CPU0     CPU1     CPU2    CPU3
40:   0        0        0       0        PCI-MSI-edge  virtio0-config
41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-output.1
44:   372607   374884   371092  372011   PCI-MSI-edge  virtio0-output.2
45:   162042   162261   163623  162923   PCI-MSI-edge  virtio0-output.3


How are vhost threads and host interrupts distributed?  We need to move 
vhost queue threads to be colocated with the related vcpu threads (if no 
extra cores are available) or on the same socket (if extra cores are 
available).  Similarly, move device interrupts to the same core as the 
vhost thread.




--
I have a truly marvellous patch that fixes the bug which

[RFC PATCH 4/4] qemu changes

2010-09-08 Thread Krishna Kumar
Changes in qemu to support mq TX.

Signed-off-by: Krishna Kumar 
---
 hw/vhost.c  |8 ++-
 hw/vhost.h  |2 
 hw/vhost_net.c  |   16 +--
 hw/vhost_net.h  |2 
 hw/virtio-net.c |   97 ++
 hw/virtio-net.h |5 ++
 hw/virtio-pci.c |2 
 net.c   |   17 
 net.h   |1 
 net/tap.c   |   61 +---
 10 files changed, 155 insertions(+), 56 deletions(-)

diff -ruNp org/hw/vhost.c new/hw/vhost.c
--- org/hw/vhost.c  2010-08-09 09:51:58.0 +0530
+++ new/hw/vhost.c  2010-09-08 12:54:50.0 +0530
@@ -599,23 +599,27 @@ static void vhost_virtqueue_cleanup(stru
   0, virtio_queue_get_desc_size(vdev, idx));
 }
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd)
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, int numtxqs)
 {
 uint64_t features;
 int r;
 if (devfd >= 0) {
 hdev->control = devfd;
+hdev->nvqs = 2;
 } else {
 hdev->control = open("/dev/vhost-net", O_RDWR);
 if (hdev->control < 0) {
 return -errno;
 }
 }
-r = ioctl(hdev->control, VHOST_SET_OWNER, NULL);
+
+r = ioctl(hdev->control, VHOST_SET_OWNER, numtxqs);
 if (r < 0) {
 goto fail;
 }
 
+hdev->nvqs = numtxqs + 1;
+
 r = ioctl(hdev->control, VHOST_GET_FEATURES, &features);
 if (r < 0) {
 goto fail;
diff -ruNp org/hw/vhost.h new/hw/vhost.h
--- org/hw/vhost.h  2010-07-01 11:42:09.0 +0530
+++ new/hw/vhost.h  2010-09-08 12:54:50.0 +0530
@@ -40,7 +40,7 @@ struct vhost_dev {
 unsigned long long log_size;
 };
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd);
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, int nvqs);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
diff -ruNp org/hw/vhost_net.c new/hw/vhost_net.c
--- org/hw/vhost_net.c  2010-08-09 09:51:58.0 +0530
+++ new/hw/vhost_net.c  2010-09-08 12:54:50.0 +0530
@@ -36,7 +36,8 @@
 
 struct vhost_net {
 struct vhost_dev dev;
-struct vhost_virtqueue vqs[2];
+struct vhost_virtqueue *vqs;
+int nvqs;
 int backend;
 VLANClientState *vc;
 };
@@ -76,7 +77,8 @@ static int vhost_net_get_fd(VLANClientSt
 }
 }
 
-struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+int numtxqs)
 {
 int r;
 struct vhost_net *net = qemu_malloc(sizeof *net);
@@ -93,10 +95,14 @@ struct vhost_net *vhost_net_init(VLANCli
 (1 << VHOST_NET_F_VIRTIO_NET_HDR);
 net->backend = r;
 
-r = vhost_dev_init(&net->dev, devfd);
+r = vhost_dev_init(&net->dev, devfd, numtxqs);
 if (r < 0) {
 goto fail;
 }
+
+net->nvqs = numtxqs + 1;
+net->vqs = qemu_malloc(net->nvqs * (sizeof *net->vqs));
+
 if (~net->dev.features & net->dev.backend_features) {
 fprintf(stderr, "vhost lacks feature mask %" PRIu64 " for backend\n",
 (uint64_t)(~net->dev.features & net->dev.backend_features));
@@ -118,7 +124,6 @@ int vhost_net_start(struct vhost_net *ne
 struct vhost_vring_file file = { };
 int r;
 
-net->dev.nvqs = 2;
 net->dev.vqs = net->vqs;
 r = vhost_dev_start(&net->dev, dev);
 if (r < 0) {
@@ -166,7 +171,8 @@ void vhost_net_cleanup(struct vhost_net 
 qemu_free(net);
 }
 #else
-struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+int nvqs)
 {
return NULL;
 }
diff -ruNp org/hw/vhost_net.h new/hw/vhost_net.h
--- org/hw/vhost_net.h  2010-07-01 11:42:09.0 +0530
+++ new/hw/vhost_net.h  2010-09-08 12:54:50.0 +0530
@@ -6,7 +6,7 @@
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
-VHostNetState *vhost_net_init(VLANClientState *backend, int devfd);
+VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, int nvqs);
 
 int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
diff -ruNp org/hw/virtio-net.c new/hw/virtio-net.c
--- org/hw/virtio-net.c 2010-07-19 12:41:28.0 +0530
+++ new/hw/virtio-net.c 2010-09-08 12:54:50.0 +0530
@@ -32,17 +32,17 @@ typedef struct VirtIONet
 uint8_t mac[ETH_ALEN];
 uint16_t status;
 VirtQueue *rx_vq;
-VirtQueue *tx_vq;
+VirtQueue **tx_vq;
 VirtQueue *ctrl_vq;
 NICState *nic;
-QEMUTimer *tx_timer;
-int tx_timer_active;
+QEMUTimer **tx_timer;
+int *tx_timer_active;
 uint32_t has_vnet_hdr;
 uint8_t has_ufo;
 struct {
 VirtQueueElement elem;
 ssize_t len;
-} async_tx;
+} *async_tx;
 int mergeable_rx_bufs;
 

[RFC PATCH 3/4] Changes for vhost

2010-09-08 Thread Krishna Kumar
Changes for mq vhost.

vhost_net_open is changed to allocate a vhost_net and
return.  The remaining initializations are delayed till
SET_OWNER. SET_OWNER is changed so that the argument
is used to figure out how many txqs to use.  Unmodified
qemu's will pass NULL, so this is recognized and handled
as numtxqs=1.

Besides changing handle_tx to use 'vq', this patch also
changes handle_rx to take the vq as a parameter.  The mq RX
patch requires this change, but until then it is consistent
(and less confusing) to keep the interfaces for handling
rx and tx similar.
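
As an illustration of the SET_OWNER change described above (a sketch
only, not a hunk from this patch; vhost_net_alloc_vqs() is a made-up
name), the kernel side can treat the ioctl argument as the TX queue
count, with NULL from unmodified qemu falling back to one queue:

/* Sketch only: derive numtxqs from the VHOST_SET_OWNER argument.  The
 * qemu change in patch 4/4 passes numtxqs as the ioctl argument, and
 * unmodified qemu passes NULL, which maps to a single TX queue here. */
static long vhost_net_set_owner(struct vhost_net *n, unsigned long arg)
{
	int numtxqs = arg ? (int)arg : 1;

	if (numtxqs < 1 || numtxqs > VIRTIO_MAX_TXQS)
		return -EINVAL;

	/* one RX vq plus numtxqs TX vqs, each with its own poll state */
	return vhost_net_alloc_vqs(n, numtxqs + 1);
}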

Signed-off-by: Krishna Kumar 
---
 drivers/vhost/net.c   |  272 ++--
 drivers/vhost/vhost.c |  152 ++
 drivers/vhost/vhost.h |   15 +-
 3 files changed, 289 insertions(+), 150 deletions(-)

diff -ruNp org/drivers/vhost/net.c tx_only/drivers/vhost/net.c
--- org/drivers/vhost/net.c 2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/vhost/net.c 2010-09-08 10:20:54.0 +0530
@@ -33,12 +33,6 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x8
 
-enum {
-   VHOST_NET_VQ_RX = 0,
-   VHOST_NET_VQ_TX = 1,
-   VHOST_NET_VQ_MAX = 2,
-};
-
 enum vhost_net_poll_state {
VHOST_NET_POLL_DISABLED = 0,
VHOST_NET_POLL_STARTED = 1,
@@ -47,12 +41,12 @@ enum vhost_net_poll_state {
 
 struct vhost_net {
struct vhost_dev dev;
-   struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
-   struct vhost_poll poll[VHOST_NET_VQ_MAX];
+   struct vhost_virtqueue *vqs;
+   struct vhost_poll *poll;
/* Tells us whether we are polling a socket for TX.
 * We only do this when socket buffer fills up.
 * Protected by tx vq lock. */
-   enum vhost_net_poll_state tx_poll_state;
+   enum vhost_net_poll_state *tx_poll_state;
 };
 
 /* Pop first len bytes from iovec. Return number of segments used. */
@@ -92,28 +86,28 @@ static void copy_iovec_hdr(const struct 
 }
 
 /* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
+static void tx_poll_stop(struct vhost_net *net, int qnum)
 {
-   if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
+   if (likely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STARTED))
return;
-   vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
-   net->tx_poll_state = VHOST_NET_POLL_STOPPED;
+   vhost_poll_stop(&net->poll[qnum]);
+   net->tx_poll_state[qnum] = VHOST_NET_POLL_STOPPED;
 }
 
 /* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
+static void tx_poll_start(struct vhost_net *net, struct socket *sock, int qnum)
 {
-   if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
+   if (unlikely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STOPPED))
return;
-   vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
-   net->tx_poll_state = VHOST_NET_POLL_STARTED;
+   vhost_poll_start(&net->poll[qnum], sock->file);
+   net->tx_poll_state[qnum] = VHOST_NET_POLL_STARTED;
 }
 
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
-static void handle_tx(struct vhost_net *net)
+static void handle_tx(struct vhost_virtqueue *vq)
 {
-   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+   struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
unsigned out, in, s;
int head;
struct msghdr msg = {
@@ -134,7 +128,7 @@ static void handle_tx(struct vhost_net *
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf) {
mutex_lock(&vq->mutex);
-   tx_poll_start(net, sock);
+   tx_poll_start(net, sock, vq->qnum);
mutex_unlock(&vq->mutex);
return;
}
@@ -144,7 +138,7 @@ static void handle_tx(struct vhost_net *
vhost_disable_notify(vq);
 
if (wmem < sock->sk->sk_sndbuf / 2)
-   tx_poll_stop(net);
+   tx_poll_stop(net, vq->qnum);
hdr_size = vq->vhost_hlen;
 
for (;;) {
@@ -159,7 +153,7 @@ static void handle_tx(struct vhost_net *
if (head == vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
-   tx_poll_start(net, sock);
+   tx_poll_start(net, sock, vq->qnum);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -189,7 +183,7 @@ static void handle_tx(struct vhost_net *
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
vhost_discard_vq_desc(vq, 1);
-   tx_poll_start(net, sock);
+  

[RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-08 Thread Krishna Kumar
Following patches implement Transmit mq in virtio-net.  Also
included are the userspace qemu changes.

1. This feature was first implemented with a single vhost.
   Testing showed 3-8% performance gain for upto 8 netperf
   sessions (and sometimes 16), but BW dropped with more
   sessions.  However, implementing per-txq vhost improved
   BW significantly all the way to 128 sessions.
2. For this mq TX patch, 1 daemon is created for RX and 'n'
   daemons for the 'n' TXQ's, for a total of (n+1) daemons.
   The (subsequent) RX mq patch changes that to a total of
   'n' daemons, where RX and TX vq's share 1 daemon.
3. Service Demand increases for TCP, but significantly
   improves for UDP.
4. Interoperability: Many combinations, but not all, of
   qemu, host, guest tested together.


  Enabling mq on virtio:
  ---

When the following options are passed to qemu:
- smp > 1
- vhost=on
- mq=on (new option, default:off)
then #txqueues = #cpus.  The #txqueues can be changed by using
an optional 'numtxqs' option. e.g.  for a smp=4 guest:
vhost=on,mq=on ->   #txqueues = 4
vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2


   Performance (guest -> local host):
   ---

System configuration:
Host:  8 Intel Xeon, 8 GB memory
Guest: 4 cpus, 2 GB memory
All testing without any tuning, and TCP netperf with 64K I/O
___
   TCP (#numtxqs=2)
N#  BW1    BW2   (%)       SD1    SD2   (%)       RSD1   RSD2   (%)
___
4   26387  40716 (54.30)   20     28    (40.00)   86     85     (-1.16)
8   24356  41843 (71.79)   88     129   (46.59)   372    362    (-2.68)
16  23587  40546 (71.89)   375    564   (50.40)   1558   1519   (-2.50)
32  22927  39490 (72.24)   1617   2171  (34.26)   6694   5722   (-14.52)
48  23067  39238 (70.10)   3931   5170  (31.51)   15823  13552  (-14.35)
64  22927  38750 (69.01)   7142   9914  (38.81)   28972  26173  (-9.66)
96  22568  38520 (70.68)   16258  27844 (71.26)   65944  73031  (10.74)
___
   UDP (#numtxqs=8)
N#  BW1    BW2   (%)        SD1    SD2    (%)
__
4   29836  56761 (90.24)    67     63     (-5.97)
8   27666  63767 (130.48)   326    265    (-18.71)
16  25452  60665 (138.35)   1396   1269   (-9.09)
32  26172  63491 (142.59)   5617   4202   (-25.19)
48  26146  64629 (147.18)   12813  9316   (-27.29)
64  25575  65448 (155.90)   23063  16346  (-29.12)
128 26454  63772 (141.06)   91054  85051  (-6.59)
__
N#: Number of netperf sessions, 90 sec runs
BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
  SD for original code
BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
  SD for new code. e.g. BW2=40716 means average BW2 was
  20358 mbps.


   Next steps:
   ---

1. mq RX patch is also complete - plan to submit once TX is OK.
2. Cache-align data structures: I didn't see any BW/SD improvement
   after making the sq's (and similarly for vhost) cache-aligned
   statically:
struct virtnet_info {
...
struct send_queue sq[16] cacheline_aligned_in_smp;
...
};

Guest interrupts for a 4 TXQ device after a 5 min test:
# egrep "virtio0|CPU" /proc/interrupts 
      CPU0     CPU1     CPU2    CPU3
40:   0        0        0       0        PCI-MSI-edge  virtio0-config
41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-output.1
44:   372607   374884   371092  372011   PCI-MSI-edge  virtio0-output.2
45:   162042   162261   163623  162923   PCI-MSI-edge  virtio0-output.3

Review/feedback appreciated.

Signed-off-by: Krishna Kumar 
---
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 1/4] Add a new API to virtio-pci

2010-09-08 Thread Krishna Kumar
Add virtio_get_queue_index() to get the queue index of a
vq.  This is needed by the cb handler to locate the queue
that should be processed.

Signed-off-by: Krishna Kumar 
---
 drivers/virtio/virtio_pci.c |9 +
 include/linux/virtio.h  |3 +++
 2 files changed, 12 insertions(+)

diff -ruNp org/include/linux/virtio.h tx_only/include/linux/virtio.h
--- org/include/linux/virtio.h  2010-09-03 16:33:51.0 +0530
+++ tx_only/include/linux/virtio.h  2010-09-08 10:23:36.0 +0530
@@ -136,4 +136,7 @@ struct virtio_driver {
 
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
+
+/* return the internal queue index associated with the virtqueue */
+extern int virtio_get_queue_index(struct virtqueue *vq);
 #endif /* _LINUX_VIRTIO_H */
diff -ruNp org/drivers/virtio/virtio_pci.c tx_only/drivers/virtio/virtio_pci.c
--- org/drivers/virtio/virtio_pci.c 2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/virtio/virtio_pci.c 2010-09-08 10:23:16.0 +0530
@@ -359,6 +359,15 @@ static int vp_request_intx(struct virtio
return err;
 }
 
+/* Return the internal queue index associated with the virtqueue */
+int virtio_get_queue_index(struct virtqueue *vq)
+{
+   struct virtio_pci_vq_info *info = vq->priv;
+
+   return info->queue_index;
+}
+EXPORT_SYMBOL(virtio_get_queue_index);
+
 static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
  void (*callback)(struct virtqueue *vq),
  const char *name,
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/4] Changes for virtio-net

2010-09-08 Thread Krishna Kumar
Implement mq virtio-net driver.

Though struct virtio_net_config changes, it works with old
qemus, since the last element is not accessed unless qemu
sets VIRTIO_NET_F_NUMTXQS.
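
For illustration, a probe-time helper along these lines (a sketch under
that assumption, not a hunk from this patch) only looks at the new
config field when the feature bit was negotiated, which is what keeps
old hosts working:

/* Sketch only: read numtxqs from the device config iff the host
 * offered VIRTIO_NET_F_NUMTXQS; otherwise fall back to one TX queue. */
static u16 virtnet_numtxqs(struct virtio_device *vdev)
{
	u16 numtxqs = 1;

	if (virtio_has_feature(vdev, VIRTIO_NET_F_NUMTXQS))
		vdev->config->get(vdev,
				  offsetof(struct virtio_net_config, numtxqs),
				  &numtxqs, sizeof(numtxqs));
	return numtxqs;
}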

Signed-off-by: Krishna Kumar 
---
 drivers/net/virtio_net.c   |  213 ++-
 include/linux/virtio_net.h |6 
 2 files changed, 166 insertions(+), 53 deletions(-)

diff -ruNp org/include/linux/virtio_net.h tx_only/include/linux/virtio_net.h
--- org/include/linux/virtio_net.h  2010-09-03 16:33:51.0 +0530
+++ tx_only/include/linux/virtio_net.h  2010-09-08 10:39:22.0 +0530
@@ -7,6 +7,9 @@
 #include 
 #include 
 
+/* The maximum of transmit queues supported */
+#define VIRTIO_MAX_TXQS16
+
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM  0   /* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM        1       /* Guest handles pkts w/ partial csum */
@@ -26,6 +29,7 @@
 #define VIRTIO_NET_F_CTRL_RX   18  /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN 19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
+#define VIRTIO_NET_F_NUMTXQS   21  /* Device supports multiple TX queue */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 
@@ -34,6 +38,8 @@ struct virtio_net_config {
__u8 mac[6];
/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
__u16 status;
+   /* number of transmit queues */
+   __u16 numtxqs;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
diff -ruNp org/drivers/net/virtio_net.c tx_only/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/net/virtio_net.c2010-09-08 12:14:19.0 +0530
@@ -40,9 +40,20 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX2
 
+/* Our representation of a send virtqueue */
+struct send_queue {
+   struct virtqueue *svq;
+
+   /* TX: fragments + linear part + virtio header */
+   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
 struct virtnet_info {
struct virtio_device *vdev;
-   struct virtqueue *rvq, *svq, *cvq;
+   int numtxqs;/* Number of tx queues */
+   struct send_queue *sq;
+   struct virtqueue *rvq;
+   struct virtqueue *cvq;
struct net_device *dev;
struct napi_struct napi;
unsigned int status;
@@ -62,9 +73,8 @@ struct virtnet_info {
/* Chain pages by the private ptr. */
struct page *pages;
 
-   /* fragments + linear part + virtio header */
+   /* RX: fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -120,12 +130,13 @@ static struct page *get_a_page(struct vi
 static void skb_xmit_done(struct virtqueue *svq)
 {
struct virtnet_info *vi = svq->vdev->priv;
+   int qnum = virtio_get_queue_index(svq) - 1; /* 0 is RX vq */
 
/* Suppress further interrupts. */
virtqueue_disable_cb(svq);
 
/* We were probably waiting for more output buffers. */
-   netif_wake_queue(vi->dev);
+   netif_wake_subqueue(vi->dev, qnum);
 }
 
 static void set_skb_frag(struct sk_buff *skb, struct page *page,
@@ -495,12 +506,13 @@ again:
return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static unsigned int free_old_xmit_skbs(struct virtnet_info *vi,
+  struct virtqueue *svq)
 {
struct sk_buff *skb;
unsigned int len, tot_sgs = 0;
 
-   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+   while ((skb = virtqueue_get_buf(svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
@@ -510,7 +522,8 @@ static unsigned int free_old_xmit_skbs(s
return tot_sgs;
 }
 
-static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
+static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
+   struct virtqueue *svq, struct scatterlist *tx_sg)
 {
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
@@ -548,12 +561,12 @@ static int xmit_skb(struct virtnet_info 
 
/* Encode metadata header at front. */
if (vi->mergeable_rx_bufs)
-   sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
+   sg_set_buf(tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
else
-   sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+   sg_set_buf(tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-   hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
-   return virtqueue_add_b

Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function

2010-09-08 Thread Avi Kivity

 On 09/07/2010 11:39 PM, Marcelo Tosatti wrote:



@@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = pdptr>>  PAGE_SHIFT;
if (mmu_check_root(vcpu, root_gfn))
return 1;
-   } else if (vcpu->arch.mmu.root_level == 0)
-   root_gfn = 0;
-   if (vcpu->arch.mmu.direct_map) {
-   direct = 1;
-   root_gfn = i<<  30;
}
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, i<<  30,
- PT32_ROOT_LEVEL, direct,
+ PT32_ROOT_LEVEL, 0,
  ACC_ALL, NULL);

Should not write protect the gfn for nonpaging mode.



nonpaging mode should have direct_map set, so wouldn't enter this path 
at all.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html