Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

2012-09-03 Thread Paolo Bonzini
Il 04/09/2012 04:21, Nicholas A. Bellinger ha scritto:
>> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi 
>> *vscsi, void *buf)
>>  struct virtio_scsi_cmd *cmd = buf;
>>  struct scsi_cmnd *sc = cmd->sc;
>>  struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
>> +struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
>> +
>> +atomic_dec(&tgt->reqs);
>>  
> 
> As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
> atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
> smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
> accessors properly, no..?

No, only a single "thing" is being accessed, and there is no need to
order the decrement with respect to preceding or subsequent accesses to
other locations.

In other words, tgt->reqs is already synchronized with itself, and that
is enough.

(Besides, on x86 smp_mb__after_atomic_dec is a nop).

>> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
>> +   struct scsi_cmnd *sc)
>> +{
>> +struct virtio_scsi *vscsi = shost_priv(sh);
>> +struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
>> +unsigned long flags;
>> +u32 queue_num;
>> +
>> +/* Using an atomic_t for tgt->reqs lets the virtqueue handler
>> + * decrement it without taking the spinlock.
>> + */
>> +spin_lock_irqsave(&tgt->tgt_lock, flags);
>> +if (atomic_inc_return(&tgt->reqs) == 1) {
>> +queue_num = smp_processor_id();
>> +while (unlikely(queue_num >= vscsi->num_queues))
>> +queue_num -= vscsi->num_queues;
>> +tgt->req_vq = &vscsi->req_vqs[queue_num];
>> +}
>> +spin_unlock_irqrestore(&tgt->tgt_lock, flags);
>> +return virtscsi_queuecommand(vscsi, tgt, sc);
>> +}
>> +
> 
> The extra memory barriers to get this right for the current approach are
> just going to slow things down even more for virtio-scsi-mq..

virtio-scsi multiqueue has a performance benefit up to 20% (for a single
LUN) or 40% (on overall bandwidth across multiple LUNs).  I doubt that a
single memory barrier can have that much impact. :)

The way to go to improve performance even more is to add new virtio APIs
for finer control of the usage of the ring.  These should let us avoid
copying the sg list and almost get rid of the tgt_lock; even though the
locking is quite efficient in virtio-scsi (see how tgt_lock and vq_lock
are "pipelined" so as to overlap the preparation of two requests), it
should give a nice improvement and especially avoid a kmalloc with small
requests.  I may have some time for it next month.

> Jen's approach is what we will ultimately need to re-architect in SCSI
> core if we're ever going to move beyond the issues of legacy host_lock,
> so I'm wondering if maybe this is the direction that virtio-scsi-mq
> needs to go in as well..?

We can see after the block layer multiqueue work goes in...  I also need
to look more closely at Jens's changes.

Have you measured the host_lock to be a bottleneck in high-iops
benchmarks, even for a modern driver that does not hold it in
queuecommand?  (Certainly it will become more important as the
virtio-scsi queuecommand becomes thinner and thinner).  If so, we can
start looking at limiting host_lock usage in the fast path.

BTW, supporting this in tcm-vhost should be quite trivial, as all the
request queues are the same and all serialization is done in the
virtio-scsi driver.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete

2012-09-03 Thread Xiao Guangrong
On 09/03/2012 07:07 PM, Avi Kivity wrote:
> On 08/27/2012 12:51 PM, Dong Hao wrote:
>> From: Xiao Guangrong 
>>
>> 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to
>> calculate mmio read emulated time for the old kernel, in order to trace
>> mmio read event more exactly, we add kvm_mmio_begin to trace the time when
>> mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is
>> completed
>>
> 
> Why is this so critical?
> 
> If a lot of time is spent in in-kernel mmio, then 'perf top' will report
> it.  Otherwise the time between kvm_exit and kvm_entry describes the
> time spent in the host.  Not all of it is mmio handling, but it is quite
> close.

I have done some test, the new events can get more exact result, this is a
example in my slides:

the event handled time calculated by old events is: 0.66(s)
calculated by new events, the result is: 0.05(s).

So, i think it is worth introducing these new events.

But if you do not care it or think these events are duplicate with current
events or unmaintainable, i do not have strong opinion on that. Please let
me know, i will drop them in the next version.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace

2012-09-03 Thread Xiao Guangrong
On 09/03/2012 07:13 PM, Avi Kivity wrote:
> On 08/27/2012 12:51 PM, Dong Hao wrote:
>> From: Xiao Guangrong 
>>
>> Exporting KVM exit information to userspace to be consumed by perf.
>>
>> [ Dong Hao : rebase it on acme's git tree ]
>> Signed-off-by: Xiao Guangrong 
>> Signed-off-by: Dong Hao 
>> ---
>>  arch/x86/include/asm/kvm_host.h |   36 ---
> 
> Please put ABIs in kvm.h.  But see below.
> 
>>  arch/x86/include/asm/svm.h  |  205 
>> +--
>>  arch/x86/include/asm/vmx.h  |  126 
>>  arch/x86/kvm/trace.h|   89 -
>>  4 files changed, 234 insertions(+), 222 deletions(-)
>>
>>  
>> +#define DE_VECTOR 0
>> +#define DB_VECTOR 1
>> +#define BP_VECTOR 3
>> +#define OF_VECTOR 4
>> +#define BR_VECTOR 5
>> +#define UD_VECTOR 6
>> +#define NM_VECTOR 7
>> +#define DF_VECTOR 8
>> +#define TS_VECTOR 10
>> +#define NP_VECTOR 11
>> +#define SS_VECTOR 12
>> +#define GP_VECTOR 13
>> +#define PF_VECTOR 14
>> +#define MF_VECTOR 16
>> +#define MC_VECTOR 18
> 
> This is not a kvm ABI, but an x86 architecture constants.  It should be
> put into an existing x86 header.

Okay, i will move them into asm/kvm.h

> 
>> +
>>  #endif /* _ASM_X86_KVM_HOST_H */
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index f2b83bc..cdf5674 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -1,6 +1,135 @@
>>  #ifndef __SVM_H
>>  #define __SVM_H
>>  
>> +
>> +#ifdef __KERNEL__
>> +
> 
> The entire file can be exported; nothing in there is implementation
> specific.

Unfortunately, i have tied this but failed:

../../arch/x86/include/asm/svm.h:262:1: error: packed attribute is unnecessary 
for ‘vmcb_seg’ [-Werror=packed]
../../arch/x86/include/asm/svm.h:307:1: error: packed attribute is unnecessary 
for ‘vmcb_save_area’ [-Werror=packed]
../../arch/x86/include/asm/svm.h:312:1: error: packed attribute is unnecessary 
for ‘vmcb’ [-Werror=packed]
..

> 
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index 74fcb96..61e04e9 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> +
>> +#ifdef __KERNEL__
>> +
> 
> Ditto.

Or:

../../arch/x86/include/asm/vmx.h:376:0: error: "REG_R8" redefined [-Werror]
/usr/include/sys/ucontext.h:46:0: note: this is the location of the previous 
definition
../../arch/x86/include/asm/vmx.h:377:0: error: "REG_R9" redefined [-Werror]
/usr/include/sys/ucontext.h:48:0: note: this is the location of the previous 
definition
..

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Alex Williamson
On Mon, 2012-09-03 at 18:59 +0300, Avi Kivity wrote:
> On 08/29/2012 11:49 AM, Peter Maydell wrote:
> > On 29 August 2012 09:47, Jan Kiszka  wrote:
> >> On 2012-08-28 23:26, Peter Maydell wrote:
> >>> Since this is arch-specific we should probably give the
> >>> resulting device a more specific name than "pci-assign",
> >>> which implies that it is (a) ok for any PCI system and
> >>> (b) not something that will be obsolete in the foreseen
> >>> future.
> >>
> >> The name has to be like this for libvirt compatibility. I can provide it
> >> as alias, though, calling the device "kvm-pci-assign" by default. Better?
> > 
> > ...is it likely to ever work for non-x86 PCI devices?
> 
> It used to work for ia64 before it died.  So it's not architecture
> specific by design.

Caveat; ia64 is a different architecture, but it was on Intel VT-d
chipsets, which therefore supported the IOMMU API and ia64 maps guests
the same way.  This is the same problem I ran into trying to name the
vfio iommu driver for x86.  It's not really x86-specific, but it's got
to look and smell pretty similar.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

2012-09-03 Thread Nicholas A. Bellinger
On Tue, 2012-08-28 at 13:54 +0200, Paolo Bonzini wrote:
> This patch adds queue steering to virtio-scsi.  When a target is sent
> multiple requests, we always drive them to the same queue so that FIFO
> processing order is kept.  However, if a target was idle, we can choose
> a queue arbitrarily.  In this case the queue is chosen according to the
> current VCPU, so the driver expects the number of request queues to be
> equal to the number of VCPUs.  This makes it easy and fast to select
> the queue, and also lets the driver optimize the IRQ affinity for the
> virtqueues (each virtqueue's affinity is set to the CPU that "owns"
> the queue).
> 
> Signed-off-by: Paolo Bonzini 
> ---

Hey Paolo & Co,

I've not had a chance to try this with tcm_vhost just yet, but noticed
one thing wrt to assumptions about virtio_scsi_target_state->reqs access
below..

>  drivers/scsi/virtio_scsi.c |  162 
> +++-
>  1 files changed, 130 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 6414ea0..0c4b096 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -26,6 +26,7 @@
>  
>  #define VIRTIO_SCSI_MEMPOOL_SZ 64
>  #define VIRTIO_SCSI_EVENT_LEN 8
> +#define VIRTIO_SCSI_VQ_BASE 2
>  
>  /* Command queue element */
>  struct virtio_scsi_cmd {
> @@ -59,9 +60,13 @@ struct virtio_scsi_vq {
>  
>  /* Per-target queue state */
>  struct virtio_scsi_target_state {
> - /* Protects sg.  Lock hierarchy is tgt_lock -> vq_lock.  */
> + /* Protects sg, req_vq.  Lock hierarchy is tgt_lock -> vq_lock.  */
>   spinlock_t tgt_lock;
>  
> + struct virtio_scsi_vq *req_vq;
> +
> + atomic_t reqs;
> +
>   /* For sglist construction when adding commands to the virtqueue.  */
>   struct scatterlist sg[];
>  };
> @@ -70,14 +75,15 @@ struct virtio_scsi_target_state {
>  struct virtio_scsi {
>   struct virtio_device *vdev;
>  
> - struct virtio_scsi_vq ctrl_vq;
> - struct virtio_scsi_vq event_vq;
> - struct virtio_scsi_vq req_vq;
> -
>   /* Get some buffers ready for event vq */
>   struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
>  
> + u32 num_queues;
>   struct virtio_scsi_target_state **tgt;
> +
> + struct virtio_scsi_vq ctrl_vq;
> + struct virtio_scsi_vq event_vq;
> + struct virtio_scsi_vq req_vqs[];
>  };
>  
>  static struct kmem_cache *virtscsi_cmd_cache;
> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi 
> *vscsi, void *buf)
>   struct virtio_scsi_cmd *cmd = buf;
>   struct scsi_cmnd *sc = cmd->sc;
>   struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> +
> + atomic_dec(&tgt->reqs);
>  

As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
accessors properly, no..?

>   dev_dbg(&sc->device->sdev_gendev,
>   "cmd %p response %u status %#02x sense_len %u\n",
> @@ -185,11 +194,13 @@ static void virtscsi_req_done(struct virtqueue *vq)
>  {
>   struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
>   struct virtio_scsi *vscsi = shost_priv(sh);
> + int index = virtqueue_get_queue_index(vq) - VIRTIO_SCSI_VQ_BASE;
> + struct virtio_scsi_vq *req_vq = &vscsi->req_vqs[index];
>   unsigned long flags;
>  
> - spin_lock_irqsave(&vscsi->req_vq.vq_lock, flags);
> + spin_lock_irqsave(&req_vq->vq_lock, flags);
>   virtscsi_vq_done(vscsi, vq, virtscsi_complete_cmd);
> - spin_unlock_irqrestore(&vscsi->req_vq.vq_lock, flags);
> + spin_unlock_irqrestore(&req_vq->vq_lock, flags);
>  };
>  
>  static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
> @@ -429,10 +440,10 @@ static int virtscsi_kick_cmd(struct 
> virtio_scsi_target_state *tgt,
>   return ret;
>  }
>  
> -static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
> +static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
> +  struct virtio_scsi_target_state *tgt,
> +  struct scsi_cmnd *sc)
>  {
> - struct virtio_scsi *vscsi = shost_priv(sh);
> - struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
>   struct virtio_scsi_cmd *cmd;
>   int ret;
>  
> @@ -466,7 +477,7 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, 
> struct scsi_cmnd *sc)
>   BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
>   memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
>  
> - if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
> + if (virtscsi_kick_cmd(tgt, tgt->req_vq, cmd,
> sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
> GFP_ATOMIC) >= 0)
>   ret = 0;
> @@ -475,6 +486,38 @@ out:
>  

Re: [PATCH v4 0/8] KVM paravirt remote flush tlb

2012-09-03 Thread Nikunj A Dadhania
On Mon, 03 Sep 2012 17:33:46 +0300, Avi Kivity  wrote:
> On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote:
> > 
> > kernbench(lower is better)
> > ==
> >  base  pvflushv4  %improvement
> > 1VM48.5800   46.8513   3.55846
> > 2VM   108.1823  104.6410   3.27346
> > 3VM   183.2733  163.3547  10.86825
> > 
> > ebizzy(higher is better)
> > 
> >  base pvflushv4  %improvement
> > 1VM 2414.5000 2089.8750 -13.44481
> > 2VM 2167.6250 2371.7500  9.41699
> > 3VM 1600. 2102.5556 31.40060
> > 
> 
> The regression is worrying.  We're improving the contended case at the
> cost of the non-contended case, this is usually the wrong thing to do.
> Do we have any clear idea of the cause of the regression?
> 
Previous perf numbers suggest that in 1VM scenario flush_tlb_others_ipi
is around 2%, while for contented case its around 10%. That is what is
helping contended case.

Regards,
Nikunj

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] kvm: Fix warning from static code analysis

2012-09-03 Thread Peter Maydell
On 3 September 2012 21:40, Stefan Weil  wrote:
> Report from smatch:
>
> kvm-all.c:1373 kvm_init(135) warn:
>  variable dereferenced before check 's' (see line 1360)
>
> 's' cannot by NULL (it was alloced using g_malloc0), so there is no need
> to check it here.
>
> Signed-off-by: Stefan Weil 

Reviewed-by: Peter Maydell 

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: Fix warning from static code analysis

2012-09-03 Thread Stefan Weil
Report from smatch:

kvm-all.c:1373 kvm_init(135) warn:
 variable dereferenced before check 's' (see line 1360)

's' cannot by NULL (it was alloced using g_malloc0), so there is no need
to check it here.

Signed-off-by: Stefan Weil 
---
 kvm-all.c |   12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 34b02c1..20b980d 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1370,13 +1370,11 @@ int kvm_init(void)
 return 0;
 
 err:
-if (s) {
-if (s->vmfd >= 0) {
-close(s->vmfd);
-}
-if (s->fd != -1) {
-close(s->fd);
-}
+if (s->vmfd >= 0) {
+close(s->vmfd);
+}
+if (s->fd != -1) {
+close(s->fd);
 }
 g_free(s);
 
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Blue Swirl
On Mon, Sep 3, 2012 at 4:14 PM, Avi Kivity  wrote:
> On 08/29/2012 11:27 AM, Markus Armbruster wrote:
>>
>> I don't see a point in making contributors avoid non-problems that might
>> conceivably become trivial problems some day.  Especially when there's
>> no automated help with the avoiding.
>
> -Wpointer-arith

+1

>
>
>
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Sending a VM for restart (after it has been shutdown and saved)

2012-09-03 Thread Michael Johns
Hi list,

I have been hacking the KVM-QEMU code, but need some help to be able
to perform a particular operation.

Currently, I perform some operations on the VM image after it has
received a shutdown call, and after the image itself has been fully
saved before KVM-QEMU shuts down. I perform these operations within
the bdrv_close_all () method within block.c, and it works exactly as I
want it to.

What I would like to be able to do is to send the VM in question back
for a restart from that part of the code, and I was wondering if this
was possible with a given command. When I say send it back, I mean
literally to have the KVM-QEMU treat the VM as if has just been
started/restarted. Although I know a few things, there is still a lot
I don't know so it would be great to get some help on this.

Thanks,

M
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Avi Kivity
On 08/29/2012 11:27 AM, Markus Armbruster wrote:
> 
> I don't see a point in making contributors avoid non-problems that might
> conceivably become trivial problems some day.  Especially when there's
> no automated help with the avoiding.

-Wpointer-arith



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Avi Kivity
On 08/28/2012 03:30 AM, Jan Kiszka wrote:
>> 
>> Maybe add case 8: and default: with abort(), also below.
> 
> PIO is never 8 bytes long, the generic layer protects us.

Note: eventually the pio space will be mapped directly to mmio (instead
of being bounced via cpu_inb() in the bridge's mmio handler), and so we
will have to add this restriction to the memory core.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool

2012-09-03 Thread David Ahern

On 9/3/12 2:48 AM, don wrote:

于 2012年08月31日 02:29, David Ahern 写道:

In addition to Andrew's comment about making the stats struct and
functions generic...

Yes. :-)


On 8/27/12 3:51 AM, Dong Hao wrote:
---8<---


+static void exit_event_decode_key(struct event_key *key, char
decode[20])
+{
+   const char *exit_reason = get_exit_reason(key->key);
+
+   snprintf(decode, 20, "%s", exit_reason);
+}


Use scnprintf rather than snprintf.

Why? Since we don't care about the return value, what's the difference?


1. consistency (scnprintf is preferred).
2. what if a new exit reason is added in the future that is larger than 
19 characters? Better to be safe now.




---8<---


+static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
+{
+int old_max_vcpu = event->max_vcpu;
+
+if (vcpu_id < event->max_vcpu)
+return true;
+
+while (event->max_vcpu <= vcpu_id)
+event->max_vcpu += DEFAULT_VCPU_NUM;
+
+event->vcpu = realloc(event->vcpu,
+  event->max_vcpu * sizeof(*event->vcpu));
+if (!event->vcpu) {


If realloc fails you leak memory by overwriting the current pointer.

Thanks for pointing it out, we will terminate the running instance in
our next
version.


Ok. Make sure to free the memory of the previous pointer on failure 
before cleaning up. The idea is that all allocations are properly freed 
on exit.




---8<---


+static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
+{
+struct event_stats *stats = &event->total;
+double variance, variance_mean, stddev;
+
+if (vcpu_id != -1)
+stats = &event->vcpu[vcpu_id];
+
+BUG_ON(!stats->count);
+
+variance = stats->M2 / (stats->count - 1);
+variance_mean = variance / stats->count;
+stddev = sqrt(variance_mean);
+
+return stddev * 100 / stats->mean;
+}


perf should be consistent in the stddev it shows the user. Any reason
to dump relative stddev versus stddev used by perf-stat?

Since 'perf stat' uses relative standard deviation rather than stddev,
'perf kvm stat'
just follows the style of 'perf stat'.


Yes, I've been working on an idea motivated by Andi Kleen. I have 
noticed that perf stat also uses relative stddev just in a non-direct 
way. I suggest moving common stats code from perf-stat to 
utils/stat.[ch], add a rel_stddev_stats function that returns the above 
and have perf-kvm use that.




---8<---


+/*
+ * Append "-a" only if "-p"/"--pid" is not specified since they
+ * are mutually exclusive.
+ */
+if (!kvm_record_specified_guest(argc, argv))
+rec_argv[i++] = STRDUP_FAIL_EXIT("-a");


Other perf-kvm commands rely on perf-record semantics -- i.e., for
user to add the -a or -p option.

You mean, remove '-a' from the default options, then:
if a user wants to record all guest he will use 'perf stat record -a';
and if a user wants to record the specified guest, he should use
'perf stat record -p xxx'?


Yes. Well, 'perf kvm stat record' with a -a or -p. That makes the new 
stat subcommand consistent with just 'perf kvm record'.




Well, as the style of other subcommand, e.g., perf lock/perf sched, the
'perf xxx record' record all events on all cpus, no need to use '-a'.

Based on mentioned above, I prefer the original way. ;)


Yes, I noticed that some commands add -a before calling cmd_record(). 
Can't change that but we can keep perf-kvm consistent with its own 
variants which means not automagically adding -a.


David
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Avi Kivity
On 08/29/2012 11:49 AM, Peter Maydell wrote:
> On 29 August 2012 09:47, Jan Kiszka  wrote:
>> On 2012-08-28 23:26, Peter Maydell wrote:
>>> Since this is arch-specific we should probably give the
>>> resulting device a more specific name than "pci-assign",
>>> which implies that it is (a) ok for any PCI system and
>>> (b) not something that will be obsolete in the foreseen
>>> future.
>>
>> The name has to be like this for libvirt compatibility. I can provide it
>> as alias, though, calling the device "kvm-pci-assign" by default. Better?
> 
> ...is it likely to ever work for non-x86 PCI devices?

It used to work for ia64 before it died.  So it's not architecture
specific by design.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/8] KVM paravirt remote flush tlb

2012-09-03 Thread Avi Kivity
On 09/03/2012 05:33 PM, Avi Kivity wrote:
> On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote:
>> 
>> kernbench(lower is better)
>> ==
>>  base  pvflushv4  %improvement
>> 1VM48.5800   46.8513   3.55846
>> 2VM   108.1823  104.6410   3.27346
>> 3VM   183.2733  163.3547  10.86825
>> 
>> ebizzy(higher is better)
>> 
>>  base pvflushv4  %improvement
>> 1VM 2414.5000 2089.8750 -13.44481
>> 2VM 2167.6250 2371.7500  9.41699
>> 3VM 1600. 2102.5556 31.40060
>> 
> 
> The regression is worrying.  We're improving the contended case at the
> cost of the non-contended case, this is usually the wrong thing to do.
> Do we have any clear idea of the cause of the regression?

For that matter, why do we get an improvement in the 1VM kbuild test?
All vcpus should be running most of the time, barring I/O thread activity.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-09-03 Thread Avi Kivity
On 06/26/2012 11:32 PM, Frank Swiderski wrote:
> This implementation of a virtio balloon driver uses the page cache to
> "store" pages that have been released to the host.  The communication
> (outside of target counts) is one way--the guest notifies the host when
> it adds a page to the page cache, allowing the host to madvise(2) with
> MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
> (via the regular page reclaim).  This means that inflating the balloon
> is similar to the existing balloon mechanism, but the deflate is
> different--it re-uses existing Linux kernel functionality to
> automatically reclaim.

Interesting idea.

How is the host able to manage overcommit this way?  If deflate is not
host controlled, the host may start swapping guests out to disk if they
all self-deflate.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/8] KVM paravirt remote flush tlb

2012-09-03 Thread Avi Kivity
On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote:
> 
> kernbench(lower is better)
> ==
>  base  pvflushv4  %improvement
> 1VM48.5800   46.8513   3.55846
> 2VM   108.1823  104.6410   3.27346
> 3VM   183.2733  163.3547  10.86825
> 
> ebizzy(higher is better)
> 
>  base pvflushv4  %improvement
> 1VM 2414.5000 2089.8750 -13.44481
> 2VM 2167.6250 2371.7500  9.41699
> 3VM 1600. 2102.5556 31.40060
> 

The regression is worrying.  We're improving the contended case at the
cost of the non-contended case, this is usually the wrong thing to do.
Do we have any clear idea of the cause of the regression?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for Tuesday, September 4th

2012-09-03 Thread Avi Kivity
On 09/03/2012 04:35 PM, Jan Kiszka wrote:
> On 2012-09-03 13:48, Avi Kivity wrote:
>> On 09/03/2012 09:44 AM, Juan Quintela wrote:
>>>
>>> Hi
>>>
>>> Please send in any agenda items you are interested in covering.
>> 
>> - protecting MemoryRegion::opaque during dispatch
>> 
>> I'm guessing Ping won't make it due to timezone problems.  Jan, if you
>> will not participate, please remove the topic from the list (unless
>> someone else wants to argue your side).
> 
> Sorry, I'm blocked right at that time.

NP, will continue on the list.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for Tuesday, September 4th

2012-09-03 Thread Jan Kiszka
On 2012-09-03 13:48, Avi Kivity wrote:
> On 09/03/2012 09:44 AM, Juan Quintela wrote:
>>
>> Hi
>>
>> Please send in any agenda items you are interested in covering.
> 
> - protecting MemoryRegion::opaque during dispatch
> 
> I'm guessing Ping won't make it due to timezone problems.  Jan, if you
> will not participate, please remove the topic from the list (unless
> someone else wants to argue your side).

Sorry, I'm blocked right at that time.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [libvirt-users] vm pxe fail

2012-09-03 Thread Avi Kivity
On 08/31/2012 05:37 PM, Alex Jia wrote:
> Hi Andrew,
> Great, BTW, in fact, you may pxe boot via VF of Intel82576, however, 
> Intel82576 SR-IOV network adapters 
> don't provide a ROM BIOS for the cards virtual functions (VF), but an image 
> of such a ROM is available, 
> and with this ROM visible to the guest, it can PXE boot.
> 
> In libvirt's xml, you need to configure guest XML like this:
> 
>   
> 
>   
>  
>  
>
>   
> 
> You need to build a ipxe-808610ca.rom by yourself, if you're interested in 
> this,
> please refer to http://ipxe.org/.

Is there a way to automate this?  Perhaps a database matching PCI IDs
and ipxe .roms, which qemu could consult?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM on kernel 2.6.16

2012-09-03 Thread Avi Kivity
On 09/03/2012 04:24 PM, Avi Kivity wrote:
> On 09/02/2012 07:08 PM, Lentes, Bernd wrote:
>> Hi,
>> 
>> i have several servers with SLES 10 SP4 and want to run kvm on them. SLES 10 
>> has kernel 2.6.16.
>> Is that possible ?
>> 
>> Thanks for any answer.
> 
> In general no.  Please contact your server vendor though.

I meant OS vendor


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM on kernel 2.6.16

2012-09-03 Thread Avi Kivity
On 09/02/2012 07:08 PM, Lentes, Bernd wrote:
> Hi,
> 
> i have several servers with SLES 10 SP4 and want to run kvm on them. SLES 10 
> has kernel 2.6.16.
> Is that possible ?
> 
> Thanks for any answer.

In general no.  Please contact your server vendor though.



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bright Cluster Manager Support #2699] NFSoRDMA not working with KVM when cache disabled

2012-09-03 Thread Pawan Jaitly
It looks like this issue has been dealt with, so I'm closing its ticket.

* If you would like to reopen the ticket, then please respond to this mail with
the same subject header.

* If you would like to open a new ticket, then please send a mail with a new
subject header.

In either case, the mail must go to the supp...@brightcomputing.com address.

regards
Pawan


On Sun Sep 02 22:46:22 2012, martijn wrote:
> Hi Andrew,
>
>
> In that case I don't know. I would suggest not to disable the cache.
>
> Best regards,
>
> Martijn
>
> On Sat, Sep 1, 2012 at 1:51 PM, Andrew Holway
> wrote:
>
> >
> >  http://support.brightcomputing.com/rt/Ticket/Display.html?id=2699 >
> >
> > Hi,
> >
> > That is FULL install (I think)
> >
> > I create a new virtual machine each time I test it.
> >
> > Ta,
> >
> > Andrew
> >
> >
> > On Aug 31, 2012, at 11:20 PM, Martijn de Vries wrote:
> >
> > > Hi Andrew,
> > >
> > > That's pretty strange. I am not a KVM expert, so I don't know what
> > happens
> > > under the hood when you disable cache. Have you tried doing a FULL
> > install?
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Fri, Aug 31, 2012 at 7:06 PM, Andrew Holway
> > > wrote:
> > >
> > >>
> > >> Fri Aug 31 19:06:00 2012: Request 2699 was acted upon.
> > >> Transaction: Ticket created by a.hol...@syseleven.de
> > >> Queue: Bright
> > >> Subject: NFSoRDMA not working with KVM when cache disabled
> > >> Owner: Nobody
> > >> Requestors: a.hol...@syseleven.de
> > >> Status: new
> > >> Ticket  > >> http://support.brightcomputing.com/rt/Ticket/Display.html?id=2699
> >
> > >>
> > >>
> > >> Hi,
> > >>
> > >> I am trying to host KVM machines on an NFSoRDMA mount.
> > >>
> > >> This works:
> > >>
> > >> -drive file=/mnt/vm001.img,if=none,id=drive-virtio-
> disk0,format=raw
> > >> -device
> > >>
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-
> disk0,id=virtio-disk0
> > >>
> > >> This Doesn't!
> > >>
> > >> -drive
> > >> file=/mnt/vm001.img,if=none,id=drive-virtio-
> disk0,format=raw,cache=none
> > >> -device
> > >>
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-
> disk0,id=virtio-disk0,bootindex=2
> > >>
> > >> Any ideas why this could be?
> > >>
> > >> Thanks,
> > >>
> > >> Andrew
> > >>
> > >>
> > >>
> > >> dmesg:
> > >>
> > >> vda:
> > >> vda: vda1
> > >> vda: vda1
> > >> vda: vda1 vda2
> > >> vda: vda1 vda2
> > >> vda: vda1 vda2 vda3
> > >> vda: vda1 vda2 vda3
> > >> vda: vda1 vda2 vda3 vda4 < >
> > >> vda: vda1 vda2 vda3 vda4 < >
> > >> vda: vda1 vda2 vda3 vda4 < vda5 >
> > >> vda: vda1 vda2 vda3 vda4 < vda5 >
> > >> vda: vda1 vda2 vda3 vda4 < vda5 vda6 >
> > >> vda: vda1 vda2 vda3 vda4 < vda5 vda6 >
> > >> Adding 15998968k swap on /dev/vda5. Priority:-1 extents:1
> > across:15998968k
> > >> EXT3-fs (vda1): error: can't find ext3 filesystem on dev vda1.
> > >>
> > >> /var/log/messages
> > >>
> > >> Aug 31 18:53:46 (none) node-installer: Mounting disks.
> > >> Aug 31 18:53:46 (none) node-installer: Updating device status:
> mounting
> > >> disks
> > >> Aug 31 18:53:46 (none) node-installer: Detecting device
> '/dev/sda': not
> > >> found
> > >> Aug 31 18:53:46 (none) node-installer: Detecting device
> '/dev/hda': not
> > >> found
> > >> Aug 31 18:53:46 (none) node-installer: Detecting device
> '/dev/vda':
> > found
> > >> Aug 31 18:53:46 (none) node-installer: swapon /dev/vda5
> > >> Aug 31 18:53:46 (none) node-installer: mkdir -p /localdisk/
> > >> Aug 31 18:53:46 (none) kernel: Adding 15998968k swap on
> /dev/vda5.
> > >> Priority:-1 extents:1 across:15998968k
> > >> Aug 31 18:53:46 (none) node-installer: Mounting /dev/vda1 on
> /localdisk/
> > >> Aug 31 18:53:46 (none) node-installer: mount -t ext3 -o
> > >> defaults,noatime,nodiratime /dev/vda1 /localdisk/
> > >> Aug 31 18:53:46 (none) node-installer: mount: wrong fs type, bad
> option,
> > >> bad superblock on /dev/vda1,
> > >> Aug 31 18:53:46 (none) node-installer: missing codepage or
> helper
> > >> program, or other error
> > >> Aug 31 18:53:46 (none) node-installer: In some cases
> useful info
> > is
> > >> found in syslog - try
> > >> Aug 31 18:53:46 (none) node-installer: dmesg | tail or so
> > >> Aug 31 18:53:46 (none) node-installer:
> > >> Aug 31 18:53:46 (none) node-installer: Command failed.
> > >> Aug 31 18:53:46 (none) node-installer: Running: "mount -t ext3 -o
> > >> defaults,noatime,nodiratime /dev/vda1 /localdisk/" failed:
> > >> Aug 31 18:53:46 (none) node-installer: Non zero exit code: 32
> > >> Aug 31 18:53:46 (none) kernel: EXT3-fs (vda1): error: can't find
> ext3
> > >> filesystem on dev vda1.
> > >> Aug 31 18:53:46 (none) node-installer: Failed to mount disks.
> (Exit code
> > >> 12, signal 0)
> > >> Aug 31 18:53:46 (none) node-installer: There was a fatal problem.
> This
> > >> node can not be installed until the problem is corrected.
> > >> Aug 31 18:53:46 (none) node-installer: The error was: failed to
> mount
> > disks
> > >> Aug 31 18:53:46 (none) node-installer: Updating device status:
> failed to
> 

Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.

2012-09-03 Thread Peter Maydell
On 3 September 2012 13:33, Rusty Russell  wrote:
> I have no idea, since I didn't hear the complaints.  But any non-fixed
> size array has issues in C; there's not much we can do about it.
>
> x86 manages this fine for msrs, and I didn't have a problem using it for
> my test programs.  That's the limit of my experience, however.

Yes, I didn't particularly find the ioctls Rusty initially suggested
hard to use (apart from briefly getting confused between struct
kvm_msrs and struct kvm_msr_list; you could collapse those down
into one struct if you wanted I suppose). The nastiest problem
in the target-i386 qemu code that gets the MSR list is the fact
it has to work around some ancient kernel bug where the ioctl
would write beyond the end of the array...

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.

2012-09-03 Thread Rusty Russell
Avi Kivity  writes:
> On 09/01/2012 03:35 PM, Rusty Russell wrote:
>> Passing an address in a struct is pretty bad, since it involves
>> compatibility wrappers.  
>
> Right, some s390 thing.

Err, no, i386 on x86-64, or ppc32 on ppc64, or arm on arm64

Any time you put a pointer in a structure which is exposed to userspace,
you have to deal with this.

>> I don't think that is what makes the API hard
>> to use.
>
> What is it then?  I forgot what the original complaints/complainers were.

I have no idea, since I didn't hear the complaints.  But any non-fixed
size array has issues in C; there's not much we can do about it.

x86 manages this fine for msrs, and I didn't have a problem using it for
my test programs.  That's the limit of my experience, however.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] Provide userspace IO exit completion callback.

2012-09-03 Thread Gleb Natapov
Current code assumes that IO exit was due to instruction emulation
and handles execution back to emulator directly. This patch adds new
userspace IO exit completion callback that can be set by any other code
that caused IO exit to userspace.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |   93 +++
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fc0e752..64adb61 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -414,6 +414,7 @@ struct kvm_vcpu_arch {
struct x86_emulate_ctxt emulate_ctxt;
bool emulate_regs_need_sync_to_vcpu;
bool emulate_regs_need_sync_from_vcpu;
+   int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 
gpa_t time;
struct pvclock_vcpu_time_info hv_clock;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 20f2266..2520d18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4544,6 +4544,9 @@ static bool retry_instruction(struct x86_emulate_ctxt 
*ctxt,
return true;
 }
 
+static int complete_emulated_mmio(struct kvm_vcpu *vcpu);
+static int complete_emulated_pio(struct kvm_vcpu *vcpu);
+
 int x86_emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long cr2,
int emulation_type,
@@ -4614,13 +4617,16 @@ restart:
} else if (vcpu->arch.pio.count) {
if (!vcpu->arch.pio.in)
vcpu->arch.pio.count = 0;
-   else
+   else {
writeback = false;
+   vcpu->arch.complete_userspace_io = 
complete_emulated_pio;
+   }
r = EMULATE_DO_MMIO;
} else if (vcpu->mmio_needed) {
if (!vcpu->mmio_is_write)
writeback = false;
r = EMULATE_DO_MMIO;
+   vcpu->arch.complete_userspace_io = complete_emulated_mmio;
} else if (r == EMULATION_RESTART)
goto restart;
else
@@ -5476,6 +5482,24 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
return r;
 }
 
+static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
+{
+   int r;
+   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+   r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+   if (r != EMULATE_DONE)
+   return 0;
+   return 1;
+}
+
+static int complete_emulated_pio(struct kvm_vcpu *vcpu)
+{
+   BUG_ON(!vcpu->arch.pio.count);
+
+   return complete_emulated_io(vcpu);
+}
+
 /*
  * Implements the following, as a state machine:
  *
@@ -5492,47 +5516,37 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
  *  copy data
  *  exit
  */
-static int complete_mmio(struct kvm_vcpu *vcpu)
+static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 {
struct kvm_run *run = vcpu->run;
struct kvm_mmio_fragment *frag;
-   int r;
 
-   if (!(vcpu->arch.pio.count || vcpu->mmio_needed))
-   return 1;
+   BUG_ON(!vcpu->mmio_needed);
 
-   if (vcpu->mmio_needed) {
-   /* Complete previous fragment */
-   frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
-   if (!vcpu->mmio_is_write)
-   memcpy(frag->data, run->mmio.data, frag->len);
-   if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
-   vcpu->mmio_needed = 0;
-   if (vcpu->mmio_is_write)
-   return 1;
-   vcpu->mmio_read_completed = 1;
-   goto done;
-   }
-   /* Initiate next fragment */
-   ++frag;
-   run->exit_reason = KVM_EXIT_MMIO;
-   run->mmio.phys_addr = frag->gpa;
+   /* Complete previous fragment */
+   frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
+   if (!vcpu->mmio_is_write)
+   memcpy(frag->data, run->mmio.data, frag->len);
+   if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
+   vcpu->mmio_needed = 0;
if (vcpu->mmio_is_write)
-   memcpy(run->mmio.data, frag->data, frag->len);
-   run->mmio.len = frag->len;
-   run->mmio.is_write = vcpu->mmio_is_write;
-   return 0;
-
-   }
-done:
-   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
-   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r != EMULATE_DONE)
-   return 0;
-   return 1;
+   return 1;
+   vcpu->mmio_read_completed = 1;
+   return complete_emulated_io(vcpu);
+   }
+   /* Initiate next fragment */

[PATCH 4/4] KVM: emulator: optimize "rep ins" handling.

2012-09-03 Thread Gleb Natapov
Optimize "rep ins" by allowing emulator to write back more than one
datum at a time. Introduce new operand type OP_MEM_STR which tells
writeback() that dst contains pointer to an array that should be written
back as opposite to just one data element.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |4 +++-
 arch/x86/kvm/emulate.c |   33 -
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c80c091..08d1c64 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -213,8 +213,9 @@ typedef u32 __attribute__((vector_size(16))) sse128_t;
 
 /* Type, address-of, and value of an instruction's operand. */
 struct operand {
-   enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_MM, OP_NONE } type;
+   enum { OP_REG, OP_MEM, OP_MEM_STR, OP_IMM, OP_XMM, OP_MM, OP_NONE } 
type;
unsigned int bytes;
+   unsigned int count;
union {
unsigned long orig_val;
u64 orig_val64;
@@ -234,6 +235,7 @@ struct operand {
char valptr[sizeof(unsigned long) + 2];
sse128_t vec_val;
u64 mm_val;
+   void *data;
};
 };
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8a1c42e..e87e616 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1301,8 +1301,15 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
rc->end = n * size;
}
 
-   memcpy(dest, rc->data + rc->pos, size);
-   rc->pos += size;
+   if (ctxt->rep_prefix && !(ctxt->eflags & EFLG_DF)) {
+   ctxt->dst.data = rc->data + rc->pos;
+   ctxt->dst.type = OP_MEM_STR;
+   ctxt->dst.count = (rc->end - rc->pos) / size;
+   rc->pos = rc->end;
+   } else {
+   memcpy(dest, rc->data + rc->pos, size);
+   rc->pos += size;
+   }
return 1;
 }
 
@@ -1546,6 +1553,14 @@ static int writeback(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;
break;
+   case OP_MEM_STR:
+   rc = segmented_write(ctxt,
+   ctxt->dst.addr.mem,
+   ctxt->dst.data,
+   ctxt->dst.bytes * ctxt->dst.count);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+   break;
case OP_XMM:
write_sse_reg(ctxt, &ctxt->dst.vec_val, ctxt->dst.addr.xmm);
break;
@@ -2793,7 +2808,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 static void string_addr_inc(struct x86_emulate_ctxt *ctxt, int reg,
struct operand *op)
 {
-   int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;
+   int df = (ctxt->eflags & EFLG_DF) ? -op->count : op->count;
 
register_address_increment(ctxt, reg_rmw(ctxt, reg), df * op->bytes);
op->addr.mem.ea = register_address(ctxt, reg_read(ctxt, reg));
@@ -3733,7 +3748,7 @@ static struct opcode opcode_table[256] = {
I(DstReg | SrcMem | ModRM | Src2Imm, em_imul_3op),
I(SrcImmByte | Mov | Stack, em_push),
I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op),
-   I2bvIP(DstDI | SrcDX | Mov | String, em_in, ins, check_perm_in), /* 
insb, insw/insd */
+   I2bvIP(DstDI | SrcDX | Mov | String | Unaligned, em_in, ins, 
check_perm_in), /* insb, insw/insd */
I2bvIP(SrcSI | DstDX | String, em_out, outs, check_perm_out), /* outsb, 
outsw/outsd */
/* 0x70 - 0x7F */
X16(D(SrcImmByte)),
@@ -3991,6 +4006,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
register_address(ctxt, reg_read(ctxt, VCPU_REGS_RDI));
op->addr.mem.seg = VCPU_SREG_ES;
op->val = 0;
+   op->count = 1;
break;
case OpDX:
op->type = OP_REG;
@@ -4034,6 +4050,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
register_address(ctxt, reg_read(ctxt, VCPU_REGS_RSI));
op->addr.mem.seg = seg_override(ctxt);
op->val = 0;
+   op->count = 1;
break;
case OpImmFAddr:
op->type = OP_IMM;
@@ -4575,8 +4592,14 @@ writeback:
string_addr_inc(ctxt, VCPU_REGS_RDI, &ctxt->dst);
 
if (ctxt->rep_prefix && (ctxt->d & String)) {
+   unsigned int count;
struct read_cache *r = &ctxt->io_read;
-   register_address_increment(ctxt, reg_rmw(ctxt, VCPU_REGS_RCX), 
-1);
+   if ((ctxt->d & SrcMask) == SrcSI)
+   count = ctxt->src.count;
+   else
+   count = ctxt->dst.count;
+ 

[PATCH REBASE RESEND 0/4] improve speed of "rep ins" emulation

2012-09-03 Thread Gleb Natapov
This series (or rather the last patch of it) takes different approach
to "rep ins" optimization. Instead of writing separate fast path for
it do the fast path inside emulator itself. This way nobody can say the
code is not reused!

Patch 1,2 are now, strictly speaking, not needed, but I think this is
still nice cleanup and, in case of patch 1, eliminates some ifs at each
KVM_RUN ioctl.

Gleb Natapov (4):
  Provide userspace IO exit completion callback.
  KVM: emulator: make x86 emulation modes enum instead of defines
  KVM: emulator: string_addr_inc() cleanup
  KVM: emulator: optimize "rep ins" handling.

 arch/x86/include/asm/kvm_emulate.h |   26 +-
 arch/x86/include/asm/kvm_host.h|1 +
 arch/x86/kvm/emulate.c |   48 ++-
 arch/x86/kvm/x86.c |   93 ++--
 4 files changed, 105 insertions(+), 63 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] KVM: emulator: make x86 emulation modes enum instead of defines

2012-09-03 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |   22 ++
 arch/x86/kvm/emulate.c |4 +++-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 282aee5..c80c091 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -249,6 +249,15 @@ struct read_cache {
unsigned long end;
 };
 
+/* Execution mode, passed to the emulator. */
+enum x86emul_mode {
+   X86EMUL_MODE_REAL,  /* Real mode. */
+   X86EMUL_MODE_VM86,  /* Virtual 8086 mode. */
+   X86EMUL_MODE_PROT16,/* 16-bit protected mode. */
+   X86EMUL_MODE_PROT32,/* 32-bit protected mode. */
+   X86EMUL_MODE_PROT64,/* 64-bit (long) mode.*/
+};
+
 struct x86_emulate_ctxt {
struct x86_emulate_ops *ops;
 
@@ -256,7 +265,7 @@ struct x86_emulate_ctxt {
unsigned long eflags;
unsigned long eip; /* eip before instruction emulation */
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
-   int mode;
+   enum x86emul_mode mode;
 
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
@@ -308,17 +317,6 @@ struct x86_emulate_ctxt {
 #define REPE_PREFIX0xf3
 #define REPNE_PREFIX   0xf2
 
-/* Execution mode, passed to the emulator. */
-#define X86EMUL_MODE_REAL 0/* Real mode. */
-#define X86EMUL_MODE_VM86 1/* Virtual 8086 mode. */
-#define X86EMUL_MODE_PROT16   2/* 16-bit protected mode. */
-#define X86EMUL_MODE_PROT32   4/* 32-bit protected mode. */
-#define X86EMUL_MODE_PROT64   8/* 64-bit (long) mode.*/
-
-/* any protected mode   */
-#define X86EMUL_MODE_PROT (X86EMUL_MODE_PROT16|X86EMUL_MODE_PROT32| \
-  X86EMUL_MODE_PROT64)
-
 /* CPUID vendors */
 #define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541
 #define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1451cff..a453357 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2268,6 +2268,8 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt)
if (msr_data == 0x0)
return emulate_gp(ctxt, 0);
break;
+   default:
+   break;
}
 
ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
@@ -4400,7 +4402,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
}
 
/* Instruction can only be executed in protected mode */
-   if ((ctxt->d & Prot) && !(ctxt->mode & X86EMUL_MODE_PROT)) {
+   if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
rc = emulate_ud(ctxt);
goto done;
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] KVM: emulator: string_addr_inc() cleanup

2012-09-03 Thread Gleb Natapov
Remove unneeded segment argument. Address structure already has correct
segment which was put there during decode.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a453357..8a1c42e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2790,14 +2790,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
 }
 
-static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned seg,
-   int reg, struct operand *op)
+static void string_addr_inc(struct x86_emulate_ctxt *ctxt, int reg,
+   struct operand *op)
 {
int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;
 
register_address_increment(ctxt, reg_rmw(ctxt, reg), df * op->bytes);
op->addr.mem.ea = register_address(ctxt, reg_read(ctxt, reg));
-   op->addr.mem.seg = seg;
 }
 
 static int em_das(struct x86_emulate_ctxt *ctxt)
@@ -4570,12 +4569,10 @@ writeback:
ctxt->dst.type = saved_dst_type;
 
if ((ctxt->d & SrcMask) == SrcSI)
-   string_addr_inc(ctxt, seg_override(ctxt),
-   VCPU_REGS_RSI, &ctxt->src);
+   string_addr_inc(ctxt, VCPU_REGS_RSI, &ctxt->src);
 
if ((ctxt->d & DstMask) == DstDI)
-   string_addr_inc(ctxt, VCPU_SREG_ES, VCPU_REGS_RDI,
-   &ctxt->dst);
+   string_addr_inc(ctxt, VCPU_REGS_RDI, &ctxt->dst);
 
if (ctxt->rep_prefix && (ctxt->d & String)) {
struct read_cache *r = &ctxt->io_read;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFSoRDMA not working with KVM when cache disabled

2012-09-03 Thread Avi Kivity
On 09/03/2012 02:57 PM, Andrew Holway wrote:
>> 
>> and report which (if any) of the output files (x1, x2, y1, y2) are
>> corrupted, by comparing them against the original.  This will tell us
>> whether O_DIRECT is broken, or 512 byte block size, or neither.
>> 
> 
> 
> Looks like you were directly on the money there. 512, 1K and 2K O_DIRECT 
> looks broken.

It was hardly unexpected.  Please report the problem to the nfs/rdma
list (and keep kvm@ copied, nfs/rdma is fairly relevant to kvm).

You can work around it by exposing a 4k logical block size to the guest
(logical_block_size=4k).  Note not all guests support this.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFSoRDMA not working with KVM when cache disabled

2012-09-03 Thread Andrew Holway
> 
> and report which (if any) of the output files (x1, x2, y1, y2) are
> corrupted, by comparing them against the original.  This will tell us
> whether O_DIRECT is broken, or 512 byte block size, or neither.
> 


Looks like you were directly on the money there. 512, 1K and 2K O_DIRECT looks 
broken.


[root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; 
do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello && md5sum hello && 
rm -f hello; done409600+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 0.774718 s, 271 MB/s
690138908de516b6e5d7d180d085c3f3  hello
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 0.609578 s, 344 MB/s
690138908de516b6e5d7d180d085c3f3  hello
102400+0 records in
102400+0 records out
209715200 bytes (210 MB) copied, 0.490421 s, 428 MB/s
690138908de516b6e5d7d180d085c3f3  hello
51200+0 records in
51200+0 records out
209715200 bytes (210 MB) copied, 0.440156 s, 476 MB/s
690138908de516b6e5d7d180d085c3f3  hello
25600+0 records in
25600+0 records out
209715200 bytes (210 MB) copied, 0.4229 s, 496 MB/s
690138908de516b6e5d7d180d085c3f3  hello
12800+0 records in
12800+0 records out
209715200 bytes (210 MB) copied, 0.40914 s, 513 MB/s
690138908de516b6e5d7d180d085c3f3  hello
6400+0 records in
6400+0 records out
209715200 bytes (210 MB) copied, 0.427247 s, 491 MB/s
690138908de516b6e5d7d180d085c3f3  hello
3200+0 records in
3200+0 records out
209715200 bytes (210 MB) copied, 0.411776 s, 509 MB/s
690138908de516b6e5d7d180d085c3f3  hello
1600+0 records in
1600+0 records out
209715200 bytes (210 MB) copied, 0.417098 s, 503 MB/s
690138908de516b6e5d7d180d085c3f3  hello



[root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; 
do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello iflag=direct 
oflag=direct && md5sum hello && rm -f hello; done
409600+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 62.3649 s, 3.4 MB/s
aadd0ffe3c9dfa35d8354e99ecac9276  hello
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 41.3876 s, 5.1 MB/s
336f6da78f93dab591edc18da81f002e  hello
102400+0 records in
102400+0 records out
209715200 bytes (210 MB) copied, 21.1712 s, 9.9 MB/s
f4cefe0a05c9b47ba68effdb17dc95d6  hello
51200+0 records in
51200+0 records out
209715200 bytes (210 MB) copied, 10.9631 s, 19.1 MB/s
690138908de516b6e5d7d180d085c3f3  hello
25600+0 records in
25600+0 records out
209715200 bytes (210 MB) copied, 5.4136 s, 38.7 MB/s
690138908de516b6e5d7d180d085c3f3  hello
12800+0 records in
12800+0 records out
209715200 bytes (210 MB) copied, 3.1448 s, 66.7 MB/s
690138908de516b6e5d7d180d085c3f3  hello
6400+0 records in
6400+0 records out
209715200 bytes (210 MB) copied, 1.77304 s, 118 MB/s
690138908de516b6e5d7d180d085c3f3  hello
3200+0 records in
3200+0 records out
209715200 bytes (210 MB) copied, 1.4331 s, 146 MB/s
690138908de516b6e5d7d180d085c3f3  hello
1600+0 records in
1600+0 records out
209715200 bytes (210 MB) copied, 0.922167 s, 227 MB/s
690138908de516b6e5d7d180d085c3f3  hello

> 
> -- 
> error compiling committee.c: too many arguments to function
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] virtio-spec: virtio network device multiqueue support

2012-09-03 Thread Michael S. Tsirkin
At Jason's request, I am trying to help finalize the spec for
the new multiqueue feature.

Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
  looks nicer to me.
- documented packet steering, added a generalized steering programming
  command. Current modes are single queue and host driven multiqueue,
  but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
  but makes it more efficient to switch between modes as
  we can avoid this causing packet reordering.

Rusty, could you please take a look and comment?
If this looks OK to everyone, we can proceed with finalizing the
implementation.  This patch is against
eb9fc84d0d3c46438aaab190e2401a9e5409a052 in virtio-spec git tree.

-->

virtio-spec: virtio network device multiqueue support

Add multiqueue support to virtio network device.
Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
packet steering.

Signed-off-by: Michael S. Tsirkin 

--

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 7a073f4..583debc 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -58,6 +58,7 @@
 \html_be_strict false
 \author -608949062 "Rusty Russell,,," 
 \author 1531152142 "Paolo Bonzini,,," 
+\author 1986246365 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -3896,6 +3897,37 @@ Only if VIRTIO_NET_F_CTRL_VQ set
 \end_inset
 
 
+\change_inserted 1986246365 1346663522
+ 3: reserved
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346663550
+4: receiveq1.
+ 5: transmitq1.
+ 6: receiveq2.
+ 7.
+ transmitq2.
+ ...
+ 2N+2:receivqN, 2N+3:transmitqN
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346663558
+Only if VIRTIO_NET_F_CTRL_VQ set.
+ N is indicated by max_virtqueue_pairs field.
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Description
@@ -4056,6 +4088,17 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1346617842
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346618103
+VIRTIO_NET_F_MULTIQUEUE(22) Device has multiple receive and transmission
+ queues.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4068,11 +4111,45 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1346671560
+Two
+\change_inserted 1986246365 1346671647
+Six
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: 
VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1346672138
+ The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
+ is set.
+ The max_virtqueue_pairs field specifies the maximum number of each of transmit
+ and receive virtqueues that can used for multiqueue operation.
+ The following read-only fields: 
+\emph on
+current_steering_rule
+\emph default
+, 
+\emph on
+reserved
+\emph default
+ and 
+\emph on
+current_steering_param
+\emph default
+ store the last successful VIRTIO_NET_CTRL_STEERING
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "sub:Transmit-Packet-Steering"
+
+\end_inset
+
+ command executed by driver, for debugging.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4105,6 +4182,40 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
 u16 status;
+\change_inserted 1986246365 1346671221
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671532
+
+u16 max_virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671531
+
+u8 current_steering_rule;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671499
+
+u8 reserved;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671530
+
+u16 current_steering_param;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4151,6 +4262,18 @@ physical
 \begin_layout Enumerate
 If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated, identify the control
  virtqueue.
+\change_inserted 1986246365 1346618052
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 1986246365 1346618175
+If VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, identify the receive
+ and transmission queues that are going to be used in multiqueue mode.
+ Only queues that are going to be used need to be initialized.
+\change_unchanged
+
 \end_layou

Re: KVM call agenda for Tuesday, September 4th

2012-09-03 Thread Avi Kivity
On 09/03/2012 09:44 AM, Juan Quintela wrote:
> 
> Hi
> 
> Please send in any agenda items you are interested in covering.

- protecting MemoryRegion::opaque during dispatch

I'm guessing Ping won't make it due to timezone problems.  Jan, if you
will not participate, please remove the topic from the list (unless
someone else wants to argue your side).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND] KVM: cleanup pic reset

2012-09-03 Thread Gleb Natapov
kvm_pic_reset() is not used anywhere. Move reset logic from
pic_ioport_write() there.

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 90c84f9..848206d 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -275,23 +275,20 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
 {
int irq, i;
struct kvm_vcpu *vcpu;
-   u8 irr = s->irr, isr = s->imr;
+   u8 edge_irr = s->irr & ~s->elcr;
bool found = false;
 
s->last_irr = 0;
-   s->irr = 0;
+   s->irr &= s->elcr;
s->imr = 0;
-   s->isr = 0;
s->priority_add = 0;
-   s->irq_base = 0;
-   s->read_reg_select = 0;
-   s->poll = 0;
s->special_mask = 0;
-   s->init_state = 0;
-   s->auto_eoi = 0;
-   s->rotate_on_auto_eoi = 0;
-   s->special_fully_nested_mode = 0;
-   s->init4 = 0;
+   s->read_reg_select = 0;
+   if (!s->init4) {
+   s->special_fully_nested_mode = 0;
+   s->auto_eoi = 0;
+   }
+   s->init_state = 1;
 
kvm_for_each_vcpu(i, vcpu, s->pics_state->kvm)
if (kvm_apic_accept_pic_intr(vcpu)) {
@@ -304,7 +301,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
return;
 
for (irq = 0; irq < PIC_NUM_PINS/2; irq++)
-   if (irr & (1 << irq) || isr & (1 << irq))
+   if (edge_irr & (1 << irq))
pic_clear_isr(s, irq);
 }
 
@@ -316,40 +313,13 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 
val)
addr &= 1;
if (addr == 0) {
if (val & 0x10) {
-   u8 edge_irr = s->irr & ~s->elcr;
-   int i;
-   bool found;
-   struct kvm_vcpu *vcpu;
-
s->init4 = val & 1;
-   s->last_irr = 0;
-   s->irr &= s->elcr;
-   s->imr = 0;
-   s->priority_add = 0;
-   s->special_mask = 0;
-   s->read_reg_select = 0;
-   if (!s->init4) {
-   s->special_fully_nested_mode = 0;
-   s->auto_eoi = 0;
-   }
-   s->init_state = 1;
if (val & 0x02)
pr_pic_unimpl("single mode not supported");
if (val & 0x08)
pr_pic_unimpl(
-   "level sensitive irq not supported");
-
-   kvm_for_each_vcpu(i, vcpu, s->pics_state->kvm)
-   if (kvm_apic_accept_pic_intr(vcpu)) {
-   found = true;
-   break;
-   }
-
-
-   if (found)
-   for (irq = 0; irq < PIC_NUM_PINS/2; irq++)
-   if (edge_irr & (1 << irq))
-   pic_clear_isr(s, irq);
+   "level sensitive irq not 
supported");
+   kvm_pic_reset(s);
} else if (val & 0x08) {
if (val & 0x04)
s->poll = 1;
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for Tuesday, September 4th

2012-09-03 Thread Juan Quintela
Juan Quintela  wrote:
> Hi
>
> Please send in any agenda items you are interested in covering.
>
> Thanks, Juan.

Forwarding my own email.  Just put a wrong address on the 1st try (kvm@
folks receive this twice).  Thanks to Markus for noticing it.

Later, Juan.

PD. /me learn not to sent email so soon.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace

2012-09-03 Thread Avi Kivity
On 08/27/2012 12:51 PM, Dong Hao wrote:
> From: Xiao Guangrong 
> 
> Exporting KVM exit information to userspace to be consumed by perf.
> 
> [ Dong Hao : rebase it on acme's git tree ]
> Signed-off-by: Xiao Guangrong 
> Signed-off-by: Dong Hao 
> ---
>  arch/x86/include/asm/kvm_host.h |   36 ---

Please put ABIs in kvm.h.  But see below.

>  arch/x86/include/asm/svm.h  |  205 
> +--
>  arch/x86/include/asm/vmx.h  |  126 
>  arch/x86/kvm/trace.h|   89 -
>  4 files changed, 234 insertions(+), 222 deletions(-)
> 
>  
> +#define DE_VECTOR 0
> +#define DB_VECTOR 1
> +#define BP_VECTOR 3
> +#define OF_VECTOR 4
> +#define BR_VECTOR 5
> +#define UD_VECTOR 6
> +#define NM_VECTOR 7
> +#define DF_VECTOR 8
> +#define TS_VECTOR 10
> +#define NP_VECTOR 11
> +#define SS_VECTOR 12
> +#define GP_VECTOR 13
> +#define PF_VECTOR 14
> +#define MF_VECTOR 16
> +#define MC_VECTOR 18

This is not a kvm ABI, but an x86 architecture constants.  It should be
put into an existing x86 header.

> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f2b83bc..cdf5674 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -1,6 +1,135 @@
>  #ifndef __SVM_H
>  #define __SVM_H
>  
> +
> +#ifdef __KERNEL__
> +

The entire file can be exported; nothing in there is implementation
specific.

> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 74fcb96..61e04e9 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> +
> +#ifdef __KERNEL__
> +

Ditto.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete

2012-09-03 Thread Avi Kivity
On 08/27/2012 12:51 PM, Dong Hao wrote:
> From: Xiao Guangrong 
> 
> 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to
> calculate mmio read emulated time for the old kernel, in order to trace
> mmio read event more exactly, we add kvm_mmio_begin to trace the time when
> mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is
> completed
> 

Why is this so critical?

If a lot of time is spent in in-kernel mmio, then 'perf top' will report
it.  Otherwise the time between kvm_exit and kvm_entry describes the
time spent in the host.  Not all of it is mmio handling, but it is quite
close.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFSoRDMA not working with KVM when cache disabled

2012-09-03 Thread Avi Kivity
On 08/31/2012 08:05 PM, Andrew Holway wrote:
> Hi,
> 
> I am trying to host KVM machines on an NFSoRDMA mount.
> 
> This works:
> 
> -drive file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
> 
> This Doesn't!
> 
> -drive 
> file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
> 
> Any ideas why this could be?

Please try

  dd bs=512 if=/mnt/vm001.img of=/mnt/x1
  dd bs=512 if=/mnt/vm001.img of=/mnt/x2

  dd bs=4096 if=/mnt/vm001.img of=/mnt/y1 iflag=direct oflag=direct
  dd bs=4096 if=/mnt/vm001.img of=/mnt/y2 iflag=direct oflag=direct

and report which (if any) of the output files (x1, x2, y1, y2) are
corrupted, by comparing them against the original.  This will tell us
whether O_DIRECT is broken, or 512 byte block size, or neither.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recommended way access KVM data structures from other kernel subsystems ?

2012-09-03 Thread Avi Kivity
On 08/23/2012 10:49 AM, Pandarathil, Vijaymohan R wrote:
> Hi,
> 
> I am looking for the recommended approach for accessing KVM driver data from 
> other kernel components. In my case, I need to set some global variable/state 
> in KVM driver from one of the NMI handlers.  I see that using kvm_x86_ops is 
> one option. I can define a new function as part of kvm_x86_ops and invoke 
> that function from the NMI handler as below.
> 
> nmi_handler() {
> 
> ...
>If (kvm_x86_ops) {
>   kvm_x86_ops->new_fn();
> 
> }
> 
> after adding new_func() to struct kvm_x86_ops.
> 
> Is this the right approach ? 

No, kvm_x86_ops is not available to the core kernel code.

> Or is there some other existing mechanism ? Is KVM driver always loaded ?

No.

> I am fairly new to KVM and so any guidance is very much appreciated.

Do something like

  void nmi_xyz_register(struct nmi_xyz *x);
  void nmi_xyz_unregister(struct nmi_xyz *x);

  struct nmi_xyz {
  void (*nmi)(void);
  };

then add code to kvm to use nmi_xyz (whatever that is).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

2012-09-03 Thread Avi Kivity
On 08/23/2012 11:51 AM, Hao, Xudong wrote:
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
>> Behalf Of Avi Kivity
>> Sent: Monday, August 20, 2012 6:43 PM
>> To: Roedel, Joerg
>> Cc: Hao, Xudong; kvm@vger.kernel.org; Zhang, Xiantao
>> Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU
>> 
>> On 08/20/2012 01:14 PM, Roedel, Joerg wrote:
>> > On Mon, Aug 20, 2012 at 01:08:14PM +0300, Avi Kivity wrote:
>> >> On 08/20/2012 12:24 PM, Roedel, Joerg wrote:
>> >
>> >> So it was broken all along?  Yikes.
>> >
>> > There is no LWP support in the kernel and thus KVM can't expose it to
>> > guests. So for now nothing should be broken, no?
>> 
>> Oh, we mask out xcr0 bits not supported by the host.
>> 
>> So it's broken in another way: it isn't exposed.  Pity, it's such a nice
>> feature.
>> 
> 
> Hi, Avi/Joerg
> 
> What's the decision for it? I don't understand LWP, so how about this patch? 

It's fine (Joerg can send the LWP change), but there was a truncation
issue that needs fixing, no?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.

2012-09-03 Thread Avi Kivity
On 09/01/2012 03:35 PM, Rusty Russell wrote:
> Avi Kivity  writes:
>>> -Capability: basic
>>> +Capability: KVM_CAP_REG_LIST
>>>  Architectures: arm
>>
>> all
> 
> OK, I guess that's to be true in future.  Fixed.
> 
>>>  Type: vcpu ioctl
>>> -Parameters: struct kvm_msr_list (in/out)
>>> +Parameters: struct kvm_reg_list (in/out)
>>>  Returns: 0 on success; -1 on error
>>>  Errors:
>>> -  E2BIG: the msr index list is too big to fit in the array specified by
>>> - the user.
>>> +  E2BIG: the reg index list is too big to fit in the array specified by
>>> + the user (the number required will be written into n).
>>>  
>>>  struct kvm_msr_list {
>>> -   __u32 nmsrs; /* number of msrs in entries */
>>> -   __u32 indices[0];
>>> +   __u64 n; /* number of registers in reg[] */
>>> +   __u64 reg[0];
>>>  };
>>>  
>>
>> People complain that this interface is hard to use.
>>
>> How about supplying the address of the array (in addition to n) so you
>> don't have to deal with variable sized arrays, and dropping E2BIG in
>> favour of always updating n (n changed to something bigger than you had
>> -> reallocate and rerun)
> 
> We re-write n anyway, *and* return -E2BIG.  Not returning an error is
> asking for trouble.
> 
> Passing an address in a struct is pretty bad, since it involves
> compatibility wrappers.  

Right, some s390 thing.

> I don't think that is what makes the API hard
> to use.

What is it then?  I forgot what the original complaints/complainers were.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Qcow2 preallocation + backing file

2012-09-03 Thread Stefan Hajnoczi
On Mon, Jul 30, 2012 at 9:10 PM, Mark Moseley  wrote:
> While I might've answered my own question, I'm no fio expert. I don't
> necessarily trust that I'm testing it correctly, which is why I'm
> asking here. Running the same test multiple times gives me
> 30-40meg/sec for both read and write on the
> preallocation=metadata-derived image, vs 0.5-1meg/sec for
> non-preallocation=metadata-derived images, both which sound way too
> high/low to be right. Same box (ubuntu 12.04 64-bit, running the stock
> ubuntu qemu 1.0), same disk, even the same directory. I could also see
> some bizarre side effect happening that I'm not aware of, ready to
> bite me later on down the road, like, say, that it's silently actually
> reading/writing the original image (just a made-up example).

qemu-de...@nongnu.org is the best place to discuss QEMU image formats.
 The development of qcow2 happens there.

Your fio question can't be answered without the job file.  Please post
it.  I agree that the numbers look weird, the difference is too great.

Are you using -drive ...,cache=none on your qemu command-line and
"direct=1" or "buffered=0" (they are synonyms) in the fio job file?
Those are necessary to take the host and guest page caches out of the
equation.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFSoRDMA not working with KVM when cache disabled

2012-09-03 Thread Stefan Hajnoczi
On Fri, Aug 31, 2012 at 6:05 PM, Andrew Holway  wrote:
> I am trying to host KVM machines on an NFSoRDMA mount.
>
> This works:
>
> -drive file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
>
> This Doesn't!
>
> -drive 
> file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
>
> Any ideas why this could be?

-drive cache=none results in QEMU opening the image file with O_DIRECT.

It's not clear to me what the first error is.  The node-installer
output you have posted does not show a series of steps to reproduce
the problem.  Can you try running isolated tests with dd or
verify-data (http://people.redhat.com/sct/src/verify-data/)?

Have you tried running a different guest OS, say a Fedora or Debian Live CD?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: invalidate vpid for invlpg instruction

2012-09-03 Thread Avi Kivity
On 09/03/2012 02:27 AM, Davidlohr Bueso wrote:
> On Fri, 2012-08-31 at 14:37 -0300, Marcelo Tosatti wrote:
>> On Fri, Aug 31, 2012 at 06:10:48PM +0200, Davidlohr Bueso wrote:
>> > For processors that support VPIDs we should invalidate the page table entry
>> > specified by the lineal address. For this purpose add support for 
>> > individual
>> > address invalidations.
>> 
>> Not necessary - a single context invalidation is performed through
>> KVM_REQ_TLB_FLUSH.
> 
> Since vpid_sync_context() supports both single and all-context vpid
> invalidations, wouldn't it make sense to also add individual address
> ones as well, supporting further granularity?

It might.  Do you have benchmarks supporting this?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool

2012-09-03 Thread don

于 2012年08月31日 02:29, David Ahern 写道:
In addition to Andrew's comment about making the stats struct and 
functions generic...

Yes. :-)


On 8/27/12 3:51 AM, Dong Hao wrote:
---8<---

+static void exit_event_decode_key(struct event_key *key, char 
decode[20])

+{
+   const char *exit_reason = get_exit_reason(key->key);
+
+   snprintf(decode, 20, "%s", exit_reason);
+}


Use scnprintf rather than snprintf.

Why? Since we don't care about the return value, what's the difference?


---8<---


+static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
+{
+int old_max_vcpu = event->max_vcpu;
+
+if (vcpu_id < event->max_vcpu)
+return true;
+
+while (event->max_vcpu <= vcpu_id)
+event->max_vcpu += DEFAULT_VCPU_NUM;
+
+event->vcpu = realloc(event->vcpu,
+  event->max_vcpu * sizeof(*event->vcpu));
+if (!event->vcpu) {


If realloc fails you leak memory by overwriting the current pointer.
Thanks for pointing it out, we will terminate the running instance in 
our next

version.


---8<---


+static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
+{
+struct event_stats *stats = &event->total;
+double variance, variance_mean, stddev;
+
+if (vcpu_id != -1)
+stats = &event->vcpu[vcpu_id];
+
+BUG_ON(!stats->count);
+
+variance = stats->M2 / (stats->count - 1);
+variance_mean = variance / stats->count;
+stddev = sqrt(variance_mean);
+
+return stddev * 100 / stats->mean;
+}


perf should be consistent in the stddev it shows the user. Any reason 
to dump relative stddev versus stddev used by perf-stat?
Since 'perf stat' uses relative standard deviation rather than stddev, 
'perf kvm stat'

just follows the style of 'perf stat'.



---8<---


+/* returns left most element of result, and erase it */
+static struct kvm_event *pop_from_result(void)
+{
+struct rb_node *node = result.rb_node;
+
+if (!node)
+return NULL;
+
+while (node->rb_left)
+node = node->rb_left;


Use rb_first().

OK, we will use it in the next version.


---8<---


+/*
+ * Append "-a" only if "-p"/"--pid" is not specified since they
+ * are mutually exclusive.
+ */
+if (!kvm_record_specified_guest(argc, argv))
+rec_argv[i++] = STRDUP_FAIL_EXIT("-a");


Other perf-kvm commands rely on perf-record semantics -- i.e., for 
user to add the -a or -p option.

You mean, remove '-a' from the default options, then:
if a user wants to record all guest he will use 'perf stat record -a';
and if a user wants to record the specified guest, he should use
'perf stat record -p xxx'?

Well, as the style of other subcommand, e.g., perf lock/perf sched, the
'perf xxx record' record all events on all cpus, no need to use '-a'.

Based on mentioned above, I prefer the original way. ;)


---8<---


+static const char * const kvm_events_report_usage[] = {
+"perf kvm stat report []",
+NULL
+};
+
+static const struct option kvm_events_report_options[] = {
+OPT_STRING(0, "event", &report_event, "report event",
+"event for reporting: vmexit, mmio, ioport"),
+OPT_INTEGER(0, "vcpu", &trace_vcpu,
+"vcpu id to report"),
+OPT_STRING('k', "key", &sort_key, "sort-key",
+"key for sorting: sample(sort by samples number)"
+" time (sort by avg time)"),
+OPT_END()
+};
+
+static int kvm_events_report(int argc, const char **argv)
+{
+symbol__init();
+
+if (argc) {
+argc = parse_options(argc, argv,
+ kvm_events_report_options,
+ kvm_events_report_usage, 0);
+if (argc)
+usage_with_options(kvm_events_report_usage,
+   kvm_events_report_options);
+}
+
+return kvm_events_report_vcpu(trace_vcpu);
+}
+
+static int kvm_cmd_stat(int argc, const char **argv)
+{
+if (argc > 1) {
+if (!strncmp(argv[1], "rec", 3))
+return kvm_events_record(argc - 1, argv + 1);
+
+if (!strncmp(argv[1], "rep", 3))
+return kvm_events_report(argc - 1 , argv + 1);
+}
+
+return cmd_stat(argc, argv, NULL);
+}
+
  static charname_buffer[256];

  static const char * const kvm_usage[] = {
-"perf kvm [] {top|record|report|diff|buildid-list}",
+"perf kvm [] {top|record|report|diff|buildid-list|stat}",
  NULL
  };



The usage for the report/record sub commands of stat is never shown. 
e.g.,

$ perf kvm stat
--> shows help for perf-stat

$ perf kvm
--> shows the above and perf-kvm's usage

To get help for the record/report subcommands you have to know that 
record and report are subcommands.

Okay, we will improve this.


---8<---

+static int perf_file_section__read_feature(struct perf_file_section 
*section,

+struct perf_header *ph,
+int feat, int fd, void *data)
+{
+struct header_read_data *hd = data;
+
+if (feat != hd->feat)
+return 0;
+
+if (lseek(fd, section->o