Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 04:21, Nicholas A. Bellinger ha scritto: >> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi >> *vscsi, void *buf) >> struct virtio_scsi_cmd *cmd = buf; >> struct scsi_cmnd *sc = cmd->sc; >> struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd; >> +struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; >> + >> +atomic_dec(&tgt->reqs); >> > > As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the > atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o > smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h > accessors properly, no..? No, only a single "thing" is being accessed, and there is no need to order the decrement with respect to preceding or subsequent accesses to other locations. In other words, tgt->reqs is already synchronized with itself, and that is enough. (Besides, on x86 smp_mb__after_atomic_dec is a nop). >> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh, >> + struct scsi_cmnd *sc) >> +{ >> +struct virtio_scsi *vscsi = shost_priv(sh); >> +struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; >> +unsigned long flags; >> +u32 queue_num; >> + >> +/* Using an atomic_t for tgt->reqs lets the virtqueue handler >> + * decrement it without taking the spinlock. >> + */ >> +spin_lock_irqsave(&tgt->tgt_lock, flags); >> +if (atomic_inc_return(&tgt->reqs) == 1) { >> +queue_num = smp_processor_id(); >> +while (unlikely(queue_num >= vscsi->num_queues)) >> +queue_num -= vscsi->num_queues; >> +tgt->req_vq = &vscsi->req_vqs[queue_num]; >> +} >> +spin_unlock_irqrestore(&tgt->tgt_lock, flags); >> +return virtscsi_queuecommand(vscsi, tgt, sc); >> +} >> + > > The extra memory barriers to get this right for the current approach are > just going to slow things down even more for virtio-scsi-mq.. virtio-scsi multiqueue has a performance benefit up to 20% (for a single LUN) or 40% (on overall bandwidth across multiple LUNs). I doubt that a single memory barrier can have that much impact. :) The way to go to improve performance even more is to add new virtio APIs for finer control of the usage of the ring. These should let us avoid copying the sg list and almost get rid of the tgt_lock; even though the locking is quite efficient in virtio-scsi (see how tgt_lock and vq_lock are "pipelined" so as to overlap the preparation of two requests), it should give a nice improvement and especially avoid a kmalloc with small requests. I may have some time for it next month. > Jen's approach is what we will ultimately need to re-architect in SCSI > core if we're ever going to move beyond the issues of legacy host_lock, > so I'm wondering if maybe this is the direction that virtio-scsi-mq > needs to go in as well..? We can see after the block layer multiqueue work goes in... I also need to look more closely at Jens's changes. Have you measured the host_lock to be a bottleneck in high-iops benchmarks, even for a modern driver that does not hold it in queuecommand? (Certainly it will become more important as the virtio-scsi queuecommand becomes thinner and thinner). If so, we can start looking at limiting host_lock usage in the fast path. BTW, supporting this in tcm-vhost should be quite trivial, as all the request queues are the same and all serialization is done in the virtio-scsi driver. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete
On 09/03/2012 07:07 PM, Avi Kivity wrote: > On 08/27/2012 12:51 PM, Dong Hao wrote: >> From: Xiao Guangrong >> >> 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to >> calculate mmio read emulated time for the old kernel, in order to trace >> mmio read event more exactly, we add kvm_mmio_begin to trace the time when >> mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is >> completed >> > > Why is this so critical? > > If a lot of time is spent in in-kernel mmio, then 'perf top' will report > it. Otherwise the time between kvm_exit and kvm_entry describes the > time spent in the host. Not all of it is mmio handling, but it is quite > close. I have done some test, the new events can get more exact result, this is a example in my slides: the event handled time calculated by old events is: 0.66(s) calculated by new events, the result is: 0.05(s). So, i think it is worth introducing these new events. But if you do not care it or think these events are duplicate with current events or unmaintainable, i do not have strong opinion on that. Please let me know, i will drop them in the next version. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
On 09/03/2012 07:13 PM, Avi Kivity wrote: > On 08/27/2012 12:51 PM, Dong Hao wrote: >> From: Xiao Guangrong >> >> Exporting KVM exit information to userspace to be consumed by perf. >> >> [ Dong Hao : rebase it on acme's git tree ] >> Signed-off-by: Xiao Guangrong >> Signed-off-by: Dong Hao >> --- >> arch/x86/include/asm/kvm_host.h | 36 --- > > Please put ABIs in kvm.h. But see below. > >> arch/x86/include/asm/svm.h | 205 >> +-- >> arch/x86/include/asm/vmx.h | 126 >> arch/x86/kvm/trace.h| 89 - >> 4 files changed, 234 insertions(+), 222 deletions(-) >> >> >> +#define DE_VECTOR 0 >> +#define DB_VECTOR 1 >> +#define BP_VECTOR 3 >> +#define OF_VECTOR 4 >> +#define BR_VECTOR 5 >> +#define UD_VECTOR 6 >> +#define NM_VECTOR 7 >> +#define DF_VECTOR 8 >> +#define TS_VECTOR 10 >> +#define NP_VECTOR 11 >> +#define SS_VECTOR 12 >> +#define GP_VECTOR 13 >> +#define PF_VECTOR 14 >> +#define MF_VECTOR 16 >> +#define MC_VECTOR 18 > > This is not a kvm ABI, but an x86 architecture constants. It should be > put into an existing x86 header. Okay, i will move them into asm/kvm.h > >> + >> #endif /* _ASM_X86_KVM_HOST_H */ >> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h >> index f2b83bc..cdf5674 100644 >> --- a/arch/x86/include/asm/svm.h >> +++ b/arch/x86/include/asm/svm.h >> @@ -1,6 +1,135 @@ >> #ifndef __SVM_H >> #define __SVM_H >> >> + >> +#ifdef __KERNEL__ >> + > > The entire file can be exported; nothing in there is implementation > specific. Unfortunately, i have tied this but failed: ../../arch/x86/include/asm/svm.h:262:1: error: packed attribute is unnecessary for ‘vmcb_seg’ [-Werror=packed] ../../arch/x86/include/asm/svm.h:307:1: error: packed attribute is unnecessary for ‘vmcb_save_area’ [-Werror=packed] ../../arch/x86/include/asm/svm.h:312:1: error: packed attribute is unnecessary for ‘vmcb’ [-Werror=packed] .. > >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >> index 74fcb96..61e04e9 100644 >> --- a/arch/x86/include/asm/vmx.h >> +++ b/arch/x86/include/asm/vmx.h >> + >> +#ifdef __KERNEL__ >> + > > Ditto. Or: ../../arch/x86/include/asm/vmx.h:376:0: error: "REG_R8" redefined [-Werror] /usr/include/sys/ucontext.h:46:0: note: this is the location of the previous definition ../../arch/x86/include/asm/vmx.h:377:0: error: "REG_R9" redefined [-Werror] /usr/include/sys/ucontext.h:48:0: note: this is the location of the previous definition .. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On Mon, 2012-09-03 at 18:59 +0300, Avi Kivity wrote: > On 08/29/2012 11:49 AM, Peter Maydell wrote: > > On 29 August 2012 09:47, Jan Kiszka wrote: > >> On 2012-08-28 23:26, Peter Maydell wrote: > >>> Since this is arch-specific we should probably give the > >>> resulting device a more specific name than "pci-assign", > >>> which implies that it is (a) ok for any PCI system and > >>> (b) not something that will be obsolete in the foreseen > >>> future. > >> > >> The name has to be like this for libvirt compatibility. I can provide it > >> as alias, though, calling the device "kvm-pci-assign" by default. Better? > > > > ...is it likely to ever work for non-x86 PCI devices? > > It used to work for ia64 before it died. So it's not architecture > specific by design. Caveat; ia64 is a different architecture, but it was on Intel VT-d chipsets, which therefore supported the IOMMU API and ia64 maps guests the same way. This is the same problem I ran into trying to name the vfio iommu driver for x86. It's not really x86-specific, but it's got to look and smell pretty similar. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, 2012-08-28 at 13:54 +0200, Paolo Bonzini wrote: > This patch adds queue steering to virtio-scsi. When a target is sent > multiple requests, we always drive them to the same queue so that FIFO > processing order is kept. However, if a target was idle, we can choose > a queue arbitrarily. In this case the queue is chosen according to the > current VCPU, so the driver expects the number of request queues to be > equal to the number of VCPUs. This makes it easy and fast to select > the queue, and also lets the driver optimize the IRQ affinity for the > virtqueues (each virtqueue's affinity is set to the CPU that "owns" > the queue). > > Signed-off-by: Paolo Bonzini > --- Hey Paolo & Co, I've not had a chance to try this with tcm_vhost just yet, but noticed one thing wrt to assumptions about virtio_scsi_target_state->reqs access below.. > drivers/scsi/virtio_scsi.c | 162 > +++- > 1 files changed, 130 insertions(+), 32 deletions(-) > > diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c > index 6414ea0..0c4b096 100644 > --- a/drivers/scsi/virtio_scsi.c > +++ b/drivers/scsi/virtio_scsi.c > @@ -26,6 +26,7 @@ > > #define VIRTIO_SCSI_MEMPOOL_SZ 64 > #define VIRTIO_SCSI_EVENT_LEN 8 > +#define VIRTIO_SCSI_VQ_BASE 2 > > /* Command queue element */ > struct virtio_scsi_cmd { > @@ -59,9 +60,13 @@ struct virtio_scsi_vq { > > /* Per-target queue state */ > struct virtio_scsi_target_state { > - /* Protects sg. Lock hierarchy is tgt_lock -> vq_lock. */ > + /* Protects sg, req_vq. Lock hierarchy is tgt_lock -> vq_lock. */ > spinlock_t tgt_lock; > > + struct virtio_scsi_vq *req_vq; > + > + atomic_t reqs; > + > /* For sglist construction when adding commands to the virtqueue. */ > struct scatterlist sg[]; > }; > @@ -70,14 +75,15 @@ struct virtio_scsi_target_state { > struct virtio_scsi { > struct virtio_device *vdev; > > - struct virtio_scsi_vq ctrl_vq; > - struct virtio_scsi_vq event_vq; > - struct virtio_scsi_vq req_vq; > - > /* Get some buffers ready for event vq */ > struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN]; > > + u32 num_queues; > struct virtio_scsi_target_state **tgt; > + > + struct virtio_scsi_vq ctrl_vq; > + struct virtio_scsi_vq event_vq; > + struct virtio_scsi_vq req_vqs[]; > }; > > static struct kmem_cache *virtscsi_cmd_cache; > @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi > *vscsi, void *buf) > struct virtio_scsi_cmd *cmd = buf; > struct scsi_cmnd *sc = cmd->sc; > struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd; > + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > + > + atomic_dec(&tgt->reqs); > As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h accessors properly, no..? > dev_dbg(&sc->device->sdev_gendev, > "cmd %p response %u status %#02x sense_len %u\n", > @@ -185,11 +194,13 @@ static void virtscsi_req_done(struct virtqueue *vq) > { > struct Scsi_Host *sh = virtio_scsi_host(vq->vdev); > struct virtio_scsi *vscsi = shost_priv(sh); > + int index = virtqueue_get_queue_index(vq) - VIRTIO_SCSI_VQ_BASE; > + struct virtio_scsi_vq *req_vq = &vscsi->req_vqs[index]; > unsigned long flags; > > - spin_lock_irqsave(&vscsi->req_vq.vq_lock, flags); > + spin_lock_irqsave(&req_vq->vq_lock, flags); > virtscsi_vq_done(vscsi, vq, virtscsi_complete_cmd); > - spin_unlock_irqrestore(&vscsi->req_vq.vq_lock, flags); > + spin_unlock_irqrestore(&req_vq->vq_lock, flags); > }; > > static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf) > @@ -429,10 +440,10 @@ static int virtscsi_kick_cmd(struct > virtio_scsi_target_state *tgt, > return ret; > } > > -static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc) > +static int virtscsi_queuecommand(struct virtio_scsi *vscsi, > + struct virtio_scsi_target_state *tgt, > + struct scsi_cmnd *sc) > { > - struct virtio_scsi *vscsi = shost_priv(sh); > - struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > struct virtio_scsi_cmd *cmd; > int ret; > > @@ -466,7 +477,7 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, > struct scsi_cmnd *sc) > BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE); > memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); > > - if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd, > + if (virtscsi_kick_cmd(tgt, tgt->req_vq, cmd, > sizeof cmd->req.cmd, sizeof cmd->resp.cmd, > GFP_ATOMIC) >= 0) > ret = 0; > @@ -475,6 +486,38 @@ out: >
Re: [PATCH v4 0/8] KVM paravirt remote flush tlb
On Mon, 03 Sep 2012 17:33:46 +0300, Avi Kivity wrote: > On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote: > > > > kernbench(lower is better) > > == > > base pvflushv4 %improvement > > 1VM48.5800 46.8513 3.55846 > > 2VM 108.1823 104.6410 3.27346 > > 3VM 183.2733 163.3547 10.86825 > > > > ebizzy(higher is better) > > > > base pvflushv4 %improvement > > 1VM 2414.5000 2089.8750 -13.44481 > > 2VM 2167.6250 2371.7500 9.41699 > > 3VM 1600. 2102.5556 31.40060 > > > > The regression is worrying. We're improving the contended case at the > cost of the non-contended case, this is usually the wrong thing to do. > Do we have any clear idea of the cause of the regression? > Previous perf numbers suggest that in 1VM scenario flush_tlb_others_ipi is around 2%, while for contented case its around 10%. That is what is helping contended case. Regards, Nikunj -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] kvm: Fix warning from static code analysis
On 3 September 2012 21:40, Stefan Weil wrote: > Report from smatch: > > kvm-all.c:1373 kvm_init(135) warn: > variable dereferenced before check 's' (see line 1360) > > 's' cannot by NULL (it was alloced using g_malloc0), so there is no need > to check it here. > > Signed-off-by: Stefan Weil Reviewed-by: Peter Maydell -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: Fix warning from static code analysis
Report from smatch: kvm-all.c:1373 kvm_init(135) warn: variable dereferenced before check 's' (see line 1360) 's' cannot by NULL (it was alloced using g_malloc0), so there is no need to check it here. Signed-off-by: Stefan Weil --- kvm-all.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 34b02c1..20b980d 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1370,13 +1370,11 @@ int kvm_init(void) return 0; err: -if (s) { -if (s->vmfd >= 0) { -close(s->vmfd); -} -if (s->fd != -1) { -close(s->fd); -} +if (s->vmfd >= 0) { +close(s->vmfd); +} +if (s->fd != -1) { +close(s->fd); } g_free(s); -- 1.7.10 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On Mon, Sep 3, 2012 at 4:14 PM, Avi Kivity wrote: > On 08/29/2012 11:27 AM, Markus Armbruster wrote: >> >> I don't see a point in making contributors avoid non-problems that might >> conceivably become trivial problems some day. Especially when there's >> no automated help with the avoiding. > > -Wpointer-arith +1 > > > > -- > error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Sending a VM for restart (after it has been shutdown and saved)
Hi list, I have been hacking the KVM-QEMU code, but need some help to be able to perform a particular operation. Currently, I perform some operations on the VM image after it has received a shutdown call, and after the image itself has been fully saved before KVM-QEMU shuts down. I perform these operations within the bdrv_close_all () method within block.c, and it works exactly as I want it to. What I would like to be able to do is to send the VM in question back for a restart from that part of the code, and I was wondering if this was possible with a given command. When I say send it back, I mean literally to have the KVM-QEMU treat the VM as if has just been started/restarted. Although I know a few things, there is still a lot I don't know so it would be great to get some help on this. Thanks, M -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On 08/29/2012 11:27 AM, Markus Armbruster wrote: > > I don't see a point in making contributors avoid non-problems that might > conceivably become trivial problems some day. Especially when there's > no automated help with the avoiding. -Wpointer-arith -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On 08/28/2012 03:30 AM, Jan Kiszka wrote: >> >> Maybe add case 8: and default: with abort(), also below. > > PIO is never 8 bytes long, the generic layer protects us. Note: eventually the pio space will be mapped directly to mmio (instead of being bounced via cpu_inb() in the bridge's mmio handler), and so we will have to add this restriction to the memory core. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
On 9/3/12 2:48 AM, don wrote: 于 2012年08月31日 02:29, David Ahern 写道: In addition to Andrew's comment about making the stats struct and functions generic... Yes. :-) On 8/27/12 3:51 AM, Dong Hao wrote: ---8<--- +static void exit_event_decode_key(struct event_key *key, char decode[20]) +{ + const char *exit_reason = get_exit_reason(key->key); + + snprintf(decode, 20, "%s", exit_reason); +} Use scnprintf rather than snprintf. Why? Since we don't care about the return value, what's the difference? 1. consistency (scnprintf is preferred). 2. what if a new exit reason is added in the future that is larger than 19 characters? Better to be safe now. ---8<--- +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id) +{ +int old_max_vcpu = event->max_vcpu; + +if (vcpu_id < event->max_vcpu) +return true; + +while (event->max_vcpu <= vcpu_id) +event->max_vcpu += DEFAULT_VCPU_NUM; + +event->vcpu = realloc(event->vcpu, + event->max_vcpu * sizeof(*event->vcpu)); +if (!event->vcpu) { If realloc fails you leak memory by overwriting the current pointer. Thanks for pointing it out, we will terminate the running instance in our next version. Ok. Make sure to free the memory of the previous pointer on failure before cleaning up. The idea is that all allocations are properly freed on exit. ---8<--- +static double event_stats_stddev(int vcpu_id, struct kvm_event *event) +{ +struct event_stats *stats = &event->total; +double variance, variance_mean, stddev; + +if (vcpu_id != -1) +stats = &event->vcpu[vcpu_id]; + +BUG_ON(!stats->count); + +variance = stats->M2 / (stats->count - 1); +variance_mean = variance / stats->count; +stddev = sqrt(variance_mean); + +return stddev * 100 / stats->mean; +} perf should be consistent in the stddev it shows the user. Any reason to dump relative stddev versus stddev used by perf-stat? Since 'perf stat' uses relative standard deviation rather than stddev, 'perf kvm stat' just follows the style of 'perf stat'. Yes, I've been working on an idea motivated by Andi Kleen. I have noticed that perf stat also uses relative stddev just in a non-direct way. I suggest moving common stats code from perf-stat to utils/stat.[ch], add a rel_stddev_stats function that returns the above and have perf-kvm use that. ---8<--- +/* + * Append "-a" only if "-p"/"--pid" is not specified since they + * are mutually exclusive. + */ +if (!kvm_record_specified_guest(argc, argv)) +rec_argv[i++] = STRDUP_FAIL_EXIT("-a"); Other perf-kvm commands rely on perf-record semantics -- i.e., for user to add the -a or -p option. You mean, remove '-a' from the default options, then: if a user wants to record all guest he will use 'perf stat record -a'; and if a user wants to record the specified guest, he should use 'perf stat record -p xxx'? Yes. Well, 'perf kvm stat record' with a -a or -p. That makes the new stat subcommand consistent with just 'perf kvm record'. Well, as the style of other subcommand, e.g., perf lock/perf sched, the 'perf xxx record' record all events on all cpus, no need to use '-a'. Based on mentioned above, I prefer the original way. ;) Yes, I noticed that some commands add -a before calling cmd_record(). Can't change that but we can keep perf-kvm consistent with its own variants which means not automagically adding -a. David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On 08/29/2012 11:49 AM, Peter Maydell wrote: > On 29 August 2012 09:47, Jan Kiszka wrote: >> On 2012-08-28 23:26, Peter Maydell wrote: >>> Since this is arch-specific we should probably give the >>> resulting device a more specific name than "pci-assign", >>> which implies that it is (a) ok for any PCI system and >>> (b) not something that will be obsolete in the foreseen >>> future. >> >> The name has to be like this for libvirt compatibility. I can provide it >> as alias, though, calling the device "kvm-pci-assign" by default. Better? > > ...is it likely to ever work for non-x86 PCI devices? It used to work for ia64 before it died. So it's not architecture specific by design. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/8] KVM paravirt remote flush tlb
On 09/03/2012 05:33 PM, Avi Kivity wrote: > On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote: >> >> kernbench(lower is better) >> == >> base pvflushv4 %improvement >> 1VM48.5800 46.8513 3.55846 >> 2VM 108.1823 104.6410 3.27346 >> 3VM 183.2733 163.3547 10.86825 >> >> ebizzy(higher is better) >> >> base pvflushv4 %improvement >> 1VM 2414.5000 2089.8750 -13.44481 >> 2VM 2167.6250 2371.7500 9.41699 >> 3VM 1600. 2102.5556 31.40060 >> > > The regression is worrying. We're improving the contended case at the > cost of the non-contended case, this is usually the wrong thing to do. > Do we have any clear idea of the cause of the regression? For that matter, why do we get an improvement in the 1VM kbuild test? All vcpus should be running most of the time, barring I/O thread activity. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add a page cache-backed balloon device driver.
On 06/26/2012 11:32 PM, Frank Swiderski wrote: > This implementation of a virtio balloon driver uses the page cache to > "store" pages that have been released to the host. The communication > (outside of target counts) is one way--the guest notifies the host when > it adds a page to the page cache, allowing the host to madvise(2) with > MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit > (via the regular page reclaim). This means that inflating the balloon > is similar to the existing balloon mechanism, but the deflate is > different--it re-uses existing Linux kernel functionality to > automatically reclaim. Interesting idea. How is the host able to manage overcommit this way? If deflate is not host controlled, the host may start swapping guests out to disk if they all self-deflate. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/8] KVM paravirt remote flush tlb
On 08/21/2012 02:25 PM, Nikunj A. Dadhania wrote: > > kernbench(lower is better) > == > base pvflushv4 %improvement > 1VM48.5800 46.8513 3.55846 > 2VM 108.1823 104.6410 3.27346 > 3VM 183.2733 163.3547 10.86825 > > ebizzy(higher is better) > > base pvflushv4 %improvement > 1VM 2414.5000 2089.8750 -13.44481 > 2VM 2167.6250 2371.7500 9.41699 > 3VM 1600. 2102.5556 31.40060 > The regression is worrying. We're improving the contended case at the cost of the non-contended case, this is usually the wrong thing to do. Do we have any clear idea of the cause of the regression? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for Tuesday, September 4th
On 09/03/2012 04:35 PM, Jan Kiszka wrote: > On 2012-09-03 13:48, Avi Kivity wrote: >> On 09/03/2012 09:44 AM, Juan Quintela wrote: >>> >>> Hi >>> >>> Please send in any agenda items you are interested in covering. >> >> - protecting MemoryRegion::opaque during dispatch >> >> I'm guessing Ping won't make it due to timezone problems. Jan, if you >> will not participate, please remove the topic from the list (unless >> someone else wants to argue your side). > > Sorry, I'm blocked right at that time. NP, will continue on the list. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for Tuesday, September 4th
On 2012-09-03 13:48, Avi Kivity wrote: > On 09/03/2012 09:44 AM, Juan Quintela wrote: >> >> Hi >> >> Please send in any agenda items you are interested in covering. > > - protecting MemoryRegion::opaque during dispatch > > I'm guessing Ping won't make it due to timezone problems. Jan, if you > will not participate, please remove the topic from the list (unless > someone else wants to argue your side). Sorry, I'm blocked right at that time. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt-users] vm pxe fail
On 08/31/2012 05:37 PM, Alex Jia wrote: > Hi Andrew, > Great, BTW, in fact, you may pxe boot via VF of Intel82576, however, > Intel82576 SR-IOV network adapters > don't provide a ROM BIOS for the cards virtual functions (VF), but an image > of such a ROM is available, > and with this ROM visible to the guest, it can PXE boot. > > In libvirt's xml, you need to configure guest XML like this: > > > > > > > > > > You need to build a ipxe-808610ca.rom by yourself, if you're interested in > this, > please refer to http://ipxe.org/. Is there a way to automate this? Perhaps a database matching PCI IDs and ipxe .roms, which qemu could consult? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM on kernel 2.6.16
On 09/03/2012 04:24 PM, Avi Kivity wrote: > On 09/02/2012 07:08 PM, Lentes, Bernd wrote: >> Hi, >> >> i have several servers with SLES 10 SP4 and want to run kvm on them. SLES 10 >> has kernel 2.6.16. >> Is that possible ? >> >> Thanks for any answer. > > In general no. Please contact your server vendor though. I meant OS vendor -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM on kernel 2.6.16
On 09/02/2012 07:08 PM, Lentes, Bernd wrote: > Hi, > > i have several servers with SLES 10 SP4 and want to run kvm on them. SLES 10 > has kernel 2.6.16. > Is that possible ? > > Thanks for any answer. In general no. Please contact your server vendor though. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bright Cluster Manager Support #2699] NFSoRDMA not working with KVM when cache disabled
It looks like this issue has been dealt with, so I'm closing its ticket. * If you would like to reopen the ticket, then please respond to this mail with the same subject header. * If you would like to open a new ticket, then please send a mail with a new subject header. In either case, the mail must go to the supp...@brightcomputing.com address. regards Pawan On Sun Sep 02 22:46:22 2012, martijn wrote: > Hi Andrew, > > > In that case I don't know. I would suggest not to disable the cache. > > Best regards, > > Martijn > > On Sat, Sep 1, 2012 at 1:51 PM, Andrew Holway > wrote: > > > > > http://support.brightcomputing.com/rt/Ticket/Display.html?id=2699 > > > > > Hi, > > > > That is FULL install (I think) > > > > I create a new virtual machine each time I test it. > > > > Ta, > > > > Andrew > > > > > > On Aug 31, 2012, at 11:20 PM, Martijn de Vries wrote: > > > > > Hi Andrew, > > > > > > That's pretty strange. I am not a KVM expert, so I don't know what > > happens > > > under the hood when you disable cache. Have you tried doing a FULL > > install? > > > > > > Best regards, > > > > > > Martijn > > > > > > On Fri, Aug 31, 2012 at 7:06 PM, Andrew Holway > > > wrote: > > > > > >> > > >> Fri Aug 31 19:06:00 2012: Request 2699 was acted upon. > > >> Transaction: Ticket created by a.hol...@syseleven.de > > >> Queue: Bright > > >> Subject: NFSoRDMA not working with KVM when cache disabled > > >> Owner: Nobody > > >> Requestors: a.hol...@syseleven.de > > >> Status: new > > >> Ticket > >> http://support.brightcomputing.com/rt/Ticket/Display.html?id=2699 > > > > >> > > >> > > >> Hi, > > >> > > >> I am trying to host KVM machines on an NFSoRDMA mount. > > >> > > >> This works: > > >> > > >> -drive file=/mnt/vm001.img,if=none,id=drive-virtio- > disk0,format=raw > > >> -device > > >> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio- > disk0,id=virtio-disk0 > > >> > > >> This Doesn't! > > >> > > >> -drive > > >> file=/mnt/vm001.img,if=none,id=drive-virtio- > disk0,format=raw,cache=none > > >> -device > > >> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio- > disk0,id=virtio-disk0,bootindex=2 > > >> > > >> Any ideas why this could be? > > >> > > >> Thanks, > > >> > > >> Andrew > > >> > > >> > > >> > > >> dmesg: > > >> > > >> vda: > > >> vda: vda1 > > >> vda: vda1 > > >> vda: vda1 vda2 > > >> vda: vda1 vda2 > > >> vda: vda1 vda2 vda3 > > >> vda: vda1 vda2 vda3 > > >> vda: vda1 vda2 vda3 vda4 < > > > >> vda: vda1 vda2 vda3 vda4 < > > > >> vda: vda1 vda2 vda3 vda4 < vda5 > > > >> vda: vda1 vda2 vda3 vda4 < vda5 > > > >> vda: vda1 vda2 vda3 vda4 < vda5 vda6 > > > >> vda: vda1 vda2 vda3 vda4 < vda5 vda6 > > > >> Adding 15998968k swap on /dev/vda5. Priority:-1 extents:1 > > across:15998968k > > >> EXT3-fs (vda1): error: can't find ext3 filesystem on dev vda1. > > >> > > >> /var/log/messages > > >> > > >> Aug 31 18:53:46 (none) node-installer: Mounting disks. > > >> Aug 31 18:53:46 (none) node-installer: Updating device status: > mounting > > >> disks > > >> Aug 31 18:53:46 (none) node-installer: Detecting device > '/dev/sda': not > > >> found > > >> Aug 31 18:53:46 (none) node-installer: Detecting device > '/dev/hda': not > > >> found > > >> Aug 31 18:53:46 (none) node-installer: Detecting device > '/dev/vda': > > found > > >> Aug 31 18:53:46 (none) node-installer: swapon /dev/vda5 > > >> Aug 31 18:53:46 (none) node-installer: mkdir -p /localdisk/ > > >> Aug 31 18:53:46 (none) kernel: Adding 15998968k swap on > /dev/vda5. > > >> Priority:-1 extents:1 across:15998968k > > >> Aug 31 18:53:46 (none) node-installer: Mounting /dev/vda1 on > /localdisk/ > > >> Aug 31 18:53:46 (none) node-installer: mount -t ext3 -o > > >> defaults,noatime,nodiratime /dev/vda1 /localdisk/ > > >> Aug 31 18:53:46 (none) node-installer: mount: wrong fs type, bad > option, > > >> bad superblock on /dev/vda1, > > >> Aug 31 18:53:46 (none) node-installer: missing codepage or > helper > > >> program, or other error > > >> Aug 31 18:53:46 (none) node-installer: In some cases > useful info > > is > > >> found in syslog - try > > >> Aug 31 18:53:46 (none) node-installer: dmesg | tail or so > > >> Aug 31 18:53:46 (none) node-installer: > > >> Aug 31 18:53:46 (none) node-installer: Command failed. > > >> Aug 31 18:53:46 (none) node-installer: Running: "mount -t ext3 -o > > >> defaults,noatime,nodiratime /dev/vda1 /localdisk/" failed: > > >> Aug 31 18:53:46 (none) node-installer: Non zero exit code: 32 > > >> Aug 31 18:53:46 (none) kernel: EXT3-fs (vda1): error: can't find > ext3 > > >> filesystem on dev vda1. > > >> Aug 31 18:53:46 (none) node-installer: Failed to mount disks. > (Exit code > > >> 12, signal 0) > > >> Aug 31 18:53:46 (none) node-installer: There was a fatal problem. > This > > >> node can not be installed until the problem is corrected. > > >> Aug 31 18:53:46 (none) node-installer: The error was: failed to > mount > > disks > > >> Aug 31 18:53:46 (none) node-installer: Updating device status: > failed to >
Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.
On 3 September 2012 13:33, Rusty Russell wrote: > I have no idea, since I didn't hear the complaints. But any non-fixed > size array has issues in C; there's not much we can do about it. > > x86 manages this fine for msrs, and I didn't have a problem using it for > my test programs. That's the limit of my experience, however. Yes, I didn't particularly find the ioctls Rusty initially suggested hard to use (apart from briefly getting confused between struct kvm_msrs and struct kvm_msr_list; you could collapse those down into one struct if you wanted I suppose). The nastiest problem in the target-i386 qemu code that gets the MSR list is the fact it has to work around some ancient kernel bug where the ioctl would write beyond the end of the array... -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.
Avi Kivity writes: > On 09/01/2012 03:35 PM, Rusty Russell wrote: >> Passing an address in a struct is pretty bad, since it involves >> compatibility wrappers. > > Right, some s390 thing. Err, no, i386 on x86-64, or ppc32 on ppc64, or arm on arm64 Any time you put a pointer in a structure which is exposed to userspace, you have to deal with this. >> I don't think that is what makes the API hard >> to use. > > What is it then? I forgot what the original complaints/complainers were. I have no idea, since I didn't hear the complaints. But any non-fixed size array has issues in C; there's not much we can do about it. x86 manages this fine for msrs, and I didn't have a problem using it for my test programs. That's the limit of my experience, however. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] Provide userspace IO exit completion callback.
Current code assumes that IO exit was due to instruction emulation and handles execution back to emulator directly. This patch adds new userspace IO exit completion callback that can be set by any other code that caused IO exit to userspace. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 93 +++ 2 files changed, 57 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index fc0e752..64adb61 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -414,6 +414,7 @@ struct kvm_vcpu_arch { struct x86_emulate_ctxt emulate_ctxt; bool emulate_regs_need_sync_to_vcpu; bool emulate_regs_need_sync_from_vcpu; + int (*complete_userspace_io)(struct kvm_vcpu *vcpu); gpa_t time; struct pvclock_vcpu_time_info hv_clock; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 20f2266..2520d18 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4544,6 +4544,9 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt, return true; } +static int complete_emulated_mmio(struct kvm_vcpu *vcpu); +static int complete_emulated_pio(struct kvm_vcpu *vcpu); + int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2, int emulation_type, @@ -4614,13 +4617,16 @@ restart: } else if (vcpu->arch.pio.count) { if (!vcpu->arch.pio.in) vcpu->arch.pio.count = 0; - else + else { writeback = false; + vcpu->arch.complete_userspace_io = complete_emulated_pio; + } r = EMULATE_DO_MMIO; } else if (vcpu->mmio_needed) { if (!vcpu->mmio_is_write) writeback = false; r = EMULATE_DO_MMIO; + vcpu->arch.complete_userspace_io = complete_emulated_mmio; } else if (r == EMULATION_RESTART) goto restart; else @@ -5476,6 +5482,24 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) return r; } +static inline int complete_emulated_io(struct kvm_vcpu *vcpu) +{ + int r; + vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); + r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE); + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); + if (r != EMULATE_DONE) + return 0; + return 1; +} + +static int complete_emulated_pio(struct kvm_vcpu *vcpu) +{ + BUG_ON(!vcpu->arch.pio.count); + + return complete_emulated_io(vcpu); +} + /* * Implements the following, as a state machine: * @@ -5492,47 +5516,37 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) * copy data * exit */ -static int complete_mmio(struct kvm_vcpu *vcpu) +static int complete_emulated_mmio(struct kvm_vcpu *vcpu) { struct kvm_run *run = vcpu->run; struct kvm_mmio_fragment *frag; - int r; - if (!(vcpu->arch.pio.count || vcpu->mmio_needed)) - return 1; + BUG_ON(!vcpu->mmio_needed); - if (vcpu->mmio_needed) { - /* Complete previous fragment */ - frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++]; - if (!vcpu->mmio_is_write) - memcpy(frag->data, run->mmio.data, frag->len); - if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) { - vcpu->mmio_needed = 0; - if (vcpu->mmio_is_write) - return 1; - vcpu->mmio_read_completed = 1; - goto done; - } - /* Initiate next fragment */ - ++frag; - run->exit_reason = KVM_EXIT_MMIO; - run->mmio.phys_addr = frag->gpa; + /* Complete previous fragment */ + frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++]; + if (!vcpu->mmio_is_write) + memcpy(frag->data, run->mmio.data, frag->len); + if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) { + vcpu->mmio_needed = 0; if (vcpu->mmio_is_write) - memcpy(run->mmio.data, frag->data, frag->len); - run->mmio.len = frag->len; - run->mmio.is_write = vcpu->mmio_is_write; - return 0; - - } -done: - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE); - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); - if (r != EMULATE_DONE) - return 0; - return 1; + return 1; + vcpu->mmio_read_completed = 1; + return complete_emulated_io(vcpu); + } + /* Initiate next fragment */
[PATCH 4/4] KVM: emulator: optimize "rep ins" handling.
Optimize "rep ins" by allowing emulator to write back more than one datum at a time. Introduce new operand type OP_MEM_STR which tells writeback() that dst contains pointer to an array that should be written back as opposite to just one data element. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |4 +++- arch/x86/kvm/emulate.c | 33 - 2 files changed, 31 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index c80c091..08d1c64 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -213,8 +213,9 @@ typedef u32 __attribute__((vector_size(16))) sse128_t; /* Type, address-of, and value of an instruction's operand. */ struct operand { - enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_MM, OP_NONE } type; + enum { OP_REG, OP_MEM, OP_MEM_STR, OP_IMM, OP_XMM, OP_MM, OP_NONE } type; unsigned int bytes; + unsigned int count; union { unsigned long orig_val; u64 orig_val64; @@ -234,6 +235,7 @@ struct operand { char valptr[sizeof(unsigned long) + 2]; sse128_t vec_val; u64 mm_val; + void *data; }; }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8a1c42e..e87e616 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1301,8 +1301,15 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt, rc->end = n * size; } - memcpy(dest, rc->data + rc->pos, size); - rc->pos += size; + if (ctxt->rep_prefix && !(ctxt->eflags & EFLG_DF)) { + ctxt->dst.data = rc->data + rc->pos; + ctxt->dst.type = OP_MEM_STR; + ctxt->dst.count = (rc->end - rc->pos) / size; + rc->pos = rc->end; + } else { + memcpy(dest, rc->data + rc->pos, size); + rc->pos += size; + } return 1; } @@ -1546,6 +1553,14 @@ static int writeback(struct x86_emulate_ctxt *ctxt) if (rc != X86EMUL_CONTINUE) return rc; break; + case OP_MEM_STR: + rc = segmented_write(ctxt, + ctxt->dst.addr.mem, + ctxt->dst.data, + ctxt->dst.bytes * ctxt->dst.count); + if (rc != X86EMUL_CONTINUE) + return rc; + break; case OP_XMM: write_sse_reg(ctxt, &ctxt->dst.vec_val, ctxt->dst.addr.xmm); break; @@ -2793,7 +2808,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, static void string_addr_inc(struct x86_emulate_ctxt *ctxt, int reg, struct operand *op) { - int df = (ctxt->eflags & EFLG_DF) ? -1 : 1; + int df = (ctxt->eflags & EFLG_DF) ? -op->count : op->count; register_address_increment(ctxt, reg_rmw(ctxt, reg), df * op->bytes); op->addr.mem.ea = register_address(ctxt, reg_read(ctxt, reg)); @@ -3733,7 +3748,7 @@ static struct opcode opcode_table[256] = { I(DstReg | SrcMem | ModRM | Src2Imm, em_imul_3op), I(SrcImmByte | Mov | Stack, em_push), I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op), - I2bvIP(DstDI | SrcDX | Mov | String, em_in, ins, check_perm_in), /* insb, insw/insd */ + I2bvIP(DstDI | SrcDX | Mov | String | Unaligned, em_in, ins, check_perm_in), /* insb, insw/insd */ I2bvIP(SrcSI | DstDX | String, em_out, outs, check_perm_out), /* outsb, outsw/outsd */ /* 0x70 - 0x7F */ X16(D(SrcImmByte)), @@ -3991,6 +4006,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op, register_address(ctxt, reg_read(ctxt, VCPU_REGS_RDI)); op->addr.mem.seg = VCPU_SREG_ES; op->val = 0; + op->count = 1; break; case OpDX: op->type = OP_REG; @@ -4034,6 +4050,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op, register_address(ctxt, reg_read(ctxt, VCPU_REGS_RSI)); op->addr.mem.seg = seg_override(ctxt); op->val = 0; + op->count = 1; break; case OpImmFAddr: op->type = OP_IMM; @@ -4575,8 +4592,14 @@ writeback: string_addr_inc(ctxt, VCPU_REGS_RDI, &ctxt->dst); if (ctxt->rep_prefix && (ctxt->d & String)) { + unsigned int count; struct read_cache *r = &ctxt->io_read; - register_address_increment(ctxt, reg_rmw(ctxt, VCPU_REGS_RCX), -1); + if ((ctxt->d & SrcMask) == SrcSI) + count = ctxt->src.count; + else + count = ctxt->dst.count; +
[PATCH REBASE RESEND 0/4] improve speed of "rep ins" emulation
This series (or rather the last patch of it) takes different approach to "rep ins" optimization. Instead of writing separate fast path for it do the fast path inside emulator itself. This way nobody can say the code is not reused! Patch 1,2 are now, strictly speaking, not needed, but I think this is still nice cleanup and, in case of patch 1, eliminates some ifs at each KVM_RUN ioctl. Gleb Natapov (4): Provide userspace IO exit completion callback. KVM: emulator: make x86 emulation modes enum instead of defines KVM: emulator: string_addr_inc() cleanup KVM: emulator: optimize "rep ins" handling. arch/x86/include/asm/kvm_emulate.h | 26 +- arch/x86/include/asm/kvm_host.h|1 + arch/x86/kvm/emulate.c | 48 ++- arch/x86/kvm/x86.c | 93 ++-- 4 files changed, 105 insertions(+), 63 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] KVM: emulator: make x86 emulation modes enum instead of defines
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h | 22 ++ arch/x86/kvm/emulate.c |4 +++- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 282aee5..c80c091 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -249,6 +249,15 @@ struct read_cache { unsigned long end; }; +/* Execution mode, passed to the emulator. */ +enum x86emul_mode { + X86EMUL_MODE_REAL, /* Real mode. */ + X86EMUL_MODE_VM86, /* Virtual 8086 mode. */ + X86EMUL_MODE_PROT16,/* 16-bit protected mode. */ + X86EMUL_MODE_PROT32,/* 32-bit protected mode. */ + X86EMUL_MODE_PROT64,/* 64-bit (long) mode.*/ +}; + struct x86_emulate_ctxt { struct x86_emulate_ops *ops; @@ -256,7 +265,7 @@ struct x86_emulate_ctxt { unsigned long eflags; unsigned long eip; /* eip before instruction emulation */ /* Emulated execution mode, represented by an X86EMUL_MODE value. */ - int mode; + enum x86emul_mode mode; /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; @@ -308,17 +317,6 @@ struct x86_emulate_ctxt { #define REPE_PREFIX0xf3 #define REPNE_PREFIX 0xf2 -/* Execution mode, passed to the emulator. */ -#define X86EMUL_MODE_REAL 0/* Real mode. */ -#define X86EMUL_MODE_VM86 1/* Virtual 8086 mode. */ -#define X86EMUL_MODE_PROT16 2/* 16-bit protected mode. */ -#define X86EMUL_MODE_PROT32 4/* 32-bit protected mode. */ -#define X86EMUL_MODE_PROT64 8/* 64-bit (long) mode.*/ - -/* any protected mode */ -#define X86EMUL_MODE_PROT (X86EMUL_MODE_PROT16|X86EMUL_MODE_PROT32| \ - X86EMUL_MODE_PROT64) - /* CPUID vendors */ #define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541 #define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 1451cff..a453357 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2268,6 +2268,8 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt) if (msr_data == 0x0) return emulate_gp(ctxt, 0); break; + default: + break; } ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF); @@ -4400,7 +4402,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) } /* Instruction can only be executed in protected mode */ - if ((ctxt->d & Prot) && !(ctxt->mode & X86EMUL_MODE_PROT)) { + if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) { rc = emulate_ud(ctxt); goto done; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] KVM: emulator: string_addr_inc() cleanup
Remove unneeded segment argument. Address structure already has correct segment which was put there during decode. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a453357..8a1c42e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2790,14 +2790,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK; } -static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned seg, - int reg, struct operand *op) +static void string_addr_inc(struct x86_emulate_ctxt *ctxt, int reg, + struct operand *op) { int df = (ctxt->eflags & EFLG_DF) ? -1 : 1; register_address_increment(ctxt, reg_rmw(ctxt, reg), df * op->bytes); op->addr.mem.ea = register_address(ctxt, reg_read(ctxt, reg)); - op->addr.mem.seg = seg; } static int em_das(struct x86_emulate_ctxt *ctxt) @@ -4570,12 +4569,10 @@ writeback: ctxt->dst.type = saved_dst_type; if ((ctxt->d & SrcMask) == SrcSI) - string_addr_inc(ctxt, seg_override(ctxt), - VCPU_REGS_RSI, &ctxt->src); + string_addr_inc(ctxt, VCPU_REGS_RSI, &ctxt->src); if ((ctxt->d & DstMask) == DstDI) - string_addr_inc(ctxt, VCPU_SREG_ES, VCPU_REGS_RDI, - &ctxt->dst); + string_addr_inc(ctxt, VCPU_REGS_RDI, &ctxt->dst); if (ctxt->rep_prefix && (ctxt->d & String)) { struct read_cache *r = &ctxt->io_read; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFSoRDMA not working with KVM when cache disabled
On 09/03/2012 02:57 PM, Andrew Holway wrote: >> >> and report which (if any) of the output files (x1, x2, y1, y2) are >> corrupted, by comparing them against the original. This will tell us >> whether O_DIRECT is broken, or 512 byte block size, or neither. >> > > > Looks like you were directly on the money there. 512, 1K and 2K O_DIRECT > looks broken. It was hardly unexpected. Please report the problem to the nfs/rdma list (and keep kvm@ copied, nfs/rdma is fairly relevant to kvm). You can work around it by exposing a 4k logical block size to the guest (logical_block_size=4k). Note not all guests support this. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFSoRDMA not working with KVM when cache disabled
> > and report which (if any) of the output files (x1, x2, y1, y2) are > corrupted, by comparing them against the original. This will tell us > whether O_DIRECT is broken, or 512 byte block size, or neither. > Looks like you were directly on the money there. 512, 1K and 2K O_DIRECT looks broken. [root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello && md5sum hello && rm -f hello; done409600+0 records in 409600+0 records out 209715200 bytes (210 MB) copied, 0.774718 s, 271 MB/s 690138908de516b6e5d7d180d085c3f3 hello 204800+0 records in 204800+0 records out 209715200 bytes (210 MB) copied, 0.609578 s, 344 MB/s 690138908de516b6e5d7d180d085c3f3 hello 102400+0 records in 102400+0 records out 209715200 bytes (210 MB) copied, 0.490421 s, 428 MB/s 690138908de516b6e5d7d180d085c3f3 hello 51200+0 records in 51200+0 records out 209715200 bytes (210 MB) copied, 0.440156 s, 476 MB/s 690138908de516b6e5d7d180d085c3f3 hello 25600+0 records in 25600+0 records out 209715200 bytes (210 MB) copied, 0.4229 s, 496 MB/s 690138908de516b6e5d7d180d085c3f3 hello 12800+0 records in 12800+0 records out 209715200 bytes (210 MB) copied, 0.40914 s, 513 MB/s 690138908de516b6e5d7d180d085c3f3 hello 6400+0 records in 6400+0 records out 209715200 bytes (210 MB) copied, 0.427247 s, 491 MB/s 690138908de516b6e5d7d180d085c3f3 hello 3200+0 records in 3200+0 records out 209715200 bytes (210 MB) copied, 0.411776 s, 509 MB/s 690138908de516b6e5d7d180d085c3f3 hello 1600+0 records in 1600+0 records out 209715200 bytes (210 MB) copied, 0.417098 s, 503 MB/s 690138908de516b6e5d7d180d085c3f3 hello [root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello iflag=direct oflag=direct && md5sum hello && rm -f hello; done 409600+0 records in 409600+0 records out 209715200 bytes (210 MB) copied, 62.3649 s, 3.4 MB/s aadd0ffe3c9dfa35d8354e99ecac9276 hello 204800+0 records in 204800+0 records out 209715200 bytes (210 MB) copied, 41.3876 s, 5.1 MB/s 336f6da78f93dab591edc18da81f002e hello 102400+0 records in 102400+0 records out 209715200 bytes (210 MB) copied, 21.1712 s, 9.9 MB/s f4cefe0a05c9b47ba68effdb17dc95d6 hello 51200+0 records in 51200+0 records out 209715200 bytes (210 MB) copied, 10.9631 s, 19.1 MB/s 690138908de516b6e5d7d180d085c3f3 hello 25600+0 records in 25600+0 records out 209715200 bytes (210 MB) copied, 5.4136 s, 38.7 MB/s 690138908de516b6e5d7d180d085c3f3 hello 12800+0 records in 12800+0 records out 209715200 bytes (210 MB) copied, 3.1448 s, 66.7 MB/s 690138908de516b6e5d7d180d085c3f3 hello 6400+0 records in 6400+0 records out 209715200 bytes (210 MB) copied, 1.77304 s, 118 MB/s 690138908de516b6e5d7d180d085c3f3 hello 3200+0 records in 3200+0 records out 209715200 bytes (210 MB) copied, 1.4331 s, 146 MB/s 690138908de516b6e5d7d180d085c3f3 hello 1600+0 records in 1600+0 records out 209715200 bytes (210 MB) copied, 0.922167 s, 227 MB/s 690138908de516b6e5d7d180d085c3f3 hello > > -- > error compiling committee.c: too many arguments to function > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2] virtio-spec: virtio network device multiqueue support
At Jason's request, I am trying to help finalize the spec for the new multiqueue feature. Changes from Jason's rfc: - reserved vq 3: this makes all rx vqs even and tx vqs odd, which looks nicer to me. - documented packet steering, added a generalized steering programming command. Current modes are single queue and host driven multiqueue, but I envision support for guest driven multiqueue in the future. - make default vqs unused when in mq mode - this wastes some memory but makes it more efficient to switch between modes as we can avoid this causing packet reordering. Rusty, could you please take a look and comment? If this looks OK to everyone, we can proceed with finalizing the implementation. This patch is against eb9fc84d0d3c46438aaab190e2401a9e5409a052 in virtio-spec git tree. --> virtio-spec: virtio network device multiqueue support Add multiqueue support to virtio network device. Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new configuration field max_virtqueue_pairs to detect supported number of virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program packet steering. Signed-off-by: Michael S. Tsirkin -- diff --git a/virtio-spec.lyx b/virtio-spec.lyx index 7a073f4..583debc 100644 --- a/virtio-spec.lyx +++ b/virtio-spec.lyx @@ -58,6 +58,7 @@ \html_be_strict false \author -608949062 "Rusty Russell,,," \author 1531152142 "Paolo Bonzini,,," +\author 1986246365 "Michael S. Tsirkin" \end_header \begin_body @@ -3896,6 +3897,37 @@ Only if VIRTIO_NET_F_CTRL_VQ set \end_inset +\change_inserted 1986246365 1346663522 + 3: reserved +\end_layout + +\begin_layout Description + +\change_inserted 1986246365 1346663550 +4: receiveq1. + 5: transmitq1. + 6: receiveq2. + 7. + transmitq2. + ... + 2N+2:receivqN, 2N+3:transmitqN +\begin_inset Foot +status open + +\begin_layout Plain Layout + +\change_inserted 1986246365 1346663558 +Only if VIRTIO_NET_F_CTRL_VQ set. + N is indicated by max_virtqueue_pairs field. +\change_unchanged + +\end_layout + +\end_inset + + +\change_unchanged + \end_layout \begin_layout Description @@ -4056,6 +4088,17 @@ VIRTIO_NET_F_CTRL_VLAN \begin_layout Description VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets. +\change_inserted 1986246365 1346617842 + +\end_layout + +\begin_layout Description + +\change_inserted 1986246365 1346618103 +VIRTIO_NET_F_MULTIQUEUE(22) Device has multiple receive and transmission + queues. +\change_unchanged + \end_layout \end_deeper @@ -4068,11 +4111,45 @@ configuration \begin_inset space ~ \end_inset -layout Two configuration fields are currently defined. +layout +\change_deleted 1986246365 1346671560 +Two +\change_inserted 1986246365 1346671647 +Six +\change_unchanged + configuration fields are currently defined. The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC is set), and the status field only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN K_UP and VIRTIO_NET_S_ANNOUNCE. + +\change_inserted 1986246365 1346672138 + The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE + is set. + The max_virtqueue_pairs field specifies the maximum number of each of transmit + and receive virtqueues that can used for multiqueue operation. + The following read-only fields: +\emph on +current_steering_rule +\emph default +, +\emph on +reserved +\emph default + and +\emph on +current_steering_param +\emph default + store the last successful VIRTIO_NET_CTRL_STEERING +\begin_inset CommandInset ref +LatexCommand ref +reference "sub:Transmit-Packet-Steering" + +\end_inset + + command executed by driver, for debugging. + +\change_unchanged \begin_inset listings inline false @@ -4105,6 +4182,40 @@ struct virtio_net_config { \begin_layout Plain Layout u16 status; +\change_inserted 1986246365 1346671221 + +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1346671532 + +u16 max_virtqueue_pairs; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1346671531 + +u8 current_steering_rule; +\change_unchanged + +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1346671499 + +u8 reserved; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1346671530 + +u16 current_steering_param; +\change_unchanged + \end_layout \begin_layout Plain Layout @@ -4151,6 +4262,18 @@ physical \begin_layout Enumerate If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated, identify the control virtqueue. +\change_inserted 1986246365 1346618052 + +\end_layout + +\begin_layout Enumerate + +\change_inserted 1986246365 1346618175 +If VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, identify the receive + and transmission queues that are going to be used in multiqueue mode. + Only queues that are going to be used need to be initialized. +\change_unchanged + \end_layou
Re: KVM call agenda for Tuesday, September 4th
On 09/03/2012 09:44 AM, Juan Quintela wrote: > > Hi > > Please send in any agenda items you are interested in covering. - protecting MemoryRegion::opaque during dispatch I'm guessing Ping won't make it due to timezone problems. Jan, if you will not participate, please remove the topic from the list (unless someone else wants to argue your side). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND] KVM: cleanup pic reset
kvm_pic_reset() is not used anywhere. Move reset logic from pic_ioport_write() there. Signed-off-by: Gleb Natapov diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 90c84f9..848206d 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -275,23 +275,20 @@ void kvm_pic_reset(struct kvm_kpic_state *s) { int irq, i; struct kvm_vcpu *vcpu; - u8 irr = s->irr, isr = s->imr; + u8 edge_irr = s->irr & ~s->elcr; bool found = false; s->last_irr = 0; - s->irr = 0; + s->irr &= s->elcr; s->imr = 0; - s->isr = 0; s->priority_add = 0; - s->irq_base = 0; - s->read_reg_select = 0; - s->poll = 0; s->special_mask = 0; - s->init_state = 0; - s->auto_eoi = 0; - s->rotate_on_auto_eoi = 0; - s->special_fully_nested_mode = 0; - s->init4 = 0; + s->read_reg_select = 0; + if (!s->init4) { + s->special_fully_nested_mode = 0; + s->auto_eoi = 0; + } + s->init_state = 1; kvm_for_each_vcpu(i, vcpu, s->pics_state->kvm) if (kvm_apic_accept_pic_intr(vcpu)) { @@ -304,7 +301,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s) return; for (irq = 0; irq < PIC_NUM_PINS/2; irq++) - if (irr & (1 << irq) || isr & (1 << irq)) + if (edge_irr & (1 << irq)) pic_clear_isr(s, irq); } @@ -316,40 +313,13 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val) addr &= 1; if (addr == 0) { if (val & 0x10) { - u8 edge_irr = s->irr & ~s->elcr; - int i; - bool found; - struct kvm_vcpu *vcpu; - s->init4 = val & 1; - s->last_irr = 0; - s->irr &= s->elcr; - s->imr = 0; - s->priority_add = 0; - s->special_mask = 0; - s->read_reg_select = 0; - if (!s->init4) { - s->special_fully_nested_mode = 0; - s->auto_eoi = 0; - } - s->init_state = 1; if (val & 0x02) pr_pic_unimpl("single mode not supported"); if (val & 0x08) pr_pic_unimpl( - "level sensitive irq not supported"); - - kvm_for_each_vcpu(i, vcpu, s->pics_state->kvm) - if (kvm_apic_accept_pic_intr(vcpu)) { - found = true; - break; - } - - - if (found) - for (irq = 0; irq < PIC_NUM_PINS/2; irq++) - if (edge_irr & (1 << irq)) - pic_clear_isr(s, irq); + "level sensitive irq not supported"); + kvm_pic_reset(s); } else if (val & 0x08) { if (val & 0x04) s->poll = 1; -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for Tuesday, September 4th
Juan Quintela wrote: > Hi > > Please send in any agenda items you are interested in covering. > > Thanks, Juan. Forwarding my own email. Just put a wrong address on the 1st try (kvm@ folks receive this twice). Thanks to Markus for noticing it. Later, Juan. PD. /me learn not to sent email so soon. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
On 08/27/2012 12:51 PM, Dong Hao wrote: > From: Xiao Guangrong > > Exporting KVM exit information to userspace to be consumed by perf. > > [ Dong Hao : rebase it on acme's git tree ] > Signed-off-by: Xiao Guangrong > Signed-off-by: Dong Hao > --- > arch/x86/include/asm/kvm_host.h | 36 --- Please put ABIs in kvm.h. But see below. > arch/x86/include/asm/svm.h | 205 > +-- > arch/x86/include/asm/vmx.h | 126 > arch/x86/kvm/trace.h| 89 - > 4 files changed, 234 insertions(+), 222 deletions(-) > > > +#define DE_VECTOR 0 > +#define DB_VECTOR 1 > +#define BP_VECTOR 3 > +#define OF_VECTOR 4 > +#define BR_VECTOR 5 > +#define UD_VECTOR 6 > +#define NM_VECTOR 7 > +#define DF_VECTOR 8 > +#define TS_VECTOR 10 > +#define NP_VECTOR 11 > +#define SS_VECTOR 12 > +#define GP_VECTOR 13 > +#define PF_VECTOR 14 > +#define MF_VECTOR 16 > +#define MC_VECTOR 18 This is not a kvm ABI, but an x86 architecture constants. It should be put into an existing x86 header. > + > #endif /* _ASM_X86_KVM_HOST_H */ > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > index f2b83bc..cdf5674 100644 > --- a/arch/x86/include/asm/svm.h > +++ b/arch/x86/include/asm/svm.h > @@ -1,6 +1,135 @@ > #ifndef __SVM_H > #define __SVM_H > > + > +#ifdef __KERNEL__ > + The entire file can be exported; nothing in there is implementation specific. > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 74fcb96..61e04e9 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > + > +#ifdef __KERNEL__ > + Ditto. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete
On 08/27/2012 12:51 PM, Dong Hao wrote: > From: Xiao Guangrong > > 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to > calculate mmio read emulated time for the old kernel, in order to trace > mmio read event more exactly, we add kvm_mmio_begin to trace the time when > mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is > completed > Why is this so critical? If a lot of time is spent in in-kernel mmio, then 'perf top' will report it. Otherwise the time between kvm_exit and kvm_entry describes the time spent in the host. Not all of it is mmio handling, but it is quite close. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFSoRDMA not working with KVM when cache disabled
On 08/31/2012 08:05 PM, Andrew Holway wrote: > Hi, > > I am trying to host KVM machines on an NFSoRDMA mount. > > This works: > > -drive file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 > > This Doesn't! > > -drive > file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 > > Any ideas why this could be? Please try dd bs=512 if=/mnt/vm001.img of=/mnt/x1 dd bs=512 if=/mnt/vm001.img of=/mnt/x2 dd bs=4096 if=/mnt/vm001.img of=/mnt/y1 iflag=direct oflag=direct dd bs=4096 if=/mnt/vm001.img of=/mnt/y2 iflag=direct oflag=direct and report which (if any) of the output files (x1, x2, y1, y2) are corrupted, by comparing them against the original. This will tell us whether O_DIRECT is broken, or 512 byte block size, or neither. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recommended way access KVM data structures from other kernel subsystems ?
On 08/23/2012 10:49 AM, Pandarathil, Vijaymohan R wrote: > Hi, > > I am looking for the recommended approach for accessing KVM driver data from > other kernel components. In my case, I need to set some global variable/state > in KVM driver from one of the NMI handlers. I see that using kvm_x86_ops is > one option. I can define a new function as part of kvm_x86_ops and invoke > that function from the NMI handler as below. > > nmi_handler() { > > ... >If (kvm_x86_ops) { > kvm_x86_ops->new_fn(); > > } > > after adding new_func() to struct kvm_x86_ops. > > Is this the right approach ? No, kvm_x86_ops is not available to the core kernel code. > Or is there some other existing mechanism ? Is KVM driver always loaded ? No. > I am fairly new to KVM and so any guidance is very much appreciated. Do something like void nmi_xyz_register(struct nmi_xyz *x); void nmi_xyz_unregister(struct nmi_xyz *x); struct nmi_xyz { void (*nmi)(void); }; then add code to kvm to use nmi_xyz (whatever that is). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU
On 08/23/2012 11:51 AM, Hao, Xudong wrote: >> -Original Message- >> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On >> Behalf Of Avi Kivity >> Sent: Monday, August 20, 2012 6:43 PM >> To: Roedel, Joerg >> Cc: Hao, Xudong; kvm@vger.kernel.org; Zhang, Xiantao >> Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU >> >> On 08/20/2012 01:14 PM, Roedel, Joerg wrote: >> > On Mon, Aug 20, 2012 at 01:08:14PM +0300, Avi Kivity wrote: >> >> On 08/20/2012 12:24 PM, Roedel, Joerg wrote: >> > >> >> So it was broken all along? Yikes. >> > >> > There is no LWP support in the kernel and thus KVM can't expose it to >> > guests. So for now nothing should be broken, no? >> >> Oh, we mask out xcr0 bits not supported by the host. >> >> So it's broken in another way: it isn't exposed. Pity, it's such a nice >> feature. >> > > Hi, Avi/Joerg > > What's the decision for it? I don't understand LWP, so how about this patch? It's fine (Joerg can send the LWP change), but there was a truncation issue that needs fixing, no? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/5] Making KVM_GET_ONE_REG/KVM_SET_ONE_REG generic.
On 09/01/2012 03:35 PM, Rusty Russell wrote: > Avi Kivity writes: >>> -Capability: basic >>> +Capability: KVM_CAP_REG_LIST >>> Architectures: arm >> >> all > > OK, I guess that's to be true in future. Fixed. > >>> Type: vcpu ioctl >>> -Parameters: struct kvm_msr_list (in/out) >>> +Parameters: struct kvm_reg_list (in/out) >>> Returns: 0 on success; -1 on error >>> Errors: >>> - E2BIG: the msr index list is too big to fit in the array specified by >>> - the user. >>> + E2BIG: the reg index list is too big to fit in the array specified by >>> + the user (the number required will be written into n). >>> >>> struct kvm_msr_list { >>> - __u32 nmsrs; /* number of msrs in entries */ >>> - __u32 indices[0]; >>> + __u64 n; /* number of registers in reg[] */ >>> + __u64 reg[0]; >>> }; >>> >> >> People complain that this interface is hard to use. >> >> How about supplying the address of the array (in addition to n) so you >> don't have to deal with variable sized arrays, and dropping E2BIG in >> favour of always updating n (n changed to something bigger than you had >> -> reallocate and rerun) > > We re-write n anyway, *and* return -E2BIG. Not returning an error is > asking for trouble. > > Passing an address in a struct is pretty bad, since it involves > compatibility wrappers. Right, some s390 thing. > I don't think that is what makes the API hard > to use. What is it then? I forgot what the original complaints/complainers were. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qcow2 preallocation + backing file
On Mon, Jul 30, 2012 at 9:10 PM, Mark Moseley wrote: > While I might've answered my own question, I'm no fio expert. I don't > necessarily trust that I'm testing it correctly, which is why I'm > asking here. Running the same test multiple times gives me > 30-40meg/sec for both read and write on the > preallocation=metadata-derived image, vs 0.5-1meg/sec for > non-preallocation=metadata-derived images, both which sound way too > high/low to be right. Same box (ubuntu 12.04 64-bit, running the stock > ubuntu qemu 1.0), same disk, even the same directory. I could also see > some bizarre side effect happening that I'm not aware of, ready to > bite me later on down the road, like, say, that it's silently actually > reading/writing the original image (just a made-up example). qemu-de...@nongnu.org is the best place to discuss QEMU image formats. The development of qcow2 happens there. Your fio question can't be answered without the job file. Please post it. I agree that the numbers look weird, the difference is too great. Are you using -drive ...,cache=none on your qemu command-line and "direct=1" or "buffered=0" (they are synonyms) in the fio job file? Those are necessary to take the host and guest page caches out of the equation. Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFSoRDMA not working with KVM when cache disabled
On Fri, Aug 31, 2012 at 6:05 PM, Andrew Holway wrote: > I am trying to host KVM machines on an NFSoRDMA mount. > > This works: > > -drive file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 > > This Doesn't! > > -drive > file=/mnt/vm001.img,if=none,id=drive-virtio-disk0,format=raw,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 > > Any ideas why this could be? -drive cache=none results in QEMU opening the image file with O_DIRECT. It's not clear to me what the first error is. The node-installer output you have posted does not show a series of steps to reproduce the problem. Can you try running isolated tests with dd or verify-data (http://people.redhat.com/sct/src/verify-data/)? Have you tried running a different guest OS, say a Fedora or Debian Live CD? Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: VMX: invalidate vpid for invlpg instruction
On 09/03/2012 02:27 AM, Davidlohr Bueso wrote: > On Fri, 2012-08-31 at 14:37 -0300, Marcelo Tosatti wrote: >> On Fri, Aug 31, 2012 at 06:10:48PM +0200, Davidlohr Bueso wrote: >> > For processors that support VPIDs we should invalidate the page table entry >> > specified by the lineal address. For this purpose add support for >> > individual >> > address invalidations. >> >> Not necessary - a single context invalidation is performed through >> KVM_REQ_TLB_FLUSH. > > Since vpid_sync_context() supports both single and all-context vpid > invalidations, wouldn't it make sense to also add individual address > ones as well, supporting further granularity? It might. Do you have benchmarks supporting this? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
于 2012年08月31日 02:29, David Ahern 写道: In addition to Andrew's comment about making the stats struct and functions generic... Yes. :-) On 8/27/12 3:51 AM, Dong Hao wrote: ---8<--- +static void exit_event_decode_key(struct event_key *key, char decode[20]) +{ + const char *exit_reason = get_exit_reason(key->key); + + snprintf(decode, 20, "%s", exit_reason); +} Use scnprintf rather than snprintf. Why? Since we don't care about the return value, what's the difference? ---8<--- +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id) +{ +int old_max_vcpu = event->max_vcpu; + +if (vcpu_id < event->max_vcpu) +return true; + +while (event->max_vcpu <= vcpu_id) +event->max_vcpu += DEFAULT_VCPU_NUM; + +event->vcpu = realloc(event->vcpu, + event->max_vcpu * sizeof(*event->vcpu)); +if (!event->vcpu) { If realloc fails you leak memory by overwriting the current pointer. Thanks for pointing it out, we will terminate the running instance in our next version. ---8<--- +static double event_stats_stddev(int vcpu_id, struct kvm_event *event) +{ +struct event_stats *stats = &event->total; +double variance, variance_mean, stddev; + +if (vcpu_id != -1) +stats = &event->vcpu[vcpu_id]; + +BUG_ON(!stats->count); + +variance = stats->M2 / (stats->count - 1); +variance_mean = variance / stats->count; +stddev = sqrt(variance_mean); + +return stddev * 100 / stats->mean; +} perf should be consistent in the stddev it shows the user. Any reason to dump relative stddev versus stddev used by perf-stat? Since 'perf stat' uses relative standard deviation rather than stddev, 'perf kvm stat' just follows the style of 'perf stat'. ---8<--- +/* returns left most element of result, and erase it */ +static struct kvm_event *pop_from_result(void) +{ +struct rb_node *node = result.rb_node; + +if (!node) +return NULL; + +while (node->rb_left) +node = node->rb_left; Use rb_first(). OK, we will use it in the next version. ---8<--- +/* + * Append "-a" only if "-p"/"--pid" is not specified since they + * are mutually exclusive. + */ +if (!kvm_record_specified_guest(argc, argv)) +rec_argv[i++] = STRDUP_FAIL_EXIT("-a"); Other perf-kvm commands rely on perf-record semantics -- i.e., for user to add the -a or -p option. You mean, remove '-a' from the default options, then: if a user wants to record all guest he will use 'perf stat record -a'; and if a user wants to record the specified guest, he should use 'perf stat record -p xxx'? Well, as the style of other subcommand, e.g., perf lock/perf sched, the 'perf xxx record' record all events on all cpus, no need to use '-a'. Based on mentioned above, I prefer the original way. ;) ---8<--- +static const char * const kvm_events_report_usage[] = { +"perf kvm stat report []", +NULL +}; + +static const struct option kvm_events_report_options[] = { +OPT_STRING(0, "event", &report_event, "report event", +"event for reporting: vmexit, mmio, ioport"), +OPT_INTEGER(0, "vcpu", &trace_vcpu, +"vcpu id to report"), +OPT_STRING('k', "key", &sort_key, "sort-key", +"key for sorting: sample(sort by samples number)" +" time (sort by avg time)"), +OPT_END() +}; + +static int kvm_events_report(int argc, const char **argv) +{ +symbol__init(); + +if (argc) { +argc = parse_options(argc, argv, + kvm_events_report_options, + kvm_events_report_usage, 0); +if (argc) +usage_with_options(kvm_events_report_usage, + kvm_events_report_options); +} + +return kvm_events_report_vcpu(trace_vcpu); +} + +static int kvm_cmd_stat(int argc, const char **argv) +{ +if (argc > 1) { +if (!strncmp(argv[1], "rec", 3)) +return kvm_events_record(argc - 1, argv + 1); + +if (!strncmp(argv[1], "rep", 3)) +return kvm_events_report(argc - 1 , argv + 1); +} + +return cmd_stat(argc, argv, NULL); +} + static charname_buffer[256]; static const char * const kvm_usage[] = { -"perf kvm [] {top|record|report|diff|buildid-list}", +"perf kvm [] {top|record|report|diff|buildid-list|stat}", NULL }; The usage for the report/record sub commands of stat is never shown. e.g., $ perf kvm stat --> shows help for perf-stat $ perf kvm --> shows the above and perf-kvm's usage To get help for the record/report subcommands you have to know that record and report are subcommands. Okay, we will improve this. ---8<--- +static int perf_file_section__read_feature(struct perf_file_section *section, +struct perf_header *ph, +int feat, int fd, void *data) +{ +struct header_read_data *hd = data; + +if (feat != hd->feat) +return 0; + +if (lseek(fd, section->o