Re: [PATCH 0/6] kvm/s390: sigp related changes for 3.6

2012-06-29 Thread Marcelo Tosatti
On Tue, Jun 26, 2012 at 04:06:35PM +0200, Cornelia Huck wrote:
> Avi, Marcelo,
> 
> here are some more s390 patches for the next release.
> 
> Patches 1 and 2 are included for dependency reasons; they will also
> be sent through Martin's s390 tree.

I don't see why patch 1 is a dependency for merging in the kvm 
tree, and why patch 2 should go through both trees?

That is, patch 1 can go through S390 tree, patches 2-6 through 
KVM tree. No?

> The other patches fix several problems in our sigp handling code
> and make it nicer to read.
> 
> Cornelia Huck (1):
>   KVM: s390: Fix sigp sense handling.
> 
> Heiko Carstens (5):
>   s390/smp: remove redundant check
>   s390/smp/kvm: unifiy sigp definitions
>   KVM: s390: fix sigp sense running condition code handling
>   KVM: s390: fix sigp set prefix status stored cases
>   KVM: s390: use sigp condition code defines
> 
>  arch/s390/include/asm/sigp.h |   32 
>  arch/s390/kernel/smp.c   |   76 ++-
>  arch/s390/kvm/sigp.c |  117 +-
>  3 files changed, 106 insertions(+), 119 deletions(-)
>  create mode 100644 arch/s390/include/asm/sigp.h

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm] kvm_pv_eoi: add flag support for -cpu

2012-06-29 Thread Marcelo Tosatti
On Mon, Jun 25, 2012 at 05:48:30PM +0300, Michael S. Tsirkin wrote:
> Support the new PV EOI flag in kvm - it just got merged
> into kvm.git. Set by default with -cpu kvm.
> Set for -cpu qemu by adding +kvm_pv_eoi.
> Clear by adding -kvm_pv_eoi to -cpu option.
> 
> Signed-off-by: Michael S. Tsirkin 

Please regenerate against uq/master.



Re: [PATCH] kvm: Don't abort on kvm_irqchip_add_msi_route()

2012-06-29 Thread Marcelo Tosatti
On Mon, Jun 25, 2012 at 09:40:39AM -0600, Alex Williamson wrote:
> Anyone using these functions has to be prepared that irqchip
> support may not be present.  It shouldn't be up to the core
> code to determine whether this is a fatal error.  Currently
> code written as:
> 
> virq = kvm_irqchip_add_msi_route(...)
> if (virq < 0) {
> 
> } else {
> 
> }
> 
> works on x86 with and without kvm irqchip enabled, works
> without kvm support compiled in, but aborts() on !x86 with
> kvm support.
> 
> Signed-off-by: Alex Williamson 

Applied to uq/master.


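Editor's note on the patch above: the calling pattern Alex describes can be sketched in C as follows. This is a minimal, self-contained model and not QEMU code; sketch_add_msi_route(), sketch_setup_msi() and the -ENOSYS error value are assumptions chosen for illustration. The point is only that the helper reports failure through a negative return value and leaves the decision about whether that is fatal to the caller.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route-allocation helper: it returns a
 * non-negative virq on success and a negative errno value when no
 * in-kernel irqchip is available, instead of aborting. */
static int sketch_add_msi_route(bool irqchip_available)
{
    static int next_virq = 24;  /* arbitrary starting number for the model */

    if (!irqchip_available)
        return -ENOSYS;
    return next_virq++;
}

enum sketch_path { SKETCH_PATH_KERNEL, SKETCH_PATH_USERSPACE };

/* Caller decides what a failure means: here it falls back to a
 * userspace delivery path rather than treating it as fatal. */
static enum sketch_path sketch_setup_msi(bool irqchip_available, int *virq_out)
{
    int virq = sketch_add_msi_route(irqchip_available);

    if (virq < 0) {
        *virq_out = -1;
        return SKETCH_PATH_USERSPACE;
    }
    *virq_out = virq;
    return SKETCH_PATH_KERNEL;
}
```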

Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

2012-06-29 Thread Alex Williamson
On Thu, 2012-06-28 at 11:46 +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 11:41:05AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > > >   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +static void
> > > > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > > > +{
> > > > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > > > +
> > > > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /*
> > > > > > > > >   * Race-free decouple logic (ordering is critical)
> > > > > > > > >   */
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Why is it safe to ignore return value here?
> > > > > > > > needs a comment.
> > > > > > > 
> > > > > > > Well, it seems like you and Gleb came to the conclusion that it's 
> > > > > > > safe,
> > > > > > > but I can't really follow it from the list thread.  Can you explain 
> > > > > > > and I'll
> > > > > > > add a comment?  Thanks,
> > > > > > > 
> > > > > > > Alex
> > > > > > 
> > > > > > We merely talked about edge interrupts.
> > > > > > 
> > > > > In fact it would have been nice to return -EBUSY when write() to level
> > > > > irqfd is coalesced.
> > > > 
> > > > Possibly nice but not really practical.
> > > > 
> > > What do you mean by that? Impossible to implement or not useful?
> > 
> > Impossible to implement and also does not match normal eventfd
> > semantics.
> > 
> Hmm, I remember we discussed using irqfd for level triggered interrupt ~2
> years ago and came to a conclusion that eventfd is a bad fit for it,
> was true then, is true now. Not being able to detect coalescing will make
> irqfd level interrupts inferior to IRQ_LINE ioctl.

Why do we care about coalescing?  I've been worried we need to re-inject
based on the return value of kvm_set_irq(), but re-reading specs and
code, we always post the interrupt to the irr.  For device assignment we
don't really care if kvm_set_irq() managed to actually inject the
interrupt, we're happy as long as it eventually hits the vcpu.  Current
device assignment uses kvm_set_irq() without looking for coalescing.
KVM_LINE_STATUS is the only caller that does something with the return
value and neither apic nor ioapic code in qemu do anything with the
value other than update accounting stats.  What am I missing that makes
the return value worth knowing?  Thanks,

Alex



Re: Will there be a qemu-kvm 1.1 version?

2012-06-29 Thread Marcelo Tosatti
On Fri, Jun 29, 2012 at 09:33:55AM +0200, Nico Prenzel wrote:
> I've seen on the kvm devel list that there were several problems with the 
> qemu-kvm 1.1 release: http://www.spinics.net/lists/kvm/msg73755.html
> 
> Since these last posts, there seems to be no movement with that release.
> So, will there be a qemu-kvm 1.1 release?

Yes, early next week, hopefully Monday.



Re: [PATCHv2 5/5] KVM: Provide fast path for "rep ins" emulation if possible.

2012-06-29 Thread Marcelo Tosatti
On Tue, Jun 12, 2012 at 03:01:27PM +0300, Gleb Natapov wrote:
> "rep ins" emulation is going through emulator now. This is slow because
> emulator knows how to write back only one datum at a time. This patch
> provides fast path for the instruction in certain conditions. The
> conditions are: DF flag is not set, destination memory is RAM and single
> datum does not cross page boundary. If fast path code fails it falls
> back to emulation.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/include/asm/kvm_host.h |6 ++
>  arch/x86/kvm/svm.c  |   20 +--
>  arch/x86/kvm/vmx.c  |   25 +--
>  arch/x86/kvm/x86.c  |  133 --
>  4 files changed, 165 insertions(+), 19 deletions(-)
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7a41878..f3e7bb3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1887,21 +1887,31 @@ static int io_interception(struct vcpu_svm *svm)
>  {
>   struct kvm_vcpu *vcpu = &svm->vcpu;
>   u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
> - int size, in, string;
> + int size, in, string, rep;
>   unsigned port;
>  
>   ++svm->vcpu.stat.io_exits;
>   string = (io_info & SVM_IOIO_STR_MASK) != 0;
> + rep = (io_info & SVM_IOIO_REP_MASK) != 0;
>   in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
> - if (string || in)
> - return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>  
>   port = io_info >> 16;
>   size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
>   svm->next_rip = svm->vmcb->control.exit_info_2;
> - skip_emulated_instruction(&svm->vcpu);
>  
> - return kvm_fast_pio_out(vcpu, size, port);
> + if (!string && !in) {
> + skip_emulated_instruction(&svm->vcpu);
> + return kvm_fast_pio_out(vcpu, size, port);
> + } else if (string && in && rep) {

Is there a reason to restrict optimization to rep ? That is, 
it should be easy to extend to normal in?

> + kvm_x86_ops->skip_emulated_instruction(vcpu);
> + return EMULATE_DONE;
> + }
> + if (kvm_get_rflags(vcpu) & X86_EFLAGS_DF)
> + return EMULATE_FAIL;
> + if (ad_bytes_idx > 2)
> + return EMULATE_FAIL;
> +
> + ad_bytes = (u8[]){2, 4, 8}[ad_bytes_idx];
> +
> + rdi = kvm_address_mask(ad_bytes, rdi);
> +
> + count = (PAGE_SIZE - offset_in_page(rdi))/size;
> +
> + if (count == 0) /* 'in' crosses page boundry */
> + return EMULATE_FAIL;
> +
> + count = min(count, kvm_address_mask(ad_bytes, rcx));
> + 
> + r = kvm_linearize_address(vcpu, get_emulation_mode(vcpu),
> + rdi, VCPU_SREG_ES, count, true, false, ad_bytes,
> + &linear_addr);

kvm_linearize_address expects size parameter in bytes?


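Editor's note on the fast path above: the count computation in the quoted hunk can be sketched on its own. This is a minimal model assuming 4 KiB pages; the sketch_* names are hypothetical and are not the kernel's helpers. It computes how many data items of a given size fit between the destination address and the end of its page, capped by the rep count. A result of 0 means the first item would straddle the page boundary, which is the case where the patch falls back to the emulator. The second helper gives the same span in bytes, which is the unit Marcelo asks about.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Offset of an address within its page. */
static uint64_t sketch_offset_in_page(uint64_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Number of whole items of `size` bytes that can be stored at `dst`
 * without leaving the current page, capped by the rep count. */
static uint64_t sketch_rep_ins_count(uint64_t dst, unsigned size,
                                     uint64_t rep_count)
{
    uint64_t fit;

    assert(size == 1 || size == 2 || size == 4);
    fit = (SKETCH_PAGE_SIZE - sketch_offset_in_page(dst)) / size;
    return fit < rep_count ? fit : rep_count;
}

/* The same span expressed in bytes rather than in items. */
static uint64_t sketch_rep_ins_bytes(uint64_t dst, unsigned size,
                                     uint64_t rep_count)
{
    return sketch_rep_ins_count(dst, size, rep_count) * size;
}
```

For example, a 4-byte "in" at an address 16 bytes before the end of a page yields a count of 4 items, or 16 bytes.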

[PATCH v3 2/7] memory: Flush coalesced MMIO on selected region access

2012-06-29 Thread Jan Kiszka
Instead of flushing pending coalesced MMIO requests on every vmexit,
this provides a mechanism to selectively flush when memory regions
related to the coalesced one are accessed. This first of all includes
the coalesced region itself but can also be applied to other regions, e.g.
of the same device, by calling memory_region_set_flush_coalesced.

Signed-off-by: Jan Kiszka 
---

Changes in v3:
 - refuse to clear flush_coalesced_mmio for regions that have
   coalescing enabled

 memory.c |   24 
 memory.h |   26 ++
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index aab4a31..7221c3c 100644
--- a/memory.c
+++ b/memory.c
@@ -311,6 +311,9 @@ static void memory_region_read_accessor(void *opaque,
 MemoryRegion *mr = opaque;
 uint64_t tmp;
 
+if (mr->flush_coalesced_mmio) {
+qemu_flush_coalesced_mmio_buffer();
+}
 tmp = mr->ops->read(mr->opaque, addr, size);
 *value |= (tmp & mask) << shift;
 }
@@ -325,6 +328,9 @@ static void memory_region_write_accessor(void *opaque,
 MemoryRegion *mr = opaque;
 uint64_t tmp;
 
+if (mr->flush_coalesced_mmio) {
+qemu_flush_coalesced_mmio_buffer();
+}
 tmp = (*value >> shift) & mask;
 mr->ops->write(mr->opaque, addr, tmp, size);
 }
@@ -826,6 +832,7 @@ void memory_region_init(MemoryRegion *mr,
 mr->dirty_log_mask = 0;
 mr->ioeventfd_nb = 0;
 mr->ioeventfds = NULL;
+mr->flush_coalesced_mmio = false;
 }
 
 static bool memory_region_access_valid(MemoryRegion *mr,
@@ -1176,12 +1183,16 @@ void memory_region_add_coalescing(MemoryRegion *mr,
 cmr->addr = addrrange_make(int128_make64(offset), int128_make64(size));
 QTAILQ_INSERT_TAIL(&mr->coalesced, cmr, link);
 memory_region_update_coalesced_range(mr);
+memory_region_set_flush_coalesced(mr);
 }
 
 void memory_region_clear_coalescing(MemoryRegion *mr)
 {
 CoalescedMemoryRange *cmr;
 
+qemu_flush_coalesced_mmio_buffer();
+mr->flush_coalesced_mmio = false;
+
 while (!QTAILQ_EMPTY(&mr->coalesced)) {
 cmr = QTAILQ_FIRST(&mr->coalesced);
 QTAILQ_REMOVE(&mr->coalesced, cmr, link);
@@ -1190,6 +1201,19 @@ void memory_region_clear_coalescing(MemoryRegion *mr)
 memory_region_update_coalesced_range(mr);
 }
 
+void memory_region_set_flush_coalesced(MemoryRegion *mr)
+{
+mr->flush_coalesced_mmio = true;
+}
+
+void memory_region_clear_flush_coalesced(MemoryRegion *mr)
+{
+qemu_flush_coalesced_mmio_buffer();
+if (QTAILQ_EMPTY(&mr->coalesced)) {
+mr->flush_coalesced_mmio = false;
+}
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
target_phys_addr_t addr,
unsigned size,
diff --git a/memory.h b/memory.h
index 740c48e..77167d8 100644
--- a/memory.h
+++ b/memory.h
@@ -133,6 +133,7 @@ struct MemoryRegion {
 bool enabled;
 bool rom_device;
 bool warning_printed; /* For reservations */
+bool flush_coalesced_mmio;
 MemoryRegion *alias;
 target_phys_addr_t alias_offset;
 unsigned priority;
@@ -521,6 +522,31 @@ void memory_region_add_coalescing(MemoryRegion *mr,
 void memory_region_clear_coalescing(MemoryRegion *mr);
 
 /**
+ * memory_region_set_flush_coalesced: Enforce memory coalescing flush before
+ *accesses.
+ *
+ * Ensure that pending coalesced MMIO request are flushed before the memory
+ * region is accessed. This property is automatically enabled for all regions
+ * passed to memory_region_set_coalescing() and memory_region_add_coalescing().
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_set_flush_coalesced(MemoryRegion *mr);
+
+/**
+ * memory_region_clear_flush_coalesced: Disable memory coalescing flush before
+ *  accesses.
+ *
+ * Clear the automatic coalesced MMIO flushing enabled via
+ * memory_region_set_flush_coalesced. Note that this service has no effect on
+ * memory regions that have MMIO coalescing enabled for themselves. For them,
+ * automatic flushing will stop once coalescing is disabled.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_clear_flush_coalesced(MemoryRegion *mr);
+
+/**
  * memory_region_add_eventfd: Request an eventfd to be triggered when a word
  *is written to a location.
  *
-- 
1.7.3.4

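Editor's note on the patch above: the flag logic can be modelled without any QEMU types. The sketch below is an illustration under stated assumptions; struct sketch_region, the flush counter and all sketch_* functions are hypothetical stand-ins for MemoryRegion and qemu_flush_coalesced_mmio_buffer(). It shows the two properties the patch description names: an access flushes only when the region is marked, and the mark cannot be cleared while the region itself still has coalescing enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a memory region. */
struct sketch_region {
    bool flush_coalesced_mmio;   /* flush pending requests before access */
    size_t n_coalesced_ranges;   /* coalesced ranges registered on it */
};

static unsigned sketch_flush_calls;  /* counts flushes, for observation */

static void sketch_flush_buffer(void)
{
    sketch_flush_calls++;
}

/* Read or write accessor: flush first if the region asks for it. */
static void sketch_region_access(struct sketch_region *r)
{
    if (r->flush_coalesced_mmio)
        sketch_flush_buffer();
    /* the device callback would run here */
}

/* Enabling coalescing on a region implies flush-before-access. */
static void sketch_add_coalescing(struct sketch_region *r)
{
    r->n_coalesced_ranges++;
    r->flush_coalesced_mmio = true;
}

/* Clearing is refused while the region still coalesces (the v3 change). */
static void sketch_clear_flush_coalesced(struct sketch_region *r)
{
    sketch_flush_buffer();
    if (r->n_coalesced_ranges == 0)
        r->flush_coalesced_mmio = false;
}
```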

[PATCH v3 3/3] kvm: Sanitize KVM_IRQFD flags

2012-06-29 Thread Alex Williamson
We only know of one so far.

Signed-off-by: Alex Williamson 
---

 virt/kvm/eventfd.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c307c24..7d7e2aa 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
+   if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+   return -EINVAL;
+
if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
return kvm_irqfd_deassign(kvm, args);
 


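Editor's note on the patch above: the check is the usual "reject unknown bits" idiom for a flags field. A minimal standalone sketch follows; the SKETCH_* names are hypothetical and only mirror the shape of the KVM_IRQFD check. Rejecting unknown bits now is what lets a later kernel assign meaning to them without misinterpreting old callers that passed garbage.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_FLAG_DEASSIGN (1u << 0)           /* the one known flag */
#define SKETCH_KNOWN_FLAGS   SKETCH_FLAG_DEASSIGN

/* Returns 0 if only known flag bits are set, -EINVAL otherwise. */
static int sketch_check_flags(uint32_t flags)
{
    if (flags & ~SKETCH_KNOWN_FLAGS)
        return -EINVAL;
    return 0;
}
```

When a new flag is added, only the SKETCH_KNOWN_FLAGS mask needs to grow.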

[PATCH v3 2/3] kvm: Add missing KVM_IRQFD API documentation

2012-06-29 Thread Alex Williamson
Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

 Documentation/virtual/kvm/api.txt |   16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..100acde 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1965,6 +1965,22 @@ return the hash table order in the parameter.  (If the guest is using
 the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
+4.76 KVM_IRQFD
+
+Capability: KVM_CAP_IRQFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irqfd (in)
+Returns: 0 on success, -1 on error
+
+Allows setting an eventfd to directly trigger a guest interrupt.
+kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
+kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When
+an event is triggered on the eventfd, an interrupt is injected into
+the guest using the specified gsi pin.  The irqfd is removed using
+the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
+and kvm_irqfd.gsi.
+
 
 5. The kvm_run structure
 



[PATCH v3 1/3] kvm: Pass kvm_irqfd to functions

2012-06-29 Thread Alex Williamson
Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.

Signed-off-by: Alex Williamson 
Acked-by: Cornelia Huck 
---

 include/linux/kvm_host.h |4 ++--
 virt/kvm/eventfd.c   |   20 ++--
 virt/kvm/kvm_main.c  |2 +-
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ac8a4..ae3b426 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -824,7 +824,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
-int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
+int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
@@ -833,7 +833,7 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 static inline void kvm_eventfd_init(struct kvm *kvm) {}
 
-static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
return -EINVAL;
 }
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f59c1e8..c307c24 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -198,7 +198,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
 }
 
 static int
-kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
struct kvm_irq_routing_table *irq_rt;
struct _irqfd *irqfd, *tmp;
@@ -212,12 +212,12 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
return -ENOMEM;
 
irqfd->kvm = kvm;
-   irqfd->gsi = gsi;
+   irqfd->gsi = args->gsi;
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 
-   file = eventfd_fget(fd);
+   file = eventfd_fget(args->fd);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto fail;
@@ -298,19 +298,19 @@ kvm_eventfd_init(struct kvm *kvm)
  * shutdown any irqfd's that match fd+gsi
  */
 static int
-kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 {
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;
 
-   eventfd = eventfd_ctx_fdget(fd);
+   eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);
 
spin_lock_irq(&kvm->irqfds.lock);
 
list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
-   if (irqfd->eventfd == eventfd && irqfd->gsi == gsi) {
+   if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
/*
 * This rcu_assign_pointer is needed for when
 * another thread calls kvm_irq_routing_update before
@@ -338,12 +338,12 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
 }
 
 int
-kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-   if (flags & KVM_IRQFD_FLAG_DEASSIGN)
-   return kvm_irqfd_deassign(kvm, fd, gsi);
+   if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
+   return kvm_irqfd_deassign(kvm, args);
 
-   return kvm_irqfd_assign(kvm, fd, gsi);
+   return kvm_irqfd_assign(kvm, args);
 }
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..b4ad14cc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2059,7 +2059,7 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&data, argp, sizeof data))
goto out;
-   r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
+   r = kvm_irqfd(kvm, &data);
break;
}
case KVM_IOEVENTFD: {



[PATCH v3 0/3] kvm: KVM_IRQFD cleanup, docs, sanitize flags

2012-06-29 Thread Alex Williamson
Before we start fiddling with what we can and can't add to KVM_IRQFD
we need to figure out if anyone has been sloppy in their use of the
ioctl flags.  This series has a minor cleanup to pass the struct
kvm_irqfd to setup functions rather than individual parameters, making
it more consistent with ioeventfd, adds API documentation for this
ioctl, and sanitizes the flags.  If anyone screams, we may have to
revert this last patch.  Thanks,

Alex

---

Alex Williamson (3):
  kvm: Sanitize KVM_IRQFD flags
  kvm: Add missing KVM_IRQFD API documentation
  kvm: Pass kvm_irqfd to functions


 Documentation/virtual/kvm/api.txt |   16 
 include/linux/kvm_host.h  |4 ++--
 virt/kvm/eventfd.c|   23 +--
 virt/kvm/kvm_main.c   |2 +-
 4 files changed, 32 insertions(+), 13 deletions(-)


Re: [PATCH v2 6/6] kvm: Level IRQ de-assert for KVM_IRQFD

2012-06-29 Thread Alex Williamson
On Thu, 2012-06-28 at 15:59 +0300, Avi Kivity wrote:
> On 06/27/2012 08:10 AM, Alex Williamson wrote:
> > This is an alternate level irqfd de-assert mode that's potentially
> > useful for emulated drivers.  It's included here to show how easy it
> > is to implement with the new level irqfd and eoifd support.  It's
> > possible this mode might also prove interesting for device-assignment
> > where we inject via level irqfd, receive an EOI (w/o de-assert), and
> > use the level de-assert irqfd here.
> 
> This use case is racy.  The guest driver will have shut down the
> interrupt before EOI, but with what you describe, it will fire again
> until the eoifd/deassertfd sequence completes.

Hmm, that's a good point.  We'll continue asserting the interrupt in the
indeterminate gap between eoifd and de-assert-irqfd which could fire
enough times for the guest to disable it.  Oh well.  Thanks,

Alex



Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

2012-06-29 Thread Alex Williamson
On Thu, 2012-06-28 at 11:38 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 04:24:30PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2012 23:09:04 -0600
> > Alex Williamson  wrote:
> > 
> > > Prune this down to just the struct kvm_irqfd so we can avoid
> > > changing function definition for every flag or field we use.
> > > 
> > > Signed-off-by: Alex Williamson 
> > 
> > I'm currently trying to find a way to make irqfd workable for s390
> > which will likely include using a new field in kvm_irqfd, so I'd like
> > to have this change (and I also think it makes the code nicer to read).
> > So:
> > 
> > Acked-by: Cornelia Huck 
> 
> Unfortunately it looks like we are not sanitizing kvm_irqfd
> at all so we won't be able to use the padding :(
> We'll need a new ioctl instead.

I think you're jumping the gun on this decision.





Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

2012-06-29 Thread Alex Williamson
On Thu, 2012-06-28 at 11:29 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 09:52:52PM -0600, Alex Williamson wrote:
> > On Thu, 2012-06-28 at 01:28 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2012 at 03:28:19PM -0600, Alex Williamson wrote:
> > > > On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > > > > > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > In order to inject an interrupt from an external source using an
> > > > > > > > irqfd, we need to allocate a new irq_source_id.  This allows us 
> > > > > > > > to
> > > > > > > > assert and (later) de-assert an interrupt line independently 
> > > > > > > > from
> > > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > > > > 
> > > > > > > > We also add what may appear like a bit of excessive 
> > > > > > > > infrastructure
> > > > > > > > around an object for storing this irq_source_id.  However, 
> > > > > > > > notice
> > > > > > > > that we only provide a way to assert the interrupt here.  A 
> > > > > > > > follow-on
> > > > > > > > interface will make use of the same irq_source_id to allow 
> > > > > > > > de-assert.
> > > > > > > > 
> > > > > > > > Signed-off-by: Alex Williamson 
> > > > > > > > ---
> > > > > > > > 
> > > > > > > >  Documentation/virtual/kvm/api.txt |5 ++
> > > > > > > >  arch/x86/kvm/x86.c|1 
> > > > > > > >  include/linux/kvm.h   |3 +
> > > > > > > >  virt/kvm/eventfd.c|   95 
> > > > > > > > +++--
> > > > > > > >  4 files changed, 99 insertions(+), 5 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > > > > > b/Documentation/virtual/kvm/api.txt
> > > > > > > > index ea9edce..b216709 100644
> > > > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin.  
> > > > > > > > The irqfd is removed using
> > > > > > > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > > > >  and kvm_irqfd.gsi.
> > > > > > > >  
> > > > > > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source 
> > > > > > > > ID for
> > > > > > > > +the requested irqfd.  This is necessary to share level 
> > > > > > > > triggered
> > > > > > > > +interrupts with those injected through KVM_IRQ_LINE.  IRQFDs 
> > > > > > > > created
> > > > > > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when 
> > > > > > > > de-assiging.
> > > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by 
> > > > > > > > KVM_CAP_IRQFD_LEVEL.
> > > > > > > 
> > > > > > > Note that if my patch removing auto-deassert gets accepted,
> > > > > > > this is not needed at all: we can just look at the GSI
> > > > > > > to see if it's level or edge.
> > > > > > 
> > > > > > I'm not sure this is a good idea.  I know from vfio that I'm 
> > > > > > injecting a
> > > > > > level interrupt regardless of how the guest has the pic/ioapic
> > > > > > programmed at the time I'm calling this ioctl.  Peeking across 
> > > > > > address
> > > > > > spaces to get to the right pin on the right pic/ioapic and see how 
> > > > > > it's
> > > > > > currently programmed seems fragile.  Thanks,
> > > > > > 
> > > > > > Alex
> > > > > 
> > > > > Fragile? If you set eventfd as LEVEL but GSI is really edge then
> > > > > it all explodes, right? So why give users the option to shoot
> > > > > themselves in the foot?
> > > > 
> > > > If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
> > > > to register my level interrupt then it all explodes, right?  I'd rather
> > > > let the user shoot themselves than play Russian roulette with the guest.
> > > > Am I misunderstanding what you mean by looking at the GSI to see if
> > > > it's level or edge?
> > > 
> > > Not sure.
> > > I simply mean this: if eventfd is bound to irqfd, set level from irqfd
> > > and clear from eventfd ack notifier.
> > 
> > Are you simply saying assert (kvm_set_irq(,,,1)) from irqfd trigger and
> > de-assert (kvm_set_irq(,,,0)) from eventfd ack notifier (aka KVM_EOIFD)?
> 
> Yes.
> 
> > > There's no need for a special IRQ_LEVEL for this.
> > 
> > That ignores the whole problem of when do we need to allocate a new
> > irq_source_id and when do we inject using KVM_USERSPACE_IRQ_SOURCE_ID.
> > We've already discussed that a level triggered, externally fired
> > interrupt must use a separate source ID from Qemu userspace.  Therefore
> > when you say "look at the GSI to see if it's level or edge", I assume
> > you mean trace the gsi back to the pic/ioapic pin and look at the
> > trigger mode.  That trigger mode is configured by the guest, so that
> > means that at the point in time when we call KVM_

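Editor's note on the source ID discussion above: the reason a level irqfd needs its own irq_source_id can be shown with a simplified model of a shared interrupt line. This is a sketch under stated assumptions, not the kernel implementation; struct sketch_irq_line and sketch_set_irq() are hypothetical names. Each source owns one bit of the line state, and the line stays asserted as long as any source holds it high, so one source de-asserting cannot drop an interrupt that another source still has pending.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SOURCES 32

/* One bit per source ID; the line is high if any bit is set. */
struct sketch_irq_line {
    uint32_t state;
};

/* Assert (level != 0) or de-assert (level == 0) the line on behalf of
 * one source.  Returns the resulting level seen by the interrupt
 * controller: 1 if any source still asserts the line, 0 otherwise. */
static int sketch_set_irq(struct sketch_irq_line *line, unsigned source_id,
                          int level)
{
    assert(source_id < SKETCH_MAX_SOURCES);

    if (level)
        line->state |= UINT32_C(1) << source_id;
    else
        line->state &= ~(UINT32_C(1) << source_id);

    return line->state != 0;
}
```

If both a userspace caller and an irqfd shared source 0, the de-assert by one of them would clear the line for both, which is the lost-interrupt case the patch description mentions.
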
Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

2012-06-29 Thread Alex Williamson
On Fri, 2012-06-29 at 09:09 -0600, Alex Williamson wrote:
> On Thu, 2012-06-28 at 22:29 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index b216709..87a2558 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1987,6 +1987,30 @@ interrupts with those injected through 
> > > KVM_IRQ_LINE.  IRQFDs created
> > >  with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > >  KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > >  
> > > +4.77 KVM_EOIFD
> > > +
> > > +Capability: KVM_CAP_EOIFD
> > > +Architectures: x86
> > > +Type: vm ioctl
> > > +Parameters: struct kvm_eoifd (in)
> > > +Returns: 0 on success, -1 on error
> > > +
> > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > +eventfd for level triggered irqchip interrupts.  Behavior for edge
> > > +triggered interrupts is undefined.  kvm_eoifd.fd specifies the eventfd
> > > +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> > > +similar to KVM_IRQFD.  KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> > > +a previously enabled eoifd and should also set fd and gsi to match.
> > > +
> > > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> > > +a level triggered EOI and the kvm_eoifd structure includes
> > > +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> > > +with the KVM_IRQFD_FLAG_LEVEL flag.  This allows both EOI notification
> > > +through kvm_eoifd.fd as well as automatically de-asserting level
> > > +irqfds on EOI.  Both KVM_EOIFD_FLAG_DEASSIGN and
> > > +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> > > +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> > > +
> > 
> > OK so thinking about this some more, does the below makes sense:
> > it is enough to have LEVEL either in IRQFD or in EOIFD but not both.
> > I weakly prefer it in EOIFD: when you bind it to irqfd you
> > also assign a source id to that irqfd.
> > 
> > We also can make it a bit clearer: rename KVM_EOIFD_FLAG_LEVEL_IRQFD
> > to KVM_EOIFD_FLAG_LEVEL_CLEAR which makes it explicit that we clear
> > on EOI and that it is for a level interrupt. The fact that it needs
> > an irqfd is less important IMHO.
> > Make this flag mandatory for now, we'll see later how to handle
> > vcpu filtering and edge.
> > 
> > How does it sound?
> 
> Honestly, I don't like it at all.  We're designing the interface
> assuming that we can't modify KVM_IRQFD flags.  Let's do as Avi suggests
> and test whether KVM_IRQFD is beyond broken first.  Given that we do
> make use of 1 bit in the flags for de-assert, I'm optimistic that nobody

s/de-assert/de-assign/

> is leaving random garbage in the rest of it.
> 
> Your idea may work, but I really don't like that KVM_EOIFD can reach
> across and change the behavior of KVM_IRQFD.  That's not an intuitive
> interface.  The LEVEL_IRQFD flag is specifying that the struct kvm_eoifd
> includes an irqfd for a level triggered interrupt.  AFAICT, there's no
> reason to specify that except to clear it since the EOI notification is
> not per source ID.  However, it's still valid to ask for EOI
> notification without binding it to a level irqfd, in which case it
> really is just EOI notification.  Thanks,
> 
> Alex
> 





Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

2012-06-29 Thread Alex Williamson
On Thu, 2012-06-28 at 22:29 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index b216709..87a2558 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE.  IRQFDs created
> >  with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assigning.
> >  KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >  
> > +4.77 KVM_EOIFD
> > +
> > +Capability: KVM_CAP_EOIFD
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_eoifd (in)
> > +Returns: 0 on success, -1 on error
> > +
> > +KVM_EOIFD allows userspace to receive EOI notification through an
> > +eventfd for level triggered irqchip interrupts.  Behavior for edge
> > +triggered interrupts is undefined.  kvm_eoifd.fd specifies the eventfd
> > +used for notification and kvm_eoifd.gsi specifies the irqchip pin,
> > +similar to KVM_IRQFD.  KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> > +a previously enabled eoifd and should also set fd and gsi to match.
> > +
> > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> > +a level triggered EOI and the kvm_eoifd structure includes
> > +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> > +with the KVM_IRQFD_FLAG_LEVEL flag.  This allows both EOI notification
> > +through kvm_eoifd.fd as well as automatically de-asserting level
> > +irqfds on EOI.  Both KVM_EOIFD_FLAG_DEASSIGN and
> > +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> > +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> > +
> 
> OK so thinking about this some more, does the below make sense:
> it is enough to have LEVEL either in IRQFD or in EOIFD but not both.
> I weakly prefer it in EOIFD: when you bind it to irqfd you
> also assign a source id to that irqfd.
> 
> We can also make it a bit clearer: rename KVM_EOIFD_FLAG_LEVEL_IRQFD
> to KVM_EOIFD_FLAG_LEVEL_CLEAR which makes it explicit that we clear
> on EOI and that it is for a level interrupt. The fact that it needs
> an irqfd is less important IMHO.
> Make this flag mandatory for now, we'll see later how to handle
> vcpu filtering and edge.
> 
> How does it sound?

Honestly, I don't like it at all.  We're designing the interface
assuming that we can't modify KVM_IRQFD flags.  Let's do as Avi suggests
and test whether KVM_IRQFD is beyond broken first.  Given that we do
make use of 1 bit in the flags for de-assign, I'm optimistic that nobody
is leaving random garbage in the rest of it.

Your idea may work, but I really don't like that KVM_EOIFD can reach
across and change the behavior of KVM_IRQFD.  That's not an intuitive
interface.  The LEVEL_IRQFD flag is specifying that the struct kvm_eoifd
includes an irqfd for a level triggered interrupt.  AFAICT, there's no
reason to specify that except to clear it since the EOI notification is
not per source ID.  However, it's still valid to ask for EOI
notification without binding it to a level irqfd, in which case it
really is just EOI notification.  Thanks,

Alex
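
[Editor's sketch] The two points above (unused flag bits must be rejected so they stay available, and a de-assign has to repeat the fd, gsi and LEVEL_IRQFD choice of the original assign) can be modelled roughly as follows. The struct layout, the flag values and both helper names are hypothetical and only stand in for whatever the patch actually defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones come from the patch under review. */
#define EOIFD_FLAG_DEASSIGN    (1u << 0)
#define EOIFD_FLAG_LEVEL_IRQFD (1u << 1)

/* Hypothetical stand-in for struct kvm_eoifd. */
struct eoifd_req {
	uint32_t fd;     /* eventfd signalled on EOI */
	uint32_t gsi;    /* irqchip pin */
	uint32_t flags;
	uint32_t irqfd;  /* only meaningful with EOIFD_FLAG_LEVEL_IRQFD */
};

/* Reject unknown flag bits so they remain usable for later extensions. */
static bool eoifd_flags_valid(const struct eoifd_req *req)
{
	return (req->flags & ~(EOIFD_FLAG_DEASSIGN | EOIFD_FLAG_LEVEL_IRQFD)) == 0;
}

/*
 * A de-assign must name the same fd and gsi as the assign, and must carry
 * LEVEL_IRQFD exactly when the assign did.
 */
static bool eoifd_deassign_matches(const struct eoifd_req *assigned,
				   const struct eoifd_req *req)
{
	if (!(req->flags & EOIFD_FLAG_DEASSIGN))
		return false;
	if (req->fd != assigned->fd || req->gsi != assigned->gsi)
		return false;
	return (req->flags & EOIFD_FLAG_LEVEL_IRQFD) ==
	       (assigned->flags & EOIFD_FLAG_LEVEL_IRQFD);
}
```

A request with flags == 0 passes eoifd_flags_valid() here, which corresponds to the plain EOI notification case without a bound irqfd.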



Re: [RFC PATCH 00/18] KVM: x86: CPU isolation and direct interrupts handling by guests

2012-06-29 Thread Avi Kivity
On 06/29/2012 12:25 PM, Tomoki Sekiyama wrote:
> Hi, thanks for your comments.
>
> On 2012/06/29 2:34, Avi Kivity wrote:
> > On 06/28/2012 08:26 PM, Jan Kiszka wrote:
> >>> This is both impressive and scary.  What is the target scenario here?
> >>> Partitioning?  I don't see this working for generic consolidation.
> >>
> >> From my POV, partitioning - including hard realtime partitions - would
> >> provide some use cases. But, as far as I saw, there are still major
> >> restrictions in this approach, e.g. that you can't return to userspace
> >> on the slave core. Or even execute the in-kernel device models on that 
> >> core.
>
> Exactly this is for partitioning that requires bare-metal performance
> with low latency and realtime.

It's hard for me to evaluate how large that segment is.  Since the
patchset is so intrusive, it needs a large potential user set to
justify, or a large reduction in complexity, or both.

>  I think it is also useful for workload
> like HPC with MPI, that is CPU intensive and that needs low latency.

I keep hearing about people virtualizing these types of workloads, but I
haven't yet understood why.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v4] KVM: x86: Implement PCID/INVPCID for guests with EPT

2012-06-29 Thread Avi Kivity
On 06/29/2012 05:37 AM, Mao, Junjie wrote:
> > 
> > >
> > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2
> > > *entry) @@ -6610,6 +6641,9 @@ static void prepare_vmcs02(struct
> > kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> > > page_to_phys(vmx->nested.apic_access_page));
> > >   }
> > >
> > > > + /* Explicitly disable INVPCID until PCID for L2 guest is supported */
> > > + exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
> > > +
> > 
> > We can't just disable it without the guest knowing.  If we don't expose
> > INVPCID through the MSR, then we should fail VMLAUNCH or VMRESUME if
> > this bit is set.
>
> I think this means I can replace the code here with a check in 
> nested_vmx_run. Do I understand correctly?

Correct, but the check already exists:
	if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
	      nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high) ||
	    !vmx_control_verify(vmcs12->secondary_vm_exec_control,
	      nested_vmx_secondary_ctls_low, nested_vmx_secondary_ctls_high) ||
	    !vmx_control_verify(vmcs12->pin_based_vm_exec_control,
	      nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high) ||
	    !vmx_control_verify(vmcs12->vm_exit_controls,
	      nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high) ||
	    !vmx_control_verify(vmcs12->vm_entry_controls,
	      nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high))
	{
		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
		return 1;
	}

So all that is needed is to initialize nested_vmx_entry_ctls_high properly.
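
[Editor's sketch] For readers unfamiliar with the low/high convention used by that check: "low" holds the bits a guest hypervisor must set and "high" the bits it may set. The helper body below is an assumption modelled on that convention, not a copy of the kernel function, and control_verify is a hypothetical name. Since SECONDARY_EXEC_ENABLE_INVPCID is a secondary execution control, the sketch leaves it out of the secondary "high" mask to show the effect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000u

/*
 * low:  bits that must be 1 in the control word
 * high: bits that are allowed to be 1 in the control word
 * The control is valid when it sets every "low" bit and nothing outside "high".
 */
static bool control_verify(uint32_t control, uint32_t low, uint32_t high)
{
	return ((control & high) | low) == control;
}
```

With INVPCID absent from the high mask, a vmcs12 that sets the bit fails the check and the entry is rejected, so no separate test is needed in nested_vmx_run.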

> > 
> > >   vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> > >   }
> > >
> > > @@ -528,6 +528,10 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned
> > long cr0)
> > >   return 1;
> > >   }
> > >
> > > + if ((old_cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PG) &&
> > > + kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
> > > + return 1;
> > 
> > Why check old_cr0?
>
> MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 
> 0 while CR4.PCIDE = 1. This check reflects the restriction.

It should not be possible to have cr0.pg=0 and cr4.pcide=1 in the first
place.  We are guaranteed that old_cr0.pg=1.

>
> > 
> > >
> > > + if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
> > > + if (!guest_cpuid_has_pcid(vcpu))
> > > + return 1;
> > > +
> > > + /* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
> > > + if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK)
> > || !is_long_mode(vcpu))
> > > + return 1;
> > > + }
> > > +
> > >   if (kvm_x86_ops->set_cr4(vcpu, cr4))
> > >   return 1;
> > >
> > > - if ((cr4 ^ old_cr4) & pdptr_bits)
> > > + if (((cr4 ^ old_cr4) & pdptr_bits) ||
> > > + (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> > >   kvm_mmu_reset_context(vcpu);
> > 
> > 
> > You can do
> > 
> > 
> >   if ((cr4 ^ old_cr4) & (pdptr_bits | X86_CR4_PCIDE))
> >  ...
>
> TLB is not invalidated when CR4.PCIDE is changed from 0 to 1. 

Ok.
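
[Editor's sketch] The rules discussed in this exchange can be collected into three small predicates. This is a standalone model, not the patch itself: the function names are hypothetical, and guest CPUID support and long mode are passed in as plain booleans. The CR0.PG, CR4.PCIDE and CR3[11:0] positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG        (1ull << 31)
#define X86_CR4_PCIDE     (1ull << 17)
#define X86_CR3_PCID_MASK 0xfffull

/*
 * Clearing CR0.PG while CR4.PCIDE = 1 raises #GP.  old_cr0 is not consulted:
 * CR4.PCIDE = 1 already implies that paging was on.
 */
static bool cr0_write_faults(uint64_t new_cr0, uint64_t cr4)
{
	return !(new_cr0 & X86_CR0_PG) && (cr4 & X86_CR4_PCIDE);
}

/* Turning PCIDE on needs guest CPUID support, long mode, and CR3[11:0] == 0. */
static bool cr4_pcide_enable_faults(uint64_t old_cr4, uint64_t new_cr4,
				    uint64_t cr3, bool long_mode,
				    bool guest_has_pcid)
{
	if (!(new_cr4 & X86_CR4_PCIDE) || (old_cr4 & X86_CR4_PCIDE))
		return false;	/* not a 0 -> 1 transition */
	return !guest_has_pcid || !long_mode || (cr3 & X86_CR3_PCID_MASK);
}

/*
 * Reset the MMU context when a PDPTR-relevant bit changes or when PCIDE goes
 * 1 -> 0.  The 0 -> 1 transition does not invalidate the TLB, so it is
 * deliberately excluded.
 */
static bool cr4_needs_mmu_reset(uint64_t old_cr4, uint64_t new_cr4,
				uint64_t pdptr_bits)
{
	if ((new_cr4 ^ old_cr4) & pdptr_bits)
		return true;
	return (old_cr4 & X86_CR4_PCIDE) && !(new_cr4 & X86_CR4_PCIDE);
}
```

The asymmetry in cr4_needs_mmu_reset() is the reason the shorter XOR form suggested above was not taken.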



-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Fwd: Security for Virtualization

2012-06-29 Thread 王永博
-- Forwarded message --
From: 王永博 
Date: 2012/6/29
Subject: Security for Virtualization
To: KVM devel mailing list 


Hi!

We are using KVM to do something and have a question.

Several security companies use the APIs provided by VMware to develop
security software for virtualization platforms, for example Kaspersky
Security for Virtualization and Trend Micro.

These APIs include VMsafe, VMware vShield, and VProbes. They can be found
here: http://www.vmware.com/support/pubs/sdk_pubs.html

Now we want to do the same thing on the KVM platform. Are there any APIs
or tools that can help us?


Re: [RFC PATCH 18/18] x86: request TLB flush to slave CPU using NMI

2012-06-29 Thread Tomoki Sekiyama
On 2012/06/29 1:38, Avi Kivity wrote:
> On 06/28/2012 09:08 AM, Tomoki Sekiyama wrote:
>> For slave CPUs, it is inappropriate to request a TLB flush using IPI,
>> because the IPI may be sent to a KVM guest when the slave CPU is running
>> the guest with direct interrupt routing.
>>
>> Instead, it registers a TLB flush request in a per-cpu bitmask and sends an NMI
>> to interrupt execution of the guest. Then, the NMI handler will check and
>> handle the requests.
> 
> 
> Currently x86's get_user_pages_fast() depends on TLB flushes being held
> up by local_irq_disable().  With this patch, this is no longer true and
> get_user_pages_fast() can race with page table freeing. There are
> patches from Peter Zijlstra to remove this dependency though. 

Thank you for the information. I will check his patches.

> NMIs are
> still slow and fragile when compared to normal interrupts, so this patch
> is somewhat problematic.

OK, always sending NMIs is actually problematic. I should check the
slave core state and send NMIs only when slave guest is
running and NMI is really needed.
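
[Editor's sketch] A rough userspace model of that plan: the flush request is always recorded, but an NMI is only asked for when the slave CPU is currently running a guest. Everything here is hypothetical (the names, the fixed NR_CPUS, a single shared bitmask instead of true per-cpu data, and a plain bool for the guest state where real code would need proper synchronization).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8			/* hypothetical */

static atomic_uint flush_pending;	/* one pending bit per slave CPU */
static bool guest_running[NR_CPUS];	/* hypothetical slave core state */

/* Record the request; return true if the caller should also send an NMI. */
static bool request_tlb_flush(unsigned int cpu)
{
	atomic_fetch_or(&flush_pending, 1u << cpu);
	return guest_running[cpu];
}

/*
 * Called from the NMI handler, or on the next guest exit when no NMI was
 * sent: atomically clear the bit and report whether a flush was pending.
 */
static bool consume_tlb_flush(unsigned int cpu)
{
	unsigned int bit = 1u << cpu;

	return atomic_fetch_and(&flush_pending, ~bit) & bit;
}
```

Clearing the bit with a single atomic read-modify-write means a request that arrives between the check and the flush is not lost; it simply shows up on the next call.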

Thanks,
-- 
Tomoki Sekiyama 
Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory



Re: [RFC PATCH 16/18] KVM: add kvm_arch_vcpu_prevent_run to prevent VM ENTER when NMI is received

2012-06-29 Thread Tomoki Sekiyama
On 2012/06/29 1:48, Avi Kivity wrote:
> On 06/28/2012 09:08 AM, Tomoki Sekiyama wrote:
>> Since NMI can not be disabled around VM enter, there is a race between
>> receiving NMI to kick a guest and entering the guests on slave CPUs. If the
>> NMI is received just before entering VM, after the NMI handler is invoked,
>> it continues entering the guest and the effect of the NMI will be lost.
>>
>> This patch adds kvm_arch_vcpu_prevent_run(), which causes VM exit right
>> after VM enter. The NMI handler uses this to ensure the execution of the
>> guest is cancelled after NMI.
>>
>>  
>> +/*
>> + * Make VMRESUME fail using preemption timer with timer value = 0.
>> + * On processors that don't support preemption timer, VMRESUME will fail
>> + * by internal error.
>> + */
>> +static void vmx_prevent_run(struct kvm_vcpu *vcpu, int prevent)
>> +{
>> +	if (prevent)
>> +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
>> +			      PIN_BASED_PREEMPTION_TIMER);
>> +	else
>> +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
>> +				PIN_BASED_PREEMPTION_TIMER);
>> +}
> 
> This may interrupt another RMW sequence, which will then overwrite the
> control.  So it needs to be called only if inside the entry sequence
> (otherwise can just set a KVM_REQ_IMMEDIATE_EXIT in vcpu->requests).
> 

I agree. I will add the check whether it is in the entry sequence.
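
[Editor's sketch] One way to picture the check: the NMI handler only touches the pin-based controls when the vcpu is already inside the entry sequence, and otherwise leaves a request bit for the normal pre-entry path. The struct, the in_entry flag and the request bit number are hypothetical; bit 6 of the pin-based controls is the architectural "activate VMX-preemption timer" bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_IMMEDIATE_EXIT   0		/* hypothetical request bit number */
#define PIN_PREEMPTION_TIMER (1u << 6)	/* activate VMX-preemption timer */

/* Hypothetical stand-in for the relevant vcpu state. */
struct vcpu_model {
	atomic_ulong requests;		/* deferred requests, checked before entry */
	bool in_entry;			/* past the request check, about to enter */
	unsigned int pin_controls;	/* stands in for the VMCS control field */
};

static void prevent_run(struct vcpu_model *v)
{
	if (v->in_entry) {
		/*
		 * Nothing else modifies the controls at this point, so a
		 * read-modify-write from the NMI handler cannot be overwritten.
		 */
		v->pin_controls |= PIN_PREEMPTION_TIMER;
	} else {
		/*
		 * Outside the entry sequence another RMW may be in flight;
		 * use the atomic request word instead.
		 */
		atomic_fetch_or(&v->requests, 1ul << REQ_IMMEDIATE_EXIT);
	}
}
```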

Thanks,
-- 
Tomoki Sekiyama 
Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory



Re: [RFC PATCH 06/18] KVM: Add facility to run guests on slave CPUs

2012-06-29 Thread Tomoki Sekiyama
On 2012/06/29 2:02, Avi Kivity wrote:
> On 06/28/2012 09:07 AM, Tomoki Sekiyama wrote:
>> Add path to migrate execution of vcpu_enter_guest to a slave CPU when
>> vcpu->arch.slave_cpu is set.
>>
>> After moving to the slave CPU, it goes back to the online CPU when the
>> guest is exited by reasons that cannot be handled by the slave CPU only
>> (e.g. handling async page faults).
> 
> What about, say, instruction emulation?  It may need to touch guest
> memory, which cannot be done from interrupt disabled context.

Hmm, it seems difficult to resolve this in interrupt disabled context.

Within partitioning scenario, it might be possible to give up execution
if the memory is not pinned down, but I'm not sure that is acceptable.

It looks better to make the slave core interruptible and sleepable.

>> +
>> +static int vcpu_post_run(struct kvm_vcpu *vcpu, struct task_struct *task,
>> +			 int *can_complete_async_pf)
>> +{
>> +	int r = LOOP_ONLINE;
>> +
>> +	clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
>> +	if (kvm_cpu_has_pending_timer(vcpu))
>> +		kvm_inject_pending_timer_irqs(vcpu);
>> +
>> +	if (dm_request_for_irq_injection(vcpu)) {
>> +		r = -EINTR;
>> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
>> +		++vcpu->stat.request_irq_exits;
>> +	}
>> +
>> +	if (can_complete_async_pf) {
>> +		*can_complete_async_pf = kvm_can_complete_async_pf(vcpu);
>> +		if (r == LOOP_ONLINE)
>> +			r = *can_complete_async_pf ? LOOP_APF : LOOP_SLAVE;
>> +	} else
>> +		kvm_check_async_pf_completion(vcpu);
>> +
>> +	if (signal_pending(task)) {
>> +		r = -EINTR;
>> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
>> +		++vcpu->stat.signal_exits;
>> +	}
> 
> Isn't this racy?  The signal can come right after this.

Oops, this is racy here.
However, this is resolved if later patch [RFC PATCH 16/18] is applied.
I will reorder patches.

Signals will wake up the vCPU user thread (sleeping in
vcpu_enter_guest_slave > wait_for_completion_interruptible on an online CPU)
and the thread kicks the vcpu (by NMI). Then, kvm_arch_vcpu_prevent_run is
called in the NMI handler to fail the VM enter again.

(But kvm_arch_vcpu_prevent_run still has another problem, as you replied.)

Thanks,
-- 
Tomoki Sekiyama 
Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory



Re: [RFC PATCH 00/18] KVM: x86: CPU isolation and direct interrupts handling by guests

2012-06-29 Thread Tomoki Sekiyama
Hi, thanks for your comments.

On 2012/06/29 2:34, Avi Kivity wrote:
> On 06/28/2012 08:26 PM, Jan Kiszka wrote:
>>> This is both impressive and scary.  What is the target scenario here?
>>> Partitioning?  I don't see this working for generic consolidation.
>>
>> From my POV, partitioning - including hard realtime partitions - would
>> provide some use cases. But, as far as I saw, there are still major
>> restrictions in this approach, e.g. that you can't return to userspace
>> on the slave core. Or even execute the in-kernel device models on that core.

Exactly, this is for partitioning that requires bare-metal performance
with low latency and realtime. I think it is also useful for workloads
like HPC with MPI, which are CPU intensive and need low latency.

>> I think we need something based on the no-hz work on the long run, ie.
>> the ability to run a single VCPU thread of the userland hypervisor on a
>> single core with zero rescheduling and unrelated interruptions - as far
>> as the guest load scenario allows this (we have some here).

With the no-hz approach, we can ease some problems such as accessing userspace
memory from interrupt disabled context. Still, we need IRQ vector remapping
or something like para-virtualized vector assignment for IRQs to reduce VM
exits on interrupt handling.

> What we can do perhaps is switch from direct mode to indirect mode on
> exit.  Instead of running with interrupts disabled, enable interrupts
> and make sure they are forwarded to the guest on the next entry.

Already with current implementation, the slave host kernel receives
interrupts while the guest execution is stopped for handling events like
qemu devices emulation on online CPUs. In that case, the interrupts are
forwarded to the guest as vIRQs.
I will reconsider enabling interrupts on the slave CPU for accessing
userspace memory and so on.

>> Well, and we need proper hardware support for direct IRQ injection on x86...

I really hope this feature ...

> Hardware support always helps, but it always seems to come after the
> software support is in place and needs to be supported forever.

Thanks,
-- 
Tomoki Sekiyama 
Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory



Will there be a qemu-kvm 1.1 version?

2012-06-29 Thread Nico Prenzel
I've seen on the kvm devel list that there were several problems with the
qemu-kvm 1.1 release: http://www.spinics.net/lists/kvm/msg73755.html

Since these last posts, there seems to be no movement with that release.
So, will there be a qemu-kvm 1.1 release?


Thanks.

NicoP.