RE: [PATCH v7 3/3] x86, apicv: add virtual x2apic support

2012-12-25 Thread Zhang, Yang Z
Gleb Natapov wrote on 2012-12-25:
 On Tue, Dec 25, 2012 at 07:46:53AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2012-12-25:
 On Tue, Dec 25, 2012 at 07:25:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2012-12-25:
 On Tue, Dec 25, 2012 at 06:42:59AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2012-12-25:
 On Mon, Dec 24, 2012 at 11:53:37PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2012-12-24:
 On Mon, Dec 24, 2012 at 02:35:35AM +, Zhang, Yang Z wrote:
 Zhang, Yang Z wrote on 2012-12-24:
 Gleb Natapov wrote on 2012-12-20:
 On Mon, Dec 17, 2012 at 01:30:50PM +0800, Yang Zhang wrote:
 basically to benefit from apicv, we need clear MSR bitmap for
 corresponding x2apic MSRs:
 0x800 - 0x8ff: no read intercept for apicv register
 virtualization TPR,EOI,SELF-IPI: no write intercept for
 virtual interrupt
 delivery
 We do not set Virtualize x2APIC mode bit in secondary
 execution control. If I read the spec correctly without that
 those MSR read/writes will go straight to physical local APIC.
 Right. Now it cannot get benefit, but we may enable it in
 future and then we can benefit from it.
 Without enabling it you cannot disable MSR intercept for x2apic
 MSRs.
 
 how about adding the following check:
 if (apicv_enabled && virtual_x2apic_enabled)
 	clear_msr();
 
 I do not understand what do you mean here.
 In this patch, it will clear the MSR bitmap (0x800 - 0x8ff) when apicv is
 enabled. As you said, since kvm doesn't set virtualize x2apic mode, APIC
 register virtualization never takes effect. So we need to clear the MSR
 bitmap only when apicv is enabled and virtualize x2apic mode is set.
 
 But currently it is never set.
 So you think the third patch is not necessary currently, unless we
 enable virtualize x2apic mode?
 
 Without the third patch, vid will not work properly if a guest is in x2apic
 mode. Actually the second and third patches need to be reordered so there is
 no window where x2apic is broken. The problem is that this patch itself
 is buggy since it does not set the virtualize x2apic mode flag. It should
 set the flag if vid is enabled, and if the flag cannot be set, vid should
 be forced off.
 In what conditions can this flag not be set? I think the only case is that
 KVM doesn't expose the x2apic capability to the guest; if this is true, the
 guest will never use x2apic and we can still use vid.
 
 We can indeed set virtualize x2apic mode unconditionally since it does
 not take any effect if x2apic MSRs are intercepted.
 No. Since "virtualize APIC accesses" must be cleared if "virtualize x2apic
 mode" is set, and if the guest still uses xAPIC, there will be lots of EPT
 violations for APIC access emulation. This will hurt performance.
 Stupid HW, why this pointless limitation? Can you point me to where the
 SDM says that?
Vol 3, 26.2.1.1

 We should only set virtualize x2apic mode when the guest really uses
 x2apic (guest sets bit 11 of APIC_BASE_MSR).
 
 Looks like the SDM forces us to.
 
 --
   Gleb.


Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] trace-cmd: fix kvm_mmu_prepare_zap_page event name and kvm_mmu_get_page event output in kvm plugin

2012-12-25 Thread Gleb Natapov
The kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page.
Also print out the created field for the kvm_mmu_get_page event.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/plugin_kvm.c b/plugin_kvm.c
index 55812ef..adc5694 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct pevent_record *record,
 	} else
 		trace_seq_printf(s, "WORD: %08x", role.word);
 
-	pevent_print_num_field(s, " root %u", event,
+	pevent_print_num_field(s, " root %u ", event,
			       "root_count", record, 1);
 
 	if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
@@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq *s, struct pevent_record *r
 {
 	unsigned long long val;
 
+	if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)
+		return -1;
+
+	trace_seq_printf(s, "%s ", val ? "new" : "existing");
+
 	if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
 		return -1;
 
@@ -430,7 +435,7 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
 	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_unsync_page",
				      kvm_mmu_print_role, NULL);
 
-	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
+	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_prepare_zap_page",
				      kvm_mmu_print_role, NULL);
 
 	return 0;
 
return 0;
--
Gleb.


Re: [PATCH v7 3/3] x86, apicv: add virtual x2apic support

2012-12-25 Thread Gleb Natapov
On Tue, Dec 25, 2012 at 08:24:43AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2012-12-25:
  On Tue, Dec 25, 2012 at 07:46:53AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2012-12-25:
  On Tue, Dec 25, 2012 at 07:25:15AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2012-12-25:
  On Tue, Dec 25, 2012 at 06:42:59AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2012-12-25:
  On Mon, Dec 24, 2012 at 11:53:37PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2012-12-24:
  On Mon, Dec 24, 2012 at 02:35:35AM +, Zhang, Yang Z wrote:
  Zhang, Yang Z wrote on 2012-12-24:
  Gleb Natapov wrote on 2012-12-20:
  On Mon, Dec 17, 2012 at 01:30:50PM +0800, Yang Zhang wrote:
  basically to benefit from apicv, we need clear MSR bitmap for
  corresponding x2apic MSRs:
  0x800 - 0x8ff: no read intercept for apicv register
  virtualization TPR,EOI,SELF-IPI: no write intercept for
  virtual interrupt
  delivery
  We do not set Virtualize x2APIC mode bit in secondary
  execution control. If I read the spec correctly without that
  those MSR read/writes will go straight to physical local APIC.
  Right. Now it cannot get benefit, but we may enable it in
  future and then we can benefit from it.
  Without enabling it you cannot disable MSR intercept for x2apic
  MSRs.
  
  how about adding the following check:
  if (apicv_enabled && virtual_x2apic_enabled)
 	clear_msr();
  
  I do not understand what do you mean here.
  In this patch, it will clear the MSR bitmap (0x800 - 0x8ff) when apicv is
  enabled. As you said, since kvm doesn't set virtualize x2apic mode, APIC
  register virtualization never takes effect. So we need to clear the MSR
  bitmap only when apicv is enabled and virtualize x2apic mode is set.
  
  But currently it is never set.
  So you think the third patch is not necessary currently, unless we
  enable virtualize x2apic mode?
  
  Without the third patch, vid will not work properly if a guest is in x2apic
  mode. Actually the second and third patches need to be reordered so there is
  no window where x2apic is broken. The problem is that this patch itself
  is buggy since it does not set the virtualize x2apic mode flag. It should
  set the flag if vid is enabled, and if the flag cannot be set, vid should
  be forced off.
  In what conditions can this flag not be set? I think the only case is that
  KVM doesn't expose the x2apic capability to the guest; if this is true, the
  guest will never use x2apic and we can still use vid.
  
  We can indeed set virtualize x2apic mode unconditionally since it does
  not take any effect if x2apic MSRs are intercepted.
  No. Since "virtualize APIC accesses" must be cleared if "virtualize x2apic
  mode" is set, and if the guest still uses xAPIC, there will be lots of EPT
  violations for APIC access emulation. This will hurt performance.
  Stupid HW, why this pointless limitation? Can you point me to where the
  SDM says that?
 Vol 3, 26.2.1.1
 
Thanks.

  We should only set virtualize x2apic mode when the guest really uses
  x2apic (guest sets bit 11 of APIC_BASE_MSR).
  
  Looks like the SDM forces us to.
  
And we can disable x2apic MSR interception only after virtualize x2apic
mode is set, i.e. when the guest sets bit 11 of APIC_BASE_MSR.

--
Gleb.


[PATCH] KVM: mmu: remove unused trace event

2012-12-25 Thread Gleb Natapov
trace_kvm_mmu_delay_free_pages() is no longer used.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index cd6e983..b8f6172 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -195,12 +195,6 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
TP_ARGS(sp)
 );
 
-DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
-   TP_PROTO(struct kvm_mmu_page *sp),
-
-   TP_ARGS(sp)
-);
-
 TRACE_EVENT(
mark_mmio_spte,
TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
--
Gleb.


Re: [PATCH v3] qemu-kvm/pci-assign: 64 bits bar emulation

2012-12-25 Thread Gleb Natapov
On Thu, Dec 20, 2012 at 11:07:23AM +0800, Xudong Hao wrote:
 Enable 64 bits bar emulation.
 
 v3 changes from v2:
 - Leave original error string and drop the leading 016.
 
 v2 changes from v1:
 - Change 0lx% to 0x%016 when printing a 64-bit variable.
 
 Test passes with the current seabios, which already supports 64-bit pci bars.
 
 Signed-off-by: Xudong Hao xudong@intel.com
Thanks, applied to uq/master.

 ---
  hw/kvm/pci-assign.c |   14 ++
  1 files changed, 10 insertions(+), 4 deletions(-)
 
 diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
 index 7a0998c..2271a2e 100644
 --- a/hw/kvm/pci-assign.c
 +++ b/hw/kvm/pci-assign.c
 @@ -46,6 +46,7 @@
  #define IORESOURCE_IRQ  0x0400
  #define IORESOURCE_DMA  0x0800
  #define IORESOURCE_PREFETCH 0x2000  /* No side effects */
 +#define IORESOURCE_MEM_64   0x0010
  
  //#define DEVICE_ASSIGNMENT_DEBUG
  
 @@ -442,9 +443,13 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
  
          /* handle memory io regions */
          if (cur_region->type & IORESOURCE_MEM) {
 -            int t = cur_region->type & IORESOURCE_PREFETCH
 -                ? PCI_BASE_ADDRESS_MEM_PREFETCH
 -                : PCI_BASE_ADDRESS_SPACE_MEMORY;
 +            int t = PCI_BASE_ADDRESS_SPACE_MEMORY;
 +            if (cur_region->type & IORESOURCE_PREFETCH) {
 +                t |= PCI_BASE_ADDRESS_MEM_PREFETCH;
 +            }
 +            if (cur_region->type & IORESOURCE_MEM_64) {
 +                t |= PCI_BASE_ADDRESS_MEM_TYPE_64;
 +            }
  
          /* map physical memory */
          pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size,
 @@ -632,7 +637,8 @@ again:
          rp->valid = 0;
          rp->resource_fd = -1;
          size = end - start + 1;
 -        flags = IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
 +        flags = IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH
 +             | IORESOURCE_MEM_64;
          if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0) {
              continue;
          }
 -- 
 1.5.5

--
Gleb.


Re: [PATCH v2 5/5] virtio-scsi: introduce multiqueue support

2012-12-25 Thread Wanlong Gao
On 12/19/2012 12:02 AM, Michael S. Tsirkin wrote:
 On Tue, Dec 18, 2012 at 04:51:28PM +0100, Paolo Bonzini wrote:
 Il 18/12/2012 16:03, Michael S. Tsirkin ha scritto:
 On Tue, Dec 18, 2012 at 03:08:08PM +0100, Paolo Bonzini wrote:
 Il 18/12/2012 14:57, Michael S. Tsirkin ha scritto:
 -static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 +static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 +                                 struct virtio_scsi_target_state *tgt,
 +                                 struct scsi_cmnd *sc)
  {
 -    struct virtio_scsi *vscsi = shost_priv(sh);
 -    struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
      struct virtio_scsi_cmd *cmd;
 +    struct virtio_scsi_vq *req_vq;
      int ret;
  
      struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 @@ -461,7 +533,8 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
      BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
      memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
  
 -    if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
 +    req_vq = ACCESS_ONCE(tgt->req_vq);

 This ACCESS_ONCE without a barrier looks strange to me.
 Can req_vq change? Needs a comment.

 Barriers are needed to order two things.  Here I don't have the second 
 thing
 to order against, hence no barrier.

 Accessing req_vq lockless is safe, and there's a comment about it, but you
 still want ACCESS_ONCE to ensure the compiler doesn't play tricks.

 That's just it.
 Why don't you want compiler to play tricks?

 Because I want the lockless access to occur exactly when I write it.
 
 It doesn't occur when you write it. CPU can still move accesses
 around. That's why you either need both ACCESS_ONCE and a barrier
 or none.
 
 Otherwise I have one more thing to think about, i.e. what a crazy
 compiler writer could do with my code.  And having been on the other
 side of the trench, compiler writers can have *really* crazy ideas.

 Anyhow, I'll reorganize the code to move the ACCESS_ONCE closer to the
 write and make it clearer.

 +    if (virtscsi_kick_cmd(tgt, req_vq, cmd,
                            sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
                            GFP_ATOMIC) == 0)
          ret = 0;
 @@ -472,6 +545,48 @@ out:
      return ret;
  }
  
 +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
 +                                        struct scsi_cmnd *sc)
 +{
 +    struct virtio_scsi *vscsi = shost_priv(sh);
 +    struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
 +
 +    atomic_inc(&tgt->reqs);

 And here we don't have barrier after atomic? Why? Needs a comment.

 Because we don't write req_vq, there are no two writes to order.  Barrier
 against what?

 Between the atomic update and the command. Once you queue a command it
 can complete and decrement reqs; if this happens before the
 increment, reqs can even become negative.

 This is not a problem.  Please read Documentation/memory-barrier.txt:

The following also do _not_ imply memory barriers, and so may
require explicit memory barriers under some circumstances
(smp_mb__before_atomic_dec() for instance):

 atomic_add();
 atomic_sub();
 atomic_inc();
 atomic_dec();

If they're used for statistics generation, then they probably don't
need memory barriers, unless there's a coupling between statistical
data.

 This is the single-queue case, so it falls under this case.
 
 Aha, I missed that it's the single-queue case. Correct, but please add a comment.
 
      /* Discover virtqueues and write information to configuration. */
 -    err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names);
 +    err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
      if (err)
          return err;
  
 -    virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0]);
 -    virtscsi_init_vq(&vscsi->event_vq, vqs[1]);
 -    virtscsi_init_vq(&vscsi->req_vq, vqs[2]);
 +    virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0], false);
 +    virtscsi_init_vq(&vscsi->event_vq, vqs[1], false);
 +    for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
 +        virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
 +                         vqs[i], vscsi->num_queues > 1);

 So affinity is true if > 1 vq? I am guessing this is not
 going to do the right thing unless you have at least
 as many vqs as CPUs.

 Yes, and then you're not setting up the thing correctly.

 Why not just check instead of doing the wrong thing?

 The right thing could be to set the affinity with a stride, e.g. CPUs
 0-4 for virtqueue 0 and so on until CPUs 3-7 for virtqueue 3.

 Paolo
 
 I think a simple #vqs == #cpus check would be kind of OK for
 starters, otherwise let userspace set affinity.
 Again need to think what happens with CPU hotplug.

How about dynamically setting the affinity this way?

Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes

2012-12-25 Thread Gleb Natapov
On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote:
 On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.comwrote:
 
 
 
  CPL is always 0 when in real mode, and always 3 in virtual-8086 mode.
 
  Using values other than those can cause failures on operations that check
  CPL.
 
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index a4ecf7c..3abe433 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu
  *vcpu, int seg)
 
   static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
   {
  -	if (!is_protmode(vcpu))
  -		return 0;
  -
  -	if (!is_long_mode(vcpu)
  -	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
  -		return 3;
  -
  	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
   }
 
  @@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
   {
  	struct vcpu_vmx *vmx = to_vmx(vcpu);
  
  +	if (!is_protmode(vcpu))
  +		return 0;
  +
  +	if (!is_long_mode(vcpu)
  +	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
  +		return 3;
  +
  	/*
  	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
  	 * fail; use the cache instead.
 
 
 
 This undoes the cache, now every vmx_get_cpl() in protected mode has to
 VMREAD(GUEST_RFLAGS).
True. Marcelo, what failure do you see without the patch?

--
Gleb.


Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes

2012-12-25 Thread Marcelo Tosatti
On Tue, Dec 25, 2012 at 02:48:08PM +0200, Gleb Natapov wrote:
 On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote:
  On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.comwrote:
  
  
  
    CPL is always 0 when in real mode, and always 3 in virtual-8086 mode.
  
   Using values other than those can cause failures on operations that check
   CPL.
  
   Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
   diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
   index a4ecf7c..3abe433 100644
   --- a/arch/x86/kvm/vmx.c
   +++ b/arch/x86/kvm/vmx.c
   @@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu
   *vcpu, int seg)
  
    static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
    {
   -	if (!is_protmode(vcpu))
   -		return 0;
   -
   -	if (!is_long_mode(vcpu)
   -	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
   -		return 3;
   -
   	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
    }
  
    @@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
     {
    	struct vcpu_vmx *vmx = to_vmx(vcpu);
    
    +	if (!is_protmode(vcpu))
    +		return 0;
    +
    +	if (!is_long_mode(vcpu)
    +	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
    +		return 3;
    +
    	/*
    	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
    	 * fail; use the cache instead.
  
  
  
  This undoes the cache, now every vmx_get_cpl() in protected mode has to
  VMREAD(GUEST_RFLAGS).
 True. Marcelo what failure do you see without the patch?
 
 --
   Gleb.

On transition _to_ real mode, linearize fails due to CPL checks
(FreeBSD). I'll resend the patch using the cache for
VMREAD(GUEST_RFLAGS), which is already implemented.



Re: [Qemu-devel] [PATCH v12 0/8] pv event to notify host when the guest is panicked

2012-12-25 Thread Marcelo Tosatti
On Thu, Dec 20, 2012 at 03:53:59PM +0800, Hu Tao wrote:
 Hi,
 
 Any comments?

As far as I can see, items 2 and 3 of

https://lkml.org/lkml/2012/11/12/588

have not been addressed.

https://lkml.org/lkml/2012/11/20/653 contains discussions on those
items.

2) Format of the interface for other architectures (you can choose
a different KVM supported architecture and write an example). It was
your choice to choose an I/O port, which is x86 specific.

3) Clear/documented management interface for the feature.

Note 3 is for management, not the guest-host interface.

 On Wed, Dec 12, 2012 at 02:13:43PM +0800, Hu Tao wrote:
  This series implements a new interface, kvm pv event, to notify host when
  some events happen in guest. Right now there is one supported event: guest
  panic.
  
  changes from v11:
  
- add a new patch 'save/load cpu runstate'
- fix a bug of null-dereference when no -machine option is supplied
- reserve RUN_STATE_GUEST_PANICKED during migration
- add doc of enable_pv_event option
- disable reboot-on-panic if pv_event is on
  
  v11: http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04361.html
  
  Hu Tao (7):
save/load cpu runstate
update kernel headers
add a new runstate: RUN_STATE_GUEST_PANICKED
add a new qevent: QEVENT_GUEST_PANICKED
introduce a new qom device to deal with panicked event
  allow the user to disable pv event support
pv event: add document to describe the usage
  
  Wen Congyang (1):
start vm after resetting it
  
   block.h  |   2 +
   docs/pv-event.txt|  17 
   hw/kvm/Makefile.objs |   2 +-
   hw/kvm/pv_event.c| 197 
  +++
   hw/pc_piix.c |  11 +++
   kvm-stub.c   |   4 +
   kvm.h|   2 +
   linux-headers/asm-x86/kvm_para.h |   1 +
   linux-headers/linux/kvm_para.h   |   6 ++
   migration.c  |   7 +-
   monitor.c|   6 +-
   monitor.h|   1 +
   qapi-schema.json |   6 +-
   qemu-config.c|   4 +
   qemu-options.hx  |   3 +-
   qmp.c|   5 +-
   savevm.c |   1 +
   sysemu.h |   2 +
   vl.c |  52 ++-
   19 files changed, 312 insertions(+), 17 deletions(-)
   create mode 100644 docs/pv-event.txt
   create mode 100644 hw/kvm/pv_event.c
  
  -- 
  1.8.0.1.240.ge8a1f5a


Re: [Qemu-devel] [PATCH v12 0/8] pv event to notify host when the guest is panicked

2012-12-25 Thread Marcelo Tosatti
On Thu, Dec 20, 2012 at 03:53:59PM +0800, Hu Tao wrote:
 Hi,
 
 Any comments?

Did you verify the possibilities listed at https://lkml.org/lkml/2012/11/20/653 ?

If so, a summary in the patchset would be helpful.

 On Wed, Dec 12, 2012 at 02:13:43PM +0800, Hu Tao wrote:
  This series implements a new interface, kvm pv event, to notify host when
  some events happen in guest. Right now there is one supported event: guest
  panic.
  
  changes from v11:
  
- add a new patch 'save/load cpu runstate'
- fix a bug of null-dereference when no -machine option is supplied
- reserve RUN_STATE_GUEST_PANICKED during migration
- add doc of enable_pv_event option
- disable reboot-on-panic if pv_event is on
  
  v11: http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04361.html
  
  Hu Tao (7):
save/load cpu runstate
update kernel headers
add a new runstate: RUN_STATE_GUEST_PANICKED
add a new qevent: QEVENT_GUEST_PANICKED
introduce a new qom device to deal with panicked event
  allow the user to disable pv event support
pv event: add document to describe the usage
  
  Wen Congyang (1):
start vm after resetting it
  
   block.h  |   2 +
   docs/pv-event.txt|  17 
   hw/kvm/Makefile.objs |   2 +-
   hw/kvm/pv_event.c| 197 
  +++
   hw/pc_piix.c |  11 +++
   kvm-stub.c   |   4 +
   kvm.h|   2 +
   linux-headers/asm-x86/kvm_para.h |   1 +
   linux-headers/linux/kvm_para.h   |   6 ++
   migration.c  |   7 +-
   monitor.c|   6 +-
   monitor.h|   1 +
   qapi-schema.json |   6 +-
   qemu-config.c|   4 +
   qemu-options.hx  |   3 +-
   qmp.c|   5 +-
   savevm.c |   1 +
   sysemu.h |   2 +
   vl.c |  52 ++-
   19 files changed, 312 insertions(+), 17 deletions(-)
   create mode 100644 docs/pv-event.txt
   create mode 100644 hw/kvm/pv_event.c
  
  -- 
  1.8.0.1.240.ge8a1f5a


kvm lockdep splat with 3.8-rc1+

2012-12-25 Thread Borislav Petkov
Hi all,

just saw this in dmesg while running -rc1 + tip/master:


[ 6983.694615] =
[ 6983.694617] [ INFO: possible recursive locking detected ]
[ 6983.694620] 3.8.0-rc1+ #26 Not tainted
[ 6983.694621] -
[ 6983.694623] kvm/20461 is trying to acquire lock:
[ 6983.694625]  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694636] 
[ 6983.694636] but task is already holding lock:
[ 6983.694638]  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694645] 
[ 6983.694645] other info that might help us debug this:
[ 6983.694647]  Possible unsafe locking scenario:
[ 6983.694647] 
[ 6983.694649]CPU0
[ 6983.694650]
[ 6983.694651]   lock(&anon_vma->rwsem);
[ 6983.694654]   lock(&anon_vma->rwsem);
[ 6983.694657] 
[ 6983.694657]  *** DEADLOCK ***
[ 6983.694657] 
[ 6983.694659]  May be due to missing lock nesting notation
[ 6983.694659] 
[ 6983.694661] 4 locks held by kvm/20461:
[ 6983.694663]  #0:  (&mm->mmap_sem){++}, at: [8112afb3] do_mmu_notifier_register+0x153/0x180
[ 6983.694670]  #1:  (mm_all_locks_mutex){+.+...}, at: [8111d1bc] mm_take_all_locks+0x3c/0x1a0
[ 6983.694678]  #2:  (&mapping->i_mmap_mutex){+.+...}, at: [8111d24d] mm_take_all_locks+0xcd/0x1a0
[ 6983.694686]  #3:  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694694] 
[ 6983.694694] stack backtrace:
[ 6983.694696] Pid: 20461, comm: kvm Not tainted 3.8.0-rc1+ #26
[ 6983.694698] Call Trace:
[ 6983.694704]  [8109c2fa] __lock_acquire+0x89a/0x1f30
[ 6983.694708]  [810978ed] ? trace_hardirqs_off+0xd/0x10
[ 6983.694711]  [81099b8d] ? mark_held_locks+0x8d/0x110
[ 6983.694714]  [8111d24d] ? mm_take_all_locks+0xcd/0x1a0
[ 6983.694718]  [8109e05e] lock_acquire+0x9e/0x1f0
[ 6983.694720]  [8111d2c8] ? mm_take_all_locks+0x148/0x1a0
[ 6983.694724]  [81097ace] ? put_lock_stats.isra.17+0xe/0x40
[ 6983.694728]  [81519949] down_write+0x49/0x90
[ 6983.694731]  [8111d2c8] ? mm_take_all_locks+0x148/0x1a0
[ 6983.694734]  [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694737]  [8112afb3] ? do_mmu_notifier_register+0x153/0x180
[ 6983.694740]  [8112aedf] do_mmu_notifier_register+0x7f/0x180
[ 6983.694742]  [8112b013] mmu_notifier_register+0x13/0x20
[ 6983.694765]  [a00e665d] kvm_dev_ioctl+0x3cd/0x4f0 [kvm]
[ 6983.694768]  [8114bcb0] do_vfs_ioctl+0x90/0x570
[ 6983.694772]  [81157403] ? fget_light+0x323/0x4c0
[ 6983.694775]  [8114c1e0] sys_ioctl+0x50/0x90
[ 6983.694781]  [8123a25e] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6983.694785]  [8151d4c2] system_call_fastpath+0x16/0x1b

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes

2012-12-25 Thread Gleb Natapov
On Tue, Dec 25, 2012 at 07:37:10PM -0200, Marcelo Tosatti wrote:
 On Tue, Dec 25, 2012 at 02:48:08PM +0200, Gleb Natapov wrote:
  On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote:
   On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti 
   mtosa...@redhat.comwrote:
   
   
   
     CPL is always 0 when in real mode, and always 3 in virtual-8086 mode.
   
Using values other than those can cause failures on operations that 
check
CPL.
   
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
   
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ecf7c..3abe433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu
*vcpu, int seg)
   
  static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
  {
 -	if (!is_protmode(vcpu))
 -		return 0;
 -
 -	if (!is_long_mode(vcpu)
 -	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
 -		return 3;
 -
 	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
  }
   
 @@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
  {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 +	if (!is_protmode(vcpu))
 +		return 0;
 +
 +	if (!is_long_mode(vcpu)
 +	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
 +		return 3;
 +
 	/*
 	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
 	 * fail; use the cache instead.
   
   
   
   This undoes the cache, now every vmx_get_cpl() in protected mode has to
   VMREAD(GUEST_RFLAGS).
  True. Marcelo what failure do you see without the patch?
  
  --
  Gleb.
 
 On transition _to_ real mode, linearize fails due to CPL checks
 (FreeBSD). I'll resend the patch with use of cache for
 VMREAD(GUEST_RFLAGS), which is already implemented.
I am curious, does it still fail with all my vmx patches applied too?
The question is how we end up entering real mode while the cache is
set to 3. It should never be 3 during boot, since the boot process
never enters userspace.

--
Gleb.