KVM Test report, kernel 81f4f76b... qemu 8912bdea...

2013-04-07 Thread Ren, Yongjie
Hi All,

This is KVM upstream test result against kvm.git next branch and qemu-kvm.git 
uq/master branch.
 kvm.git next branch: 81f4f76bbc712a2dff8bb020057c554e285370e1 based on 
kernel 3.9.0-rc3
 qemu-kvm.git uq/master branch: 8912bdea01e8671e59fe0287314379be9c1f40ec

We found 1 new bug and verified 2 fixed bugs in the past month. 

New issue (1):
1. igb VF can't work in KVM guest.
  https://bugzilla.kernel.org/show_bug.cgi?id=55421

Fixed issues (2):
1. guest panic after live migration
  https://bugzilla.kernel.org/show_bug.cgi?id=54061
2. host panic when creating guest, doing scp and killing QEMU process 
continuously
  https://bugzilla.kernel.org/show_bug.cgi?id=55201

Old issues (6):
--
1. guest panic with parameter -cpu host in qemu command line (about vPMU 
issue).
  https://bugs.launchpad.net/qemu/+bug/994378
2. Can't install or boot up 32bit win8 guest.
  https://bugs.launchpad.net/qemu/+bug/1007269
3. vCPU hot-add makes the guest abort. 
  https://bugs.launchpad.net/qemu/+bug/1019179
4. Nested Virt: VMX can't be initialized in L1 Xen (Xen on KVM)
  https://bugzilla.kernel.org/show_bug.cgi?id=45931
5. Guest has no xsave feature with parameter -cpu qemu64,+xsave in qemu 
command line.
  https://bugs.launchpad.net/qemu/+bug/1042561
6. Guest hang when doing kernel build and writing data in guest.
  https://bugs.launchpad.net/qemu/+bug/1096814


Test environment:
==
  Platform   IvyBridge-EPSandybridge-EP
  CPU Cores   3232
  Memory size 64GB  32GB


Regards
Yongjie Ren  (Jay)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-04:
  On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2 ++
   virt/kvm/ioapic.c|   43
   +++ virt/kvm/ioapic.h|   
   1 + 4 files changed, 55 insertions(+), 0 deletions(-)
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 96ab160..9c041fa 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void 
  *bitmap)
 return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
  +{
  +  struct kvm_lapic *apic = vcpu-arch.apic;
  +
  +  return apic_test_vector(vector, apic-regs + APIC_ISR) ||
  +  apic_test_vector(vector, apic-regs + APIC_IRR);
  +}
  +
   static inline void apic_set_vector(int vec, void *bitmap)
   {
 set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct kvm_vcpu
  *vcpu,
 apic-highest_isr_cache = -1;
 kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
   apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu);
   + kvm_rtc_irq_restore(vcpu); }
   
   void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
  diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
  index 967519c..004d2ad 100644
  --- a/arch/x86/kvm/lapic.h
  +++ b/arch/x86/kvm/lapic.h
  @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
  kvm_vcpu *vcpu)
 return vcpu-arch.apic-pending_events;
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
  +
   #endif
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index 8664812..0b12b17 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
  kvm_ioapic *ioapic,
 return result;
   }
  +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
  +{
  +  ioapic-rtc_status.pending_eoi = 0;
  +  bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
  +}
  +
  +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  +{
  +  struct kvm_vcpu *vcpu;
  +  int vector, i, pending_eoi = 0;
  +
  +  if (RTC_GSI = IOAPIC_NUM_PINS)
  +  return;
  +
  +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +  kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
  +  if (kvm_apic_pending_eoi(vcpu, vector)) {
  +  pending_eoi++;
  +  __set_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
  You should cleat dest_map at the beginning to get rid of stale bits.
 I thought kvm_set_ioapic is called only after save/restore or migration. And 
 the ioapic should be reset successfully before call it. So the dest_map is 
 empty before call rtc_irq_restore().
 But it is possible kvm_set_ioapic is called beside save/restore or migration. 
 Right?
 
First of all userspace should not care when it calls kvm_set_ioapic()
the kernel need to do the right thing. Second, believe it or not,
kvm_ioapic_reset() is not called during system reset. Instead userspace
reset it by calling kvm_set_ioapic() with ioapic state after reset.

  
  +  }
  +  }
  +  ioapic-rtc_status.pending_eoi = pending_eoi;
  +}
  +
  +void kvm_rtc_irq_restore(struct kvm_vcpu *vcpu)
  +{
  +  struct kvm_ioapic *ioapic = vcpu-kvm-arch.vioapic;
  +  int vector;
  +
  +  if (!ioapic)
  +  return;
  +
  Can this be called if ioapic == NULL?
 Yes. IIRC, unit test will test lapic function without ioapic.
Unit test is a guest code. This has nothing to do with a guest code.
Unit test is not the one who creates lapic.

 
  Should check for if (RTC_GSI = IOAPIC_NUM_PINS) here too.
 Not necessary. kvm_rtc_irq_restore is called from arch/x86/ and we have the 
 defination:
 #ifdef CONFIG_X86
 #define RTC_GSI 8
 
 The check will be false always. As the logic you suggested below, this check 
 is necessary for _all() not _one();
OK.

 
  
  +  spin_lock(ioapic-lock);
  +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +  if (kvm_apic_pending_eoi(vcpu, vector)) {
  +  __set_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
  +  ioapic-rtc_status.pending_eoi++;
  The bit may have been set already. The logic should be:
 Right.
 
 
  
  new_val = kvm_apic_pending_eoi(vcpu, vector)
  old_val = set_bit(vcpu_id, dest_map)
  
  if (new_val == old_val)
  return;
  
  if (new_val) {
  __set_bit(vcpu_id, dest_map);
  pending_eoi++;
  } else {
  __clear_bit(vcpu_id, dest_map);
  pending_eoi--;
  }
  
  The naming of above two functions are not good either. Call
  them something like kvm_rtc_eoi_tracking_restore_all() and
  kvm_rtc_eoi_tracking_restore_one().  And _all should call _one() for
  each vcpu. Make 

Re: [PATCH RFC] kvm: add PV MMIO EVENTFD

2013-04-07 Thread Michael S. Tsirkin
On Thu, Apr 04, 2013 at 04:32:01PM -0700, Christoffer Dall wrote:
 [...]
 
   to give us some idea how much performance we would gain from each 
   approach? Thoughput should be completely unaffected anyway, since 
   virtio just coalesces kicks internally.
  
   Latency is dominated by the scheduling latency.
   This means virtio-net is not the best benchmark.
 
  So what is a good benchmark?
 
  E.g. ping pong stress will do but need to look at CPU utilization,
  that's what is affected, not latency.
 
  Is there any difference in speed at all? I strongly doubt it. One of 
  virtio's main points is to reduce the number of kicks.
 
  For this stage of the project I think microbenchmarks are more appropriate.
  Doubling the price of exit is likely to be measureable. 30 cycles likely
  not ...
 
 I don't quite understand this point here. If we don't have anything
 real-world where we can measure a decent difference, then why are we
 doing this? I would agree with Alex that the three test scenarios
 proposed by him should be tried out before adding this complexity,
 measured in CPU utilization or latency as you wish.

Sure, plan to do real world benchmarks for PV MMIO versus PIO as well.
I don't see why I should bother implementing hypercalls given that the
kvm maintainer says they won't be merged.

 FWIW, ARM always uses MMIO and provides hardware decoding of all sane
 (not user register-writeback) instruction, but the hypercall vs. mmio
 looks like this:
 
 hvc:  4,917
 mmio_kernel: 6,248

So 20% difference?  That's not far from what happens on my intel laptop:
vmcall 1519
outl_to_kernel 1745
10% difference here.

 
 But I doubt that an hvc wrapper around mmio decoding would take care
 of all this difference, because the mmio operation needs to do other
 work not realated to emulating the instruction in software, which
 you'd have to do for an hvc anyway (populate kvm_mmio structure etc.)
 
 -Christoffer

Instead of speculating, someone with relevant hardware
could just try this, but kvm unittest doesn't seem to have arm support
at the moment. Anyone working on this?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] kvm: add PV MMIO EVENTFD

2013-04-07 Thread Gleb Natapov
On Thu, Apr 04, 2013 at 04:14:57PM +0300, Gleb Natapov wrote:
   
   is to move to MMIO only when PIO address space is exhausted. For PCI it
   will be never, for PCI-e it will be after ~16 devices.
   
   Ok, let's go back a step here. Are you actually able to measure any 
   speed in performance with this patch applied and without when going 
   through MMIO kicks?
   
   
   That's the question for MST. I think he did only micro benchmarks till
   now and he already posted his result here:
   
   mmio-wildcard-eventfd:pci-mem 3529
   mmio-pv-eventfd:pci-mem 1878
   portio-wildcard-eventfd:pci-io 1846
   
   So the patch speedup mmio by almost 100% and it is almost the same as PIO.
  
  Those numbers don't align at all with what I measured.
 I am trying to run vmexit test on AMD now, but something does not work
 there. Next week I'll fix it and see how AMD differs, bit on Intel those are 
 the
 numbers.
 
The numbers are:
vmcall 1921
inl_from_kernel 4227
outl_to_kernel 2345

outl is specifically optimized to not go through the emulator since it
is used for virtio kick. mmio-pv-eventfd is the same kind of
optimization but for mmio.
 
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] kvm: add PV MMIO EVENTFD

2013-04-07 Thread Michael S. Tsirkin
On Sun, Apr 07, 2013 at 12:30:38PM +0300, Gleb Natapov wrote:
 On Thu, Apr 04, 2013 at 04:14:57PM +0300, Gleb Natapov wrote:

is to move to MMIO only when PIO address space is exhausted. For PCI 
it
will be never, for PCI-e it will be after ~16 devices.

Ok, let's go back a step here. Are you actually able to measure any 
speed in performance with this patch applied and without when going 
through MMIO kicks?


That's the question for MST. I think he did only micro benchmarks till
now and he already posted his result here:

mmio-wildcard-eventfd:pci-mem 3529
mmio-pv-eventfd:pci-mem 1878
portio-wildcard-eventfd:pci-io 1846

So the patch speedup mmio by almost 100% and it is almost the same as 
PIO.
   
   Those numbers don't align at all with what I measured.
  I am trying to run vmexit test on AMD now, but something does not work
  there. Next week I'll fix it and see how AMD differs, bit on Intel those 
  are the
  numbers.
  
 The numbers are:
 vmcall 1921
 inl_from_kernel 4227
 outl_to_kernel 2345
 
 outl is specifically optimized to not go through the emulator since it
 is used for virtio kick. mmio-pv-eventfd is the same kind of
 optimization but for mmio.
  
 --
   Gleb.


Hmm so on AMD it's more like 20% overhead, like ARM.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: KVM: kvm_set_slave_cpu: Invalid argument when trying direct interrupt delivery

2013-04-07 Thread Yangminqiang
Hi Tomoki,

I offline the cpu2 and cpu3 on my machine and continue to try your patch.  I 
run the vm without pass-through device for I only want to know the interrupt 
latency improvement.(Am I right?)

my qemu parameter:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 1024 -cpu qemu64,+x2apic 
-no-kvm-pit -serial pty -nographic -drive 
file=/mnt/sdb/vms/testfc/testfc.qcow2,if=virtio,index=0,format=qcow2 -spice 
port=12000,addr=186.100.8.171,disable-ticketing,plaintext-channel=main,plaintext-channel=playback,plaintext-channel=record,image-compression=auto_glz
 -no-kvm-pit

cyclictest:
   cyclictest -m -p 99 -n -l 10 -h 3000 -q  


but I got very bad result:
  avg lantency: 2+ us
  max lantency: 5+ us

and got

Message from syslogd@kvmsteven at Apr  7 05:43:30 ...
 kernel:[ 2201.151817] BUG: soft lockup - CPU#18 stuck for 22s! 
[qemu-system-x86:2365]

my setup:
host kernel: 3.6.0-rc4+ and your patches
guest kernel: 3.6.11.1-rt32
qemu: qemu-kvm-1.0 with your patch

BTW, I am sure that my rt-kernel works well, which got 12us max latency as a 
host OS.

Could you please provoide me more detail about your benchmark so I could 
reproduce your benchmark result? 

Thanks,
Steven


From: Tomoki Sekiyama [tomoki.sekiyama...@hitachi.com]
Sent: Wednesday, April 03, 2013 10:02
To: Yangminqiang
Cc: tomoki.sekiy...@hds.com; kvm@vger.kernel.org
Subject: Re: KVM: kvm_set_slave_cpu: Invalid argument when trying direct 
interrupt delivery

Hi,

Thank you for testing the patch.

Yangminqiang yangminqi...@huawei.com wrote:
 Hi Tomoki

 I tried your smart patch cpu isolation and direct interrupt delivery,
   http://article.gmane.org/gmane.linux.kernel/1353803

 got  output when I run qemu
   kvm_set_slave_cpu: Invalid argument

 So I wonder
 * Did I  misuse your patches?
 * How is the offlined CPU assigned? or the Guest OS will automaticly detect
 and use it?

Currently it is hard-coded in the patch for qemu-kvm just for testing:

diff -Narup a/qemu-kvm-1.0/qemu-kvm-x86.c b/qemu-kvm-1.0/qemu-kvm-x86.c
--- a/qemu-kvm-1.0/qemu-kvm-x86.c   2011-12-04 19:38:06.0 +0900
+++ b/qemu-kvm-1.0/qemu-kvm-x86.c   2012-09-06 20:19:44.828163734 +0900
@@ -139,12 +139,28 @@ static int kvm_enable_tpr_access_reporti
 return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, tac);
 }

+static int kvm_set_slave_cpu(CPUState *env)
+{
+int r, slave = env-cpu_index == 0 ? 2 : env-cpu_index == 1 ? 3 : -1;

`slave' is the offlined CPU ID assigned, and `env-cpu_index' is
the virtual CPU ID. You need to modify here and recompile qemu-kvm
(or just offline cpu 2 and 3 for a 2vcpus guest ;) ).

Thanks,
Tomoki Sekiyama
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM fixes for 3.9-rc5

2013-04-07 Thread Gleb Natapov

Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the bugfix for the regression introduced by c300aa64ddf57. 

Andrew Honig (1):
  KVM: Allow cross page reads and writes from cached translations.

 arch/x86/kvm/lapic.c  |2 -
 arch/x86/kvm/x86.c|   13 +---
 include/linux/kvm_host.h  |2 -
 include/linux/kvm_types.h |1 
 virt/kvm/kvm_main.c   |   47 --
 5 files changed, 46 insertions(+), 19 deletions(-)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 RESEND] KVM: Call kvm_apic_match_dest() to check destination vcpu

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 08:25:18AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 A typo in previous version, resend it.
 
Please put things that should not be in git commit message under the ---
below.

 For a given vcpu, kvm_apic_match_dest() will tell you whether
 the vcpu in the destination list quickly. Drop kvm_calculate_eoi_exitmap()
 and use kvm_apic_match_dest() instead.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Applied again, thanks.

 ---
  arch/x86/kvm/lapic.c |   47 ---
  arch/x86/kvm/lapic.h |4 
  virt/kvm/ioapic.c|9 +++--
  3 files changed, 3 insertions(+), 57 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index a8e9369..e227474 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -145,53 +145,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
   return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
  }
  
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 - struct kvm_lapic_irq *irq,
 - u64 *eoi_exit_bitmap)
 -{
 - struct kvm_lapic **dst;
 - struct kvm_apic_map *map;
 - unsigned long bitmap = 1;
 - int i;
 -
 - rcu_read_lock();
 - map = rcu_dereference(vcpu-kvm-arch.apic_map);
 -
 - if (unlikely(!map)) {
 - __set_bit(irq-vector, (unsigned long *)eoi_exit_bitmap);
 - goto out;
 - }
 -
 - if (irq-dest_mode == 0) { /* physical mode */
 - if (irq-delivery_mode == APIC_DM_LOWEST ||
 - irq-dest_id == 0xff) {
 - __set_bit(irq-vector,
 -   (unsigned long *)eoi_exit_bitmap);
 - goto out;
 - }
 - dst = map-phys_map[irq-dest_id  0xff];
 - } else {
 - u32 mda = irq-dest_id  (32 - map-ldr_bits);
 -
 - dst = map-logical_map[apic_cluster_id(map, mda)];
 -
 - bitmap = apic_logical_id(map, mda);
 - }
 -
 - for_each_set_bit(i, bitmap, 16) {
 - if (!dst[i])
 - continue;
 - if (dst[i]-vcpu == vcpu) {
 - __set_bit(irq-vector,
 -   (unsigned long *)eoi_exit_bitmap);
 - break;
 - }
 - }
 -
 -out:
 - rcu_read_unlock();
 -}
 -
  static void recalculate_apic_map(struct kvm *kvm)
  {
   struct kvm_apic_map *new, *old = NULL;
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 2c721b9..baa20cf 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -160,10 +160,6 @@ static inline u16 apic_logical_id(struct kvm_apic_map 
 *map, u32 ldr)
   return ldr  map-lid_mask;
  }
  
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 - struct kvm_lapic_irq *irq,
 - u64 *eoi_bitmap);
 -
  static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
  {
   return vcpu-arch.apic-pending_events;
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 5ba005c..914cbe0 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -124,7 +124,6 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu 
 *vcpu,
  {
   struct kvm_ioapic *ioapic = vcpu-kvm-arch.vioapic;
   union kvm_ioapic_redirect_entry *e;
 - struct kvm_lapic_irq irqe;
   int index;
  
   spin_lock(ioapic-lock);
 @@ -135,11 +134,9 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu 
 *vcpu,
   (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC,
index))) {
 - irqe.dest_id = e-fields.dest_id;
 - irqe.vector = e-fields.vector;
 - irqe.dest_mode = e-fields.dest_mode;
 - irqe.delivery_mode = e-fields.delivery_mode  8;
 - kvm_calculate_eoi_exitmap(vcpu, irqe, eoi_exit_bitmap);
 + if (kvm_apic_match_dest(vcpu, NULL, 0,
 + e-fields.dest_id, e-fields.dest_mode))
 + __set_bit(e-fields.vector, (unsigned long 
 *)eoi_exit_bitmap);
   }
   }
   spin_unlock(ioapic-lock);
 -- 
 1.7.1

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Check exit control for VM_EXIT_SAVE_IA32_PAT, not entry controls

2013-04-07 Thread Gleb Natapov
On Sat, Apr 06, 2013 at 01:51:21PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Obviously a copypaste mistake: prepare_vmcs12 has to check L1's exit
 controls for VM_EXIT_SAVE_IA32_PAT.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Applied, thanks.

 ---
  arch/x86/kvm/vmx.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 74c9cc1..a1761e8 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7456,7 +7456,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   /* TODO: These cannot have changed unless we have MSR bitmaps and
* the relevant bit asks not to trap the change */
   vmcs12-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
 - if (vmcs12-vm_entry_controls  VM_EXIT_SAVE_IA32_PAT)
 + if (vmcs12-vm_exit_controls  VM_EXIT_SAVE_IA32_PAT)
   vmcs12-guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
   vmcs12-guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
   vmcs12-guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP);
 -- 
 1.7.3.4

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fw: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1

2013-04-07 Thread Abel Gordon


Hi Gleb,

Do you have any comments/issues related to the  shadow vmcs patches I sent
almost a month ago  ?
If you don't have any objection let me know and I'll test+submit a re-based
version so you can apply it.

Thanks,
Abel.

- Forwarded by Abel Gordon/Haifa/IBM on 07/04/2013 02:15 PM -

Abel Gordon/Haifa/IBM@IBMIL wrote on 10/03/2013 06:03:25 PM:

 From: Abel Gordon/Haifa/IBM@IBMIL
 To: kvm@vger.kernel.org,
 Cc: owass...@redhat.com, na...@harel.org.il, jun.nakaj...@intel.com,
 dongxiao...@intel.com, Abel Gordon/Haifa/IBM@IBMIL
 Date: 10/03/2013 06:03 PM
 Subject: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1

 This series of patches implements shadow-vmcs capability for nested VMX.

 Shadow-vmcs - background and overview:

  In Intel VMX, vmread and vmwrite privileged instructions are used by the
  hypervisor to read and modify the guest and host specifications (VMCS).
In a
  nested virtualization environment, L1 executes multiple vmread and
vmwrite
  instruction to handle a single L2 exit. Each vmread and vmwrite
 executed by L1
  traps (cause an exit) to the L0 hypervisor (KVM). L0 emulates the
instruction
  behaviour and resumes L1 execution.

  Removing the need to trap and emulate these special instructions reduces
the
  number of exits and improves nested virtualization performance. As
 it was first
  evaluated in [1], exit-less vmread and vmwrite can reduce nested
 virtualization
  overhead up-to 40%.

  Intel introduced a new feature to their processors called shadow-vmcs.
Using
  shadow-vmcs, L0 can configure the processor to let L1 running in
guest-mode
  access VMCS12 fields using vmread and vmwrite instructions but
 without causing
  an exit to L0. The VMCS12 fields' data is stored in a shadow-vmcs
controlled
  by L0.

 Shadow-vmcs - design considerations:

  A shadow-vmcs is processor-dependent and must be accessed by L0 or L1
using
  vmread and vmwrite instructions. With nested virtualization we aim
 to abstract
  the hardware from the L1 hypervisor. Thus, to avoid hardware
dependencies we
  prefered to keep the software defined VMCS12 format as part of L1
 address space
  and hold the processor-specific shadow-vmcs format only in L0 address
space.
  In other words, the shadow-vmcs is used by L0 as an accelerator butthe
format
  and content is never exposed to L1 directly. L0 syncs the content of the
  processor-specific shadow vmcs with the content of the
software-controlled
  VMCS12 format.

  We could have been kept the processor-specific shadow-vmcs format
 in L1 address
  space to avoid using the software defined VMCS12 format, however,
 this type of
  design/implementation would have been created hardware dependencies and
  would complicate other capabilities (e.g. Live Migration of L1).

 Acknowledgments:

  Many thanks to
  Xu, Dongxiao dongxiao...@intel.com
  Nakajima, Jun jun.nakaj...@intel.com
  Har'El, Nadav na...@harel.org.il

  for the insightful discussions, comments and reviews.


  These patches were easily created and maintained using
  Patchouli -- patch creator
  http://patchouli.sourceforge.net/


 [1] The Turtles Project: Design and Implementation of Nested
Virtualization,
 http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 02:25:06PM +0300, Abel Gordon wrote:
 
 
 Hi Gleb,
 
 Do you have any comments/issues related to the  shadow vmcs patches I sent
 almost a month ago  ?
 If you don't have any objection let me know and I'll test+submit a re-based
 version so you can apply it.
 
Sorry, haven't reviewed it yet, but plan too. The month had a lot of
holidays as you know :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock

2013-04-07 Thread Jan Kiszka
On 2013-02-27 12:19, Gleb Natapov wrote:
 On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 This fixes boot lockups with no-kvmclock, when the host is not
 exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
 the kvmclock initialization failed for whatever reason.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Applied, thanks.

Just noticed that this fix isn't in Linus tree yet, thus also not yet in
3.8.y. Which route does it take?

Jan

 
 ---

 Should go to 3.8 as well, I presume.

  arch/x86/kernel/kvmclock.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index 5bedbdd..b730efa 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -160,8 +160,12 @@ int kvm_register_clock(char *txt)
  {
  int cpu = smp_processor_id();
  int low, high, ret;
 -struct pvclock_vcpu_time_info *src = hv_clock[cpu].pvti;
 +struct pvclock_vcpu_time_info *src;
 +
 +if (!hv_clock)
 +return 0;
  
 +src = hv_clock[cpu].pvti;
  low = (int)__pa(src) | 1;
  high = ((u64)__pa(src)  32);
  ret = native_write_msr_safe(msr_kvm_system_time, low, high);
 @@ -276,6 +280,9 @@ int __init kvm_setup_vsyscall_timeinfo(void)
  struct pvclock_vcpu_time_info *vcpu_time;
  unsigned int size;
  
 +if (!hv_clock)
 +return 0;
 +
  size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
  
  preempt_disable();
 -- 
 1.7.3.4




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 01:33:05PM +0200, Jan Kiszka wrote:
 On 2013-02-27 12:19, Gleb Natapov wrote:
  On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  This fixes boot lockups with no-kvmclock, when the host is not
  exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
  the kvmclock initialization failed for whatever reason.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  Applied, thanks.
 
 Just noticed that this fix isn't in Linus tree yet, thus also not yet in
 3.8.y. Which route does it take?
 
It is scheduled to go into 3.10. To get it into 3.8.y we need to send it
to stable@.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock

2013-04-07 Thread Jan Kiszka
On 2013-04-07 13:37, Gleb Natapov wrote:
 On Sun, Apr 07, 2013 at 01:33:05PM +0200, Jan Kiszka wrote:
 On 2013-02-27 12:19, Gleb Natapov wrote:
 On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 This fixes boot lockups with no-kvmclock, when the host is not
 exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
 the kvmclock initialization failed for whatever reason.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Applied, thanks.

 Just noticed that this fix isn't in Linus tree yet, thus also not yet in
 3.8.y. Which route does it take?

 It is scheduled to go into 3.10. To get it into 3.8.y we need to send it
 to stable@.

Hmm, it's a bug fix for an ugly early kernel crash...

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 01:38:48PM +0200, Jan Kiszka wrote:
 On 2013-04-07 13:37, Gleb Natapov wrote:
  On Sun, Apr 07, 2013 at 01:33:05PM +0200, Jan Kiszka wrote:
  On 2013-02-27 12:19, Gleb Natapov wrote:
  On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  This fixes boot lockups with no-kvmclock, when the host is not
  exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
  the kvmclock initialization failed for whatever reason.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  Applied, thanks.
 
  Just noticed that this fix isn't in Linus tree yet, thus also not yet in
  3.8.y. Which route does it take?
 
  It is scheduled to go into 3.10. To get it into 3.8.y we need to send it
  to stable@.
 
 Hmm, it's a bug fix for an ugly early kernel crash...
 
It was not a regression introduced in 3.9 and this usually means that
fix is delayed to the next merge window. Severe bugs with simple fix
could be exceptions, and I probably misjudged this one. Note that the
patch is for 3.9 would have helped to avoid that. As it stands the patch
will go to 3.9 via stable.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: fix MMIO/PIO collision misdetection

2013-04-07 Thread Gleb Natapov
On Thu, Apr 04, 2013 at 01:27:21PM +0300, Michael S. Tsirkin wrote:
 PIO and MMIO are separate address spaces, but
 ioeventfd registration code mistakenly detected
 two eventfds as duplicate if they use the same address,
 even if one is PIO and another one MMIO.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
Applied, thanks.

 ---
  virt/kvm/eventfd.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index adb17f2..93e5b05 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -577,6 +577,7 @@ struct _ioeventfd {
   struct eventfd_ctx  *eventfd;
   u64  datamatch;
   struct kvm_io_device dev;
 + u8   bus_idx;
   bool wildcard;
  };
  
 @@ -669,7 +670,8 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   struct _ioeventfd *_p;
  
   list_for_each_entry(_p, kvm-ioeventfds, list)
 - if (_p-addr == p-addr  _p-length == p-length 
 + if (_p-bus_idx == p-bus_idx 
 + _p-addr == p-addr  _p-length == p-length 
   (_p-wildcard || p-wildcard ||
_p-datamatch == p-datamatch))
   return true;
 @@ -717,6 +719,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
  
   INIT_LIST_HEAD(p-list);
   p-addr= args-addr;
 + p-bus_idx = bus_idx;
   p-length  = args-len;
   p-eventfd = eventfd;
  
 @@ -775,7 +778,8 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   list_for_each_entry_safe(p, tmp, kvm-ioeventfds, list) {
   bool wildcard = !(args-flags  KVM_IOEVENTFD_FLAG_DATAMATCH);
  
 - if (p-eventfd != eventfd  ||
 + if (p-bus_idx != bus_idx ||
 + p-eventfd != eventfd  ||
   p-addr != args-addr  ||
   p-length != args-len ||
   p-wildcard != wildcard)
 -- 
 MST

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-04:
 On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2 ++
  virt/kvm/ioapic.c|   43
  +++ virt/kvm/ioapic.h | 1 +
  4 files changed, 55 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 96ab160..9c041fa 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void 
 *bitmap)
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 +{
 +  struct kvm_lapic *apic = vcpu-arch.apic;
 +
 +  return apic_test_vector(vector, apic-regs + APIC_ISR) ||
 +  apic_test_vector(vector, apic-regs + APIC_IRR);
 +}
 +
  static inline void apic_set_vector(int vec, void *bitmap)
  {
set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
 kvm_vcpu
 *vcpu,
apic-highest_isr_cache = -1;
kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
  apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT,
  vcpu); +  kvm_rtc_irq_restore(vcpu); }
  
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 967519c..004d2ad 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
 kvm_vcpu *vcpu)
return vcpu-arch.apic-pending_events;
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 +
  #endif
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 8664812..0b12b17 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
 kvm_ioapic *ioapic,
return result;
  }
 +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 +{
 +  ioapic-rtc_status.pending_eoi = 0;
 +  bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
 +}
 +
 +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
 +{
 +  struct kvm_vcpu *vcpu;
 +  int vector, i, pending_eoi = 0;
 +
 +  if (RTC_GSI = IOAPIC_NUM_PINS)
 +  return;
 +
 +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
 +  kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
 +  if (kvm_apic_pending_eoi(vcpu, vector)) {
 +  pending_eoi++;
 +  __set_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
 You should cleat dest_map at the beginning to get rid of stale bits.
 I thought kvm_set_ioapic is called only after save/restore or migration. And 
 the
 ioapic should be reset successfully before call it. So the dest_map is empty
 before call rtc_irq_restore().
 But it is possible kvm_set_ioapic is called beside save/restore or
 migration. Right?
 
 First of all userspace should not care when it calls kvm_set_ioapic()
 the kernel need to do the right thing. Second, believe it or not,
 kvm_ioapic_reset() is not called during system reset. Instead userspace
 reset it by calling kvm_set_ioapic() with ioapic state after reset.
Ok. I see. As the logic you suggested, it will clear dest_map if no pending eoi 
in vcpu, so we don't need to do it again.

 
 
 +  }
 +  }
 +  ioapic-rtc_status.pending_eoi = pending_eoi;
 +}
 +
 +void kvm_rtc_irq_restore(struct kvm_vcpu *vcpu)
 +{
 +  struct kvm_ioapic *ioapic = vcpu-kvm-arch.vioapic;
 +  int vector;
 +
 +  if (!ioapic)
 +  return;
 +
 Can this be called if ioapic == NULL?
 Yes. IIRC, unit test will test lapic function without ioapic.
 Unit test is a guest code. This has nothing to do with a guest code.
 Unit test is not the one who creates lapic.
Ok. Then !ioapic is unnecessary.

 
 Should check for if (RTC_GSI = IOAPIC_NUM_PINS) here too.
 Not necessary. kvm_rtc_irq_restore is called from arch/x86/ and we
 have the defination: #ifdef CONFIG_X86 #define RTC_GSI 8
 
 The check will be false always. As the logic you suggested below, this check 
 is
 necessary for _all() not _one();
 OK.
 
 
 
 +  spin_lock(ioapic-lock);
 +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
 +  if (kvm_apic_pending_eoi(vcpu, vector)) {
 +  __set_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
 +  ioapic-rtc_status.pending_eoi++;
 The bit may have been set already. The logic should be:
 Right.
 
 
 
 new_val = kvm_apic_pending_eoi(vcpu, vector)
 old_val = set_bit(vcpu_id, dest_map)
 
 if (new_val == old_val)
 return;
 
 if (new_val) {
 __set_bit(vcpu_id, dest_map);
 pending_eoi++;
 } else {
 __clear_bit(vcpu_id, dest_map);
 pending_eoi--;
 }
 
 The naming of above two functions are not good either. Call
 them something like kvm_rtc_eoi_tracking_restore_all() and
 

Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-04:
  On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2 ++
   virt/kvm/ioapic.c|   43
   +++ virt/kvm/ioapic.h | 1 +
   4 files changed, 55 insertions(+), 0 deletions(-)
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 96ab160..9c041fa 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void 
  *bitmap)
   return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
  +{
  +struct kvm_lapic *apic = vcpu-arch.apic;
  +
  +return apic_test_vector(vector, apic-regs + APIC_ISR) ||
  +apic_test_vector(vector, apic-regs + APIC_IRR);
  +}
  +
   static inline void apic_set_vector(int vec, void *bitmap)
   {
   set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
  kvm_vcpu
  *vcpu,
   apic-highest_isr_cache = -1;
   kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
   apic_find_highest_isr(apic));   kvm_make_request(KVM_REQ_EVENT,
   vcpu); +kvm_rtc_irq_restore(vcpu); }
   
   void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
  diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
  index 967519c..004d2ad 100644
  --- a/arch/x86/kvm/lapic.h
  +++ b/arch/x86/kvm/lapic.h
  @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
  kvm_vcpu *vcpu)
   return vcpu-arch.apic-pending_events;
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
  +
   #endif
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index 8664812..0b12b17 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
  kvm_ioapic *ioapic,
   return result;
   }
  +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
  +{
  +ioapic-rtc_status.pending_eoi = 0;
  +bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
  +}
  +
  +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  +{
  +struct kvm_vcpu *vcpu;
  +int vector, i, pending_eoi = 0;
  +
  +if (RTC_GSI = IOAPIC_NUM_PINS)
  +return;
  +
  +vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
  +if (kvm_apic_pending_eoi(vcpu, vector)) {
  +pending_eoi++;
  +__set_bit(vcpu-vcpu_id, 
  ioapic-rtc_status.dest_map);
  You should cleat dest_map at the beginning to get rid of stale bits.
  I thought kvm_set_ioapic is called only after save/restore or migration. 
  And the
  ioapic should be reset successfully before call it. So the dest_map is empty
  before call rtc_irq_restore().
  But it is possible kvm_set_ioapic is called beside save/restore or
  migration. Right?
  
  First of all userspace should not care when it calls kvm_set_ioapic()
  the kernel need to do the right thing. Second, believe it or not,
  kvm_ioapic_reset() is not called during system reset. Instead userspace
  reset it by calling kvm_set_ioapic() with ioapic state after reset.
 Ok. I see. As the logic you suggested, it will clear dest_map if no pending 
 eoi in vcpu, so we don't need to do it again.
 
You again rely on userspace doing thing in certain manner. What is
set_lapic() is never called? Kernel internal state have to be correct
after each ioctl call.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-04:
 On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2
  ++ virt/kvm/ioapic.c|   43
  +++ virt/kvm/ioapic.h | 1
  + 4 files changed, 55 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 96ab160..9c041fa 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
 *bitmap)
  return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 +{
 +struct kvm_lapic *apic = vcpu-arch.apic;
 +
 +return apic_test_vector(vector, apic-regs + APIC_ISR) ||
 +apic_test_vector(vector, apic-regs + APIC_IRR);
 +}
 +
  static inline void apic_set_vector(int vec, void *bitmap)
  {
  set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
 kvm_vcpu
 *vcpu,
  apic-highest_isr_cache = -1;
  kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
  apic_find_highest_isr(apic));   kvm_make_request(KVM_REQ_EVENT,
  vcpu); +kvm_rtc_irq_restore(vcpu); }
  
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 967519c..004d2ad 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
 kvm_vcpu *vcpu)
  return vcpu-arch.apic-pending_events;
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 +
  #endif
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 8664812..0b12b17 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
 kvm_ioapic *ioapic,
  return result;
  }
 +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 +{
 +ioapic-rtc_status.pending_eoi = 0;
 +bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
 +}
 +
 +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
 +{
 +struct kvm_vcpu *vcpu;
 +int vector, i, pending_eoi = 0;
 +
 +if (RTC_GSI = IOAPIC_NUM_PINS)
 +return;
 +
 +vector = ioapic-redirtbl[RTC_GSI].fields.vector;
 +kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
 +if (kvm_apic_pending_eoi(vcpu, vector)) {
 +pending_eoi++;
 +__set_bit(vcpu-vcpu_id, 
 ioapic-rtc_status.dest_map);
 You should cleat dest_map at the beginning to get rid of stale bits.
 I thought kvm_set_ioapic is called only after save/restore or migration. 
 And
 the
 ioapic should be reset successfully before call it. So the dest_map is empty
 before call rtc_irq_restore().
 But it is possible kvm_set_ioapic is called beside save/restore or
 migration. Right?
 
 First of all userspace should not care when it calls kvm_set_ioapic()
 the kernel need to do the right thing. Second, believe it or not,
 kvm_ioapic_reset() is not called during system reset. Instead userspace
 reset it by calling kvm_set_ioapic() with ioapic state after reset.
 Ok. I see. As the logic you suggested, it will clear dest_map if no
 pending eoi in vcpu, so we don't need to do it again.
 
 You again rely on userspace doing thing in certain manner. What is
 set_lapic() is never called? Kernel internal state have to be correct
 after each ioctl call.
Sorry. I cannot figure out what's the problem if don't clear dest_map? Can you 
elaborate it?

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 01:05:02PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-04:
  On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2
   ++ virt/kvm/ioapic.c|   43
   +++ virt/kvm/ioapic.h | 1
   + 4 files changed, 55 insertions(+), 0 deletions(-)
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 96ab160..9c041fa 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
  *bitmap)
 return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
  +{
  +  struct kvm_lapic *apic = vcpu-arch.apic;
  +
  +  return apic_test_vector(vector, apic-regs + APIC_ISR) ||
  +  apic_test_vector(vector, apic-regs + APIC_IRR);
  +}
  +
   static inline void apic_set_vector(int vec, void *bitmap)
   {
 set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
  kvm_vcpu
  *vcpu,
 apic-highest_isr_cache = -1;
 kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
   apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT,
   vcpu); +  kvm_rtc_irq_restore(vcpu); }
   
   void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
  diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
  index 967519c..004d2ad 100644
  --- a/arch/x86/kvm/lapic.h
  +++ b/arch/x86/kvm/lapic.h
  @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
  kvm_vcpu *vcpu)
 return vcpu-arch.apic-pending_events;
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
  +
   #endif
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index 8664812..0b12b17 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
  kvm_ioapic *ioapic,
 return result;
   }
  +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
  +{
  +  ioapic-rtc_status.pending_eoi = 0;
  +  bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
  +}
  +
  +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  +{
  +  struct kvm_vcpu *vcpu;
  +  int vector, i, pending_eoi = 0;
  +
  +  if (RTC_GSI = IOAPIC_NUM_PINS)
  +  return;
  +
  +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +  kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
  +  if (kvm_apic_pending_eoi(vcpu, vector)) {
  +  pending_eoi++;
  +  __set_bit(vcpu-vcpu_id, 
  ioapic-rtc_status.dest_map);
  You should cleat dest_map at the beginning to get rid of stale bits.
  I thought kvm_set_ioapic is called only after save/restore or migration. 
  And
  the
  ioapic should be reset successfully before call it. So the dest_map is 
  empty
  before call rtc_irq_restore().
  But it is possible kvm_set_ioapic is called beside save/restore or
  migration. Right?
  
  First of all userspace should not care when it calls kvm_set_ioapic()
  the kernel need to do the right thing. Second, believe it or not,
  kvm_ioapic_reset() is not called during system reset. Instead userspace
  reset it by calling kvm_set_ioapic() with ioapic state after reset.
  Ok. I see. As the logic you suggested, it will clear dest_map if no
  pending eoi in vcpu, so we don't need to do it again.
  
  You again rely on userspace doing thing in certain manner. What is
  set_lapic() is never called? Kernel internal state have to be correct
  after each ioctl call.
 Sorry. I cannot figure out what's the problem if don't clear dest_map? Can 
 you elaborate it?
 
What is not obvious about it? If there is a bit in dest_map that should
be cleared after rtc_irq_restore() it will not.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 01:05:02PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-04:
 On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h | 2
  ++ virt/kvm/ioapic.c|   43
  +++ virt/kvm/ioapic.h |
  1 + 4 files changed, 55 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 96ab160..9c041fa 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
 *bitmap)
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 +{
 +  struct kvm_lapic *apic = vcpu-arch.apic;
 +
 +  return apic_test_vector(vector, apic-regs + APIC_ISR) ||
 +  apic_test_vector(vector, apic-regs + APIC_IRR);
 +}
 +
  static inline void apic_set_vector(int vec, void *bitmap)
  {
set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
 kvm_vcpu
 *vcpu,
apic-highest_isr_cache = -1;
kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
  apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT,
  vcpu); +  kvm_rtc_irq_restore(vcpu); }
  
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 967519c..004d2ad 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
 kvm_vcpu *vcpu)
return vcpu-arch.apic-pending_events;
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 +
  #endif
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 8664812..0b12b17 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
 kvm_ioapic *ioapic,
return result;
  }
 +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 +{
 +  ioapic-rtc_status.pending_eoi = 0;
 +  bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
 +}
 +
 +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
 +{
 +  struct kvm_vcpu *vcpu;
 +  int vector, i, pending_eoi = 0;
 +
 +  if (RTC_GSI = IOAPIC_NUM_PINS)
 +  return;
 +
 +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
 +  kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
 +  if (kvm_apic_pending_eoi(vcpu, vector)) {
 +  pending_eoi++;
 +  __set_bit(vcpu-vcpu_id,
 ioapic-rtc_status.dest_map);
 You should cleat dest_map at the beginning to get rid of stale bits.
 I thought kvm_set_ioapic is called only after save/restore or migration.
 And
 the
 ioapic should be reset successfully before call it. So the dest_map is 
 empty
 before call rtc_irq_restore().
 But it is possible kvm_set_ioapic is called beside save/restore or
 migration. Right?
 
 First of all userspace should not care when it calls kvm_set_ioapic()
 the kernel need to do the right thing. Second, believe it or not,
 kvm_ioapic_reset() is not called during system reset. Instead userspace
 reset it by calling kvm_set_ioapic() with ioapic state after reset.
 Ok. I see. As the logic you suggested, it will clear dest_map if no
 pending eoi in vcpu, so we don't need to do it again.
 
 You again rely on userspace doing thing in certain manner. What is
 set_lapic() is never called? Kernel internal state have to be correct
 after each ioctl call.
 Sorry. I cannot figure out what's the problem if don't clear dest_map?
 Can you elaborate it?
 
 What is not obvious about it? If there is a bit in dest_map that should
 be cleared after rtc_irq_restore() it will not.
Why it will not? If new_val is false, and the old_val is true. __clear_bit() 
will clear the dest_map. Am I wrong?

new_val = kvm_apic_pending_eoi(vcpu, vector);
old_val = test_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);

if (new_val == old_val)
return;

if (new_val) {
__set_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
ioapic-rtc_status.pending_eoi++;
} else {
__clear_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
ioapic-rtc_status.pending_eoi--;
}

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 01:16:51PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 01:05:02PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-04:
  On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h | 2
   ++ virt/kvm/ioapic.c|   43
   +++ virt/kvm/ioapic.h |
   1 + 4 files changed, 55 insertions(+), 0 deletions(-)
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 96ab160..9c041fa 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
  *bitmap)
   return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
  +{
  +struct kvm_lapic *apic = vcpu-arch.apic;
  +
  +return apic_test_vector(vector, apic-regs + APIC_ISR) ||
  +apic_test_vector(vector, apic-regs + APIC_IRR);
  +}
  +
   static inline void apic_set_vector(int vec, void *bitmap)
   {
   set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
  kvm_vcpu
  *vcpu,
   apic-highest_isr_cache = -1;
   kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
   apic_find_highest_isr(apic));   kvm_make_request(KVM_REQ_EVENT,
   vcpu); +kvm_rtc_irq_restore(vcpu); }
   
   void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
  diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
  index 967519c..004d2ad 100644
  --- a/arch/x86/kvm/lapic.h
  +++ b/arch/x86/kvm/lapic.h
  @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct
  kvm_vcpu *vcpu)
   return vcpu-arch.apic-pending_events;
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
  +
   #endif
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index 8664812..0b12b17 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct
  kvm_ioapic *ioapic,
   return result;
   }
  +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
  +{
  +ioapic-rtc_status.pending_eoi = 0;
  +bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
  +}
  +
  +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  +{
  +struct kvm_vcpu *vcpu;
  +int vector, i, pending_eoi = 0;
  +
  +if (RTC_GSI = IOAPIC_NUM_PINS)
  +return;
  +
  +vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
  +if (kvm_apic_pending_eoi(vcpu, vector)) {
  +pending_eoi++;
  +__set_bit(vcpu-vcpu_id,
  ioapic-rtc_status.dest_map);
  You should cleat dest_map at the beginning to get rid of stale bits.
  I thought kvm_set_ioapic is called only after save/restore or 
  migration.
  And
  the
  ioapic should be reset successfully before call it. So the dest_map is 
  empty
  before call rtc_irq_restore().
  But it is possible kvm_set_ioapic is called beside save/restore or
  migration. Right?
  
  First of all userspace should not care when it calls kvm_set_ioapic()
  the kernel need to do the right thing. Second, believe it or not,
  kvm_ioapic_reset() is not called during system reset. Instead userspace
  reset it by calling kvm_set_ioapic() with ioapic state after reset.
  Ok. I see. As the logic you suggested, it will clear dest_map if no
  pending eoi in vcpu, so we don't need to do it again.
  
  You again rely on userspace doing thing in certain manner. What is
  set_lapic() is never called? Kernel internal state have to be correct
  after each ioctl call.
  Sorry. I cannot figure out what's the problem if don't clear dest_map?
  Can you elaborate it?
  
  What is not obvious about it? If there is a bit in dest_map that should
  be cleared after rtc_irq_restore() it will not.
 Why it will not? If new_val is false, and the old_val is true. __clear_bit() 
 will clear the dest_map. Am I wrong?
 
Ah, yes with new logic since we go over all vcpus and calculate new
value for each one in theory it should be fine, but if we add cpu
destruction this will be no longer true.

 new_val = kvm_apic_pending_eoi(vcpu, vector);
Which reminds me there are more bugs in the current code. It is not
enough to call kvm_apic_pending_eoi() to check the new value. You need to
see if the entry is masked and vcpu is the destination of the interrupt too.
 
 old_val = test_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
 
 if (new_val == old_val)
 return;
 
 if (new_val) {
 __set_bit(vcpu-vcpu_id, 

Re: [PATCH v7 4/7] KVM: Call common update function when ioapic entry changed.

2013-04-07 Thread Gleb Natapov
On Mon, Apr 01, 2013 at 11:32:32AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Both TMR and EOI exit bitmap need to be updated when ioapic changed
 or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
 bitmap specific function.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/ia64/kvm/lapic.h|6 --
  arch/x86/kvm/lapic.c |2 +-
  arch/x86/kvm/vmx.c   |3 +++
  arch/x86/kvm/x86.c   |   11 +++
  include/linux/kvm_host.h |4 ++--
  virt/kvm/ioapic.c|   26 +++---
  virt/kvm/ioapic.h|7 +++
  virt/kvm/irq_comm.c  |4 ++--
  virt/kvm/kvm_main.c  |4 ++--
  9 files changed, 35 insertions(+), 32 deletions(-)
 
 diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
 index c3e2935..c5f92a9 100644
 --- a/arch/ia64/kvm/lapic.h
 +++ b/arch/ia64/kvm/lapic.h
 @@ -27,10 +27,4 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct 
 kvm_lapic_irq *irq);
  #define kvm_apic_present(x) (true)
  #define kvm_lapic_enabled(x) (true)
  
 -static inline bool kvm_apic_vid_enabled(void)
 -{
 - /* IA64 has no apicv supporting, do nothing here */
 - return false;
 -}
 -
  #endif
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e227474..ce8d6f6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -209,7 +209,7 @@ out:
   if (old)
   kfree_rcu(old, rcu);
  
 - kvm_ioapic_make_eoibitmap_request(kvm);
 + kvm_vcpu_scan_ioapic(kvm);
  }
  
  static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index b2e95bc..edfc87a 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -6420,6 +6420,9 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu 
 *vcpu, int max_irr)
  
  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
  {
 + if (!vmx_vm_has_apicv(vcpu-kvm))
 + return;
 +
   vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
   vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
   vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 4d42fe1..64241b6 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5647,13 +5647,16 @@ static void kvm_gen_update_masterclock(struct kvm 
 *kvm)
  #endif
  }
  
 -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
 +static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
  {
   u64 eoi_exit_bitmap[4];
  
 + if (!kvm_lapic_enabled(vcpu))
 + return;
 +
Why is this needed here?

   memset(eoi_exit_bitmap, 0, 32);
  
 - kvm_ioapic_calculate_eoi_exitmap(vcpu, eoi_exit_bitmap);
 + kvm_ioapic_scan_entry(vcpu, (unsigned long *)eoi_exit_bitmap);
   kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
  }
  
 @@ -5710,8 +5713,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   kvm_handle_pmu_event(vcpu);
   if (kvm_check_request(KVM_REQ_PMI, vcpu))
   kvm_deliver_pmi(vcpu);
 - if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
 - update_eoi_exitmap(vcpu);
 + if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
 + vcpu_scan_ioapic(vcpu);
   }
  
   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 1c0be23..ef1b3e3 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -126,7 +126,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MASTERCLOCK_UPDATE 19
  #define KVM_REQ_MCLOCK_INPROGRESS 20
  #define KVM_REQ_EPR_EXIT  21
 -#define KVM_REQ_EOIBITMAP 22
 +#define KVM_REQ_SCAN_IOAPIC   22
  
  #define KVM_USERSPACE_IRQ_SOURCE_ID  0
  #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
 @@ -571,7 +571,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
  void kvm_flush_remote_tlbs(struct kvm *kvm);
  void kvm_reload_remote_mmus(struct kvm *kvm);
  void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 -void kvm_make_update_eoibitmap_request(struct kvm *kvm);
 +void kvm_make_scan_ioapic_request(struct kvm *kvm);
  
  long kvm_arch_dev_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg);
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index fbd0556..f37c889 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -119,8 +119,8 @@ static void update_handled_vectors(struct kvm_ioapic 
 *ioapic)
   smp_wmb();
  }
  
 -void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 - u64 *eoi_exit_bitmap)
 +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 + unsigned long *eoi_exit_bitmap)
  {
   struct kvm_ioapic *ioapic = vcpu-kvm-arch.vioapic;
   union kvm_ioapic_redirect_entry *e;
 @@ -128,7 

RE: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 01:16:51PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 01:05:02PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-04:
 On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h | 2
  ++ virt/kvm/ioapic.c|   43
  +++ virt/kvm/ioapic.h
  | 1 + 4 files changed, 55 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 96ab160..9c041fa 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
 *bitmap)
  return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 +{
 +struct kvm_lapic *apic = vcpu-arch.apic;
 +
 +return apic_test_vector(vector, apic-regs + APIC_ISR) ||
 +apic_test_vector(vector, apic-regs + APIC_IRR);
 +}
 +
  static inline void apic_set_vector(int vec, void *bitmap)
  {
  set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
 kvm_vcpu
 *vcpu,
  apic-highest_isr_cache = -1;
  kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
  apic_find_highest_isr(apic));   kvm_make_request(KVM_REQ_EVENT,
  vcpu); +kvm_rtc_irq_restore(vcpu); }
  
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 967519c..004d2ad 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -170,4 +170,6 @@ static inline bool
 kvm_apic_has_events(struct
 kvm_vcpu *vcpu)
  return vcpu-arch.apic-pending_events;
  }
 +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 +
  #endif
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 8664812..0b12b17 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -90,6 +90,47 @@ static unsigned long
 ioapic_read_indirect(struct
 kvm_ioapic *ioapic,
  return result;
  }
 +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 +{
 +ioapic-rtc_status.pending_eoi = 0;
 +bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
 +}
 +
 +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
 +{
 +struct kvm_vcpu *vcpu;
 +int vector, i, pending_eoi = 0;
 +
 +if (RTC_GSI = IOAPIC_NUM_PINS)
 +return;
 +
 +vector = ioapic-redirtbl[RTC_GSI].fields.vector;
 +kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
 +if (kvm_apic_pending_eoi(vcpu, vector)) {
 +pending_eoi++;
 +__set_bit(vcpu-vcpu_id,
 ioapic-rtc_status.dest_map);
 You should cleat dest_map at the beginning to get rid of stale bits.
 I thought kvm_set_ioapic is called only after save/restore or 
 migration.
 And
 the
 ioapic should be reset successfully before call it. So the
 dest_map is empty before call rtc_irq_restore().
 But it is possible kvm_set_ioapic is called beside save/restore or
 migration. Right?
 
 First of all userspace should not care when it calls kvm_set_ioapic()
 the kernel need to do the right thing. Second, believe it or not,
 kvm_ioapic_reset() is not called during system reset. Instead userspace
 reset it by calling kvm_set_ioapic() with ioapic state after reset.
 Ok. I see. As the logic you suggested, it will clear dest_map if no
 pending eoi in vcpu, so we don't need to do it again.
 
 You again rely on userspace doing thing in certain manner. What is
 set_lapic() is never called? Kernel internal state have to be correct
 after each ioctl call.
 Sorry. I cannot figure out what's the problem if don't clear dest_map?
 Can you elaborate it?
 
 What is not obvious about it? If there is a bit in dest_map that should
 be cleared after rtc_irq_restore() it will not.
 Why it will not? If new_val is false, and the old_val is true.
 __clear_bit() will clear the dest_map. Am I wrong?
 
 Ah, yes with new logic since we go over all vcpus and calculate new
 value for each one in theory it should be fine, but if we add cpu
 destruction this will be no longer true.
Yes. Given cpu destruction, it is wrong. Do you think we should clear dest_map 
now or delay it until cpu destruction is ready?

 
 new_val = kvm_apic_pending_eoi(vcpu, vector);
 Which reminds me there are more bugs in the current code. It is not
 enough to call kvm_apic_pending_eoi() to check the new value. You need to
 see if the entry is masked and vcpu is the destination of the interrupt too.
Yes, you are right.

 
 old_val = test_bit(vcpu-vcpu_id, ioapic-rtc_status.dest_map);
 
 if (new_val == old_val)
 

Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 01:46:57PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 01:16:51PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 01:05:02PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 12:39:32PM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-07:
  On Sun, Apr 07, 2013 at 02:30:15AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-04-04:
  On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h | 2
   ++ virt/kvm/ioapic.c|   43
   +++ virt/kvm/ioapic.h
   | 1 + 4 files changed, 55 insertions(+), 0 deletions(-)
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 96ab160..9c041fa 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void
  *bitmap)
 return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
  +{
  +  struct kvm_lapic *apic = vcpu-arch.apic;
  +
  +  return apic_test_vector(vector, apic-regs + APIC_ISR) ||
  +  apic_test_vector(vector, apic-regs + APIC_IRR);
  +}
  +
   static inline void apic_set_vector(int vec, void *bitmap)
   {
 set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
  @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct
  kvm_vcpu
  *vcpu,
 apic-highest_isr_cache = -1;
 kvm_x86_ops-hwapic_isr_update(vcpu-kvm,
   apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT,
   vcpu); +  kvm_rtc_irq_restore(vcpu); }
   
   void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
  diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
  index 967519c..004d2ad 100644
  --- a/arch/x86/kvm/lapic.h
  +++ b/arch/x86/kvm/lapic.h
  @@ -170,4 +170,6 @@ static inline bool
  kvm_apic_has_events(struct
  kvm_vcpu *vcpu)
 return vcpu-arch.apic-pending_events;
   }
  +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
  +
   #endif
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index 8664812..0b12b17 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -90,6 +90,47 @@ static unsigned long
  ioapic_read_indirect(struct
  kvm_ioapic *ioapic,
 return result;
   }
  +static void rtc_irq_reset(struct kvm_ioapic *ioapic)
  +{
  +  ioapic-rtc_status.pending_eoi = 0;
  +  bitmap_zero(ioapic-rtc_status.dest_map, KVM_MAX_VCPUS);
  +}
  +
  +static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  +{
  +  struct kvm_vcpu *vcpu;
  +  int vector, i, pending_eoi = 0;
  +
  +  if (RTC_GSI = IOAPIC_NUM_PINS)
  +  return;
  +
  +  vector = ioapic-redirtbl[RTC_GSI].fields.vector;
  +  kvm_for_each_vcpu(i, vcpu, ioapic-kvm) {
  +  if (kvm_apic_pending_eoi(vcpu, vector)) {
  +  pending_eoi++;
  +  __set_bit(vcpu-vcpu_id,
  ioapic-rtc_status.dest_map);
  You should cleat dest_map at the beginning to get rid of stale bits.
  I thought kvm_set_ioapic is called only after save/restore or 
  migration.
  And
  the
  ioapic should be reset successfully before call it. So the
  dest_map is empty before call rtc_irq_restore().
  But it is possible kvm_set_ioapic is called beside save/restore or
  migration. Right?
  
  First of all userspace should not care when it calls kvm_set_ioapic()
  the kernel need to do the right thing. Second, believe it or not,
  kvm_ioapic_reset() is not called during system reset. Instead 
  userspace
  reset it by calling kvm_set_ioapic() with ioapic state after reset.
  Ok. I see. As the logic you suggested, it will clear dest_map if no
  pending eoi in vcpu, so we don't need to do it again.
  
  You again rely on userspace doing thing in certain manner. What is
  set_lapic() is never called? Kernel internal state have to be correct
  after each ioctl call.
  Sorry. I cannot figure out what's the problem if don't clear dest_map?
  Can you elaborate it?
  
  What is not obvious about it? If there is a bit in dest_map that should
  be cleared after rtc_irq_restore() it will not.
  Why it will not? If new_val is false, and the old_val is true.
  __clear_bit() will clear the dest_map. Am I wrong?
  
  Ah, yes with new logic since we go over all vcpus and calculate new
  value for each one in theory it should be fine, but if we add cpu
  destruction this will be no longer true.
 Yes. Given cpu destruction, it is wrong. Do you think we should clear 
 dest_map now or delay it until cpu destruction is ready?
 
We will surely forgot it at that point and this is slow path. Lets clear
it.

  
  new_val = kvm_apic_pending_eoi(vcpu, vector);
  Which reminds me there are more bugs in the current code. It is not
  enough to call kvm_apic_pending_eoi() to check the 

RE: [PATCH v7 4/7] KVM: Call common update function when ioapic entry changed.

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Mon, Apr 01, 2013 at 11:32:32AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Both TMR and EOI exit bitmap need to be updated when ioapic changed
 or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
 bitmap specific function.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/ia64/kvm/lapic.h|6 --
  arch/x86/kvm/lapic.c |2 +-
  arch/x86/kvm/vmx.c   |3 +++
  arch/x86/kvm/x86.c   |   11 +++
  include/linux/kvm_host.h |4 ++--
  virt/kvm/ioapic.c|   26 +++---
  virt/kvm/ioapic.h|7 +++
  virt/kvm/irq_comm.c  |4 ++--
  virt/kvm/kvm_main.c  |4 ++--
  9 files changed, 35 insertions(+), 32 deletions(-)
 diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
 index c3e2935..c5f92a9 100644
 --- a/arch/ia64/kvm/lapic.h
 +++ b/arch/ia64/kvm/lapic.h
 @@ -27,10 +27,4 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct
 kvm_lapic_irq *irq);
  #define kvm_apic_present(x) (true)
  #define kvm_lapic_enabled(x) (true)
 -static inline bool kvm_apic_vid_enabled(void)
 -{
 -/* IA64 has no apicv supporting, do nothing here */
 -return false;
 -}
 -
  #endif
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e227474..ce8d6f6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -209,7 +209,7 @@ out:
  if (old)
  kfree_rcu(old, rcu);
 -kvm_ioapic_make_eoibitmap_request(kvm);
 +kvm_vcpu_scan_ioapic(kvm);
  }
  
  static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
 b2e95bc..edfc87a 100644 --- a/arch/x86/kvm/vmx.c +++
 b/arch/x86/kvm/vmx.c @@ -6420,6 +6420,9 @@ static void
 vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 
  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
  *eoi_exit_bitmap) {
 +if (!vmx_vm_has_apicv(vcpu-kvm))
 +return;
 +
  vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
  vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
  vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 4d42fe1..64241b6 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5647,13 +5647,16 @@ static void kvm_gen_update_masterclock(struct
 kvm *kvm)
  #endif
  }
 -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
 +static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
  {
  u64 eoi_exit_bitmap[4];
 +if (!kvm_lapic_enabled(vcpu))
 +return;
 +
 Why is this needed here?
We don't need to calculate eoi_exit_bitmap and TMR if lapic is not enabled. 
Also, ioapic is meaningless for the vcpu that doesn't enable the lapic.

 
  memset(eoi_exit_bitmap, 0, 32);
 -kvm_ioapic_calculate_eoi_exitmap(vcpu, eoi_exit_bitmap);
 +kvm_ioapic_scan_entry(vcpu, (unsigned long *)eoi_exit_bitmap);
  kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
  }
 @@ -5710,8 +5713,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  kvm_handle_pmu_event(vcpu);
  if (kvm_check_request(KVM_REQ_PMI, vcpu))
  kvm_deliver_pmi(vcpu);
 -if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
 -update_eoi_exitmap(vcpu);
 +if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
 +vcpu_scan_ioapic(vcpu);
  }
  
  if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 1c0be23..ef1b3e3 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -126,7 +126,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MASTERCLOCK_UPDATE 19
  #define KVM_REQ_MCLOCK_INPROGRESS 20
  #define KVM_REQ_EPR_EXIT  21
 -#define KVM_REQ_EOIBITMAP 22
 +#define KVM_REQ_SCAN_IOAPIC   22
 
  #define KVM_USERSPACE_IRQ_SOURCE_ID 0 #define
  KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID1 @@ -571,7 +571,7 @@ void
  kvm_put_guest_fpu(struct kvm_vcpu *vcpu); void
  kvm_flush_remote_tlbs(struct kvm *kvm); void
  kvm_reload_remote_mmus(struct kvm *kvm); void
  kvm_make_mclock_inprogress_request(struct kvm *kvm);
 -void kvm_make_update_eoibitmap_request(struct kvm *kvm);
 +void kvm_make_scan_ioapic_request(struct kvm *kvm);
 
  long kvm_arch_dev_ioctl(struct file *filp,
  unsigned int ioctl, unsigned long arg);
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index fbd0556..f37c889 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -119,8 +119,8 @@ static void update_handled_vectors(struct kvm_ioapic
 *ioapic)
  smp_wmb();
  }
 -void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 -u64 *eoi_exit_bitmap)
 +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 +unsigned long 

Re: [PATCH v7 4/7] KVM: Call common update function when ioapic entry changed.

2013-04-07 Thread Gleb Natapov
On Sun, Apr 07, 2013 at 02:00:04PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
  On Mon, Apr 01, 2013 at 11:32:32AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Both TMR and EOI exit bitmap need to be updated when ioapic changed
  or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
  bitmap specific function.
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/ia64/kvm/lapic.h|6 --
   arch/x86/kvm/lapic.c |2 +-
   arch/x86/kvm/vmx.c   |3 +++
   arch/x86/kvm/x86.c   |   11 +++
   include/linux/kvm_host.h |4 ++--
   virt/kvm/ioapic.c|   26 +++---
   virt/kvm/ioapic.h|7 +++
   virt/kvm/irq_comm.c  |4 ++--
   virt/kvm/kvm_main.c  |4 ++--
   9 files changed, 35 insertions(+), 32 deletions(-)
  diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
  index c3e2935..c5f92a9 100644
  --- a/arch/ia64/kvm/lapic.h
  +++ b/arch/ia64/kvm/lapic.h
  @@ -27,10 +27,4 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct
  kvm_lapic_irq *irq);
   #define kvm_apic_present(x) (true)
   #define kvm_lapic_enabled(x) (true)
  -static inline bool kvm_apic_vid_enabled(void)
  -{
  -  /* IA64 has no apicv supporting, do nothing here */
  -  return false;
  -}
  -
   #endif
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index e227474..ce8d6f6 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -209,7 +209,7 @@ out:
 if (old)
 kfree_rcu(old, rcu);
  -  kvm_ioapic_make_eoibitmap_request(kvm);
  +  kvm_vcpu_scan_ioapic(kvm);
   }
   
   static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
  b2e95bc..edfc87a 100644 --- a/arch/x86/kvm/vmx.c +++
  b/arch/x86/kvm/vmx.c @@ -6420,6 +6420,9 @@ static void
  vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
  
   static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
   *eoi_exit_bitmap) {
  +  if (!vmx_vm_has_apicv(vcpu-kvm))
  +  return;
  +
 vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
 vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
 vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 4d42fe1..64241b6 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -5647,13 +5647,16 @@ static void kvm_gen_update_masterclock(struct
  kvm *kvm)
   #endif
   }
  -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
  +static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
   {
 u64 eoi_exit_bitmap[4];
  +  if (!kvm_lapic_enabled(vcpu))
  +  return;
  +
  Why is this needed here?
 We don't need to calculate eoi_exit_bitmap and TMR if lapic is not enabled. 
 Also, ioapic is meaningless for the vcpu that doesn't enable the lapic.
 
OK, but then let's use apic_enabled() since kvm_lapic_enabled() also
checks for in kernel apic and we should not be here if apic is not
emulated in kernel. Also please make sure that we rescan ioapic on all
apic state changes.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 5/7] KVM: Set TMR when programming ioapic entry

2013-04-07 Thread Gleb Natapov
On Mon, Apr 01, 2013 at 11:32:33AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 We already know the trigger mode of a given interrupt when programming
 the ioapice entry. So it's not necessary to set it in each interrupt
 delivery.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |   15 +--
  arch/x86/kvm/lapic.h |1 +
  arch/x86/kvm/x86.c   |6 +-
  virt/kvm/ioapic.c|9 +++--
  virt/kvm/ioapic.h|3 ++-
  5 files changed, 24 insertions(+), 10 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index ce8d6f6..686afee 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -458,6 +458,15 @@ static inline int apic_find_highest_isr(struct kvm_lapic 
 *apic)
   return result;
  }
  
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 +{
 + struct kvm_lapic *apic = vcpu-arch.apic;
 + int i;
 +
 + for (i = 0; i  8; i++)
 + apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
 +}
 +
  static void apic_update_ppr(struct kvm_lapic *apic)
  {
   u32 tpr, isrv, ppr, old_ppr;
 @@ -647,12 +656,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
 delivery_mode,
   if (unlikely(!apic_enabled(apic)))
   break;
  
 - if (trig_mode) {
 - apic_debug(level trig mode for vector %d, vector);
 - apic_set_vector(vector, apic-regs + APIC_TMR);
 - } else
 - apic_clear_vector(vector, apic-regs + APIC_TMR);
 -
   result = !apic_test_and_set_irr(vector, apic);
   trace_kvm_apic_accept_irq(vcpu-vcpu_id, delivery_mode,
 trig_mode, vector, !result);
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index baa20cf..599076e 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -53,6 +53,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
  
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 64241b6..6ad7760 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5650,14 +5650,18 @@ static void kvm_gen_update_masterclock(struct kvm 
 *kvm)
  static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
  {
   u64 eoi_exit_bitmap[4];
 + u64 tmr[4];
  
   if (!kvm_lapic_enabled(vcpu))
   return;
  
   memset(eoi_exit_bitmap, 0, 32);
 + memset(tmr, 0, 32);
  
 - kvm_ioapic_scan_entry(vcpu, (unsigned long *)eoi_exit_bitmap);
 + kvm_ioapic_scan_entry(vcpu, (unsigned long *)eoi_exit_bitmap,
 + (unsigned long *)tmr);
   kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 + kvm_apic_update_tmr(vcpu, (u32 *)tmr);
Again, cast it inside if needed.

  }
  
  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index f37c889..b704b67 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -120,7 +120,8 @@ static void update_handled_vectors(struct kvm_ioapic 
 *ioapic)
  }
  
  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 - unsigned long *eoi_exit_bitmap)
 + unsigned long *eoi_exit_bitmap,
 + unsigned long *tmr)
  {
   struct kvm_ioapic *ioapic = vcpu-kvm-arch.vioapic;
   union kvm_ioapic_redirect_entry *e;
 @@ -135,8 +136,12 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC,
index))) {
   if (kvm_apic_match_dest(vcpu, NULL, 0,
 - e-fields.dest_id, e-fields.dest_mode))
 + e-fields.dest_id, e-fields.dest_mode)) {
 +
   __set_bit(irqe.vector, eoi_exit_bitmap);
 + if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG)
 + __set_bit(irqe.vector, tmr);
 + }
   }
   }
   spin_unlock(ioapic-lock);
 diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
 index e3538ba..280b4fc 100644
 --- a/virt/kvm/ioapic.h
 +++ b/virt/kvm/ioapic.h
 @@ -84,6 +84,7 @@ int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state 
 *state);
  int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
  void kvm_vcpu_scan_ioapic(struct kvm *kvm);
  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 - unsigned long *eoi_exit_bitmap);
 + unsigned long *eoi_exit_bitmap,
 +

Re: [PATCH v7 3/7] KVM: VMX: Check the posted interrupt capability

2013-04-07 Thread Gleb Natapov
On Mon, Apr 01, 2013 at 11:32:31AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/include/asm/vmx.h |4 ++
  arch/x86/kvm/vmx.c |   87 
 ++--
  2 files changed, 71 insertions(+), 20 deletions(-)
 
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index fc1c313..6f07f19 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -71,6 +71,7 @@
  #define PIN_BASED_NMI_EXITING   0x0008
  #define PIN_BASED_VIRTUAL_NMIS  0x0020
  #define PIN_BASED_VMX_PREEMPTION_TIMER  0x0040
 +#define PIN_BASED_POSTED_INTR   0x0080
  
  #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR  0x0016
  
 @@ -102,6 +103,7 @@
  /* VMCS Encodings */
  enum vmcs_field {
   VIRTUAL_PROCESSOR_ID= 0x,
 + POSTED_INTR_NV  = 0x0002,
   GUEST_ES_SELECTOR   = 0x0800,
   GUEST_CS_SELECTOR   = 0x0802,
   GUEST_SS_SELECTOR   = 0x0804,
 @@ -136,6 +138,8 @@ enum vmcs_field {
   VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x2013,
   APIC_ACCESS_ADDR= 0x2014,
   APIC_ACCESS_ADDR_HIGH   = 0x2015,
 + POSTED_INTR_DESC_ADDR   = 0x2016,
 + POSTED_INTR_DESC_ADDR_HIGH  = 0x2017,
   EPT_POINTER = 0x201a,
   EPT_POINTER_HIGH= 0x201b,
   EOI_EXIT_BITMAP0= 0x201c,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 7408d93..b2e95bc 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -84,7 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
  static bool __read_mostly fasteoi = 1;
  module_param(fasteoi, bool, S_IRUGO);
  
 -static bool __read_mostly enable_apicv_reg_vid;
 +static bool __read_mostly enable_apicv;
 +module_param(enable_apicv, bool, S_IRUGO);
  
  /*
   * If nested=1, nested virtualization is supported, i.e., guests may use
 @@ -366,6 +367,19 @@ struct nested_vmx {
   struct page *apic_access_page;
  };
  
 +#define POSTED_INTR_ON  0
 +/* Posted-Interrupt Descriptor */
 +struct pi_desc {
 + u32 pir[8]; /* Posted interrupt requested */
 + union {
 + struct {
 + u8  on:1,
Do you actually use the 'on' member of the bit field? As far as I can
tell the paths always access control with (set|clear)_bit(). And C does not
guaranty layout of the bit field, so on may not point to what you think
it points to.
 
 + rsvd:7;
 + } control;
 + u32 rsvd[8];
 + } u;
 +} __aligned(64);
 +
  struct vcpu_vmx {
   struct kvm_vcpu   vcpu;
   unsigned long host_rsp;
 @@ -430,6 +444,9 @@ struct vcpu_vmx {
  
   bool rdtscp_enabled;
  
 + /* Posted interrupt descriptor */
 + struct pi_desc pi_desc;
 +
   /* Support for a guest hypervisor (nested VMX) */
   struct nested_vmx nested;
  };
 @@ -785,6 +802,18 @@ static inline bool 
 cpu_has_vmx_virtual_intr_delivery(void)
   SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
  }
  
 +static inline bool cpu_has_vmx_posted_intr(void)
 +{
 + return vmcs_config.pin_based_exec_ctrl  PIN_BASED_POSTED_INTR;
 +}
 +
 +static inline bool cpu_has_vmx_apicv(void)
 +{
 + return cpu_has_vmx_apic_register_virt() 
 + cpu_has_vmx_virtual_intr_delivery() 
 + cpu_has_vmx_posted_intr();
 +}
 +
  static inline bool cpu_has_vmx_flexpriority(void)
  {
   return cpu_has_vmx_tpr_shadow() 
 @@ -2552,12 +2581,6 @@ static __init int setup_vmcs_config(struct vmcs_config 
 *vmcs_conf)
   u32 _vmexit_control = 0;
   u32 _vmentry_control = 0;
  
 - min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
 - opt = PIN_BASED_VIRTUAL_NMIS;
 - if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
 - _pin_based_exec_control)  0)
 - return -EIO;
 -
   min = CPU_BASED_HLT_EXITING |
  #ifdef CONFIG_X86_64
 CPU_BASED_CR8_LOAD_EXITING |
 @@ -2634,6 +2657,17 @@ static __init int setup_vmcs_config(struct vmcs_config 
 *vmcs_conf)
   _vmexit_control)  0)
   return -EIO;
  
 + min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
 + opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR;
 + if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
 + _pin_based_exec_control)  0)
 + return -EIO;
 +
 + if (!(_cpu_based_2nd_exec_control 
 + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ||
 + !(_vmexit_control  VM_EXIT_ACK_INTR_ON_EXIT))
 + _pin_based_exec_control = ~PIN_BASED_POSTED_INTR;
 +
   min = 0;

Re: [PATCH v7 6/7] KVM: VMX: Add the algorithm of deliver posted interrupt

2013-04-07 Thread Gleb Natapov
On Mon, Apr 01, 2013 at 11:32:34AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Only deliver the posted interrupt when target vcpu is running
 and there is no previous interrupt pending in pir.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/lapic.c|   13 
  arch/x86/kvm/lapic.h|1 +
  arch/x86/kvm/svm.c  |6 
  arch/x86/kvm/vmx.c  |   60 
 ++-
  virt/kvm/kvm_main.c |1 +
  6 files changed, 82 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 8e95512..842ea5a 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -704,6 +704,8 @@ struct kvm_x86_ops {
   void (*hwapic_isr_update)(struct kvm *kvm, int isr);
   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
   void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 + void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 + void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
   int (*get_tdp_level)(void);
   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 686afee..95e8f4a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -310,6 +310,19 @@ static u8 count_vectors(void *bitmap)
   return count;
  }
  
 +void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 +{
 + u32 i, pir_val;
 + struct kvm_lapic *apic = vcpu-arch.apic;
 +
 + for (i = 0; i = 7; i++) {
 + pir_val = xchg(pir[i], 0);
 + if (pir_val)
 + *((u32 *)(apic-regs + APIC_IRR + i * 0x10)) |= pir_val;
 + }
 +}
 +EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 +
  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
  {
   apic-irr_pending = true;
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 599076e..16c3949 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -54,6 +54,7 @@ u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
  
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 +void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 2f8fe3f..d6713e1 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -3577,6 +3577,11 @@ static void svm_hwapic_isr_update(struct kvm *kvm, int 
 isr)
   return;
  }
  
 +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 +{
 + return;
 +}
 +
  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
  {
   struct vcpu_svm *svm = to_svm(vcpu);
 @@ -4305,6 +4310,7 @@ static struct kvm_x86_ops svm_x86_ops = {
   .vm_has_apicv = svm_vm_has_apicv,
   .load_eoi_exitmap = svm_load_eoi_exitmap,
   .hwapic_isr_update = svm_hwapic_isr_update,
 + .sync_pir_to_irr = svm_sync_pir_to_irr,
  
   .set_tss_addr = svm_set_tss_addr,
   .get_tdp_level = get_npt_level,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index edfc87a..690734c 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -380,6 +380,23 @@ struct pi_desc {
   } u;
  } __aligned(64);
  
 +static bool pi_test_and_set_on(struct pi_desc *pi_desc)
 +{
 + return test_and_set_bit(POSTED_INTR_ON,
 + (unsigned long *)pi_desc-u.control);
 +}
 +
 +static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
 +{
 + return test_and_clear_bit(POSTED_INTR_ON,
 + (unsigned long *)pi_desc-u.control);
 +}
 +
 +static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 +{
 + return test_and_set_bit(vector, (unsigned long *)pi_desc-pir);
 +}
 +
  struct vcpu_vmx {
   struct kvm_vcpu   vcpu;
   unsigned long host_rsp;
 @@ -2851,8 +2868,10 @@ static __init int hardware_setup(void)
  
   if (enable_apicv)
   kvm_x86_ops-update_cr8_intercept = NULL;
 - else
 + else {
   kvm_x86_ops-hwapic_irr_update = NULL;
 + kvm_x86_ops-deliver_posted_interrupt = NULL;
 + }
  
   if (nested)
   nested_vmx_setup_ctls_msrs();
 @@ -3914,6 +3933,43 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
  }
  
  /*
 + * Send interrupt to vcpu via posted interrupt way.
 + * 1. If target vcpu is running(non-root mode), send posted interrupt
 + * notification to vcpu and hardware will sync PIR to vIRR atomically.
 + * 2. If target vcpu isn't running(root mode), kick it to pick up the
 + * 

Re: [PATCH RFC] kvm: add PV MMIO EVENTFD

2013-04-07 Thread Christoffer Dall
On Sun, Apr 7, 2013 at 12:41 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Apr 04, 2013 at 04:32:01PM -0700, Christoffer Dall wrote:
 [...]

   to give us some idea how much performance we would gain from each 
   approach? Thoughput should be completely unaffected anyway, since 
   virtio just coalesces kicks internally.
  
   Latency is dominated by the scheduling latency.
   This means virtio-net is not the best benchmark.
 
  So what is a good benchmark?
 
  E.g. ping pong stress will do but need to look at CPU utilization,
  that's what is affected, not latency.
 
  Is there any difference in speed at all? I strongly doubt it. One of 
  virtio's main points is to reduce the number of kicks.
 
  For this stage of the project I think microbenchmarks are more appropriate.
  Doubling the price of exit is likely to be measureable. 30 cycles likely
  not ...
 
 I don't quite understand this point here. If we don't have anything
 real-world where we can measure a decent difference, then why are we
 doing this? I would agree with Alex that the three test scenarios
 proposed by him should be tried out before adding this complexity,
 measured in CPU utilization or latency as you wish.

 Sure, plan to do real world benchmarks for PV MMIO versus PIO as well.
 I don't see why I should bother implementing hypercalls given that the
 kvm maintainer says they won't be merged.


the implementation effort to simply measure the hypercall performance
should be minimal, no? If we can measure a true difference in
performance, I'm sure we can revisit the issue of what will be merged
and what won't be, but until we have those numbers it's all
speculation.

 FWIW, ARM always uses MMIO and provides hardware decoding of all sane
 (not user register-writeback) instruction, but the hypercall vs. mmio
 looks like this:

 hvc:  4,917
 mmio_kernel: 6,248

 So 20% difference?  That's not far from what happens on my intel laptop:
 vmcall 1519
 outl_to_kernel 1745
 10% difference here.


 But I doubt that an hvc wrapper around mmio decoding would take care
 of all this difference, because the mmio operation needs to do other
 work not realated to emulating the instruction in software, which
 you'd have to do for an hvc anyway (populate kvm_mmio structure etc.)


 Instead of speculating, someone with relevant hardware
 could just try this, but kvm unittest doesn't seem to have arm support
 at the moment. Anyone working on this?

We have a branch called kvm-selftest that replicates much of the
functionality, which is what I run to get these measurements. I can
port it over to unittest at some point, but I'm not active working on
that.

I can measure it, but we have bigger fish to fry on the ARM side right
now, so it'll be a while until I get to that.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 3/7] KVM: VMX: Check the posted interrupt capability

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Mon, Apr 01, 2013 at 11:32:31AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Detect the posted interrupt feature. If it exists, then set it in 
 vmcs_config.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/include/asm/vmx.h |4 ++ arch/x86/kvm/vmx.c |   87
  ++-- 2 files changed, 71
  insertions(+), 20 deletions(-)
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index fc1c313..6f07f19 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -71,6 +71,7 @@
  #define PIN_BASED_NMI_EXITING   0x0008
  #define PIN_BASED_VIRTUAL_NMIS  0x0020
  #define PIN_BASED_VMX_PREEMPTION_TIMER  0x0040
 +#define PIN_BASED_POSTED_INTR   0x0080
 
  #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR 0x0016
 @@ -102,6 +103,7 @@
  /* VMCS Encodings */ enum vmcs_field {  VIRTUAL_PROCESSOR_ID  
   = 0x, +POSTED_INTR_NV  = 0x0002,
  GUEST_ES_SELECTOR   = 0x0800,   GUEST_CS_SELECTOR 
   = 0x0802,  GUEST_SS_SELECTOR   = 0x0804,
  @@ -136,6 +138,8 @@ enum vmcs_field {   VIRTUAL_APIC_PAGE_ADDR_HIGH
  = 0x2013,   APIC_ACCESS_ADDR= 0x2014,
  APIC_ACCESS_ADDR_HIGH   = 0x2015,
 +POSTED_INTR_DESC_ADDR   = 0x2016,
 +POSTED_INTR_DESC_ADDR_HIGH  = 0x2017,
  EPT_POINTER = 0x201a,
  EPT_POINTER_HIGH= 0x201b,
  EOI_EXIT_BITMAP0= 0x201c,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 7408d93..b2e95bc 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -84,7 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
  static bool __read_mostly fasteoi = 1;
  module_param(fasteoi, bool, S_IRUGO);
 -static bool __read_mostly enable_apicv_reg_vid;
 +static bool __read_mostly enable_apicv;
 +module_param(enable_apicv, bool, S_IRUGO);
 
  /*
   * If nested=1, nested virtualization is supported, i.e., guests may use
 @@ -366,6 +367,19 @@ struct nested_vmx {
  struct page *apic_access_page;
  };
 +#define POSTED_INTR_ON  0
 +/* Posted-Interrupt Descriptor */
 +struct pi_desc {
 +u32 pir[8]; /* Posted interrupt requested */
 +union {
 +struct {
 +u8  on:1,
 Do you actually use the 'on' member of the bit field? As far as I can
'on' is just an indicator to easy for code review. I don't use it.

 tell the paths always access control with (set|clear)_bit(). And C does not
 guaranty layout of the bit field, so on may not point to what you think
 it points to.
You are right. I will remove it and comments it instead.

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 4/7] KVM: Call common update function when ioapic entry changed.

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Sun, Apr 07, 2013 at 02:00:04PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-04-07:
 On Mon, Apr 01, 2013 at 11:32:32AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Both TMR and EOI exit bitmap need to be updated when ioapic changed
 or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
 bitmap specific function.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/ia64/kvm/lapic.h|6 --
  arch/x86/kvm/lapic.c |2 +-
  arch/x86/kvm/vmx.c   |3 +++
  arch/x86/kvm/x86.c   |   11 +++
  include/linux/kvm_host.h |4 ++--
  virt/kvm/ioapic.c|   26 +++---
  virt/kvm/ioapic.h|7 +++
  virt/kvm/irq_comm.c  |4 ++--
  virt/kvm/kvm_main.c  |4 ++--
  9 files changed, 35 insertions(+), 32 deletions(-)
 diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
 index c3e2935..c5f92a9 100644
 --- a/arch/ia64/kvm/lapic.h
 +++ b/arch/ia64/kvm/lapic.h
 @@ -27,10 +27,4 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct
 kvm_lapic_irq *irq);
  #define kvm_apic_present(x) (true)
  #define kvm_lapic_enabled(x) (true)
 -static inline bool kvm_apic_vid_enabled(void)
 -{
 -  /* IA64 has no apicv supporting, do nothing here */
 -  return false;
 -}
 -
  #endif
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e227474..ce8d6f6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -209,7 +209,7 @@ out:
if (old)
kfree_rcu(old, rcu);
 -  kvm_ioapic_make_eoibitmap_request(kvm);
 +  kvm_vcpu_scan_ioapic(kvm);
  }
  
  static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
 b2e95bc..edfc87a 100644 --- a/arch/x86/kvm/vmx.c +++
 b/arch/x86/kvm/vmx.c @@ -6420,6 +6420,9 @@ static void
 vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 
  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
  *eoi_exit_bitmap) {
 +  if (!vmx_vm_has_apicv(vcpu-kvm))
 +  return;
 +
vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 4d42fe1..64241b6 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5647,13 +5647,16 @@ static void kvm_gen_update_masterclock(struct
 kvm *kvm)
  #endif
  }
 -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
 +static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
  {
u64 eoi_exit_bitmap[4];
 +  if (!kvm_lapic_enabled(vcpu))
 +  return;
 +
 Why is this needed here?
 We don't need to calculate eoi_exit_bitmap and TMR if lapic is not
 enabled. Also, ioapic is meaningless for the vcpu that doesn't enable
 the lapic.
 
 OK, but then let's use apic_enabled() since kvm_lapic_enabled() also
 checks for in kernel apic and we should not be here if apic is not
Sure.

 emulated in kernel. Also please make sure that we rescan ioapic on all
 apic state changes.
Yes. recalculate_apic_map() is called on all apic state changes, so request 
ioapic scan in recalculate_apic_map is enough.

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 1/6] kvm: add device control API

2013-04-07 Thread Paul Mackerras
On Tue, Apr 02, 2013 at 08:57:48PM -0500, Scott Wood wrote:

[snip]

 +static int kvm_ioctl_create_device(struct kvm *kvm,
 +struct kvm_create_device *cd)
 +{
 + bool test = cd-flags  KVM_CREATE_DEVICE_TEST;
 +
 + switch (cd-type) {
 + default:
 + return -ENODEV;
 + }
 +}

This gives a compile error saying error: unused variable `test',
which is fatal since this gets compiled under arch/powerpc/kvm, and we
treat all warnings as errors there.

This still gives a compile error at the end of your series if you try
to compile with CONFIG_KVM_MPIC=n.

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v7 6/7] KVM: VMX: Add the algorithm of deliver posted interrupt

2013-04-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-04-07:
 On Mon, Apr 01, 2013 at 11:32:34AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Only deliver the posted interrupt when target vcpu is running
 and there is no previous interrupt pending in pir.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/lapic.c 
|   13  arch/x86/kvm/lapic.h|1 +
  arch/x86/kvm/svm.c  |6  arch/x86/kvm/vmx.c
   |   60 ++-
  virt/kvm/kvm_main.c |1 + 6 files changed, 82
  insertions(+), 1 deletions(-)
 diff --git a/arch/x86/include/asm/kvm_host.h
 b/arch/x86/include/asm/kvm_host.h index 8e95512..842ea5a 100644 ---
 a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h
 @@ -704,6 +704,8 @@ struct kvm_x86_ops {
  void (*hwapic_isr_update)(struct kvm *kvm, int isr);
  void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
  void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 +void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 +void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
  int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
  int (*get_tdp_level)(void);
  u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 686afee..95e8f4a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -310,6 +310,19 @@ static u8 count_vectors(void *bitmap)
  return count;
  }
 +void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 +{
 +u32 i, pir_val;
 +struct kvm_lapic *apic = vcpu-arch.apic;
 +
 +for (i = 0; i = 7; i++) {
 +pir_val = xchg(pir[i], 0);
 +if (pir_val)
 +*((u32 *)(apic-regs + APIC_IRR + i * 0x10)) |= pir_val;
 +}
 +}
 +EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 +
  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
  {
  apic-irr_pending = true;
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 599076e..16c3949 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -54,6 +54,7 @@ u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
  
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); +void
  kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); int
  kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int
  kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int
  kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 2f8fe3f..d6713e1 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -3577,6 +3577,11 @@ static void svm_hwapic_isr_update(struct kvm *kvm,
 int isr)
  return;
  }
 +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 +{
 +return;
 +}
 +
  static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm
  *svm = to_svm(vcpu); @@ -4305,6 +4310,7 @@ static struct kvm_x86_ops
  svm_x86_ops = { .vm_has_apicv = svm_vm_has_apicv,   
 .load_eoi_exitmap
  = svm_load_eoi_exitmap, .hwapic_isr_update = svm_hwapic_isr_update,
 +.sync_pir_to_irr = svm_sync_pir_to_irr,
 
  .set_tss_addr = svm_set_tss_addr,
  .get_tdp_level = get_npt_level,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index edfc87a..690734c 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -380,6 +380,23 @@ struct pi_desc {
  } u;
  } __aligned(64);
 +static bool pi_test_and_set_on(struct pi_desc *pi_desc)
 +{
 +return test_and_set_bit(POSTED_INTR_ON,
 +(unsigned long *)pi_desc-u.control);
 +}
 +
 +static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
 +{
 +return test_and_clear_bit(POSTED_INTR_ON,
 +(unsigned long *)pi_desc-u.control);
 +}
 +
 +static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 +{
 +return test_and_set_bit(vector, (unsigned long *)pi_desc-pir);
 +}
 +
  struct vcpu_vmx {
  struct kvm_vcpu   vcpu;
  unsigned long host_rsp;
 @@ -2851,8 +2868,10 @@ static __init int hardware_setup(void)
 
  if (enable_apicv)
  kvm_x86_ops-update_cr8_intercept = NULL;
 -else
 +else {
  kvm_x86_ops-hwapic_irr_update = NULL;
 +kvm_x86_ops-deliver_posted_interrupt = NULL;
 +}
 
  if (nested) nested_vmx_setup_ctls_msrs(); @@ -3914,6 
 +3933,43 @@
  static int vmx_vm_has_apicv(struct kvm *kvm) }
  
  /*
 + * Send interrupt to vcpu via posted interrupt way.
 + * 1. If target vcpu is running(non-root mode), send posted interrupt
 + * notification to vcpu and hardware will sync PIR to vIRR atomically.
 + * 2. If target vcpu isn't running(root mode), kick it to pick up the
 + * interrupt from PIR in 

Re: [RFC PATCH v3 1/6] kvm: add device control API

2013-04-07 Thread Paul Mackerras
On Tue, Apr 02, 2013 at 08:57:48PM -0500, Scott Wood wrote:

[snip]

 +static int kvm_ioctl_create_device(struct kvm *kvm,
 +struct kvm_create_device *cd)
 +{
 + bool test = cd-flags  KVM_CREATE_DEVICE_TEST;
 +
 + switch (cd-type) {
 + default:
 + return -ENODEV;
 + }
 +}

This gives a compile error saying error: unused variable `test',
which is fatal since this gets compiled under arch/powerpc/kvm, and we
treat all warnings as errors there.

This still gives a compile error at the end of your series if you try
to compile with CONFIG_KVM_MPIC=n.

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html