[PATCH v2 1/8] KVM: Parse ioapic entry to get destination vcpu

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Get destination vcpu map from one ioapic entry.
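
For reference, the intended call pattern (as used by the later patches in
this series; vcpu_map holds one bit per potential destination vcpu):

	DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);

	bitmap_zero(vcpu_map, KVM_MAX_VCPUS);
	kvm_get_dest_vcpu(kvm, &irqe, vcpu_map);
	if (test_bit(vcpu->vcpu_id, vcpu_map))
		/* this vcpu is a destination of the parsed entry */;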

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |   40 
 arch/x86/kvm/lapic.h |3 +++
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 02b51dd..b4339a5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -145,28 +145,26 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
-   struct kvm_lapic_irq *irq,
-   u64 *eoi_exit_bitmap)
+void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   unsigned long *vcpu_map)
 {
struct kvm_lapic **dst;
struct kvm_apic_map *map;
unsigned long bitmap = 1;
+   struct kvm_vcpu *vcpu;
int i;
 
rcu_read_lock();
-   map = rcu_dereference(vcpu->kvm->arch.apic_map);
+   map = rcu_dereference(kvm->arch.apic_map);
 
-   if (unlikely(!map)) {
-   __set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
+   if (unlikely(!map))
goto out;
-   }
 
if (irq->dest_mode == 0) { /* physical mode */
-   if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff) {
-   __set_bit(irq->vector,
- (unsigned long *)eoi_exit_bitmap);
+   if (irq->dest_id == 0xff) {
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   if (apic_enabled(vcpu->arch.apic))
+   set_bit(vcpu->vcpu_id, vcpu_map);
goto out;
}
dst = &map->phys_map[irq->dest_id & 0xff];
@@ -181,17 +179,27 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
for_each_set_bit(i, bitmap, 16) {
if (!dst[i])
continue;
-   if (dst[i]->vcpu == vcpu) {
-   __set_bit(irq->vector,
- (unsigned long *)eoi_exit_bitmap);
-   break;
-   }
+   set_bit(dst[i]->vcpu->vcpu_id, vcpu_map);
}
 
 out:
rcu_read_unlock();
 }
 
+void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
+   struct kvm_lapic_irq *irq,
+   u64 *eoi_exit_bitmap)
+{
+   DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
+
+   memset(vcpu_map, 0, sizeof(vcpu_map));
+
+   kvm_get_dest_vcpu(vcpu->kvm, irq, vcpu_map);
+   if (test_bit(vcpu->vcpu_id, vcpu_map) ||
+   bitmap_empty(vcpu_map, sizeof(vcpu_map)))
+   __set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1676d34..3a0f9d8 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -158,4 +158,7 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
struct kvm_lapic_irq *irq,
u64 *eoi_bitmap);
 
+void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   unsigned long *vcpu_map);
+
 #endif
-- 
1.7.1



[PATCH v2 0/8] Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

The current interrupt coalescing logic, which is only used by the RTC,
conflicts with Posted Interrupt.

This series introduces a new mechanism that uses EOI to track interrupt
delivery: when delivering an interrupt to a vcpu, need_eoi is set to the
number of vcpus that received the interrupt, and it is decremented as each
vcpu writes EOI. No subsequent RTC interrupt can be delivered to a vcpu
until all destination vcpus have written EOI.
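In other words, the RTC pin keeps a count of outstanding EOIs. A minimal
userspace model of the protocol (illustrative only, not the kernel code):

	#include <stdio.h>

	static int need_eoi;	/* outstanding EOIs for the last RTC irq */

	static int rtc_deliver(int ndest)
	{
		if (need_eoi > 0)
			return 0;	/* coalesced: previous irq not fully acked */
		need_eoi = ndest;	/* one EOI expected per destination vcpu */
		return 1;
	}

	static void rtc_eoi(void)
	{
		if (need_eoi > 0)
			--need_eoi;
	}

	int main(void)
	{
		printf("%d\n", rtc_deliver(2));	/* 1: delivered to two vcpus */
		printf("%d\n", rtc_deliver(2));	/* 0: coalesced, no EOI yet */
		rtc_eoi();
		rtc_eoi();
		printf("%d\n", rtc_deliver(2));	/* 1: all EOIs seen, deliver again */
		return 0;
	}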

Changes from v1 to v2:
* Don't register a dummy ack notifier for RTC. Check it directly when
  calculating the eoi exit bitmap.
* Use kvm_apic_pending_eoi() instead of calling apic_test_vector() directly.
* Check the coalescing info before setting ioapic->irr.
* Calculate the destination vcpu map of the RTC when the ioapic entry or
  apic (id, ldr, dfr) changes, and only set need_eoi when delivering the
  RTC interrupt.

Yang Zhang (8):
  KVM: Parse ioapic entry to get destination vcpu
  KVM: Rename kvm_ioapic_make_eoibitmap_request to
kvm_scan_ioapic_entry
  KVM: Add vcpu info to ioapic_update_eoi()
  KVM: Introduce struct rtc_status
  KVM: Recalculate destination vcpu map
  KVM: Add reset/restore rtc_status support
  KVM: Add rtc irq to eoi exit bitmap
  KVM: Use eoi to track RTC interrupt delivery status

 arch/x86/kvm/lapic.c |   52 +++--
 arch/x86/kvm/lapic.h |4 +
 virt/kvm/ioapic.c|  158 ++
 virt/kvm/ioapic.h|   14 -
 virt/kvm/irq_comm.c  |4 +-
 5 files changed, 198 insertions(+), 34 deletions(-)



[PATCH v2 4/8] KVM: Introduce struct rtc_status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.h |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 2001b61..4904ca3 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -34,6 +34,12 @@ struct kvm_vcpu;
 #define	IOAPIC_INIT	0x5
 #define	IOAPIC_EXTINT	0x7
 
+struct rtc_status {
+   int need_eoi;
+   DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
+   struct kvm_irq_ack_notifier irq_ack_notifier;
+};
+
 struct kvm_ioapic {
u64 base_address;
u32 ioregsel;
@@ -47,6 +53,9 @@ struct kvm_ioapic {
void (*ack_notifier)(void *opaque, int irq);
spinlock_t lock;
DECLARE_BITMAP(handled_vectors, 256);
+#ifdef CONFIG_X86
+   struct rtc_status rtc_status;
+#endif
 };
 
 #ifdef DEBUG
-- 
1.7.1



[PATCH v2 7/8] KVM: Add rtc irq to eoi exit bitmap

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Add rtc irq to eoi exit bitmap to force vmexit when virtual interrupt
delivery is enabled.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 6266d1f..7e47da8 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -192,7 +192,7 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
if (!e->fields.mask &&
(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
-index))) {
+index) || index == 8)) {
irqe.dest_id = e->fields.dest_id;
irqe.vector = e->fields.vector;
irqe.dest_mode = e->fields.dest_mode;
-- 
1.7.1



[PATCH v2 5/8] KVM: Recalculate destination vcpu map

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Update the destination vcpu map when the ioapic entry or the apic (id, ldr, dfr) is changed.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |   38 --
 1 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 4296116..659511d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -87,6 +87,36 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
return result;
 }
 
+#ifdef CONFIG_X86
+static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
+{
+   union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+   struct kvm_lapic_irq irqe;
+
+   if (irq != 8 || entry->fields.mask)
+   return;
+
+   irqe.dest_id = entry->fields.dest_id;
+   irqe.vector = entry->fields.vector;
+   irqe.dest_mode = entry->fields.dest_mode;
+   irqe.trig_mode = entry->fields.trig_mode;
+   irqe.delivery_mode = entry->fields.delivery_mode << 8;
+   irqe.level = 1;
+   irqe.shorthand = 0;
+
+   bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
+
+   kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
+}
+
+#else
+
+static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
+{
+   return;
+}
+#endif
+
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
union kvm_ioapic_redirect_entry *pent;
@@ -147,9 +177,13 @@ void kvm_scan_ioapic_entry(struct kvm *kvm)
 {
struct kvm_ioapic *ioapic = kvm->arch.vioapic;
 
-   if (!kvm_apic_vid_enabled(kvm) || !ioapic)
+   if (!ioapic)
return;
-   kvm_make_update_eoibitmap_request(kvm);
+
+   rtc_irq_get_dest_vcpu(ioapic, 8);
+
+   if (kvm_apic_vid_enabled(kvm))
+   kvm_make_update_eoibitmap_request(kvm);
 }
 
 static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
-- 
1.7.1



[PATCH v2 3/8] KVM: Add vcpu info to ioapic_update_eoi()

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Need to know which vcpu writes EOI.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |2 +-
 virt/kvm/ioapic.c|   12 ++--
 virt/kvm/ioapic.h|3 ++-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 12179b3..6fb22e3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -790,7 +790,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
trigger_mode = IOAPIC_LEVEL_TRIG;
else
trigger_mode = IOAPIC_EDGE_TRIG;
-   kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+   kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
}
 }
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index d087e07..4296116 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -264,8 +264,8 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id)
spin_unlock(&ioapic->lock);
 }
 
-static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
-int trigger_mode)
+static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
+   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
 {
int i;
 
@@ -304,12 +304,12 @@ bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector)
return test_bit(vector, ioapic->handled_vectors);
 }
 
-void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode)
+void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
 {
-   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+   struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;

spin_lock(&ioapic->lock);
-   __kvm_ioapic_update_eoi(ioapic, vector, trigger_mode);
+   __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
spin_unlock(&ioapic->lock);
 }
 
@@ -407,7 +407,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
break;
 #ifdef CONFIG_IA64
case IOAPIC_REG_EOI:
-   __kvm_ioapic_update_eoi(ioapic, data, IOAPIC_LEVEL_TRIG);
+   __kvm_ioapic_update_eoi(NULL, ioapic, data, IOAPIC_LEVEL_TRIG);
break;
 #endif
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index cebc123..2001b61 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -70,7 +70,8 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
-void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
+void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
+   int trigger_mode);
 bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_destroy(struct kvm *kvm);
-- 
1.7.1



[PATCH v2 8/8] KVM: Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

The current interrupt coalescing logic, which is only used by the RTC,
conflicts with Posted Interrupt.
This patch introduces a new mechanism that uses EOI to track interrupt
delivery: when delivering an interrupt to a vcpu, need_eoi is set to the
number of vcpus that received the interrupt, and it is decremented as each
vcpu writes EOI. No subsequent RTC interrupt can be delivered to a vcpu
until all destination vcpus have written EOI.
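The resulting behaviour, step by step (an illustrative timeline, assuming
the RTC entry targets vcpus 0 and 1 in fixed delivery mode):

	/*
	 * irq raised  -> need_eoi = 2, interrupt delivered to both vcpus
	 * irq raised  -> rtc_irq_check() sees need_eoi > 0, reports coalesced
	 * vcpu0 EOI   -> need_eoi = 1
	 * vcpu1 EOI   -> need_eoi = 0
	 * irq raised  -> delivered again
	 */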

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |   67 +
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 7e47da8..8d498e5 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -130,6 +130,48 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
 }
 
+static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+   union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+
+   if (irq != 8)
+   return;
+
+   if (likely(!bitmap_empty(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS))) {
+   if (entry->fields.delivery_mode == APIC_DM_LOWEST)
+   ioapic->rtc_status.need_eoi = 1;
+   else {
+   int weight;
+   weight = bitmap_weight(ioapic->rtc_status.vcpu_map,
+   sizeof(ioapic->rtc_status.vcpu_map));
+   ioapic->rtc_status.need_eoi = weight;
+   }
+   }
+}
+
+static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
+   struct rtc_status *rtc_status, int irq)
+{
+   if (irq != 8)
+   return;
+
+   if (test_bit(vcpu->vcpu_id, rtc_status->vcpu_map))
+   --rtc_status->need_eoi;
+
+   WARN_ON(rtc_status->need_eoi < 0);
+}
+
+static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
+{
+   if (irq != 8)
+   return false;
+
+   if (ioapic->rtc_status.need_eoi > 0)
+   return true; /* coalesced */
+
+   return false;
+}
+
 #else
 
 static void rtc_irq_reset(struct kvm_ioapic *ioapic)
@@ -146,6 +188,22 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
return;
 }
+
+static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+   return;
+}
+
+static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
+   struct rtc_status *rtc_status, int irq)
+{
+   return;
+}
+
+static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
+{
+   return false;
+}
 #endif
 
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
@@ -282,6 +340,8 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
irqe.level = 1;
irqe.shorthand = 0;
 
+   rtc_irq_set_eoi(ioapic, irq);
+
return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
 }
 
@@ -306,6 +366,11 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
ret = 1;
} else {
int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
+
+   if (rtc_irq_check(ioapic, irq)) {
+   ret = 0; /* coalesced */
+   goto out;
+   }
ioapic->irr |= mask;
if ((edge && old_irr != ioapic->irr) ||
(!edge && !entry.fields.remote_irr))
@@ -313,6 +378,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
else
ret = 0; /* report coalesced interrupt */
}
+out:
trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
spin_unlock(&ioapic->lock);
 
@@ -340,6 +406,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
if (ent->fields.vector != vector)
continue;
 
+   rtc_irq_ack_eoi(vcpu, &ioapic->rtc_status, i);
/*
 * We are dropping lock while calling ack notifiers because ack
 * notifier callbacks for assigned devices call into IOAPIC
-- 
1.7.1



[PATCH v2 6/8] KVM: Add reset/restore rtc_status support

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Reset/restore rtc_status when the ioapic is reset/restored.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |8 
 arch/x86/kvm/lapic.h |1 +
 virt/kvm/ioapic.c|   33 +
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6fb22e3..a223170 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void *bitmap)
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
 
+bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
+   return apic_test_vector(vector, apic->regs + APIC_ISR) ||
+   apic_test_vector(vector, apic->regs + APIC_IRR);
+}
+
 static inline void apic_set_vector(int vec, void *bitmap)
 {
set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3a0f9d8..e2a03d1 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -160,5 +160,6 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 
 void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
unsigned long *vcpu_map);
+bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 #endif
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 659511d..6266d1f 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -88,6 +88,27 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 }
 
 #ifdef CONFIG_X86
+static void rtc_irq_reset(struct kvm_ioapic *ioapic)
+{
+   ioapic->rtc_status.need_eoi = 0;
+   bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
+}
+
+static void rtc_irq_restore(struct kvm_ioapic *ioapic)
+{
+   struct kvm_vcpu *vcpu;
+   int vector, i, need_eoi = 0, rtc_pin = 8;
+
+   vector = ioapic->redirtbl[rtc_pin].fields.vector;
+   kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
+   if (kvm_apic_pending_eoi(vcpu, vector)) {
+   need_eoi++;
+   set_bit(vcpu->vcpu_id, ioapic->rtc_status.vcpu_map);
+   }
+   }
+   ioapic->rtc_status.need_eoi = need_eoi;
+}
+
 static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry *entry = ioapic-redirtbl[irq];
@@ -111,6 +132,16 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 
 #else
 
+static void rtc_irq_reset(struct kvm_ioapic *ioapic)
+{
+   return;
+}
+
+static void rtc_irq_restore(struct kvm_ioapic *ioapic)
+{
+   return;
+}
+
 static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
return;
@@ -462,6 +493,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
ioapic-ioregsel = 0;
ioapic-irr = 0;
ioapic-id = 0;
+   rtc_irq_reset(ioapic);
update_handled_vectors(ioapic);
 }
 
@@ -527,6 +559,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
spin_lock(&ioapic->lock);
memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
update_handled_vectors(ioapic);
+   rtc_irq_restore(ioapic);
kvm_scan_ioapic_entry(kvm);
spin_unlock(&ioapic->lock);
return 0;
-- 
1.7.1



[PATCH v2 2/8] KVM: Rename kvm_ioapic_make_eoibitmap_request to kvm_scan_ioapic_entry

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

The RTC interrupt coalescing logic needs to parse the ioapic entry to get the
destination vcpu too. Rename the function to a more generic name.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |2 +-
 virt/kvm/ioapic.c|6 +++---
 virt/kvm/ioapic.h|2 +-
 virt/kvm/irq_comm.c  |4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b4339a5..12179b3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -264,7 +264,7 @@ out:
if (old)
kfree_rcu(old, rcu);
 
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ce82b94..d087e07 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -143,7 +143,7 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 }
 EXPORT_SYMBOL_GPL(kvm_ioapic_calculate_eoi_exitmap);
 
-void kvm_ioapic_make_eoibitmap_request(struct kvm *kvm)
+void kvm_scan_ioapic_entry(struct kvm *kvm)
 {
struct kvm_ioapic *ioapic = kvm->arch.vioapic;
 
@@ -193,7 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
 && ioapic->irr & (1 << index))
ioapic_service(ioapic, index);
-   kvm_ioapic_make_eoibitmap_request(ioapic->kvm);
+   kvm_scan_ioapic_entry(ioapic->kvm);
break;
}
 }
@@ -493,7 +493,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
spin_lock(&ioapic->lock);
memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
update_handled_vectors(ioapic);
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
spin_unlock(&ioapic->lock);
return 0;
 }
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 0400a46..cebc123 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -82,7 +82,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_ioapic_make_eoibitmap_request(struct kvm *kvm);
+void kvm_scan_ioapic_entry(struct kvm *kvm);
 void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
u64 *eoi_exit_bitmap);
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index ff6d40e..15de3b7 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -284,7 +284,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
mutex_lock(&kvm->irq_lock);
hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
mutex_unlock(&kvm->irq_lock);
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
@@ -294,7 +294,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
hlist_del_init_rcu(&kian->link);
mutex_unlock(&kvm->irq_lock);
synchronize_rcu();
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
-- 
1.7.1



Re: [PATCH 1/5] Revert "KVM: x86: Optimize mmio spte zapping when creating/moving memslot"

2013-03-18 Thread Xiao Guangrong
On 03/16/2013 10:07 AM, Takuya Yoshikawa wrote:
 On Fri, 15 Mar 2013 23:26:59 +0800
 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index d3c4787..61a5bb6 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6991,7 +6991,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
   * mmio sptes.
   */
  if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
 -kvm_mmu_zap_mmio_sptes(kvm);

 
 ???
 + kvm_mmu_zap_all()

Ouch, this is a stupid copy-paste error. It looks like I should
use git send-email.

Thank you for pointing this out, Takuya!



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/16/2013 10:13 AM, Takuya Yoshikawa wrote:
 On Fri, 15 Mar 2013 23:29:53 +0800
 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
 
 +/*
 + * The caller should protect concurrent access on
 + * kvm->arch.mmio_invalid_gen. Currently, it is used by
 + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
 + */
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
 
 kvm_mmu_invalidate_mmio_sptes() may be a better name.

Yes, I am not good at naming; this is a better name, thank you!



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/17/2013 11:02 PM, Gleb Natapov wrote:
 On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scalable way to invalidate all
 mmio sptes - it need not walk any shadow pages or hold mmu-lock.

 KVM maintains a global mmio invalid generation-number which is stored in
 kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
 generation-number into its available bits when it is created.

 When KVM needs to zap all mmio sptes, it just simply increases the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO #PF,
 then it walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it will go to the normal #PF handler to update the mmio spte.

 Since 19 bits are used to store the generation-number on a mmio spte, the
 generation-number can wrap around after 33554432 times. It is large enough
 for nearly all cases, but to make the code more robust, we zap all
 shadow pages when the number wraps around.

 Very nice idea, but why drop Takuya's patches instead of using
 kvm_mmu_zap_mmio_sptes() when the generation number overflows?

I am not sure whether it is still needed. Requesting to zap all mmio sptes
more than 50 times is really, really rare; it nearly never happens.
(By the way, 33554432 is wrong in the changelog; I just copied that from my
original implementation.) And, after my patch optimizing the zapping of all
shadow pages, zap-all-sps should not be a problem anymore since it does not
take too much lock time.
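To illustrate, the scheme reduces to a generation-tagged cache. A minimal
sketch with illustrative names (not the actual KVM code):

	static unsigned int current_gen;	/* kvm->arch.mmio_invalid_gen here */

	struct spte { unsigned int gen; };

	static void mark_spte(struct spte *s)
	{
		s->gen = current_gen;		/* stamp at creation time */
	}

	static int spte_valid(struct spte *s)
	{
		return s->gen == current_gen;	/* stale stamp => refault, rebuild */
	}

	static void invalidate_all(void)
	{
		current_gen++;			/* O(1): no page walk, no mmu-lock */
	}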

Your idea?

 
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/kvm/mmu.c  |   61 +--
  arch/x86/kvm/mmutrace.h |   17 +++
  arch/x86/kvm/paging_tmpl.h  |7 +++-
  arch/x86/kvm/vmx.c  |4 ++
  arch/x86/kvm/x86.c  |6 +--
  6 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index ef7f4a5..572398e 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -529,6 +529,7 @@ struct kvm_arch {
  unsigned int n_requested_mmu_pages;
  unsigned int n_max_mmu_pages;
  unsigned int indirect_shadow_pages;
 +unsigned int mmio_invalid_gen;
Why invalid? Should be mmio_valid_gen or mmio_current_gen.

mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
so I named it _invalid_. But mmio_valid_gen is good for me.

 
  struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
  /*
   * Hash table of struct kvm_mmu_page.
 @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
  void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
   struct kvm_memory_slot *slot,
   gfn_t gfn_offset, unsigned long mask);
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
 Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.

Me too.

 
  void kvm_mmu_zap_all(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 13626f4..7093a92 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
 unsigned access)
  {
 -u64 mask = generation_mmio_spte_mask(0);
 +unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +u64 mask = generation_mmio_spte_mask(gen);

 access &= ACC_WRITE_MASK | ACC_USER_MASK;
 mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;

 -trace_mark_mmio_spte(sptep, gfn, access, 0);
 +trace_mark_mmio_spte(sptep, gfn, access, gen);
  mmu_spte_set(sptep, mask);
  }

 @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
  return false;
  }

 +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 +{
 +return get_mmio_spte_generation(spte) ==
 +ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +}
 +
 +/*
 + * The caller should protect concurrent access on
 + * kvm->arch.mmio_invalid_gen. Currently, it is used by
 + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
 + */
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
 +{
 +/* Ensure update memslot has been completed. */
 +smp_mb();
 What barrier is this one paired with?

It is paired with nothing. :)

I used mb here just to avoid increasing the generation-number before the
memslot update has completed. But on the other sides (storing gen and
checking gen), we do not need to care about it - the worst case is that we
emulate a memory-access instruction unnecessarily.

 
 +
 + 

Re: [PATCH V3 3/3] tcm_vhost: Use vq->private_data to indicate if the endpoint is setup

2013-03-18 Thread Michael S. Tsirkin
On Fri, Mar 15, 2013 at 09:14:07AM +0800, Asias He wrote:
 Currently, vs->vs_endpoint is used to indicate if the endpoint is setup or
 not. It is set or cleared in vhost_scsi_set_endpoint() or
 vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
 we check it in vhost_scsi_handle_vq(), we ignore the lock; this is
 wrong.

This one, I don't get. Why is it wrong? Could you please describe the
race condition you are trying to prevent?

 Instead of using vs->vs_endpoint and the vs->dev.mutex lock to
 indicate the status of the endpoint, we use the per virtqueue
 vq->private_data to indicate it. In this way, we only need to take the
 vq->mutex lock, which is per queue, and make the concurrent multiqueue
 processing have less lock contention. Further, on the read side of
 vq->private_data, we can even avoid taking the lock if it is accessed in
 the vhost worker thread, because it is protected by vhost rcu.

But (unlike with -net) you never actually need the pointer. So why all
the complexity?

 Signed-off-by: Asias He as...@redhat.com
 ---
  drivers/vhost/tcm_vhost.c | 46 --
  1 file changed, 40 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
 index 43fb11e..099feef 100644
 --- a/drivers/vhost/tcm_vhost.c
 +++ b/drivers/vhost/tcm_vhost.c
 @@ -67,7 +67,6 @@ struct vhost_scsi {
  /* Protected by vhost_scsi->dev.mutex */
   struct tcm_vhost_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
   char vs_vhost_wwpn[TRANSPORT_IQN_LEN];
 - bool vs_endpoint;
  
   struct vhost_dev dev;
   struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
 @@ -91,6 +90,24 @@ static int iov_num_pages(struct iovec *iov)
 ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
  }
  
 +static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
 +{
 + bool ret = false;
 +
 + /*
 +  * We can handle the vq only after the endpoint is setup by calling the
 +  * VHOST_SCSI_SET_ENDPOINT ioctl.
 +  *
 +  * TODO: Check that we are running from vhost_worker which acts
 +  * as read-side critical section for vhost kind of RCU.
 +  * See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
 +  */
 + if (rcu_dereference_check(vq->private_data, 1))
 + ret = true;
 +
 + return ret;
 +}
 +
  static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
  {
   return 1;
 @@ -581,8 +598,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi *vs,
   int head, ret;
   u8 target;
  
 - /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
 - if (unlikely(!vs->vs_endpoint))
 + if (!tcm_vhost_check_endpoint(vq))
   return;
  
  mutex_lock(&vq->mutex);
 @@ -781,8 +797,9 @@ static int vhost_scsi_set_endpoint(
  {
   struct tcm_vhost_tport *tv_tport;
   struct tcm_vhost_tpg *tv_tpg;
 + struct vhost_virtqueue *vq;
   bool match = false;
 - int index, ret;
 + int index, ret, i;
  
  mutex_lock(&vs->dev.mutex);
   /* Verify that ring has been setup correctly. */
 @@ -826,7 +843,13 @@ static int vhost_scsi_set_endpoint(
   if (match) {
  memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 sizeof(vs->vs_vhost_wwpn));
  - vs->vs_endpoint = true;
  + for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
  + vq = &vs->vqs[i];
  + /* Flushing the vhost_work acts as synchronize_rcu */
  + mutex_lock(&vq->mutex);
  + rcu_assign_pointer(vq->private_data, vs);
  + mutex_unlock(&vq->mutex);
 + }
   ret = 0;
   } else {
   ret = -EEXIST;
 @@ -842,6 +865,8 @@ static int vhost_scsi_clear_endpoint(
  {
   struct tcm_vhost_tport *tv_tport;
   struct tcm_vhost_tpg *tv_tpg;
 + struct vhost_virtqueue *vq;
 + bool match = false;
   int index, ret, i;
   u8 target;
  
 @@ -877,9 +902,18 @@ static int vhost_scsi_clear_endpoint(
   }
  tv_tpg->tv_tpg_vhost_count--;
  vs->vs_tpg[target] = NULL;
  - vs->vs_endpoint = false;
  + match = true;
  mutex_unlock(&tv_tpg->tv_tpg_mutex);
   }
 + if (match) {
  + for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
  + vq = &vs->vqs[i];
  + /* Flushing the vhost_work acts as synchronize_rcu */
  + mutex_lock(&vq->mutex);
  + rcu_assign_pointer(vq->private_data, NULL);
  + mutex_unlock(&vq->mutex);
 + }
 + }
  mutex_unlock(&vs->dev.mutex);
   return 0;
  
 -- 
 1.8.1.4


Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
 On 03/17/2013 11:02 PM, Gleb Natapov wrote:
  On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scalable way to invalidate
  all mmio sptes - it need not walk any shadow pages or hold mmu-lock.
 
  KVM maintains a global mmio invalid generation-number which is stored in
  kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
  generation-number into its available bits when it is created.
 
  When KVM needs to zap all mmio sptes, it just simply increases the global
  generation-number. When guests do mmio access, KVM intercepts a MMIO #PF,
  then it walks the shadow page table and gets the mmio spte. If the
  generation-number on the spte does not equal the global generation-number,
  it will go to the normal #PF handler to update the mmio spte.
 
  Since 19 bits are used to store the generation-number on a mmio spte, the
  generation-number can wrap around after 33554432 times. It is large enough
  for nearly all cases, but to make the code more robust, we zap all
  shadow pages when the number wraps around.
 
  Very nice idea, but why drop Takuya's patches instead of using
  kvm_mmu_zap_mmio_sptes() when the generation number overflows?
 
 I am not sure whether it is still needed. Requesting to zap all mmio sptes
 more than 50 times is really, really rare; it nearly never happens.
 (By the way, 33554432 is wrong in the changelog; I just copied that from my
 original implementation.) And, after my patch optimizing the zapping of all
 shadow pages, zap-all-sps should not be a problem anymore since it does not
 take too much lock time.
 
 Your idea?
 
I expect 50 to become less since I already had plans to store some
information in the mmio spte. And even if zap-all-sptes becomes faster, we
still needlessly zap all sptes while we could zap only the mmio ones.

  
  
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/include/asm/kvm_host.h |2 +
   arch/x86/kvm/mmu.c  |   61 +--
   arch/x86/kvm/mmutrace.h |   17 +++
   arch/x86/kvm/paging_tmpl.h  |7 +++-
   arch/x86/kvm/vmx.c  |4 ++
   arch/x86/kvm/x86.c  |6 +--
   6 files changed, 82 insertions(+), 15 deletions(-)
 
  diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
  index ef7f4a5..572398e 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -529,6 +529,7 @@ struct kvm_arch {
 unsigned int n_requested_mmu_pages;
 unsigned int n_max_mmu_pages;
 unsigned int indirect_shadow_pages;
  +  unsigned int mmio_invalid_gen;
 Why invalid? Should be mmio_valid_gen or mmio_current_gen.
 
 mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
 so I named it _invalid_. But mmio_valid_gen is good for me.
 
It holds the currently valid value though, so calling it invalid is
confusing.

  
 struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 /*
  * Hash table of struct kvm_mmu_page.
  @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
   void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
  struct kvm_memory_slot *slot,
  gfn_t gfn_offset, unsigned long mask);
  +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
  Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
 
 Me too.
 
  
   void kvm_mmu_zap_all(struct kvm *kvm);
   unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
   void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 13626f4..7093a92 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
   static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
unsigned access)
   {
  -  u64 mask = generation_mmio_spte_mask(0);
  +  unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
  +  u64 mask = generation_mmio_spte_mask(gen);
 
 access &= ACC_WRITE_MASK | ACC_USER_MASK;
 mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
 
  -  trace_mark_mmio_spte(sptep, gfn, access, 0);
  +  trace_mark_mmio_spte(sptep, gfn, access, gen);
 mmu_spte_set(sptep, mask);
   }
 
  @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
 return false;
   }
 
  +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  +{
  +  return get_mmio_spte_generation(spte) ==
  +  ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
  +}
  +
  +/*
  + * The caller should protect concurrent access on
  + * kvm->arch.mmio_invalid_gen. Currently, it is used by
  + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
  + */
  +void 

Re: [PATCH V3 3/3] tcm_vhost: Use vq->private_data to indicate if the endpoint is setup

2013-03-18 Thread Asias He
On Mon, Mar 18, 2013 at 10:19:00AM +0200, Michael S. Tsirkin wrote:
 On Fri, Mar 15, 2013 at 09:14:07AM +0800, Asias He wrote:
  Currently, vs->vs_endpoint is used to indicate if the endpoint is setup or
  not. It is set or cleared in vhost_scsi_set_endpoint() or
  vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
  we check it in vhost_scsi_handle_vq(), we ignore the lock; this is
  wrong.
 
 This one, I don't get. Why is it wrong? Could you please describe the
 race condition you are trying to prevent?

Why is it safe to access vs->vs_endpoint without any lock?

  Instead of using vs->vs_endpoint and the vs->dev.mutex lock to
  indicate the status of the endpoint, we use the per virtqueue
  vq->private_data to indicate it. In this way, we only need to take the
  vq->mutex lock, which is per queue, and make the concurrent multiqueue
  processing have less lock contention. Further, on the read side of
  vq->private_data, we can even avoid taking the lock if it is accessed in
  the vhost worker thread, because it is protected by vhost rcu.
 
 But (unlike with -net) you never actually need the pointer. So why all
 the complexity?

It works as a flag, NULL or !NULL.

This is from your other mail:

'''
This takes dev mutex on data path which will introduce
contention esp for multiqueue.
How about storing the endpoint as part of vq
private data and protecting with vq mutex?
'''
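The pattern under discussion, reduced to its essentials (a sketch of the
idea, not the actual tcm_vhost code):

	/* writer, under vq->mutex */
	mutex_lock(&vq->mutex);
	rcu_assign_pointer(vq->private_data, vs);	/* or NULL to clear */
	mutex_unlock(&vq->mutex);

	/* reader, from the vhost worker thread; vhost's work flush
	 * acts as synchronize_rcu for the writer side */
	if (rcu_dereference_check(vq->private_data, 1))
		/* endpoint is set up */;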

  Signed-off-by: Asias He as...@redhat.com
  ---
   drivers/vhost/tcm_vhost.c | 46 
  --
   1 file changed, 40 insertions(+), 6 deletions(-)
  
  diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
  index 43fb11e..099feef 100644
  --- a/drivers/vhost/tcm_vhost.c
  +++ b/drivers/vhost/tcm_vhost.c
  @@ -67,7 +67,6 @@ struct vhost_scsi {
   /* Protected by vhost_scsi->dev.mutex */
  struct tcm_vhost_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
  char vs_vhost_wwpn[TRANSPORT_IQN_LEN];
  -   bool vs_endpoint;
   
  struct vhost_dev dev;
  struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
  @@ -91,6 +90,24 @@ static int iov_num_pages(struct iovec *iov)
  ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
   }
   
  +static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
  +{
  +   bool ret = false;
  +
  +   /*
  +* We can handle the vq only after the endpoint is setup by calling the
  +* VHOST_SCSI_SET_ENDPOINT ioctl.
  +*
  +* TODO: Check that we are running from vhost_worker which acts
  +* as read-side critical section for vhost kind of RCU.
  +* See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
  +*/
   +   if (rcu_dereference_check(vq->private_data, 1))
  +   ret = true;
  +
  +   return ret;
  +}
  +
   static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
   {
  return 1;
  @@ -581,8 +598,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi *vs,
  int head, ret;
  u8 target;
   
  -   /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
   -   if (unlikely(!vs->vs_endpoint))
  +   if (!tcm_vhost_check_endpoint(vq))
  return;
   
   mutex_lock(&vq->mutex);
  @@ -781,8 +797,9 @@ static int vhost_scsi_set_endpoint(
   {
  struct tcm_vhost_tport *tv_tport;
  struct tcm_vhost_tpg *tv_tpg;
  +   struct vhost_virtqueue *vq;
  bool match = false;
  -   int index, ret;
  +   int index, ret, i;
   
   mutex_lock(&vs->dev.mutex);
  /* Verify that ring has been setup correctly. */
  @@ -826,7 +843,13 @@ static int vhost_scsi_set_endpoint(
  if (match) {
   memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
  sizeof(vs->vs_vhost_wwpn));
   -   vs->vs_endpoint = true;
   +   for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
   +   vq = &vs->vqs[i];
   +   /* Flushing the vhost_work acts as synchronize_rcu */
   +   mutex_lock(&vq->mutex);
   +   rcu_assign_pointer(vq->private_data, vs);
   +   mutex_unlock(&vq->mutex);
  +   }
  ret = 0;
  } else {
  ret = -EEXIST;
  @@ -842,6 +865,8 @@ static int vhost_scsi_clear_endpoint(
   {
  struct tcm_vhost_tport *tv_tport;
  struct tcm_vhost_tpg *tv_tpg;
  +   struct vhost_virtqueue *vq;
  +   bool match = false;
  int index, ret, i;
  u8 target;
   
  @@ -877,9 +902,18 @@ static int vhost_scsi_clear_endpoint(
  }
   tv_tpg->tv_tpg_vhost_count--;
   vs->vs_tpg[target] = NULL;
   -   vs->vs_endpoint = false;
   +   match = true;
   mutex_unlock(&tv_tpg->tv_tpg_mutex);
  }
  +   if (match) {
   +   for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
   +   vq = &vs->vqs[i];
   +   /* Flushing the vhost_work acts as synchronize_rcu */
   +   mutex_lock(&vq->mutex);
   +   rcu_assign_pointer(vq->private_data, NULL);
   +   mutex_unlock(&vq->mutex);
  + 

Re: [PATCH v6 0/5] KVM: VMX: Add Posted Interrupt supporting

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 02:49:25AM +, Zhang, Yang Z wrote:
 Zhang, Yang Z wrote on 2013-03-15:
  From: Yang Zhang yang.z.zh...@intel.com
  
  The following patches add Posted Interrupt support to KVM:
  The first patch enables the feature 'acknowledge interrupt on vmexit'. Since
  it is required by Posted Interrupt, we need to enable it first.
  
  And the subsequent patches add the Posted Interrupt support itself:
  Posted Interrupt allows APIC interrupts to be injected into the guest
  directly without any vmexit.
  
  - When delivering an interrupt to the guest, if the target vcpu is running,
update the Posted-interrupt requests bitmap and send a notification event
to the vcpu. The vcpu will then handle this interrupt automatically,
without any software involvement.
  - If the target vcpu is not running, or there is already a notification
event pending in the vcpu, do nothing. The interrupt will be handled on
the next vm entry.
  NOTE: We don't turn on Posted Interrupt until the coalescing issue is
  solved.
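A rough sketch of the delivery decision described above (illustrative
pseudo-code with made-up names; the real logic is in patch 4/5):

	/* record the interrupt in the posted-interrupt requests bitmap */
	if (!test_and_set_bit(vector, pir)) {
		/* notify only if the vcpu is in guest mode and no
		 * notification is already outstanding */
		if (vcpu_in_guest_mode && !test_and_set_bit(ON_BIT, control))
			send_notification_ipi(vcpu);
		/* otherwise PIR is synced into IRR on the next vm entry */
	}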
  
  Changes from v5 to v6:
  * Split sync_pir_to_irr into two functions: one to query whether PIR is
empty and the other to perform the sync.
  * Add comments to explain how vmx_sync_pir_to_irr() works.
  * Rebase on top of KVM upstream.
  
  Changes from v4 to v5:
  * Add per cpu count for posted IPI handler.
  * Don't check irr when delivering an interrupt. Since we cannot get
interrupt coalescing info with Posted Interrupt, there is no need to
check the irr. There is another patch that will change the current
interrupt coalescing logic; before it, we will not turn on Posted
Interrupt.
  * Clear the outstanding notification bit after calling local_irq_disable,
but before checking requests. As Marcelo suggested, we can ensure the
IPI is not lost this way.
  * Remove the spinlock. Same as item 2: if there is no need to get
coalescing info, then there is no need for the lock.
  * Rebase on top of KVM upstream.
  
  Yang Zhang (5):
KVM: VMX: Enable acknowledge interupt on vmexit
KVM: VMX: Register a new IPI for posted interrupt
KVM: VMX: Check the posted interrupt capability
KVM: VMX: Add the algorithm of deliver posted interrupt
KVM : VMX: Use posted interrupt to deliver virtual interrupt
   arch/x86/include/asm/entry_arch.h  |4 +
   arch/x86/include/asm/hardirq.h     |3 +
   arch/x86/include/asm/hw_irq.h      |1 +
   arch/x86/include/asm/irq_vectors.h |5 +
   arch/x86/include/asm/kvm_host.h    |5 +
   arch/x86/include/asm/vmx.h         |4 +
   arch/x86/kernel/entry_64.S         |5 +
   arch/x86/kernel/irq.c              |   22 
   arch/x86/kernel/irqinit.c          |4 +
   arch/x86/kvm/irq.c                 |3 +-
   arch/x86/kvm/lapic.c               |   29 -
   arch/x86/kvm/lapic.h               |2 +
   arch/x86/kvm/svm.c                 |   30 +
   arch/x86/kvm/vmx.c                 |  223 
   arch/x86/kvm/x86.c                 |8 +-
   virt/kvm/kvm_main.c                |1 +
   16 files changed, 320 insertions(+), 29 deletions(-)
 Are there any other comments on this patch series?
 
Haven't reviewed it yet, will do ASAP. The "Use eoi to track RTC interrupt
delivery status" series is a prerequisite for this one.

 For the TMR issue, since it has nothing to do with APICv, if we really need
 to handle it later, then we may need a separate patch to fix it. But
 currently, we may focus on APICv only.
 
What do you mean by TMR has nothing to do with APICv?

--
Gleb.


Re: [PATCH V3 3/3] tcm_vhost: Use vq->private_data to indicate if the endpoint is setup

2013-03-18 Thread Michael S. Tsirkin
On Mon, Mar 18, 2013 at 05:14:33PM +0800, Asias He wrote:
 On Mon, Mar 18, 2013 at 10:19:00AM +0200, Michael S. Tsirkin wrote:
  On Fri, Mar 15, 2013 at 09:14:07AM +0800, Asias He wrote:
   Currently, vs->vs_endpoint is used to indicate if the endpoint is setup or
   not. It is set or cleared in vhost_scsi_set_endpoint() or
   vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
   we check it in vhost_scsi_handle_vq(), we ignore the lock; this is
   wrong.
  
  This one, I don't get. Why is it wrong? Could you please describe the
  race condition you are trying to prevent?
 
 Why is it safe to access vs->vs_endpoint without any lock?

For the same reason it's safe with the pointer: either readers
see the old or the new value, and we flush before relying on
the new value.
RCU macros also include barriers that are irrelevant if you are not
going to access any data through the pointer.
Nowadays they also include lockdep-like checks, which you override.

   Instead of using vs->vs_endpoint and the vs->dev.mutex lock to
   indicate the status of the endpoint, we use the per virtqueue
   vq->private_data to indicate it. In this way, we only need to take the
   vq->mutex lock, which is per queue, and make the concurrent multiqueue
   processing have less lock contention. Further, on the read side of
   vq->private_data, we can even avoid taking the lock if it is accessed in
   the vhost worker thread, because it is protected by vhost rcu.
  
  But (unlike with -net) you never actually need the pointer. So why all
  the complexity?
 
 It works as a flag, NULL or !NULL.
 
 This is from your other mail:
 
 '''
 This takes dev mutex on data path which will introduce
 contention esp for multiqueue.
 How about storing the endpoint as part of vq
 private data and protecting with vq mutex?
 '''

Yes, this is better than taking the mutex, but I don't see
a problem as-is either. For a patch to go into 3.9 it needs
to fix a bug, not just be a refactoring.

   Signed-off-by: Asias He as...@redhat.com
   ---
drivers/vhost/tcm_vhost.c | 46 
   --
1 file changed, 40 insertions(+), 6 deletions(-)
   
   diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
   index 43fb11e..099feef 100644
   --- a/drivers/vhost/tcm_vhost.c
   +++ b/drivers/vhost/tcm_vhost.c
   @@ -67,7 +67,6 @@ struct vhost_scsi {
  /* Protected by vhost_scsi->dev.mutex */
 struct tcm_vhost_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
 char vs_vhost_wwpn[TRANSPORT_IQN_LEN];
   - bool vs_endpoint;

 struct vhost_dev dev;
 struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
   @@ -91,6 +90,24 @@ static int iov_num_pages(struct iovec *iov)
 ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
}

   +static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
   +{
   + bool ret = false;
   +
   + /*
   +  * We can handle the vq only after the endpoint is setup by calling the
   +  * VHOST_SCSI_SET_ENDPOINT ioctl.
   +  *
   +  * TODO: Check that we are running from vhost_worker which acts
   +  * as read-side critical section for vhost kind of RCU.
   +  * See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
   +  */
    + if (rcu_dereference_check(vq->private_data, 1))
   + ret = true;
   +
   + return ret;
   +}
   +
static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
{
 return 1;
   @@ -581,8 +598,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi 
   *vs,
 int head, ret;
 u8 target;

   - /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
    - if (unlikely(!vs->vs_endpoint))
   + if (!tcm_vhost_check_endpoint(vq))
 return;

  mutex_lock(&vq->mutex);
   @@ -781,8 +797,9 @@ static int vhost_scsi_set_endpoint(
{
 struct tcm_vhost_tport *tv_tport;
 struct tcm_vhost_tpg *tv_tpg;
   + struct vhost_virtqueue *vq;
 bool match = false;
   - int index, ret;
   + int index, ret, i;

  mutex_lock(&vs->dev.mutex);
 /* Verify that ring has been setup correctly. */
   @@ -826,7 +843,13 @@ static int vhost_scsi_set_endpoint(
 if (match) {
  memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 sizeof(vs->vs_vhost_wwpn));
    - vs->vs_endpoint = true;
    + for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
    + vq = &vs->vqs[i];
    + /* Flushing the vhost_work acts as synchronize_rcu */
    + mutex_lock(&vq->mutex);
    + rcu_assign_pointer(vq->private_data, vs);
    + mutex_unlock(&vq->mutex);
   + }
 ret = 0;
 } else {
 ret = -EEXIST;
   @@ -842,6 +865,8 @@ static int vhost_scsi_clear_endpoint(
{
 struct tcm_vhost_tport *tv_tport;
 struct tcm_vhost_tpg *tv_tpg;
   + struct vhost_virtqueue *vq;
   + bool match = false;
 int index, ret, i;
 u8 target;

   @@ -877,9 +902,18 @@ static int vhost_scsi_clear_endpoint(
 }
   

Re: [PATCH v2 4/8] KVM: Introduce struct rtc_status

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 03:24:35PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.h |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)
 
 diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
 index 2001b61..4904ca3 100644
 --- a/virt/kvm/ioapic.h
 +++ b/virt/kvm/ioapic.h
 @@ -34,6 +34,12 @@ struct kvm_vcpu;
  #define  IOAPIC_INIT 0x5
  #define  IOAPIC_EXTINT   0x7
  
 +struct rtc_status {
 + int need_eoi;
 + DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
 + struct kvm_irq_ack_notifier irq_ack_notifier;
If we do not register an ack notifier any more, why do you need this here?
Also, give the structure a more kvmish name.

 +};
 +
  struct kvm_ioapic {
   u64 base_address;
   u32 ioregsel;
 @@ -47,6 +53,9 @@ struct kvm_ioapic {
   void (*ack_notifier)(void *opaque, int irq);
   spinlock_t lock;
   DECLARE_BITMAP(handled_vectors, 256);
 +#ifdef CONFIG_X86
 + struct rtc_status rtc_status;
 +#endif
  };
  
  #ifdef DEBUG
 -- 
 1.7.1

--
Gleb.


Re: [PATCH v2 5/8] KVM: Recalculate destination vcpu map

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 03:24:36PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Update the destination vcpu map when the ioapic entry or the apic (id, ldr, dfr) is changed.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |   38 --
  1 files changed, 36 insertions(+), 2 deletions(-)
 
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 4296116..659511d 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -87,6 +87,36 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
   return result;
  }
  
 +#ifdef CONFIG_X86
 +static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 +{
 + union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 + struct kvm_lapic_irq irqe;
 +
You cannot access the ioapic without taking the ioapic lock.

 + if (irq != 8 || entry->fields.mask)
 + return;
 +
 + irqe.dest_id = entry->fields.dest_id;
 + irqe.vector = entry->fields.vector;
 + irqe.dest_mode = entry->fields.dest_mode;
 + irqe.trig_mode = entry->fields.trig_mode;
 + irqe.delivery_mode = entry->fields.delivery_mode << 8;
 + irqe.level = 1;
 + irqe.shorthand = 0;
 +
 + bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
 +
 + kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
 +}
 +
 +#else
 +
 +static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 +{
 + return;
 +}
 +#endif
 +
  static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
  {
   union kvm_ioapic_redirect_entry *pent;
 @@ -147,9 +177,13 @@ void kvm_scan_ioapic_entry(struct kvm *kvm)
  {
  struct kvm_ioapic *ioapic = kvm->arch.vioapic;
  
 - if (!kvm_apic_vid_enabled(kvm) || !ioapic)
 + if (!ioapic)
   return;
 - kvm_make_update_eoibitmap_request(kvm);
 +
 + rtc_irq_get_dest_vcpu(ioapic, 8);
 +
 + if (kvm_apic_vid_enabled(kvm))
 + kvm_make_update_eoibitmap_request(kvm);
  }
  
  static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 -- 
 1.7.1

--
Gleb.


Re: [PATCH v2 8/8] KVM: Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 03:24:39PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 The current interrupt coalescing logic, which is only used by the RTC,
 conflicts with Posted Interrupt.
 This patch introduces a new mechanism that uses EOI to track interrupt
 delivery: when delivering an interrupt to a vcpu, need_eoi is set to the
 number of vcpus that received the interrupt, and it is decremented as each
 vcpu writes EOI. No subsequent RTC interrupt can be delivered to a vcpu
 until all destination vcpus have written EOI.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |   67 +
  1 files changed, 67 insertions(+), 0 deletions(-)
 
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 7e47da8..8d498e5 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -130,6 +130,48 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
  kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
  }
  
 +static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
 +{
 + union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 +
 + if (irq != 8)
 + return;
 +
 + if (likely(!bitmap_empty(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS))) {
 + if (entry->fields.delivery_mode == APIC_DM_LOWEST)
 + ioapic->rtc_status.need_eoi = 1;
 + else {
 + int weight;
 + weight = bitmap_weight(ioapic->rtc_status.vcpu_map,
 + sizeof(ioapic->rtc_status.vcpu_map));
 + ioapic->rtc_status.need_eoi = weight;
 + }
 + }
 +}
 +
 +static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
 + struct rtc_status *rtc_status, int irq)
 +{
 + if (irq != 8)
 + return;
 +
 + if (test_bit(vcpu->vcpu_id, rtc_status->vcpu_map))
If you do not use test_and_clear_bit() here, the WARN_ON() below can
be triggered by a malicious guest. Let's define an rtc_status->expected_eoi
bitmap and copy vcpu_map into expected_eoi on each RTC irq.
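Something along these lines (a sketch of the suggestion; expected_eoi would
be filled from vcpu_map when the interrupt is delivered):

	/* in rtc_irq_set_eoi(), when the irq is delivered */
	bitmap_copy(ioapic->rtc_status.expected_eoi,
		    ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);

	/* in rtc_irq_ack_eoi(): each vcpu may decrement only once */
	if (test_and_clear_bit(vcpu->vcpu_id, rtc_status->expected_eoi))
		--rtc_status->need_eoi;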

 + --rtc_status->need_eoi;
 +
 + WARN_ON(rtc_status->need_eoi < 0);
 +}
 +
 +static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
 +{
 + if (irq != 8)
 + return false;
 +
 + if (ioapic->rtc_status.need_eoi > 0)
 + return true; /* coalesced */
 +
 + return false;
 +}
 +
  #else
  
  static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 @@ -146,6 +188,22 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
  {
   return;
  }
 +
 +static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
 +{
 + return;
 +}
 +
 +static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
 + struct rtc_status *rtc_status, int irq)
 +{
 + return;
 +}
 +
 +static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
 +{
 + return false;
 +}
  #endif
  
  static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 @@ -282,6 +340,8 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
 irq)
   irqe.level = 1;
   irqe.shorthand = 0;
  
 + rtc_irq_set_eoi(ioapic, irq);
 +
 	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
  }
  
 @@ -306,6 +366,11 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
 irq, int irq_source_id,
   ret = 1;
   } else {
   int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
 +
 + if (rtc_irq_check(ioapic, irq)) {
 + ret = 0; /* coalesced */
 + goto out;
 + }
 	ioapic->irr |= mask;
 	if ((edge && old_irr != ioapic->irr) ||
 	    (!edge && !entry.fields.remote_irr))
 @@ -313,6 +378,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
 irq, int irq_source_id,
   else
   ret = 0; /* report coalesced interrupt */
   }
 +out:
   trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
 	spin_unlock(&ioapic->lock);
  
 @@ -340,6 +406,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
 	if (ent->fields.vector != vector)
   continue;
  
 +	rtc_irq_ack_eoi(vcpu, &ioapic->rtc_status, i);
   /*
* We are dropping lock while calling ack notifiers because ack
* notifier callbacks for assigned devices call into IOAPIC
 -- 
 1.7.1

--
Gleb.


RE: [PATCH v6 0/5] KVM: VMX: Add Posted Interrupt supporting

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 02:49:25AM +, Zhang, Yang Z wrote:
 Zhang, Yang Z wrote on 2013-03-15:
 From: Yang Zhang yang.z.zh...@intel.com
 
 The following patches add Posted Interrupt support to KVM:
 The first patch enables the feature 'acknowledge interrupt on vmexit'. Since
 it is required by Posted Interrupt, we need to enable it first.
 
 The subsequent patches add the posted interrupt support itself:
 Posted Interrupt allows APIC interrupts to be injected into the guest
 directly, without any vmexit.
 
 - When delivering an interrupt to the guest, if the target vcpu is running,
   update the Posted-Interrupt Requests bitmap and send a notification
   event to the vcpu. Then the vcpu will handle this interrupt
   automatically, without any software involvement.
 - If the target vcpu is not running, or there is already a notification
   event pending in the vcpu, do nothing. The interrupt will be handled on
   the next vm entry (see the sketch below).
 NOTE: We don't turn on Posted Interrupt until the coalescing issue is
 solved.
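
A rough sketch of the delivery decision described above; the helper names
(vcpu_pi_desc, pi_test_and_set_pir, pi_test_and_set_on, vcpu_is_running,
send_posted_intr_ipi) are illustrative assumptions, not the actual patch code:

static void deliver_posted_interrupt_sketch(struct kvm_vcpu *vcpu, int vector)
{
	struct pi_desc *pi = vcpu_pi_desc(vcpu);	/* illustrative helper */

	/* Record the vector in the Posted-Interrupt Requests bitmap. */
	if (pi_test_and_set_pir(vector, pi))
		return;				/* already posted */

	/* Notify only a running vcpu, and only if no notification event
	 * is already outstanding; otherwise the interrupt is picked up
	 * on the next VM entry. */
	if (vcpu_is_running(vcpu) && !pi_test_and_set_on(pi))
		send_posted_intr_ipi(vcpu->cpu);
}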
 
 Changes from v5 to v6:
 * Split sync_pir_to_irr into two functions: one to query whether PIR is empty,
   and the other to perform the sync.
 * Add comments to explain how vmx_sync_pir_to_irr() work.
 * Rebase on top of KVM upstream.
 
 Changes from v4 to v5:
 * Add per cpu count for posted IPI handler.
 * Don't check irr when delivering an interrupt, since we cannot get
   interrupt coalescing info with Posted Interrupt, so there is no need
   to check the irr. Another patch will change the current interrupt
   coalescing logic; until it lands, we will not turn on Posted Interrupt.
 * Clear the outstanding notification bit after calling local_irq_disable(),
   but before checking requests. As Marcelo suggested, this way we can
   ensure the IPI is not lost.
 * Remove the spinlock. Same as item 2: if there is no need to get
   coalescing info, the lock is not needed.
 * Rebase on top of KVM upstream.
 
 Yang Zhang (5):
   KVM: VMX: Enable acknowledge interupt on vmexit
   KVM: VMX: Register a new IPI for posted interrupt
   KVM: VMX: Check the posted interrupt capability
   KVM: VMX: Add the algorithm of deliver posted interrupt
   KVM : VMX: Use posted interrupt to deliver virtual interrupt
  arch/x86/include/asm/entry_arch.h  |4 +
  arch/x86/include/asm/hardirq.h |3 +
  arch/x86/include/asm/hw_irq.h  |1 +
  arch/x86/include/asm/irq_vectors.h |5 +
  arch/x86/include/asm/kvm_host.h|5 +
  arch/x86/include/asm/vmx.h         |    4 +
  arch/x86/kernel/entry_64.S         |    5 +
  arch/x86/kernel/irq.c              |   22 +
  arch/x86/kernel/irqinit.c          |    4 +
  arch/x86/kvm/irq.c                 |    3 +-
  arch/x86/kvm/lapic.c               |   29 +-
  arch/x86/kvm/lapic.h               |    2 +
  arch/x86/kvm/svm.c                 |   30 +
  arch/x86/kvm/vmx.c                 |  223 +
  arch/x86/kvm/x86.c                 |    8 +-
  virt/kvm/kvm_main.c                |    1 +
  16 files changed, 320 insertions(+), 29 deletions(-)
 Are there any other comments on this patch?
 
 Haven't reviewed it yet, will do ASAP. The "Use eoi to track RTC interrupt
 delivery status" series is a prerequisite for this one.
 
 For the TMR issue, since it has nothing to do with APICv, if we really need
 to handle it later, then we may need a separate patch to fix it. But
 currently, we may focus on APICv only.
 
 What do you mean by TMR has nothing to do with APICv?
Just ignore it. I will send out the fixing with APICv patch.

Best regards,
Yang




RE: [PATCH v2 4/8] KVM: Introduce struct rtc_status

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 03:24:35PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.h |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)
 diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
 index 2001b61..4904ca3 100644
 --- a/virt/kvm/ioapic.h
 +++ b/virt/kvm/ioapic.h
 @@ -34,6 +34,12 @@ struct kvm_vcpu;
  #define IOAPIC_INIT 0x5
  #define IOAPIC_EXTINT   0x7
 +struct rtc_status {
 +	int need_eoi;
 +	DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
 +	struct kvm_irq_ack_notifier irq_ack_notifier;
 If we do not register ack notifier any more why do you need this here?
 Also give the structure more kvmish name.
You are right. It's a mistake to leave it here.

 +};
 +
 struct kvm_ioapic {
 	u64 base_address;
 	u32 ioregsel;
 @@ -47,6 +53,9 @@ struct kvm_ioapic {
 	void (*ack_notifier)(void *opaque, int irq);
 	spinlock_t lock;
 	DECLARE_BITMAP(handled_vectors, 256);
 +#ifdef CONFIG_X86
 +struct rtc_status rtc_status;
 +#endif
  };
  
  #ifdef DEBUG
 --
 1.7.1
 
 --
   Gleb.


Best regards,
Yang




RE: [PATCH v2 5/8] KVM: Recalculate destination vcpu map

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 03:24:36PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Update destination vcpu map when ioapic entry or apic(id, ldr, dfr) is 
 changed
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |   38 --
  1 files changed, 36 insertions(+), 2 deletions(-)
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 4296116..659511d 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -87,6 +87,36 @@ static unsigned long ioapic_read_indirect(struct
 kvm_ioapic *ioapic,
  return result;
  }
 +#ifdef CONFIG_X86
 +static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 +{
 +	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 +struct kvm_lapic_irq irqe;
 +
 You cannot access ioapic without taking ioapic lock.
Right.
 
 +	if (irq != 8 || entry->fields.mask)
 +return;
 +
 +	irqe.dest_id = entry->fields.dest_id;
 +	irqe.vector = entry->fields.vector;
 +	irqe.dest_mode = entry->fields.dest_mode;
 +	irqe.trig_mode = entry->fields.trig_mode;
 +	irqe.delivery_mode = entry->fields.delivery_mode << 8;
 +irqe.level = 1;
 +irqe.shorthand = 0;
 +
 +	bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
 +
 +	kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
 +}
 +
 +#else
 +
 +static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 +{
 +return;
 +}
 +#endif
 +
  static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
 	union kvm_ioapic_redirect_entry *pent;
 @@ -147,9 +177,13 @@ void kvm_scan_ioapic_entry(struct kvm *kvm)
 {
 	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
 -if (!kvm_apic_vid_enabled(kvm) || !ioapic)
 +if (!ioapic)
  return;
 -kvm_make_update_eoibitmap_request(kvm);
 +
 +rtc_irq_get_dest_vcpu(ioapic, 8);
 +
 +if (kvm_apic_vid_enabled(kvm))
 +kvm_make_update_eoibitmap_request(kvm);
  }
  
  static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 --
 1.7.1
 
 --
   Gleb.


Best regards,
Yang




RE: [PATCH v2 8/8] KVM: Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 03:24:39PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Current interrupt coalescing logic, which is only used by the RTC,
 conflicts with Posted Interrupt.
 This patch introduces a new mechanism that uses eoi to track interrupts:
 when delivering an interrupt to a vcpu, need_eoi is set to the number of
 vcpus that received the interrupt, and it is decremented as each vcpu
 writes eoi. No subsequent RTC interrupt can be delivered to a vcpu until
 all vcpus have written eoi.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |   67 +
  1 files changed, 67 insertions(+), 0 deletions(-)
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index 7e47da8..8d498e5 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -130,6 +130,48 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 	kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
  }
 
 +static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
 +{
 +	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 +
 +	if (irq != 8)
 +		return;
 +
 +	if (likely(!bitmap_empty(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS))) {
 +		if (entry->fields.delivery_mode == APIC_DM_LOWEST)
 +			ioapic->rtc_status.need_eoi = 1;
 +		else {
 +			int weight;
 +			weight = bitmap_weight(ioapic->rtc_status.vcpu_map,
 +					sizeof(ioapic->rtc_status.vcpu_map));
 +			ioapic->rtc_status.need_eoi = weight;
 +		}
 +	}
 +}
 +
 +static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
 +		struct rtc_status *rtc_status, int irq)
 +{
 +	if (irq != 8)
 +		return;
 +
 +	if (test_bit(vcpu->vcpu_id, rtc_status->vcpu_map))
 If you do not use test_and_clear_bit() here the WARN_ON() below can
 be triggered by a malicious guest. Let's define an rtc_status->expected_eoi
 bitmap and copy vcpu_map into expected_eoi on each RTC irq.
Sure.
 
 +		--rtc_status->need_eoi;
 +
 +	WARN_ON(rtc_status->need_eoi < 0);
 +}
 +
 +static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
 +{
 +	if (irq != 8)
 +		return false;
 +
 +	if (ioapic->rtc_status.need_eoi > 0)
 +		return true; /* coalesced */
 +
 +	return false;
 +}
 +
  #else
  
  static void rtc_irq_reset(struct kvm_ioapic *ioapic)
 @@ -146,6 +188,22 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic
 *ioapic, int irq)
  {
  return;
  }
 +
 +static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
 +{
 +return;
 +}
 +
 +static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
 +struct rtc_status *rtc_status, int irq)
 +{
 +return;
 +}
 +
 +static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
 +{
 +return false;
 +}
  #endif
  
  static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 @@ -282,6 +340,8 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int
 irq)
  irqe.level = 1;
  irqe.shorthand = 0;
 +rtc_irq_set_eoi(ioapic, irq);
 +
 	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
  }
 @@ -306,6 +366,11 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int
 irq, int irq_source_id,
  ret = 1;
  } else {
  int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
 +
 +if (rtc_irq_check(ioapic, irq)) {
 +ret = 0; /* coalesced */
 +goto out;
 +}
 	ioapic->irr |= mask;
 	if ((edge && old_irr != ioapic->irr) ||
 	    (!edge && !entry.fields.remote_irr))
 @@ -313,6 +378,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
 irq,
 int irq_source_id,
 	else
 		ret = 0; /* report coalesced interrupt */
 	}
 +out:
 	trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
 	spin_unlock(&ioapic->lock);
 @@ -340,6 +406,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu
 *vcpu,
 	if (ent->fields.vector != vector)
  continue;
 +	rtc_irq_ack_eoi(vcpu, &ioapic->rtc_status, i);
  /*
   * We are dropping lock while calling ack notifiers because ack
   * notifier callbacks for assigned devices call into IOAPIC
 --
 1.7.1
 
 --
   Gleb.


Best regards,
Yang




[PULL] vhost: tcm_vhost fixes for 3.9

2013-03-18 Thread Michael S. Tsirkin
The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:

  Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP
  unconditionally" (2013-03-16 23:00:41 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:

  tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)


Asias He (2):
  tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint()
  tcm_vhost: Flush vhost_work in vhost_scsi_flush()

 drivers/vhost/tcm_vhost.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)


KVM call agenda for 2013-03-19

2013-03-18 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.


Re: [PATCH v6 0/5] KVM: VMX: Add Posted Interrupt supporting

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 10:43:16AM +, Zhang, Yang Z wrote:
  For the TMR issue, since it has nothing to do with APICv, if we really need
  to handle it later, then we may need a separate patch to fix it. But
  currently, we may focus on APICv only.
  
  What do you mean by TMR has nothing to do with APICv?
 Just ignore it. I will send out the fixing with APICv patch.
 
Does this mean we cannot get away with updating TMR like we do now?
Because if we can I prefer to not complicate the code further.

--
Gleb.


RE: [PATCH v6 0/5] KVM: VMX: Add Posted Interrupt supporting

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 10:43:16AM +, Zhang, Yang Z wrote:
 For the TMR issue, since it has nothing to do with APICv, if we really need
 to handle it later, then we may need a separate patch to fix it. But
 currently, we may focus on APICv only.
 
 What do you mean by TMR has nothing to do with APICv?
 Just ignore it. I will send out the fixing with APICv patch.
 
 Does this mean we cannot get away with updating TMR like we do now?
 Because if we can I prefer to not complicate the code further.
No, we still need to update TMR but only in vcpu context.

Best regards,
Yang




[PATCH] KVM: Set TMR when programming ioapic entry

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

We already know the trigger mode of a given interrupt when programming
the ioapic entry. So it's not necessary to set it in each interrupt
delivery.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |   26 ++
 arch/x86/kvm/lapic.h |5 ++---
 arch/x86/kvm/vmx.c   |2 ++
 arch/x86/kvm/x86.c   |   12 
 include/linux/kvm_host.h |4 ++--
 virt/kvm/ioapic.c|   17 +
 virt/kvm/ioapic.h|5 ++---
 virt/kvm/kvm_main.c  |4 ++--
 8 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d0b553b..0d2bcde 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -194,18 +194,17 @@ out:
rcu_read_unlock();
 }
 
-void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
-   struct kvm_lapic_irq *irq,
-   u64 *eoi_exit_bitmap)
+bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 {
DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
 
memset(vcpu_map, 0, sizeof(vcpu_map));
 
	kvm_get_dest_vcpu(vcpu->kvm, irq, vcpu_map);
-	if (test_bit(vcpu->vcpu_id, vcpu_map) ||
-	    bitmap_empty(vcpu_map, sizeof(vcpu_map)))
-		__set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
+	if (test_bit(vcpu->vcpu_id, vcpu_map))
+   return true;
+
+   return false;
 }
 
 static void recalculate_apic_map(struct kvm *kvm)
@@ -534,6 +533,15 @@ static inline int apic_find_highest_isr(struct kvm_lapic 
*apic)
return result;
 }
 
+void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+   int i;
+
+	for (i = 0; i < 8; i++)
+   apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
+}
+
 static void apic_update_ppr(struct kvm_lapic *apic)
 {
u32 tpr, isrv, ppr, old_ppr;
@@ -723,12 +731,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
if (unlikely(!apic_enabled(apic)))
break;
 
-	if (trig_mode) {
-		apic_debug("level trig mode for vector %d", vector);
-		apic_set_vector(vector, apic->regs + APIC_TMR);
-	} else
-		apic_clear_vector(vector, apic->regs + APIC_TMR);
-
result = 1;
	if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector))
result = !apic_test_and_set_irr(vector, apic);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4d38836..7e8d35f 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -48,6 +48,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 
+void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
@@ -155,9 +156,7 @@ static inline u16 apic_logical_id(struct kvm_apic_map *map, 
u32 ldr)
	return ldr & map->lid_mask;
 }
 
-void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
-   struct kvm_lapic_irq *irq,
-   u64 *eoi_bitmap);
+bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 
 void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1985849..1571df2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6486,6 +6486,8 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, 
int max_irr)
 
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
+	if (!vmx_vm_has_apicv(vcpu->kvm))
+   return;
vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f54987..5b88c2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5632,13 +5632,17 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 #endif
 }
 
-static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
+static void update_intr_bitmap(struct kvm_vcpu *vcpu)
 {
u64 eoi_exit_bitmap[4];
+   u64 tmr[4];
 
memset(eoi_exit_bitmap, 0, 32);
+   memset(tmr, 0, 32);
 
-   kvm_ioapic_calculate_eoi_exitmap(vcpu, eoi_exit_bitmap);
+   kvm_ioapic_get_intr_map(vcpu, (unsigned long *)tmr,
+   (unsigned long *)eoi_exit_bitmap);
+   kvm_apic_update_tmr(vcpu, (u32 

Re: [PATCH] KVM: Set TMR when programming ioapic entry

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 07:42:22PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 We already know the trigger mode of a given interrupt when programming
 the ioapic entry. So it's not necessary to set it in each interrupt
 delivery.
 
What this patch suppose to go on top of?

 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |   26 ++
  arch/x86/kvm/lapic.h |5 ++---
  arch/x86/kvm/vmx.c   |2 ++
  arch/x86/kvm/x86.c   |   12 
  include/linux/kvm_host.h |4 ++--
  virt/kvm/ioapic.c|   17 +
  virt/kvm/ioapic.h|5 ++---
  virt/kvm/kvm_main.c  |4 ++--
  8 files changed, 41 insertions(+), 34 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index d0b553b..0d2bcde 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -194,18 +194,17 @@ out:
   rcu_read_unlock();
  }
  
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 - struct kvm_lapic_irq *irq,
 - u64 *eoi_exit_bitmap)
 +bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
  {
   DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
  
   memset(vcpu_map, 0, sizeof(vcpu_map));
  
 	kvm_get_dest_vcpu(vcpu->kvm, irq, vcpu_map);
 -	if (test_bit(vcpu->vcpu_id, vcpu_map) ||
 -	    bitmap_empty(vcpu_map, sizeof(vcpu_map)))
 -		__set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
 +	if (test_bit(vcpu->vcpu_id, vcpu_map))
 + return true;
 +
 + return false;
  }
  
  static void recalculate_apic_map(struct kvm *kvm)
 @@ -534,6 +533,15 @@ static inline int apic_find_highest_isr(struct kvm_lapic 
 *apic)
   return result;
  }
  
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 +{
 +	struct kvm_lapic *apic = vcpu->arch.apic;
 +	int i;
 +
 +	for (i = 0; i < 8; i++)
 + apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
 +}
 +
  static void apic_update_ppr(struct kvm_lapic *apic)
  {
   u32 tpr, isrv, ppr, old_ppr;
 @@ -723,12 +731,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
 delivery_mode,
   if (unlikely(!apic_enabled(apic)))
   break;
  
 -	if (trig_mode) {
 -		apic_debug("level trig mode for vector %d", vector);
 -		apic_set_vector(vector, apic->regs + APIC_TMR);
 -	} else
 -		apic_clear_vector(vector, apic->regs + APIC_TMR);
 -
   result = 1;
 	if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector))
   result = !apic_test_and_set_irr(vector, apic);
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 4d38836..7e8d35f 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -48,6 +48,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
  
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 @@ -155,9 +156,7 @@ static inline u16 apic_logical_id(struct kvm_apic_map 
 *map, u32 ldr)
 	return ldr & map->lid_mask;
  }
  
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 - struct kvm_lapic_irq *irq,
 - u64 *eoi_bitmap);
 +bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
  
  void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 1985849..1571df2 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -6486,6 +6486,8 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu 
 *vcpu, int max_irr)
  
  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
  {
 +	if (!vmx_vm_has_apicv(vcpu->kvm))
 + return;
   vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
   vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
   vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 1f54987..5b88c2c 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5632,13 +5632,17 @@ static void kvm_gen_update_masterclock(struct kvm 
 *kvm)
  #endif
  }
  
 -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
 +static void update_intr_bitmap(struct kvm_vcpu *vcpu)
  {
   u64 eoi_exit_bitmap[4];
 + u64 tmr[4];
  
   memset(eoi_exit_bitmap, 0, 32);
 + memset(tmr, 0, 32);
  
 - kvm_ioapic_calculate_eoi_exitmap(vcpu, eoi_exit_bitmap);
 + 

Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/18/2013 05:13 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
 On 03/17/2013 11:02 PM, Gleb Natapov wrote:
 On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scalable way to invalidate
 all mmio sptes - it need not walk any shadow pages or hold mmu-lock.

 KVM maintains a global mmio invalid generation-number which is stored in
 kvm->arch.mmio_invalid_gen, and every mmio spte stores the current global
 generation-number into its available bits when it is created.

 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When a guest does mmio access, KVM intercepts an MMIO #PF,
 then walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it goes to the normal #PF handler to update the mmio spte.

 Since 19 bits are used to store the generation-number on the mmio spte, the
 generation-number can wrap after 33554432 times. That is large enough for
 nearly all cases, but to make the code more robust, we zap all shadow pages
 when the number wraps.

 Very nice idea, but why drop Takuya patches instead of using
 kvm_mmu_zap_mmio_sptes() when generation number overflows.

 I am not sure whether it is still needed. Requesting to zap all mmio sptes
 more than 50 times is really, really rare; it nearly never happens.
 (By the way, 33554432 is wrong in the changelog, I just copied that from my
 original implementation.) And, after my patch optimizing the zapping of all
 shadow pages, zap-all-sps should not be a problem anymore since it does not
 hold the lock for too long.

 Your idea?

 I expect 50 to become less since I already had plans to store some

Interesting, just curious, what are the plans? ;)

 information in mmio spte. Even if all zap-all-sptes becomes faster we
 still needlessly zap all sptes while we can zap only mmio.

Okay.

 


 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/mmu.c  |   61 
 +--
  arch/x86/kvm/mmutrace.h |   17 +++
  arch/x86/kvm/paging_tmpl.h  |7 +++-
  arch/x86/kvm/vmx.c  |4 ++
  arch/x86/kvm/x86.c  |6 +--
  6 files changed, 82 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ef7f4a5..572398e 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -529,6 +529,7 @@ struct kvm_arch {
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
 +  unsigned int mmio_invalid_gen;
 Why invalid? Should be mmio_valid_gen or mmio_current_gen.

 mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
 so I named it _invalid_. But mmio_valid_gen is good for me.

 It holds currently valid value though, so calling it invalid is
 confusing.

I agree.

 

struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
 @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, 
 int slot);
  void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 struct kvm_memory_slot *slot,
 gfn_t gfn_offset, unsigned long mask);
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
 Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.

 Me too.


  void kvm_mmu_zap_all(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
 kvm_nr_mmu_pages);
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 13626f4..7093a92 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 
 spte)
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
   unsigned access)
  {
 -	u64 mask = generation_mmio_spte_mask(0);
 +	unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +	u64 mask = generation_mmio_spte_mask(gen);

 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
 	mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;

 -	trace_mark_mmio_spte(sptep, gfn, access, 0);
 +	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);
  }

 @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 
 *sptep, gfn_t gfn,
return false;
  }

 +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 +{
 +	return get_mmio_spte_generation(spte) ==
 +		ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +}
 +
 +/*
 + * The caller should protect concurrent access on
 + * kvm->arch.mmio_invalid_gen. Currently, it is used by
 + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
 + */
 +void 

RE: [PATCH] KVM: Set TMR when programming ioapic entry

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 07:42:22PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 We already know the trigger mode of a given interrupt when programming
 the ioapic entry. So it's not necessary to set it in each interrupt
 delivery.
 
 What this patch suppose to go on top of?
Sorry, I forgot to mention it.
This is based on the RTC patch series (Use eoi to track RTC interrupt delivery
status) and the Posted Interrupt series (KVM: VMX: Add Posted Interrupt supporting).

 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/kvm/lapic.c |   26 ++
  arch/x86/kvm/lapic.h |5 ++---
  arch/x86/kvm/vmx.c   |2 ++
  arch/x86/kvm/x86.c   |   12 
  include/linux/kvm_host.h |4 ++--
  virt/kvm/ioapic.c|   17 +
  virt/kvm/ioapic.h|5 ++---
  virt/kvm/kvm_main.c  |4 ++--
  8 files changed, 41 insertions(+), 34 deletions(-)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index d0b553b..0d2bcde 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -194,18 +194,17 @@ out:
  rcu_read_unlock();
  }
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 -struct kvm_lapic_irq *irq,
 -u64 *eoi_exit_bitmap)
 +bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
  {
  DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
  
  memset(vcpu_map, 0, sizeof(vcpu_map));
  
 	kvm_get_dest_vcpu(vcpu->kvm, irq, vcpu_map);
 -	if (test_bit(vcpu->vcpu_id, vcpu_map) ||
 -	    bitmap_empty(vcpu_map, sizeof(vcpu_map)))
 -		__set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
 +	if (test_bit(vcpu->vcpu_id, vcpu_map))
 +return true;
 +
 +return false;
  }
  
  static void recalculate_apic_map(struct kvm *kvm)
 @@ -534,6 +533,15 @@ static inline int apic_find_highest_isr(struct kvm_lapic
 *apic)
  return result;
  }
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 +{
 +	struct kvm_lapic *apic = vcpu->arch.apic;
 +	int i;
 +
 +	for (i = 0; i < 8; i++)
 +apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
 +}
 +
  static void apic_update_ppr(struct kvm_lapic *apic)
  {
  u32 tpr, isrv, ppr, old_ppr;
 @@ -723,12 +731,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int
 delivery_mode,
  if (unlikely(!apic_enabled(apic)))
  break;
 -	if (trig_mode) {
 -		apic_debug("level trig mode for vector %d", vector);
 -		apic_set_vector(vector, apic->regs + APIC_TMR);
 -	} else
 -		apic_clear_vector(vector, apic->regs + APIC_TMR);
 -
  result = 1;
 	if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector))
  result = !apic_test_and_set_irr(vector, apic);
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index 4d38836..7e8d35f 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -48,6 +48,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64
 value);
  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 +void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 @@ -155,9 +156,7 @@ static inline u16 apic_logical_id(struct kvm_apic_map
 *map, u32 ldr)
 	return ldr & map->lid_mask;
  }
 -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 -struct kvm_lapic_irq *irq,
 -u64 *eoi_bitmap);
 +bool kvm_vcpu_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic_irq
 *irq);
  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
  
  void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 1985849..1571df2 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -6486,6 +6486,8 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 
  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
  {
 +	if (!vmx_vm_has_apicv(vcpu->kvm))
 +		return;
  vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
  vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
  vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 1f54987..5b88c2c 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5632,13 +5632,17 @@ static void kvm_gen_update_masterclock(struct
 kvm *kvm)
  #endif
  }
 -static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
 +static void update_intr_bitmap(struct kvm_vcpu *vcpu)
  {
  u64 eoi_exit_bitmap[4];
 +u64 tmr[4];
 
  

Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

2013-03-18 Thread Xiao Guangrong
On 03/18/2013 07:19 PM, Paolo Bonzini wrote:
 Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
 +/*
 + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
 + * generation, the bits of bits 52 ~ bit 61 are used as
 + * high 12 bits of generation.
 + */
 
 High 10 bits.

Yes, i forgot to update the comments! Thank you, Paolo!

 
 How often does the generation number change?  Especially with Takuya's
 patches, using just 10 bits for the generation might be enough and
 simplifies the code.

I guess it's only updated when the guest is initializing and doing hotplug.
My original thought was that if the number is big enough that it cannot
wrap in almost any hotplug case, the machinery introduced by Takuya is
not necessary. But if you guys want to keep it, 10 bits should be
enough. :)
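
For reference, a hedged sketch of the bit packing implied by the corrected
comment above (a 19-bit generation: low 9 bits in spte bits 3..11, high 10
bits in bits 52..61, so it wraps after 2^19 = 524288 updates, not 33554432).
The helper names echo the patch, but this code is illustrative:

#define MMIO_GEN_LOW_SHIFT	3	/* low 9 bits live in spte bits 3..11 */
#define MMIO_GEN_HIGH_SHIFT	52	/* high 10 bits live in bits 52..61 */

static u64 generation_mmio_spte_mask_sketch(unsigned int gen)
{
	u64 mask;

	mask  = (u64)(gen & 0x1ff) << MMIO_GEN_LOW_SHIFT;
	mask |= (u64)(gen >> 9) << MMIO_GEN_HIGH_SHIFT;
	return mask;
}

static unsigned int get_mmio_spte_generation_sketch(u64 spte)
{
	unsigned int gen;

	gen  = (spte >> MMIO_GEN_LOW_SHIFT) & 0x1ff;
	gen |= ((spte >> MMIO_GEN_HIGH_SHIFT) & 0x3ff) << 9;
	return gen;
}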



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 05:13 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
  On 03/17/2013 11:02 PM, Gleb Natapov wrote:
  On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scale way to invalid all
  mmio sptes - it need not walk any shadow pages and hold mmu-lock
 
  KVM maintains a global mmio invalid generation-number which is stored in
  kvm-arch.mmio_invalid_gen and every mmio spte stores the current global
  generation-number into his available bits when it is created
 
  When KVM need zap all mmio sptes, it just simply increase the global
  generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
  then it walks the shadow page table and get the mmio spte. If the
  generation-number on the spte does not equal the global 
  generation-number,
  it will go to the normal #PF handler to update the mmio spte
 
  Since 19 bits are used to store generation-number on mmio spte, the
  generation-number can be round after 33554432 times. It is large enough
  for nearly all most cases, but making the code be more strong, we zap all
  shadow pages when the number is round
 
  Very nice idea, but why drop Takuya patches instead of using
  kvm_mmu_zap_mmio_sptes() when generation number overflows.
 
  I am not sure whether it is still needed. Requesting to zap all mmio sptes
  more than 50 times is really, really rare; it nearly never happens.
  (By the way, 33554432 is wrong in the changelog, I just copied that from
  my original implementation.) And, after my patch optimizing the zapping of
  all shadow pages, zap-all-sps should not be a problem anymore since it
  does not hold the lock for too long.
 
  Your idea?
 
  Your idea?
 
  I expect 50 to become less since I already had plans to store some
 
 Interesting, just curious, what are the plans? ;)
 
Currently we use PIO to signal that work is pending to virtio devices. The
requirement is that signaling should be fast, and PIO is fast since there
is no need to emulate the instruction. PCIe, though, is not really designed
with PIO in mind, so we will have to use MMIO to do the signaling. To avoid
instruction emulation I thought about making the guest access these devices
using a predefined variety of MOV instruction so that emulation can be
skipped. The idea is to mark the mmio spte to know that emulation is not
needed.

  information in mmio spte. Even if all zap-all-sptes becomes faster we
  still needlessly zap all sptes while we can zap only mmio.
 
 Okay.
 
  
 
 
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/include/asm/kvm_host.h |2 +
   arch/x86/kvm/mmu.c  |   61 
  +--
   arch/x86/kvm/mmutrace.h |   17 +++
   arch/x86/kvm/paging_tmpl.h  |7 +++-
   arch/x86/kvm/vmx.c  |4 ++
   arch/x86/kvm/x86.c  |6 +--
   6 files changed, 82 insertions(+), 15 deletions(-)
 
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index ef7f4a5..572398e 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -529,6 +529,7 @@ struct kvm_arch {
   unsigned int n_requested_mmu_pages;
   unsigned int n_max_mmu_pages;
   unsigned int indirect_shadow_pages;
  +unsigned int mmio_invalid_gen;
  Why invalid? Should be mmio_valid_gen or mmio_current_get.
 
  mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
  so i named it as _invalid_. But mmio_valid_gen is good for me.
 
  It holds currently valid value though, so calling it invalid is
  confusing.
 
 I agree.
 
  
 
   struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
   /*
* Hash table of struct kvm_mmu_page.
  @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm 
  *kvm, int slot);
   void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long 
  mask);
  +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
  Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
 
  Me too.
 
 
   void kvm_mmu_zap_all(struct kvm *kvm);
   unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
   void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
  kvm_nr_mmu_pages);
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 13626f4..7093a92 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 
  spte)
   static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
  unsigned access)
   {
  -u64 mask = generation_mmio_spte_mask(0);
   +	unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
  +

Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 08:42:09PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 07:19 PM, Paolo Bonzini wrote:
  Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
  +/*
  + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
  + * generation, the bits of bits 52 ~ bit 61 are used as
  + * high 12 bits of generation.
  + */
  
  High 10 bits.
 
 Yes, i forgot to update the comments! Thank you, Paolo!
 
  
  How often does the generation number change?  Especially with Takuya's
  patches, using just 10 bits for the generation might be enough and
  simplifies the code.
 
 I guess it's only updated when the guest is initializing and doing hotplug.
 My original thought was that if the number is big enough that it cannot
 wrap in almost any hotplug case, the machinery introduced by Takuya is
 not necessary. But if you guys want to keep it, 10 bits should be
 enough. :)
I think you encapsulated the 19-bit generation number handling well enough,
so I personally do not mind making it as big as we can.

--
Gleb.


Re: Can I bridge the loopback?

2013-03-18 Thread Stefan Hajnoczi
On Sat, Mar 16, 2013 at 12:06:30AM -0500, Steve wrote:
 Here's the issue. I want to communicate between virtual machines over a
 second virtual Ethernet port. But I would like to use the host loopback for
 that so as to not be limited to Ethernet port speeds, for large copies, etc.
 Right now, the machine is connected to a 10 Mbps switch on port 2, and I
 would like to get far faster transfer speeds when using the so-called
 private LAN. Bridging eth1 merely limits the speed to 10 Mbps. If it was
 bridged to the host loopback, I was hoping it could achieve far faster
 speeds and also not saturate the switch.
 
 So, could I make a br1 that is assigned to 127.0.0.1 and then each host can
 use that as eth1?

Guest-to-guest communication is not affected by physical NIC link speed.
A software bridge with the guest tap interfaces and the host's physical
interface should allow guests to communicate at well above 10 Mbps.

Have you measured the speed of guest-to-guest networking and found it
limited to 10 Mbps?

If you still experience poor performance, please post your QEMU
command-line, ifconfig -a (on host), and brctl show (on host) output.

Stefan


Re: [PATCH] KVM: Set TMR when programming ioapic entry

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 12:32:51PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-03-18:
  On Mon, Mar 18, 2013 at 07:42:22PM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  We already know the trigger mode of a given interrupt when programming
  the ioapic entry. So it's not necessary to set it in each interrupt
  delivery.
  
  What this patch suppose to go on top of?
 Sorry, I forgot to mention it.
 This is based on the RTC patch series (Use eoi to track RTC interrupt
 delivery status) and the Posted Interrupt series (KVM: VMX: Add Posted
 Interrupt supporting).
 
Since it touches code added by the RTC patch series, which will change
after my latest comments, it is hard to review.

--
Gleb.


[PATCH v3 0/8] Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Current interrupt coalescing logic, which is only used by the RTC,
conflicts with Posted Interrupt.

This patch introduces a new mechanism that uses eoi to track interrupts:
when delivering an interrupt to a vcpu, need_eoi is set to the number of
vcpus that received the interrupt, and it is decremented as each vcpu
writes eoi. No subsequent RTC interrupt can be delivered to a vcpu until
all vcpus have written eoi.
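
The resulting check on the delivery path looks roughly like this (a minimal
sketch mirroring patch 8/8's rtc_irq_check(); illustrative, not the applied
diff):

static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
{
	if (irq != 8)		/* only the RTC pin is tracked */
		return false;

	/* Some vcpu has not yet written EOI for the previous RTC
	 * interrupt, so report this one as coalesced. */
	return ioapic->rtc_status.need_eoi > 0;
}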

Changes from v2 to v3:
* Remove the unused irq_ack_notifier.
* Acquire ioapic->lock before calculating the destination vcpu map.
* Copy vcpu_map to expected_eoi_bitmap on each RTC irq and clear it on eoi.

Changes from v1 to v2:
* Don't register a dummy ack notifier for RTC. Check it directly when
  calculating the eoi exit bitmap.
* Use kvm_apic_pending_eoi() instead of calling apic_test_vector() directly.
* Check coalescing info before setting ioapic->irr.
* Calculate the destination vcpu map of the RTC when the ioapic entry or
  apic (id, ldr, dfr) changes, and only set need_eoi when delivering the
  RTC interrupt.

Yang Zhang (8):
  KVM: Parse ioapic entry to get destination vcpu
  KVM: Rename kvm_ioapic_make_eoibitmap_request to
kvm_scan_ioapic_entry
  KVM: Add vcpu info to ioapic_update_eoi()
  KVM: Introduce struct rtc_status
  KVM: Recalculate destination vcpu map
  KVM: Add reset/restore rtc_status support
  KVM: Add rtc irq to eoi exit bitmap
  KVM: Use eoi to track RTC interrupt delivery status

 arch/x86/kvm/lapic.c |   52 ++--
 arch/x86/kvm/lapic.h |4 +
 virt/kvm/ioapic.c|  165 ++
 virt/kvm/ioapic.h|   14 -
 virt/kvm/irq_comm.c  |4 +-
 5 files changed, 205 insertions(+), 34 deletions(-)



[PATCH v3 2/8] KVM: Rename kvm_ioapic_make_eoibitmap_request to kvm_scan_ioapic_entry

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

RTC interrupt coalescing also needs to parse the ioapic entry to get the
destination vcpu. Rename the function to a more generic name.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |2 +-
 virt/kvm/ioapic.c|6 +++---
 virt/kvm/ioapic.h|2 +-
 virt/kvm/irq_comm.c  |4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b4339a5..12179b3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -264,7 +264,7 @@ out:
if (old)
kfree_rcu(old, rcu);
 
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ce82b94..d087e07 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -143,7 +143,7 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 }
 EXPORT_SYMBOL_GPL(kvm_ioapic_calculate_eoi_exitmap);
 
-void kvm_ioapic_make_eoibitmap_request(struct kvm *kvm)
+void kvm_scan_ioapic_entry(struct kvm *kvm)
 {
	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
 
@@ -193,7 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic 
*ioapic, u32 val)
 		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
 		    && ioapic->irr & (1 << index))
 			ioapic_service(ioapic, index);
-		kvm_ioapic_make_eoibitmap_request(ioapic->kvm);
+		kvm_scan_ioapic_entry(ioapic->kvm);
break;
}
 }
@@ -493,7 +493,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state 
*state)
	spin_lock(&ioapic->lock);
	memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
	update_handled_vectors(ioapic);
-	kvm_ioapic_make_eoibitmap_request(kvm);
+	kvm_scan_ioapic_entry(kvm);
	spin_unlock(&ioapic->lock);
return 0;
 }
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 0400a46..cebc123 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -82,7 +82,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_ioapic_make_eoibitmap_request(struct kvm *kvm);
+void kvm_scan_ioapic_entry(struct kvm *kvm);
 void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
u64 *eoi_exit_bitmap);
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index ff6d40e..15de3b7 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -284,7 +284,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
mutex_lock(kvm-irq_lock);
hlist_add_head_rcu(kian-link, kvm-irq_ack_notifier_list);
mutex_unlock(kvm-irq_lock);
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
@@ -294,7 +294,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
hlist_del_init_rcu(kian-link);
mutex_unlock(kvm-irq_lock);
synchronize_rcu();
-   kvm_ioapic_make_eoibitmap_request(kvm);
+   kvm_scan_ioapic_entry(kvm);
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
-- 
1.7.1



[PATCH v3 3/8] KVM: Add vcpu info to ioapic_update_eoi()

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Need to know which vcpu writes EOI.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |2 +-
 virt/kvm/ioapic.c|   12 ++--
 virt/kvm/ioapic.h|3 ++-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 12179b3..6fb22e3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -790,7 +790,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int 
vector)
trigger_mode = IOAPIC_LEVEL_TRIG;
else
trigger_mode = IOAPIC_EDGE_TRIG;
-		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+		kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
}
 }
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index d087e07..4296116 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -264,8 +264,8 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int 
irq_source_id)
	spin_unlock(&ioapic->lock);
 }
 
-static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
-int trigger_mode)
+static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
+   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
 {
int i;
 
@@ -304,12 +304,12 @@ bool kvm_ioapic_handles_vector(struct kvm *kvm, int 
vector)
	return test_bit(vector, ioapic->handled_vectors);
 }
 
-void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode)
+void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
 {
-	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
 
	spin_lock(&ioapic->lock);
-	__kvm_ioapic_update_eoi(ioapic, vector, trigger_mode);
+	__kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
	spin_unlock(&ioapic->lock);
 }
 
@@ -407,7 +407,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
break;
 #ifdef CONFIG_IA64
case IOAPIC_REG_EOI:
-   __kvm_ioapic_update_eoi(ioapic, data, IOAPIC_LEVEL_TRIG);
+   __kvm_ioapic_update_eoi(NULL, ioapic, data, IOAPIC_LEVEL_TRIG);
break;
 #endif
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index cebc123..2001b61 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -70,7 +70,8 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm 
*kvm)
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
-void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
+void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
+   int trigger_mode);
 bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_destroy(struct kvm *kvm);
-- 
1.7.1



[PATCH v3 4/8] KVM: Introduce struct rtc_status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.h |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 2001b61..20715d0 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -34,6 +34,12 @@ struct kvm_vcpu;
 #defineIOAPIC_INIT 0x5
 #defineIOAPIC_EXTINT   0x7
 
+struct rtc_status {
+   int need_eoi;
+   DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
+   DECLARE_BITMAP(expected_eoi_bitmap, KVM_MAX_VCPUS);
+};
+
 struct kvm_ioapic {
u64 base_address;
u32 ioregsel;
@@ -47,6 +53,9 @@ struct kvm_ioapic {
void (*ack_notifier)(void *opaque, int irq);
spinlock_t lock;
DECLARE_BITMAP(handled_vectors, 256);
+#ifdef CONFIG_X86
+   struct rtc_status rtc_status;
+#endif
 };
 
 #ifdef DEBUG
-- 
1.7.1



[PATCH v3 5/8] KVM: Recalculate destination vcpu map

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Update the destination vcpu map when the ioapic entry or apic (id, ldr, dfr) is changed.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 4296116..329efe1 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -87,6 +87,38 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic 
*ioapic,
return result;
 }
 
+#ifdef CONFIG_X86
+static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
+{
+	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+	struct kvm_lapic_irq irqe;
+
+	if (irq != 8 || entry->fields.mask)
+		return;
+
+	spin_lock(&ioapic->lock);
+	irqe.dest_id = entry->fields.dest_id;
+	irqe.vector = entry->fields.vector;
+	irqe.dest_mode = entry->fields.dest_mode;
+	irqe.trig_mode = entry->fields.trig_mode;
+	irqe.delivery_mode = entry->fields.delivery_mode << 8;
+	irqe.level = 1;
+	irqe.shorthand = 0;
+
+	bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
+
+	kvm_get_dest_vcpu(ioapic->kvm, &irqe, ioapic->rtc_status.vcpu_map);
+	spin_unlock(&ioapic->lock);
+}
+
+#else
+
+static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
+{
+   return;
+}
+#endif
+
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
union kvm_ioapic_redirect_entry *pent;
@@ -147,9 +179,13 @@ void kvm_scan_ioapic_entry(struct kvm *kvm)
 {
	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
 
-   if (!kvm_apic_vid_enabled(kvm) || !ioapic)
+   if (!ioapic)
return;
-   kvm_make_update_eoibitmap_request(kvm);
+
+   rtc_irq_get_dest_vcpu(ioapic, 8);
+
+   if (kvm_apic_vid_enabled(kvm))
+   kvm_make_update_eoibitmap_request(kvm);
 }
 
 static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
-- 
1.7.1



[PATCH v3 7/8] KVM: Add rtc irq to eoi exit bitmap

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Add rtc irq to eoi exit bitmap to force vmexit when virtual interrupt
delivery is enabled.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index b74c73f..4ddaa07 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -194,7 +194,7 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
		if (!e->fields.mask &&
		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
-				index))) {
+				index) || index == 8)) {
			irqe.dest_id = e->fields.dest_id;
			irqe.vector = e->fields.vector;
			irqe.dest_mode = e->fields.dest_mode;
-- 
1.7.1



[PATCH v3 6/8] KVM: Add reset/restore rtc_status support

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Reset/restore rtc_status when the ioapic is reset/restored.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |8 
 arch/x86/kvm/lapic.h |1 +
 virt/kvm/ioapic.c|   33 +
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6fb22e3..a223170 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void *bitmap)
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
 
+bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
+   return apic_test_vector(vector, apic->regs + APIC_ISR) ||
+   apic_test_vector(vector, apic->regs + APIC_IRR);
+}
+
 static inline void apic_set_vector(int vec, void *bitmap)
 {
set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3a0f9d8..e2a03d1 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -160,5 +160,6 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 
 void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
unsigned long *vcpu_map);
+bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 #endif
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 329efe1..b74c73f 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -88,6 +88,27 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 }
 
 #ifdef CONFIG_X86
+static void rtc_irq_reset(struct kvm_ioapic *ioapic)
+{
+   ioapic->rtc_status.need_eoi = 0;
+   bitmap_zero(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
+}
+
+static void rtc_irq_restore(struct kvm_ioapic *ioapic)
+{
+   struct kvm_vcpu *vcpu;
+   int vector, i, need_eoi = 0, rtc_pin = 8;
+
+   vector = ioapic->redirtbl[rtc_pin].fields.vector;
+   kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
+   if (kvm_apic_pending_eoi(vcpu, vector)) {
+   need_eoi++;
+   set_bit(vcpu->vcpu_id, ioapic->rtc_status.vcpu_map);
+   }
+   }
+   ioapic->rtc_status.need_eoi = need_eoi;
+}
+
 static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
@@ -113,6 +134,16 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 
 #else
 
+static void rtc_irq_reset(struct kvm_ioapic *ioapic)
+{
+   return;
+}
+
+static void rtc_irq_restore(struct kvm_ioapic *ioapic)
+{
+   return;
+}
+
 static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
return;
@@ -464,6 +495,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
	ioapic->ioregsel = 0;
	ioapic->irr = 0;
	ioapic->id = 0;
+   rtc_irq_reset(ioapic);
update_handled_vectors(ioapic);
 }
 
@@ -529,6 +561,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
	spin_lock(&ioapic->lock);
	memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
	update_handled_vectors(ioapic);
+   rtc_irq_restore(ioapic);
	kvm_scan_ioapic_entry(kvm);
	spin_unlock(&ioapic->lock);
return 0;
-- 
1.7.1



[PATCH v3 8/8] KVM: Use eoi to track RTC interrupt delivery status

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

The current interrupt coalescing logic, which is only used by the RTC,
conflicts with Posted Interrupts.
This patch introduces a new mechanism that uses EOI to track interrupt
delivery: when an interrupt is delivered to the vcpus, need_eoi is set to the
number of vcpus that received it, and it is decremented as each vcpu writes
EOI. No subsequent RTC interrupt can be delivered until all vcpus have
written EOI.
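
In outline (a hedged, standalone sketch of the counting scheme with
illustrative names, not the patch code itself):

#include <stdbool.h>

/* Sketch only: rtc_tracker, rtc_deliver() and rtc_eoi() are hypothetical. */
struct rtc_tracker {
	int need_eoi;	/* vcpus that still owe an EOI for the last tick */
};

/* Called when an RTC interrupt is about to be injected. */
static bool rtc_deliver(struct rtc_tracker *t, int nr_dest_vcpus)
{
	if (t->need_eoi > 0)
		return false;		/* previous tick not fully acked: coalesce */
	t->need_eoi = nr_dest_vcpus;	/* expect one EOI per destination vcpu */
	return true;
}

/* Called from each destination vcpu's EOI handler for the RTC vector. */
static void rtc_eoi(struct rtc_tracker *t)
{
	if (t->need_eoi > 0)
		--t->need_eoi;		/* when it reaches 0, delivery is re-armed */
}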

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 virt/kvm/ioapic.c |   76 +++-
 1 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 4ddaa07..35b3d8c 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -103,7 +103,8 @@ static void rtc_irq_restore(struct kvm_ioapic *ioapic)
	kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
	if (kvm_apic_pending_eoi(vcpu, vector)) {
	need_eoi++;
-   set_bit(vcpu->vcpu_id, ioapic->rtc_status.vcpu_map);
+   set_bit(vcpu->vcpu_id,
+   ioapic->rtc_status.expected_eoi_bitmap);
	}
	}
	ioapic->rtc_status.need_eoi = need_eoi;
@@ -132,6 +133,50 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
	spin_unlock(&ioapic->lock);
 }
 
+static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+   union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+
+   if (irq != 8)
+   return;
+
+   if (likely(!bitmap_empty(ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS))) {
+   if (entry->fields.delivery_mode == APIC_DM_LOWEST)
+   ioapic->rtc_status.need_eoi = 1;
+   else {
+   int weight;
+   weight = bitmap_weight(ioapic->rtc_status.vcpu_map,
+   sizeof(ioapic->rtc_status.vcpu_map));
+   ioapic->rtc_status.need_eoi = weight;
+   }
+   bitmap_copy(ioapic->rtc_status.expected_eoi_bitmap,
+   ioapic->rtc_status.vcpu_map, KVM_MAX_VCPUS);
+   }
+}
+
+static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
+   struct rtc_status *rtc_status, int irq)
+{
+   if (irq != 8)
+   return;
+
+   if (test_and_clear_bit(vcpu->vcpu_id, rtc_status->expected_eoi_bitmap))
+   --rtc_status->need_eoi;
+
+   WARN_ON(rtc_status->need_eoi < 0);
+}
+
+static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
+{
+   if (irq != 8)
+   return false;
+
+   if (ioapic->rtc_status.need_eoi > 0)
+   return true; /* coalesced */
+
+   return false;
+}
+
 #else
 
 static void rtc_irq_reset(struct kvm_ioapic *ioapic)
@@ -148,6 +193,22 @@ static void rtc_irq_get_dest_vcpu(struct kvm_ioapic *ioapic, int irq)
 {
return;
 }
+
+static void rtc_irq_set_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+   return;
+}
+
+static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
+   struct rtc_status *rtc_status, int irq)
+{
+   return;
+}
+
+static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
+{
+   return false;
+}
 #endif
 
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
@@ -260,7 +321,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
	if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
	    && ioapic->irr & (1 << index))
	ioapic_service(ioapic, index);
+   spin_unlock(&ioapic->lock);
	kvm_scan_ioapic_entry(ioapic->kvm);
+   spin_lock(&ioapic->lock);
break;
}
 }
@@ -284,6 +347,8 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
irqe.level = 1;
irqe.shorthand = 0;
 
+   rtc_irq_set_eoi(ioapic, irq);
+
	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
 }
 
@@ -308,6 +373,11 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
ret = 1;
} else {
int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
+
+   if (rtc_irq_check(ioapic, irq)) {
+   ret = 0; /* coalesced */
+   goto out;
+   }
	ioapic->irr |= mask;
	if ((edge && old_irr != ioapic->irr) ||
	    (!edge && !entry.fields.remote_irr))
@@ -315,6 +385,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
else
ret = 0; /* report coalesced interrupt */
}
+out:
trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
	spin_unlock(&ioapic->lock);
 
@@ -342,6 +413,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
	if (ent->fields.vector != vector)
continue;
 
+   

[PATCH v3 1/8] KVM: Parse ioapic entry to get destination vcpu

2013-03-18 Thread Yang Zhang
From: Yang Zhang yang.z.zh...@intel.com

Get destination vcpu map from one ioapic entry.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/kvm/lapic.c |   40 
 arch/x86/kvm/lapic.h |3 +++
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 02b51dd..b4339a5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -145,28 +145,26 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
-   struct kvm_lapic_irq *irq,
-   u64 *eoi_exit_bitmap)
+void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   unsigned long *vcpu_map)
 {
struct kvm_lapic **dst;
struct kvm_apic_map *map;
unsigned long bitmap = 1;
+   struct kvm_vcpu *vcpu;
int i;
 
rcu_read_lock();
-   map = rcu_dereference(vcpu->kvm->arch.apic_map);
+   map = rcu_dereference(kvm->arch.apic_map);
 
-   if (unlikely(!map)) {
-   __set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
+   if (unlikely(!map))
	goto out;
-   }
 
	if (irq->dest_mode == 0) { /* physical mode */
-   if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff) {
-   __set_bit(irq->vector,
- (unsigned long *)eoi_exit_bitmap);
+   if (irq->dest_id == 0xff) {
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   if (apic_enabled(vcpu->arch.apic))
+   set_bit(vcpu->vcpu_id, vcpu_map);
	goto out;
	}
	dst = &map->phys_map[irq->dest_id & 0xff];
@@ -181,17 +179,27 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
	for_each_set_bit(i, bitmap, 16) {
	if (!dst[i])
	continue;
-   if (dst[i]->vcpu == vcpu) {
-   __set_bit(irq->vector,
- (unsigned long *)eoi_exit_bitmap);
-   break;
-   }
+   set_bit(dst[i]->vcpu->vcpu_id, vcpu_map);
}
 
 out:
rcu_read_unlock();
 }
 
+void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
+   struct kvm_lapic_irq *irq,
+   u64 *eoi_exit_bitmap)
+{
+   DECLARE_BITMAP(vcpu_map, KVM_MAX_VCPUS);
+
+   memset(vcpu_map, 0, sizeof(vcpu_map));
+
+   kvm_get_dest_vcpu(vcpu->kvm, irq, vcpu_map);
+   if (test_bit(vcpu->vcpu_id, vcpu_map) ||
+   bitmap_empty(vcpu_map, sizeof(vcpu_map)))
+   __set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1676d34..3a0f9d8 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -158,4 +158,7 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
struct kvm_lapic_irq *irq,
u64 *eoi_bitmap);
 
+void kvm_get_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+   unsigned long *vcpu_map);
+
 #endif
-- 
1.7.1



RE: [PATCH] KVM: Set TMR when programming ioapic entry

2013-03-18 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 12:32:51PM +0000, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-03-18:
 On Mon, Mar 18, 2013 at 07:42:22PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 We already know the trigger mode of a given interrupt when programming
 the ioapice entry. So it's not necessary to set it in each interrupt
 delivery.
 
 What this patch suppose to go on top of?
 Sorry, forget to mention it. This is based on RTC patch(Use eoi to
 track RTC interrupt delivery status) and Posted interrupt patch(KVM:
 VMX: Add Posted Interrupt supporting).
 
 Since it touches the code added by RTC patch series that will be changed
 after my latest comments it is hard to review that.
I have sent out the modified RTC patch. Please review it.

Best regards,
Yang




[Bug 55201] host panic when creating guest, doing scp and killing QEMU process continuously

2013-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=55201





--- Comment #7 from Jay Ren yongjie@intel.com  2013-03-18 13:02:06 ---
(In reply to comment #5)
 Jay Ren, would you please confirm whether the attached patch fixes the 
 problem.

Hi Marcelo, your patch fixes this bug. With your patch, I ran the loop for
about 3700 iterations and didn't see any host panic. Without your patch, I
could reproduce this bug (host panic) within 300 iterations.
Reported-and-Tested-by: Yongjie Ren yongjie@intel.com



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/18/2013 08:46 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 05:13 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
 On 03/17/2013 11:02 PM, Gleb Natapov wrote:
 On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scale way to invalid all
 mmio sptes - it need not walk any shadow pages and hold mmu-lock

 KVM maintains a global mmio invalid generation-number which is stored in
 kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
 generation-number into his available bits when it is created

 When KVM need zap all mmio sptes, it just simply increase the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
 then it walks the shadow page table and get the mmio spte. If the
 generation-number on the spte does not equal the global 
 generation-number,
 it will go to the normal #PF handler to update the mmio spte

 Since 19 bits are used to store generation-number on mmio spte, the
 generation-number can be round after 33554432 times. It is large enough
 for nearly all most cases, but making the code be more strong, we zap all
 shadow pages when the number is round

 Very nice idea, but why drop Takuya patches instead of using
 kvm_mmu_zap_mmio_sptes() when generation number overflows.

 I am not sure whether it is still needed. Requesting to zap all mmio sptes 
 for
 more than 50 times is really really rare, it nearly does not happen.
 (By the way, 33554432 is wrong in the changelog, i just copy that for my 
 origin
 implantation.) And, after my patch optimizing zapping all shadow pages,
 zap-all-sps should not be a problem anymore since it does not take too 
 much lock
 time.

 Your idea?

 I expect 50 to become less since I already had plans to store some

 Interesting, just curious, what are the plans? ;)

 Currently we uses pio to signal that work is pending to virtio devices. The
 requirement is that signaling should be fast and PIO is fast since there
 is not need to emulate instruction. PCIE though is not really designed
 with PIO in mind, so we will have to use MMIO to do signaling. To avoid
 instruction emulation I thought about making guest access these devices using
 predefined variety of MOV instruction so that emulation can be skipped.
 The idea is to mark mmio spte to know that emulation is not needed.

How to know page-fault is caused by the predefined instruction?

 
 information in mmio spte. Even if all zap-all-sptes becomes faster we
 still needlessly zap all sptes while we can zap only mmio.

 Okay.




 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/mmu.c  |   61 +--
  arch/x86/kvm/mmutrace.h |   17 +++
  arch/x86/kvm/paging_tmpl.h  |7 +++-
  arch/x86/kvm/vmx.c  |4 ++
  arch/x86/kvm/x86.c  |6 +--
  6 files changed, 82 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index ef7f4a5..572398e 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -529,6 +529,7 @@ struct kvm_arch {
  unsigned int n_requested_mmu_pages;
  unsigned int n_max_mmu_pages;
  unsigned int indirect_shadow_pages;
 +unsigned int mmio_invalid_gen;
 Why invalid? Should be mmio_valid_gen or mmio_current_get.

 mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
 so i named it as _invalid_. But mmio_valid_gen is good for me.

 It holds currently valid value though, so calling it invalid is
 confusing.

 I agree.



  struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
  /*
   * Hash table of struct kvm_mmu_page.
 @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
  void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
   struct kvm_memory_slot *slot,
    gfn_t gfn_offset, unsigned long mask);
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
 Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.

 Me too.


  void kvm_mmu_zap_all(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 13626f4..7093a92 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
 unsigned access)
  {
 -u64 mask = generation_mmio_spte_mask(0);
 +unsigned int gen = 

Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 08:46 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
  On 03/18/2013 05:13 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
  On 03/17/2013 11:02 PM, Gleb Natapov wrote:
  On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scale way to invalid 
  all
  mmio sptes - it need not walk any shadow pages and hold mmu-lock
 
  KVM maintains a global mmio invalid generation-number which is stored 
  in
  kvm->arch.mmio_invalid_gen and every mmio spte stores the current 
  global
  generation-number into his available bits when it is created
 
  When KVM need zap all mmio sptes, it just simply increase the global
  generation-number. When guests do mmio access, KVM intercepts a MMIO 
  #PF
  then it walks the shadow page table and get the mmio spte. If the
  generation-number on the spte does not equal the global 
  generation-number,
  it will go to the normal #PF handler to update the mmio spte
 
  Since 19 bits are used to store generation-number on mmio spte, the
  generation-number can be round after 33554432 times. It is large enough
  for nearly all most cases, but making the code be more strong, we zap 
  all
  shadow pages when the number is round
 
  Very nice idea, but why drop Takuya patches instead of using
  kvm_mmu_zap_mmio_sptes() when generation number overflows.
 
  I am not sure whether it is still needed. Requesting to zap all mmio 
  sptes for
  more than 50 times is really really rare, it nearly does not happen.
  (By the way, 33554432 is wrong in the changelog, i just copy that for my 
  origin
  implantation.) And, after my patch optimizing zapping all shadow pages,
  zap-all-sps should not be a problem anymore since it does not take too 
  much lock
  time.
 
  Your idea?
 
  I expect 50 to become less since I already had plans to store some
 
  Interesting, just curious, what are the plans? ;)
 
  Currently we uses pio to signal that work is pending to virtio devices. The
  requirement is that signaling should be fast and PIO is fast since there
  is not need to emulate instruction. PCIE though is not really designed
  with PIO in mind, so we will have to use MMIO to do signaling. To avoid
  instruction emulation I thought about making guest access these devices 
  using
  predefined variety of MOV instruction so that emulation can be skipped.
  The idea is to mark mmio spte to know that emulation is not needed.
 
 How to know page-fault is caused by the predefined instruction?
 
Only predefined phys address ranges will be accessed that way. If the page
fault is in such a range we assume the known instruction is used.

--
Gleb.


Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/18/2013 09:19 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 08:46 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 05:13 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
 On 03/17/2013 11:02 PM, Gleb Natapov wrote:
 On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scale way to invalid 
 all
 mmio sptes - it need not walk any shadow pages and hold mmu-lock

 KVM maintains a global mmio invalid generation-number which is stored 
 in
 kvm->arch.mmio_invalid_gen and every mmio spte stores the current 
 global
 generation-number into his available bits when it is created

 When KVM need zap all mmio sptes, it just simply increase the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO 
 #PF
 then it walks the shadow page table and get the mmio spte. If the
 generation-number on the spte does not equal the global 
 generation-number,
 it will go to the normal #PF handler to update the mmio spte

 Since 19 bits are used to store generation-number on mmio spte, the
 generation-number can be round after 33554432 times. It is large enough
 for nearly all most cases, but making the code be more strong, we zap 
 all
 shadow pages when the number is round

 Very nice idea, but why drop Takuya patches instead of using
 kvm_mmu_zap_mmio_sptes() when generation number overflows.

 I am not sure whether it is still needed. Requesting to zap all mmio 
 sptes for
 more than 50 times is really really rare, it nearly does not happen.
 (By the way, 33554432 is wrong in the changelog, i just copy that for my 
 origin
 implantation.) And, after my patch optimizing zapping all shadow pages,
 zap-all-sps should not be a problem anymore since it does not take too 
 much lock
 time.

 Your idea?

 I expect 50 to become less since I already had plans to store some

 Interesting, just curious, what are the plans? ;)

 Currently we uses pio to signal that work is pending to virtio devices. The
 requirement is that signaling should be fast and PIO is fast since there
 is not need to emulate instruction. PCIE though is not really designed
 with PIO in mind, so we will have to use MMIO to do signaling. To avoid
 instruction emulation I thought about making guest access these devices 
 using
 predefined variety of MOV instruction so that emulation can be skipped.
 The idea is to mark mmio spte to know that emulation is not needed.

 How to know page-fault is caused by the predefined instruction?

 Only predefined phys address ranges will be accessed that way. If the page
 fault is in such a range we assume the known instruction is used.

That means the access can be identified by the gfn, why need cache
other things into mmio spte?



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 09:25:10PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 09:19 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
  On 03/18/2013 08:46 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
  On 03/18/2013 05:13 PM, Gleb Natapov wrote:
  On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
  On 03/17/2013 11:02 PM, Gleb Natapov wrote:
  On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scale way to invalid 
  all
  mmio sptes - it need not walk any shadow pages and hold mmu-lock
 
  KVM maintains a global mmio invalid generation-number which is 
  stored in
  kvm->arch.mmio_invalid_gen and every mmio spte stores the current 
  global
  generation-number into his available bits when it is created
 
  When KVM need zap all mmio sptes, it just simply increase the global
  generation-number. When guests do mmio access, KVM intercepts a MMIO 
  #PF
  then it walks the shadow page table and get the mmio spte. If the
  generation-number on the spte does not equal the global 
  generation-number,
  it will go to the normal #PF handler to update the mmio spte
 
  Since 19 bits are used to store generation-number on mmio spte, the
  generation-number can be round after 33554432 times. It is large 
  enough
  for nearly all most cases, but making the code be more strong, we 
  zap all
  shadow pages when the number is round
 
  Very nice idea, but why drop Takuya patches instead of using
  kvm_mmu_zap_mmio_sptes() when generation number overflows.
 
  I am not sure whether it is still needed. Requesting to zap all mmio 
  sptes for
  more than 50 times is really really rare, it nearly does not 
  happen.
  (By the way, 33554432 is wrong in the changelog, i just copy that for 
  my origin
  implantation.) And, after my patch optimizing zapping all shadow pages,
  zap-all-sps should not be a problem anymore since it does not take too 
  much lock
  time.
 
  Your idea?
 
  I expect 50 to become less since I already had plans to store some
 
  Interesting, just curious, what are the plans? ;)
 
  Currently we uses pio to signal that work is pending to virtio devices. 
  The
  requirement is that signaling should be fast and PIO is fast since there
  is not need to emulate instruction. PCIE though is not really designed
  with PIO in mind, so we will have to use MMIO to do signaling. To avoid
  instruction emulation I thought about making guest access these devices 
  using
  predefined variety of MOV instruction so that emulation can be skipped.
  The idea is to mark mmio spte to know that emulation is not needed.
 
  How to know page-fault is caused by the predefined instruction?
 
  Only predefined phys address ranges will be accessed that way. If the page
  fault is in such a range we assume the known instruction is used.
 
 That means the access can be identified by the gfn, why need cache
 other things into mmio spte?
To not search through memory ranges on each access.

--
Gleb.


Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/18/2013 09:27 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 09:25:10PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 09:19 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 08:46 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
 On 03/18/2013 05:13 PM, Gleb Natapov wrote:
 On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
 On 03/17/2013 11:02 PM, Gleb Natapov wrote:
 On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scale way to invalid 
 all
 mmio sptes - it need not walk any shadow pages and hold mmu-lock

 KVM maintains a global mmio invalid generation-number which is 
 stored in
 kvm->arch.mmio_invalid_gen and every mmio spte stores the current 
 global
 generation-number into his available bits when it is created

 When KVM need zap all mmio sptes, it just simply increase the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO 
 #PF
 then it walks the shadow page table and get the mmio spte. If the
 generation-number on the spte does not equal the global 
 generation-number,
 it will go to the normal #PF handler to update the mmio spte

 Since 19 bits are used to store generation-number on mmio spte, the
 generation-number can be round after 33554432 times. It is large 
 enough
 for nearly all most cases, but making the code be more strong, we 
 zap all
 shadow pages when the number is round

 Very nice idea, but why drop Takuya patches instead of using
 kvm_mmu_zap_mmio_sptes() when generation number overflows.

 I am not sure whether it is still needed. Requesting to zap all mmio 
 sptes for
 more than 50 times is really really rare, it nearly does not 
 happen.
 (By the way, 33554432 is wrong in the changelog, i just copy that for 
 my origin
 implantation.) And, after my patch optimizing zapping all shadow pages,
 zap-all-sps should not be a problem anymore since it does not take too 
 much lock
 time.

 Your idea?

 I expect 50 to become less since I already had plans to store some

 Interesting, just curious, what are the plans? ;)

 Currently we uses pio to signal that work is pending to virtio devices. 
 The
 requirement is that signaling should be fast and PIO is fast since there
 is not need to emulate instruction. PCIE though is not really designed
 with PIO in mind, so we will have to use MMIO to do signaling. To avoid
 instruction emulation I thought about making guest access these devices 
 using
 predefined variety of MOV instruction so that emulation can be skipped.
 The idea is to mark mmio spte to know that emulation is not needed.

 How to know page-fault is caused by the predefined instruction?

 Only predefined phys address ranges will be accessed that way. If the page
 fault is in such a range we assume the known instruction is used.

 That means the access can be identified by the gfn, why need cache
 other things into mmio spte?
 Two not search through memory ranges on each access.

Aha... got it. Thank you! ;)
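
(For reference, a hedged sketch of the generation-tagging idea discussed in
this thread; the bit layout and all names below are illustrative, not the
actual KVM implementation.)

#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout: a few low bits hold access flags, and MMIO_GEN_BITS
 * bits of the global generation are cached in the spte's available bits. */
#define MMIO_GEN_SHIFT	4
#define MMIO_GEN_BITS	19
#define MMIO_GEN_MASK	((1u << MMIO_GEN_BITS) - 1)

static unsigned int mmio_valid_gen;	/* bump it to invalidate all mmio sptes */

static uint64_t make_mmio_spte(unsigned int access)
{
	return ((uint64_t)(mmio_valid_gen & MMIO_GEN_MASK) << MMIO_GEN_SHIFT)
		| access;
}

/* On an MMIO #PF, a cached spte is reusable only while its stored
 * generation matches the global one; otherwise it must be rebuilt. */
static bool mmio_spte_is_current(uint64_t spte)
{
	return ((spte >> MMIO_GEN_SHIFT) & MMIO_GEN_MASK) ==
		(mmio_valid_gen & MMIO_GEN_MASK);
}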



[Bug 55421] New: igb VF can't work in KVM guest

2013-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=55421

   Summary: igb VF can't work in KVM guest
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.9.0-rc1
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: yongjie@intel.com
Regression: No


Environment:

Host OS (ia32/ia32e/IA64):ia32e
Guest OS (ia32/ia32e/IA64):ia32e
Guest OS Type (Linux/Windows):Linux
kvm.git next branch Commit:5da596078f915a62e39a20e582308eab91b88c9a
qemu-kvm uq/master branch Commit:3e41a753551a906dd9ed66fb0fc34167a6af3ba0
Host Kernel Version:3.9.0-rc1
Hardware:SNB-EP, WSM-EP


Bug detailed description:
--
After assigning an igb VF to a guest, or hot-plugging a VF into a guest, the
VF does not work in the guest.
Note:
1. The VF of Intel 82599 NIC (ixgbe) can work in guest
2. The PF of Intel 82576 NIC or Intel I350 NIC (using igb driver) can work in
guest

This should be a kvm bug.
kvm  + qemu-kvm   =  result
5da59607 + 3e41a753   = bad
3ab66e8a + 3e41a753   = good


Reproduce steps:

1.start up a KVM host
2.assign a VF to a guest in qemu commandline: 
qemu-system-x86_64 --enable-kvm -m 1024 -smp 2 -device pci-assign,host=05:10.0
-net none -hda rhel6u3.img

Current result:

the VF cannot work in the guest

Expected result:

the VF can work in the guest



[Bug 55421] igb VF can't work in KVM guest

2013-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=55421





--- Comment #1 from Jay Ren yongjie@intel.com  2013-03-18 15:11:16 ---
Created an attachment (id=95741)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=95741)
host dmesg



[Bug 55421] igb VF can't work in KVM guest

2013-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=55421





--- Comment #2 from Jay Ren yongjie@intel.com  2013-03-18 15:12:02 ---
Created an attachment (id=95751)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=95751)
guest dmesg

in the dmesg in guest:
igbvf :00:03.0: irq 26 for MSI/MSI-X
igbvf :00:03.0: Invalid MAC Address: 00:00:00:00:00:00
igbvf: probe of :00:03.0 failed with error -5



[Bug 55421] igb VF can't work in KVM guest

2013-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=55421





--- Comment #3 from Jay Ren yongjie@intel.com  2013-03-18 15:12:33 ---
Created an attachment (id=95761)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=95761)
lspci info of the igbvf in guest



Re: KVM guest 100% cpu lock up

2013-03-18 Thread Andrew Jones


- Original Message -
 Hello,
 
 I am having an intermittent issue where one of my KVM guests is
 locking up and when checking the process its running @ 100% cpu.
  The host is CentOS 6.4 running the kernel
 kernel-2.6.32-358.0.1.el6.x86_64.
 
 Would it be worth attempting to compile a 3.2 kernel and use the
 latest qemu-kvm package instead ?
 

Can you please supply the full qemu command line?

thanks,
drew


 Thanks.


Re: [PATCH] KVM/MIPS32: Sync up with latest KVM API changes

2013-03-18 Thread Sanjay Lal

On Mar 18, 2013, at 12:04 PM, Jonas Gorski wrote:

 On 15 March 2013 03:09, Sanjay Lal sanj...@kymasys.com wrote:
 - Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
 - Fix kvm_arch_{prepare,commit}_memory_region()
 - Also remove kvm_arch_set_memory_region which was unused.
 
 I just stumbled upon the build issue caused by the first item and can
 confirm that this patch fixes it.
 
 I guess this is a replacement for KVM/MIPS32: define
 KVM_USER_MEM_SLOTS (which doesn't apply anyway because of missing
 spaces)? If yes, then you should mark the old one as superseded in
 http://patchwork.linux-mips.org/. And maybe include here that it fixes
 a build issue.
 
 
 Regards
 Jonas
 

Hi Jonas, will do.

Regards and thanks
Sanjay



Re: KVM guest 100% cpu lock up

2013-03-18 Thread Phil Daws

- Original Message -
From: Andrew Jones drjo...@redhat.com
To: Phil Daws ux...@splatnix.net
Cc: kvm@vger.kernel.org
Sent: Monday, 18 March, 2013 4:08:48 PM
Subject: Re: KVM guest 100% cpu lock up



- Original Message -
 Hello,
 
 I am having an intermittent issue where one of my KVM guests is
 locking up and when checking the process it's running @ 100% cpu.
  The host is CentOS 6.4 running the kernel
 kernel-2.6.32-358.0.1.el6.x86_64.
 
 Would it be worth attempting to compile a 3.2 kernel and use the
 latest qemu-kvm package instead ?
 

Can you please supply the full qemu command line?

thanks,
drew


 Thanks.
 --

qemu  5919 22.5 22.3 2705424 1798484 ? Sl   15:23   4:44 
/usr/libexec/qemu-kvm -name vs2 -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 
2,sockets=2,cores=1,threads=1 -uuid 38fb2902-06a5-0781-f14a-f17bf96668f1 
-nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vs2.monitor,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/images/vs2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device 
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:02:01:04,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device 
intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6


Re: [PATCH] KVM/MIPS32: Sync up with latest KVM API changes

2013-03-18 Thread Jonas Gorski
On 15 March 2013 03:09, Sanjay Lal sanj...@kymasys.com wrote:
 - Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
 - Fix kvm_arch_{prepare,commit}_memory_region()
 - Also remove kvm_arch_set_memory_region which was unused.

I just stumbled upon the build issue caused by the first item and can
confirm that this patch fixes it.

I guess this is a replacement for KVM/MIPS32: define
KVM_USER_MEM_SLOTS (which doesn't apply anyway because of missing
spaces)? If yes, then you should mark the old one as superseded in
http://patchwork.linux-mips.org/. And maybe include here that it fixes
a build issue.


Regards
Jonas


Re: [PATCH] KVM: VMX: Require KVM_SET_TSS_ADDR being called prior to running a VCPU

2013-03-18 Thread Marcelo Tosatti
On Fri, Mar 15, 2013 at 08:38:56AM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Very old user space (namely qemu-kvm before kvm-49) didn't set the TSS
 base before running the VCPU. We always warned about this bug, but no
 reports about users actually seeing this are known. Time to finally
 remove the workaround that effectively prevented to call vmx_vcpu_reset
 while already holding the KVM srcu lock.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 This obsoletes all other attempts to allow vmx_vcpu_reset to be called
 with the srcu lock held. Hope we can settle the topic this way.

Applied, thanks.



KVM: x86: fix deadlock in clock-in-progress request handling

2013-03-18 Thread Marcelo Tosatti

There is a deadlock in pvclock handling:

cpu0:                                         cpu1:
kvm_gen_update_masterclock()
                                              kvm_guest_time_update()
 spin_lock(pvclock_gtod_sync_lock)
                                               local_irq_save(flags)
                                               spin_lock(pvclock_gtod_sync_lock)
 kvm_make_mclock_inprogress_request(kvm)
  make_all_cpus_request()
   smp_call_function_many()

Now if smp_call_function_many(), called by cpu0, tries to call a function on
cpu1, there will be a deadlock.

Fix by moving pvclock_gtod_sync_lock protected section outside irq
disabled section.

Analyzed by Gleb Natapov g...@redhat.com
Reported-and-Tested-by: Yongjie Ren yongjie@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f71500a..f7c850b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1416,15 +1416,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
kernel_ns = 0;
host_tsc = 0;
 
-   /* Keep irq disabled to prevent changes to the clock */
-   local_irq_save(flags);
-   this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
-   if (unlikely(this_tsc_khz == 0)) {
-   local_irq_restore(flags);
-   kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
-   return 1;
-   }
-
/*
 * If the host uses TSC clock, then passthrough TSC as stable
 * to the guest.
@@ -1436,6 +1427,15 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
	kernel_ns = ka->master_kernel_ns;
	}
	spin_unlock(&ka->pvclock_gtod_sync_lock);
+
+   /* Keep irq disabled to prevent changes to the clock */
+   local_irq_save(flags);
+   this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
+   if (unlikely(this_tsc_khz == 0)) {
+   local_irq_restore(flags);
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
+   return 1;
+   }
if (!use_master_clock) {
host_tsc = native_read_tsc();
kernel_ns = get_kernel_ns();
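
The ordering rule the fix enforces, shown as a hedged standalone sketch
(illustrative function and lock names, not the kernel code): never spin on a
lock with interrupts disabled when the lock holder may be waiting for an IPI
to this CPU.

#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(sync_lock);	/* stand-in for pvclock_gtod_sync_lock */

static void time_update_sketch(void)
{
	unsigned long flags;

	/* Take the contended lock first, with interrupts still enabled,
	 * so an IPI from the current lock holder can be serviced. */
	spin_lock(&sync_lock);
	/* ... copy the master clock state ... */
	spin_unlock(&sync_lock);

	/* Only now close the irq window for the atomic TSC/ns sampling. */
	local_irq_save(flags);
	/* ... read the TSC and kernel_ns together ... */
	local_irq_restore(flags);
}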


Re: [PULL] vhost: tcm_vhost fixes for 3.9

2013-03-18 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Mon, 18 Mar 2013 13:20:03 +0200

 The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:
 
  Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP
  unconditionally" (2013-03-16 23:00:41 -0400)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
 
 for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
 
  tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)

This is a scsi driver, I therefore don't think this pull request is for me.

Please avoid such confusion in the future, and don't use branch names
like vhost-net for SCSI driver fixes.

Thanks.


Re: [PATCH 22/29] arm64: KVM: define 32bit specific registers

2013-03-18 Thread Christopher Covington
Hi Marc,

On 03/04/2013 10:47 PM, Marc Zyngier wrote:
 Define the 32bit specific registers (SPSRs, cp15...).
 
 Most CPU registers are directly mapped to a 64bit register
 (r0-x0...). Only the SPSRs have separate registers.
 
 cp15 registers are also mapped into their 64bit counterpart in most
 cases.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/kvm_asm.h  | 38 +-

[...]

 +/* 32bit mapping */
 +#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
 +#define c0_CSSELR(CSSELR_EL1 * 2)/* Cache Size Selection Register */
 +#define c1_SCTLR (SCTLR_EL1 * 2) /* System Control Register */
 +#define c1_ACTLR (ACTLR_EL1 * 2) /* Auxilliary Control Register */

Auxiliary

 +#define c1_CPACR (CPACR_EL1 * 2) /* Coprocessor Access Control */
 +#define c2_TTBR0 (TTBR0_EL1 * 2) /* Translation Table Base Register 0 */
 +#define c2_TTBR0_high(c2_TTBR0 + 1)  /* TTBR0 top 32 bits */
 +#define c2_TTBR1 (TTBR1_EL1 * 2) /* Translation Table Base Register 1 */
 +#define c2_TTBR1_high(c2_TTBR1 + 1)  /* TTBR1 top 32 bits */
 +#define c2_TTBCR (TCR_EL1 * 2)   /* Translation Table Base Control R. */
 +#define c3_DACR  (DACR32_EL2 * 2)/* Domain Access Control Register */
 +#define c5_DFSR  (ESR_EL1 * 2)   /* Data Fault Status Register */
 +#define c5_IFSR  (IFSR32_EL2 * 2)/* Instruction Fault Status Register */
 +#define c5_ADFSR (AFSR0_EL1 * 2) /* Auxilary Data Fault Status R */
 +#define c5_AIFSR (AFSR1_EL1 * 2) /* Auxilary Instr Fault Status R */

Auxiliary

 +#define c6_DFAR  (FAR_EL1 * 2)   /* Data Fault Address Register */
 +#define c6_IFAR  (c6_DFAR + 1)   /* Instruction Fault Address Register */
 +#define c10_PRRR (MAIR_EL1 * 2)  /* Primary Region Remap Register */
 +#define c10_NMRR (c10_PRRR + 1)  /* Normal Memory Remap Register */
 +#define c12_VBAR (VBAR_EL1 * 2)  /* Vector Base Address Register */
 +#define c13_CID  (CONTEXTIDR_EL1 * 2)/* Context ID Register */
 +#define c13_TID_URW  (TPIDR_EL0 * 2) /* Thread ID, User R/W */
 +#define c13_TID_URO  (TPIDRRO_EL0 * 2)/* Thread ID, User R/O */
 +#define c13_TID_PRIV (TPIDR_EL1 * 2) /* Thread ID, Priveleged */

Privileged

 +#define c10_AMAIR(AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
 +#define c14_CNTKCTL  (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
 +#define NR_CP15_REGS (NR_SYS_REGS * 2)

[...]

Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
the Linux Foundation


Re: [PATCH 24/29] arm64: KVM: 32bit conditional execution emulation

2013-03-18 Thread Christopher Covington
Hi Marc,

On 03/04/2013 10:47 PM, Marc Zyngier wrote:
 As conditionnal instructions can trap on AArch32, add the thinest

conditional

 possible emulation layer to keep 32bit guests happy.

[...]

 diff --git a/arch/arm64/kvm/emulate.c b/arch/arm64/kvm/emulate.c
 new file mode 100644
 index 000..6b3dbc3
 --- /dev/null
 +++ b/arch/arm64/kvm/emulate.c
 @@ -0,0 +1,154 @@
 +/*
 + * (not much of an) Emulation layer for 32bit guests.
 + *
 + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 + * Author: Christoffer Dall c.d...@virtualopensystems.com
 + *
 + * This program is free software: you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#include <linux/kvm_host.h>
 +#include <asm/kvm_emulate.h>
 +
 +/*
 + * stolen from arch/arm/kernel/opcodes.c
 + *
 + * condition code lookup table
 + * index into the table is test code: EQ, NE, ... LT, GT, AL, NV
 + *
 + * bit position in short is condition code: NZCV
 + */
 +static const unsigned short cc_map[16] = {
 + 0xF0F0, /* EQ == Z set*/
 + 0x0F0F, /* NE */
 + 0x, /* CS == C set*/
 + 0x, /* CC */
 + 0xFF00, /* MI == N set*/
 + 0x00FF, /* PL */
 + 0x, /* VS == V set*/
 + 0x, /* VC */
 + 0x0C0C, /* HI == C set && Z clear */
 + 0xF3F3, /* LS == C clear || Z set */
 + 0xAA55, /* GE == (N==V)   */
 + 0x55AA, /* LT == (N!=V)   */
 + 0x0A05, /* GT == (!Z && (N==V))   */
 + 0xF5FA, /* LE == (Z || (N!=V))*/
 + 0x, /* AL always  */
 + 0   /* NV */
 +};
 +
 +static int kvm_vcpu_get_condition(struct kvm_vcpu *vcpu)
 +{
 + u32 esr = kvm_vcpu_get_hsr(vcpu);
 +
 + if (esr & ESR_EL2_CV)
 + return (esr & ESR_EL2_COND) >> ESR_EL2_COND_SHIFT;
 +
 + return -1;
 +}
 +
 +/*
 + * Check if a trapped instruction should have been executed or not.
 + */
 +bool kvm_condition_valid32(struct kvm_vcpu *vcpu)
 +{
 + unsigned long cpsr;
 + u32 cpsr_cond;
 + int cond;
 +
 + /* Top two bits non-zero?  Unconditional. */
 + if (kvm_vcpu_get_hsr(vcpu) >> 30)
 + return true;
 +
 + /* Is condition field valid? */
 + cond = kvm_vcpu_get_condition(vcpu);
 + if (cond == 0xE)
 + return true;
 +
 + cpsr = *vcpu_cpsr(vcpu);
 +
 + if (cond < 0) {
 + /* This can happen in Thumb mode: examine IT state. */
 + unsigned long it;
 +
 + it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
 +
 + /* it == 0 => unconditional. */
 + if (it == 0)
 + return true;
 +
 + /* The cond for this insn works out as the top 4 bits. */
 + cond = (it >> 4);
 + }
 +
 + cpsr_cond = cpsr >> 28;
 +
 + if (!((cc_map[cond] >> cpsr_cond) & 1))
 + return false;
 +
 + return true;
 +}
 +
 +/**
 + * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
 + * @vcpu:The VCPU pointer
 + *
 + * When exceptions occur while instructions are executed in Thumb IF-THEN
 + * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
 + * to do this little bit of work manually. The fields map like this:
 + *
 + * IT[7:0] -> CPSR[26:25],CPSR[15:10]
 + */
 +static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
 +{
 + unsigned long itbits, cond;
 + unsigned long cpsr = *vcpu_cpsr(vcpu);
 + bool is_arm = !(cpsr & COMPAT_PSR_T_BIT);
 +
 + BUG_ON(is_arm && (cpsr & COMPAT_PSR_IT_MASK));
 +
 + if (!(cpsr & COMPAT_PSR_IT_MASK))
 + return;
 +
 + cond = (cpsr & 0xe000) >> 13;
 + itbits = (cpsr & 0x1c00) >> (10 - 2);
 + itbits |= (cpsr & (0x3 << 25)) >> 25;
 +
 + /* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
 + if ((itbits & 0x7) == 0)
 + itbits = cond = 0;
 + else
 + itbits = (itbits << 1) & 0x1f;
 +
 + cpsr &= ~COMPAT_PSR_IT_MASK;
 + cpsr |= cond << 13;
 + cpsr |= (itbits & 0x1c) << (10 - 2);
 + cpsr |= (itbits & 0x3) << 25;
 + *vcpu_cpsr(vcpu) = cpsr;
 +}

Maybe I'm spoiled by the breadth of the 64-bit 

Re: KVM: x86: fix deadlock in clock-in-progress request handling

2013-03-18 Thread Gleb Natapov
On Mon, Mar 18, 2013 at 01:54:32PM -0300, Marcelo Tosatti wrote:
 
 There is a deadlock in pvclock handling:
 
 cpu0:                                         cpu1:
 kvm_gen_update_masterclock()
                                               kvm_guest_time_update()
  spin_lock(pvclock_gtod_sync_lock)
                                                local_irq_save(flags)
                                                spin_lock(pvclock_gtod_sync_lock)
  kvm_make_mclock_inprogress_request(kvm)
   make_all_cpus_request()
    smp_call_function_many()
 
 Now if smp_call_function_many(), called by cpu0, tries to call a function on
 cpu1, there will be a deadlock.
 
 Fix by moving pvclock_gtod_sync_lock protected section outside irq
 disabled section.
 
 Analyzed by Gleb Natapov g...@redhat.com
 Reported-and-Tested-by: Yongjie Ren yongjie@intel.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
Acked-by: Gleb Natapov g...@redhat.com

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index f71500a..f7c850b 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1416,15 +1416,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
   kernel_ns = 0;
   host_tsc = 0;
  
 - /* Keep irq disabled to prevent changes to the clock */
 - local_irq_save(flags);
 - this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
 - if (unlikely(this_tsc_khz == 0)) {
 - local_irq_restore(flags);
 - kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 - return 1;
 - }
 -
   /*
* If the host uses TSC clock, then passthrough TSC as stable
* to the guest.
 @@ -1436,6 +1427,15 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
   kernel_ns = ka->master_kernel_ns;
   }
   spin_unlock(&ka->pvclock_gtod_sync_lock);
 +
 + /* Keep irq disabled to prevent changes to the clock */
 + local_irq_save(flags);
 + this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
 + if (unlikely(this_tsc_khz == 0)) {
 + local_irq_restore(flags);
 + kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 + return 1;
 + }
   if (!use_master_clock) {
   host_tsc = native_read_tsc();
   kernel_ns = get_kernel_ns();

--
Gleb.


Re: [PATCH 28/29] arm64: KVM: 32bit guest fault injection

2013-03-18 Thread Christopher Covington
Hi Marc,

Here are a few more preprocessor definition suggestions.

On 03/04/2013 10:47 PM, Marc Zyngier wrote:
 Add fault injection capability for 32bit guests.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/kvm/inject_fault.c | 79 ++-
  1 file changed, 78 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
 index 80b245f..85a4548 100644
 --- a/arch/arm64/kvm/inject_fault.c
 +++ b/arch/arm64/kvm/inject_fault.c
 @@ -1,5 +1,5 @@
  /*
 - * Fault injection for 64bit guests.
 + * Fault injection for both 32 and 64bit guests.
   *
   * Copyright (C) 2012 - ARM Ltd
   * Author: Marc Zyngier marc.zyng...@arm.com
 @@ -24,6 +24,74 @@
  #include linux/kvm_host.h
  #include asm/kvm_emulate.h
  
 +static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 +{
 + unsigned long cpsr;
 + unsigned long new_spsr_value = *vcpu_cpsr(vcpu);
 + bool is_thumb = (new_spsr_value & COMPAT_PSR_T_BIT);
 + u32 return_offset = (is_thumb) ? 4 : 0;
 + u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
 +
 + cpsr = mode | COMPAT_PSR_I_BIT;
 +
 + if (sctlr & (1 << 30))

COMPAT_SCTLR_TE?

 + cpsr |= COMPAT_PSR_T_BIT;
 + if (sctlr & (1 << 25))

SCTLR_EL2_EE

 + cpsr |= COMPAT_PSR_E_BIT;
 +
 + *vcpu_cpsr(vcpu) = cpsr;
 +
 + /* Note: These now point to the banked copies */
 + *vcpu_spsr(vcpu) = new_spsr_value;
 + *vcpu_reg(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
 +
 + /* Branch to exception vector */
 + if (sctlr & (1 << 13))

COMPAT_SCTLR_V?

 + vect_offset += 0xffff0000;
 + else /* always have security exceptions */
 + vect_offset += vcpu->arch.cp15[c12_VBAR];
 +
 + *vcpu_pc(vcpu) = vect_offset;
 +}
 +
 +static void inject_undef32(struct kvm_vcpu *vcpu)
 +{
 + prepare_fault32(vcpu, COMPAT_PSR_MODE_UND, 4);
 +}
 +
 +/*
 + * Modelled after TakeDataAbortException() and TakePrefetchAbortException
 + * pseudocode.
 + */
 +static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 +  unsigned long addr)
 +{
 + u32 vect_offset;
 + u32 *far, *fsr;
 + bool is_lpae;
 +
 + if (is_pabt) {
 + vect_offset = 12;
 + far = &vcpu->arch.cp15[c6_IFAR];
 + fsr = &vcpu->arch.cp15[c5_IFSR];
 + } else { /* !iabt */
 + vect_offset = 16;
 + far = &vcpu->arch.cp15[c6_DFAR];
 + fsr = &vcpu->arch.cp15[c5_DFSR];
 + }
 +
 + prepare_fault32(vcpu, COMPAT_PSR_MODE_ABT | COMPAT_PSR_A_BIT, vect_offset);
 +
 + *far = addr;
 +
 + /* Always give debug fault for now - should give guest a clue */
 + is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);

COMPAT_TTBCR_EAE?

 + if (is_lpae)
 + *fsr = 1 << 9 | 0x22;

COMPAT_FSR_LPAE, COMPAT_FSR_LD_FS_DBG?

 + else
 + *fsr = 2;

COMPAT_FSR_SD_FS_DBG?

[...]

Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
the Linux Foundation


Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages

2013-03-18 Thread Marcelo Tosatti
On Wed, Mar 13, 2013 at 12:59:12PM +0800, Xiao Guangrong wrote:
 The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
 walk and zap all shadow pages one by one, also it need to zap all guest
 page's rmap and all shadow page's parent spte list. Particularly, things
 become worse if guest uses more memory or vcpus. It is not good for
 scalability.
 
 Since all shadow page will be zapped, we can directly zap the mmu-cache
 and rmap so that vcpu will fault on the new mmu-cache, after that, we can
 directly free the memory used by old mmu-cache.
 
 The root shadow page is little especial since they are currently used by
 vcpus, we can not directly free them. So, we zap the root shadow pages and
 re-add them into the new mmu-cache.
 
 After this patch, kvm_mmu_zap_all can be faster 113% than before
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   62 ++-
  1 files changed, 56 insertions(+), 6 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index e326099..536d9ce 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -4186,18 +4186,68 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 
  void kvm_mmu_zap_all(struct kvm *kvm)
  {
 - struct kvm_mmu_page *sp, *node;
 + LIST_HEAD(root_mmu_pages);
   LIST_HEAD(invalid_list);
 + struct list_head pte_list_descs;
 + struct kvm_mmu_cache *cache = &kvm->arch.mmu_cache;
 + struct kvm_mmu_page *sp, *node;
 + struct pte_list_desc *desc, *ndesc;
 + int root_sp = 0;
 
   spin_lock(&kvm->mmu_lock);
 +
  restart:
 - list_for_each_entry_safe(sp, node,
 -   &kvm->arch.mmu_cache.active_mmu_pages, link)
 - if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
 - goto restart;
 + /*
 +  * The root shadow pages are being used on vcpus and can not be
 +  * directly removed; we filter them out and re-add them to the
 +  * new mmu cache.
 +  */
 + list_for_each_entry_safe(sp, node, &cache->active_mmu_pages, link)
 + if (sp->root_count) {
 + int ret;
 +
 + root_sp++;
 + ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 + list_move(&sp->link, &root_mmu_pages);
 + if (ret)
 + goto restart;
 + }
 +
 + list_splice(&cache->active_mmu_pages, &invalid_list);
 + list_replace(&cache->pte_list_descs, &pte_list_descs);
 +
 + /*
 +  * Reset the mmu cache so that later vcpu will fault on the new
 +  * mmu cache.
 +  */
 + memset(cache, 0, sizeof(*cache));
 + kvm_mmu_init(kvm);

Xiao,

I suppose zeroing of kvm_mmu_cache can be avoided, if the links are
removed at prepare_zap_page. So perhaps

- spin_lock(mmu_lock)
- for each page
- zero sp->spt[], remove page from linked lists
- flush remote TLB (batched)
- spin_unlock(mmu_lock)
- free data (which is safe because freeing has its own serialization)
- spin_lock(mmu_lock)
- account for the pages freed
- spin_unlock(mmu_lock)

(or if you think of some other way to not have the mmu_cache zeroing step).

Note the account for pages freed step after pages are actually
freed: as discussed with Takuya, having pages freed and freed page
accounting out of sync across mmu_lock is potentially problematic:
kvm->arch.n_used_mmu_pages and friends do not reflect reality, which can
cause problems for SLAB freeing and page allocation throttling.
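
In pseudo-C, the sequence above would be roughly the following;
zap_one_page(), free_page_list() and account_freed() are made-up helper
names, used only to make the lock boundaries concrete:

    static void kvm_mmu_zap_all_sketch(struct kvm *kvm)
    {
        LIST_HEAD(freed);
        struct kvm_mmu_page *sp, *node;
        int nr = 0;

        spin_lock(&kvm->mmu_lock);
        list_for_each_entry_safe(sp, node,
                                 &kvm->arch.mmu_cache.active_mmu_pages, link) {
            zap_one_page(kvm, sp);      /* zero sp->spt[], unlink from lists */
            list_move(&sp->link, &freed);
            nr++;
        }
        kvm_flush_remote_tlbs(kvm);     /* one batched remote TLB flush */
        spin_unlock(&kvm->mmu_lock);

        free_page_list(&freed);         /* freeing has its own serialization */

        spin_lock(&kvm->mmu_lock);
        account_freed(kvm, nr);         /* accounting lags the actual free */
        spin_unlock(&kvm->mmu_lock);
    }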



Re: [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command

2013-03-18 Thread Alex Williamson
On Sat, 2013-03-16 at 06:37 +0100, Benjamin Herrenschmidt wrote:
 On Sat, 2013-03-16 at 09:34 +0800, Gavin Shan wrote:
  Could you explain further how this will be used?  How the device is
  exposed to a guest is entirely a userspace construct, so why does vfio
  need to know or care about this?  I had assumed for AER that QEMU would
  do the translation from host to guest address space.
  
  
  The weak IOCTL function (vfio_pci_arch_ioctl) was introduced by the
  previous patch. The PowerNV platform is going to override it to figure
  out the information for the EEH core to use. On the other hand, QEMU
  will run into the IOCTL command while opening (creating) one VFIO
  device.
  
  Though I'm not familiar with AER very much, AER is quite different from
  EEH. The EEH functionality is implemented in the PHB instead of in the
  PCI device core. So we don't care about AER stuff in EEH directly :-)
 
 To give Alex a bit more background...
 
 EEH is our IBM specific error handling facility which is a superset of AER.
 
 IE. In addition to AER's error detection and logging, it adds a layer of
 error detection at the host bridge level (such as iommu violations etc...)
 and a mechanism for handling and recovering from errors. This is tied to
 our iommu domain stuff (our PE's) and our device freezing capability
 among others.
 
 With VFIO + KVM, we want to implement most of the EEH support for guests in
 the host kernel. The reason is multipart and we can discuss this separately
 as some of it might well be debatable (mostly it's more convenient that way
 because we hook into the underlying HW/FW EEH which isn't directly userspace
 accessible so we don't have to add a new layer of kernel <-> user API in
 addition to the VFIO stuff), but there's at least one aspect of it that drives
 this requirement more strongly which is performance:
 
 When EEH is enabled, whenever any MMIO returns all 1's, the kernel will do
 a firmware call to query the EEH state of the device and check whether it
 has been frozen. On some devices, that can be a performance issue, and
 going all the way to qemu for that would be horribly expensive.
 
 So we want at least a way to handle that call in the kernel and for that we
 need at least some way of mapping things there.
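
(For context, the host-side check being described looks roughly like the
powerpc eeh_readl()/eeh_check_failure() pattern sketched here; this is an
illustration of the mechanism, not the literal host code:)

    static inline u32 eeh_readl_sketch(const volatile void __iomem *addr)
    {
        u32 val = readl(addr);

        /* All 1's may mean a frozen PE rather than real data, so query
         * the EEH state via a firmware call (potentially slow). */
        if (val == 0xffffffff)
            return eeh_check_failure(addr, val);
        return val;
    }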

There's no notification mechanism when a PHB is frozen?  I suppose
notification would be asynchronous so you risk data for every read that
happens in the interim.  So the choices are a) tell the host kernel the
mapping, b) tell the guest kernel the mapping, c) identity mapping, or
d) qemu intercept?

Presumably your firmware call to query the EEH is not going through
VFIO, so is VFIO the appropriate place to setup this mapping?  As you
say, this seems like just a convenient place to put it even though it
really has nothing to do with the VFIO kernel component.  QEMU has this
information and could register it with the host kernel through other
means if available.  Maybe the mapping should be registered with KVM if
that's how the EEH data is accessed.  I'm not yet sold on why this
mapping is registered here.  Thanks,

Alex




Re: [PULL] vhost: tcm_vhost fixes for 3.9

2013-03-18 Thread Michael S. Tsirkin
On Mon, Mar 18, 2013 at 02:10:03PM -0700, Nicholas A. Bellinger wrote:
 On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
  On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
   From: Michael S. Tsirkin m...@redhat.com
   Date: Mon, 18 Mar 2013 13:20:03 +0200
   
The following changes since commit 
8c6216d7f118a128678270824b6a1286a63863ca:

  Revert ip_gre: make ipgre_tunnel_xmit() not parse network header as 
IP unconditionally (2013-03-16 23:00:41 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:

  tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 
13:04:14 +0200)
   
   This is a scsi driver, I therefore don't think this pull request is for 
   me.
   
   Please avoid such confusion in the future, and don't use branch names
   like vhost-net for SCSI driver fixes.
   
   Thanks.
  
  We are sharing quite a bit of code, so it's a common tree.
  There was a vhost-net patch there too but you picked it up
  directly so I dropped that and what's left is vhost-scsi.
  Typically vhost-net dominates so I'm just used to merging all vhost
  things through you but if you think it's wrong I can send it directly to
  Linus.
  
 
 I'll be sending out a target-pending 3.9-rc-fixes pull request later in
 the week, and am happy to include this series if you'd like.
 
 --nab

Sure, go ahead. You can add
Acked-by: Michael S. Tsirkin m...@redhat.com
to the patches.

-- 
MST


Re: [PATCH 3/3] VFIO: Direct access config reg without capability

2013-03-18 Thread Alex Williamson
On Sat, 2013-03-16 at 06:30 +0100, Benjamin Herrenschmidt wrote:
 On Fri, 2013-03-15 at 13:41 -0600, Alex Williamson wrote:
  
  This basically gives userspace free access to any regions that aren't
  covered by known capabilities. 
 
 And ?
 
 I mean seriously :-) We already had that discussion ... trying to
 protect config space is just plain doomed. There is no point.
 
 It makes sense to do things like emulate BARs etc... for things to
 function properly under some circumstances/setups where you can't just
 expose the original BAR values to the guest and have the HW take care of
 it but you *must* be prepared to deal with anything in config space
 being changed without you knowing about it.
 
 Devices *will* have backdoors via MMIO. Period. You cannot rely on those
 not existing, whether they are documented or not.
 
 If you can't cope with the config space accesses then you aren't
 properly isolated. It can be deemed acceptable (depends what you use
 your VMs for) but that I mean is that any config space
 filtering/emulation for the sake of security is ... pointless.
 
 Doing it for functionality to work at all (ie BAR emulation) is fine,
 but that's about it. IE. As a mean of security it's pointless.
 
 
   We have no idea what this might expose on some devices.
 
 No more than we have any idea what MMIO mapping of the device register
 space exposes :-)

Yeah, yeah.  Ok, I can't come up with a reasonable argument otherwise,
it'll give us better device support, and I believe pci-assign has always
done this.  I'll take another look at the patch.  Thanks,

Alex





Re: [PULL] vhost: tcm_vhost fixes for 3.9

2013-03-18 Thread Nicholas A. Bellinger
On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
 On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
  From: Michael S. Tsirkin m...@redhat.com
  Date: Mon, 18 Mar 2013 13:20:03 +0200
  
   The following changes since commit 
   8c6216d7f118a128678270824b6a1286a63863ca:
   
 Revert ip_gre: make ipgre_tunnel_xmit() not parse network header as IP 
   unconditionally (2013-03-16 23:00:41 -0400)
   
   are available in the git repository at:
   
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
   
   for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
   
 tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 
   +0200)
  
  This is a scsi driver, I therefore don't think this pull request is for me.
  
  Please avoid such confusion in the future, and don't use branch names
  like vhost-net for SCSI driver fixes.
  
  Thanks.
 
 We are sharing quite a bit of code, so it's a common tree.
 There was a vhost-net patch there too but you picked it up
 directly so I dropped that and what's left is vhost-scsi.
 Typically vhost-net dominates so I'm just used to merging all vhost
 things through you but if you think it's wrong I can send it directly to
 Linus.
 

I'll be sending out a target-pending 3.9-rc-fixes pull request later in
the week, and am happy to include this series if you'd like.

--nab



Re: [PATCH 0/2] Fix booting tcm_vhost + seabios

2013-03-18 Thread Nicholas A. Bellinger
On Fri, 2013-03-15 at 09:45 +0800, Asias He wrote:
 Asias He (2):
   virtio-scsi: Set _DRIVER_OK flag before scsi target scanning
   virtio-scsi: Pack struct virtio_scsi_{req_cmd,resp_cmd}
 
  src/virtio-scsi.c | 5 +++--
  src/virtio-scsi.h | 4 ++--
  2 files changed, 5 insertions(+), 4 deletions(-)
 

Hi Asias,

Thanks for taking the initiative on this, and nice work tracking both of
these stubborn bugs down.

So with these out of the way, we're good to go for an RFC of Paolo's
vhost-scsi-pci code for upstream QEMU, yes..?

I'll have some extra bandwidth this week to spend time on the RFC if
you'd like, otherwise I'm happy with you making the upstream QEMU push
for Paolo's code. 

Whatever works best for you.  :)

Thank you,

--nab





Re: [PULL] vhost: tcm_vhost fixes for 3.9

2013-03-18 Thread Nicholas A. Bellinger
On Mon, 2013-03-18 at 23:14 +0200, Michael S. Tsirkin wrote:
 On Mon, Mar 18, 2013 at 02:10:03PM -0700, Nicholas A. Bellinger wrote:
  On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
   On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
From: Michael S. Tsirkin m...@redhat.com
Date: Mon, 18 Mar 2013 13:20:03 +0200

 The following changes since commit 
 8c6216d7f118a128678270824b6a1286a63863ca:
 
   Revert ip_gre: make ipgre_tunnel_xmit() not parse network header 
 as IP unconditionally (2013-03-16 23:00:41 -0400)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
 vhost-net
 
 for you to fetch changes up to 
 deb7cb067dabd3be625eaf5495da8bdc97377fc1:
 
   tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 
 13:04:14 +0200)

This is a scsi driver, I therefore don't think this pull request is for 
me.

Please avoid such confusion in the future, and don't use branch names
like vhost-net for SCSI driver fixes.

Thanks.
   
   We are sharing quite a bit of code, so it's a common tree.
   There was a vhost-net patch there too but you picked it up
   directly so I dropped that and what's left is vhost-scsi.
   Typically vhost-net dominates so I'm just used to merging all vhost
   things through you but if you think it's wrong I can send it directly to
   Linus.
   
  
  I'll be sending out a target-pending 3.9-rc-fixes pull request later in
  the week, and am happy to include this series if you'd like.
  
  --nab
 
 Sure, go ahead. You can add
 Acked-by: Michael S. Tsirkin m...@redhat.com
 to the patches.
 

Applied to target-pending/master.

Thanks MST  Asias!



Re: [PATCH V3 0/2] tcm_vhost hotplug/hotunplug support

2013-03-18 Thread Nicholas A. Bellinger
On Tue, 2013-03-12 at 10:45 +0800, Asias He wrote:
 Changes in v3:
 - Separate the bug fix to another thread
 
 Changes in v2:
 - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
 - Fix racing of vs_events_nr
 - Add flush fix patch to this series
 
 
 Asias He (2):
   tcm_vhost: Introduce tcm_vhost_check_feature()
   tcm_vhost: Add hotplug/hotunplug support
 
  drivers/vhost/tcm_vhost.c | 224 
 --
  drivers/vhost/tcm_vhost.h |  10 +++
  2 files changed, 229 insertions(+), 5 deletions(-)
 

MST, are you OK with these two being applied to target-pending/for-next
now..?

--nab







Re: [PATCH V3 0/2] tcm_vhost hotplug/hotunplug support

2013-03-18 Thread Michael S. Tsirkin
On Mon, Mar 18, 2013 at 02:46:12PM -0700, Nicholas A. Bellinger wrote:
 On Tue, 2013-03-12 at 10:45 +0800, Asias He wrote:
  Changes in v3:
  - Separate the bug fix to another thread
  
  Changes in v2:
  - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
  - Fix racing of vs_events_nr
  - Add flush fix patch to this series
  
  
  Asias He (2):
tcm_vhost: Introduce tcm_vhost_check_feature()
tcm_vhost: Add hotplug/hotunplug support
  
   drivers/vhost/tcm_vhost.c | 224 
  --
   drivers/vhost/tcm_vhost.h |  10 +++
   2 files changed, 229 insertions(+), 5 deletions(-)
  
 
 MST, are you OK with these two being applied to target-pending/for-next
 now..?
 
 --nab

Sorry, no, I'd prefer we get userspace support in qemu first.
If there's only a single user for this driver (kvmtool),
then it was a mistake to merge it, the right thing would be to freeze it
and look at whether we can drop it completely.

I really don't want this, I hope it will be merged and all will
be shiny, but I think it's wrong to rush in more code,
we have some time before the next merge window.

-- 
MST


Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Eric Northup
On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 This patch tries to introduce a very simple and scalable way to
 invalidate all mmio sptes - it need not walk any shadow pages or hold
 mmu-lock
 
 KVM maintains a global mmio invalid generation-number which is stored
 in kvm->arch.mmio_invalid_gen, and every mmio spte stores the current
 global generation-number into its available bits when it is created
 
 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO #PF,
 then walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it goes to the normal #PF handler to update the mmio spte
 
 Since 19 bits are used to store the generation-number on the mmio spte,
 the generation-number can wrap around after 33554432 times. That is large
 enough for nearly all cases, but to make the code more robust, we zap all
 shadow pages when the number wraps

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/mmu.c  |   61 
 +--
  arch/x86/kvm/mmutrace.h |   17 +++
  arch/x86/kvm/paging_tmpl.h  |7 +++-
  arch/x86/kvm/vmx.c  |4 ++
  arch/x86/kvm/x86.c  |6 +--
  6 files changed, 82 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index ef7f4a5..572398e 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -529,6 +529,7 @@ struct kvm_arch {
 unsigned int n_requested_mmu_pages;
 unsigned int n_max_mmu_pages;
 unsigned int indirect_shadow_pages;
 +   unsigned int mmio_invalid_gen;

Could this get initialized to something close to the wrap-around
value, so that the wrap-around case gets more real-world coverage?

 struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 /*
  * Hash table of struct kvm_mmu_page.
 @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, 
 int slot);
  void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
  struct kvm_memory_slot *slot,
  gfn_t gfn_offset, unsigned long mask);
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
  void kvm_mmu_zap_all(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
 kvm_nr_mmu_pages);
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 13626f4..7093a92 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
unsigned access)
  {
 -   u64 mask = generation_mmio_spte_mask(0);
 +   unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +   u64 mask = generation_mmio_spte_mask(gen);

 access &= ACC_WRITE_MASK | ACC_USER_MASK;
 mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;

 -   trace_mark_mmio_spte(sptep, gfn, access, 0);
 +   trace_mark_mmio_spte(sptep, gfn, access, gen);
 mmu_spte_set(sptep, mask);
  }

 @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
 gfn_t gfn,
 return false;
  }

 +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 +{
 +   return get_mmio_spte_generation(spte) ==
 +   ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
 +}
 +
 +/*
 + * The caller should protect concurrent access on
 + * kvm->arch.mmio_invalid_gen. Currently, it is used by
 + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
 + */
 +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
 +{
 +   /* Ensure update memslot has been completed. */
 +   smp_mb();
 +
 +   trace_kvm_mmu_invalid_mmio_spte(kvm);
 +
 +   /*
 +* The very rare case: if the generation-number wraps around,
 +* zap all shadow pages.
 +*/
 +   if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
 +   kvm->arch.mmio_invalid_gen = 0;
 +   return kvm_mmu_zap_all(kvm);
 +   }
 +}
 +
  static inline u64 rsvd_bits(int s, int e)
  {
 return ((1ULL << (e - s + 1)) - 1) << s;
 @@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct 
 kvm_vcpu *vcpu, u64 addr)
  }

  /*
 - * If it is a real mmio page fault, return 1 and emulat the instruction
 - * directly, return 0 to let CPU fault again on the address, -1 is
 - * returned if bug is detected.
 + * Return value:
 + * 2: invalid spte is detected then let the real page fault path
 + *update the mmio spte.
 + * 1: it is a real mmio page fault, emulate the instruction directly.
 + * 0: let CPU fault again on the address.

Re: [PATCH V3 0/2] tcm_vhost hotplug/hotunplug support

2013-03-18 Thread Nicholas A. Bellinger
On Mon, 2013-03-18 at 23:53 +0200, Michael S. Tsirkin wrote:
 On Mon, Mar 18, 2013 at 02:46:12PM -0700, Nicholas A. Bellinger wrote:
  On Tue, 2013-03-12 at 10:45 +0800, Asias He wrote:
   Changes in v3:
   - Separate the bug fix to another thread
   
   Changes in v2:
   - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
   - Fix racing of vs_events_nr
   - Add flush fix patch to this series
   
   
   Asias He (2):
 tcm_vhost: Introduce tcm_vhost_check_feature()
 tcm_vhost: Add hotplug/hotunplug support
   
drivers/vhost/tcm_vhost.c | 224 
   --
drivers/vhost/tcm_vhost.h |  10 +++
2 files changed, 229 insertions(+), 5 deletions(-)
   
  
  MST, are you OK with these two being applied to target-pending/for-next
  now..?
  
  --nab
 
 Sorry, no, I'd prefer we get userspace support in qemu first.
 If there's only a single user for this driver (kvmtool),
 then it was a mistake to merge it, the right thing would be to freeze it
 and look at whether we can drop it completely.
 
 I really don't want this, I hope it will be merged and all will
 be shiny, but I think it's wrong to rush in more code,
 we have some time before the next merge window.
 

Fair enough.

With the seabios issues now out of the way, the proposed vhost-scsi-pci
 series from Paolo can make its way into upstream QEMU in an expedited
fashion.

If Asias is OK with it, I'll take ownership of the WIP code and make
sure it finally makes it into upstream QEMU proper.

Thanks,

--nab



Re: [PATCH v6 0/5] KVM: VMX: Add Posted Interrupt supporting

2013-03-18 Thread Marcelo Tosatti
On Fri, Mar 15, 2013 at 09:31:06PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 The following patches add Posted Interrupt support to KVM:
 The first patch enables the feature 'acknowledge interrupt on vmexit'.
 Since it is required by Posted Interrupt, we need to enable it first.
 
 And the subsequent patches add the posted interrupt support:
 Posted Interrupt allows APIC interrupts to be injected into the guest
 directly without any vmexit.
 
 - When delivering an interrupt to the guest, if the target vcpu is
   running, update the Posted-interrupt requests bitmap and send a
   notification event to the vcpu. Then the vcpu will handle this
   interrupt automatically, without any software involvement.
 
 - If the target vcpu is not running or there is already a notification
   event pending in the vcpu, do nothing. The interrupt will be handled
   by the next vm entry.
 
 NOTE: We don't turn on Posted Interrupt until the interrupt-coalescing
 issue is solved.

Looks good, but RTC issue should be resolved.
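
(For reference, the delivery flow described in the cover letter is
roughly the following; the pi_desc helpers and field names are
illustrative here, not necessarily the series' actual API:)

    static void deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
    {
        struct pi_desc *pi = vcpu_to_pi_desc(vcpu);  /* hypothetical helper */

        if (pi_test_and_set_pir(vector, pi))
            return;                                  /* already requested */

        /* Notify only if the target vcpu is in guest mode and no
         * notification event is outstanding; otherwise the PIR bit is
         * simply picked up at the next vm entry. */
        if (vcpu->mode == IN_GUEST_MODE && !pi_test_and_set_on(pi))
            apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
                                POSTED_INTR_VECTOR);
    }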



Re: [PATCH] KVM: allow host header to be included even for !CONFIG_KVM

2013-03-18 Thread Marcelo Tosatti
On Thu, Mar 14, 2013 at 05:13:46PM -0700, Kevin Hilman wrote:
 The new context tracking subsystem unconditionally includes kvm_host.h
 headers for the guest enter/exit macros.  This causes a compile
 failure when KVM is not enabled.
 
 Fix by adding an IS_ENABLED(CONFIG_KVM) check to kvm_host so it can
 be included/compiled even when KVM is not enabled.
 
 Cc: Frederic Weisbecker fweis...@gmail.com
 Signed-off-by: Kevin Hilman khil...@linaro.org
 ---
 Applies on v3.9-rc2
 
  include/linux/kvm_host.h | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)
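
(The guard has roughly this shape; this is a sketch based on the commit
message, not the literal diff, and the stubbed helpers are illustrative:)

    #if IS_ENABLED(CONFIG_KVM)
    /* ... existing kvm_host.h declarations, including the
     * kvm_guest_enter()/kvm_guest_exit() helpers ... */
    #else
    /* KVM disabled: keep includers such as context tracking compiling. */
    static inline void kvm_guest_enter(void) { }
    static inline void kvm_guest_exit(void) { }
    #endif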

Applied and queued for v3.9, thanks.



Re: [PATCH V3 0/2] tcm_vhost hotplug/hotunplug support

2013-03-18 Thread Asias He
On Mon, Mar 18, 2013 at 03:19:30PM -0700, Nicholas A. Bellinger wrote:
 On Mon, 2013-03-18 at 23:53 +0200, Michael S. Tsirkin wrote:
  On Mon, Mar 18, 2013 at 02:46:12PM -0700, Nicholas A. Bellinger wrote:
   On Tue, 2013-03-12 at 10:45 +0800, Asias He wrote:
Changes in v3:
- Separate the bug fix to another thread

Changes in v2:
- Remove code duplication in tcm_vhost_{hotplug,hotunplug}
- Fix racing of vs_events_nr
- Add flush fix patch to this series


Asias He (2):
  tcm_vhost: Introduce tcm_vhost_check_feature()
  tcm_vhost: Add hotplug/hotunplug support

 drivers/vhost/tcm_vhost.c | 224 
--
 drivers/vhost/tcm_vhost.h |  10 +++
 2 files changed, 229 insertions(+), 5 deletions(-)

   
   MST, are you OK with these two being applied to target-pending/for-next
   now..?
   
   --nab
  
  Sorry, no, I'd prefer we get userspace support in qemu first.
  If there's only a single user for this driver (kvmtool),
  then it was a mistake to merge it, the right thing would be to freeze it
  and look at whether we can drop it completely.
  
  I really don't want this, I hope it will be merged and all will
  be shiny, but I think it's wrong to rush in more code,
  we have some time before the next merge window.
  
 
 Fair enough.
 
 With the seabios issues now out of the way, the proposed vhost-scsi-pci
 series from Paolo can make it's way into upstream QEMU in an expedited
 fashion.
 
 If Asias is OK with it, I'll take ownership of the WIP code and make
 sure it finally makes it into upstream QEMU proper.

Nicholas, sure, go ahead. I will send out WIP v3 that you can start
from.

-- 
Asias


[PATCH V3 WIP 0/3] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-03-18 Thread Asias He
This is on top of Paolo and Nick's work.

Current status:
Basically, tcm_vhost + seabios works now. We still have one more issue,
vhost_verify_ring_mappings fails. The hotplug also works with the latest
tcm_vhost.ko hotplug patch.

Asias He (1):
  disable vhost_verify_ring_mappings check

Paolo Bonzini (2):
  virtio-scsi: create VirtIOSCSICommon
  vhost-scsi: new device supporting the tcm_vhost Linux kernel module

 configure  |  15 ++-
 hw/Makefile.objs   |   5 +-
 hw/s390x/s390-virtio-bus.c |  35 +++
 hw/vhost-scsi.c| 242 +
 hw/vhost-scsi.h|  65 
 hw/vhost.c |   2 +
 hw/virtio-pci.c|  62 
 hw/virtio-scsi.c   | 199 +
 hw/virtio-scsi.h   | 129 
 include/qemu/osdep.h   |   4 +
 10 files changed, 606 insertions(+), 152 deletions(-)
 create mode 100644 hw/vhost-scsi.c
 create mode 100644 hw/vhost-scsi.h

-- 
1.8.1.4



[PATCH V3 WIP 1/3] virtio-scsi: create VirtIOSCSICommon

2013-03-18 Thread Asias He
From: Paolo Bonzini pbonz...@redhat.com

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/virtio-scsi.c | 199 +--
 hw/virtio-scsi.h | 127 
 include/qemu/osdep.h |   4 ++
 3 files changed, 180 insertions(+), 150 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 72cc519..6cd0272 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -18,131 +18,12 @@
 #include "hw/scsi.h"
 #include "hw/scsi-defs.h"
 
-#define VIRTIO_SCSI_VQ_SIZE 128
-#define VIRTIO_SCSI_CDB_SIZE    32
-#define VIRTIO_SCSI_SENSE_SIZE  96
-#define VIRTIO_SCSI_MAX_CHANNEL 0
-#define VIRTIO_SCSI_MAX_TARGET  255
-#define VIRTIO_SCSI_MAX_LUN 16383
-
-/* Response codes */
-#define VIRTIO_SCSI_S_OK   0
-#define VIRTIO_SCSI_S_OVERRUN  1
-#define VIRTIO_SCSI_S_ABORTED  2
-#define VIRTIO_SCSI_S_BAD_TARGET   3
-#define VIRTIO_SCSI_S_RESET4
-#define VIRTIO_SCSI_S_BUSY 5
-#define VIRTIO_SCSI_S_TRANSPORT_FAILURE6
-#define VIRTIO_SCSI_S_TARGET_FAILURE   7
-#define VIRTIO_SCSI_S_NEXUS_FAILURE8
-#define VIRTIO_SCSI_S_FAILURE  9
-#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED   10
-#define VIRTIO_SCSI_S_FUNCTION_REJECTED11
-#define VIRTIO_SCSI_S_INCORRECT_LUN12
-
-/* Controlq type codes.  */
-#define VIRTIO_SCSI_T_TMF  0
-#define VIRTIO_SCSI_T_AN_QUERY 1
-#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
-
-/* Valid TMF subtypes.  */
-#define VIRTIO_SCSI_T_TMF_ABORT_TASK   0
-#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET   1
-#define VIRTIO_SCSI_T_TMF_CLEAR_ACA2
-#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET   3
-#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET  4
-#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
-#define VIRTIO_SCSI_T_TMF_QUERY_TASK   6
-#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET   7
-
-/* Events.  */
-#define VIRTIO_SCSI_T_EVENTS_MISSED    0x8000
-#define VIRTIO_SCSI_T_NO_EVENT 0
-#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
-#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
-#define VIRTIO_SCSI_T_PARAM_CHANGE 3
-
-/* Reasons for transport reset event */
-#define VIRTIO_SCSI_EVT_RESET_HARD 0
-#define VIRTIO_SCSI_EVT_RESET_RESCAN   1
-#define VIRTIO_SCSI_EVT_RESET_REMOVED  2
-
-/* SCSI command request, followed by data-out */
 typedef struct {
-uint8_t lun[8];  /* Logical Unit Number */
-uint64_t tag;/* Command identifier */
-uint8_t task_attr;   /* Task attribute */
-uint8_t prio;
-uint8_t crn;
-uint8_t cdb[];
-} QEMU_PACKED VirtIOSCSICmdReq;
-
-/* Response, followed by sense data and data-in */
-typedef struct {
-uint32_t sense_len;  /* Sense data length */
-uint32_t resid;  /* Residual bytes in data buffer */
-uint16_t status_qualifier;   /* Status qualifier */
-uint8_t status;  /* Command completion status */
-uint8_t response;/* Response values */
-uint8_t sense[];
-} QEMU_PACKED VirtIOSCSICmdResp;
-
-/* Task Management Request */
-typedef struct {
-uint32_t type;
-uint32_t subtype;
-uint8_t lun[8];
-uint64_t tag;
-} QEMU_PACKED VirtIOSCSICtrlTMFReq;
-
-typedef struct {
-uint8_t response;
-} QEMU_PACKED VirtIOSCSICtrlTMFResp;
-
-/* Asynchronous notification query/subscription */
-typedef struct {
-uint32_t type;
-uint8_t lun[8];
-uint32_t event_requested;
-} QEMU_PACKED VirtIOSCSICtrlANReq;
-
-typedef struct {
-uint32_t event_actual;
-uint8_t response;
-} QEMU_PACKED VirtIOSCSICtrlANResp;
-
-typedef struct {
-uint32_t event;
-uint8_t lun[8];
-uint32_t reason;
-} QEMU_PACKED VirtIOSCSIEvent;
-
-typedef struct {
-uint32_t num_queues;
-uint32_t seg_max;
-uint32_t max_sectors;
-uint32_t cmd_per_lun;
-uint32_t event_info_size;
-uint32_t sense_size;
-uint32_t cdb_size;
-uint16_t max_channel;
-uint16_t max_target;
-uint32_t max_lun;
-} QEMU_PACKED VirtIOSCSIConfig;
-
-typedef struct {
-VirtIODevice vdev;
-DeviceState *qdev;
-VirtIOSCSIConf *conf;
+VirtIOSCSICommon vs;
 
 SCSIBus bus;
-uint32_t sense_size;
-uint32_t cdb_size;
 int resetting;
 bool events_dropped;
-VirtQueue *ctrl_vq;
-VirtQueue *event_vq;
-VirtQueue *cmd_vqs[0];
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
@@ -193,7 +74,7 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 scsi_req_unref(req->sreq);
 }
 g_free(req);
-virtio_notify(&s->vdev, vq);
+virtio_notify(&s->vs.vdev, vq);
 }
 
 static void virtio_scsi_bad_req(void)
@@ -252,7 +133,7 @@ static void virtio_scsi_save_request(QEMUFile *f, 
SCSIRequest *sreq)
 VirtIOSCSIReq *req = sreq->hba_private;
 uint32_t n = 

[PATCH V3 WIP 2/3] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-03-18 Thread Asias He
From: Paolo Bonzini pbonz...@redhat.com

The WWPN specified in configfs is passed to -device vhost-scsi-pci.
The tgpt field of the SET_ENDPOINT ioctl is obsolete now, so it is not
available from the QEMU command-line.  Instead, I hardcode it to zero.

Changes in V3:
   - Drop ioeventfd vhost_scsi_properties (asias, thanks stefanha)
   - Add CONFIG_VHOST_SCSI (asias, thanks stefanha)
   - Add hotplug feature bit

Changes in V2:
   - Add backend guest masking support (nab)
   - Bump ABI_VERSION to 1 (nab)
   - Set up set_guest_notifiers (asias)
   - Set up vs-dev.vq_index (asias)
   - Drop vs-vs.vdev.{set,clear}_vhost_endpoint (asias)
   - Drop VIRTIO_CONFIG_S_DRIVER check in vhost_scsi_set_status (asias)

Howto:
   Use the latest seabios
   git clone git://git.seabios.org/seabios.git
   make
   cp out/bios.bin /usr/share/qemu/bios.bin
   qemu -device vhost-scsi-pci,wwpn=naa.6001405bd4e8476d,event_idx=off ...

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
Signed-off-by: Asias He as...@redhat.com
---
 configure  |  15 ++-
 hw/Makefile.objs   |   5 +-
 hw/s390x/s390-virtio-bus.c |  35 +++
 hw/vhost-scsi.c| 242 +
 hw/vhost-scsi.h|  65 
 hw/virtio-pci.c|  62 
 hw/virtio-scsi.h   |   2 +
 7 files changed, 424 insertions(+), 2 deletions(-)
 create mode 100644 hw/vhost-scsi.c
 create mode 100644 hw/vhost-scsi.h

diff --git a/configure b/configure
index 84317c6..dca4a66 100755
--- a/configure
+++ b/configure
@@ -169,6 +169,7 @@ libattr=
 xfs=
 
 vhost_net=no
+vhost_scsi=no
 kvm=no
 gprof=no
 debug_tcg=no
@@ -531,6 +532,7 @@ Haiku)
   usb=linux
   kvm=yes
   vhost_net=yes
+  vhost_scsi=yes
   if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
 audio_possible_drivers="$audio_possible_drivers fmod"
   fi
@@ -857,6 +859,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net=yes
   ;;
+  --disable-vhost-scsi) vhost_scsi=no
+  ;;
+  --enable-vhost-scsi) vhost_scsi=yes
+  ;;
   --disable-opengl) opengl=no
   ;;
   --enable-opengl) opengl=yes
@@ -3058,7 +3064,7 @@ fi
 # __sync_fetch_and_and requires at least -march=i486. Many toolchains
 # use i686 as default anyway, but for those that don't, an explicit
 # specification is necessary
-if test "$vhost_net" = "yes" && test "$cpu" = "i386"; then
+if (test "$vhost_net" = "yes" -o "$vhost_scsi" = "yes") && test "$cpu" = 
"i386"; then
   cat > $TMPC << EOF
 static int sfaa(int *ptr)
 {
@@ -3404,6 +3410,7 @@ echo "sigev_thread_id   $sigev_thread_id"
 echo "uuid support      $uuid"
 echo "libcap-ng support $cap_ng"
 echo "vhost-net support $vhost_net"
+echo "vhost-scsi support $vhost_scsi"
 echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice ($spice_protocol_version/$spice_server_version)"
@@ -3673,6 +3680,9 @@ fi
 if test "$virtfs" = "yes" ; then
   echo "CONFIG_VIRTFS=y" >> $config_host_mak
 fi
+if test "$vhost_scsi" = "yes" ; then
+  echo "CONFIG_VHOST_SCSI=y" >> $config_host_mak
+fi
 if test "$blobs" = "yes" ; then
   echo "INSTALL_BLOBS=yes" >> $config_host_mak
 fi
@@ -4149,6 +4159,9 @@ case "$target_arch2" in
   if test "$vhost_net" = "yes" ; then
 echo "CONFIG_VHOST_NET=y" >> $config_target_mak
   fi
+  if test "$vhost_scsi" = "yes" ; then
+echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
+  fi
 fi
 esac
case "$target_arch2" in
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index eb7eb31..df57e1f 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -202,8 +202,11 @@ common-obj-$(CONFIG_XEN_BACKEND) += xen_console.o xenfb.o 
xen_disk.o xen_nic.o
 obj-$(CONFIG_VIRTIO) += dataplane/
 obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o virtio-balloon.o virtio-net.o
 obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
+ifeq ($(CONFIG_VIRTIO), y)
+obj-$(CONFIG_LINUX) += vhost-scsi.o
+endif
 obj-$(CONFIG_SOFTMMU) += vhost_net.o
-obj-$(CONFIG_VHOST_NET) += vhost.o
+obj-$(CONFIG_LINUX) += vhost.o
 obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
 obj-$(CONFIG_VGA) += vga.o
 
diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index d9b7f83..d86365b 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -28,6 +28,8 @@
 #include "hw/virtio-rng.h"
 #include "hw/virtio-serial.h"
 #include "hw/virtio-net.h"
+#include "hw/virtio-scsi.h"
+#include "hw/vhost-scsi.h"
 #include "hw/sysbus.h"
 #include "sysemu/kvm.h"
 
@@ -207,6 +209,18 @@ static int s390_virtio_scsi_init(VirtIOS390Device *dev)
 return s390_virtio_device_init(dev, vdev);
 }
 
+static int s390_vhost_scsi_init(VirtIOS390Device *dev)
+{
+VirtIODevice *vdev;
+
+vdev = vhost_scsi_init((DeviceState *)dev, &dev->scsi);
+if (!vdev) {
+return -1;
+}
+
+return s390_virtio_device_init(dev, vdev);
+}
+
 static int s390_virtio_rng_init(VirtIOS390Device *dev)
 {
 VirtIODevice *vdev;
@@ -534,6 +548,27 @@ static const TypeInfo virtio_s390_device_info = {
 .abstract = true,
 };
 
+static 

[PATCH V3 WIP 3/3] disable vhost_verify_ring_mappings check

2013-03-18 Thread Asias He
---
 hw/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vhost.c b/hw/vhost.c
index 4d6aee3..0c52ec4 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -421,10 +421,12 @@ static void vhost_set_memory(MemoryListener *listener,
 return;
 }
 
+#if 0
 if (dev->started) {
 r = vhost_verify_ring_mappings(dev, start_addr, size);
 assert(r = 0);
 }
+#endif
 
 if (!dev-log_enabled) {
 r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
-- 
1.8.1.4



Re: [PATCH 0/2] Fix booting tcm_vhost + seabios

2013-03-18 Thread Asias He
On Mon, Mar 18, 2013 at 02:26:14PM -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-03-15 at 09:45 +0800, Asias He wrote:
  Asias He (2):
virtio-scsi: Set _DRIVER_OK flag before scsi target scanning
virtio-scsi: Pack struct virtio_scsi_{req_cmd,resp_cmd}
  
   src/virtio-scsi.c | 5 +++--
   src/virtio-scsi.h | 4 ++--
   2 files changed, 5 insertions(+), 4 deletions(-)
  
 
 Hi Asias,
 
 Thanks for taking the initiative on this, and nice work tracking both of
 these stubborn bugs down.
 
 So with these out of the way, we're good to go for an RFC of Paolo's
 vhost-scsi-pci code for upstream QEMU, yes..?
 
 I'll have some extra bandwidth this week to spend time on the RFC if
 you'd like, otherwise I'm happy with you making the upstream QEMU push
 for Paolo's code. 
 
 Whatever works best for you.  :)

Nice, Nicholas, go ahead. I have sent out the WIP V3 for you ;-)

 Thank you,
 
 --nab
 
 
 

-- 
Asias


Re: [PATCH 1/2] tcm_vhost: Wait for pending requests in vhost_scsi_flush()

2013-03-18 Thread Asias He
On Wed, Mar 13, 2013 at 01:16:59PM +0800, Asias He wrote:
 On Tue, Mar 12, 2013 at 09:21:51AM +0100, Paolo Bonzini wrote:
  On 12/03/2013 02:31, Asias He wrote:
   On Mon, Mar 11, 2013 at 12:36:37PM +0100, Paolo Bonzini wrote:
   On 11/03/2013 06:09, Asias He wrote:
   This patch makes vhost_scsi_flush() wait for all the pending requests
   issued before the flush operation to be finished.
  
   There is no protection against issuing concurrent flush operations.  If
   we later would like to make the flush an ioctl (for example for migration
   purposes), it would be confusing, and I'm not sure how you could extend
   the during_flush machinery.
   
   vhost_scsi_flush() is called under the vs->dev.mutex lock.
  
  Ah, ok.
  
   What about making vhost_scsi_flush() wait for all pending requests,
   including those issued during the flush operation?
   
   This will take unbounded time if the guest keeps sending requests.
  
  Yes, that's correct, but flush doesn't really mean much if new requests
  can come in (unlike _cache_ flushes like SYNCHRONIZE CACHE).  In the end
  you'll have to stop the VM first and then issue the flush.  At this
  point, it does not change much if you wait for previous requests or all
  requests, and I suspect that waiting for all requests simplifies the
  code noticeably.
 
 Michael, any comments? You suggested flushing only the previous requests
 rather than all the requests when I was doing vhost-blk.

Ping.
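
(For reference, the "wait for all requests" variant Paolo suggests below
would look roughly like this; vs->inflight and vs->flush_wait are made-up
fields, so this is purely a sketch:)

    static void vhost_scsi_flush_all(struct vhost_scsi *vs)
    {
        /* Wait until every outstanding request has completed, including
         * requests issued after the flush itself started. */
        wait_event(vs->flush_wait, atomic_read(&vs->inflight) == 0);
    }

    static void vhost_scsi_req_done(struct vhost_scsi *vs)
    {
        /* Called once per completed request. */
        if (atomic_dec_and_test(&vs->inflight))
            wake_up_all(&vs->flush_wait);
    }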

   Then you can easily
   support concurrent flushes; just add a waitqueue and wake_up_all at the
   end of the flush operation.
   
   I am not sure why we want concurrent flushes. The flush thing is
   already getting complex.
  
  Yeah, it is too complex...
  
  Paolo
  
   BTW, adding such an ioctl as part of this patch would probably be a good
   thing to do anyway.
  
   Paolo
   
  
 
 -- 
 Asias

-- 
Asias


Re: [PATCH V3 3/3] tcm_vhost: Use vq-private_data to indicate if the endpoint is setup

2013-03-18 Thread Asias He
On Mon, Mar 18, 2013 at 11:30:57AM +0200, Michael S. Tsirkin wrote:
 On Mon, Mar 18, 2013 at 05:14:33PM +0800, Asias He wrote:
  On Mon, Mar 18, 2013 at 10:19:00AM +0200, Michael S. Tsirkin wrote:
   On Fri, Mar 15, 2013 at 09:14:07AM +0800, Asias He wrote:
Currently, vs->vs_endpoint is used to indicate if the endpoint is setup
or not. It is set or cleared in vhost_scsi_set_endpoint() or
vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
we check it in vhost_scsi_handle_vq(), we ignore the lock; this is
wrong.
   
   This one, I don't get. Why is it wrong? Could you please describe the
   race codition you are trying to prevent?
  
  Why is it safe to access vs->vs_endpoint without any lock?
 
 For the same reason it's safe with the pointer: either readers
 see the old or the new value, and we flush before relying on
 the new value.

vs_endpoint is a bool, not a pointer, so here it is even harder to
understand the whole story. Why not make it more consistent with the
other users of vhost, using vq->private_data for backend related data?

We have enough special tricks (vhost rcu, vhost work queue).

 RCU macros also include barriers that are irrelevant if you are not
 going to access any data through the pointer.
 Nowadays they also include lockdep-like checks, which you override.

vhost-net is also overriding, no? And I am not seeing any effort to make
the '1' go away.

  sock = rcu_dereference_check(vq->private_data, 1);
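
(The pattern under discussion, roughly, using tcm_vhost-style names; a
sketch of the idea, not the exact patch:)

    /* writer side, under dev.mutex (set/clear endpoint) */
    rcu_assign_pointer(vq->private_data, tpg);  /* or NULL to clear */
    vhost_scsi_flush(vs);   /* drain workers that saw the old value */

    /* reader side, from the vhost worker (vhost-style RCU) */
    if (!rcu_dereference_check(vq->private_data, 1))
        return;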

Instead of using vs->vs_endpoint and the vs->dev.mutex lock to
indicate the status of the endpoint, we use the per-virtqueue
vq->private_data to indicate it. In this way, we need only take the
vq->mutex lock, which is per queue, so concurrent multiqueue
processing has less lock contention. Further, on the read side of
vq->private_data, we can even take no lock at all if it is accessed in
the vhost worker thread, because it is protected by vhost rcu.
   
   But (unlike with -net) you never actually need the pointer. So why all
   the complexity?
  
  It works as a flag, NULL or !NULL.
  
  This is from your other mail:
  
  '''
  This takes dev mutex on data path which will introduce
  contention esp for multiqueue.
  How about storing the endpoint as part of vq
  private data and protecting with vq mutex?
  '''
 
 Yes this is better than taking the mutex but I don't see
 a problem as is, either. For patch to go into 3.9 it needs
 to fix a bug, not just be a refactoring.

Well, if it does not fix a real bug, let's skip it for 3.9.

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/tcm_vhost.c | 46 
--
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index 43fb11e..099feef 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -67,7 +67,6 @@ struct vhost_scsi {
 /* Protected by vhost_scsi->dev.mutex */
struct tcm_vhost_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
char vs_vhost_wwpn[TRANSPORT_IQN_LEN];
-   bool vs_endpoint;
 
struct vhost_dev dev;
struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
@@ -91,6 +90,24 @@ static int iov_num_pages(struct iovec *iov)
   ((unsigned long)iov->iov_base & PAGE_MASK)) >> 
PAGE_SHIFT;
 }
 
+static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
+{
+   bool ret = false;
+
+   /*
+* We can handle the vq only after the endpoint is setup by 
calling the
+* VHOST_SCSI_SET_ENDPOINT ioctl.
+*
+* TODO: Check that we are running from vhost_worker which acts
+* as read-side critical section for vhost kind of RCU.
+* See the comments in struct vhost_virtqueue in 
drivers/vhost/vhost.h
+*/
+   if (rcu_dereference_check(vq->private_data, 1))
+   ret = true;
+
+   return ret;
+}
+
 static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
 {
return 1;
@@ -581,8 +598,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi 
*vs,
int head, ret;
u8 target;
 
-   /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
-   if (unlikely(!vs->vs_endpoint))
+   if (!tcm_vhost_check_endpoint(vq))
return;
 
 mutex_lock(&vq->mutex);
@@ -781,8 +797,9 @@ static int vhost_scsi_set_endpoint(
 {
struct tcm_vhost_tport *tv_tport;
struct tcm_vhost_tpg *tv_tpg;
+   struct vhost_virtqueue *vq;
bool match = false;
-   int index, ret;
+   int index, ret, i;
 
 mutex_lock(&vs->dev.mutex);
/* Verify that ring has been setup correctly. */
@@ -826,7 +843,13 @@ static int 

Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages

2013-03-18 Thread Xiao Guangrong
On 03/19/2013 04:46 AM, Marcelo Tosatti wrote:
 On Wed, Mar 13, 2013 at 12:59:12PM +0800, Xiao Guangrong wrote:
 The current kvm_mmu_zap_all is really slow - it holds mmu-lock while
 walking and zapping all shadow pages one by one, and it also needs to
 zap all guest pages' rmaps and all shadow pages' parent spte lists.
 Particularly, things become worse if the guest uses more memory or
 vcpus. It is not good for scalability.
 
 Since all shadow pages will be zapped, we can directly zap the mmu-cache
 and rmap so that vcpus will fault on the new mmu-cache; after that, we
 can directly free the memory used by the old mmu-cache.
 
 The root shadow pages are a little special since they are currently in
 use by vcpus and we can not directly free them. So, we zap the root
 shadow pages and re-add them into the new mmu-cache.
 
 After this patch, kvm_mmu_zap_all is 113% faster than before.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   62 
 ++-
  1 files changed, 56 insertions(+), 6 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index e326099..536d9ce 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -4186,18 +4186,68 @@ void kvm_mmu_slot_remove_write_access(struct kvm 
 *kvm, int slot)

  void kvm_mmu_zap_all(struct kvm *kvm)
  {
 -struct kvm_mmu_page *sp, *node;
 +LIST_HEAD(root_mmu_pages);
  LIST_HEAD(invalid_list);
 +struct list_head pte_list_descs;
 +struct kvm_mmu_cache *cache = &kvm->arch.mmu_cache;
 +struct kvm_mmu_page *sp, *node;
 +struct pte_list_desc *desc, *ndesc;
 +int root_sp = 0;

  spin_lock(&kvm->mmu_lock);
 +
  restart:
 -list_for_each_entry_safe(sp, node,
 -  &kvm->arch.mmu_cache.active_mmu_pages, link)
 -if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
 -goto restart;
 +/*
 + * The root shadow pages are being used on vcpus and can not be
 + * directly removed; we filter them out and re-add them to the
 + * new mmu cache.
 + */
 +list_for_each_entry_safe(sp, node, &cache->active_mmu_pages, link)
 +if (sp->root_count) {
 +int ret;
 
 +root_sp++;
 +ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 +list_move(&sp->link, &root_mmu_pages);
 +if (ret)
 +goto restart;
 +}
 +
 +list_splice(&cache->active_mmu_pages, &invalid_list);
 +list_replace(&cache->pte_list_descs, &pte_list_descs);
 +
 +/*
 + * Reset the mmu cache so that later vcpu will fault on the new
 + * mmu cache.
 + */
 +memset(cache, 0, sizeof(*cache));
 +kvm_mmu_init(kvm);
 
 Xiao,
 
 I suppose zeroing of kvm_mmu_cache can be avoided, if the links are
 removed at prepare_zap_page. So perhaps

The purpose of zeroing kvm_mmu_cache is resetting the hashtable and
some count numbers.
[.n_requested_mmu_pages and .n_max_mmu_pages should not be changed; I
will fix this].

 
 - spin_lock(mmu_lock)
 - for each page
   - zero sp->spt[], remove page from linked lists

sizeof(mmu_cache) is:
(1 << 10) * sizeof(hlist_head) + 4 * sizeof(unsigned int) = 2^13 + 16
and it is constant. In your way, for every sp, we need to zap:
512 entries + a hash-node = 2^12 + 8
and the workload depends on the size of guest memory.
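(Spelling out the arithmetic, assuming 8-byte pointers: the hashtable is
1024 * 8 bytes = 8192 = 2^13, plus four 4-byte counters = 16 bytes, paid
once per zap; each shadow page costs 512 * 8 bytes of sptes = 2^12 plus
the 8 bytes counted for its hash node, and the number of shadow pages
grows with guest memory.)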
Why do you think this way is better?

 - flush remote TLB (batched)
 - spin_unlock(mmu_lock)
 - free data (which is safe because freeing has its own serialization)

We should free the root sp under mmu-lock, as my patch does.

 - spin_lock(mmu_lock)
 - account for the pages freed
 - spin_unlock(mmu_lock)

The count numbers are still inconsistent if another thread holds mmu-lock
between zeroing the shadow pages and recounting.

Marcelo, I am really confused about what the benefit of this way is, but
I might have completely misunderstood it.



Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

2013-03-18 Thread Xiao Guangrong
On 03/19/2013 06:16 AM, Eric Northup wrote:
 On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 This patch tries to introduce a very simple and scalable way to
 invalidate all mmio sptes - it need not walk any shadow pages or hold
 mmu-lock
 
 KVM maintains a global mmio invalid generation-number which is stored
 in kvm->arch.mmio_invalid_gen, and every mmio spte stores the current
 global generation-number into its available bits when it is created
 
 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When guests do mmio access, KVM intercepts a MMIO #PF,
 then walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it goes to the normal #PF handler to update the mmio spte
 
 Since 19 bits are used to store the generation-number on the mmio spte,
 the generation-number can wrap around after 33554432 times. That is large
 enough for nearly all cases, but to make the code more robust, we zap all
 shadow pages when the number wraps

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/mmu.c  |   61 
 +--
  arch/x86/kvm/mmutrace.h |   17 +++
  arch/x86/kvm/paging_tmpl.h  |7 +++-
  arch/x86/kvm/vmx.c  |4 ++
  arch/x86/kvm/x86.c  |6 +--
  6 files changed, 82 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ef7f4a5..572398e 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -529,6 +529,7 @@ struct kvm_arch {
 unsigned int n_requested_mmu_pages;
 unsigned int n_max_mmu_pages;
 unsigned int indirect_shadow_pages;
 +   unsigned int mmio_invalid_gen;
 
 Could this get initialized to something close to the wrap-around
 value, so that the wrap-around case gets more real-world coverage?

I am afraid we can not. We cache the current mmio_invalid_gen into the
mmio spte when it is created, no matter what the initial value is.

If you have a better way, please show me. ;)





Re: [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command

2013-03-18 Thread Alex Williamson
On Tue, 2013-03-19 at 11:24 +0800, Gavin Shan wrote:
 On Mon, Mar 18, 2013 at 03:01:14PM -0600, Alex Williamson wrote:
 On Sat, 2013-03-16 at 06:37 +0100, Benjamin Herrenschmidt wrote:
  On Sat, 2013-03-16 at 09:34 +0800, Gavin Shan wrote:
   Could you explain further how this will be used?  How the device is
   exposed to a guest is entirely a userspace construct, so why does vfio
   need to know or care about this?  I had assumed for AER that QEMU would
   do the translation from host to guest address space.
   
   
   The weak IOCTL function (vfio_pci_arch_ioctl) was introduced by the
   previous patch. The PowerNV platform is going to override it to figure
   out the information for the EEH core to use. On the other hand, QEMU
   will run into the IOCTL command while opening (creating) one VFIO
   device.
   
   Though I'm not familiar with AER very much, AER is quite different
   from EEH. The EEH functionality is implemented in the PHB instead of
   in the PCI device core. So we don't care about AER stuff in EEH
   directly :-)
  
  To give Alex a bit more background...
  
  EEH is our IBM specific error handling facility which is a superset of AER.
  
  IE. In addition to AER's error detection and logging, it adds a layer of
  error detection at the host bridge level (such as iommu violations etc...)
  and a mechanism for handling and recovering from errors. This is tied to
  our iommu domain stuff (our PE's) and our device freezing capability
  among others.
  
  With VFIO + KVM, we want to implement most of the EEH support for guests in
  the host kernel. The reason is multipart and we can discuss this separately
  as some of it might well be debatable (mostly it's more convenient that way
  because we hook into the underlying HW/FW EEH which isn't directly 
  userspace
  accessible so we don't have to add a new layer of kernel <-> user API in
  addition to the VFIO stuff), but there's at least one aspect of it that 
  drives
  this requirement more strongly which is performance:
  
  When EEH is enabled, whenever any MMIO returns all 1's, the kernel will do
  a firmware call to query the EEH state of the device and check whether it
  has been frozen. On some devices, that can be a performance issue, and
  going all the way to qemu for that would be horribly expensive.
  
  So we want at least a way to handle that call in the kernel and for that we
  need at least some way of mapping things there.
 
 There's no notification mechanism when a PHB is frozen?  I suppose
 notification would be asynchronous so you risk data for every read that
 happens in the interim.  So the choices are a) tell the host kernel the
 mapping, b) tell the guest kernel the mapping, c) identity mapping, or
 d) qemu intercept?
 
 
 We do have dedicated interrupts for detecting a frozen PHB on the host
 side. However, the guest has to poll/check the frozen state (frozen PE)
 during accesses to config or MMIO space.

Can you make use of something like this to notify the guest:

https://github.com/awilliam/linux-vfio/commit/dad9f8972e04cd081a028d8fb1249d746d97fc03

As a first step this only notifies QEMU, but the plan is to forward that
on to the guest.  If we can leverage similar interfaces between AER and
EEH, I'd obviously like to do that.

 For the recommended methods, (a) is what
 we want to do with the patchset. (b) seems infeasible since the guest
 shouldn't be aware of the hypervisor (e.g. KVM or PowerVM) it's running
 on top of, and it's hard to teach the guest to do it. (d) sounds
 applicable since QEMU should know the addresses (BDF) of the host and
 guest devices. However, we still need to let the host EEH core know
 which PCI device has been passed to the guest, and the best place to do
 that would be when opening the corresponding VFIO PCI device. In turn,
 it will still need a weak function for the ppc platform to override.
 Why not directly take (a) and finish everything in one VFIO IOCTL
 command?

Because it seems like VFIO is just being used as a relay and has no
purpose knowing this information on its own.  It's just a convenient
place to host the ioctl, but that alone is not a good enough reason to
put it there.

 Sorry, Alex. I didn't understand (c) well :-)

(c) is if the buid/bus/slot/func exposed to the guest matches the same
for the host, then there's no need for mapping translation.

 Presumably your firmware call to query the EEH is not going through
 VFIO, so is VFIO the appropriate place to setup this mapping?  As you
 say, this seems like just a convenient place to put it even though it
 really has nothing to do with the VFIO kernel component.  QEMU has this
 information and could register it with the host kernel through other
 means if available.  Maybe the mapping should be registered with KVM if
 that's how the EEH data is accessed.  I'm not yet sold on why this
 mapping is registered here.  Thanks,
 
 
 Yes, the EEH firmware call needn't go through VFIO. However, EEH has a
 very close relationship with PCI, and so does VFIO-PCI. Eventually, EEH
 has a close relationship with VFIO-PCI :-)

Re: [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command

2013-03-18 Thread Benjamin Herrenschmidt
On Mon, 2013-03-18 at 22:18 -0600, Alex Williamson wrote:
  Yes, the EEH firmware call needn't go through VFIO. However, EEH has a
  very close relationship with PCI, and so does VFIO-PCI. Eventually, EEH
  has a close relationship with VFIO-PCI :-)
 
 Is there some plan to do more with EEH through VFIO in the future or are
 we just talking about some kind of weird associative property to sell
 this ioctl?  Thanks,

Well, I'm not sure how 'weird' that is but it makes sense to me... VFIO
is the mechanism that virtualizes access to a PCI device and provides
interfaces to qemu & kvm to access it / map it.

Or rather VFIO-PCI is.

At a basic level it provides ... the basic PCI interfaces, ie, config
space access (with or without filtering), interrupts, etc...

In our environment, EEH is just another functionality of PCI really.
The firmware calls used by the guest to do that fall into more or less
the same category as the ones used for PCI config space accesses,
manipulation of DMA windows, etc... Similar to host (though guest
and host use a different FW interface for various reasons).

So it's very natural to transport these via VFIO-PCI like everything
else, I don't see a more natural place to put the ioctl's we need for
qemu to be able to access the EEH state, trigger EEH (un)isolation,
resets, etc...

Fundamentally, the design should be for VFIO-PCI to provide some specific
ioctls for EEH that userspace programs such as qemu can use, and then
re-expose those APIs to the guest.
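
(Purely as an illustration of the shape such an interface could take;
the struct and ioctl number below are invented for this sketch, not
anything this thread has settled on:)

    /* hypothetical VFIO-PCI ioctl carrying an EEH PE operation */
    struct vfio_eeh_pe_op {
        __u32   argsz;
        __u32   flags;
        __u32   op;     /* e.g. get PE state, (un)freeze, reset */
    };

    #define VFIO_EEH_PE_OP  _IO(VFIO_TYPE, VFIO_BASE + 21)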

In addition, to do some of it in the kernel for performance reasons, we
want to establish that mapping, but I see that as a VFIO accelerator.

IE. Whatever is going to respond to the EEH calls from the guest in-kernel
will have to share state with the rest of the EEH stuff provided to qemu
by vfio-pci.

Ben.

