Re: [PATCH for-1.2 0/2] migrate PV EOI MSR

2012-08-27 Thread Jan Kiszka
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
 It turns out PV EOI gets disabled after migration
 until the next guest reset.
 This is because we are missing code to actually migrate it.
 This patch fixes that. It does not do anything useful
 without the kvm irqchip, but it applies cleanly to qemu.git
 as well as qemu-kvm.git, so I think it's cleaner
 to apply it in qemu.git to keep the diff to a minimum.

There is nothing except pci-assign left in qemu-kvm (which will be
posted for upstream in a minute), so you are intuitively doing the right
thing.

Patch 2 looks good to me, see patch 1 for the clean procedure.

Jan






Re: [PATCH for-1.2 1/2] linux-headers: update asm/kvm_para.h to 3.6

2012-08-27 Thread Jan Kiszka
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
 Update asm-x86/kvm_para.h to the version present in Linux 3.6.

Nope, we have update-linux-headers.sh for this. Just run it again against
3.6-rcX, grab the result, and mention the source (release version or
kvm.git hash).

Jan

 This is needed for the new PV EOI feature.
 
 Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
 ---
  linux-headers/asm-x86/kvm_para.h | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
 index f2ac46a..a1c3d72 100644
 --- a/linux-headers/asm-x86/kvm_para.h
 +++ b/linux-headers/asm-x86/kvm_para.h
 @@ -22,6 +22,7 @@
  #define KVM_FEATURE_CLOCKSOURCE2 3
  #define KVM_FEATURE_ASYNC_PF 4
  #define KVM_FEATURE_STEAL_TIME   5
 +#define KVM_FEATURE_PV_EOI   6
  
  /* The last 8 bits are used to indicate how to interpret the flags field
   * in pvclock structure. If no bits are set, all flags are ignored.
 @@ -37,6 +38,7 @@
  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
  #define MSR_KVM_STEAL_TIME  0x4b564d03
 +#define MSR_KVM_PV_EOI_EN  0x4b564d04
  
  struct kvm_steal_time {
   __u64 steal;
 @@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
   __u32 enabled;
  };
  
 +#define KVM_PV_EOI_BIT 0
 +#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
 +#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 +#define KVM_PV_EOI_DISABLED 0x0
 +
  
  #endif /* _ASM_X86_KVM_PARA_H */
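For context on the PV EOI bits added above, here is a minimal sketch of how a guest consumes KVM_PV_EOI_MASK. It assumes the protocol used by Linux guests. The guest registers the address of a per-CPU word with the host through MSR_KVM_PV_EOI_EN. When the host has set KVM_PV_EOI_BIT in that word, the guest clears the bit and skips the APIC EOI register write, along with the exit that write would cause.

This presumably explains the symptom in the cover letter. If the MSR is not migrated, the destination host does not know where the word is, so the fast path stays off.

pv_eoi_write() and native_apic_eoi() are hypothetical names. The sketch is also single-threaded: a real guest uses a CPU-local test-and-clear on a per-CPU variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the kvm_para.h hunk above. */
#define KVM_PV_EOI_BIT 0
#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)

/* Hypothetical stand-in for the slow path: counts real APIC EOI writes. */
static unsigned int apic_eoi_writes;

static void native_apic_eoi(void)
{
    apic_eoi_writes++;
}

/*
 * Guest EOI sketch: if the host set KVM_PV_EOI_BIT in the shared word,
 * clear it and skip the APIC write; otherwise do the normal EOI.
 * Returns true when the fast path was taken. Not atomic.
 */
static bool pv_eoi_write(uint32_t *pv_eoi_word)
{
    assert(pv_eoi_word != NULL);
    if (*pv_eoi_word & KVM_PV_EOI_MASK) {
        *pv_eoi_word &= ~(uint32_t)KVM_PV_EOI_MASK;
        return true;
    }
    native_apic_eoi();
    return false;
}
```

For example, with the word set to KVM_PV_EOI_MASK, the first call takes the fast path and clears the bit. A second call falls back to the APIC write.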
 






[PATCH 0/4] uq/master: Add classic PCI device assignment

2012-08-27 Thread Jan Kiszka
I'm proud to present probably the last patch series to merge qemu-kvm
into upstream: This one adds PCI device assignment for x86 using the
classic interface that the KVM model provides. See the last patch for
reasons why we still want this while next-generation device assignment
via VFIO is approaching.

It's been a long journey, but once this is merged, I think we can close
the qemu-kvm chapter. I already have; all my work is based on QEMU now.

Jan Kiszka (4):
  kvm: Introduce kvm_irqchip_update_msi_route
  kvm: Introduce kvm_has_intx_set_mask
  kvm: i386: Add services required for PCI device assignment
  kvm: i386: Add classic PCI device assignment

 hw/kvm/Makefile.objs   |    2 +-
 hw/kvm/pci-assign.c    | 1929 
 kvm-all.c              |   50 ++
 kvm.h                  |    2 +
 target-i386/kvm.c      |  141 
 target-i386/kvm_i386.h |   22 +
 6 files changed, 2145 insertions(+), 1 deletions(-)
 create mode 100644 hw/kvm/pci-assign.c

-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] kvm: Introduce kvm_has_intx_set_mask

2012-08-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Will be used by PCI device assignment code.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |    8 ++++++++
 kvm.h     |    1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index fd9d9b4..84d4f7f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -88,6 +88,7 @@ struct KVMState
     int pit_state2;
     int xsave, xcrs;
     int many_ioeventfds;
+    int intx_set_mask;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
      * unsigned, and treating them as signed here can break things */
@@ -1387,6 +1388,8 @@ int kvm_init(void)
         s->irq_set_ioctl = KVM_IRQ_LINE_STATUS;
     }
 
+    s->intx_set_mask = kvm_check_extension(s, KVM_CAP_PCI_2_3);
+
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
@@ -1739,6 +1742,11 @@ int kvm_has_gsi_routing(void)
 #endif
 }
 
+int kvm_has_intx_set_mask(void)
+{
+    return kvm_state->intx_set_mask;
+}
+
 void *kvm_vmalloc(ram_addr_t size)
 {
 #ifdef TARGET_S390X
diff --git a/kvm.h b/kvm.h
index 5cefe3a..dea2998 100644
--- a/kvm.h
+++ b/kvm.h
@@ -117,6 +117,7 @@ int kvm_has_xcrs(void);
 int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
+int kvm_has_intx_set_mask(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUArchState *env);
-- 
1.7.3.4



[PATCH 1/4] kvm: Introduce kvm_irqchip_update_msi_route

2012-08-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

This service allows updating an MSI route without releasing and
reacquiring the associated VIRQ. It will be used by PCI device assignment
and, later on, likely also by virtio/vhost and VFIO.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |   42 ++
 kvm.h |1 +
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d4d8a1f..fd9d9b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -963,6 +963,30 @@ static void kvm_add_routing_entry(KVMState *s,
     kvm_irqchip_commit_routes(s);
 }
 
+static int kvm_update_routing_entry(KVMState *s,
+                                    struct kvm_irq_routing_entry *new_entry)
+{
+    struct kvm_irq_routing_entry *entry;
+    int n;
+
+    for (n = 0; n < s->irq_routes->nr; n++) {
+        entry = &s->irq_routes->entries[n];
+        if (entry->gsi != new_entry->gsi) {
+            continue;
+        }
+
+        entry->type = new_entry->type;
+        entry->flags = new_entry->flags;
+        entry->u = new_entry->u;
+
+        kvm_irqchip_commit_routes(s);
+
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
 void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
 {
     struct kvm_irq_routing_entry e;
@@ -1125,6 +1149,24 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     return virq;
 }
 
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
+{
+    struct kvm_irq_routing_entry kroute;
+
+    if (!kvm_irqchip_in_kernel()) {
+        return -ENOSYS;
+    }
+
+    kroute.gsi = virq;
+    kroute.type = KVM_IRQ_ROUTING_MSI;
+    kroute.flags = 0;
+    kroute.u.msi.address_lo = (uint32_t)msg.address;
+    kroute.u.msi.address_hi = msg.address >> 32;
+    kroute.u.msi.data = msg.data;
+
+    return kvm_update_routing_entry(s, &kroute);
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
     struct kvm_irqfd irqfd = {
diff --git a/kvm.h b/kvm.h
index 37d1f81..5cefe3a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -270,6 +270,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign,
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
-- 
1.7.3.4
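Here is a standalone sketch of the address split that kvm_irqchip_update_msi_route() performs above. The 64-bit MSI address is stored in the routing entry as two 32-bit halves. The struct and function names are hypothetical; only the split itself mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the MSI part of a KVM routing entry. */
struct msi_route_sketch {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Split a 64-bit MSI address the same way the patch fills kroute.u.msi. */
static struct msi_route_sketch split_msi(uint64_t address, uint32_t data)
{
    struct msi_route_sketch r = {
        .address_lo = (uint32_t)address,         /* low 32 bits */
        .address_hi = (uint32_t)(address >> 32), /* high 32 bits */
        .data = data,
    };
    return r;
}
```

On x86 the MSI address typically lies in the 0xfee00000 window, so address_hi is usually 0.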



[PATCH 3/4] kvm: i386: Add services required for PCI device assignment

2012-08-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

These helpers abstract the interaction of the upcoming pci-assign code with
the KVM kernel services. Put them under i386 only, as other archs will
implement device pass-through via VFIO rather than this classic interface.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 target-i386/kvm.c  |  141 
 target-i386/kvm_i386.h |   22 
 2 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 696b14a..5e2d4f5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -31,6 +31,7 @@
 #include "hw/apic.h"
 #include "ioport.h"
 #include "hyperv.h"
+#include "hw/pci.h"
 
 //#define DEBUG_KVM
 
@@ -2055,3 +2056,143 @@ void kvm_arch_init_irq_routing(KVMState *s)
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_routing_allowed = true;
 }
+
+/* Classic KVM device assignment interface. Will remain x86 only. */
+int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress *dev_addr,
+                          uint32_t flags, uint32_t *dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .segnr = dev_addr->domain,
+        .busnr = dev_addr->bus,
+        .devfn = PCI_DEVFN(dev_addr->slot, dev_addr->function),
+        .flags = flags,
+    };
+    int ret;
+
+    dev_data.assigned_dev_id =
+        (dev_addr->domain << 16) | (dev_addr->bus << 8) | dev_data.devfn;
+
+    ret = kvm_vm_ioctl(s, KVM_ASSIGN_PCI_DEVICE, &dev_data);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *dev_id = dev_data.assigned_dev_id;
+
+    return 0;
+}
+
+int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_PCI_DEVICE, &dev_data);
+}
+
+static int kvm_assign_irq_internal(KVMState *s, uint32_t dev_id,
+                                   uint32_t irq_type, uint32_t guest_irq)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .guest_irq = guest_irq,
+        .flags = irq_type,
+    };
+
+    if (kvm_check_extension(s, KVM_CAP_ASSIGN_DEV_IRQ)) {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
+    } else {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, &assigned_irq);
+    }
+}
+
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id, bool use_host_msi,
+                           uint32_t guest_irq)
+{
+    uint32_t irq_type = KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX);
+
+    return kvm_assign_irq_internal(s, dev_id, irq_type, guest_irq);
+}
+
+int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+        .flags = masked ? KVM_DEV_ASSIGN_MASK_INTX : 0,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_INTX_MASK, &dev_data);
+}
+
+static int kvm_deassign_irq_internal(KVMState *s, uint32_t dev_id,
+                                     uint32_t type)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .flags = type,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, &assigned_irq);
+}
+
+int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, bool use_host_msi)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX));
+}
+
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int virq)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSI |
+                                              KVM_DEV_IRQ_GUEST_MSI, virq);
+}
+
+int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSI |
+                                                KVM_DEV_IRQ_HOST_MSI);
+}
+
+bool kvm_device_msix_supported(KVMState *s)
+{
+    /* The kernel lacks a corresponding KVM_CAP, so we probe by calling
+     * KVM_ASSIGN_SET_MSIX_NR with an invalid parameter. */
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
+}
+
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors)
+{
+    struct kvm_assigned_msix_nr msix_nr = {
+        .assigned_dev_id = dev_id,
+        .entry_nr = nr_vectors,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, &msix_nr);
+}
+
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               int virq)
+{
+    struct kvm_assigned_msix_entry msix_entry = {
+        .assigned_dev_id = dev_id,
+        .gsi = virq,
+        .entry = vector,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, &msix_entry);
+}
+
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSIX |
+   
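Here is a standalone sketch of the assigned_dev_id encoding that kvm_device_pci_assign() builds above: domain from bit 16 upward, bus in bits 8-15, and devfn in bits 0-7. PCI_DEVFN is redefined with the usual 5-bit slot / 3-bit function packing so the example compiles on its own. assigned_dev_id() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Standard slot/function packing (5-bit slot, 3-bit function). */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical helper reproducing the assigned_dev_id encoding above. */
static uint32_t assigned_dev_id(uint32_t domain, uint32_t bus,
                                uint32_t slot, uint32_t function)
{
    uint32_t devfn = PCI_DEVFN(slot, function);

    return (domain << 16) | (bus << 8) | devfn;
}
```

For example, host device 0000:01:00.1 maps to 0x101, and 0000:00:1f.2 maps to 0xfa.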

Export offsets of VMCS fields as note information for kdump

2012-08-27 Thread Zhang Yanfei
Hello Avi,

About this VMCSINFO patch: we really need this functionality in our
development, and YOSHIDA Masanori (masanori.yoshida...@hitachi.com), a
developer from Hitachi, has said they need it too. Could you please tell us
why the patch is unacceptable? Do you dislike the whole idea of exporting
VMCSINFO, or just the way we implemented the patch? Finally, do you have any
suggestions about all this?

Below is why we need this patch and how we will use it in our development.

We once ran into an abnormal situation: a host scheduler bug caused a guest
machine's vcpu to stop for a long time, which then led to a heartbeat stop
(the host was still running).
 
We want an efficient way to analyze bugs in similar situations, where a guest
machine misbehaves because of something on the host side. These situations
have actually happened many times, in particular during development.
  
So here is the requirement:
To find the root cause, we need to debug both the host machine's side and the
guest machine's side. But first we need both the host machine's crash dump and
the guest machine's crash dump, and they must be taken at the same time, while
the abnormal situation persists. The only way to do this is to panic the host
with the abnormal guest running on it; the guest's image is then contained in
the host's crash dump.

Logically, retrieving the guest's crash dump from the host's crash dump is a
very important step toward our goal. Unfortunately, in the KVM implementation
some of the guest's register values are held in the VMCS, and the VMCS
internal layout is not disclosed by Intel. If we cannot retrieve these
registers from the VMCS, the guest crash dump we produce is incomplete, and
some key information is lost when we analyze it.

So we wrote this patch to export the VMCS internals (the offsets of the VMCS
fields). With the patch applied, we can write the register values stored in
the VMCS into the guest's crash dump, which is what we want.
  
If a bug is found in a customer's environment, we have two ways to avoid
affecting other guest machines running on the same host: first, we can
reproduce the buggy situation in another environment and analyze it there;
second, we can migrate the other guest machines to other hosts.

After the abnormal situation is reproduced, we panic the host *manually*.
Then we can use userland tools, together with the feature provided by this
patch, to extract the guest machine's crash dump from the host machine's.
Finally, we analyze the two dumps separately to find which side causes the
problem.
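To illustrate why the field offsets matter, here is a minimal sketch of how a userland dump tool could pull a guest register out of a raw VMCS region copied from the host's crash dump. It assumes a hypothetical table of (field encoding, byte offset) pairs, like the one this patch would export.

- The encodings for GUEST_RSP (0x681c) and GUEST_RIP (0x681e) are the architectural values, as in Linux's vmx.h.
- The VMCS layout is implementation-specific, so any offsets are CPU-specific and purely illustrative here.
- The read assumes the dump and the analyzing machine share x86 little-endian byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural VMCS field encodings (same values as Linux vmx.h). */
#define VMCS_GUEST_RSP 0x681cu
#define VMCS_GUEST_RIP 0x681eu

/* Hypothetical exported record: where a field lives inside the VMCS region. */
struct vmcs_field_offset {
    uint32_t encoding;
    uint32_t offset;
};

/*
 * Look up 'encoding' in the offset table and copy the 64-bit value out of
 * the raw VMCS bytes. Returns 0 on success, -1 if the field is unknown or
 * its offset falls outside the dumped region.
 */
static int vmcs_read_u64(const uint8_t *vmcs, size_t vmcs_size,
                         const struct vmcs_field_offset *table, size_t n,
                         uint32_t encoding, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].encoding != encoding) {
            continue;
        }
        if ((size_t)table[i].offset + sizeof(*out) > vmcs_size) {
            return -1;
        }
        /* memcpy avoids unaligned access; assumes a same-endian dump. */
        memcpy(out, vmcs + table[i].offset, sizeof(*out));
        return 0;
    }
    return -1;
}
```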


Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific

2012-08-27 Thread Xiao Guangrong
On 08/25/2012 02:54 AM, Marcelo Tosatti wrote:
 Other arches do not need this.
 
 Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
 
 Index: kvm/arch/x86/kvm/x86.c
 ===
 --- kvm.orig/arch/x86/kvm/x86.c
 +++ kvm/arch/x86/kvm/x86.c
 @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
   kvm_mmu_slot_remove_write_access(kvm, mem->slot);
   spin_unlock(&kvm->mmu_lock);
 + /*
 +  * If the new memory slot is created, we need to clear all
 +  * mmio sptes.
 +  */
 + if (old.npages == 0 && npages) {
 + kvm_mmu_zap_all(kvm);
 + kvm_reload_remote_mmus(kvm);
 + }

Can't you use kvm_arch_flush_shadow_all() here instead?

Otherwise, the rest looks fine to me.



Re: registering ioeventfd in qemu/kvm

2012-08-27 Thread Paolo Bonzini
On 23/08/2012 05:35, Shesha Sreenivasamurthy wrote:
 Hi,
 I am trying to generate an eventfd upon an IO write from the guest, say
 at offset IO_NOTIFY_REG (0x10). When the guest writes to this
 register, control reaches the write function associated with
 mypci_iomem_ops in QEMU. However, instead of this I would like to
 register an eventfd.
 
 To achieve that, first I tried:
   memory_region_add_eventfd(&mypci->bar_iomem, IO_NOTIFY_REG, 4,
                             true, 1, fd);

This is the right way.  You can look (in the git tree of QEMU) at
hw/ivshmem.c, which is the simplest user of the eventfd API.

Note that recently the API was changed to accept an EventNotifier rather
than the raw eventfd.

Paolo
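Here is a minimal sketch of the eventfd semantics that ioeventfd relies on. It uses only the Linux eventfd API, not QEMU's memory API. Each matching guest write makes KVM signal the eventfd, and the consumer reads once to collect all pending signals. In the sketch, eventfd_write() stands in for the guest-triggered signals. drain_notifications() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Consumer side: one read returns the number of signals accumulated since
 * the last read and resets the counter to 0. Returns -1 if nothing is
 * pending (EAGAIN on a non-blocking fd) or on error.
 */
static int64_t drain_notifications(int fd)
{
    eventfd_t count;

    if (eventfd_read(fd, &count) != 0) {
        return -1;
    }
    return (int64_t)count;
}
```

Usage: create the fd with eventfd(0, EFD_NONBLOCK) and signal it twice with eventfd_write(fd, 1). A single drain_notifications(fd) then returns 2. Because signals coalesce like this, a handler should not assume one wakeup per guest write.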



Re: Question: Timekeeping between Host and Guest with NTP

2012-08-27 Thread Aritoki TAKADA

Hello,

(2012/08/25 0:00), Marcelo Tosatti wrote:
<snip>


The kvmclock driver has access to the host's ntpd-corrected frequency, but:

1) kvmclock time as reported to the guest uses the TSC as an offset in
addition to the host monotonic clock, and the TSC is susceptible to
frequency variations.

The guest has its own timekeeping (it accumulates time from kvmclock
at every timer interrupt). The algorithm is not perfect, and it is
susceptible to small variations.

These add up over time.

2) Corrections to UTC, such as leap seconds, are not reflected in the
host monotonic clock. The NTP algorithm in the guest is responsible for
synchronization to UTC.


I see. I now understand the pitfalls of the guest syncing only to kvmclock,
and running NTP in the guest seems simple and reasonable to me.
Thank you again for your detailed explanation.

Sincerely,

---
Aritoki TAKADA
aritoki.takada...@hitachi.com
Hitachi, Ltd., Yokohama Research Laboratory
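To make point 1) above concrete, here is a minimal sketch of the pvclock-style computation behind kvmclock. Guest time is the host-provided system_time plus a scaled TSC delta.

- The field names follow the pvclock time info structure.
- The 128-bit multiply uses a GCC/Clang extension.
- The sketch omits the version-field retry loop that a real reader needs.

Any drift in the TSC rate between host updates feeds straight into the scaled delta.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the per-vcpu pvclock time info the host publishes. */
struct pvclock_sample {
    uint64_t tsc_timestamp;     /* guest TSC value at the last host update */
    uint64_t system_time;       /* host monotonic time in ns at that point */
    uint32_t tsc_to_system_mul; /* TSC-to-ns factor, value / 2^32 */
    int8_t   tsc_shift;         /* pre-scaling shift of the TSC delta */
};

/* Guest time in ns for the given TSC reading. */
static uint64_t pvclock_ns(const struct pvclock_sample *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift < 0) {
        delta >>= -p->tsc_shift;
    } else {
        delta <<= p->tsc_shift;
    }
    /* The 64x32-bit product can exceed 64 bits, so widen before shifting. */
    delta = (uint64_t)(((unsigned __int128)delta * p->tsc_to_system_mul) >> 32);
    return p->system_time + delta;
}
```

With a multiplier of 2^31 (a factor of 0.5) and no shift, 1000 TSC ticks past tsc_timestamp add 500 ns to system_time.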


[PATCH v7 0/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread Dong Hao
From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>

Changelog:
- rebased on the perf/core branch of Arnaldo's latest git tree

changes from Arnaldo's comments:
- directly get the event from evsel->tp_format
- remove die() and return the proper error code
- rename thread->private to thread->priv

changes from David's comments:
- use is_valid_tracepoint instead of kvm_events_exist

This patchset introduces a perf-based tool (perf kvm stat record/report)
which can analyze kvm events more effectively. Below are the presentation
slides from LinuxCon Japan 2012:
http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
You can get more details from them. If you have any questions or comments,
please feel free to let us know.

This patchset is based on the perf/core branch of Arnaldo's git tree, and
patch 2 is only an improvement that can be picked up independently.


Usage:
- kvm stat
  runs a command and gathers performance counter statistics; it is an alias
  of perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are of interest as well, we
  can append the events like this:
  perf kvm stat record -e kvm:*
  If many guests are running, we can track a specific guest by using -p or
  --pid

- show the result:
  perf kvm stat report

Example output:
# pgrep qemu-kvm
26071
32253
32564

A total of 3 guests are running on the host.

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT   Samples  Samples%     Time%    Avg time

         APIC_ACCESS     65381    66.58%     5.95%     37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT     16031    16.32%     3.06%     79.11us ( +-   7.34% )
               CPUID      5360     5.46%     0.06%      4.50us ( +-  35.07% )
                 HLT      4496     4.58%    90.75%   8360.34us ( +-   5.22% )
       EPT_VIOLATION      2667     2.72%     0.04%      5.49us ( +-   5.05% )
   PENDING_INTERRUPT      2242     2.28%     0.03%      5.25us ( +-   2.96% )
       EXCEPTION_NMI      1332     1.36%     0.02%      6.53us ( +-   6.51% )
      IO_INSTRUCTION       383     0.39%     0.09%     93.39us ( +-  40.92% )
           CR_ACCESS       310     0.32%     0.00%      6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.
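The Samples% and Time% columns in this report appear to satisfy two formulas:

- Samples% = samples / total samples.
- Time% = samples * avg time / total events handled time.

For HLT, 4496 * 8360.34us is about 37.59 s of 41.42 s, i.e. 90.75%. The small sketch below recomputes both columns from a row. The struct and function names are hypothetical, not taken from builtin-kvm.c. Because the printed averages are rounded, recomputed Time% values match only approximately.

```c
#include <assert.h>

/* Hypothetical row of a 'perf kvm stat report' table. */
struct kvm_stat_row {
    unsigned long samples;
    double avg_time_us;
};

/* Share of all recorded events that fell into this row. */
static double samples_pct(const struct kvm_stat_row *r,
                          unsigned long total_samples)
{
    return 100.0 * (double)r->samples / (double)total_samples;
}

/* Share of total handling time: samples * average time / total time. */
static double time_pct(const struct kvm_stat_row *r, double total_time_us)
{
    return 100.0 * (double)r->samples * r->avg_time_us / total_time_us;
}
```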

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access   Samples  Samples%     Time%    Avg time

        0xfee00380:W     58686    90.21%    15.67%      4.95us ( +-   2.96% )
        0xfee00300:R      2124     3.26%     1.48%     12.93us ( +-  14.75% )
        0xfee00310:W      2124     3.26%     0.34%      3.00us ( +-   1.33% )
        0xfee00300:W      2123     3.26%    82.50%    720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.

See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access   Samples  Samples%     Time%    Avg time

         0xc090:POUT       383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

In addition, --vcpu can be used to track a specific vcpu, and --key to sort
the result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT   Samples  Samples%     Time%    Avg time

                 HLT       551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT      1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS      6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION        17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI        94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT       199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS        52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION      2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID       368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.


Dong Hao (3):
  KVM: x86: export svm/vmx exit code and vector code to userspace
  KVM: x86: trace mmio begin and complete
  KVM: perf: kvm events analysis tool

 arch/x86/include/asm/kvm_host.h       |   36 +-
 arch/x86/include/asm/svm.h            |  205 +---
 arch/x86/include/asm/vmx.h            |  126 +++--
 arch/x86/kvm/trace.h                  |   89 
 arch/x86/kvm/x86.c                    |   32 +-
 include/trace/events/kvm.h            |   37 ++
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c 

[PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace

2012-08-27 Thread Dong Hao
From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>

Export KVM exit information to userspace so that it can be consumed by perf.

[ Dong Hao <haod...@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haod...@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |   36 ---
 arch/x86/include/asm/svm.h  |  205 +--
 arch/x86/include/asm/vmx.h  |  126 
 arch/x86/kvm/trace.h|   89 -
 4 files changed, 234 insertions(+), 222 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ad2d229 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -11,6 +11,24 @@
 #ifndef _ASM_X86_KVM_HOST_H
 #define _ASM_X86_KVM_HOST_H
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
+#ifdef __KERNEL__
+
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/mmu_notifier.h>
@@ -75,22 +93,6 @@
 #define KVM_HPAGE_MASK(x)  (~(KVM_HPAGE_SIZE(x) - 1))
 #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
 
-#define DE_VECTOR 0
-#define DB_VECTOR 1
-#define BP_VECTOR 3
-#define OF_VECTOR 4
-#define BR_VECTOR 5
-#define UD_VECTOR 6
-#define NM_VECTOR 7
-#define DF_VECTOR 8
-#define TS_VECTOR 10
-#define NP_VECTOR 11
-#define SS_VECTOR 12
-#define GP_VECTOR 13
-#define PF_VECTOR 14
-#define MF_VECTOR 16
-#define MC_VECTOR 18
-
 #define SELECTOR_TI_MASK (1 << 2)
 #define SELECTOR_RPL_MASK 0x03
 
@@ -994,4 +996,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 
+#endif
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f2b83bc..cdf5674 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -1,6 +1,135 @@
 #ifndef __SVM_H
 #define __SVM_H
 
+#define SVM_EXIT_READ_CR0  0x000
+#define SVM_EXIT_READ_CR3  0x003
+#define SVM_EXIT_READ_CR4  0x004
+#define SVM_EXIT_READ_CR8  0x008
+#define SVM_EXIT_WRITE_CR0 0x010
+#define SVM_EXIT_WRITE_CR3 0x013
+#define SVM_EXIT_WRITE_CR4 0x014
+#define SVM_EXIT_WRITE_CR8 0x018
+#define SVM_EXIT_READ_DR0  0x020
+#define SVM_EXIT_READ_DR1  0x021
+#define SVM_EXIT_READ_DR2  0x022
+#define SVM_EXIT_READ_DR3  0x023
+#define SVM_EXIT_READ_DR4  0x024
+#define SVM_EXIT_READ_DR5  0x025
+#define SVM_EXIT_READ_DR6  0x026
+#define SVM_EXIT_READ_DR7  0x027
+#define SVM_EXIT_WRITE_DR0 0x030
+#define SVM_EXIT_WRITE_DR1 0x031
+#define SVM_EXIT_WRITE_DR2 0x032
+#define SVM_EXIT_WRITE_DR3 0x033
+#define SVM_EXIT_WRITE_DR4 0x034
+#define SVM_EXIT_WRITE_DR5 0x035
+#define SVM_EXIT_WRITE_DR6 0x036
+#define SVM_EXIT_WRITE_DR7 0x037
+#define SVM_EXIT_EXCP_BASE 0x040
+#define SVM_EXIT_INTR  0x060
+#define SVM_EXIT_NMI   0x061
+#define SVM_EXIT_SMI   0x062
+#define SVM_EXIT_INIT  0x063
+#define SVM_EXIT_VINTR 0x064
+#define SVM_EXIT_CR0_SEL_WRITE 0x065
+#define SVM_EXIT_IDTR_READ 0x066
+#define SVM_EXIT_GDTR_READ 0x067
+#define SVM_EXIT_LDTR_READ 0x068
+#define SVM_EXIT_TR_READ   0x069
+#define SVM_EXIT_IDTR_WRITE 0x06a
+#define SVM_EXIT_GDTR_WRITE 0x06b
+#define SVM_EXIT_LDTR_WRITE 0x06c
+#define SVM_EXIT_TR_WRITE  0x06d
+#define SVM_EXIT_RDTSC 0x06e
+#define SVM_EXIT_RDPMC 0x06f
+#define SVM_EXIT_PUSHF 0x070
+#define SVM_EXIT_POPF  0x071
+#define SVM_EXIT_CPUID 0x072
+#define SVM_EXIT_RSM   0x073
+#define SVM_EXIT_IRET  0x074
+#define SVM_EXIT_SWINT 0x075
+#define SVM_EXIT_INVD  0x076
+#define SVM_EXIT_PAUSE 0x077
+#define SVM_EXIT_HLT   0x078
+#define SVM_EXIT_INVLPG 0x079
+#define SVM_EXIT_INVLPGA   0x07a
+#define SVM_EXIT_IOIO  0x07b
+#define SVM_EXIT_MSR   0x07c
+#define SVM_EXIT_TASK_SWITCH   0x07d
+#define SVM_EXIT_FERR_FREEZE   0x07e
+#define SVM_EXIT_SHUTDOWN  0x07f
+#define SVM_EXIT_VMRUN 0x080
+#define SVM_EXIT_VMMCALL   0x081
+#define SVM_EXIT_VMLOAD 0x082
+#define SVM_EXIT_VMSAVE 0x083
+#define SVM_EXIT_STGI  0x084
+#define SVM_EXIT_CLGI  0x085
+#define SVM_EXIT_SKINIT 0x086
+#define SVM_EXIT_RDTSCP 0x087
+#define SVM_EXIT_ICEBP 0x088
+#define SVM_EXIT_WBINVD 0x089
+#define SVM_EXIT_MONITOR   0x08a
+#define SVM_EXIT_MWAIT 0x08b
+#define SVM_EXIT_MWAIT_COND 0x08c
+#define SVM_EXIT_XSETBV 0x08d
+#define 
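Exporting these exit codes lets a userspace tool such as perf turn the raw exit_reason recorded by the kvm_exit tracepoint into a readable name. Below is a minimal sketch of such a lookup. The codes are copied from the svm.h hunk above. The table, the lowercase strings and svm_exit_name() are illustrative assumptions, not perf's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct exit_reason {
    unsigned int code;
    const char *name;
};

/* A few codes copied from the svm.h hunk; the strings are illustrative. */
static const struct exit_reason svm_exit_reasons[] = {
    { 0x060, "intr" },  /* SVM_EXIT_INTR */
    { 0x072, "cpuid" }, /* SVM_EXIT_CPUID */
    { 0x078, "hlt" },   /* SVM_EXIT_HLT */
    { 0x07b, "io" },    /* SVM_EXIT_IOIO */
    { 0x07c, "msr" },   /* SVM_EXIT_MSR */
};

/* Map a raw exit code to a printable name, or "UNKNOWN". */
static const char *svm_exit_name(unsigned int code)
{
    size_t n = sizeof(svm_exit_reasons) / sizeof(svm_exit_reasons[0]);

    for (size_t i = 0; i < n; i++) {
        if (svm_exit_reasons[i].code == code) {
            return svm_exit_reasons[i].name;
        }
    }
    return "UNKNOWN";
}
```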

[PATCH v7 2/3] KVM: x86: trace mmio begin and complete

2012-08-27 Thread Dong Hao
From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>

On older kernels, 'perf kvm stat record/report' uses kvm_exit and
kvm_mmio(read...) to calculate the emulation time of MMIO reads. To trace
MMIO read events more precisely, add kvm_mmio_begin to trace the time when
an MMIO read begins, and kvm_io_done to trace the time when MMIO/PIO
emulation completes.

[ Dong Hao <haod...@linux.vnet.ibm.com>: rebase it on current kvm tree ]
Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haod...@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |   32 
 include/trace/events/kvm.h |   37 +
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..b90394d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3828,9 +3828,12 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
+   trace_kvm_mmio_begin(vcpu->vcpu_id, write, gpa);
handled = ops->read_write_mmio(vcpu, gpa, bytes, val);
-   if (handled == bytes)
+   if (handled == bytes) {
+   trace_kvm_io_done(vcpu->vcpu_id);
return X86EMUL_CONTINUE;
+   }
 
gpa += handled;
bytes -= handled;
@@ -4025,6 +4028,7 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
vcpu->arch.pio.size = size;
 
if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
+   trace_kvm_io_done(vcpu->vcpu_id);
vcpu->arch.pio.count = 0;
return 1;
}
@@ -4625,9 +4629,7 @@ restart:
inject_emulated_exception(vcpu);
r = EMULATE_DONE;
} else if (vcpu->arch.pio.count) {
-   if (!vcpu->arch.pio.in)
-   vcpu->arch.pio.count = 0;
-   else
+   if (vcpu->arch.pio.in)
writeback = false;
r = EMULATE_DO_MMIO;
} else if (vcpu->mmio_needed) {
@@ -4658,8 +4660,6 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX);
int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt,
size, port, &val, 1);
-   /* do not return to emulator after return from userspace */
-   vcpu->arch.pio.count = 0;
return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
@@ -5509,11 +5509,16 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 {
struct kvm_run *run = vcpu->run;
struct kvm_mmio_fragment *frag;
-   int r;
+   int r = 1;
 
if (!(vcpu->arch.pio.count || vcpu->mmio_needed))
return 1;
 
+   if (vcpu->arch.pio.count && !vcpu->arch.pio.in) {
+   vcpu->arch.pio.count = 0;
+   goto exit;
+   }
+
if (vcpu->mmio_needed) {
/* Complete previous fragment */
frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
@@ -5521,8 +5526,10 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
memcpy(frag->data, run->mmio.data, frag->len);
if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
vcpu->mmio_needed = 0;
+
if (vcpu->mmio_is_write)
-   return 1;
+   goto exit;
+
vcpu->mmio_read_completed = 1;
goto done;
}
@@ -5539,11 +5546,12 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
}
 done:
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
+   r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE) == EMULATE_DONE;
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r != EMULATE_DONE)
-   return 0;
-   return 1;
+
+exit:
+   trace_kvm_io_done(vcpu->vcpu_id);
+   return r;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 7ef9e75..d4182fa 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -177,6 +177,43 @@ TRACE_EVENT(kvm_mmio,
  __entry->len, __entry->gpa, __entry->val)
 );
 
+TRACE_EVENT(kvm_mmio_begin,
+   TP_PROTO(unsigned int vcpu_id, bool rw, u64 gpa),
+   TP_ARGS(vcpu_id, rw, gpa),
+
+   TP_STRUCT__entry(
+   __field(unsigned int, vcpu_id)
+   __field(int, type)
+   __field(u64, gpa)
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id = vcpu_id;
+   __entry->type = rw ? KVM_TRACE_MMIO_WRITE :
+ KVM_TRACE_MMIO_READ;
+   __entry->gpa = gpa;
+   ),
+
+   TP_printk("vcpu %u mmio %s gpa 0x%llx", __entry->vcpu_id,
+ 
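Here is a minimal sketch, under stated assumptions, of how an analysis tool could pair the new kvm_mmio_begin and kvm_io_done events per vcpu. This yields the MMIO emulation time described in the commit message.

- The event handlers and the MAX_VCPUS bound are hypothetical.
- 0 is used as a "no pending begin" sentinel.
- A PIO completion with no preceding kvm_mmio_begin is simply ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VCPUS 64 /* hypothetical bound for this sketch */

/* Timestamp of the pending kvm_mmio_begin per vcpu; 0 = none pending. */
static uint64_t mmio_begin_ns[MAX_VCPUS];

static void on_mmio_begin(unsigned int vcpu_id, uint64_t ts_ns)
{
    assert(vcpu_id < MAX_VCPUS && ts_ns != 0);
    mmio_begin_ns[vcpu_id] = ts_ns;
}

/*
 * On kvm_io_done: return the time since the matching kvm_mmio_begin in ns,
 * or -1 if none is pending (e.g. a PIO completion).
 */
static int64_t on_io_done(unsigned int vcpu_id, uint64_t ts_ns)
{
    assert(vcpu_id < MAX_VCPUS);
    uint64_t begin = mmio_begin_ns[vcpu_id];

    if (begin == 0) {
        return -1;
    }
    mmio_begin_ns[vcpu_id] = 0;
    return (int64_t)(ts_ns - begin);
}
```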

[PATCH v7 3/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread Dong Hao
From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>

Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport events.

Usage:
- kvm stat
  runs a command and gathers performance counter statistics; it is an alias
  of perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are of interest as well, we
  can append the events like this:
  perf kvm stat record -e timer:*
  If many guests are running, we can track a specific guest by using -p or
  --pid

- show the result:
  perf kvm stat report

Example output:
# pgrep qemu-kvm
26071
32253
32564

A total of 3 guests are running on the host.

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT    Samples  Samples%     Time%     Avg time

         APIC_ACCESS      65381    66.58%     5.95%      37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT      16031    16.32%     3.06%      79.11us ( +-   7.34% )
               CPUID       5360     5.46%     0.06%       4.50us ( +-  35.07% )
                 HLT       4496     4.58%    90.75%    8360.34us ( +-   5.22% )
       EPT_VIOLATION       2667     2.72%     0.04%       5.49us ( +-   5.05% )
   PENDING_INTERRUPT       2242     2.28%     0.03%       5.25us ( +-   2.96% )
       EXCEPTION_NMI       1332     1.36%     0.02%       6.53us ( +-   6.51% )
      IO_INSTRUCTION        383     0.39%     0.09%      93.39us ( +-  40.92% )
           CR_ACCESS        310     0.32%     0.00%       6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.
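
The Samples% and Time% columns can be cross-checked against the other columns. A minimal sketch, assuming Samples% is a row's sample count divided by Total Samples, and Time% is the row's samples multiplied by its Avg time, divided by Total events handled time. This reading is inferred from the numbers above rather than taken from builtin-kvm.c, and the struct and function names are hypothetical.

```c
/* Hypothetical model of one row of "perf kvm stat report" output. */
struct kvm_event_row {
    const char *name;         /* e.g. "APIC_ACCESS" */
    unsigned long samples;    /* "Samples" column */
    double avg_us;            /* "Avg time" column, in microseconds */
};

/* Samples%: this row's share of all recorded events. */
static double row_samples_pct(const struct kvm_event_row *row,
                              unsigned long total_samples)
{
    return 100.0 * (double)row->samples / (double)total_samples;
}

/* Time%: this row's total handling time (samples * average time)
 * as a share of the total handled time. */
static double row_time_pct(const struct kvm_event_row *row,
                           double total_time_us)
{
    return 100.0 * (double)row->samples * row->avg_us / total_time_us;
}
```

For the APIC_ACCESS row this gives 65381 / 98202, about 66.58%, and 65381 * 37.72us / 41419293.63us, about 5.95%, matching the table. Since Avg time is printed rounded, recomputed values only agree to roughly two decimal places.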

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access    Samples  Samples%     Time%     Avg time

        0xfee00380:W      58686    90.21%    15.67%       4.95us ( +-   2.96% )
        0xfee00300:R       2124     3.26%     1.48%      12.93us ( +-  14.75% )
        0xfee00310:W       2124     3.26%     0.34%       3.00us ( +-   1.33% )
        0xfee00300:W       2123     3.26%    82.50%     720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.
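
Each MMIO row key pairs a guest physical address with the access direction, the same two pieces of information the kvm_mmio_begin tracepoint in this series records (gpa and read/write type). A minimal sketch of building such a key; format_mmio_key() is a hypothetical helper, and the exact formatting code in builtin-kvm.c may differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: build an "MMIO Access" key such as "0xfee00380:W"
 * from a guest physical address and the access direction.
 * Returns the number of characters snprintf() wanted to write. */
static int format_mmio_key(char *buf, size_t len,
                           unsigned long long gpa, bool is_write)
{
    /* "%#llx" prints a 0x prefix for non-zero values. */
    return snprintf(buf, len, "%#llx:%c", gpa, is_write ? 'W' : 'R');
}
```

For example, format_mmio_key(buf, sizeof(buf), 0xfee00300, false) yields "0xfee00300:R", the key of the second row above.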

See the ioport events:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%     Avg time

         0xc090:POUT        383   100.00%   100.00%      89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

The --vcpu option restricts the report to the specified VCPU, and --key
selects how the result is sorted:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%     Avg time

                 HLT        551     5.05%    94.81%    9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%      94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%      23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%      20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%       6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%       5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%       4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%       3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%       2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.
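
In the --key=time output above, the rows appear to be ordered by the Avg time column, largest first, rather than by Time% (EXTERNAL_INTERRUPT is listed above APIC_ACCESS despite its smaller Time%). A minimal sketch of that ordering with qsort(), assuming this reading of the output; the struct and comparator are hypothetical, not perf's code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-exit-reason statistics, one per report row. */
struct exit_stat {
    const char *name;
    unsigned long samples;
    double avg_us;            /* average handling time, microseconds */
};

/* qsort() comparator: larger average time sorts first. */
static int cmp_avg_time_desc(const void *a, const void *b)
{
    const struct exit_stat *x = a;
    const struct exit_stat *y = b;

    if (x->avg_us < y->avg_us)
        return 1;
    if (x->avg_us > y->avg_us)
        return -1;
    return 0;
}

static void sort_by_avg_time(struct exit_stat *stats, size_t n)
{
    qsort(stats, n, sizeof(stats[0]), cmp_avg_time_desc);
}
```

Comparing explicitly instead of returning the difference of the two doubles avoids truncating small differences to 0 when converting to int.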

[ Dong Hao <haod...@linux.vnet.ibm.com>:
 - rebase it on current acme's tree
 - fix the compile error on i386
]

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
Signed-off-by: Dong Hao haod...@linux.vnet.ibm.com
---
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST   |3 +
 tools/perf/builtin-kvm.c  |  889 -
 tools/perf/util/header.c  |   54 ++-
 tools/perf/util/header.h  |1 +
 tools/perf/util/thread.h  |2 +
 6 files changed, 973 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index dd84cb2..326f2cb 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -12,7 +12,7 @@ SYNOPSIS
[--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
{top|record|report|diff|buildid-list}
 'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
-   | --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
+   | --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
 
 DESCRIPTION
 ---
@@ -38,6 +38,18 @@ There are a couple of variants of perf kvm:
   so that other tools can be used to fetch packages with matching 

Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread Xiao Guangrong
CC David.

Hi David,

I apologize: Dong forgot to send the patchset to you.
Could you pick it up from the mailing list?

On 08/27/2012 05:51 PM, Dong Hao wrote:
 From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 
 Changelog:
 - rebased it on Arnaldo's newest git tree perf/core branch
 
 Changes based on Arnaldo's comments:
 - directly get the event from evsel->tp_format
 - remove die() and return the proper error code
 - rename thread->private to thread->priv
 
 Changes based on David's comments:
 - use is_valid_tracepoint instead of kvm_events_exist
 
 This patchset introduces a perf-based tool (perf kvm stat record/report)
 which can analyze kvm events more intelligently. Below are the presentation
 slides from LinuxCon Japan 2012:
 http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
 They contain more details. If you have any questions or comments, please
 feel free to let us know.
 
 This patchset is based on the perf/core branch of Arnaldo's git tree; patch 2
 is only an improvement and can be picked up independently.
 
 
 Usage:
 - kvm stat
   runs a command and gathers performance counter statistics; it is an alias
   of perf stat
 
 - trace kvm events:
   perf kvm stat record, or, if other tracepoints are of interest as well,
   append them like this:
   perf kvm stat record -e kvm:*
   If many guests are running, a specific guest can be tracked with -p or
   --pid
 
 - show the result:
   perf kvm stat report
 
 Example output:
 # pgrep qemu-kvm
 26071
 32253
 32564
 
 A total of 3 guests are running on the host.
 
 Then, track the guest whose pid is 26071:
 # ./perf kvm stat record -p 26071
 ^C[ perf record: Woken up 9 times to write data ]
 [ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]
 
 See the vmexit events:
 # ./perf kvm stat report --event=vmexit
 
 Analyze events for all VCPUs:
 
              VM-EXIT    Samples  Samples%     Time%     Avg time
 
          APIC_ACCESS      65381    66.58%     5.95%      37.72us ( +-   6.54% )
   EXTERNAL_INTERRUPT      16031    16.32%     3.06%      79.11us ( +-   7.34% )
                CPUID       5360     5.46%     0.06%       4.50us ( +-  35.07% )
                  HLT       4496     4.58%    90.75%    8360.34us ( +-   5.22% )
        EPT_VIOLATION       2667     2.72%     0.04%       5.49us ( +-   5.05% )
    PENDING_INTERRUPT       2242     2.28%     0.03%       5.25us ( +-   2.96% )
        EXCEPTION_NMI       1332     1.36%     0.02%       6.53us ( +-   6.51% )
       IO_INSTRUCTION        383     0.39%     0.09%      93.39us ( +-  40.92% )
            CR_ACCESS        310     0.32%     0.00%       6.10us ( +-   3.95% )
 
 Total Samples:98202, Total events handled time:41419293.63us.
 
 See the mmio events:
 # ./perf kvm stat report --event=mmio
 
 Analyze events for all VCPUs:
 
          MMIO Access    Samples  Samples%     Time%     Avg time
 
         0xfee00380:W      58686    90.21%    15.67%       4.95us ( +-   2.96% )
         0xfee00300:R       2124     3.26%     1.48%      12.93us ( +-  14.75% )
         0xfee00310:W       2124     3.26%     0.34%       3.00us ( +-   1.33% )
         0xfee00300:W       2123     3.26%    82.50%     720.68us ( +-  10.24% )
 
 Total Samples:65057, Total events handled time:1854470.45us.
 
 See the ioport events:
 # ./perf kvm stat report --event=ioport
 
 Analyze events for all VCPUs:
 
       IO Port Access    Samples  Samples%     Time%     Avg time
 
          0xc090:POUT        383   100.00%   100.00%      89.00us ( +-  42.94% )
 
 Total Samples:383, Total events handled time:34085.56us.
 
 The --vcpu option restricts the report to the specified VCPU, and --key
 selects how the result is sorted:
 # ./perf kvm stat report --event=vmexit --vcpu=0 --key=time
 
 Analyze events for VCPU 0:
 
              VM-EXIT    Samples  Samples%     Time%     Avg time
 
                  HLT        551     5.05%    94.81%    9501.72us ( +-  12.52% )
   EXTERNAL_INTERRUPT       1390    12.74%     2.39%      94.80us ( +-  20.92% )
          APIC_ACCESS       6186    56.68%     2.62%      23.41us ( +-  23.62% )
       IO_INSTRUCTION         17     0.16%     0.01%      20.39us ( +-  22.33% )
        EXCEPTION_NMI         94     0.86%     0.01%       6.07us ( +-   7.13% )
    PENDING_INTERRUPT        199     1.82%     0.02%       5.48us ( +-   4.36% )
            CR_ACCESS         52     0.48%     0.00%       4.89us ( +-   4.09% )
        EPT_VIOLATION       2057    18.85%     0.12%       3.15us ( +-   1.33% )
                CPUID        368     3.37%     0.02%       2.82us ( +-   2.79% )
 
 Total Samples:10914, Total events handled time:5521782.02us.
 
 
 Dong Hao (3):
   KVM: x86: export svm/vmx exit code and vector code to userspace
   KVM: x86: trace mmio begin and complete
   KVM: perf: kvm events analysis tool
 
  arch/x86/include/asm/kvm_host.h   |   36 +-
  arch/x86/include/asm/svm.h|  205 +---
  arch/x86/include/asm/vmx.h   

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Andreas Färber
Hi,

On 27.08.2012 08:28, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 This adds PCI device assignment for i386 targets using the classic KVM
 interfaces. This version is 100% identical to what has been maintained
 in qemu-kvm for several years and is supported by libvirt as well. It is
 expected to remain relevant for another couple of years, until kernels
 without fully featured and performance-equivalent VFIO support are
 obsolete.
 
 A refactoring to-do that should be done in-tree is to model MSI and
 MSI-X support via the generic PCI layer, similar to what VFIO is already
 doing for MSI-X. This should improve correctness and rid the code of
 duplicate logic.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/kvm/Makefile.objs |2 +-
  hw/kvm/pci-assign.c  | 1929 ++
  2 files changed, 1930 insertions(+), 1 deletions(-)
  create mode 100644 hw/kvm/pci-assign.c
[...]
 diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
 new file mode 100644
 index 000..9cce02c
 --- /dev/null
 +++ b/hw/kvm/pci-assign.c
 @@ -0,0 +1,1929 @@
 +/*
 + * Copyright (c) 2007, Neocleus Corporation.
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.

The downside of accepting this into qemu.git is that it gets us a huge
blob of GPLv2-only code without history of contributors for GPLv2+
relicensing...

 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along with
 + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
 + * Place - Suite 330, Boston, MA 02111-1307 USA.

(Expect the usual GNU address reminder here.)

 + *
 + *
 + *  Assign a PCI device from the host to a guest VM.
 + *
 + *  Adapted for KVM by Qumranet.
 + *
 + *  Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com)
 + *  Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com)
 + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com)
 + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com)
 + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
 + */
 +#include <stdio.h>
 +#include <unistd.h>
 +#include <sys/io.h>
 +#include <sys/mman.h>
 +#include <sys/types.h>
 +#include <sys/stat.h>
 +#include "hw/hw.h"
 +#include "hw/pc.h"
 +#include "qemu-error.h"
 +#include "console.h"
 +#include "hw/loader.h"
 +#include "monitor.h"
 +#include "range.h"
 +#include "sysemu.h"
 +#include "hw/pci.h"
 +#include "hw/msi.h"

 +#include "kvm_i386.h"

Am I correct in understanding that we compile this only for i386/x86_64?
(apic.o in kvm/Makefile.objs hints in that direction.) You may want to
update the description in the comment above accordingly, also mentioning
that this is deprecated and kept only for backwards compatibility.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Jan Kiszka
On 2012-08-27 14:07, Andreas Färber wrote:
 Hi,
 
 On 27.08.2012 08:28, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 This adds PCI device assignment for i386 targets using the classic KVM
 interfaces. This version is 100% identical to what has been maintained
 in qemu-kvm for several years and is supported by libvirt as well. It is
 expected to remain relevant for another couple of years, until kernels
 without fully featured and performance-equivalent VFIO support are
 obsolete.

 A refactoring to-do that should be done in-tree is to model MSI and
 MSI-X support via the generic PCI layer, similar to what VFIO is already
 doing for MSI-X. This should improve correctness and rid the code of
 duplicate logic.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/kvm/Makefile.objs |2 +-
  hw/kvm/pci-assign.c  | 1929 ++
  2 files changed, 1930 insertions(+), 1 deletions(-)
  create mode 100644 hw/kvm/pci-assign.c
 [...]
 diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
 new file mode 100644
 index 000..9cce02c
 --- /dev/null
 +++ b/hw/kvm/pci-assign.c
 @@ -0,0 +1,1929 @@
 +/*
 + * Copyright (c) 2007, Neocleus Corporation.
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 
 The downside of accepting this into qemu.git is that it gets us a huge
 blob of GPLv2-only code without history of contributors for GPLv2+
 relicensing...

The history is documented in qemu-kvm. I personally don't think going
through this will pay off, but someone else may, and nothing prevents
at least trying. I can leave a comment.

BTW, VFIO will be GPLv2-only as well. If I understood Alex correctly, it
is too heavily derived from this code. IOW, there will probably be no PCI
assignment without this restriction in the foreseeable future.

 
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along with
 + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
 + * Place - Suite 330, Boston, MA 02111-1307 USA.
 
 (Expect the usual GNU address reminder here.)

Will fix.

 
 + *
 + *
 + *  Assign a PCI device from the host to a guest VM.
 + *
 + *  Adapted for KVM by Qumranet.
 + *
 + *  Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com)
 + *  Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com)
 + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com)
 + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com)
 + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
 + */
 +#include <stdio.h>
 +#include <unistd.h>
 +#include <sys/io.h>
 +#include <sys/mman.h>
 +#include <sys/types.h>
 +#include <sys/stat.h>
 +#include "hw/hw.h"
 +#include "hw/pc.h"
 +#include "qemu-error.h"
 +#include "console.h"
 +#include "hw/loader.h"
 +#include "monitor.h"
 +#include "range.h"
 +#include "sysemu.h"
 +#include "hw/pci.h"
 +#include "hw/msi.h"
 
 +#include "kvm_i386.h"
 
 Am I correct in understanding that we compile this only for i386/x86_64?

This is correct.

 (apic.o in kvm/Makefile.objs hints in that direction.) You may want to
 update the description in the comment above accordingly, also mentioning
 that this is deprecated and kept only for backwards compatibility.

You mean in the header of pci-assign.c? Can do.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


[PATCHv2 0/4] migrate PV EOI MSR

2012-08-27 Thread Michael S. Tsirkin
It turns out PV EOI gets disabled after migration,
until the next guest reset.
This is because we are missing code to actually migrate it.
This patchset fixes that: it applies cleanly to qemu.git
as well as qemu-kvm.git, so I think it's cleaner
to apply it in qemu.git to keep the diff to a minimum.

Note: there's talk about adding infrastructure for
CPUID whitelisting, which conceivably could be used
for migration compat support. I am guessing this won't be
1.2 material; when it's ready, we can easily replace
the simple flag that this patchset adds with something else.

So this just adds minimal code to avoid regressing
cross-version migration.

Note: there's a kernel bug in Linux 3.6-rc3; apply
my patch 'kvm: fix KVM_GET_MSR for PV EOI' in order to
use this patchset with it.

Needed for 1.2.

Changes from v1:
- Update all headers from 3.6-rc3 to keep them in sync (Jan)
- Disable cpuid flag for qemu 1.1 and older (Orit)

Michael S. Tsirkin (4):
  linux-headers: update to 3.6-rc3
  pc: refactor compat code
  cpuid: disable pv eoi for 1.1 and older compat types
  kvm: get/set PV EOI MSR

 hw/Makefile.objs  |  2 +-
 hw/cpu_flags.c| 32 +++
 hw/cpu_flags.h|  9 
 hw/pc_piix.c  | 46 ---
 linux-headers/asm-s390/kvm.h  |  2 +-
 linux-headers/asm-s390/kvm_para.h |  2 +-
 linux-headers/asm-x86/kvm.h   |  1 +
 linux-headers/asm-x86/kvm_para.h  |  7 ++
 linux-headers/linux/kvm.h |  3 +++
 target-i386/cpu.c |  8 +++
 target-i386/cpu.h |  1 +
 target-i386/kvm.c | 13 +++
 target-i386/machine.c | 21 ++
 13 files changed, 136 insertions(+), 11 deletions(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

-- 
MST


[PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Michael S. Tsirkin
Update linux-headers to version present in Linux 3.6-rc3.
The asm-x86/kvm_para.h update is needed for the new PV EOI
feature.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 linux-headers/asm-s390/kvm.h  | 2 +-
 linux-headers/asm-s390/kvm_para.h | 2 +-
 linux-headers/asm-x86/kvm.h   | 1 +
 linux-headers/asm-x86/kvm_para.h  | 7 +++
 linux-headers/linux/kvm.h | 3 +++
 5 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index bdcbe0f..d25da59 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -1,7 +1,7 @@
 #ifndef __LINUX_KVM_S390_H
 #define __LINUX_KVM_S390_H
 /*
- * asm-s390/kvm.h - KVM s390 specific structures and definitions
+ * KVM s390 specific structures and definitions
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-s390/kvm_para.h b/linux-headers/asm-s390/kvm_para.h
index 8e2dd67..870051f 100644
--- a/linux-headers/asm-s390/kvm_para.h
+++ b/linux-headers/asm-s390/kvm_para.h
@@ -1,5 +1,5 @@
 /*
- * asm-s390/kvm_para.h - definition for paravirtual devices on s390
+ * definition for paravirtual devices on s390
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index e7d1c19..246617e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -12,6 +12,7 @@
 /* Select x86 specific features in linux/kvm.h */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
+#define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_DEVICE_ASSIGNMENT
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..a1c3d72 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_PV_EOI 6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN  0x4b564d04
 
 struct kvm_steal_time {
__u64 steal;
@@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
__u32 enabled;
 };
 
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 5a9d4e3..4b9e575 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -617,6 +617,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_SIGNAL_MSI 77
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
+#define KVM_CAP_PPC_ALLOC_HTAB 80
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -828,6 +829,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SIGNAL_MSI            _IOW(KVMIO,  0xa5, struct kvm_msi)
 /* Available with KVM_CAP_PPC_GET_SMMU_INFO */
 #define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
+/* Available with KVM_CAP_PPC_ALLOC_HTAB */
+#define KVM_PPC_ALLOCATE_HTAB     _IOWR(KVMIO, 0xa7, __u32)
 
 /*
  * ioctls for vcpu fds
-- 
MST



[PATCHv2 2/4] pc: refactor compat code

2012-08-27 Thread Michael S. Tsirkin
In preparation for adding PV EOI migration for 1.2,
trivially refactor some compat code
to make it easier to add version-specific
CPUID tweaks.
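
The refactoring gives every older machine type an init wrapper that runs a per-version compat hook before the shared init code. A minimal sketch of that dispatch, using hypothetical names and a single cpu_model argument instead of the full pc_init_pci() parameter list. The hook body anticipates what patch 3/4 adds; this patch itself leaves it empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: pv_eoi_disabled plays the role of the flag kept
 * in hw/cpu_flags.c; initialised_model records what the shared init saw. */
static bool pv_eoi_disabled;
static const char *initialised_model;

/* Shared board init (pc_init_pci() in the real code). */
static void pc_init_common(const char *cpu_model)
{
    initialised_model = cpu_model;
}

/* Per-version compat hook. Patch 2/4 leaves it empty; this body mirrors
 * what patch 3/4 adds. */
static void machine_v1_1_compat(void)
{
    pv_eoi_disabled = true;
}

/* Wrapper shared by the pc-1.1 and older machine types. */
static void pc_init_v1_1(const char *cpu_model)
{
    machine_v1_1_compat();       /* compat tweaks must run first */
    pc_init_common(cpu_model);
}

struct machine {
    const char *name;
    void (*init)(const char *cpu_model);
};

static const struct machine machines[] = {
    { "pc-1.2", pc_init_common },  /* current type: no compat hook */
    { "pc-1.1", pc_init_v1_1 },
    { "pc-1.0", pc_init_v1_1 },
};
```

Adding a tweak for a later version then only means writing one more hook and pointing the older machine types' .init at a wrapper that calls it.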

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pc_piix.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index a771d79..008d42f 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -369,6 +369,22 @@ static QEMUMachine pc_machine_v1_2 = {
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 };
 
+static void pc_machine_v1_1_compat(void)
+{
+}
+
+static void pc_init_pci_v1_1(ram_addr_t ram_size,
+ const char *boot_device,
+ const char *kernel_filename,
+ const char *kernel_cmdline,
+ const char *initrd_filename,
+ const char *cpu_model)
+{
+pc_machine_v1_1_compat();
+pc_init_pci(ram_size, boot_device, kernel_filename,
+kernel_cmdline, initrd_filename, cpu_model);
+}
+
 #define PC_COMPAT_1_1 \
 {\
 .driver   = "virtio-scsi-pci",\
@@ -403,7 +419,7 @@ static QEMUMachine pc_machine_v1_2 = {
 static QEMUMachine pc_machine_v1_1 = {
 .name = "pc-1.1",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_v1_1,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -439,7 +455,7 @@ static QEMUMachine pc_machine_v1_1 = {
 static QEMUMachine pc_machine_v1_0 = {
 .name = "pc-1.0",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_v1_1,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -455,7 +471,7 @@ static QEMUMachine pc_machine_v1_0 = {
 static QEMUMachine pc_machine_v0_15 = {
 .name = "pc-0.15",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_v1_1,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -488,7 +504,7 @@ static QEMUMachine pc_machine_v0_15 = {
 static QEMUMachine pc_machine_v0_14 = {
 .name = "pc-0.14",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_v1_1,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -519,10 +535,22 @@ static QEMUMachine pc_machine_v0_14 = {
 .value= stringify(1),\
 }
 
+static void pc_init_pci_v0_13(ram_addr_t ram_size,
+ const char *boot_device,
+ const char *kernel_filename,
+ const char *kernel_cmdline,
+ const char *initrd_filename,
+ const char *cpu_model)
+{
+pc_machine_v1_1_compat();
+pc_init_pci_no_kvmclock(ram_size, boot_device, kernel_filename,
+kernel_cmdline, initrd_filename, cpu_model);
+}
+
 static QEMUMachine pc_machine_v0_13 = {
 .name = "pc-0.13",
 .desc = "Standard PC",
-.init = pc_init_pci_no_kvmclock,
+.init = pc_init_pci_v0_13,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -560,7 +588,7 @@ static QEMUMachine pc_machine_v0_13 = {
 static QEMUMachine pc_machine_v0_12 = {
 .name = "pc-0.12",
 .desc = "Standard PC",
-.init = pc_init_pci_no_kvmclock,
+.init = pc_init_pci_v0_13,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -594,7 +622,7 @@ static QEMUMachine pc_machine_v0_12 = {
 static QEMUMachine pc_machine_v0_11 = {
 .name = "pc-0.11",
 .desc = "Standard PC, qemu 0.11",
-.init = pc_init_pci_no_kvmclock,
+.init = pc_init_pci_v0_13,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
@@ -616,7 +644,7 @@ static QEMUMachine pc_machine_v0_11 = {
 static QEMUMachine pc_machine_v0_10 = {
 .name = "pc-0.10",
 .desc = "Standard PC, qemu 0.10",
-.init = pc_init_pci_no_kvmclock,
+.init = pc_init_pci_v0_13,
 .max_cpus = 255,
 .default_machine_opts = KVM_MACHINE_OPTIONS,
 .compat_props = (GlobalProperty[]) {
-- 
MST



[PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Michael S. Tsirkin
In preparation for adding PV EOI support, disable PV EOI by default for
1.1 and older machine types, to avoid CPUID changing during migration.

PV EOI can still be enabled or disabled by specifying it explicitly.
To enable it for 1.1:
-M pc-1.1 -cpu kvm64,+kvm_pv_eoi
To disable it for 1.2:
-M pc-1.2 -cpu kvm64,-kvm_pv_eoi
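
The order of operations is what lets an explicit flag win: the compat code clears the PV EOI bit from the default "plus" mask before the -cpu feature string is parsed. A simplified model of that ordering, not QEMU's actual cpu_x86_find_by_name(); the function and its boolean parameters are hypothetical, and only KVM_FEATURE_PV_EOI (6) comes from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_EOI 6   /* from asm-x86/kvm_para.h (patch 1/4) */
#define PV_EOI_BIT (UINT32_C(1) << KVM_FEATURE_PV_EOI)

/* Hypothetical model: compute the requested KVM feature bits for a guest.
 * compat_disable: machine type is pc-1.1 or older
 * plus_flag:      "-cpu kvm64,+kvm_pv_eoi" was given
 * minus_flag:     "-cpu kvm64,-kvm_pv_eoi" was given */
static uint32_t requested_kvm_features(bool compat_disable,
                                       bool plus_flag, bool minus_flag)
{
    uint32_t plus = ~UINT32_C(0);  /* unsupported bits filtered out later */
    uint32_t minus = 0;

    /* Step 1: machine-type default, applied first. */
    if (compat_disable)
        plus &= ~PV_EOI_BIT;

    /* Step 2: explicit -cpu flags, parsed afterwards, can override it. */
    if (plus_flag)
        plus |= PV_EOI_BIT;
    if (minus_flag)
        minus |= PV_EOI_BIT;

    return plus & ~minus;
}
```

With compat_disable set (pc-1.1), the bit stays off unless +kvm_pv_eoi is given; with pc-1.2, -kvm_pv_eoi turns it off.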

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/Makefile.objs  |  2 +-
 hw/cpu_flags.c| 32 
 hw/cpu_flags.h|  9 +
 hw/pc_piix.c  |  2 ++
 target-i386/cpu.c |  8 
 5 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 850b87b..3f2532a 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -1,5 +1,5 @@
 hw-obj-y = usb/ ide/
-hw-obj-y += loader.o
+hw-obj-y += loader.o cpu_flags.o
 hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
new file mode 100644
index 000..2422d20
--- /dev/null
+++ b/hw/cpu_flags.c
@@ -0,0 +1,32 @@
+/*
+ * CPU compatibility flags.
+ *
+ * Copyright (c) 2012 Red Hat Inc.
+ * Author: Michael S. Tsirkin.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+#include "hw/cpu_flags.h"
+
+static bool __kvm_pv_eoi_disabled;
+
+void disable_kvm_pv_eoi(void)
+{
+   __kvm_pv_eoi_disabled = true;
+}
+
+bool kvm_pv_eoi_disabled(void)
+{
+   return __kvm_pv_eoi_disabled;
+}
diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
new file mode 100644
index 000..05777b6
--- /dev/null
+++ b/hw/cpu_flags.h
@@ -0,0 +1,9 @@
+#ifndef HW_CPU_FLAGS_H
+#define HW_CPU_FLAGS_H
+
+#include <stdbool.h>
+
+void disable_kvm_pv_eoi(void);
+bool kvm_pv_eoi_disabled(void);
+
+#endif
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 008d42f..bdbceda 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -46,6 +46,7 @@
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
+#include "cpu_flags.h"
 
 #define MAX_IDE_BUS 2
 
@@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
 
 static void pc_machine_v1_1_compat(void)
 {
+disable_kvm_pv_eoi();
 }
 
 static void pc_init_pci_v1_1(ram_addr_t ram_size,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 120a2e3..0d02fd1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -23,6 +23,7 @@
 
 #include "cpu.h"
 #include "kvm.h"
+#include <asm/kvm_para.h>
 
 #include "qemu-option.h"
 #include "qemu-config.h"
@@ -33,6 +34,7 @@
 #include "hyperv.h"
 
 #include "hw/hw.h"
+#include "hw/cpu_flags.h"
 
 /* feature flags taken from Intel Processor Identification and the CPUID
  * Instruction and AMD's CPUID Specification.  In cases of disagreement
@@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
 
 plus_kvm_features = ~0; /* not supported bits will be filtered out later */
 
+/* Disable PV EOI for old machine types.
+ * Feature flags can still override. */
+if (kvm_pv_eoi_disabled()) {
+plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
+}
+
 add_flagname_to_bitmaps("hypervisor", &plus_features,
 &plus_ext_features, &plus_ext2_features, &plus_ext3_features,
 &plus_kvm_features, &plus_svm_features);
-- 
MST



[PATCHv2 4/4] kvm: get/set PV EOI MSR

2012-08-27 Thread Michael S. Tsirkin
Support get/set of the new PV EOI MSR, for migration.
Add an optional section for the MSR value: send it
only if the MSR was changed from its default value (0).
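
Sending the section only when the MSR is non-zero means that, as long as the guest never enabled PV EOI, the migration stream contains nothing that an older QEMU without this section would fail to recognise. A minimal sketch of that "needed" pattern, with hypothetical types standing in for CPUX86State and QEMU's VMState subsection machinery; only the section names and the != 0 tests come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of CPUX86State relevant here. */
struct cpu_msrs {
    uint64_t async_pf_en_msr;
    uint64_t pv_eoi_en_msr;
};

/* Same test as async_pf_msr_needed(): skip the section at its reset value. */
static bool async_pf_needed(const struct cpu_msrs *s)
{
    return s->async_pf_en_msr != 0;
}

/* Same test as pv_eoi_msr_needed(). */
static bool pv_eoi_needed(const struct cpu_msrs *s)
{
    return s->pv_eoi_en_msr != 0;
}

/* Hypothetical stand-in for a VMState subsection entry. */
struct optional_section {
    const char *name;
    bool (*needed)(const struct cpu_msrs *s);
};

static const struct optional_section sections[] = {
    { "cpu/async_pf_msr", async_pf_needed },
    { "cpu/async_pv_eoi_msr", pv_eoi_needed },
};

/* Count how many optional sections would be written to the stream. */
static size_t sections_to_send(const struct cpu_msrs *s)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
        if (sections[i].needed(s))
            n++;
    }
    return n;
}
```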

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 target-i386/cpu.h |  1 +
 target-i386/kvm.c | 13 +
 target-i386/machine.c | 21 +
 3 files changed, 35 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index aabf993..3c57d8b 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -699,6 +699,7 @@ typedef struct CPUX86State {
 uint64_t system_time_msr;
 uint64_t wall_clock_msr;
 uint64_t async_pf_en_msr;
+uint64_t pv_eoi_en_msr;
 
 uint64_t tsc;
 uint64_t tsc_deadline;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5e2d4f5..6790180 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -64,6 +64,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
+static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
 static int lm_capable_kernel;
 
@@ -456,6 +457,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
 
+has_msr_pv_eoi_en = c->eax & (1 << KVM_FEATURE_PV_EOI);
+
 cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
 for (i = 0; i <= limit; i++) {
@@ -1018,6 +1021,10 @@ static int kvm_put_msrs(CPUX86State *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN,
   env->async_pf_en_msr);
 }
+if (has_msr_pv_eoi_en) {
+kvm_msr_entry_set(&msrs[n++], MSR_KVM_PV_EOI_EN,
+  env->pv_eoi_en_msr);
+}
 if (hyperv_hypercall_available()) {
 kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_GUEST_OS_ID, 0);
 kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_HYPERCALL, 0);
@@ -1260,6 +1267,9 @@ static int kvm_get_msrs(CPUX86State *env)
 if (has_msr_async_pf_en) {
 msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
 }
+if (has_msr_pv_eoi_en) {
+msrs[n++].index = MSR_KVM_PV_EOI_EN;
+}
 
 if (env->mcg_cap) {
 msrs[n++].index = MSR_MCG_STATUS;
@@ -1339,6 +1349,9 @@ static int kvm_get_msrs(CPUX86State *env)
 case MSR_KVM_ASYNC_PF_EN:
 env->async_pf_en_msr = msrs[i].data;
 break;
+case MSR_KVM_PV_EOI_EN:
+env->pv_eoi_en_msr = msrs[i].data;
+break;
 }
 }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index a8be058..4771508 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -279,6 +279,13 @@ static bool async_pf_msr_needed(void *opaque)
 return cpu->async_pf_en_msr != 0;
 }
 
+static bool pv_eoi_msr_needed(void *opaque)
+{
+CPUX86State *cpu = opaque;
+
+return cpu->pv_eoi_en_msr != 0;
+}
+
 static const VMStateDescription vmstate_async_pf_msr = {
 .name = "cpu/async_pf_msr",
 .version_id = 1,
@@ -290,6 +297,17 @@ static const VMStateDescription vmstate_async_pf_msr = {
 }
 };
 
+static const VMStateDescription vmstate_pv_eoi_msr = {
+.name = "cpu/async_pv_eoi_msr",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(pv_eoi_en_msr, CPUX86State),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static bool fpop_ip_dp_needed(void *opaque)
 {
 CPUX86State *env = opaque;
@@ -454,6 +472,9 @@ static const VMStateDescription vmstate_cpu = {
 .vmsd = &vmstate_async_pf_msr,
 .needed = async_pf_msr_needed,
 } , {
+.vmsd = &vmstate_pv_eoi_msr,
+.needed = pv_eoi_msr_needed,
+} , {
 .vmsd = &vmstate_fpop_ip_dp,
 .needed = fpop_ip_dp_needed,
 }, {
-- 
MST
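For readers following the migration logic, here is a minimal, self-contained C sketch of the three pieces the patch adds: the CPUID feature check, the MSR index that kvm_get_msrs() reads back, and the needed() predicate that gates the new vmstate subsection. The constants come from the linux-headers update discussed in this thread. struct cpu_state, host_has_pv_eoi() and collect_msr_indices() are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from linux-headers/asm-x86/kvm_para.h (Linux 3.6). */
#define KVM_FEATURE_PV_EOI 6
#define MSR_KVM_PV_EOI_EN  0x4b564d04u

/* Hypothetical, trimmed-down stand-in for CPUX86State. */
struct cpu_state {
    uint64_t pv_eoi_en_msr;
};

/* Same test as the patch: the MSR is only usable if KVM sets the
 * PV EOI bit in eax of its CPUID feature leaf. */
static bool host_has_pv_eoi(uint32_t kvm_features_eax)
{
    return (kvm_features_eax & (1u << KVM_FEATURE_PV_EOI)) != 0;
}

/* Hypothetical helper modelled on kvm_get_msrs(): queue the MSR index
 * for read-back only when the host supports it. Returns the count. */
static size_t collect_msr_indices(uint32_t *idx, size_t cap, bool has_pv_eoi)
{
    size_t n = 0;

    assert(idx != NULL && cap >= 1);
    if (has_pv_eoi) {
        idx[n++] = MSR_KVM_PV_EOI_EN;
    }
    return n;
}

/* Same rule as pv_eoi_msr_needed(): the vmstate subsection is only sent
 * when the guest has enabled PV EOI (non-zero MSR value). */
static bool pv_eoi_msr_needed(const struct cpu_state *cpu)
{
    return cpu->pv_eoi_en_msr != 0;
}
```

The subsection is only emitted when the guest has written a non-zero value to the MSR. A guest that never enabled PV EOI should therefore still be able to migrate to a QEMU that does not know this subsection.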
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Peter Maydell
On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
 Update linux-headers to version present in Linux 3.6-rc3.
 Header asm-x86/kvm_para.h update is needed for the new PV EOI
 feature.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  linux-headers/asm-s390/kvm.h  | 2 +-
  linux-headers/asm-s390/kvm_para.h | 2 +-
  linux-headers/asm-x86/kvm.h   | 1 +
  linux-headers/asm-x86/kvm_para.h  | 7 +++
  linux-headers/linux/kvm.h | 3 +++
  5 files changed, 13 insertions(+), 2 deletions(-)

The latest version of update-linux-headers.sh should have caused
this update to include asm-generic/kvm_para.h, I think. Did the
script not pull that header in, or were you maybe using an old
version of the script or forgot to git add the new file?

thanks
-- PMM


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Jan Kiszka
On 2012-08-27 14:42, Peter Maydell wrote:
 On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
 Update linux-headers to version present in Linux 3.6-rc3.
 Header asm-x86/kvm_para.h update is needed for the new PV EOI
 feature.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  linux-headers/asm-s390/kvm.h  | 2 +-
  linux-headers/asm-s390/kvm_para.h | 2 +-
  linux-headers/asm-x86/kvm.h   | 1 +
  linux-headers/asm-x86/kvm_para.h  | 7 +++
  linux-headers/linux/kvm.h | 3 +++
  5 files changed, 13 insertions(+), 2 deletions(-)
 
 The latest version of update-linux-headers.sh should have caused
 this update to include asm-generic/kvm_para.h, I think. Did the
 script not pull that header in, or were you maybe using an old
 version of the script or forgot to git add the new file?

To be fair, that is hard to guess. We should add some magic to the
update script to detect new files and maybe suggest them for addition.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread David Ahern

On 8/27/12 3:59 AM, Xiao Guangrong wrote:

CC David.

Hi David,

I apologize that Dong forgot to send the patchset to you.
Could you pick these up from the mailing list?


Yes, I do catch all perf related emails to LKML. I'll take a look at the 
patches today or tomorrow.


David



Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific

2012-08-27 Thread Takuya Yoshikawa
On Fri, 24 Aug 2012 15:54:59 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

 Other arches do not need this.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: kvm/arch/x86/kvm/x86.c
 ===
 --- kvm.orig/arch/x86/kvm/x86.c
 +++ kvm/arch/x86/kvm/x86.c
 @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
   kvm_mmu_slot_remove_write_access(kvm, mem->slot);
   spin_unlock(&kvm->mmu_lock);
 + /*
 +  * If the new memory slot is created, we need to clear all
 +  * mmio sptes.
 +  */
 + if (old.npages == 0 && npages) {
 + kvm_mmu_zap_all(kvm);
 + kvm_reload_remote_mmus(kvm);
 + }
  }

Any explanation why (old.base_gfn != new.base_gfn) case can be
omitted?

Takuya

  
  void kvm_arch_flush_shadow_all(struct kvm *kvm)
 Index: kvm/virt/kvm/kvm_main.c
 ===
 --- kvm.orig/virt/kvm/kvm_main.c
 +++ kvm/virt/kvm/kvm_main.c
 @@ -849,13 +849,6 @@ int __kvm_set_memory_region(struct kvm *
  
   kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
  
 - /*
 -  * If the new memory slot is created, we need to clear all
 -  * mmio sptes.
 -  */
 - if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
 - kvm_arch_flush_shadow_all(kvm);
 -
   kvm_free_physmem_slot(old, new);
   kfree(old_memslots);
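Takuya's question is easiest to see by comparing the two conditions side by side. The sketch below is a hypothetical C model of the check removed from __kvm_set_memory_region() and the one added to kvm_arch_commit_memory_region(). struct slot and the helper names are made up for illustration. For the old check, it assumes mem->guest_phys_addr >> PAGE_SHIFT equals the new slot's base_gfn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down memslot: only the fields both checks read. */
struct slot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Check removed from virt/kvm/kvm_main.c: flush whenever the resulting
 * slot is non-empty and its base gfn differs from the old one. */
static bool old_flush_rule(struct slot old_slot, struct slot new_slot)
{
    return new_slot.npages && old_slot.base_gfn != new_slot.base_gfn;
}

/* Check added to arch/x86/kvm/x86.c: flush only when a slot is created. */
static bool new_flush_rule(struct slot old_slot, struct slot new_slot)
{
    return old_slot.npages == 0 && new_slot.npages;
}

/* Cases from the thread: slot created, slot moved, slot deleted. */
static void flush_rules_selftest(void)
{
    struct slot empty   = { 0, 0 };
    struct slot created = { 0x100, 16 };
    struct slot moved   = { 0x200, 16 };

    /* New slot: both rules flush. */
    assert(old_flush_rule(empty, created) && new_flush_rule(empty, created));
    /* Existing slot moved to another gfn: only the old rule flushes. */
    assert(old_flush_rule(created, moved) && !new_flush_rule(created, moved));
    /* Slot deleted (npages == 0): neither rule flushes. */
    assert(!old_flush_rule(created, empty) && !new_flush_rule(created, empty));
}
```

The moved-slot case is exactly the (old.base_gfn != new.base_gfn) situation the question asks about.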


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 01:42:03PM +0100, Peter Maydell wrote:
 On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
  Update linux-headers to version present in Linux 3.6-rc3.
  Header asm-x86/kvm_para.h update is needed for the new PV EOI
  feature.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   linux-headers/asm-s390/kvm.h  | 2 +-
   linux-headers/asm-s390/kvm_para.h | 2 +-
   linux-headers/asm-x86/kvm.h   | 1 +
   linux-headers/asm-x86/kvm_para.h  | 7 +++
   linux-headers/linux/kvm.h | 3 +++
   5 files changed, 13 insertions(+), 2 deletions(-)
 
 The latest version of update-linux-headers.sh should have caused
 this update to include asm-generic/kvm_para.h, I think. Did the
 script not pull that header in, or were you maybe using an old
 version of the script or forgot to git add the new file?
 
 thanks
 -- PMM

I have no idea, but adding new files is not the same as updating
existing ones.
Why don't you add it when you update headers to a version that
actually uses it?

-- 
MST


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote:
 On 2012-08-27 14:42, Peter Maydell wrote:
  On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
  Update linux-headers to version present in Linux 3.6-rc3.
  Header asm-x86/kvm_para.h update is needed for the new PV EOI
  feature.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   linux-headers/asm-s390/kvm.h  | 2 +-
   linux-headers/asm-s390/kvm_para.h | 2 +-
   linux-headers/asm-x86/kvm.h   | 1 +
   linux-headers/asm-x86/kvm_para.h  | 7 +++
   linux-headers/linux/kvm.h | 3 +++
   5 files changed, 13 insertions(+), 2 deletions(-)
  
  The latest version of update-linux-headers.sh should have caused
  this update to include asm-generic/kvm_para.h, I think. Did the
  script not pull that header in, or were you maybe using an old
  version of the script or forgot to git add the new file?
 
 To be fair, that is hard to guess. We should add some magic to the
 update script to detect new files and maybe suggest them for addition.
 
 Jan

But why did you add a header to qemu without adding it
to git? That's a cleaner solution and needs no magic scripting.

 -- 
 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
 Corporate Competence Center Embedded Linux


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Jan Kiszka
On 2012-08-27 16:53, Michael S. Tsirkin wrote:
 On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote:
 On 2012-08-27 14:42, Peter Maydell wrote:
 On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
 Update linux-headers to version present in Linux 3.6-rc3.
 Header asm-x86/kvm_para.h update is needed for the new PV EOI
 feature.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  linux-headers/asm-s390/kvm.h  | 2 +-
  linux-headers/asm-s390/kvm_para.h | 2 +-
  linux-headers/asm-x86/kvm.h   | 1 +
  linux-headers/asm-x86/kvm_para.h  | 7 +++
  linux-headers/linux/kvm.h | 3 +++
  5 files changed, 13 insertions(+), 2 deletions(-)

 The latest version of update-linux-headers.sh should have caused
 this update to include asm-generic/kvm_para.h, I think. Did the
 script not pull that header in, or were you maybe using an old
 version of the script or forgot to git add the new file?

 To be fair, that is hard to guess. We should add some magic to the
 update script to detect new files and maybe suggest them for addition.

 Jan
 
 But why did you add a header to qemu without adding it
 to git? That's a cleaner solution and needs no magic scripting.

Yes, this would have been appropriate. Still, a simple "git status -s
linux-headers" run at the end of the update script could help remind
people in the future.

Jan
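Jan's suggestion amounts to spotting files that "git status -s" reports as untracked (the two-character "??" status) once the script has run. Here is a rough C sketch, assuming the short-format output has already been captured into a string. count_untracked() is a hypothetical helper. Inside the script itself, this check would more naturally be a one-line shell pipeline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines of "git status -s" output that start with "?? ",
 * i.e. files present in the work tree but not tracked by git. */
static size_t count_untracked(const char *status_output)
{
    const char *line = status_output;
    size_t n = 0;

    assert(status_output != NULL);
    while (*line != '\0') {
        if (strncmp(line, "?? ", 3) == 0) {
            n++;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL) {
            break;          /* last line had no trailing newline */
        }
        line = nl + 1;
    }
    return n;
}
```

A non-zero count after an update would be the cue to "git add" the new header, such as the asm-generic/kvm_para.h case Peter raised.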



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf
On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
 On 08/24/2012 11:11 AM, Michael Wolf wrote:
  On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
  On 08/24/2012 03:14 AM, Michael Wolf wrote:
  This is an RFC regarding the reporting of stealtime.  In the case of
  where you have a system that is running with partial processors such as
  KVM the user may see steal time being reported in accounting tools such
  as top or vmstat.  This can cause confusion for the end user.  To
  ease the confusion this patch set adds a sysctl interface to set the
  cpu entitlement.  This is the percentage of cpu that the guest system is
   expected to receive.  As long as the steal time is within its expected
  range it will show up as 0 in /proc/stat.  The user will then see in the
  accounting tools that they are getting a full utilization of the cpu
  resources assigned to them.
 
 
  And how is such a knob not confusing?
 
  Steal time is pretty well defined in meaning and is shown in top for
  ages. I really don't see the point for this.
  
  Currently you can see the steal time but you have no way of knowing if
  the cpu utilization you are seeing on the guest is the expected amount.
  I decided on making it a knob because a guest could be migrated to
  another system and its entitlement could change because of hardware or 
  load differences.  It could simply be a /proc file and report the
  current entitlement if needed.   As things are currently implemented I 
  don't see how someone knows if the guest is running as expected or
  whether there is a problem.
  
 
 Turning off steal time display won't get even close to displaying the
 information you want. What you probably want is a guest-visible way to
 say how many milliseconds you are expected to run each second. Right?

It is not clear to me how knowing how many milliseconds you are
expected to run will help the user. Currently, users run top
to see how well the guest is running. If they see _any_ steal time, some
users think they are not getting the full use of their processor
entitlement.

Maybe I'm missing what you are proposing, but even if you knew the
milliseconds that you were expecting for your processor, you would have
to adjust the top output in your head, so to speak. You would see the
utilization and then say 'OK, that matches the number of milliseconds I
expected to run'. If we take away the steal time (as long as it is
equal to or less than the expected amount of steal time), then the user
running top will see 100% utilization.
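To make the proposal concrete, here is one possible reading of the RFC's rule in C. Steal time up to the amount implied by the entitlement is hidden. The RFC does not say what happens beyond that, so reporting only the excess is an assumption here. All names are hypothetical; this is not code from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: steal ticks expected over a window of total_ticks when
 * the guest is entitled to entitlement_pct percent of a CPU. */
static uint64_t expected_steal(uint64_t total_ticks, unsigned entitlement_pct)
{
    assert(entitlement_pct <= 100);
    return total_ticks * (100 - entitlement_pct) / 100;
}

/* Steal shown in /proc/stat under this reading: zero while within the
 * expected range. ASSUMPTION: anything beyond the range is reported. */
static uint64_t reported_steal(uint64_t steal_ticks, uint64_t total_ticks,
                               unsigned entitlement_pct)
{
    uint64_t expected = expected_steal(total_ticks, entitlement_pct);

    return steal_ticks <= expected ? 0 : steal_ticks - expected;
}
```

With the default entitlement of 100%, the expected steal is zero, so all steal time is still reported. That matches the RFC's statement that the default setting behaves as before.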





Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread Andrew Jones
On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:

snip

 +struct event_stats {
 + u64 count;
 + u64 time;
 +
 + /* used to calculate stddev. */
 + double mean;
 + double M2;
 +};

How about moving the stats functions from builtin-stat.c to e.g.
util/stats.c, and then reusing them? Then this struct (which I would
rename to kvm_event_stats) would look like this

struct kvm_event_stats {
u64 time;
struct stats stats;
};

Of course, the get_event_ accessor generators would need tweaking.

snip

 +static void update_event_stats(struct event_stats *stats, u64 time_diff)
 +{
 + double delta;
 +
 + stats->count++;
 + stats->time += time_diff;
 +
 + delta = time_diff - stats->mean;
 + stats->mean += delta / stats->count;
 + stats->M2 += delta*(time_diff - stats->mean);
 +}

Reusing stats would allow this to become just

static void update_event_stats(struct kvm_event_stats *kvm_stats, u64 time_diff)
{
update_stats(&kvm_stats->stats, time_diff);
kvm_stats->time += time_diff;
}

 +
 +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
 +{
 + struct event_stats *stats = &event->total;
 + double variance, variance_mean, stddev;
 +
 + if (vcpu_id != -1)
 + stats = &event->vcpu[vcpu_id];
 +
 + BUG_ON(!stats->count);
 +
 + variance = stats->M2 / (stats->count - 1);
 + variance_mean = variance / stats->count;
 + stddev = sqrt(variance_mean);
 +
 + return stddev * 100 / stats->mean;

This function's name implies it returns the stddev, but it returns the
relative stddev instead. Maybe rename it? This would be simplified
with code reuse too to basically just

return stddev_stats(&kvm_stats->stats) * 100 / kvm_stats->stats.mean;

Drew
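Here is a rough C sketch of the refactoring Drew suggests. One generic running-statistics struct is updated with the same Welford recurrence the patch already uses, and a KVM-specific struct wraps it and also keeps the total time. The names (run_stats, kvm_event_stats_sketch and the helpers) are hypothetical and are not perf's actual API. The sketch stops at the variance of the mean so it needs no libm. The relative stddev that event_stats_stddev() returns would be the sqrt() of that value, times 100, divided by the mean.

```c
#include <assert.h>
#include <stdint.h>

/* Generic running statistics (Welford's online algorithm). */
struct run_stats {
    double n;
    double mean;
    double M2;      /* sum of squared deviations from the running mean */
};

/* KVM-specific wrapper: total time kept next to the generic stats. */
struct kvm_event_stats_sketch {
    uint64_t time;
    struct run_stats stats;
};

static void run_stats_update(struct run_stats *s, uint64_t val)
{
    double delta;

    s->n += 1.0;
    delta = (double)val - s->mean;
    s->mean += delta / s->n;
    s->M2 += delta * ((double)val - s->mean);
}

/* Sample variance divided by n, i.e. the variance of the mean. */
static double run_stats_variance_of_mean(const struct run_stats *s)
{
    assert(s->n > 1.0);
    return s->M2 / (s->n - 1.0) / s->n;
}

static void kvm_event_stats_update(struct kvm_event_stats_sketch *k,
                                   uint64_t time_diff)
{
    run_stats_update(&k->stats, time_diff);
    k->time += time_diff;
}
```

With the generic part factored out, the KVM-specific update shrinks to two lines, which is the point of the review comment.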


Re: setting time in guest with ntpdate results in VM hang

2012-08-27 Thread Dale Swanston

On 8/24/2012 1:43 PM, Marcelo Tosatti wrote:

On Fri, Aug 24, 2012 at 09:57:35AM -0600, Dale Swanston wrote:

Hello.

We are running a guest OS of CentOS 4.4 (kernel 2.6.12) for legacy
reasons, upgrading is not an option.   NTP is running on the host
and synching with a local GPS NTP server.  But due to frequency
drift in the guest it restarts itself periodically and upon start up
performs an ntpdate to force a time jump on the guest.

I have seen 2 occasions now (over 2 months) where the VM hangs right
as the ntpdate command alters the guest clock (based on output in
/var/log/messages).

From the host's perspective, the VM is still running, but it appears
to be using a very high CPU percentage (more than typical).  The only
recovery option is to force shutdown of the VM and restart it.

This should not happen.


1.  Are there any known issues with ntpdate and VMs hanging?  Any
workarounds?
2.  Are there any debugging tools to further characterise the problem?

Upgrading the guest kernel is not an option? At least install a recent
kernel in the guest to confirm that it's not an already-fixed bug.


Good idea.  I'll try that.

But are there any tools available to determine what the VM is doing when 
it appears hung?  I've looked but haven't found much on debug or 
diagnostics on a running VM.  Any links?


Is it possible the guest kernel is panicking? What would the VM do if 
that happened?  Would it shut down?


Thanks again.


Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 04:59:40PM +0200, Jan Kiszka wrote:
 On 2012-08-27 16:53, Michael S. Tsirkin wrote:
  On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote:
  On 2012-08-27 14:42, Peter Maydell wrote:
  On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
  Update linux-headers to version present in Linux 3.6-rc3.
  Header asm-x86/kvm_para.h update is needed for the new PV EOI
  feature.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   linux-headers/asm-s390/kvm.h  | 2 +-
   linux-headers/asm-s390/kvm_para.h | 2 +-
   linux-headers/asm-x86/kvm.h   | 1 +
   linux-headers/asm-x86/kvm_para.h  | 7 +++
   linux-headers/linux/kvm.h | 3 +++
   5 files changed, 13 insertions(+), 2 deletions(-)
 
  The latest version of update-linux-headers.sh should have caused
  this update to include asm-generic/kvm_para.h, I think. Did the
  script not pull that header in, or were you maybe using an old
  version of the script or forgot to git add the new file?
 
  To be fair, that is hard to guess. We should add some magic to the
  update script to detect new files and maybe suggest them for addition.
 
  Jan
  
  But why did you add a header to qemu without adding it
  to git? That's a cleaner solution and needs no magic scripting.
 
 Yes, this would have been appropriate. Still, a simple "git status -s
 linux-headers" run at the end of the update script could help remind
 people in the future.
 
 Jan


Yes. But it would be better if instead of duplicating
a list of files/directories, update-linux-headers.sh would
just look at what is under linux-headers and update
exactly that.

This removes any chance of error, and avoids the need
to tweak shell scripts each time we add a header.

As a bonus, we do not blow away random stuff
a developer might have under linux-headers.
Thoughts?
WFM

---

scripts: better update headers

Be more careful when updating headers: only
update files we already have in git.
Also remove the need to list files in this script.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..6607e56 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -28,23 +28,33 @@ if [ -z $output ]; then
 output=$PWD
 fi
 
-for arch in x86 powerpc s390; do
-make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install
-
-rm -rf $output/linux-headers/asm-$arch
-mkdir -p $output/linux-headers/asm-$arch
-for header in kvm.h kvm_para.h; do
-cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch
-done
-if [ $arch = x86 ]; then
-cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86
-fi
-done
+IFS=$'\n'
+
+#get list of files
+dirs=`git ls-tree HEAD -- linux-headers/|grep tree|cut -f 2`
+if [ -z "$dirs" ]; then
+echo "Unable to get list of directories under linux-headers/ to update"
+fi
 
-rm -rf $output/linux-headers/linux
-mkdir -p $output/linux-headers/linux
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
-cp $tmpdir/include/linux/$header $output/linux-headers/linux
+for d in $dirs; do
+a=${d/#linux-headers\//}
+case $a in
+asm-*)
+arch=${a/asm-/}
+make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install
+files=`git ls-tree -r HEAD -- $d |cut -f 2`
+for dst in $files; do
+src=include/asm/${dst/linux-headers\/asm-$arch\//}
+cp -f $tmpdir/$src $output/$dst || exit 2
+done ;;
+*) 
+make -C $linux INSTALL_HDR_PATH=$tmpdir headers_install
+files=`git ls-tree -r HEAD -- $d |cut -f 2`
+for dst in $files; do
+src=include/${dst/linux-headers\//}
+cp -f $tmpdir/$src $output/$dst || exit 2
+done ;;
+esac
 done
 if [ -L $linux/source ]; then
 cp $linux/source/COPYING $output/linux-headers
-- 
MST
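The core of the new script is a path mapping: each file already tracked under linux-headers/ is matched to the file that "make headers_install" leaves under $tmpdir. The C sketch below mirrors that mapping for illustration only. header_src_path() is a hypothetical helper, not part of QEMU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a tracked destination path to the headers_install output path:
 *   linux-headers/asm-<arch>/<file> -> include/asm/<file>
 *   linux-headers/<dir>/<file>      -> include/<dir>/<file>
 * Returns 0 on success, -1 if dst is not under linux-headers/ or if
 * src is too small for the result. */
static int header_src_path(const char *dst, char *src, size_t len)
{
    static const char prefix[] = "linux-headers/";
    const size_t plen = sizeof(prefix) - 1;
    const char *rest;
    int n;

    assert(dst != NULL && src != NULL && len > 0);
    if (strncmp(dst, prefix, plen) != 0) {
        return -1;
    }
    rest = dst + plen;
    if (strncmp(rest, "asm-", 4) == 0) {
        const char *slash = strchr(rest, '/');
        if (slash == NULL) {
            return -1;
        }
        n = snprintf(src, len, "include/asm/%s", slash + 1);
    } else {
        n = snprintf(src, len, "include/%s", rest);
    }
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Like the shell pattern asm-*, this mapping treats every asm-* directory as an architecture directory. A directory such as asm-generic would therefore map to include/asm/, and the script would also run make with SRCARCH=generic, so it would presumably need special handling.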


Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Glauber Costa
On 08/27/2012 08:50 AM, Michael Wolf wrote:
 On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
 On 08/24/2012 11:11 AM, Michael Wolf wrote:
 On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
 On 08/24/2012 03:14 AM, Michael Wolf wrote:
 This is an RFC regarding the reporting of stealtime.  In the case of
 where you have a system that is running with partial processors such as
 KVM the user may see steal time being reported in accounting tools such
 as top or vmstat.  This can cause confusion for the end user.  To
 ease the confusion this patch set adds a sysctl interface to set the
 cpu entitlement.  This is the percentage of cpu that the guest system is
  expected to receive.  As long as the steal time is within its expected
 range it will show up as 0 in /proc/stat.  The user will then see in the
 accounting tools that they are getting a full utilization of the cpu
 resources assigned to them.


 And how is such a knob not confusing?

 Steal time is pretty well defined in meaning and is shown in top for
 ages. I really don't see the point for this.

 Currently you can see the steal time but you have no way of knowing if
 the cpu utilization you are seeing on the guest is the expected amount.
 I decided on making it a knob because a guest could be migrated to
 another system and its entitlement could change because of hardware or 
 load differences.  It could simply be a /proc file and report the
 current entitlement if needed.   As things are currently implemented I 
 don't see how someone knows if the guest is running as expected or
 whether there is a problem.


 Turning off steal time display won't get even close to displaying the
 information you want. What you probably want is a guest-visible way to
 say how many milliseconds you are expected to run each second. Right?
 
 It is not clear to me how knowing how many milliseconds you are
 expected to run will help the user. Currently, users run top
 to see how well the guest is running. If they see _any_ steal time, some
 users think they are not getting the full use of their processor
 entitlement.


And your plan is just to selectively lie about it, by disabling it with
a knob?

 Maybe I'm missing what you are proposing, but even if you knew the
 milliseconds that you were expecting for your processor, you would have
 to adjust the top output in your head, so to speak. You would see the
 utilization and then say 'OK, that matches the number of milliseconds I
 expected to run'. If we take away the steal time (as long as it is
 equal to or less than the expected amount of steal time), then the user
 running top will see 100% utilization.
 



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Avi Kivity
On 08/23/2012 04:14 PM, Michael Wolf wrote:
 This is an RFC regarding the reporting of stealtime.  In the case of
 where you have a system that is running with partial processors such as
 KVM the user may see steal time being reported in accounting tools such
 as top or vmstat.  This can cause confusion for the end user.  To
 ease the confusion this patch set adds a sysctl interface to set the
 cpu entitlement.  This is the percentage of cpu that the guest system is
  expected to receive.  As long as the steal time is within its expected
 range it will show up as 0 in /proc/stat.  The user will then see in the
 accounting tools that they are getting a full utilization of the cpu
 resources assigned to them.

 This patchset is changing the contents/output of /proc/stat and could affect 
 user tools.  However the default setting is that the cpu is entitled to 100% 
 so the code will act as before.  Also another field could be added to the 
 /proc/stat output and show the unaltered steal time. Since this additional 
 field could cause more confusion than it would clear up I have left it out 
 for now.
 

How would a guest know what its entitlement is?


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 06:56:38PM +, Blue Swirl wrote:
  +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
  +{
  +AssignedDevRegion *d = opaque;
  +uint8_t *in = d->u.r_virtbase + addr;
 
 Don't perform arithmetic with void pointers.

Why not?
We require gcc and it's a documented extension there.
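GCC documents arithmetic on void * as an extension that treats sizeof(void) as 1, so the patch's expression does compile with GCC. If strict ISO C were preferred instead, a minimal sketch is to apply the offset to a byte pointer. struct region and region_readb() are hypothetical stand-ins for AssignedDevRegion and slow_bar_readb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AssignedDevRegion: just the mapped BAR. */
struct region {
    void *virtbase;
    size_t size;
};

/* ISO C: convert to a byte pointer before adding the offset instead of
 * relying on GCC's void * arithmetic. volatile because the real target
 * is device memory. */
static uint8_t region_readb(const struct region *r, size_t addr)
{
    const volatile uint8_t *in;

    assert(addr < r->size);
    in = (const volatile uint8_t *)r->virtbase + addr;
    return *in;
}
```

Both forms compute the same address under GCC. The cast only matters for compilers that do not implement the extension.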


Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote:
 On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote:
  In preparation for adding PV EOI support, disable PV EOI by default for
  1.1 and older machine types, to avoid CPUID changing during migration.
 
  PV EOI can still be enabled/disabled by specifying it explicitly.
  Enable for 1.1
  -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
  Disable for 1.2
  -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/Makefile.objs  |  2 +-
   hw/cpu_flags.c| 32 
   hw/cpu_flags.h|  9 +
   hw/pc_piix.c  |  2 ++
   target-i386/cpu.c |  8 
   5 files changed, 52 insertions(+), 1 deletion(-)
   create mode 100644 hw/cpu_flags.c
   create mode 100644 hw/cpu_flags.h
 
  diff --git a/hw/Makefile.objs b/hw/Makefile.objs
  index 850b87b..3f2532a 100644
  --- a/hw/Makefile.objs
  +++ b/hw/Makefile.objs
  @@ -1,5 +1,5 @@
   hw-obj-y = usb/ ide/
  -hw-obj-y += loader.o
  +hw-obj-y += loader.o cpu_flags.o
   hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
   hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
   hw-obj-y += fw_cfg.o
  diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
  new file mode 100644
  index 000..2422d20
  --- /dev/null
  +++ b/hw/cpu_flags.c
  @@ -0,0 +1,32 @@
  +/*
  + * CPU compatibility flags.
  + *
  + * Copyright (c) 2012 Red Hat Inc.
  + * Author: Michael S. Tsirkin.
  + *
  + * This program is free software; you can redistribute it and/or modify
  + * it under the terms of the GNU General Public License as published by
  + * the Free Software Foundation; either version 2 of the License, or
  + * (at your option) any later version.
  + *
  + * This program is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  + * GNU General Public License for more details.
  + *
  + * You should have received a copy of the GNU General Public License along
  + * with this program; if not, see http://www.gnu.org/licenses/.
  + */
  +#include "hw/cpu_flags.h"
  +
  +static bool __kvm_pv_eoi_disabled;
 
 Don't use identifiers with leading underscores.

C99 spec says:
"Any other predefined macro names
shall begin with a leading underscore followed by an uppercase letter or
a second underscore."

What are the chances of the compiler predefining a macro named __kvm_pv_eoi_disabled?

But OK, I will rename it to _kvm_pv_eoi_disabled:
_ + lower case is guaranteed OK.


  +
  +void disable_kvm_pv_eoi(void)
  +{
  +   __kvm_pv_eoi_disabled = true;
  +}
  +
  +bool kvm_pv_eoi_disabled(void)
  +{
  +   return __kvm_pv_eoi_disabled;
  +}
  diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
  new file mode 100644
  index 000..05777b6
  --- /dev/null
  +++ b/hw/cpu_flags.h
  @@ -0,0 +1,9 @@
  +#ifndef HW_CPU_FLAGS_H
  +#define HW_CPU_FLAGS_H
  +
  +#include <stdbool.h>
  +
  +void disable_kvm_pv_eoi(void);
  +bool kvm_pv_eoi_disabled(void);
  +
  +#endif
  diff --git a/hw/pc_piix.c b/hw/pc_piix.c
  index 008d42f..bdbceda 100644
  --- a/hw/pc_piix.c
  +++ b/hw/pc_piix.c
  @@ -46,6 +46,7 @@
   #ifdef CONFIG_XEN
   #  include "xen/hvm/hvm_info_table.h"
   #endif
  +#include "cpu_flags.h"
 
   #define MAX_IDE_BUS 2
 
  @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
 
   static void pc_machine_v1_1_compat(void)
   {
  +disable_kvm_pv_eoi();
   }
 
   static void pc_init_pci_v1_1(ram_addr_t ram_size,
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 120a2e3..0d02fd1 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -23,6 +23,7 @@
 
   #include "cpu.h"
   #include "kvm.h"
  +#include <asm/kvm_para.h>
 
   #include "qemu-option.h"
   #include "qemu-config.h"
  @@ -33,6 +34,7 @@
   #include "hyperv.h"
 
   #include "hw/hw.h"
  +#include "hw/cpu_flags.h"
 
   /* feature flags taken from Intel Processor Identification and the CPUID
* Instruction and AMD's CPUID Specification.  In cases of disagreement
  @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
 
   plus_kvm_features = ~0; /* not supported bits will be filtered out later */
 
  +/* Disable PV EOI for old machine types.
  + * Feature flags can still override. */
  +if (kvm_pv_eoi_disabled()) {
  +plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
  +}
  +
   add_flagname_to_bitmaps(hypervisor, plus_features,
   plus_ext_features, plus_ext2_features, plus_ext3_features,
   plus_kvm_features, plus_svm_features);
  --
  MST
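Here is a minimal C sketch of the compat-flag pattern in this patch, with two small deviations made for illustration. First, the flag uses a name with no leading underscore. That sidesteps the naming question entirely, since C99 7.1.3 reserves every identifier that begins with an underscore at file scope. Second, the feature word is computed by a hypothetical default_kvm_features() helper rather than inside cpu_x86_find_by_name().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_EOI 6    /* from asm-x86/kvm_para.h */

/* No leading underscore, so no reserved-identifier concerns. */
static bool pv_eoi_compat_disabled;

static void disable_kvm_pv_eoi(void)
{
    pv_eoi_compat_disabled = true;
}

static bool kvm_pv_eoi_disabled(void)
{
    return pv_eoi_compat_disabled;
}

/* Default KVM feature mask for a CPU model: start from all bits set
 * (unsupported ones are filtered out later), then clear PV EOI for
 * old machine types. "&=" clears just that bit and keeps the rest. */
static uint32_t default_kvm_features(void)
{
    uint32_t features = ~0u;

    assert(KVM_FEATURE_PV_EOI < 32);  /* shifting by >= 32 is undefined */
    if (kvm_pv_eoi_disabled()) {
        features &= ~(1u << KVM_FEATURE_PV_EOI);
    }
    return features;
}
```

The bit is cleared only in the default mask, and explicit flags are parsed afterwards. As the patch comment says, "+kvm_pv_eoi" on the command line can therefore still turn the feature back on for pc-1.1.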
 
 


Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl
On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote:
 In preparation for adding PV EOI support, disable PV EOI by default for
 1.1 and older machine types, to avoid CPUID changing during migration.

 PV EOI can still be enabled/disabled by specifying it explicitly.
 Enable for 1.1
 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
 Disable for 1.2
 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/Makefile.objs  |  2 +-
  hw/cpu_flags.c| 32 
  hw/cpu_flags.h|  9 +
  hw/pc_piix.c  |  2 ++
  target-i386/cpu.c |  8 
  5 files changed, 52 insertions(+), 1 deletion(-)
  create mode 100644 hw/cpu_flags.c
  create mode 100644 hw/cpu_flags.h

 diff --git a/hw/Makefile.objs b/hw/Makefile.objs
 index 850b87b..3f2532a 100644
 --- a/hw/Makefile.objs
 +++ b/hw/Makefile.objs
 @@ -1,5 +1,5 @@
  hw-obj-y = usb/ ide/
 -hw-obj-y += loader.o
 +hw-obj-y += loader.o cpu_flags.o
  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
  hw-obj-y += fw_cfg.o
 diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
 new file mode 100644
 index 000..2422d20
 --- /dev/null
 +++ b/hw/cpu_flags.c
 @@ -0,0 +1,32 @@
 +/*
 + * CPU compatibility flags.
 + *
 + * Copyright (c) 2012 Red Hat Inc.
 + * Author: Michael S. Tsirkin.
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License along
 + * with this program; if not, see http://www.gnu.org/licenses/.
 + */
 +#include "hw/cpu_flags.h"
 +
 +static bool __kvm_pv_eoi_disabled;

Don't use identifiers with leading underscores.

 +
 +void disable_kvm_pv_eoi(void)
 +{
 +   __kvm_pv_eoi_disabled = true;
 +}
 +
 +bool kvm_pv_eoi_disabled(void)
 +{
 +   return __kvm_pv_eoi_disabled;
 +}
 diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
 new file mode 100644
 index 000..05777b6
 --- /dev/null
 +++ b/hw/cpu_flags.h
 @@ -0,0 +1,9 @@
 +#ifndef HW_CPU_FLAGS_H
 +#define HW_CPU_FLAGS_H
 +
 +#include <stdbool.h>
 +
 +void disable_kvm_pv_eoi(void);
 +bool kvm_pv_eoi_disabled(void);
 +
 +#endif
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 008d42f..bdbceda 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -46,6 +46,7 @@
  #ifdef CONFIG_XEN
  #  include xen/hvm/hvm_info_table.h
  #endif
 +#include "cpu_flags.h"

  #define MAX_IDE_BUS 2

 @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {

  static void pc_machine_v1_1_compat(void)
  {
 +disable_kvm_pv_eoi();
  }

  static void pc_init_pci_v1_1(ram_addr_t ram_size,
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 120a2e3..0d02fd1 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -23,6 +23,7 @@

  #include "cpu.h"
  #include "kvm.h"
 +#include asm/kvm_para.h

  #include "qemu-option.h"
  #include "qemu-config.h"
 @@ -33,6 +34,7 @@
  #include "hyperv.h"

  #include "hw/hw.h"
 +#include "hw/cpu_flags.h"

  /* feature flags taken from Intel Processor Identification and the CPUID
   * Instruction and AMD's CPUID Specification.  In cases of disagreement
 @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)

  plus_kvm_features = ~0; /* not supported bits will be filtered out later */

 +/* Disable PV EOI for old machine types.
 + * Feature flags can still override. */
 +if (kvm_pv_eoi_disabled()) {
 +plus_kvm_features = ~(0x1 << KVM_FEATURE_PV_EOI);
 +}
 +
  add_flagname_to_bitmaps(hypervisor, plus_features,
  plus_ext_features, plus_ext2_features, plus_ext3_features,
  plus_kvm_features, plus_svm_features);
 --
 MST
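The compat mechanism in the quoted patch can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not QEMU code: `machine_v1_1_compat()`, `kvm_features()` and the `plus_flags`/`minus_flags` parameters are illustrative stand-ins for `pc_machine_v1_1_compat()`, `cpu_x86_find_by_name()` and the parsed `+flag`/`-flag` bitmaps. Only the bit number 6 for `KVM_FEATURE_PV_EOI` comes from the kvm_para.h update in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number as defined in linux-headers/asm-x86/kvm_para.h. */
#define KVM_FEATURE_PV_EOI 6

/* Hypothetical compat state, set only by an old machine type's hook. */
static bool pv_eoi_compat_disabled;

/* Stand-in for pc_machine_v1_1_compat(): runs for -M pc-1.1 and older. */
static void machine_v1_1_compat(void)
{
    pv_eoi_compat_disabled = true;
}

/*
 * Compute the KVM feature word: start with everything on, drop PV EOI
 * for old machine types, then let explicit -cpu flags win
 * (+kvm_pv_eoi sets the bit, -kvm_pv_eoi clears it).
 */
static uint32_t kvm_features(uint32_t plus_flags, uint32_t minus_flags)
{
    uint32_t features = ~0u; /* unsupported bits would be filtered later */

    if (pv_eoi_compat_disabled) {
        features = ~(1u << KVM_FEATURE_PV_EOI);
    }
    features |= plus_flags;
    features &= ~minus_flags;
    return features;
}
```

Once the compat hook has run, `kvm_features(0, 0)` lacks the PV EOI bit, while `kvm_features(1u << KVM_FEATURE_PV_EOI, 0)` (the `+kvm_pv_eoi` case) has it again. That is the override behaviour the commit message describes.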


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Blue Swirl
On Mon, Aug 27, 2012 at 7:01 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Aug 27, 2012 at 06:56:38PM +, Blue Swirl wrote:
  +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
  +{
  +AssignedDevRegion *d = opaque;
  +uint8_t *in = d->u.r_virtbase + addr;

 Don't perform arithmetic with void pointers.

 Why not?
 We require gcc and it's a documented extension there.

We don't require GCC; Clang can already be used for some targets,
though it supports this non-standard extension too.

It's a bad idea to introduce dependencies where it's not necessary.

In this case it's not much effort to add the identifier for the struct,
and in fact the only benefit is that the lazy coder saves a few
key presses.
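A portable way to write the quoted `slow_bar_readb()` line without the GNU void-pointer arithmetic extension is to convert to a byte pointer before adding the offset. Giving the struct member a concrete pointer type, as suggested, is the other option. `Region` and `region_readb()` below are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the assigned-device region in the patch. */
typedef struct {
    void *virtbase; /* mapped BAR, stored untyped */
} Region;

/* Standard C: cast to a byte pointer first, then add the byte offset. */
static uint8_t region_readb(const Region *r, size_t addr)
{
    /* volatile because a real BAR mapping is device memory */
    const volatile uint8_t *in = (const volatile uint8_t *)r->virtbase + addr;

    return *in;
}
```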


Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl
On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote:
 On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote:
  In preparation for adding PV EOI support, disable PV EOI by default for
  1.1 and older machine types, to avoid CPUID changing during migration.
 
  PV EOI can still be enabled/disabled by specifying it explicitly.
  Enable for 1.1
  -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
  Disable for 1.2
  -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/Makefile.objs  |  2 +-
   hw/cpu_flags.c| 32 
   hw/cpu_flags.h|  9 +
   hw/pc_piix.c  |  2 ++
   target-i386/cpu.c |  8 
   5 files changed, 52 insertions(+), 1 deletion(-)
   create mode 100644 hw/cpu_flags.c
   create mode 100644 hw/cpu_flags.h
 
  diff --git a/hw/Makefile.objs b/hw/Makefile.objs
  index 850b87b..3f2532a 100644
  --- a/hw/Makefile.objs
  +++ b/hw/Makefile.objs
  @@ -1,5 +1,5 @@
   hw-obj-y = usb/ ide/
  -hw-obj-y += loader.o
  +hw-obj-y += loader.o cpu_flags.o
   hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
   hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
   hw-obj-y += fw_cfg.o
  diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
  new file mode 100644
  index 000..2422d20
  --- /dev/null
  +++ b/hw/cpu_flags.c
  @@ -0,0 +1,32 @@
  +/*
  + * CPU compatibility flags.
  + *
  + * Copyright (c) 2012 Red Hat Inc.
  + * Author: Michael S. Tsirkin.
  + *
  + * This program is free software; you can redistribute it and/or modify
  + * it under the terms of the GNU General Public License as published by
  + * the Free Software Foundation; either version 2 of the License, or
  + * (at your option) any later version.
  + *
  + * This program is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  + * GNU General Public License for more details.
  + *
  + * You should have received a copy of the GNU General Public License along
  + * with this program; if not, see http://www.gnu.org/licenses/.
  + */
  +#include "hw/cpu_flags.h"
  +
  +static bool __kvm_pv_eoi_disabled;

 Don't use identifiers with leading underscores.

 C99 spec says 
 Any other predefined macro names
 shall begin with a leading underscore followed by an uppercase letter or
 a second underscore.
 

 what are the chances of the compiler predefining a macro named __kvm_pv_eoi_disabled?

Why do you even consider that since it's trivially easy to use
something else? If a standard (and HACKING in our case) specifies
something, why do you want to fight it?


 But OK, will rename _kvm_pv_eoi_disabled.
 _ + lower case is guaranteed OK.

No, just use kvm_pv_eoi_disabled, the underscore is useless.



  +
  +void disable_kvm_pv_eoi(void)
  +{
  +   __kvm_pv_eoi_disabled = true;
  +}
  +
  +bool kvm_pv_eoi_disabled(void)
  +{
  +   return __kvm_pv_eoi_disabled;
  +}
  diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
  new file mode 100644
  index 000..05777b6
  --- /dev/null
  +++ b/hw/cpu_flags.h
  @@ -0,0 +1,9 @@
  +#ifndef HW_CPU_FLAGS_H
  +#define HW_CPU_FLAGS_H
  +
  +#include <stdbool.h>
  +
  +void disable_kvm_pv_eoi(void);
  +bool kvm_pv_eoi_disabled(void);
  +
  +#endif
  diff --git a/hw/pc_piix.c b/hw/pc_piix.c
  index 008d42f..bdbceda 100644
  --- a/hw/pc_piix.c
  +++ b/hw/pc_piix.c
  @@ -46,6 +46,7 @@
   #ifdef CONFIG_XEN
   #  include xen/hvm/hvm_info_table.h
   #endif
  +#include "cpu_flags.h"
 
   #define MAX_IDE_BUS 2
 
  @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
 
   static void pc_machine_v1_1_compat(void)
   {
  +disable_kvm_pv_eoi();
   }
 
   static void pc_init_pci_v1_1(ram_addr_t ram_size,
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 120a2e3..0d02fd1 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -23,6 +23,7 @@
 
   #include "cpu.h"
   #include "kvm.h"
  +#include asm/kvm_para.h
 
   #include "qemu-option.h"
   #include "qemu-config.h"
  @@ -33,6 +34,7 @@
   #include "hyperv.h"
 
   #include "hw/hw.h"
  +#include "hw/cpu_flags.h"
 
   /* feature flags taken from Intel Processor Identification and the CPUID
* Instruction and AMD's CPUID Specification.  In cases of disagreement
  @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
 
   plus_kvm_features = ~0; /* not supported bits will be filtered out later */
 
  +/* Disable PV EOI for old machine types.
  + * Feature flags can still override. */
  +if (kvm_pv_eoi_disabled()) {
  +plus_kvm_features = ~(0x1 << KVM_FEATURE_PV_EOI);
  +}
  +
   add_flagname_to_bitmaps(hypervisor, plus_features,
   plus_ext_features, plus_ext2_features, plus_ext3_features,
   plus_kvm_features, plus_svm_features);
  --
  MST
 
 

Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific

2012-08-27 Thread Marcelo Tosatti
On Mon, Aug 27, 2012 at 11:41:08PM +0900, Takuya Yoshikawa wrote:
 On Fri, 24 Aug 2012 15:54:59 -0300
 Marcelo Tosatti mtosa...@redhat.com wrote:
 
  Other arches do not need this.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
  Index: kvm/arch/x86/kvm/x86.c
  ===
  --- kvm.orig/arch/x86/kvm/x86.c
  +++ kvm/arch/x86/kvm/x86.c
  @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
  kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
  kvm_mmu_slot_remove_write_access(kvm, mem->slot);
  spin_unlock(&kvm->mmu_lock);
  +   /*
  +* If the new memory slot is created, we need to clear all
  +* mmio sptes.
  +*/
  +   if (old.npages == 0 && npages) {
  +   kvm_mmu_zap_all(kvm);
  +   kvm_reload_remote_mmus(kvm);
  +   }
   }
 
 Any explanation why (old.base_gfn != new.base_gfn) case can be
 omitted?

The (old.base_gfn != new.base_gfn) check covers two cases:

1. old.base_gfn = 0, new.base_gfn != 0 (slot creation)

and

2. old.base_gfn = x, new.base_gfn = y, where x != 0, y != 0, x != y
   (gpa base change)

Patch 2 covers case 2, so it's only necessary to cover case
1 here.

Makes sense?
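The case analysis can be written down as a small classifier. This is a hypothetical sketch: `struct slot` keeps only the two fields discussed, the creation test mirrors the `old.npages == 0 && npages` condition in the new x86 hunk, and the base-change case is the one the reply says patch 2 handles.

```c
#include <stdint.h>

/* Hypothetical minimal model of the memslot fields compared above. */
struct slot {
    uint64_t base_gfn;
    uint64_t npages;
};

enum slot_change { SLOT_OTHER, SLOT_CREATE, SLOT_MOVE };

static enum slot_change classify(const struct slot *old_s,
                                 const struct slot *new_s)
{
    /* Case 1: slot creation, flushed in kvm_arch_commit_memory_region(). */
    if (old_s->npages == 0 && new_s->npages != 0)
        return SLOT_CREATE;
    /* Case 2: existing slot moved to a new gpa base (patch 2). */
    if (old_s->npages != 0 && new_s->npages != 0 &&
        old_s->base_gfn != new_s->base_gfn)
        return SLOT_MOVE;
    /* Deletion, flag changes, no-op updates. */
    return SLOT_OTHER;
}
```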

 Takuya
 
   
   void kvm_arch_flush_shadow_all(struct kvm *kvm)
  Index: kvm/virt/kvm/kvm_main.c
  ===
  --- kvm.orig/virt/kvm/kvm_main.c
  +++ kvm/virt/kvm/kvm_main.c
  @@ -849,13 +849,6 @@ int __kvm_set_memory_region(struct kvm *
   
  kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
   
  -   /*
  -* If the new memory slot is created, we need to clear all
  -* mmio sptes.
  -*/
  -   if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
  -   kvm_arch_flush_shadow_all(kvm);
  -
  kvm_free_physmem_slot(&old, &new);
  kfree(old_memslots);


Re: [PATCHv2] kvm: Fix nonsense handling of compat ioctl

2012-08-27 Thread Marcelo Tosatti
On Wed, Aug 22, 2012 at 02:34:11PM +0100, Alan Cox wrote:
 From: Alan Cox a...@linux.intel.com
 
 KVM_SET_SIGNAL_MASK, when passed a NULL argument, leaves the on-stack
 signal sets uninitialized. It then passes them through to
 kvm_vcpu_ioctl_set_sigmask.
 
 We should be passing NULL in this case, not translated garbage.
 
 Signed-off-by: Alan Cox a...@linux.intel.com

Applied, thanks.
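The shape of the fix can be sketched as follows. It assumes the usual pattern of translating the 32-bit layout only when the user actually supplied a mask. The types and helpers (`compat_sigset`, `native_sigset`, `set_sigmask()`, `compat_set_signal_mask()`) are hypothetical and deliberately simplified; they are not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified signal-set layouts. */
typedef struct { uint32_t sig[2]; } compat_sigset;
typedef struct { uint64_t sig[1]; } native_sigset;

/* Record what the "native" handler received, for inspection. */
static int last_call_had_mask = -1;
static uint64_t last_mask;

/* Stand-in for kvm_vcpu_ioctl_set_sigmask(): NULL means "no mask". */
static int set_sigmask(const native_sigset *set)
{
    last_call_had_mask = (set != NULL);
    if (set)
        last_mask = set->sig[0];
    return 0;
}

static int compat_set_signal_mask(const compat_sigset *user_arg)
{
    native_sigset native;

    /* No argument: pass NULL through, never the uninitialized stack copy. */
    if (user_arg == NULL)
        return set_sigmask(NULL);

    /* Translate two 32-bit words into one 64-bit word. */
    native.sig[0] = (uint64_t)user_arg->sig[0] |
                    ((uint64_t)user_arg->sig[1] << 32);
    return set_sigmask(&native);
}
```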



Re: setting time in guest with ntpdate results in VM hang

2012-08-27 Thread David Ahern

On 8/27/12 10:58 AM, Dale Swanston wrote:

Good idea.  I'll try that.

But are there any tools available to determine what the VM is doing when
it appears hung?  I've looked but haven't found much on debug or
diagnostics on a running VM.  Any links?


If you have the vmlinux, enable the gdbserver stub via QEMU's monitor.
Then run 'gdb vmlinux', connect to the VM with 'target remote host:port',
and look at the backtrace.


I have seen something similar using kvm-clock in a guest running 2.6.27.

David


Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Michael S. Tsirkin
On Mon, Aug 27, 2012 at 07:12:27PM +, Blue Swirl wrote:
 On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote:
  On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote:
   In preparation for adding PV EOI support, disable PV EOI by default for
   1.1 and older machine types, to avoid CPUID changing during migration.
  
   PV EOI can still be enabled/disabled by specifying it explicitly.
   Enable for 1.1
   -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
   Disable for 1.2
   -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
  
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
hw/Makefile.objs  |  2 +-
hw/cpu_flags.c| 32 
hw/cpu_flags.h|  9 +
hw/pc_piix.c  |  2 ++
target-i386/cpu.c |  8 
5 files changed, 52 insertions(+), 1 deletion(-)
create mode 100644 hw/cpu_flags.c
create mode 100644 hw/cpu_flags.h
  
   diff --git a/hw/Makefile.objs b/hw/Makefile.objs
   index 850b87b..3f2532a 100644
   --- a/hw/Makefile.objs
   +++ b/hw/Makefile.objs
   @@ -1,5 +1,5 @@
hw-obj-y = usb/ ide/
   -hw-obj-y += loader.o
   +hw-obj-y += loader.o cpu_flags.o
hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
hw-obj-y += fw_cfg.o
   diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
   new file mode 100644
   index 000..2422d20
   --- /dev/null
   +++ b/hw/cpu_flags.c
   @@ -0,0 +1,32 @@
   +/*
   + * CPU compatibility flags.
   + *
   + * Copyright (c) 2012 Red Hat Inc.
   + * Author: Michael S. Tsirkin.
   + *
   + * This program is free software; you can redistribute it and/or modify
   + * it under the terms of the GNU General Public License as published by
   + * the Free Software Foundation; either version 2 of the License, or
   + * (at your option) any later version.
   + *
   + * This program is distributed in the hope that it will be useful,
   + * but WITHOUT ANY WARRANTY; without even the implied warranty of
   + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   + * GNU General Public License for more details.
   + *
   + * You should have received a copy of the GNU General Public License along
   + * with this program; if not, see http://www.gnu.org/licenses/.
   + */
   +#include "hw/cpu_flags.h"
   +
   +static bool __kvm_pv_eoi_disabled;
 
  Don't use identifiers with leading underscores.
 
  C99 spec says 
  Any other predefined macro names
  shall begin with a leading underscore followed by an uppercase letter or
  a second underscore.
  
 
  what are the chances of the compiler predefining a macro named __kvm_pv_eoi_disabled?
 
 Why do you even consider that since it's trivially easy to use
 something else? If a standard (and HACKING in our case) specifies
 something, why do you want to fight it?

I missed this in HACKING, you are right:

2.4. Reserved namespaces in C and POSIX
Underscore capital, double underscore, and underscore 't' suffixes
should be avoided.

so _kvm_pv_eoi_disabled is OK; __kvm_pv_eoi_disabled is not.
Will fix.

 
  But OK, will rename _kvm_pv_eoi_disabled.
  _ + lower case is guaranteed OK.
 
 No, just use kvm_pv_eoi_disabled, the underscore is useless.

It isn't useless: it avoids a conflict with the function name.
The _ says, in a very clear way, that it's an internal variable used
to implement kvm_pv_eoi_disabled().

-- 
MST


Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool

2012-08-27 Thread David Ahern

On 8/27/12 9:53 AM, Andrew Jones wrote:

On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:

snip


+struct event_stats {
+   u64 count;
+   u64 time;
+
+   /* used to calculate stddev. */
+   double mean;
+   double M2;
+};


How about moving the stats functions from builtin-stat.c to e.g.
util/stats.c, and then reusing them? Then this struct (which I would
rename to kvm_event_stats) would look like this

struct kvm_event_stats {
 u64 time;
 struct stats stats;
};

of course the get_event_ accessor generators would need tweaking


Given the history of the command (first submitted back in February),
code refactoring can wait until there is a second user for the stats code.
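For reference, the `mean`/`M2` pair in the quoted `struct event_stats` is the state kept by Welford's online variance algorithm. A minimal sketch of that update follows. It assumes this is what the stats helpers in builtin-stat.c compute; `struct running_stats`, `stats_update()` and `stats_variance()` are illustrative names.

```c
#include <stdint.h>

/* Same running-stat fields as the quoted struct, minus the time total. */
struct running_stats {
    uint64_t count;
    double mean;
    double M2; /* sum of squared deviations from the running mean */
};

/* Welford's update: numerically stable, one pass, O(1) memory. */
static void stats_update(struct running_stats *s, double val)
{
    double delta = val - s->mean;

    s->count++;
    s->mean += delta / s->count;
    s->M2 += delta * (val - s->mean);
}

/* Sample variance; the standard deviation is its square root. */
static double stats_variance(const struct running_stats *s)
{
    return s->count > 1 ? s->M2 / (s->count - 1) : 0.0;
}
```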


David


Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl
On Mon, Aug 27, 2012 at 7:24 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Aug 27, 2012 at 07:12:27PM +, Blue Swirl wrote:
 On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote:
  On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote:
   In preparation for adding PV EOI support, disable PV EOI by default for
   1.1 and older machine types, to avoid CPUID changing during migration.
  
   PV EOI can still be enabled/disabled by specifying it explicitly.
   Enable for 1.1
   -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
   Disable for 1.2
   -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
  
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
hw/Makefile.objs  |  2 +-
hw/cpu_flags.c| 32 
hw/cpu_flags.h|  9 +
hw/pc_piix.c  |  2 ++
target-i386/cpu.c |  8 
5 files changed, 52 insertions(+), 1 deletion(-)
create mode 100644 hw/cpu_flags.c
create mode 100644 hw/cpu_flags.h
  
   diff --git a/hw/Makefile.objs b/hw/Makefile.objs
   index 850b87b..3f2532a 100644
   --- a/hw/Makefile.objs
   +++ b/hw/Makefile.objs
   @@ -1,5 +1,5 @@
hw-obj-y = usb/ ide/
   -hw-obj-y += loader.o
   +hw-obj-y += loader.o cpu_flags.o
hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
hw-obj-y += fw_cfg.o
   diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
   new file mode 100644
   index 000..2422d20
   --- /dev/null
   +++ b/hw/cpu_flags.c
   @@ -0,0 +1,32 @@
   +/*
   + * CPU compatibility flags.
   + *
   + * Copyright (c) 2012 Red Hat Inc.
   + * Author: Michael S. Tsirkin.
   + *
   + * This program is free software; you can redistribute it and/or modify
   + * it under the terms of the GNU General Public License as published by
   + * the Free Software Foundation; either version 2 of the License, or
   + * (at your option) any later version.
   + *
   + * This program is distributed in the hope that it will be useful,
   + * but WITHOUT ANY WARRANTY; without even the implied warranty of
   + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   + * GNU General Public License for more details.
   + *
   + * You should have received a copy of the GNU General Public License along
   + * with this program; if not, see http://www.gnu.org/licenses/.
   + */
   +#include "hw/cpu_flags.h"
   +
   +static bool __kvm_pv_eoi_disabled;
 
  Don't use identifiers with leading underscores.
 
  C99 spec says 
  Any other predefined macro names
  shall begin with a leading underscore followed by an uppercase letter or
  a second underscore.
  
 
  what are the chances of the compiler predefining a macro named __kvm_pv_eoi_disabled?

 Why do you even consider that since it's trivially easy to use
 something else? If a standard (and HACKING in our case) specifies
 something, why do you want to fight it?

 I missed this in HACKING, you are right:

 2.4. Reserved namespaces in C and POSIX
 Underscore capital, double underscore, and underscore 't' suffixes
 should be avoided.

 so _kvm_pv_eoi_disabled is OK; __kvm_pv_eoi_disabled is not.
 Will fix.

No leading underscores. They are not used in QEMU.


 
  But OK, will rename _kvm_pv_eoi_disabled.
  _ + lower case is guaranteed OK.

 No, just use kvm_pv_eoi_disabled, the underscore is useless.

 It isn't useless: it avoids a conflict with the function name.
 The _ says, in a very clear way, that it's an internal variable used
 to implement kvm_pv_eoi_disabled().

Sure, but there are an infinite number of ways of making the identifiers
unique. Using leading underscores risks eventual conflicts with the
compiler, linker, libc, POSIX, etc. Don't do it.

Where's your imagination, can't you invent any other prefix or suffix?
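The rules being argued about come from C99 7.1.3. Identifiers beginning with an underscore and either an uppercase letter or a second underscore are reserved for any use. All identifiers beginning with an underscore are reserved at file scope in the ordinary and tag name spaces. A file-scope `static bool _kvm_pv_eoi_disabled` therefore also falls in reserved territory. The checker below is a sketch of those two rules plus the POSIX `_t` suffix mentioned in HACKING; the function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Reserved for any use (C99 7.1.3): "__x" or "_X". */
static bool reserved_everywhere(const char *name)
{
    return name[0] == '_' &&
           (name[1] == '_' || isupper((unsigned char)name[1]));
}

/*
 * Names to avoid for a file-scope identifier such as a static variable:
 * any leading underscore (C99 7.1.3) or a "_t" suffix (POSIX, per HACKING).
 */
static bool avoid_at_file_scope(const char *name)
{
    size_t len = strlen(name);

    if (name[0] == '_')
        return true;
    return len >= 2 && strcmp(name + len - 2, "_t") == 0;
}
```

A suffix-based name such as `kvm_pv_eoi_disabled_flag` (hypothetical) passes both checks and still avoids clashing with the `kvm_pv_eoi_disabled()` function.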


 --
 MST


/dev/kvm not sufficiently restricted, and in ways I didn't think were possible

2012-08-27 Thread Henry Cejtin
I'm completely confused about access to /dev/kvm.  In particular, it
looks like it is too open to access, but in a way that I don't
understand.

On my machine, /dev/kvm is owned by root.root and mode 660.  Here is the
output of ls:

% ls -l /dev/kvm
crw-rw----+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm

Despite that, when a process is uid 1000 and group id 1000, and not in
any other groups, I can open /dev/kvm.

I.e., here are the relevant lines from /proc/pid/status:

Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
Groups: 1000

Note, just to show this isn't some weirdness in /etc/passwd or
/etc/groups, here is the output of stat on /dev/kvm:

  File: `/dev/kvm'
  Size: 0   Blocks: 0  IO Block: 4096   character special file
Device: 5h/5d   Inode: 2597329 Links: 1 Device type: a,e8
Access: (0660/crw-rw----)  Uid: (0/root)   Gid: (0/root)
Access: 2012-08-24 15:03:33.616998585 -0500
Modify: 2012-08-24 15:03:33.616998585 -0500
Change: 2012-08-24 15:03:33.616998585 -0500

Please note, I don't understand how this could really be.  Regardless of
what the /dev/kvm driver does, I don't get how I can get to open it if
the file which `is' the device doesn't have the correct permissions.
The driver can make access more restrictive than the file permissions,
but not less restrictive, or so I thought.

Also, if I try opening /dev/kvm as uid 1001 and group id 1000, again not
in any other groups, it fails.

I don't understand how this could be.  Also, it means that uid 1000/gid
1000 can run virtual processes.  I want to be able to limit that, and I
would have thought that /dev/kvm having mode 660 and being owned by
root.root would have done it.

If it is any help, I am running a stock Debian Squeeze.  The kernel is
2.6.32-5-amd64.

Any help or pointers explaining how /dev/kvm can be opened by uid
1000/gid 1000 would be greatly appreciated.  Also any explanation about
why uid 1000 is different than 1001.

Thanks
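For a character device with mode 0660, the full `ls -l` mode field is `crw-rw----`. GNU ls appends a `+` to that field when the file has an alternate access method such as a POSIX ACL. That marker comes from the ACL, not from `st_mode`, so the sketch below cannot reproduce it. `mode_string()` is a hypothetical helper that rebuilds the 10-character field from `st_mode`, ignoring setuid/setgid/sticky bits for brevity.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose S_IF* constants under strict modes */
#endif
#include <string.h>
#include <sys/stat.h>

/* Build the mode field that ls -l prints, e.g. "crw-rw----". */
static void mode_string(mode_t mode, char out[11])
{
    static const char rwx[] = "rwxrwxrwx";
    char type = '?';

    if (S_ISCHR(mode))
        type = 'c';
    else if (S_ISBLK(mode))
        type = 'b';
    else if (S_ISDIR(mode))
        type = 'd';
    else if (S_ISREG(mode))
        type = '-';
    else if (S_ISFIFO(mode))
        type = 'p';

    out[0] = type;
    /* Walk owner, group, other permission bits from 0400 down to 01. */
    for (int i = 0; i < 9; i++)
        out[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';
    out[10] = '\0';
}
```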


Re: /dev/kvm not sufficiently restricted, and in ways I didn't think were possible

2012-08-27 Thread Avi Kivity
On 08/27/2012 01:11 PM, Henry Cejtin wrote:
 I'm completely confused about access to /dev/kvm.  In particular, it
 looks like it is too open to access, but in a way that I don't
 understand.

 On my machine, /dev/kvm is owned by root.root and mode 660.  Here is the
 output of ls:

 % ls -l /dev/kvm
 crw-rw----+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm

 Despite that, when a process is uid 1000 and group id 1000, and not in
 any other groups, I can open /dev/kvm.

 I.e., here are the relevant lines from /proc/pid/status:

 Uid:    1000    1000    1000    1000
 Gid:    1000    1000    1000    1000
 Groups: 1000

 Note, just to show this isn't some weirdness in /etc/passwd or
 /etc/groups, here is the output of stat on /dev/kvm:

   File: `/dev/kvm'
   Size: 0   Blocks: 0  IO Block: 4096   character special file
 Device: 5h/5d   Inode: 2597329 Links: 1 Device type: a,e8
 Access: (0660/crw-rw----)  Uid: (0/root)   Gid: (0/root)
 Access: 2012-08-24 15:03:33.616998585 -0500
 Modify: 2012-08-24 15:03:33.616998585 -0500
 Change: 2012-08-24 15:03:33.616998585 -0500

 Please note, I don't understand how this could really be.  Regardless of
 what the /dev/kvm driver does, I don't get how I can get to open it if
 the file which `is' the device doesn't have the correct permissions.
 The driver can make access more restrictive than the file permissions,
 but not less restrictive, or so I thought.

 Also, if I try opening /dev/kvm as uid 1001 and group id 1000, again not
 in any other groups, it fails.

 I don't understand how this could be.  Also, it means that uid 1000/gid
 1000 can run virtual processes.  I want to be able to limit that, and I
 would have thought that /dev/kvm having mode 660 and being owned by
 root.root would have done it.

 If it is any help, I am running a stock Debian Squeeze.  The kernel is
 2.6.32-5-amd64.

 Any help or pointers explaining how /dev/kvm can be opened by uid
 1000/gid 1000 would be greatly appreciated.  Also any explanation about
 why uid 1000 is different than 1001.



Strange.  Try changing the permissions to 600 or 060 to see if it's the
user or group that allows you access.
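This experiment is easiest to read if each attempt reports the exact error. A minimal sketch, assuming a POSIX system: `try_open_rw()` is a hypothetical helper that opens a path read-write and returns 0 on success or the `errno` value (for example `EACCES`) on failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-write; return 0 on success, otherwise errno. */
static int try_open_rw(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling `try_open_rw("/dev/kvm")` as uid 1000 and as uid 1001 after each `chmod` (600, then 060) shows whether the owner bits, the group bits, or something else is granting access.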

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf
On Mon, 2012-08-27 at 11:50 -0700, Glauber Costa wrote:
 On 08/27/2012 08:50 AM, Michael Wolf wrote:
  On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
  On 08/24/2012 11:11 AM, Michael Wolf wrote:
  On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
  On 08/24/2012 03:14 AM, Michael Wolf wrote:
  This is an RFC regarding the reporting of steal time.  In the case
  where you have a system that is running with partial processors, such
  as KVM, the user may see steal time being reported in accounting tools
  such as top or vmstat.  This can cause confusion for the end user.  To
  ease the confusion, this patch set adds a sysctl interface to set the
  cpu entitlement.  This is the percentage of cpu that the guest system is
  expected to receive.  As long as the steal time is within its expected
  range, it will show up as 0 in /proc/stat.  The user will then see in the
  accounting tools that they are getting full utilization of the cpu
  resources assigned to them.
 
 
  And how is such a knob not confusing?
 
  Steal time is pretty well defined in meaning and is shown in top for
  ages. I really don't see the point for this.
 
  Currently you can see the steal time but you have no way of knowing if
  the cpu utilization you are seeing on the guest is the expected amount.
  I decided on making it a knob because a guest could be migrated to
  another system and its entitlement could change because of hardware or
  load differences.  It could simply be a /proc file and report the
  current entitlement if needed.   As things are currently implemented I 
  don't see how someone knows if the guest is running as expected or
  whether there is a problem.
 
 
  Turning off steal time display won't get even close to displaying the
  information you want. What you probably want is a guest-visible way to
  say how many milliseconds you are expected to run each second. Right?
  
  It is not clear to me how knowing how many milliseconds you are
  expecting to run will help the user.  Currently the users will run top
  to see how well the guest is running.  If they see _any_ steal time some
  users think they are not getting the full use of their processor
  entitlement.
 
 
 And your plan is just to selectively lie about it, by disabling it with
 a knob?

It is about making it very obvious to the end user whether they are
receiving their cpu entitlement.  If there is more steal time than
expected, that will still show up.  I have experimented with putting
the raw steal time at the end of each cpu line in /proc/stat, and it
seems to work.  That way the raw data is there as well.

Do you have another suggestion to communicate to the user whether they
are receiving their full entitlement?  At the very least shouldn't the
entitlement reside in a /proc file somewhere so that the user could look
up the value and do the math?
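The arithmetic a user would do with a published entitlement is simple. This sketch assumes the entitlement is a whole-number percentage and that only steal time beyond the expected share is reported. Both `expected_steal()` and `reported_steal()` are hypothetical helpers, not code from the RFC patches.

```c
#include <stdint.h>

/* Steal ticks to expect over `elapsed` ticks at `entitlement_pct`% of a CPU. */
static uint64_t expected_steal(uint64_t elapsed, unsigned entitlement_pct)
{
    if (entitlement_pct >= 100)
        return 0;
    return elapsed * (100 - entitlement_pct) / 100;
}

/* Assumption: only the steal beyond the expected amount is shown. */
static uint64_t reported_steal(uint64_t raw_steal, uint64_t elapsed,
                               unsigned entitlement_pct)
{
    uint64_t expected = expected_steal(elapsed, entitlement_pct);

    return raw_steal > expected ? raw_steal - expected : 0;
}
```

With a 75% entitlement over 1000 ticks, up to 250 ticks of steal would display as 0 and 300 ticks would display as 50. At the default 100% entitlement the raw value passes through unchanged.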

 
  Maybe I'm missing what you are proposing, but even if you knew the
  milliseconds that you were expecting for your processor you would have
  to adjust the top output in your head so to speak.  You would see the
  utilization and then say 'ok, that matches the number of milliseconds I
  expected to run'.  If we take away the steal time (as long as it is
  equal to or less than the expected amount of steal time) then the user
  running top will see the 100% utilization.
  
 





Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand

2012-08-27 Thread Avi Kivity
On 08/26/2012 10:04 AM, Marcelo Tosatti wrote:
 On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote:
  On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote:
   On 08/17/2012 08:29 PM, Marcelo Tosatti wrote:
On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote:
Instead of populating the entire register file, read in registers
as they are accessed, and write back only the modified ones.  This
saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
used during emulation), and two 128-byte copies for the registers.


@@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 {
  int rc;
 
+  invalidate_registers(ctxt);
  ctxt->_eip = ctxt->eip;
  ctxt->dst.type = OP_NONE;
 
  rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason,
  has_error_code, error_code);
 
-  if (rc == X86EMUL_CONTINUE)
+  if (rc == X86EMUL_CONTINUE) {
  ctxt->eip = ctxt->_eip;
+  writeback_registers(ctxt);
+  }
 
  return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
 }


There is no clear point at which the emulator register cache is active
and when it is not.  AFAICS this patch does not invalidate registers on
emulation start (the above being one of the exceptions), and does not
clear the valid bit on writeback to the vcpu cache on emulation exit.
   
   It is cleared when emulation starts.  For the non-insn-emulation entry
   points, there is an explicit invalidate.  For the emulation entry point,
   there is a memset() that clears everything up to _regs, which includes
   the cache.  This discrepancy isn't nice, but it preexists.  I don't know
   whether we should decompose the memset() or not, it is rather efficient.
   

The concern is that the emulator can start with cached registers marked
as valid that are in fact stale from the previous emulation round.

Maybe move invalidate() to init_emulate_ctxt?

   
   See the memset() in init_decode_cache().
  
  Right. Applied, thanks.

 Actually, had to revert because autotest was failing. 


Was it failing because of this patch?  Or what?

 Now it rejects:

 4 out of 49 hunks FAILED -- saving rejects to file arch/x86/kvm/emulate.c.rej

 Please regenerate.
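The on-demand scheme in the quoted commit message, and the stale-cache concern raised in review, can be illustrated with a valid/dirty bitmap pair. The sketch is hypothetical: it borrows the names `invalidate_registers()` and `writeback_registers()` from the quoted diff, but the struct layout and function bodies are illustrative assumptions, not the patch's implementation.

```c
#include <stdint.h>

#define NR_REGS 16

/* Hypothetical backing store standing in for the vcpu register file. */
struct vcpu {
    uint64_t regs[NR_REGS];
    unsigned reads; /* counts expensive fetches (e.g. a VMREAD for rsp) */
};

struct emu_ctxt {
    struct vcpu *vcpu;
    uint32_t regs_valid; /* bit n: regs[n] holds a fetched value */
    uint32_t regs_dirty; /* bit n: regs[n] was modified */
    uint64_t regs[NR_REGS];
};

/* Fetch a register from the vcpu only on first use. */
static uint64_t read_reg(struct emu_ctxt *c, unsigned nr)
{
    if (!(c->regs_valid & (1u << nr))) {
        c->regs[nr] = c->vcpu->regs[nr];
        c->vcpu->reads++;
        c->regs_valid |= 1u << nr;
    }
    return c->regs[nr];
}

static void write_reg(struct emu_ctxt *c, unsigned nr, uint64_t val)
{
    c->regs[nr] = val;
    c->regs_valid |= 1u << nr;
    c->regs_dirty |= 1u << nr;
}

/* Copy back only the registers the emulator modified. */
static void writeback_registers(struct emu_ctxt *c)
{
    for (unsigned nr = 0; nr < NR_REGS; nr++)
        if (c->regs_dirty & (1u << nr))
            c->vcpu->regs[nr] = c->regs[nr];
    c->regs_dirty = 0;
}

/* Forget cached values so the next emulation round re-reads them. */
static void invalidate_registers(struct emu_ctxt *c)
{
    c->regs_valid = 0;
    c->regs_dirty = 0;
}
```

If the vcpu's copy of a register changes between rounds, `read_reg()` keeps returning the old cached value until `invalidate_registers()` runs. That is the stale-valid-bit scenario the memset() in init_decode_cache() is said to prevent.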


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf
On Mon, 2012-08-27 at 11:55 -0700, Avi Kivity wrote:
 On 08/23/2012 04:14 PM, Michael Wolf wrote:
  This is an RFC regarding the reporting of stealtime.  In the case
  where you have a system that is running with partial processors such as
  KVM, the user may see steal time being reported in accounting tools such
  as top or vmstat.  This can cause confusion for the end user.  To
  ease the confusion this patch set adds a sysctl interface to set the
  cpu entitlement.  This is the percentage of cpu that the guest system is
  expected to receive.  As long as the steal time is within its expected
  range it will show up as 0 in /proc/stat.  The user will then see in the
  accounting tools that they are getting a full utilization of the cpu
  resources assigned to them.
 
  This patchset is changing the contents/output of /proc/stat and could
  affect user tools.  However the default setting is that the cpu is
  entitled to 100% so the code will act as before.  Also another field
  could be added to the /proc/stat output and show the unaltered steal
  time.  Since this additional field could cause more confusion than it
  would clear up I have left it out for now.
  
 
 How would a guest know what its entitlement is?
 
 

Currently the Admin/management tool setting up the guests will put it on
the qemu commandline.  From this it is passed via an ioctl to the host.
The guest will get the value from the host via a hypercall.

In the future the host could try and do some of it automatically in some
cases. 
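The cover letter says that steal time within the expected range shows up as 0 and that the default 100% entitlement behaves as before, but it does not spell out the arithmetic. Below is a sketch under one assumed reading: the expected steal is the un-entitled share of elapsed time, and only the excess is reported. reported_steal() is hypothetical and not taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: hide the steal time a guest should expect given its
 * entitlement (percent of a physical CPU) and report only the excess.
 * With the default entitlement of 100%, raw steal time passes through
 * unchanged.
 */
static uint64_t reported_steal(uint64_t elapsed_ns, uint64_t raw_steal_ns,
			       unsigned int entitlement_pct)
{
	uint64_t expected;

	assert(entitlement_pct <= 100);
	expected = elapsed_ns * (100 - entitlement_pct) / 100;
	return raw_steal_ns > expected ? raw_steal_ns - expected : 0;
}
```

Under this reading a 70% guest that loses 0.3 s of every second reports zero steal, while losing 0.4 s reports 0.1 s.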



Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand

2012-08-27 Thread Avi Kivity
On 08/27/2012 01:22 PM, Avi Kivity wrote:
 On 08/26/2012 10:04 AM, Marcelo Tosatti wrote:
  On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote:
   On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote:
On 08/17/2012 08:29 PM, Marcelo Tosatti wrote:
 On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote:
 Instead of populating the entire register file, read in registers
 as they are accessed, and write back only the modified ones.  This
 saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
 used during emulation), and two 128-byte copies for the registers.
 
 
 @@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
  {
  int rc;
  
 +invalidate_registers(ctxt);
  ctxt->_eip = ctxt->eip;
  ctxt->dst.type = OP_NONE;
  
  rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason,
   has_error_code, error_code);
  
 -if (rc == X86EMUL_CONTINUE)
 +if (rc == X86EMUL_CONTINUE) {
  ctxt->eip = ctxt->_eip;
 +writeback_registers(ctxt);
 +}
  
  return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
  }
 
 
 No clear point when emulator register cache is active, when it is
 not (AFAICS this patch does not invalidate registers on emulation start
 (the above being one of the exceptions) does not clear valid bit on
 writeback-to-vcpu-cache on emulation exit).

It is cleared when emulation starts.  For the non-insn-emulation entry
points, there is an explicit invalidate.  For the emulation entry point,
there is a memset() that clears everything up to _regs, which includes
the cache.  This discrepancy isn't nice, but it preexists.  I don't know
whether we should decompose the memset() or not, it is rather efficient.

 
 Concern is that the emulator can start with cached registers marked as
 valid that are in fact invalid from a previous emulation round.
 
 Maybe move invalidate() to init_emulate_ctxt?
 

See the memset() in init_decode_cache().
   
   Right. Applied, thanks.
 
  Actually, had to revert because autotest was failing. 
 

 Was it failing because of this patch?  Or what?



I see, the rsp mask fix.




Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Avi Kivity
On 08/27/2012 01:23 PM, Michael Wolf wrote:
  
  How would a guest know what its entitlement is?
  
  

 Currently the Admin/management tool setting up the guests will put it on
 the qemu commandline.  From this it is passed via an ioctl to the host.
 The guest will get the value from the host via a hypercall.

 In the future the host could try and do some of it automatically in some
 cases. 

Seems to me it's a meaningless value for the guest.  Suppose it is
migrated to a host that is more powerful, and as a result its relative
entitlement is reduced.  The value needs to be adjusted.

This is best taken care of from the host side.




[PATCH v4] KVM: x86 emulator: access GPRs on demand

2012-08-27 Thread Avi Kivity
Instead of populating the entire register file, read in registers
as they are accessed, and write back only the modified ones.  This
saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
used during emulation), and two 128-byte copies for the registers.

Signed-off-by: Avi Kivity a...@redhat.com
---

v4:
  rebased

v3:
  fix misplaced parentheses in em_loop() and em_jcxz(), unbreaking those
  instructions.

v2:
  add APIs for managing the register cache.  This reduces the potential for
  confusion between ctxt->regs_dirty and vcpu->arch.regs_dirty.
  move cache management to the entry points
  add missing writebacks to int and task switch emulation

 arch/x86/include/asm/kvm_emulate.h |  20 ++-
 arch/x86/kvm/emulate.c | 299 +++--
 arch/x86/kvm/x86.c |  45 +++---
 3 files changed, 220 insertions(+), 144 deletions(-)
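The commit message describes a lazy register cache: registers are fetched on first use and only the modified ones are written back. The user-space sketch below models the reg_read()/reg_write()/reg_rmw()/writeback_registers() helpers from this patch with the same valid/dirty bitmaps. struct vcpu_regs, backing_read() and backing_write() are hypothetical stand-ins for the ->read_gpr()/->write_gpr() callbacks, and the access counters exist only to make the savings visible.

```c
#include <assert.h>

#define NR_GPRS 16

/* Hypothetical backing store standing in for the vcpu register file. */
struct vcpu_regs {
	unsigned long gpr[NR_GPRS];
	unsigned int reads;   /* backing-store reads performed */
	unsigned int writes;  /* backing-store writes performed */
};

struct emu_ctxt {
	struct vcpu_regs *vcpu;
	unsigned int regs_valid;  /* bit n: _regs[n] holds a copy of gpr n */
	unsigned int regs_dirty;  /* bit n: _regs[n] was modified */
	unsigned long _regs[NR_GPRS];
};

static unsigned long backing_read(struct emu_ctxt *ctxt, unsigned nr)
{
	ctxt->vcpu->reads++;
	return ctxt->vcpu->gpr[nr];
}

static void backing_write(struct emu_ctxt *ctxt, unsigned nr, unsigned long val)
{
	ctxt->vcpu->writes++;
	ctxt->vcpu->gpr[nr] = val;
}

/* Fetch a register from the backing store only on first use. */
static unsigned long reg_read(struct emu_ctxt *ctxt, unsigned nr)
{
	assert(nr < NR_GPRS);
	if (!(ctxt->regs_valid & (1u << nr))) {
		ctxt->regs_valid |= 1u << nr;
		ctxt->_regs[nr] = backing_read(ctxt, nr);
	}
	return ctxt->_regs[nr];
}

/* Pointer for a full overwrite: no read needed, but mark it dirty. */
static unsigned long *reg_write(struct emu_ctxt *ctxt, unsigned nr)
{
	assert(nr < NR_GPRS);
	ctxt->regs_valid |= 1u << nr;
	ctxt->regs_dirty |= 1u << nr;
	return &ctxt->_regs[nr];
}

/* Read-modify-write: populate the cache first, then mark dirty. */
static unsigned long *reg_rmw(struct emu_ctxt *ctxt, unsigned nr)
{
	reg_read(ctxt, nr);
	return reg_write(ctxt, nr);
}

/* Push back only the registers that emulation modified. */
static void writeback_registers(struct emu_ctxt *ctxt)
{
	for (unsigned nr = 0; nr < NR_GPRS; nr++)
		if (ctxt->regs_dirty & (1u << nr))
			backing_write(ctxt, nr, ctxt->_regs[nr]);
}

static void invalidate_registers(struct emu_ctxt *ctxt)
{
	ctxt->regs_valid = 0;
	ctxt->regs_dirty = 0;
}
```

As in the emulator_task_switch() hunk discussed in this thread, an entry point calls invalidate_registers() before emulating and writeback_registers() only once emulation has succeeded.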

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c764f43..282aee5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -86,6 +86,19 @@ struct x86_instruction_info {
 
 struct x86_emulate_ops {
    /*
+    * read_gpr: read a general purpose register (rax - r15)
+    *
+    * @reg: gpr number.
+    */
+   ulong (*read_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg);
+   /*
+    * write_gpr: write a general purpose register (rax - r15)
+    *
+    * @reg: gpr number.
+    * @val: value to write.
+    */
+   void (*write_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg, ulong val);
+   /*
 * read_std: Read bytes of standard (non-emulated/special) memory.
 *   Used for descriptor reading.
 *  @addr:  [IN ] Linear address from which to read.
@@ -281,8 +294,10 @@ struct x86_emulate_ctxt {
bool rip_relative;
unsigned long _eip;
struct operand memop;
+   u32 regs_valid;  /* bitmaps of registers in _regs[] that can be read */
+   u32 regs_dirty;  /* bitmaps of registers in _regs[] that have been written */
/* Fields above regs are cleared together. */
-   unsigned long regs[NR_VCPU_REGS];
+   unsigned long _regs[NR_VCPU_REGS];
struct operand *memopp;
struct fetch_cache fetch;
struct read_cache io_read;
@@ -394,4 +409,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
+void emulator_invalidate_register_cache(struct x86_emulate_ctxt *ctxt);
+void emulator_writeback_register_cache(struct x86_emulate_ctxt *ctxt);
+
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e8fb6c5..5e27ba5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -202,6 +202,42 @@ struct gprefix {
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
 #define EFLG_RESERVED_ONE_MASK 2
 
+static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   if (!(ctxt->regs_valid & (1 << nr))) {
+      ctxt->regs_valid |= 1 << nr;
+      ctxt->_regs[nr] = ctxt->ops->read_gpr(ctxt, nr);
+   }
+   return ctxt->_regs[nr];
+}
+
+static ulong *reg_write(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   ctxt->regs_valid |= 1 << nr;
+   ctxt->regs_dirty |= 1 << nr;
+   return &ctxt->_regs[nr];
+}
+
+static ulong *reg_rmw(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   reg_read(ctxt, nr);
+   return reg_write(ctxt, nr);
+}
+
+static void writeback_registers(struct x86_emulate_ctxt *ctxt)
+{
+   unsigned reg;
+
+   for_each_set_bit(reg, (ulong *)&ctxt->regs_dirty, 16)
+      ctxt->ops->write_gpr(ctxt, reg, ctxt->_regs[reg]);
+}
+
+static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
+{
+   ctxt->regs_dirty = 0;
+   ctxt->regs_valid = 0;
+}
+
 /*
  * Instruction emulation:
  * Most instructions are emulated directly via a fragment of inline assembly
@@ -374,8 +410,8 @@ struct gprefix {
 #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex) \
do {\
unsigned long _tmp; \
-   ulong *rax = &(ctxt)->regs[VCPU_REGS_RAX];  \
-   ulong *rdx = &(ctxt)->regs[VCPU_REGS_RDX];  \
+   ulong *rax = reg_rmw((ctxt), VCPU_REGS_RAX);\
+   ulong *rdx = reg_rmw((ctxt), VCPU_REGS_RDX);\
\
__asm__ __volatile__ (  \
_PRE_EFLAGS(0, 5, 1)  \
@@ -494,7 +530,7 @@ static void masked_increment(ulong *reg, ulong mask, int inc)
 
 static void 

Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand

2012-08-27 Thread Marcelo Tosatti
On Mon, Aug 27, 2012 at 05:53:32PM -0300, Marcelo Tosatti wrote:
 With the fix, it rejects. About to merge the big real mode
 patchset, so it's not a bad idea to wait for that before
 resending.

Never mind this sentence.



Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand

2012-08-27 Thread Marcelo Tosatti
On Mon, Aug 27, 2012 at 01:22:55PM -0700, Avi Kivity wrote:
 On 08/26/2012 10:04 AM, Marcelo Tosatti wrote:
  On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote:
   On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote:
On 08/17/2012 08:29 PM, Marcelo Tosatti wrote:
 On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote:
 Instead of populating the entire register file, read in registers
 as they are accessed, and write back only the modified ones.  This
 saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
 used during emulation), and two 128-byte copies for the registers.
 
 
 @@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
  {
  int rc;
  
 +invalidate_registers(ctxt);
  ctxt->_eip = ctxt->eip;
  ctxt->dst.type = OP_NONE;
  
  rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason,
   has_error_code, error_code);
  
 -if (rc == X86EMUL_CONTINUE)
 +if (rc == X86EMUL_CONTINUE) {
  ctxt->eip = ctxt->_eip;
 +writeback_registers(ctxt);
 +}
  
  return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
  }
 
 
 No clear point when emulator register cache is active, when it is
 not (AFAICS this patch does not invalidate registers on emulation start
 (the above being one of the exceptions) does not clear valid bit on
 writeback-to-vcpu-cache on emulation exit).

It is cleared when emulation starts.  For the non-insn-emulation entry
points, there is an explicit invalidate.  For the emulation entry point,
there is a memset() that clears everything up to _regs, which includes
the cache.  This discrepancy isn't nice, but it preexists.  I don't know
whether we should decompose the memset() or not, it is rather efficient.

 
 Concern is that the emulator can start with cached registers marked as
 valid that are in fact invalid from a previous emulation round.
 
 Maybe move invalidate() to init_emulate_ctxt?
 

See the memset() in init_decode_cache().
   
   Right. Applied, thanks.
 
  Actually, had to revert because autotest was failing. 
 
 
 Was it failing because of this patch?  Or what?

No, due to lack of stack size fix. 

With the fix, it rejects. About to merge the big real mode
patchset, so it's not a bad idea to wait for that before
resending.




Re: [PATCH for-3.6] kvm: fix KVM_GET_MSR for PV EOI

2012-08-27 Thread Marcelo Tosatti
On Sun, Aug 26, 2012 at 06:00:29PM +0300, Michael S. Tsirkin wrote:
 KVM_GET_MSR was missing support for PV EOI,
 which is needed for migration.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Please consider this bugfix patch for 3.6.
 Thanks!
 
  arch/x86/kvm/x86.c | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 91a5958..ff5e985 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1993,6 +1993,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
   case MSR_KVM_STEAL_TIME:
   data = vcpu->arch.st.msr_val;
   break;
 + case MSR_KVM_PV_EOI_EN:
 + data = vcpu->arch.pv_eoi.msr_val;
 + break;
 + break;
   case MSR_IA32_P5_MC_ADDR:
   case MSR_IA32_P5_MC_TYPE:
   case MSR_IA32_MCG_CAP:

Should increase KVM_SAVE_MSRS_BEGIN.
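Why the count matters, as I read the x86.c of this period: msrs_to_save[] begins with the KVM-specific paravirtual MSRs, and only entries from KVM_SAVE_MSRS_BEGIN onward are probed on the host (with rdmsr_safe()) when the list is initialised, dropping any the host cannot read. A paravirtual MSR added to the list without bumping the count lands in the probed range and is dropped from the list reported to userspace. The sketch models that filtering; SAVE_MSRS_BEGIN, host_has_msr() and build_save_list() are hypothetical, while the two KVM MSR numbers are the ones from kvm_para.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_KVM_STEAL_TIME   0x4b564d03u  /* from kvm_para.h */
#define MSR_KVM_PV_EOI_EN    0x4b564d04u  /* from kvm_para.h */
#define MSR_IA32_SYSENTER_CS 0x00000174u  /* a real hardware MSR */

/* Hypothetical: number of leading paravirtual entries never probed. */
#define SAVE_MSRS_BEGIN 2

static const uint32_t msrs_to_save[] = {
	MSR_KVM_STEAL_TIME,
	MSR_KVM_PV_EOI_EN,
	MSR_IA32_SYSENTER_CS,
};

#define NR_MSRS (sizeof(msrs_to_save) / sizeof(msrs_to_save[0]))

/*
 * Hypothetical host probe.  Paravirtual MSRs (0x4b564dxx, "KVM" in
 * ASCII) do not exist in hardware, so a real probe would fail for them.
 */
static int host_has_msr(uint32_t msr)
{
	return (msr >> 8) != 0x4b564du;
}

/*
 * Collect the MSRs to save/restore: the first `begin` entries
 * unconditionally, the rest only if the host probe succeeds.
 */
static size_t build_save_list(uint32_t out[NR_MSRS], size_t begin)
{
	size_t n = 0;

	assert(begin <= NR_MSRS);
	for (size_t i = 0; i < NR_MSRS; i++) {
		if (i >= begin && !host_has_msr(msrs_to_save[i]))
			continue;
		out[n++] = msrs_to_save[i];
	}
	return n;
}
```

With the count covering both paravirtual entries, all three MSRs survive; with the count left one short, MSR_KVM_PV_EOI_EN is probed, fails, and disappears from the list.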


Re: [PATCH RFC 1/2] KVM: PPC: Move kvm->arch.slot_phys into memslot.arch

2012-08-27 Thread Marcelo Tosatti
On Sat, Aug 25, 2012 at 10:40:40PM +1000, Paul Mackerras wrote:
 Now that we have an architecture-specific field in the kvm_memory_slot
 structure, we can use it to store the array of page physical addresses
 that we need for Book3S HV KVM on PPC970 processors.  This reduces the
 size of struct kvm_arch for Book3S HV, and also reduces the size of
 struct kvm_arch_memory_slot for other PPC KVM variants since the fields
 in it are now only compiled in for Book3S HV.
 
 This necessitates making the kvm_arch_create_memslot and
 kvm_arch_free_memslot operations specific to each PPC KVM variant.
 That in turn means that we now don't allocate the rmap arrays on
 Book3S PR and Book E.
 
 Since we now unpin pages and free the slot_phys array in
 kvmppc_core_free_memslot, we no longer need to do it in
 kvmppc_core_destroy_vm, since the generic code takes care to free
 all the memslots when destroying a VM.
 
 We now need the new memslot to be passed in to
 kvmppc_core_prepare_memory_region, since we need to initialize its
 arch.slot_phys member on Book3S HV.
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
 This is on top of Alex's kvm-ppc-next branch with the KVM tree's next
 branch merged in and then Marcelo's set of 3 patches on that.
 
  arch/powerpc/include/asm/kvm_host.h |9 +--
  arch/powerpc/include/asm/kvm_ppc.h  |5 ++
  arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +-
  arch/powerpc/kvm/book3s_hv.c|  104 ---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c |2 +-
  arch/powerpc/kvm/book3s_pr.c|   12 
  arch/powerpc/kvm/booke.c|   12 
  arch/powerpc/kvm/powerpc.c  |   13 +
  8 files changed, 102 insertions(+), 61 deletions(-)

Regarding generic memslot code, looks fine.
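A sketch of the ownership pattern described in the quoted changelog: the per-page physical-address array lives in the memslot's arch field, is sized by a per-variant create hook, and is released by a per-variant free hook that the generic code calls for every slot when the VM is destroyed. The structures below are hypothetical reductions that borrow the names from the patch; the real hooks have different signatures, and other variants can supply hooks that allocate nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of the structures named in the patch. */
struct kvm_arch_memory_slot {
	unsigned long *slot_phys;  /* one physical address per guest page */
};

struct kvm_memory_slot {
	unsigned long base_gfn;
	unsigned long npages;
	struct kvm_arch_memory_slot arch;
};

/* Per-variant create hook: size the arch data to the slot. */
static int create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
{
	assert(slot != NULL);
	slot->npages = npages;
	slot->arch.slot_phys = calloc(npages, sizeof(*slot->arch.slot_phys));
	return slot->arch.slot_phys ? 0 : -1;
}

/*
 * Per-variant free hook.  Generic code calls it for every slot when a VM
 * is destroyed, so VM teardown need not free slot_phys itself.  (The
 * real hook also unpins the pages recorded in slot_phys first.)
 */
static void free_memslot(struct kvm_memory_slot *slot)
{
	free(slot->arch.slot_phys);
	slot->arch.slot_phys = NULL;
}
```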



Re: [PATCH v2] kvm/book3s: fix build error caused by gfn_to_hva_memslot()

2012-08-27 Thread Marcelo Tosatti
On Fri, Aug 24, 2012 at 07:03:14PM +1000, Paul Mackerras wrote:
 On Fri, Aug 24, 2012 at 04:50:28PM +0800, Gavin Shan wrote:
  The build error was caused by built-in functions calling
  functions implemented in modules. That was introduced by the
  following commit.
  
  commit 4d8b81abc47b83a1939e59df2fdb0e98dfe0eedd
  
  The patch fixes the build error by moving function __gfn_to_hva_memslot()
  from kvm_main.c to kvm_host.h and making that inline so that the
  builtin function (kvmppc_h_enter) can use that.
  
  Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
 
 Acked-by: Paul Mackerras pau...@samba.org
 
 By the way, when you give a commit ID it's a good idea to give the
 headline of the commit as well, something like this:
 
 This error was introduced by commit 4d8b81abc4 (KVM: introduce
 readonly memslot).
 
 Paul.

Applied, thanks (with suggested changelog modification).
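The fix works because a static inline function defined in a header is compiled into every caller, so built-in code such as kvmppc_h_enter() no longer references a symbol exported by the kvm module. A minimal sketch of such a helper, assuming 4 KiB pages; struct memslot is a hypothetical reduction of struct kvm_memory_slot, and the translation is the usual base-plus-offset computation.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

typedef unsigned long long gfn_t;

/* Hypothetical reduction of struct kvm_memory_slot. */
struct memslot {
	gfn_t base_gfn;               /* first guest frame number in the slot */
	unsigned long npages;
	unsigned long userspace_addr; /* host virtual address backing base_gfn */
};

/* Header-style static inline: each caller gets its own copy. */
static inline unsigned long
gfn_to_hva_memslot(const struct memslot *slot, gfn_t gfn)
{
	assert(gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
}
```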


Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()

2012-08-27 Thread Marcelo Tosatti
On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
 Although returning -1 should be likely according to the likely(),
 the ASSERT in apic_find_highest_irr() will be triggered in such a case.
 It seems that this optimization is not working as expected.
 
 This patch simplifies the logic to mitigate this issue: search for the
 first non-zero word in a for loop and then use __fls() if found.  When
 nothing is found, we are out of the loop, so we can just return -1.

Numbers please?

 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 ---
  arch/x86/kvm/lapic.c |   18 ++
  1 files changed, 10 insertions(+), 8 deletions(-)



Re: setting time in guest with ntpdate results in VM hang

2012-08-27 Thread Marcelo Tosatti
On Mon, Aug 27, 2012 at 01:23:05PM -0600, David Ahern wrote:
 On 8/27/12 10:58 AM, Dale Swanston wrote:
 Good idea.  I'll try that.
 
 But are there any tools available to determine what the VM is doing when
 it appears hung?  I've looked but haven't found much on debug or
 diagnostics on a running VM.  Any links?
 
 If you have the vmlinux, enable the gdbserver stub via Qemu's
 monitor. Then use 'gdb vmlinux', connect to the VM 'target remote
 host:port' and look at the backtrace.

Another option is to boot the host with profile=kvm, wait for the guest to hang,
then do:

readprofile -r ; readprofile -m System-map-of-guest.map

 I have seen something similar using kvm-clock in a guest running 2.6.27.
 
 David



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Glauber Costa
On 08/27/2012 01:19 PM, Michael Wolf wrote:
 On Mon, 2012-08-27 at 11:50 -0700, Glauber Costa wrote:
 On 08/27/2012 08:50 AM, Michael Wolf wrote:
 On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
 On 08/24/2012 11:11 AM, Michael Wolf wrote:
 On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
 On 08/24/2012 03:14 AM, Michael Wolf wrote:
 This is an RFC regarding the reporting of stealtime.  In the case
 where you have a system that is running with partial processors such as
 KVM, the user may see steal time being reported in accounting tools such
 as top or vmstat.  This can cause confusion for the end user.  To
 ease the confusion this patch set adds a sysctl interface to set the
 cpu entitlement.  This is the percentage of cpu that the guest system is
 expected to receive.  As long as the steal time is within its expected
 range it will show up as 0 in /proc/stat.  The user will then see in the
 accounting tools that they are getting a full utilization of the cpu
 resources assigned to them.


 And how is such a knob not confusing?

 Steal time is pretty well defined in meaning and is shown in top for
 ages. I really don't see the point for this.

 Currently you can see the steal time but you have no way of knowing if
 the cpu utilization you are seeing on the guest is the expected amount.
 I decided on making it a knob because a guest could be migrated to
 another system and its entitlement could change because of hardware or
 load differences.  It could simply be a /proc file and report the
 current entitlement if needed.   As things are currently implemented I 
 don't see how someone knows if the guest is running as expected or
 whether there is a problem.


 Turning off steal time display won't get even close to displaying the
 information you want. What you probably want is a guest-visible way to
 say how many milliseconds you are expected to run each second. Right?

 It is not clear to me how knowing how many milliseconds you are
 expecting to run will help the user.  Currently the users will run top
 to see how well the guest is running.  If they see _any_ steal time some
 users think they are not getting the full use of their processor
 entitlement.


 And your plan is just to selectively lie about it, by disabling it with
 a knob?
 
 It is about making it very obvious to the end user whether they are
 receiving their cpu entitlement.  If there is more steal time than
 expected that will still show up.  I have experimented, and it seems to
 work, to put the raw stealtime at the end of each cpu line
 in /proc/stat.  That way the raw data is there as well.   
 
 Do you have another suggestion to communicate to the user whether they
 are receiving their full entitlement?  At the very least shouldn't the
 entitlement reside in a /proc file somewhere so that the user could look
 up the value and do the math?
 

I personally believe Avi is right here. This is something to be done at
the host side. The user can learn this from any tool he is using to
manage his VMs.

Now if you absolutely must inform him from inside the guest, I would go
with the latter, informing him in another location. (I am not saying I
agree with this, just that this is less bad.)



Re: [PATCH for-3.6] kvm: fix KVM_GET_MSR for PV EOI

2012-08-27 Thread Marcelo Tosatti
On Mon, Aug 27, 2012 at 05:47:42PM -0300, Marcelo Tosatti wrote:
 On Sun, Aug 26, 2012 at 06:00:29PM +0300, Michael S. Tsirkin wrote:
  KVM_GET_MSR was missing support for PV EOI,
  which is needed for migration.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  Please consider this bugfix patch for 3.6.
  Thanks!
  
   arch/x86/kvm/x86.c | 3 +++
   1 file changed, 3 insertions(+)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 91a5958..ff5e985 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1993,6 +1993,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
  case MSR_KVM_STEAL_TIME:
  data = vcpu->arch.st.msr_val;
  break;
  +   case MSR_KVM_PV_EOI_EN:
  +   data = vcpu->arch.pv_eoi.msr_val;
  +   break;
  case MSR_IA32_P5_MC_ADDR:
  case MSR_IA32_P5_MC_TYPE:
  case MSR_IA32_MCG_CAP:
 
 Should increase KVM_SAVE_MSRS_BEGIN.

Already done by e115676e042f4d9268, applied.



Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf
On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
 On 08/27/2012 01:23 PM, Michael Wolf wrote:
   
   How would a guest know what its entitlement is?
   
   
 
  Currently the Admin/management tool setting up the guests will put it on
  the qemu commandline.  From this it is passed via an ioctl to the host.
  The guest will get the value from the host via a hypercall.
 
  In the future the host could try and do some of it automatically in some
  cases. 
 
 Seems to me it's a meaningless value for the guest.  Suppose it is
 migrated to a host that is more powerful, and as a result its relative
 entitlement is reduced.  The value needs to be adjusted.

This is why I chose to manage the value from the sysctl interface rather
than just have it stored as a value in /proc.  Whatever tool was used to
migrate the vm could hopefully adjust the sysctl value on the guest.
 
 This is best taken care of from the host side.

Not sure what you are getting at here.  If you are running in a cloud
environment, you purchase a VM with the understanding that you are
getting certain resources.  As this type of user I don't believe you
have any access to the host to see this type of information.  So the
user still wouldn't have a way to confirm that they are receiving what
they should be in the way of processor resources.

Would you please elaborate a little more on this?
 





Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Glauber Costa
On 08/27/2012 02:27 PM, Michael Wolf wrote:
 On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
 On 08/27/2012 01:23 PM, Michael Wolf wrote:

 How would a guest know what its entitlement is?



 Currently the Admin/management tool setting up the guests will put it on
 the qemu commandline.  From this it is passed via an ioctl to the host.
 The guest will get the value from the host via a hypercall.

 In the future the host could try and do some of it automatically in some
 cases. 

 Seems to me it's a meaningless value for the guest.  Suppose it is
 migrated to a host that is more powerful, and as a result its relative
 entitlement is reduced.  The value needs to be adjusted.
 
 This is why I chose to manage the value from the sysctl interface rather
 than just have it stored as a value in /proc.  Whatever tool was used to
 migrate the vm could hopefully adjust the sysctl value on the guest.

 This is best taken care of from the host side.
 
 Not sure what you are getting at here.  If you are running in a cloud
 environment, you purchase a VM with the understanding that you are
 getting certain resources.  As this type of user I don't believe you
 have any access to the host to see this type of information.  So the
 user still wouldn't have a way to confirm that they are receiving what
 they should be in the way of processor resources.
 
 Would you please elaborate a little more on this?

What do you mean they have no access to the host?
They have access to all sorts of tools that display information from the
host. Speaking of a view-only resource, those are strictly equivalent.





Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Avi Kivity
On 08/27/2012 02:27 PM, Michael Wolf wrote:
 On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
  On 08/27/2012 01:23 PM, Michael Wolf wrote:

How would a guest know what its entitlement is?


  
   Currently the Admin/management tool setting up the guests will put it on
   the qemu commandline.  From this it is passed via an ioctl to the host.
   The guest will get the value from the host via a hypercall.
  
   In the future the host could try and do some of it automatically in some
   cases. 
  
  Seems to me it's a meaningless value for the guest.  Suppose it is
  migrated to a host that is more powerful, and as a result its relative
  entitlement is reduced.  The value needs to be adjusted.

 This is why I chose to manage the value from the sysctl interface rather
 than just have it stored as a value in /proc.  Whatever tool was used to
 migrate the vm could hopefully adjust the sysctl value on the guest.

We usually try to avoid this type of coupling.  What if the guest is
rebooting while this is happening?  What if it's not running Linux at all?

  
  This is best taken care of from the host side.

 Not sure what you are getting at here.  If you are running in a cloud
 environment, you purchase a VM with the understanding that you are
 getting certain resources.  As this type of user I don't believe you
 have any access to the host to see this type of information.  So the
 user still wouldn't have a way to confirm that they are receiving what
 they should be in the way of processor resources.

 Would you please elaborate a little more on this?

I meant not reporting this time as steal time.  But that cripples steal
time reporting.

Looks like for each quantum we need to report how much real time has
passed, how much the guest was actually using, and how much the guest
was not using due to overcommit (with the remainder being unallocated
time).  The guest could then present it any way it wanted to.
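The three-way split suggested here can be made concrete with a small per-quantum record. The struct and both helpers are hypothetical; received_pct() is just one possible guest-side presentation (share of allocated time actually received), not something proposed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-quantum report: real time elapsed, time the guest
 * actually ran, and time it wanted to run but could not because the
 * host was overcommitted.
 */
struct quantum_report {
	uint64_t elapsed_ns;
	uint64_t used_ns;
	uint64_t overcommit_ns;
};

/* The remainder is time that was never allocated to the guest. */
static uint64_t unallocated_ns(const struct quantum_report *q)
{
	assert(q->used_ns + q->overcommit_ns <= q->elapsed_ns);
	return q->elapsed_ns - q->used_ns - q->overcommit_ns;
}

/* One possible guest-side view: percent of allocated time received. */
static unsigned int received_pct(const struct quantum_report *q)
{
	uint64_t allocated = q->used_ns + q->overcommit_ns;

	return allocated ? (unsigned int)(q->used_ns * 100 / allocated) : 100;
}
```

Because the host reports raw quantities rather than a ratio, nothing in the guest has to be adjusted after migration; the guest derives whatever view it wants from each report.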




Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf
On Mon, 2012-08-27 at 14:41 -0700, Glauber Costa wrote:
 On 08/27/2012 02:27 PM, Michael Wolf wrote:
  On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
  On 08/27/2012 01:23 PM, Michael Wolf wrote:
 
  How would a guest know what its entitlement is?
 
 
 
  Currently the Admin/management tool setting up the guests will put it on
  the qemu commandline.  From this it is passed via an ioctl to the host.
  The guest will get the value from the host via a hypercall.
 
  In the future the host could try and do some of it automatically in some
  cases. 
 
  Seems to me it's a meaningless value for the guest.  Suppose it is
  migrated to a host that is more powerful, and as a result its relative
  entitlement is reduced.  The value needs to be adjusted.
  
  This is why I chose to manage the value from the sysctl interface rather
  than just have it stored as a value in /proc.  Whatever tool was used to
  migrate the vm could hopefully adjust the sysctl value on the guest.
 
  This is best taken care of from the host side.
  
  Not sure what you are getting at here.  If you are running in a cloud
  environment, you purchase a VM with the understanding that you are
  getting certain resources.  As this type of user I don't believe you
  have any access to the host to see this type of information.  So the
  user still wouldn't have a way to confirm that they are receiving what
  they should be in the way of processor resources.
  
  Would you please elaborate a little more on this?
 
 What do you mean they have no access to the host?
 They have access to all sorts of tools that display information from the
 host. Speaking of a view-only resource, those are strictly equivalent.
 
 
 

ok.  I will go look at those resources. 



Re: /dev/kvm not sufficiently restricted, and in ways I didn't think were possible

2012-08-27 Thread Neal Murphy
On Monday, August 27, 2012 04:11:11 PM Henry Cejtin wrote:
 I'm completely confused about access to /dev/kvm.  In particular, it
 looks like it is too open to access, but in a way that I don't
 understand.
 
 On my machine, /dev/kvm is owned by root.root and mode 660.  Here is the
 output of ls:
 
 % ls -l /dev/kvm
 crw-rw----+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm
 
 Despite that, when a process is uid 1000 and group id 1000, and not in
 any other groups, I can open /dev/kvm.
 
 ...
 
 Please note, I don't understand how this could really be.

I think the '+' indicates ACLs are in use; 'getfacl /dev/kvm' might be 
illuminating. It might be something udev does, or something your desktop 
software does when you log in.
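The reasoning can be made concrete. The ten-character string that ls -l prints comes purely from st_mode, so a 0660 character device prints crw-rw----. The trailing '+' is a separate marker ls adds when the file carries an extended ACL, which st_mode does not encode. The hypothetical helper below reproduces only the type and rwx columns (no setuid/setgid/sticky handling). Because the ACL is invisible in these bits, a program asking whether the current user may open /dev/kvm should call access(2) or simply try open(2), both of which let the kernel apply the ACL.

```c
#define _XOPEN_SOURCE 700  /* for S_IFCHR and friends under strict modes */
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Build the permission string ls -l derives from st_mode: file type
 * plus the nine rwx bits.  Any ACL is NOT reflected here.
 */
static void mode_string(mode_t mode, char out[11])
{
	static const char rwx[] = "rwxrwxrwx";

	assert(out != NULL);
	out[0] = S_ISCHR(mode) ? 'c' : S_ISBLK(mode) ? 'b' :
		 S_ISDIR(mode) ? 'd' : '-';
	for (int i = 0; i < 9; i++)
		out[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';
	out[10] = '\0';
}
```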


KVM call agenda for Tuesday, August 28th

2012-08-27 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Thanks, Juan.


Reminder: KVM Forum 2012 Call For Participation

2012-08-27 Thread KVM Forum 2012 Program Committee
Just a reminder, the CFP ends this Friday.
--

=
KVM Forum 2012: Call For Participation
November 7-9, 2012 - Hotel Fira Palace - Barcelona, Spain

(All submissions must be received before midnight Aug 31st, 2012)
=

KVM is an industry leading open source hypervisor that provides
an ideal platform for datacenter virtualization, virtual desktop
infrastructure, and cloud computing.  Once again, it's time to bring
together the community of developers and users that define the KVM
ecosystem for our annual technical conference.  We will discuss the
current state of affairs and plan for the future of KVM, its surrounding
infrastructure, and management tools.  We are also excited to announce
the oVirt Workshop will run in parallel with the KVM Forum, bringing in
a community focused on enterprise datacenter virtualization management
built on KVM.  For topics which overlap we will have shared sessions.
So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon.
Based on feedback from last year, this time it's LinuxCon Europe!
KVM Forum attendees will be able to attend oVirt Workshop sessions and
are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2012.

http://events.linuxfoundation.org/cfp

Suggested topics:

 KVM
 - Scaling and performance
 - Nested virtualization
 - I/O improvements
 - PCI device assignment
 - Driver domains
 - Time keeping
 - Resource management (cpu, memory, i/o)
 - Memory management (page sharing, swapping, huge pages, etc)
 - VEPA, VN-Link, vswitch
 - Security
 - Architecture ports
 
 QEMU
 - Device model improvements
 - New devices and chipsets
 - Scaling and performance
 - Desktop virtualization
 - Spice
 - Increasing robustness and hardening
 - Security model
 - Management interfaces
 - QMP protocol and implementation
 - Image formats
 - Firmware (SeaBIOS, OVMF, UEFI, etc)
 - Live migration
 - Live snapshots and merging
 - Fault tolerance, high availability, continuous backup
 - Real-time guest support
 
 Virtio
 - Speeding up existing devices
 - Alternatives
 - Virtio on non-Linux or non-virtualized
 
 Management infrastructure
 - oVirt (shared track w/ oVirt Workshop)
 - Libvirt
 - KVM autotest
 - OpenStack
 - Network virtualization management
 - Enterprise storage management
 
 Cloud computing
 - Scalable storage
 - Virtual networking
 - Security
 - Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: Aug 31st, 2012
Notification: Sep 14th, 2012

Please submit a short abstract (~150 words) describing your presentation
proposal.  In your submission please note how long your talk will take.
Slots vary in length up to 45 minutes.  Also include in your proposal
the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:

http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal
was accepted by Sep 14th.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how
people actually use our software.  We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production you are encouraged to submit a speaking
proposal.  Simply mark it as an end-user collaboration proposal.  As an
end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks, for birds of a feather (BOF) sessions. These sessions will be
less formal than presentation tracks and targeted for people who would
like to discuss specific issues with other developers and/or users.
If you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

LIGHTNING TALKS

In addition to submitted talks we will also have some room for lightning
talks. These are short (5 minute) discussions to highlight new work or
ideas that aren't complete enough to warrant a full presentation slot.
Lightning talk submissions and scheduling will be handled on-site at
KVM Forum.

HOTEL / TRAVEL

The KVM Forum 2012 will be held in Barcelona, Spain at the Hotel Fira Palace.

http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM.  We're looking forward to your
submissions and seeing you at the KVM Forum 2012 in November!

Thanks,
your KVM Forum 2012 Program Committee

Please contact us with any questions or comments.
kvm-forum-2012...@redhat.com

Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific

2012-08-27 Thread Takuya Yoshikawa
On Mon, 27 Aug 2012 16:06:01 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

  Any explanation why (old.base_gfn != new.base_gfn) case can be
  omitted?
 
 (old.base_gfn != new.base_gfn) check covers the cases
 
 1. old.base_gfn = 0, new.base_gfn = !0 (slot creation)
 
 and
 
 2. old.base_gfn = x, new.base_gfn = y (gpa base change)
    x != 0, y != 0, x != y.
 
 Patch 2 covers case 2, so it's only necessary to cover case
 1 here.
 
 Makes sense?

Yes.

But didn't you change the flush in the if block modified by patch 2
to kvm_arch_flush_shadow_memslot()?

Although the current implementation flushes everything, this may cause
a problem when we change it.

Takuya


Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Jan Kiszka
Hi Blue,

thanks for the review. I addressed most of your comments; the others are
commented on below.

On 2012-08-27 20:56, Blue Swirl wrote:
 +typedef struct AssignedDevice {
 +PCIDevice dev;
 +PCIHostDeviceAddress host;
 +uint32_t dev_id;
 +uint32_t features;
 +int intpin;
 +AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
 +PCIDevRegions real_device;
 +PCIINTxRoute intx_route;
 +AssignedIRQType assigned_irq_type;
 +struct {
 +#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
 +#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
 +uint32_t available;
 +#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
 +#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
 +#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
 +uint32_t state;
 +} cap;
 +uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
 +uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
 +int msi_virq_nr;
 +int *msi_virq;
 +MSIXTableEntry *msix_table;
 +target_phys_addr_t msix_table_addr;
 +uint16_t msix_max;
 +MemoryRegion mmio;
 +char *configfd_name;
 
 const? Not if this would mean more casts.

DEFINE_PROP_STRING, where this is used, doesn't allow this.

...
 +} else {
 +uint32_t port = addr + dev_region->u.r_baseport;
 +
 +if (data) {
 +DEBUG("out data=%lx, size=%d, e_phys=%lx, host=%x\n",
 +  *data, size, addr, port);
 +switch (size) {
 +case 1:
 +outb(*data, port);
 +break;
 +case 2:
 +outw(*data, port);
 +break;
 +case 4:
 +outl(*data, port);
 +break;
 
 Maybe add case 8: and default: with abort(), also below.

PIO is never 8 bytes long; the generic layer protects us.

...
 +
 +fclose(f);
 +
 +/* read and fill vendor ID */
 +v = get_real_vendor_id(dir, &id);
 +if (v) {
 +return 1;
 +}
 +pci_dev->dev.config[0] = id & 0xff;
 +pci_dev->dev.config[1] = (id & 0xff00) >> 8;
 +
 +/* read and fill device ID */
 +v = get_real_device_id(dir, &id);
 +if (v) {
 +return 1;
 +}
 +pci_dev->dev.config[2] = id & 0xff;
 +pci_dev->dev.config[3] = (id & 0xff00) >> 8;
 +
 +pci_word_test_and_clear_mask(pci_dev->emulate_config_write + PCI_COMMAND,
 + PCI_COMMAND_MASTER | PCI_COMMAND_INTX_DISABLE);
 +
 +dev->region_number = r;
 +return 0;
 +}
 
 Pretty long function, how about refactoring?

Possibly, but I'd prefer to do such changes in-tree, after the more
important refactoring on MSI[-X] is done.

...
 +if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
 +uint8_t *pos = pci_dev->config + pci_dev->msi_cap;
 +MSIMessage msg;
 +int virq;
 +
 +msg.address = pci_get_long(pos + PCI_MSI_ADDRESS_LO);
 +msg.data = pci_get_word(pos + PCI_MSI_DATA_32);
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq < 0) {
 +perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
 +return;
 +}
 +
 +assigned_dev->msi_virq = g_malloc(sizeof(*assigned_dev->msi_virq));
 
 Is this ever freed?

Yep, in free_msi_virqs. If you think you spotted a path where this is
not the case, let me know.

...
 +
 +static Property da_properties[] = {
 
 const?

Nope, properties must remain writable.

 
 +DEFINE_PROP_PCI_HOST_DEVADDR("host", AssignedDevice, host),
 +DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
 +ASSIGNED_DEVICE_PREFER_MSI_BIT, false),
 +DEFINE_PROP_BIT("share_intx", AssignedDevice, features,
 +ASSIGNED_DEVICE_SHARE_INTX_BIT, true),
 +DEFINE_PROP_INT32("bootindex", AssignedDevice, bootindex, -1),
 +DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +

Jan





Re: KVM: MMU: Tracking guest writes through EPT entries ?

2012-08-27 Thread Felix
Xiao Guangrong xiaoguangrong at linux.vnet.ibm.com writes:

 
 On 07/31/2012 01:18 AM, Sunil wrote:
  Hello List,
  
  I am a KVM newbie and studying KVM mmu code.
  
  On the existing guest, I am trying to track all guest writes by
  marking page table entry as read-only in EPT entry [ I am using Intel
  machine with vmx and ept support ]. Looks like EPT support re-uses
  shadow page table(SPT) code and hence some of SPT routines.
  
  I was thinking of below possible approach. Use pte_list_walk() to
  traverse through list of sptes and use mmu_spte_update()  to flip the
  PT_WRITABLE_MASK flag. But all SPTEs are not part of any single list;
  but on separate lists (based on gfn, page level, memory_slot). So,
  would recording all the faulted guest GFNs and then using the above method work?
  
 
 There are two ways to write-protect all sptes:
 - use kvm_mmu_slot_remove_write_access() on all memslots
 - walk the shadow page cache to get the shadow pages in the highest level
   (level = 4 on EPT), then write-protect its entries.
 
 If you just want to do it for the specified gfn, you can use
 rmap_write_protect().
 
 Just inquisitive, what is your purpose? :)
 
 
 
Hi, Guangrong, 

I have done something similar to what Sunil did, simply for study purposes.
However, I found some very weird situations. Basically, in the guest VM, I
allocate a chunk of memory (the size of a page) in a user-level program.
Through a guest kernel-level module and my self-defined hypercall, I pass the
gva of this memory to KVM. Then I try different methods in the hypercall
handler to write-protect this page of memory. Note that I want to
write-protect it through EPT instead of in the guest page tables.

1. I use kvm_mmu_gva_to_gpa_read to translate the gva into a gpa. Based on the
function kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I changed the code to
read sptep (the pointer to the spte) instead of the spte, so that I can modify
the spte corresponding to this gpa. What I observe is that even when I modify
spte[0] (I think this is the lowest-level page table entry in the EPT table;
the modification does succeed, as the changes are reflected when I call
kvm_mmu_get_spte_hierarchy again), my user-level program in the VM can still
write to this page.

In your post, you mentioned "the shadow pages in the highest level
(level = 4 on EPT)"; I don't understand this part. Does this mean I have to
modify spte[3] instead of spte[0]? I just tried modifying spte[1] and spte[3];
both cause a vmexit. So I am totally confused about the meaning of level used
in the shadow page table and its relation to the shadow page table. Can you
help me understand this?

2. As suggested in your post, I also used rmap_write_protect() to write-protect
this page. With kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I can see
that spte[0] gives me a result like xx005, which means that the function was
called successfully. But I can still write to this page.

I even tried kvm_age_hva() to remove this spte; spte[0] then reads 0, but I
can still write to this page. So I am even more confused about the level used
in the shadow page table.

Thanks, I really appreciate your reply.

Felix






Re: KVM: MMU: Tracking guest writes through EPT entries

2012-08-27 Thread Hugo
Hi,

I have done something similar to what was posted in
http://article.gmane.org/gmane.comp.emulators.kvm.devel/95342/match=tracking+guest+writes+ept

However, I found some very weird situations. Basically, in the guest
VM, I allocate a chunk of memory (the size of a page) in a user-level
program. Through a guest kernel-level module and my self-defined hypercall,
I pass the gva of this memory to KVM. Then I try different methods in the
hypercall handler to write-protect this page of memory. Note that I want to
write-protect it through EPT instead of in the guest page tables.

1. I use kvm_mmu_gva_to_gpa_read to translate the gva into a gpa. Based on the
function kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I changed the code to
read sptep (the pointer to the spte) instead of the spte, so that I can modify
the spte corresponding to this gpa. What I observe is that even when I modify
spte[0] (I think this is the lowest-level page table entry in the EPT table;
the modification does succeed, as the changes are reflected when I call
kvm_mmu_get_spte_hierarchy again), my user-level program in the VM can still
write to this page.

That post mentioned "the shadow pages in the highest level
(level = 4 on EPT)"; I don't understand this part. Does this mean I have to
modify spte[3] instead of spte[0]? I just tried modifying spte[1] and spte[3];
both cause a vmexit. So I am totally confused about the meaning of level used
in the shadow page table and its relation to the shadow page table. Can you
help me understand this?

2. As suggested in that post, I also used rmap_write_protect() to write-protect
this page. With kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I can see
that spte[0] gives me results like xx005, which means that the function was
called successfully and the write-permission bit is cleared in the spte. But
I can still write to this page.

I even tried kvm_age_hva() to remove this spte; spte[0] then reads 0, but I
can still write to this page. So I am even more confused about the level used
in the shadow page table.

Thanks, I really appreciate your reply.


Hugo
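On the question of levels: with 4-level EPT and 4 KiB pages, each level of the walk consumes 9 bits of the guest-physical address, from bits 47:39 at level 4 (the root, which "level = 4 on EPT" refers to) down to bits 20:12 at level 1 (the 4 KiB leaf entry). A minimal C sketch of the per-level index computation, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the paging structure at `level` (1 = 4 KiB leaf,
 * 4 = root) for a guest-physical address, assuming 4-level EPT
 * with 4 KiB pages: 9 index bits per level, starting at bit 12.
 */
static unsigned int ept_level_index(uint64_t gpa, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((gpa >> (12 + 9 * (level - 1))) & 0x1ff);
}
```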


[PATCH] Documentation for kvm_stat.

2012-08-27 Thread Bo Yang
Signed-off-by: Bo Yang boy...@suse.com
---
 Makefile  |9 -
 kvm_stat.texi |   55 +++
 2 files changed, 63 insertions(+), 1 deletions(-)
 create mode 100644 kvm_stat.texi

diff --git a/Makefile b/Makefile
index 1cd5bc8..ee524b0 100644
--- a/Makefile
+++ b/Makefile
@@ -40,7 +40,7 @@ LIBS+=-lz $(LIBS_TOOLS)
 HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
 
 ifdef BUILD_DOCS
-DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt
+DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 kvm_stat.1 QMP/qmp-commands.txt
 ifdef CONFIG_VIRTFS
 DOCS+=fsdev/virtfs-proxy-helper.1
 endif
@@ -283,6 +283,7 @@ ifdef CONFIG_POSIX
$(INSTALL_DATA) qemu.1 qemu-img.1 $(DESTDIR)$(mandir)/man1
$(INSTALL_DIR) $(DESTDIR)$(mandir)/man8
$(INSTALL_DATA) qemu-nbd.8 $(DESTDIR)$(mandir)/man8
+   $(INSTALL_DATA) kvm_stat.1 $(DESTDIR)$(mandir)/man1
 endif
 ifdef CONFIG_VIRTFS
$(INSTALL_DIR) $(DESTDIR)$(mandir)/man1
@@ -387,6 +388,12 @@ qemu-nbd.8: qemu-nbd.texi
   $(POD2MAN) --section=8 --center=" " --release=" " qemu-nbd.pod > $@, \
   "  GEN   $@")
 
+kvm_stat.1: kvm_stat.texi
+   $(call quiet-command, \
+ perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< kvm_stat.pod && \
+ $(POD2MAN) --section=1 --center=" " --release=" " kvm_stat.pod > $@, \
+   "  GEN   $@")
+
 dvi: qemu-doc.dvi qemu-tech.dvi
 html: qemu-doc.html qemu-tech.html
 info: qemu-doc.info qemu-tech.info
diff --git a/kvm_stat.texi b/kvm_stat.texi
new file mode 100644
index 000..ff7d414
--- /dev/null
+++ b/kvm_stat.texi
@@ -0,0 +1,55 @@
+@example
+@c man begin SYNOPSIS
+
+usage: kvm_stat [OPTIONS]
+
+@c man end
+@end example
+
+@c man begin DESCRIPTION
+
+This is a utility to watch kvm statistics.
+
+@c man end
+@c man begin OPTIONS
+
+@table @option
+
+@item -h, --help
+
+Show help message.
+
+@item -1, --once, --batch
+
+Run in batch mode for one second.
+
+@item -l, --log
+
+Run in logging mode (like vmstat).
+
+@item -f @var{FIELDS}, --fields=@var{FIELDS}
+
+Fields to display, given as a regular expression. Fields include:
+@samp{size}, @samp{config}, @samp{sample_freq}, @samp{sample_type}, @samp{read_format}, @samp{flags}, @samp{wakeup_events}, @samp{bp_type}, @samp{bp_addr}, @samp{bp_len}
+
+@end table
+@c man end
+
+@ignore
+
+@setfilename kvm_stat
+@settitle kvm statistics utility
+
+@c man begin SEE ALSO
+
+vmstat
+
+@c man end
+
+@c man begin AUTHOR
+
+Copyright (C) 2012 Bo Yang boy...@suse.com.
+This is free software; see the source for copying conditions.  There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+@c man end
+@end ignore
-- 
1.6.0.2
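The `-f/--fields` option documented in the patch filters counter names with a regular expression. kvm_stat itself is a Python script, so this is only a C illustration of those filtering semantics using POSIX `regcomp()`/`regexec()`; the helper name and the sample counter names are assumptions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Count how many field names match an extended regular expression.
 * Returns -1 if the pattern does not compile.
 */
static int count_matching_fields(const char *pattern,
                                 const char *const *fields, size_t n)
{
    regex_t re;
    int count = 0;

    assert(pattern != NULL && (fields != NULL || n == 0));
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, fields[i], 0, NULL, 0) == 0)
            count++;
    regfree(&re);
    return count;
}
```

For example, with sample names "exits", "io_exits", "mmio_exits" and "halt_wakeup", the pattern `exits$` selects the first three.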



Re: [PATCH RFC 1/2] KVM: PPC: Move kvm-arch.slot_phys into memslot.arch

2012-08-27 Thread Marcelo Tosatti
On Sat, Aug 25, 2012 at 10:40:40PM +1000, Paul Mackerras wrote:
 Now that we have an architecture-specific field in the kvm_memory_slot
 structure, we can use it to store the array of page physical addresses
 that we need for Book3S HV KVM on PPC970 processors.  This reduces the
 size of struct kvm_arch for Book3S HV, and also reduces the size of
 struct kvm_arch_memory_slot for other PPC KVM variants since the fields
 in it are now only compiled in for Book3S HV.
 
 This necessitates making the kvm_arch_create_memslot and
 kvm_arch_free_memslot operations specific to each PPC KVM variant.
 That in turn means that we now don't allocate the rmap arrays on
 Book3S PR and Book E.
 
 Since we now unpin pages and free the slot_phys array in
 kvmppc_core_free_memslot, we no longer need to do it in
 kvmppc_core_destroy_vm, since the generic code takes care to free
 all the memslots when destroying a VM.
 
 We now need the new memslot to be passed in to
 kvmppc_core_prepare_memory_region, since we need to initialize its
 arch.slot_phys member on Book3S HV.
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
 This is on top of Alex's kvm-ppc-next branch with the KVM tree's next
 branch merged in and then Marcelo's set of 3 patches on that.
 
  arch/powerpc/include/asm/kvm_host.h |9 +--
  arch/powerpc/include/asm/kvm_ppc.h  |5 ++
  arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +-
  arch/powerpc/kvm/book3s_hv.c|  104 ---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c |2 +-
  arch/powerpc/kvm/book3s_pr.c|   12 
  arch/powerpc/kvm/booke.c|   12 
  arch/powerpc/kvm/powerpc.c  |   13 +
  8 files changed, 102 insertions(+), 61 deletions(-)

Regarding generic memslot code, looks fine.
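A hedged C sketch of the pattern Paul describes: the per-architecture part of each memslot owns its `slot_phys`-style array, allocated by a variant-specific create hook and released by the matching free hook, so no separate cleanup is needed at VM teardown. All types and function names here are hypothetical simplifications, and the real code's page unpinning is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; not the kernel's kvm_memory_slot. */
struct arch_slot_data {
    unsigned long *slot_phys;   /* one physical address per guest page */
};

struct memslot_view {
    unsigned long npages;
    struct arch_slot_data arch;
};

/* Per-variant create hook: allocate the array only where it is needed. */
static int arch_create_memslot(struct memslot_view *slot, int needs_slot_phys)
{
    assert(slot != NULL);
    slot->arch.slot_phys = NULL;
    if (!needs_slot_phys)
        return 0;
    slot->arch.slot_phys = calloc(slot->npages, sizeof(*slot->arch.slot_phys));
    return slot->arch.slot_phys ? 0 : -1;
}

/* Matching free hook, run for every slot when the VM is destroyed. */
static void arch_free_memslot(struct memslot_view *slot)
{
    assert(slot != NULL);
    free(slot->arch.slot_phys);
    slot->arch.slot_phys = NULL;
}
```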

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html