Re: [PATCH for-1.2 0/2] migrate PV EOI MSR
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
> It turns out PV EOI gets disabled after migration - until next guest
> reset. This is because we are missing code to actually migrate it.
> This patch fixes it up: it does not do anything useful without kvm
> irqchip but applies cleanly to qemu.git as well as qemu-kvm.git, so I
> think it's cleaner to apply it in qemu.git to keep diff to minimum.

There is nothing except pci-assign left in qemu-kvm (which will be posted
for upstream in a minute), so you are intuitively doing the right thing.

Patch 2 looks good to me, see patch 1 for the clean procedure.

Jan
Re: [PATCH for-1.2 1/2] linux-headers: update asm/kvm_para.h to 3.6
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
> Update asm-x86/kvm_para.h to version present in Linux 3.6.

Nope, we have update-linux-headers.sh for this. Just run it against
3.6-rcX, grab the result, and mention the source (release version or
kvm.git hash).

Jan

> This is needed for the new PV EOI feature.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  linux-headers/asm-x86/kvm_para.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
> index f2ac46a..a1c3d72 100644
> --- a/linux-headers/asm-x86/kvm_para.h
> +++ b/linux-headers/asm-x86/kvm_para.h
> @@ -22,6 +22,7 @@
>  #define KVM_FEATURE_CLOCKSOURCE2 3
>  #define KVM_FEATURE_ASYNC_PF 4
>  #define KVM_FEATURE_STEAL_TIME 5
> +#define KVM_FEATURE_PV_EOI 6
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -37,6 +38,7 @@
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME 0x4b564d03
> +#define MSR_KVM_PV_EOI_EN 0x4b564d04
>
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
>  	__u32 enabled;
>  };
>
> +#define KVM_PV_EOI_BIT 0
> +#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
> +#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> +#define KVM_PV_EOI_DISABLED 0x0
> +
>  #endif /* _ASM_X86_KVM_PARA_H */
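To make the relationship between the four new PV EOI constants concrete, here is a minimal sketch, not code from QEMU or the kernel. The constants are copied from the patch (reading the garbled mask definition as a left shift by KVM_PV_EOI_BIT, which is an assumption); the helper name pv_eoi_flag_set and the idea of testing an arbitrary 32-bit flag word are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as they appear in the patched kvm_para.h. */
#define KVM_PV_EOI_BIT      0
#define KVM_PV_EOI_MASK     (0x1 << KVM_PV_EOI_BIT)
#define KVM_PV_EOI_ENABLED  KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED 0x0

/* Hypothetical helper: true if the PV EOI bit is set in a flag word.
 * Where that word lives is outside the scope of this sketch. */
static bool pv_eoi_flag_set(uint32_t word)
{
    return (word & KVM_PV_EOI_MASK) == KVM_PV_EOI_ENABLED;
}
```

Only bit 0 is examined, so other bits in the word do not affect the result.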
[PATCH 0/4] uq/master: Add classic PCI device assignment
I'm proud to present probably the last patch series to merge qemu-kvm
into upstream: This one adds PCI device assignment for x86 using the
classic interface that the KVM model provides. See the last patch for
reasons why we still want this while next-generation device assignment
via VFIO is approaching.

It's been a long journey, but once this is merged, I think we can close
the qemu-kvm chapter. I already did so, all work is based on QEMU now.

Jan Kiszka (4):
  kvm: Introduce kvm_irqchip_update_msi_route
  kvm: Introduce kvm_has_intx_set_mask
  kvm: i386: Add services required for PCI device assignment
  kvm: i386: Add classic PCI device assignment

 hw/kvm/Makefile.objs   |    2 +-
 hw/kvm/pci-assign.c    | 1929
 kvm-all.c              |   50 ++
 kvm.h                  |    2 +
 target-i386/kvm.c      |  141
 target-i386/kvm_i386.h |   22 +
 6 files changed, 2145 insertions(+), 1 deletions(-)
 create mode 100644 hw/kvm/pci-assign.c

-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] kvm: Introduce kvm_has_intx_set_mask
From: Jan Kiszka <jan.kis...@siemens.com>

Will be used by PCI device assignment code.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |    8 ++++++++
 kvm.h     |    1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index fd9d9b4..84d4f7f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -88,6 +88,7 @@ struct KVMState
     int pit_state2;
     int xsave, xcrs;
     int many_ioeventfds;
+    int intx_set_mask;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not. Linux, glibc and *BSD all treat ioctl numbers as
      * unsigned, and treating them as signed here can break things */
@@ -1387,6 +1388,8 @@ int kvm_init(void)
         s->irq_set_ioctl = KVM_IRQ_LINE_STATUS;
     }
 
+    s->intx_set_mask = kvm_check_extension(s, KVM_CAP_PCI_2_3);
+
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
@@ -1739,6 +1742,11 @@ int kvm_has_gsi_routing(void)
 #endif
 }
 
+int kvm_has_intx_set_mask(void)
+{
+    return kvm_state->intx_set_mask;
+}
+
 void *kvm_vmalloc(ram_addr_t size)
 {
 #ifdef TARGET_S390X
diff --git a/kvm.h b/kvm.h
index 5cefe3a..dea2998 100644
--- a/kvm.h
+++ b/kvm.h
@@ -117,6 +117,7 @@ int kvm_has_xcrs(void);
 int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
+int kvm_has_intx_set_mask(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUArchState *env);
-- 
1.7.3.4
[PATCH 1/4] kvm: Introduce kvm_irqchip_update_msi_route
From: Jan Kiszka <jan.kis...@siemens.com>

This service allows to update an MSI route without releasing/reacquiring
the associated VIRQ. Will be used by PCI device assignment, later on
likely also by virtio/vhost and VFIO.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 kvm.h     |    1 +
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d4d8a1f..fd9d9b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -963,6 +963,30 @@ static void kvm_add_routing_entry(KVMState *s,
     kvm_irqchip_commit_routes(s);
 }
 
+static int kvm_update_routing_entry(KVMState *s,
+                                    struct kvm_irq_routing_entry *new_entry)
+{
+    struct kvm_irq_routing_entry *entry;
+    int n;
+
+    for (n = 0; n < s->irq_routes->nr; n++) {
+        entry = &s->irq_routes->entries[n];
+        if (entry->gsi != new_entry->gsi) {
+            continue;
+        }
+
+        entry->type = new_entry->type;
+        entry->flags = new_entry->flags;
+        entry->u = new_entry->u;
+
+        kvm_irqchip_commit_routes(s);
+
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
 void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
 {
     struct kvm_irq_routing_entry e;
@@ -1125,6 +1149,24 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     return virq;
 }
 
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
+{
+    struct kvm_irq_routing_entry kroute;
+
+    if (!kvm_irqchip_in_kernel()) {
+        return -ENOSYS;
+    }
+
+    kroute.gsi = virq;
+    kroute.type = KVM_IRQ_ROUTING_MSI;
+    kroute.flags = 0;
+    kroute.u.msi.address_lo = (uint32_t)msg.address;
+    kroute.u.msi.address_hi = msg.address >> 32;
+    kroute.u.msi.data = msg.data;
+
+    return kvm_update_routing_entry(s, &kroute);
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
     struct kvm_irqfd irqfd = {
diff --git a/kvm.h b/kvm.h
index 37d1f81..5cefe3a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -270,6 +270,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign,
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
-- 
1.7.3.4
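The idea of the patch, overwriting a routing entry in place so that its slot and VIRQ stay allocated, can be tried out in isolation. The following is a minimal sketch under stated assumptions: struct route_entry, struct route_table, update_route and make_msi_route are hypothetical stand-ins, not the KVM or QEMU structures, and the table has no commit step.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an MSI routing entry. */
struct route_entry {
    uint32_t gsi;
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Hypothetical fixed-size routing table. */
struct route_table {
    int nr;
    struct route_entry entries[8];
};

/* Overwrite the entry whose GSI matches new_entry->gsi.
 * The table size is unchanged; -ESRCH signals that no entry matched. */
static int update_route(struct route_table *t,
                        const struct route_entry *new_entry)
{
    for (int n = 0; n < t->nr; n++) {
        if (t->entries[n].gsi != new_entry->gsi) {
            continue;
        }
        t->entries[n] = *new_entry;
        return 0;
    }
    return -ESRCH;
}

/* Split a 64-bit MSI address into low and high halves,
 * mirroring how the patch fills address_lo and address_hi. */
static struct route_entry make_msi_route(uint32_t virq, uint64_t address,
                                         uint32_t data)
{
    struct route_entry e = {
        .gsi = virq,
        .address_lo = (uint32_t)address,
        .address_hi = (uint32_t)(address >> 32),
        .data = data,
    };
    return e;
}
```

A caller that already holds a VIRQ would build the new entry with make_msi_route() and pass it to update_route(); a lookup failure leaves the table untouched.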
[PATCH 3/4] kvm: i386: Add services required for PCI device assignment
From: Jan Kiszka <jan.kis...@siemens.com>

These helpers abstract the interaction of upcoming pci-assign with the
KVM kernel services. Put them under i386 only as other archs will
implement device pass-through via VFIO and not this classic interface.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 target-i386/kvm.c      |  141
 target-i386/kvm_i386.h |   22
 2 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 696b14a..5e2d4f5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -31,6 +31,7 @@
 #include "hw/apic.h"
 #include "ioport.h"
 #include "hyperv.h"
+#include "hw/pci.h"
 
 //#define DEBUG_KVM
 
@@ -2055,3 +2056,143 @@ void kvm_arch_init_irq_routing(KVMState *s)
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_routing_allowed = true;
 }
+
+/* Classic KVM device assignment interface. Will remain x86 only. */
+int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress *dev_addr,
+                          uint32_t flags, uint32_t *dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .segnr = dev_addr->domain,
+        .busnr = dev_addr->bus,
+        .devfn = PCI_DEVFN(dev_addr->slot, dev_addr->function),
+        .flags = flags,
+    };
+    int ret;
+
+    dev_data.assigned_dev_id =
+        (dev_addr->domain << 16) | (dev_addr->bus << 8) | dev_data.devfn;
+
+    ret = kvm_vm_ioctl(s, KVM_ASSIGN_PCI_DEVICE, &dev_data);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *dev_id = dev_data.assigned_dev_id;
+
+    return 0;
+}
+
+int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_PCI_DEVICE, &dev_data);
+}
+
+static int kvm_assign_irq_internal(KVMState *s, uint32_t dev_id,
+                                   uint32_t irq_type, uint32_t guest_irq)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .guest_irq = guest_irq,
+        .flags = irq_type,
+    };
+
+    if (kvm_check_extension(s, KVM_CAP_ASSIGN_DEV_IRQ)) {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
+    } else {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, &assigned_irq);
+    }
+}
+
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id, bool use_host_msi,
+                           uint32_t guest_irq)
+{
+    uint32_t irq_type = KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX);
+
+    return kvm_assign_irq_internal(s, dev_id, irq_type, guest_irq);
+}
+
+int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+        .flags = masked ? KVM_DEV_ASSIGN_MASK_INTX : 0,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_INTX_MASK, &dev_data);
+}
+
+static int kvm_deassign_irq_internal(KVMState *s, uint32_t dev_id,
+                                     uint32_t type)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .flags = type,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, &assigned_irq);
+}
+
+int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, bool use_host_msi)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX));
+}
+
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int virq)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSI |
+                                              KVM_DEV_IRQ_GUEST_MSI, virq);
+}
+
+int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSI |
+        KVM_DEV_IRQ_HOST_MSI);
+}
+
+bool kvm_device_msix_supported(KVMState *s)
+{
+    /* The kernel lacks a corresponding KVM_CAP, so we probe by calling
+     * KVM_ASSIGN_SET_MSIX_NR with an invalid parameter. */
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
+}
+
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors)
+{
+    struct kvm_assigned_msix_nr msix_nr = {
+        .assigned_dev_id = dev_id,
+        .entry_nr = nr_vectors,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, &msix_nr);
+}
+
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               int virq)
+{
+    struct kvm_assigned_msix_entry msix_entry = {
+        .assigned_dev_id = dev_id,
+        .gsi = virq,
+        .entry = vector,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, &msix_entry);
+}
+
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSIX |
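The device ID that kvm_device_pci_assign hands back is just the PCI address packed into one integer. The sketch below reproduces that packing on its own, assuming the stripped operators in the patch are left shifts by 16 and 8 and that PCI_DEVFN uses the usual 5-bit slot and 3-bit function encoding. The function name make_assigned_dev_id is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Usual devfn encoding: 5-bit slot in bits 7..3, 3-bit function in bits 2..0. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical helper: pack domain, bus and devfn the way the patch
 * appears to compute dev_data.assigned_dev_id. */
static uint32_t make_assigned_dev_id(uint16_t domain, uint8_t bus,
                                     uint8_t slot, uint8_t function)
{
    uint32_t devfn = PCI_DEVFN(slot, function);

    return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) | devfn;
}
```

For instance, the host address 0000:03:00.1 would map to 0x0301 under this encoding, so the ID can be decoded again by shifting and masking.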
Export offsets of VMCS fields as note information for kdump
Hello Avi,

About this VMCSINFO patch: we really need this functionality in our
development. YOSHIDA Masanori (masanori.yoshida...@hitachi.com), the
developer from Hitachi, has said they need it too.

So could you please tell us why the patch is unacceptable? Do you dislike
the whole idea of exporting VMCSINFO, or just the way we implement the
patch? And do you have any suggestions about all this?

Below is why we need this patch and how we will use it in our
development.

We once ran into an abnormal situation: a host scheduler bug caused a
guest machine's vcpu to stop for a long time, which then led to a
heartbeat stop (while the host was still running). We want an efficient
way to analyse such bugs when we hit similar situations, where a guest
machine doesn't work well because of something on the host. Actually,
these situations have happened many times, in particular during
development.

So here comes the requirement: to find the root cause, we should debug
both the host and the guest side. But first we need crash dumps of both
the host and the guest, and they must be taken at the same time, while
the abnormal situation persists. The only way to do this is to panic
the host with the abnormal guest running on it, so that the guest's
image is contained in the host's crash dump.

Logically, retrieving the guest's crash dump from the host's crash dump
is a very important step towards our goal. Unfortunately, in the kvm
implementation, some of the guest's register values are hidden in the
VMCS, and the VMCS internals are hidden by Intel. If we cannot retrieve
these registers from the VMCS, the guest crash dump we produce is
incomplete, and some key information is lost when we analyse it.

So we made this patch to export the VMCS internals. With the patch
applied, we can write the register values stored in the VMCS into the
guest's crash dump. And that's what we want.

If a bug is found in a customer's environment, we have two ways to
avoid affecting other guest machines running on the same host. First,
we could do the bug analysis in another environment and reproduce the
buggy situation there; second, we could migrate the other guest
machines to other hosts.

After the abnormal situation is reproduced, we panic the host
*manually*. Then we can use userland tools to get the guest machine's
crash dump from the host machine's, with the feature provided by this
patch. Finally we can analyse them separately to find which side causes
the problem.
Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific
On 08/25/2012 02:54 AM, Marcelo Tosatti wrote:
> Other arches do not need this.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/x86.c
> +++ kvm/arch/x86/kvm/x86.c
> @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
>  		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
>  	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
>  	spin_unlock(&kvm->mmu_lock);
> +	/*
> +	 * If the new memory slot is created, we need to clear all
> +	 * mmio sptes.
> +	 */
> +	if (old.npages == 0 && npages) {
> +		kvm_mmu_zap_all(kvm);
> +		kvm_reload_remote_mmus(kvm);
> +	}

Can't we use kvm_arch_flush_shadow_all() here?

The rest looks fine to me.
Re: registering ioeventfd in qemu/kvm
On 23/08/2012 05:35, Shesha Sreenivasamurthy wrote:
> Hi,
>
> I am trying to generate an eventfd upon an IO write from the guest,
> say at offset IO_NOTIFY_REG (0x10). When the guest writes to this
> register, I get control in QEMU's write function associated in
> mypci_iomem_ops. However, instead of this I would like to register an
> eventfd. To achieve that, first I tried:
>
> memory_region_add_eventfd(mypci->bar_iomem, IO_NOTIFY_REG, 4, true, 1, fd);

This is the right way. You can look (in the git tree of QEMU) at
hw/ivshmem.c, which is the simplest user of the eventfd API. Note that
recently the API was changed to accept an EventNotifier rather than the
raw eventfd.

Paolo
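For readers new to eventfd, the consumer side of such a notification can be tried without QEMU. The following is a minimal, Linux-only sketch using the plain eventfd(2) interface from glibc; it does not call memory_region_add_eventfd or any other QEMU API, and the function name signal_and_drain is hypothetical. Each write stands in for one guest access that the kernel would signal.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal a fresh eventfd n times, then read it once.
 * Returns the counter value seen by the reader, or -1 on error
 * (including the case where nothing was signalled). */
static int64_t signal_and_drain(int n)
{
    /* Non-blocking, so a read on an unsignalled eventfd fails
     * instead of hanging. */
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0) {
        return -1;
    }

    for (int i = 0; i < n; i++) {
        if (eventfd_write(fd, 1) != 0) {
            close(fd);
            return -1;
        }
    }

    eventfd_t value = 0;
    int rc = eventfd_read(fd, &value);
    close(fd);

    return rc == 0 ? (int64_t)value : -1;
}
```

The point to take away is that an eventfd accumulates signals in a counter: several notifications that arrive before the consumer reads are delivered as one read returning their sum.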
Re: Question: Timekeeping between Host and Guest with NTP
Hello,

(2012/08/25 0:00), Marcelo Tosatti wrote:
<snip>
> kvmclock driver has access to the ntpd corrected frequency of the host,
> but:
>
> 1) kvmclock time as reported to the guest uses the TSC as an offset in
> addition to the host monotonic clock, TSC is susceptible to frequency
> variations. The guest has its own timekeeping (it accumulates time from
> kvmclock, at every timer interrupt). The algorithm is not perfect,
> and it's susceptible to small variations. These add up over time.
>
> 2) Corrections to UTC, such as leap seconds, are not reflected to the
> host monotonic clock.
>
> NTP algorithm in the guest is responsible for synchronization to UTC.

I see, I understood the pitfalls of the guest only syncing to kvmclock,
and now NTP on the guest seems simple and reasonable for me.

Thank you again for your detailed explanation.

Sincerely,
---
Aritoki TAKADA <aritoki.takada...@hitachi.com>
Hitachi, Ltd., Yokohama Research Laboratory
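To get a feel for how "small variations add up over time", a constant frequency error can be turned into accumulated drift with one multiplication. This is only a back-of-the-envelope sketch: the function name accumulated_drift_s is hypothetical, the 50 ppm figure used in the note below is an assumed example rather than a number from the thread, and real drift is not constant.

```c
#include <assert.h>

/* Drift in seconds for a constant frequency error given in
 * parts per million, accumulated over elapsed_s seconds. */
static double accumulated_drift_s(double ppm, double elapsed_s)
{
    return ppm * 1e-6 * elapsed_s;
}
```

With an assumed error of 50 ppm, one day (86400 s) of uncorrected running would accumulate roughly 4.3 seconds, which is the kind of offset NTP in the guest is there to remove.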
[PATCH v7 0/3] KVM: perf: kvm events analysis tool
From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>

Changelog:
- rebased it on Arnaldo's newest git tree perf/core branch

the change from Arnaldo's comments:
- directly get event from evsel->tp_format
- remove die() and return the proper error code
- rename thread->private to thread->priv

the change from David's comments:
- use is_valid_tracepoint instead of kvm_events_exist

This patchset introduces a perf-based tool (perf kvm stat record/report)
which can analyze kvm events more smartly. Below is the presentation slice
on 2012 Japan LinuxCon:
http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
You can get more details from it. If any questions/comments, please feel
free to let us know.

This patchset is based on Arnaldo's git tree perf/core branch, and patch 2
is just doing the improvement work, which can be picked up independently.

Usage:
- kvm stat
  run a command and gather performance counter statistics, it is the alias
  of perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are interesting as well,
  we can append the events like this:
  perf kvm stat record -e kvm:*

  If many guests are running, we can track the specified guest by using -p
  or --pid

- show the result:
  perf kvm stat report

The output example is following:
# pgrep qemu-kvm
26071
32253
32564

total 3 guests are running on the host

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT    Samples  Samples%     Time%         Avg time

         APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
               CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
                 HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
       EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
   PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
       EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
      IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
           CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access    Samples  Samples%     Time%         Avg time

        0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
        0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
        0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
        0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.

See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%         Avg time

         0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

And, --vcpu is used to track the specified vcpu and --key is used to sort
the result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%         Avg time

                 HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.

Dong Hao (3):
  KVM: x86: export svm/vmx exit code and vector code to userspace
  KVM: x86: trace mmio begin and complete
  KVM: perf: kvm events analysis tool

 arch/x86/include/asm/kvm_host.h       |   36 +-
 arch/x86/include/asm/svm.h            |  205 +---
 arch/x86/include/asm/vmx.h            |  126 +++--
 arch/x86/kvm/trace.h                  |   89
 arch/x86/kvm/x86.c                    |   32 +-
 include/trace/events/kvm.h            |   37 ++
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c
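The percentage columns in the report output above can be checked by hand from the sample counts and average times. The sketch below shows one plausible way the columns relate; it is my reading of the printed numbers, not taken from the tool's source, and the function names samples_percent and time_percent are hypothetical. Because the printed average times are rounded, a Time% recomputed this way only approximates the reported value.

```c
#include <assert.h>
#include <stdint.h>

/* Share of one event type in the total number of samples, in percent. */
static double samples_percent(uint64_t samples, uint64_t total_samples)
{
    return total_samples ? 100.0 * (double)samples / (double)total_samples
                         : 0.0;
}

/* Share of one event type in the total handled time, in percent,
 * reconstructed from its sample count and average handling time. */
static double time_percent(uint64_t samples, double avg_time_us,
                           double total_time_us)
{
    return total_time_us > 0.0
               ? 100.0 * ((double)samples * avg_time_us) / total_time_us
               : 0.0;
}
```

Plugging in the APIC_ACCESS row (65381 samples, 37.72us average, 98202 total samples, 41419293.63us total time) gives about 66.58% and 5.95%, matching the report. The HLT row shows why both columns matter: few samples, but most of the time.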
[PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com Exporting KVM exit information to userspace to be consumed by perf. [ Dong Hao haod...@linux.vnet.ibm.com: rebase it on acme's git tree ] Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com Signed-off-by: Dong Hao haod...@linux.vnet.ibm.com --- arch/x86/include/asm/kvm_host.h | 36 --- arch/x86/include/asm/svm.h | 205 +-- arch/x86/include/asm/vmx.h | 126 arch/x86/kvm/trace.h| 89 - 4 files changed, 234 insertions(+), 222 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 09155d6..ad2d229 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -11,6 +11,24 @@ #ifndef _ASM_X86_KVM_HOST_H #define _ASM_X86_KVM_HOST_H +#define DE_VECTOR 0 +#define DB_VECTOR 1 +#define BP_VECTOR 3 +#define OF_VECTOR 4 +#define BR_VECTOR 5 +#define UD_VECTOR 6 +#define NM_VECTOR 7 +#define DF_VECTOR 8 +#define TS_VECTOR 10 +#define NP_VECTOR 11 +#define SS_VECTOR 12 +#define GP_VECTOR 13 +#define PF_VECTOR 14 +#define MF_VECTOR 16 +#define MC_VECTOR 18 + +#ifdef __KERNEL__ + #include linux/types.h #include linux/mm.h #include linux/mmu_notifier.h @@ -75,22 +93,6 @@ #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1)) #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE) -#define DE_VECTOR 0 -#define DB_VECTOR 1 -#define BP_VECTOR 3 -#define OF_VECTOR 4 -#define BR_VECTOR 5 -#define UD_VECTOR 6 -#define NM_VECTOR 7 -#define DF_VECTOR 8 -#define TS_VECTOR 10 -#define NP_VECTOR 11 -#define SS_VECTOR 12 -#define GP_VECTOR 13 -#define PF_VECTOR 14 -#define MF_VECTOR 16 -#define MC_VECTOR 18 - #define SELECTOR_TI_MASK (1 2) #define SELECTOR_RPL_MASK 0x03 @@ -994,4 +996,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data); void kvm_handle_pmu_event(struct kvm_vcpu *vcpu); void kvm_deliver_pmi(struct kvm_vcpu *vcpu); +#endif + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h 
index f2b83bc..cdf5674 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -1,6 +1,135 @@ #ifndef __SVM_H #define __SVM_H +#define SVM_EXIT_READ_CR0 0x000 +#define SVM_EXIT_READ_CR3 0x003 +#define SVM_EXIT_READ_CR4 0x004 +#define SVM_EXIT_READ_CR8 0x008 +#define SVM_EXIT_WRITE_CR0 0x010 +#define SVM_EXIT_WRITE_CR3 0x013 +#define SVM_EXIT_WRITE_CR4 0x014 +#define SVM_EXIT_WRITE_CR8 0x018 +#define SVM_EXIT_READ_DR0 0x020 +#define SVM_EXIT_READ_DR1 0x021 +#define SVM_EXIT_READ_DR2 0x022 +#define SVM_EXIT_READ_DR3 0x023 +#define SVM_EXIT_READ_DR4 0x024 +#define SVM_EXIT_READ_DR5 0x025 +#define SVM_EXIT_READ_DR6 0x026 +#define SVM_EXIT_READ_DR7 0x027 +#define SVM_EXIT_WRITE_DR0 0x030 +#define SVM_EXIT_WRITE_DR1 0x031 +#define SVM_EXIT_WRITE_DR2 0x032 +#define SVM_EXIT_WRITE_DR3 0x033 +#define SVM_EXIT_WRITE_DR4 0x034 +#define SVM_EXIT_WRITE_DR5 0x035 +#define SVM_EXIT_WRITE_DR6 0x036 +#define SVM_EXIT_WRITE_DR7 0x037 +#define SVM_EXIT_EXCP_BASE 0x040 +#define SVM_EXIT_INTR 0x060 +#define SVM_EXIT_NMI 0x061 +#define SVM_EXIT_SMI 0x062 +#define SVM_EXIT_INIT 0x063 +#define SVM_EXIT_VINTR 0x064 +#define SVM_EXIT_CR0_SEL_WRITE 0x065 +#define SVM_EXIT_IDTR_READ 0x066 +#define SVM_EXIT_GDTR_READ 0x067 +#define SVM_EXIT_LDTR_READ 0x068 +#define SVM_EXIT_TR_READ 0x069 +#define SVM_EXIT_IDTR_WRITE0x06a +#define SVM_EXIT_GDTR_WRITE0x06b +#define SVM_EXIT_LDTR_WRITE0x06c +#define SVM_EXIT_TR_WRITE 0x06d +#define SVM_EXIT_RDTSC 0x06e +#define SVM_EXIT_RDPMC 0x06f +#define SVM_EXIT_PUSHF 0x070 +#define SVM_EXIT_POPF 0x071 +#define SVM_EXIT_CPUID 0x072 +#define SVM_EXIT_RSM 0x073 +#define SVM_EXIT_IRET 0x074 +#define SVM_EXIT_SWINT 0x075 +#define SVM_EXIT_INVD 0x076 +#define SVM_EXIT_PAUSE 0x077 +#define SVM_EXIT_HLT 0x078 +#define SVM_EXIT_INVLPG0x079 +#define SVM_EXIT_INVLPGA 0x07a +#define SVM_EXIT_IOIO 0x07b +#define SVM_EXIT_MSR 0x07c +#define SVM_EXIT_TASK_SWITCH 0x07d +#define SVM_EXIT_FERR_FREEZE 0x07e +#define SVM_EXIT_SHUTDOWN 0x07f +#define 
SVM_EXIT_VMRUN 0x080 +#define SVM_EXIT_VMMCALL 0x081 +#define SVM_EXIT_VMLOAD0x082 +#define SVM_EXIT_VMSAVE0x083 +#define SVM_EXIT_STGI 0x084 +#define SVM_EXIT_CLGI 0x085 +#define SVM_EXIT_SKINIT0x086 +#define SVM_EXIT_RDTSCP0x087 +#define SVM_EXIT_ICEBP 0x088 +#define SVM_EXIT_WBINVD0x089 +#define SVM_EXIT_MONITOR 0x08a +#define SVM_EXIT_MWAIT 0x08b +#define SVM_EXIT_MWAIT_COND0x08c +#define SVM_EXIT_XSETBV0x08d +#define
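Once the exit codes are visible to userspace, a consumer such as a perf-based tool can translate raw codes into readable names. The following is a minimal sketch of such a lookup, using a handful of the SVM values listed in this patch. The table layout and the names struct exit_name, svm_exit_names and svm_exit_name are hypothetical and are not the mechanism the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical code-to-name pair. */
struct exit_name {
    unsigned int code;
    const char *name;
};

/* A few SVM exit codes, with values as defined in the patched svm.h. */
static const struct exit_name svm_exit_names[] = {
    { 0x060, "INTR" },
    { 0x061, "NMI" },
    { 0x072, "CPUID" },
    { 0x078, "HLT" },
    { 0x07b, "IOIO" },
    { 0x07c, "MSR" },
    { 0x081, "VMMCALL" },
};

/* Linear lookup; returns "UNKNOWN" for codes missing from the table. */
static const char *svm_exit_name(unsigned int code)
{
    size_t n = sizeof(svm_exit_names) / sizeof(svm_exit_names[0]);

    for (size_t i = 0; i < n; i++) {
        if (svm_exit_names[i].code == code) {
            return svm_exit_names[i].name;
        }
    }
    return "UNKNOWN";
}
```

A linear scan is enough for a table of this size; a real tool would cover the full list of codes.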
[PATCH v7 2/3] KVM: x86: trace mmio begin and complete
From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to calculate mmio read emulated time for the old kernel, in order to trace mmio read event more exactly, we add kvm_mmio_begin to trace the time when mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is completed [ Dong Hao haod...@linux.vnet.ibm.com: rebase it on current kvm tree ] Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com Signed-off-by: Dong Hao haod...@linux.vnet.ibm.com --- arch/x86/kvm/x86.c | 32 include/trace/events/kvm.h | 37 + 2 files changed, 57 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 42bce48..b90394d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3828,9 +3828,12 @@ mmio: /* * Is this MMIO handled locally? */ + trace_kvm_mmio_begin(vcpu-vcpu_id, write, gpa); handled = ops-read_write_mmio(vcpu, gpa, bytes, val); - if (handled == bytes) + if (handled == bytes) { + trace_kvm_io_done(vcpu-vcpu_id); return X86EMUL_CONTINUE; + } gpa += handled; bytes -= handled; @@ -4025,6 +4028,7 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size, vcpu-arch.pio.size = size; if (!kernel_pio(vcpu, vcpu-arch.pio_data)) { + trace_kvm_io_done(vcpu-vcpu_id); vcpu-arch.pio.count = 0; return 1; } @@ -4625,9 +4629,7 @@ restart: inject_emulated_exception(vcpu); r = EMULATE_DONE; } else if (vcpu-arch.pio.count) { - if (!vcpu-arch.pio.in) - vcpu-arch.pio.count = 0; - else + if (vcpu-arch.pio.in) writeback = false; r = EMULATE_DO_MMIO; } else if (vcpu-mmio_needed) { @@ -4658,8 +4660,6 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); int ret = emulator_pio_out_emulated(vcpu-arch.emulate_ctxt, size, port, val, 1); - /* do not return to emulator after return from userspace */ - vcpu-arch.pio.count = 0; return ret; } EXPORT_SYMBOL_GPL(kvm_fast_pio_out); @@ -5509,11 
+5509,16 @@ static int complete_mmio(struct kvm_vcpu *vcpu) { struct kvm_run *run = vcpu-run; struct kvm_mmio_fragment *frag; - int r; + int r = 1; if (!(vcpu-arch.pio.count || vcpu-mmio_needed)) return 1; + if (vcpu-arch.pio.count !vcpu-arch.pio.in) { + vcpu-arch.pio.count = 0; + goto exit; + } + if (vcpu-mmio_needed) { /* Complete previous fragment */ frag = vcpu-mmio_fragments[vcpu-mmio_cur_fragment++]; @@ -5521,8 +5526,10 @@ static int complete_mmio(struct kvm_vcpu *vcpu) memcpy(frag-data, run-mmio.data, frag-len); if (vcpu-mmio_cur_fragment == vcpu-mmio_nr_fragments) { vcpu-mmio_needed = 0; + if (vcpu-mmio_is_write) - return 1; + goto exit; + vcpu-mmio_read_completed = 1; goto done; } @@ -5539,11 +5546,12 @@ static int complete_mmio(struct kvm_vcpu *vcpu) } done: vcpu-srcu_idx = srcu_read_lock(vcpu-kvm-srcu); - r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE); + r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE) == EMULATE_DONE; srcu_read_unlock(vcpu-kvm-srcu, vcpu-srcu_idx); - if (r != EMULATE_DONE) - return 0; - return 1; + +exit: + trace_kvm_io_done(vcpu-vcpu_id); + return r; } int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 7ef9e75..d4182fa 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -177,6 +177,43 @@ TRACE_EVENT(kvm_mmio, __entry-len, __entry-gpa, __entry-val) ); +TRACE_EVENT(kvm_mmio_begin, + TP_PROTO(unsigned int vcpu_id, bool rw, u64 gpa), + TP_ARGS(vcpu_id, rw, gpa), + + TP_STRUCT__entry( + __field(unsigned int, vcpu_id) + __field(int, type) + __field(u64, gpa) + ), + + TP_fast_assign( + __entry-vcpu_id = vcpu_id; + __entry-type = rw ? KVM_TRACE_MMIO_WRITE : + KVM_TRACE_MMIO_READ; + __entry-gpa = gpa; + ), + + TP_printk(vcpu %u mmio %s gpa 0x%llx, __entry-vcpu_id, +
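The point of adding a begin event and a done event is that a consumer can pair them per vcpu and measure the time in between. The sketch below illustrates that pairing with hypothetical names (struct io_latency, on_io_begin, on_io_done, MAX_VCPUS) and timestamps supplied by the caller; it is not the perf implementation. It ignores a done event that has no matching begin, which is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS 8 /* hypothetical limit for this sketch */

struct io_latency {
    uint64_t begin_ns[MAX_VCPUS]; /* timestamp of the pending begin event */
    bool pending[MAX_VCPUS];      /* true between begin and done */
    uint64_t total_ns;            /* sum of all completed intervals */
    uint64_t count;               /* number of completed intervals */
};

/* Record the start of an access on the given vcpu. */
static void on_io_begin(struct io_latency *l, unsigned int vcpu_id,
                        uint64_t ts_ns)
{
    if (vcpu_id >= MAX_VCPUS) {
        return;
    }
    l->begin_ns[vcpu_id] = ts_ns;
    l->pending[vcpu_id] = true;
}

/* Close the interval opened by the last begin event on this vcpu. */
static void on_io_done(struct io_latency *l, unsigned int vcpu_id,
                       uint64_t ts_ns)
{
    if (vcpu_id >= MAX_VCPUS || !l->pending[vcpu_id]) {
        return;
    }
    l->total_ns += ts_ns - l->begin_ns[vcpu_id];
    l->count++;
    l->pending[vcpu_id] = false;
}
```

Keeping the state per vcpu matters because events from different vcpus interleave in the trace; dividing total_ns by count then yields an average handling time.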
[PATCH v7 3/3] KVM: perf: kvm events analysis tool
From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly Usage: - kvm stat run a command and gather performance counter statistics, it is the alias of perf stat - trace kvm events: perf kvm stat record, or, if other tracepoints are interesting as well, we can append the events like this: perf kvm stat record -e timer:* If many guests are running, we can track the specified guest by using -p or --pid - show the result: perf kvm stat report The output example is following: # pgrep qemu-kvm 26071 32253 32564 total 3 guests are running on the host Then, track the guest whose pid is 26071: # ./perf kvm stat record -p 26071 ^C[ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ] See the vmexit events: # ./perf kvm stat report --event=vmexit Analyze events for all VCPUs: VM-EXITSamples Samples% Time% Avg time APIC_ACCESS 6538166.58% 5.95% 37.72us ( +- 6.54% ) EXTERNAL_INTERRUPT 1603116.32% 3.06% 79.11us ( +- 7.34% ) CPUID 5360 5.46% 0.06% 4.50us ( +- 35.07% ) HLT 4496 4.58%90.75% 8360.34us ( +- 5.22% ) EPT_VIOLATION 2667 2.72% 0.04% 5.49us ( +- 5.05% ) PENDING_INTERRUPT 2242 2.28% 0.03% 5.25us ( +- 2.96% ) EXCEPTION_NMI 1332 1.36% 0.02% 6.53us ( +- 6.51% ) IO_INSTRUCTION383 0.39% 0.09% 93.39us ( +- 40.92% ) CR_ACCESS310 0.32% 0.00% 6.10us ( +- 3.95% ) Total Samples:98202, Total events handled time:41419293.63us. See the mmio events: # ./perf kvm stat report --event=mmio Analyze events for all VCPUs: MMIO AccessSamples Samples% Time% Avg time 0xfee00380:W 5868690.21%15.67% 4.95us ( +- 2.96% ) 0xfee00300:R 2124 3.26% 1.48% 12.93us ( +- 14.75% ) 0xfee00310:W 2124 3.26% 0.34% 3.00us ( +- 1.33% ) 0xfee00300:W 2123 3.26%82.50%720.68us ( +- 10.24% ) Total Samples:65057, Total events handled time:1854470.45us. 
See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%         Avg time

         0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

And, --vcpu is used to track the specified vcpu and --key is used to sort
the result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%         Avg time

                 HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.

[ Dong Hao <haod...@linux.vnet.ibm.com>:
  - rebase it on current acme's tree
  - fix the compile error on i386 ]

Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haod...@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c              |  889 -
 tools/perf/util/header.c              |   54 ++-
 tools/perf/util/header.h              |    1 +
 tools/perf/util/thread.h              |    2 +
 6 files changed, 973 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index dd84cb2..326f2cb 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 	[--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
 	{top|record|report|diff|buildid-list}
 'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
-	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
+	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
 
 DESCRIPTION
 -----------
@@ -38,6 +38,18 @@ There are a couple of variants of perf kvm:
   so that other tools can be used to fetch packages with matching
Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool
CC David.

Hi David,

I should apologize to you: Dong forgot to send the patchset to you.
Could you pick these up from the mailing list?

On 08/27/2012 05:51 PM, Dong Hao wrote:
> From: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
>
> Changelog:
> - rebased on the perf/core branch of Arnaldo's newest git tree
>
> changes from Arnaldo's comments:
> - directly get the event from evsel->tp_format
> - remove die() and return the proper error code
> - rename thread->private to thread->priv
>
> changes from David's comments:
> - use is_valid_tracepoint instead of kvm_events_exist
>
> This patchset introduces a perf-based tool (perf kvm stat record/report)
> which can analyze kvm events more smartly. Below are the presentation
> slides from LinuxCon Japan 2012:
> http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
> You can get more details from them. If you have any questions or
> comments, please feel free to let us know.
>
> This patchset is based on the perf/core branch of Arnaldo's git tree.
> Patch 2 is only an improvement and can be picked up independently.
>
> Usage:
> - kvm stat
>   run a command and gather performance counter statistics; it is an
>   alias of perf stat
>
> - trace kvm events:
>   perf kvm stat record, or, if other tracepoints are interesting as
>   well, we can append the events like this:
>   perf kvm stat record -e kvm:*
>   If many guests are running, we can track a specific guest by using
>   -p or --pid
>
> - show the result:
>   perf kvm stat report
>
> An example of the output follows:
> # pgrep qemu-kvm
> 26071
> 32253
> 32564
>
> In total, 3 guests are running on the host.
>
> Then, track the guest whose pid is 26071:
> # ./perf kvm stat record -p 26071
> ^C[ perf record: Woken up 9 times to write data ]
> [ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]
>
> See the vmexit events:
> # ./perf kvm stat report --event=vmexit
>
> Analyze events for all VCPUs:
>
>              VM-EXIT    Samples  Samples%     Time%         Avg time
>
>          APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
>   EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
>                CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
>                  HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
>        EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
>    PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
>        EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
>       IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
>            CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )
>
> Total Samples:98202, Total events handled time:41419293.63us.
>
> See the mmio events:
> # ./perf kvm stat report --event=mmio
>
> Analyze events for all VCPUs:
>
>          MMIO Access    Samples  Samples%     Time%         Avg time
>
>         0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
>         0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
>         0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
>         0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )
>
> Total Samples:65057, Total events handled time:1854470.45us.
>
> See the ioport event:
> # ./perf kvm stat report --event=ioport
>
> Analyze events for all VCPUs:
>
>       IO Port Access    Samples  Samples%     Time%         Avg time
>
>          0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )
>
> Total Samples:383, Total events handled time:34085.56us.
>
> And, --vcpu is used to track the specified vcpu and --key is used to
> sort the result:
> # ./perf kvm stat report --event=vmexit --vcpu=0 --key=time
>
> Analyze events for VCPU 0:
>
>              VM-EXIT    Samples  Samples%     Time%         Avg time
>
>                  HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
>   EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
>          APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
>       IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
>        EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
>    PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
>            CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
>        EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
>                CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )
>
> Total Samples:10914, Total events handled time:5521782.02us.
>
> Dong Hao (3):
>   KVM: x86: export svm/vmx exit code and vector code to userspace
>   KVM: x86: trace mmio begin and complete
>   KVM: perf: kvm events analysis tool
>
>  arch/x86/include/asm/kvm_host.h |   36 +-
>  arch/x86/include/asm/svm.h      |  205 +---
>  arch/x86/include/asm/vmx.h
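A minimal sketch, not taken from the patchset, of how the Samples% and Avg time columns in the reports above appear to relate to the raw totals. The assumption is that Avg time is the summed handling time divided by the sample count and Samples% is the row's share of all samples, which matches the ioport row (34085.56us / 383 = 89.00us). The struct and function names are hypothetical.

```c
#include <stdio.h>

/* One row of a report table (hypothetical layout, for illustration only). */
struct event_row {
    const char *name;
    unsigned long samples;   /* how many times the event was seen */
    double total_time_us;    /* summed handling time in microseconds */
};

/* Share of all samples contributed by this row, in percent. */
static inline double samples_percent(const struct event_row *row,
                                     unsigned long total_samples)
{
    return total_samples ? 100.0 * row->samples / total_samples : 0.0;
}

/* Mean handling time per sample, in microseconds. */
static inline double avg_time_us(const struct event_row *row)
{
    return row->samples ? row->total_time_us / row->samples : 0.0;
}

/* Print one row in roughly the column order of the report. */
static inline void print_row(const struct event_row *row,
                             unsigned long total_samples)
{
    printf("%20s %10lu %8.2f%% %12.2fus\n", row->name, row->samples,
           samples_percent(row, total_samples), avg_time_us(row));
}
```

The Time% and "+-" columns are left out because the report does not show the inputs needed to recompute them.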
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
Hi,

On 27.08.2012 08:28, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kis...@siemens.com>
>
> This adds PCI device assignment for i386 targets using the classic KVM
> interfaces. This version is 100% identical to what is being maintained
> in qemu-kvm for several years and is supported by libvirt as well. It is
> expected to remain relevant for another couple of years until kernels
> without full-featured and performance-wise equivalent VFIO support are
> obsolete.
>
> A refactoring to-do that should be done in-tree is to model MSI and
> MSI-X support via the generic PCI layer, similar to what VFIO is already
> doing for MSI-X. This should improve the correctness and clean up the
> code from duplicate logic.
>
> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> ---
>  hw/kvm/Makefile.objs |    2 +-
>  hw/kvm/pci-assign.c  | 1929 ++
>  2 files changed, 1930 insertions(+), 1 deletions(-)
>  create mode 100644 hw/kvm/pci-assign.c
[...]
> diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
> new file mode 100644
> index 000..9cce02c
> --- /dev/null
> +++ b/hw/kvm/pci-assign.c
> @@ -0,0 +1,1929 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.

The downside of accepting this into qemu.git is that it gets us a huge
blob of GPLv2-only code without history of contributors for GPLv2+
relicensing...

> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.

(Expect the usual GNU address reminder here.)

> + *
> + *
> + *  Assign a PCI device from the host to a guest VM.
> + *
> + *  Adapted for KVM by Qumranet.
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com)
> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
> + */
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/io.h>
> +#include <sys/mman.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include "hw/hw.h"
> +#include "hw/pc.h"
> +#include "qemu-error.h"
> +#include "console.h"
> +#include "hw/loader.h"
> +#include "monitor.h"
> +#include "range.h"
> +#include "sysemu.h"
> +#include "hw/pci.h"
> +#include "hw/msi.h"
> +#include "kvm_i386.h"

Am I correct to understand we compile this only for i386 / x86_64?
(apic.o in kvm/Makefile.objs hints in that direction)
You may want to update the description in the comment above accordingly,
also mentioning that this is some deprecated backwards-compatibility
thing.

Regards,
Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On 2012-08-27 14:07, Andreas Färber wrote:
> Hi,
>
> On 27.08.2012 08:28, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> This adds PCI device assignment for i386 targets using the classic KVM
>> interfaces. This version is 100% identical to what is being maintained
>> in qemu-kvm for several years and is supported by libvirt as well. It is
>> expected to remain relevant for another couple of years until kernels
>> without full-featured and performance-wise equivalent VFIO support are
>> obsolete.
>>
>> A refactoring to-do that should be done in-tree is to model MSI and
>> MSI-X support via the generic PCI layer, similar to what VFIO is already
>> doing for MSI-X. This should improve the correctness and clean up the
>> code from duplicate logic.
>>
>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>> ---
>>  hw/kvm/Makefile.objs |    2 +-
>>  hw/kvm/pci-assign.c  | 1929 ++
>>  2 files changed, 1930 insertions(+), 1 deletions(-)
>>  create mode 100644 hw/kvm/pci-assign.c
> [...]
>> diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
>> new file mode 100644
>> index 000..9cce02c
>> --- /dev/null
>> +++ b/hw/kvm/pci-assign.c
>> @@ -0,0 +1,1929 @@
>> +/*
>> + * Copyright (c) 2007, Neocleus Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>
> The downside of accepting this into qemu.git is that it gets us a huge
> blob of GPLv2-only code without history of contributors for GPLv2+
> relicensing...

The history is documented in qemu-kvm. I personally don't see that it
will pay off going through this, but someone else may, and nothing will
prevent trying this at least. I can leave a comment.

BTW, VFIO will be GPLv2 only as well. If I understood Alex correctly, it
is too much derived from this code. IOW: There is probably no PCI
assignment without this restriction in the foreseeable future.

>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
>> + * Place - Suite 330, Boston, MA 02111-1307 USA.
>
> (Expect the usual GNU address reminder here.)

Will fix.

>> + *
>> + *
>> + *  Assign a PCI device from the host to a guest VM.
>> + *
>> + *  Adapted for KVM by Qumranet.
>> + *
>> + *  Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com)
>> + *  Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com)
>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com)
>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com)
>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
>> + */
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +#include <sys/io.h>
>> +#include <sys/mman.h>
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include "hw/hw.h"
>> +#include "hw/pc.h"
>> +#include "qemu-error.h"
>> +#include "console.h"
>> +#include "hw/loader.h"
>> +#include "monitor.h"
>> +#include "range.h"
>> +#include "sysemu.h"
>> +#include "hw/pci.h"
>> +#include "hw/msi.h"
>> +#include "kvm_i386.h"
>
> Am I correct to understand we compile this only for i386 / x86_64?

This is correct.

> (apic.o in kvm/Makefile.objs hints in that direction)
> You may want to update the description in the comment above accordingly,
> also mentioning that this is some deprecated backwards-compatibility
> thing.

You mean in the header of pci-assign.c? Can do.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
[PATCHv2 0/4] migrate PV EOI MSR
It turns out PV EOI gets disabled after migration - until the next guest
reset. This is because we are missing code to actually migrate it.
This patchset fixes it up: it applies cleanly to qemu.git as well as
qemu-kvm.git, so I think it's cleaner to apply it in qemu.git to keep
the diff to a minimum.

Note: there's talk about adding infrastructure for CPUID whitelisting
which conceivably could be used for migration compat support. I am
guessing this won't be 1.2 material - when it's ready we can easily
replace the simple flag that this patchset adds with something else.
So this just adds minimal code to avoid regressing cross-version
migration.

Note: there's a kernel bug in Linux 3.6-rc3 - apply my patch
'kvm: fix KVM_GET_MSR for PV EOI' in order to use this patchset on it.

Needed for 1.2.

Changes from v1:
    Update all headers from 3.6-rc3 to keep them in sync (Jan)
    Disable cpuid flag for qemu 1.2 and older (Orit)

Michael S. Tsirkin (4):
  linux-headers: update to 3.6-rc3
  pc: refactor compat code
  cpuid: disable pv eoi for 1.1 and older compat types
  kvm: get/set PV EOI MSR

 hw/Makefile.objs                  |  2 +-
 hw/cpu_flags.c                    | 32 +++
 hw/cpu_flags.h                    |  9
 hw/pc_piix.c                      | 46 ---
 linux-headers/asm-s390/kvm.h      |  2 +-
 linux-headers/asm-s390/kvm_para.h |  2 +-
 linux-headers/asm-x86/kvm.h       |  1 +
 linux-headers/asm-x86/kvm_para.h  |  7 ++
 linux-headers/linux/kvm.h         |  3 +++
 target-i386/cpu.c                 |  8 +++
 target-i386/cpu.h                 |  1 +
 target-i386/kvm.c                 | 13 +++
 target-i386/machine.c             | 21 ++
 13 files changed, 136 insertions(+), 11 deletions(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

--
MST
[PATCHv2 1/4] linux-headers: update to 3.6-rc3
Update linux-headers to the version present in Linux 3.6-rc3.
The asm-x86/kvm_para.h header update is needed for the new PV EOI
feature.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 linux-headers/asm-s390/kvm.h      | 2 +-
 linux-headers/asm-s390/kvm_para.h | 2 +-
 linux-headers/asm-x86/kvm.h       | 1 +
 linux-headers/asm-x86/kvm_para.h  | 7 +++
 linux-headers/linux/kvm.h         | 3 +++
 5 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index bdcbe0f..d25da59 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -1,7 +1,7 @@
 #ifndef __LINUX_KVM_S390_H
 #define __LINUX_KVM_S390_H
 /*
- * asm-s390/kvm.h - KVM s390 specific structures and definitions
+ * KVM s390 specific structures and definitions
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-s390/kvm_para.h b/linux-headers/asm-s390/kvm_para.h
index 8e2dd67..870051f 100644
--- a/linux-headers/asm-s390/kvm_para.h
+++ b/linux-headers/asm-s390/kvm_para.h
@@ -1,5 +1,5 @@
 /*
- * asm-s390/kvm_para.h - definition for paravirtual devices on s390
+ * definition for paravirtual devices on s390
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index e7d1c19..246617e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -12,6 +12,7 @@
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
+#define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_DEVICE_ASSIGNMENT
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..a1c3d72 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_EOI		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN      0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
 	__u32 enabled;
 };
 
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 5a9d4e3..4b9e575 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -617,6 +617,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_SIGNAL_MSI 77
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
+#define KVM_CAP_PPC_ALLOC_HTAB 80
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -828,6 +829,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SIGNAL_MSI            _IOW(KVMIO,  0xa5, struct kvm_msi)
 /* Available with KVM_CAP_PPC_GET_SMMU_INFO */
 #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
+/* Available with KVM_CAP_PPC_ALLOC_HTAB */
+#define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
 
 /*
  * ioctls for vcpu fds
--
MST
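A minimal sketch, not part of the patch, of how the new kvm_para.h constants fit together. It assumes (my reading of the KVM MSR documentation, not something this patch states) that KVM_PV_EOI_BIT is the least significant bit of a flag word shared between host and guest, which the guest tests and clears to skip the APIC EOI write. The helper names are hypothetical and the test-and-clear here is deliberately non-atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as added to asm-x86/kvm_para.h by this patch. */
#define KVM_FEATURE_PV_EOI   6
#define MSR_KVM_PV_EOI_EN    0x4b564d04
#define KVM_PV_EOI_BIT       0
#define KVM_PV_EOI_MASK      (0x1 << KVM_PV_EOI_BIT)
#define KVM_PV_EOI_ENABLED   KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED  0x0

/* Hypothetical helper: does the KVM CPUID feature word advertise PV EOI? */
static inline int kvm_features_have_pv_eoi(uint32_t features)
{
    return (features >> KVM_FEATURE_PV_EOI) & 1;
}

/*
 * Hypothetical guest-side helper: test and clear the flag in the shared
 * word. Returns 1 if the flag was set, 0 otherwise. A real guest would
 * need an atomic operation here; this sketch is single-threaded.
 */
static inline int pv_eoi_test_and_clear(uint32_t *flag_word)
{
    assert(flag_word != NULL);
    int was_set = (*flag_word & KVM_PV_EOI_MASK) != 0;
    *flag_word &= ~(uint32_t)KVM_PV_EOI_MASK;
    return was_set;
}
```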
[PATCHv2 2/4] pc: refactor compat code
In preparation for adding PV EOI migration for 1.2, trivially refactor
some compat code to make it easier to add version-specific cpuid tweaks.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 hw/pc_piix.c | 44
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index a771d79..008d42f 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -369,6 +369,22 @@ static QEMUMachine pc_machine_v1_2 = {
     .default_machine_opts = KVM_MACHINE_OPTIONS,
 };
 
+static void pc_machine_v1_1_compat(void)
+{
+}
+
+static void pc_init_pci_v1_1(ram_addr_t ram_size,
+                             const char *boot_device,
+                             const char *kernel_filename,
+                             const char *kernel_cmdline,
+                             const char *initrd_filename,
+                             const char *cpu_model)
+{
+    pc_machine_v1_1_compat();
+    pc_init_pci(ram_size, boot_device, kernel_filename,
+                kernel_cmdline, initrd_filename, cpu_model);
+}
+
 #define PC_COMPAT_1_1 \
     {\
         .driver   = "virtio-scsi-pci",\
@@ -403,7 +419,7 @@ static QEMUMachine pc_machine_v1_2 = {
 static QEMUMachine pc_machine_v1_1 = {
     .name = "pc-1.1",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_v1_1,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -439,7 +455,7 @@ static QEMUMachine pc_machine_v1_1 = {
 static QEMUMachine pc_machine_v1_0 = {
     .name = "pc-1.0",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_v1_1,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -455,7 +471,7 @@ static QEMUMachine pc_machine_v1_0 = {
 static QEMUMachine pc_machine_v0_15 = {
     .name = "pc-0.15",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_v1_1,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -488,7 +504,7 @@ static QEMUMachine pc_machine_v0_15 = {
 static QEMUMachine pc_machine_v0_14 = {
     .name = "pc-0.14",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_v1_1,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -519,10 +535,22 @@ static QEMUMachine pc_machine_v0_14 = {
         .value    = stringify(1),\
     }
 
+static void pc_init_pci_v0_13(ram_addr_t ram_size,
+                              const char *boot_device,
+                              const char *kernel_filename,
+                              const char *kernel_cmdline,
+                              const char *initrd_filename,
+                              const char *cpu_model)
+{
+    pc_machine_v1_1_compat();
+    pc_init_pci_no_kvmclock(ram_size, boot_device, kernel_filename,
+                            kernel_cmdline, initrd_filename, cpu_model);
+}
+
 static QEMUMachine pc_machine_v0_13 = {
     .name = "pc-0.13",
     .desc = "Standard PC",
-    .init = pc_init_pci_no_kvmclock,
+    .init = pc_init_pci_v0_13,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -560,7 +588,7 @@ static QEMUMachine pc_machine_v0_13 = {
 static QEMUMachine pc_machine_v0_12 = {
     .name = "pc-0.12",
     .desc = "Standard PC",
-    .init = pc_init_pci_no_kvmclock,
+    .init = pc_init_pci_v0_13,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -594,7 +622,7 @@ static QEMUMachine pc_machine_v0_12 = {
 static QEMUMachine pc_machine_v0_11 = {
     .name = "pc-0.11",
     .desc = "Standard PC, qemu 0.11",
-    .init = pc_init_pci_no_kvmclock,
+    .init = pc_init_pci_v0_13,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
@@ -616,7 +644,7 @@ static QEMUMachine pc_machine_v0_11 = {
 static QEMUMachine pc_machine_v0_10 = {
     .name = "pc-0.10",
     .desc = "Standard PC, qemu 0.10",
-    .init = pc_init_pci_no_kvmclock,
+    .init = pc_init_pci_v0_13,
     .max_cpus = 255,
     .default_machine_opts = KVM_MACHINE_OPTIONS,
     .compat_props = (GlobalProperty[]) {
--
MST
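The refactoring above follows a simple wrapper pattern: older machine types get an init function that first runs a compat hook and then calls the common init. A stripped-down sketch of that pattern, with hypothetical names and a one-argument init signature chosen only for brevity (QEMU's real QEMUMachine and pc_init_pci take more parameters):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a machine init callback. */
typedef void (*machine_init_fn)(const char *cpu_model);

struct machine {
    const char *name;
    machine_init_fn init;
};

static bool compat_applied;   /* stands in for per-version CPUID tweaks */
static int init_calls;        /* counts calls to the common init path */

/* Common init shared by every machine version. */
static void common_init(const char *cpu_model)
{
    (void)cpu_model;
    init_calls++;
}

/* Compat hook: the single place to add tweaks for 1.1 and older. */
static void v1_1_compat(void)
{
    compat_applied = true;
}

/* Wrapper used by old machine types: hook first, then common init. */
static void init_v1_1(const char *cpu_model)
{
    v1_1_compat();
    common_init(cpu_model);
}

static const struct machine machines[] = {
    { "pc-1.2", common_init },
    { "pc-1.1", init_v1_1 },
    { "pc-1.0", init_v1_1 },
};

/* Look up a machine by name; returns NULL if it is unknown. */
static inline const struct machine *find_machine(const char *name)
{
    for (size_t i = 0; i < sizeof(machines) / sizeof(machines[0]); i++) {
        if (strcmp(machines[i].name, name) == 0) {
            return &machines[i];
        }
    }
    return NULL;
}
```

With this shape, a later patch only has to fill in the body of the compat hook; the machine table does not change again.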
[PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
In preparation for adding PV EOI support, disable PV EOI by default for
1.1 and older machine types, to avoid CPUID changing during migration.

PV EOI can still be enabled/disabled by specifying it explicitly.
    Enable for 1.1:
        -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
    Disable for 1.2:
        -M pc-1.2 -cpu kvm64,-kvm_pv_eoi

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 hw/Makefile.objs  |  2 +-
 hw/cpu_flags.c    | 32
 hw/cpu_flags.h    |  9 +
 hw/pc_piix.c      |  2 ++
 target-i386/cpu.c |  8
 5 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 850b87b..3f2532a 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -1,5 +1,5 @@
 hw-obj-y = usb/ ide/
-hw-obj-y += loader.o
+hw-obj-y += loader.o cpu_flags.o
 hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
new file mode 100644
index 000..2422d20
--- /dev/null
+++ b/hw/cpu_flags.c
@@ -0,0 +1,32 @@
+/*
+ * CPU compatibility flags.
+ *
+ * Copyright (c) 2012 Red Hat Inc.
+ * Author: Michael S. Tsirkin.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "hw/cpu_flags.h"
+
+static bool __kvm_pv_eoi_disabled;
+
+void disable_kvm_pv_eoi(void)
+{
+    __kvm_pv_eoi_disabled = true;
+}
+
+bool kvm_pv_eoi_disabled(void)
+{
+    return __kvm_pv_eoi_disabled;
+}
diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
new file mode 100644
index 000..05777b6
--- /dev/null
+++ b/hw/cpu_flags.h
@@ -0,0 +1,9 @@
+#ifndef HW_CPU_FLAGS_H
+#define HW_CPU_FLAGS_H
+
+#include <stdbool.h>
+
+void disable_kvm_pv_eoi(void);
+bool kvm_pv_eoi_disabled(void);
+
+#endif
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 008d42f..bdbceda 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -46,6 +46,7 @@
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
+#include "cpu_flags.h"
 
 #define MAX_IDE_BUS 2
 
@@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
 
 static void pc_machine_v1_1_compat(void)
 {
+    disable_kvm_pv_eoi();
 }
 
 static void pc_init_pci_v1_1(ram_addr_t ram_size,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 120a2e3..0d02fd1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -23,6 +23,7 @@
 
 #include "cpu.h"
 #include "kvm.h"
+#include <asm/kvm_para.h>
 
 #include "qemu-option.h"
 #include "qemu-config.h"
@@ -33,6 +34,7 @@
 #include "hyperv.h"
 
 #include "hw/hw.h"
+#include "hw/cpu_flags.h"
 
 /* feature flags taken from "Intel Processor Identification and the CPUID
  * Instruction" and AMD's "CPUID Specification".  In cases of disagreement
@@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
     plus_kvm_features = ~0; /* not supported bits will be filtered out later */
 
+    /* Disable PV EOI for old machine types.
+     * Feature flags can still override. */
+    if (kvm_pv_eoi_disabled()) {
+        plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
+    }
+
     add_flagname_to_bitmaps("hypervisor", &plus_features,
             &plus_ext_features, &plus_ext2_features, &plus_ext3_features,
             &plus_kvm_features, &plus_svm_features);
--
MST
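A small sketch of the masking logic described in this patch: start from "all KVM feature bits requested", clear the PV EOI bit for old machine types, then let explicit +flag/-flag options override. The function names are hypothetical, and the order in which explicit flags are applied (plus first, then minus) is an assumption made for this sketch rather than a statement about cpu_x86_find_by_name().

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_EOI 6

/* Default "plus" mask: everything on, minus PV EOI for old machine types. */
static inline uint32_t kvm_feature_defaults(bool pv_eoi_disabled)
{
    uint32_t plus = ~UINT32_C(0);

    if (pv_eoi_disabled) {
        plus &= ~(UINT32_C(1) << KVM_FEATURE_PV_EOI);
    }
    return plus;
}

/* Apply explicit user flags on top of the defaults (assumed order). */
static inline uint32_t apply_user_flags(uint32_t defaults,
                                        uint32_t user_plus,
                                        uint32_t user_minus)
{
    return (defaults | user_plus) & ~user_minus;
}
```

In these terms, `-M pc-1.1 -cpu kvm64,+kvm_pv_eoi` corresponds to a disabled default with the bit set again through `user_plus`, and `-M pc-1.2 -cpu kvm64,-kvm_pv_eoi` to an enabled default with the bit cleared through `user_minus`.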
[PATCHv2 4/4] kvm: get/set PV EOI MSR
Support get/set of the new PV EOI MSR, for migration.
Add an optional section for the MSR value - send it
out in case the MSR was changed from the default value (0).

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 target-i386/cpu.h     |  1 +
 target-i386/kvm.c     | 13 +
 target-i386/machine.c | 21 +
 3 files changed, 35 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index aabf993..3c57d8b 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -699,6 +699,7 @@ typedef struct CPUX86State {
     uint64_t system_time_msr;
     uint64_t wall_clock_msr;
     uint64_t async_pf_en_msr;
+    uint64_t pv_eoi_en_msr;
 
     uint64_t tsc;
     uint64_t tsc_deadline;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5e2d4f5..6790180 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -64,6 +64,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
+static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
 static int lm_capable_kernel;
 
@@ -456,6 +457,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
     has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
 
+    has_msr_pv_eoi_en = c->eax & (1 << KVM_FEATURE_PV_EOI);
+
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
     for (i = 0; i <= limit; i++) {
@@ -1018,6 +1021,10 @@ static int kvm_put_msrs(CPUX86State *env, int level)
             kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN,
                               env->async_pf_en_msr);
         }
+        if (has_msr_pv_eoi_en) {
+            kvm_msr_entry_set(&msrs[n++], MSR_KVM_PV_EOI_EN,
+                              env->pv_eoi_en_msr);
+        }
         if (hyperv_hypercall_available()) {
             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_GUEST_OS_ID, 0);
             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_HYPERCALL, 0);
@@ -1260,6 +1267,9 @@ static int kvm_get_msrs(CPUX86State *env)
     if (has_msr_async_pf_en) {
         msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
     }
+    if (has_msr_pv_eoi_en) {
+        msrs[n++].index = MSR_KVM_PV_EOI_EN;
+    }
 
     if (env->mcg_cap) {
         msrs[n++].index = MSR_MCG_STATUS;
@@ -1339,6 +1349,9 @@ static int kvm_get_msrs(CPUX86State *env)
         case MSR_KVM_ASYNC_PF_EN:
             env->async_pf_en_msr = msrs[i].data;
             break;
+        case MSR_KVM_PV_EOI_EN:
+            env->pv_eoi_en_msr = msrs[i].data;
+            break;
         }
     }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index a8be058..4771508 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -279,6 +279,13 @@ static bool async_pf_msr_needed(void *opaque)
     return cpu->async_pf_en_msr != 0;
 }
 
+static bool pv_eoi_msr_needed(void *opaque)
+{
+    CPUX86State *cpu = opaque;
+
+    return cpu->pv_eoi_en_msr != 0;
+}
+
 static const VMStateDescription vmstate_async_pf_msr = {
     .name = "cpu/async_pf_msr",
     .version_id = 1,
@@ -290,6 +297,17 @@ static const VMStateDescription vmstate_async_pf_msr = {
     }
 };
 
+static const VMStateDescription vmstate_pv_eoi_msr = {
+    .name = "cpu/async_pv_eoi_msr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64(pv_eoi_en_msr, CPUX86State),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static bool fpop_ip_dp_needed(void *opaque)
 {
     CPUX86State *env = opaque;
@@ -454,6 +472,9 @@ static const VMStateDescription vmstate_cpu = {
             .vmsd = &vmstate_async_pf_msr,
             .needed = async_pf_msr_needed,
         } , {
+            .vmsd = &vmstate_pv_eoi_msr,
+            .needed = pv_eoi_msr_needed,
+        } , {
             .vmsd = &vmstate_fpop_ip_dp,
             .needed = fpop_ip_dp_needed,
         }, {
--
MST
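The "optional section" idea in this patch can be illustrated outside of QEMU: each subsection carries a `needed` predicate, and it only goes onto the wire when the predicate returns true, so a stream from a guest that never enabled PV EOI stays readable by an older receiver. The sketch below uses hypothetical, simplified types; it is not QEMU's VMState API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified CPU state holding only the two MSRs discussed here. */
struct cpu_state {
    uint64_t async_pf_en_msr;
    uint64_t pv_eoi_en_msr;
};

/* Hypothetical subsection descriptor: a name plus a "needed" predicate. */
struct subsection {
    const char *name;
    bool (*needed)(const struct cpu_state *);
};

static bool async_pf_needed(const struct cpu_state *s)
{
    return s->async_pf_en_msr != 0;
}

static bool pv_eoi_needed(const struct cpu_state *s)
{
    return s->pv_eoi_en_msr != 0;   /* 0 is the default, so nothing to send */
}

static const struct subsection subsections[] = {
    { "cpu/async_pf_msr",     async_pf_needed },
    { "cpu/async_pv_eoi_msr", pv_eoi_needed },
};

/* Collect the names of subsections that would be sent; returns the count. */
static inline size_t sections_to_send(const struct cpu_state *s,
                                      const char **out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(subsections) / sizeof(subsections[0]); i++) {
        if (n < max && subsections[i].needed(s)) {
            out[n++] = subsections[i].name;
        }
    }
    return n;
}
```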
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 27 August 2012 13:20, Michael S. Tsirkin <m...@redhat.com> wrote:
> Update linux-headers to version present in Linux 3.6-rc3.
> Header asm-x86/kvm_para.h update is needed for the new PV EOI
> feature.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  linux-headers/asm-s390/kvm.h      | 2 +-
>  linux-headers/asm-s390/kvm_para.h | 2 +-
>  linux-headers/asm-x86/kvm.h       | 1 +
>  linux-headers/asm-x86/kvm_para.h  | 7 +++
>  linux-headers/linux/kvm.h         | 3 +++
>  5 files changed, 13 insertions(+), 2 deletions(-)

The latest version of update-linux-headers.sh should have caused this
update to include asm-generic/kvm_para.h, I think. Did the script not
pull that header in, or were you maybe using an old version of the
script or forgot to git add the new file?

thanks
-- PMM
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 2012-08-27 14:42, Peter Maydell wrote:
> On 27 August 2012 13:20, Michael S. Tsirkin <m...@redhat.com> wrote:
>> Update linux-headers to version present in Linux 3.6-rc3.
>> Header asm-x86/kvm_para.h update is needed for the new PV EOI
>> feature.
>>
>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>> ---
>>  linux-headers/asm-s390/kvm.h      | 2 +-
>>  linux-headers/asm-s390/kvm_para.h | 2 +-
>>  linux-headers/asm-x86/kvm.h       | 1 +
>>  linux-headers/asm-x86/kvm_para.h  | 7 +++
>>  linux-headers/linux/kvm.h         | 3 +++
>>  5 files changed, 13 insertions(+), 2 deletions(-)
>
> The latest version of update-linux-headers.sh should have caused this
> update to include asm-generic/kvm_para.h, I think. Did the script not
> pull that header in, or were you maybe using an old version of the
> script or forgot to git add the new file?

To be fair, that is hard to guess. We should add some magic to the
update script to detect new files and maybe suggest them for addition.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool
On 8/27/12 3:59 AM, Xiao Guangrong wrote:
> CC David.
>
> Hi David,
>
> I should apologize to you that Dong forgot to post the patchset to you.
> Could you pick these up from the mail list?

Yes, I do catch all perf related emails to LKML. I'll take a look at the
patches today or tomorrow.

David
Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific
On Fri, 24 Aug 2012 15:54:59 -0300
Marcelo Tosatti <mtosa...@redhat.com> wrote:

> Other arches do not need this.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/x86.c
> +++ kvm/arch/x86/kvm/x86.c
> @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
>  		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
>  	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
>  	spin_unlock(&kvm->mmu_lock);
> +	/*
> +	 * If the new memory slot is created, we need to clear all
> +	 * mmio sptes.
> +	 */
> +	if (old.npages == 0 && npages) {
> +		kvm_mmu_zap_all(kvm);
> +		kvm_reload_remote_mmus(kvm);
> +	}
>  }

Any explanation why (old.base_gfn != new.base_gfn) case can be
omitted?

	Takuya

>
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> Index: kvm/virt/kvm/kvm_main.c
> ===================================================================
> --- kvm.orig/virt/kvm/kvm_main.c
> +++ kvm/virt/kvm/kvm_main.c
> @@ -849,13 +849,6 @@ int __kvm_set_memory_region(struct kvm *
>
>  	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
>
> -	/*
> -	 * If the new memory slot is created, we need to clear all
> -	 * mmio sptes.
> -	 */
> -	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> -		kvm_arch_flush_shadow_all(kvm);
> -
>  	kvm_free_physmem_slot(&old, &new);
>  	kfree(old_memslots);
>
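To make the question above concrete, here is a small sketch comparing the two conditions as they read in the quoted diff. The struct and function names are hypothetical, and `new_slot->base_gfn` stands in for `mem->guest_phys_addr >> PAGE_SHIFT`. The case where the two predicates disagree is a slot that already existed and is moved to a different base gfn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of a memory slot. */
struct slot {
    uint64_t base_gfn;
    unsigned long npages;
};

/* Condition removed from the generic code (kvm_main.c) by the patch. */
static inline bool flush_generic(const struct slot *old,
                                 const struct slot *new_slot)
{
    return new_slot->npages && old->base_gfn != new_slot->base_gfn;
}

/* Condition added to the x86 code (x86.c) by the patch. */
static inline bool flush_x86(const struct slot *old,
                             const struct slot *new_slot)
{
    return old->npages == 0 && new_slot->npages;
}
```

Under these definitions, creating a slot at a non-zero gfn triggers both conditions, while moving an existing slot triggers only the old generic one. Whether that difference matters for mmio sptes is exactly what the reply asks; the sketch does not answer it.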
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On Mon, Aug 27, 2012 at 01:42:03PM +0100, Peter Maydell wrote:
> On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
>> Update linux-headers to version present in Linux 3.6-rc3.
>> Header asm-x86_64/kvm_para.h update is needed for the new PV EOI feature.
>>
>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>> ---
>>  linux-headers/asm-s390/kvm.h      | 2 +-
>>  linux-headers/asm-s390/kvm_para.h | 2 +-
>>  linux-headers/asm-x86/kvm.h       | 1 +
>>  linux-headers/asm-x86/kvm_para.h  | 7 +++
>>  linux-headers/linux/kvm.h         | 3 +++
>>  5 files changed, 13 insertions(+), 2 deletions(-)
>
> The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file?
>
> thanks
> -- PMM

I have no idea but adding new files is not the same as updating existing ones. Why don't you add it when you update headers to a version that actually uses it?

--
MST
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote:
> On 2012-08-27 14:42, Peter Maydell wrote:
>> On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
>>> Update linux-headers to version present in Linux 3.6-rc3.
>>> Header asm-x86_64/kvm_para.h update is needed for the new PV EOI feature.
>>>
>>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>>> ---
>>>  linux-headers/asm-s390/kvm.h      | 2 +-
>>>  linux-headers/asm-s390/kvm_para.h | 2 +-
>>>  linux-headers/asm-x86/kvm.h       | 1 +
>>>  linux-headers/asm-x86/kvm_para.h  | 7 +++
>>>  linux-headers/linux/kvm.h         | 3 +++
>>>  5 files changed, 13 insertions(+), 2 deletions(-)
>>
>> The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file?
>
> To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition.
>
> Jan

But why did you add a header to qemu without adding it to git? That's a cleaner solution and needs no magic scripting.

> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 2012-08-27 16:53, Michael S. Tsirkin wrote:
> On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote:
>> On 2012-08-27 14:42, Peter Maydell wrote:
>>> On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote:
>>>> Update linux-headers to version present in Linux 3.6-rc3.
>>>> Header asm-x86_64/kvm_para.h update is needed for the new PV EOI feature.
>>>>
>>>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>>>> ---
>>>>  linux-headers/asm-s390/kvm.h      | 2 +-
>>>>  linux-headers/asm-s390/kvm_para.h | 2 +-
>>>>  linux-headers/asm-x86/kvm.h       | 1 +
>>>>  linux-headers/asm-x86/kvm_para.h  | 7 +++
>>>>  linux-headers/linux/kvm.h         | 3 +++
>>>>  5 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file?
>>
>> To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition.
>>
>> Jan
>
> But why did you add a header to qemu without adding it to git? That's a cleaner solution and needs no magic scripting.

Yes, this would have been appropriate. Still, a simple

    git status -s linux-headers

run at the end of the update script can help remind people in the future.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
> On 08/24/2012 11:11 AM, Michael Wolf wrote:
>> On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
>>> On 08/24/2012 03:14 AM, Michael Wolf wrote:
>>>> This is an RFC regarding the reporting of stealtime. In the case of where you have a system that is running with partial processors such as KVM the user may see steal time being reported in accounting tools such as top or vmstat. This can cause confusion for the end user. To ease the confusion this patch set adds a sysctl interface to set the cpu entitlement. This is the percentage of cpu that the guest system is expected to receive. As long as the steal time is within its expected range it will show up as 0 in /proc/stat. The user will then see in the accounting tools that they are getting a full utilization of the cpu resources assigned to them.
>>>
>>> And how is such a knob not confusing? Steal time is pretty well defined in meaning and is shown in top for ages. I really don't see the point for this.
>>
>> Currently you can see the steal time but you have no way of knowing if the cpu utilization you are seeing on the guest is the expected amount. I decided on making it a knob because a guest could be migrated to another system and its entitlement could change because of hardware or load differences. It could simply be a /proc file and report the current entitlement if needed. As things are currently implemented I don't see how someone knows if the guest is running as expected or whether there is a problem.
>
> Turning off steal time display won't get even close to displaying the information you want. What you probably want is a guest-visible way to say how many milliseconds you are expected to run each second. Right?

It is not clear to me how knowing how many milliseconds you are expecting to run will help the user. Currently the users will run top to see how well the guest is running. If they see _any_ steal time some users think they are not getting the full use of their processor entitlement.

Maybe I'm missing what you are proposing, but even if you knew the milliseconds that you were expecting for your processor you would have to adjust the top output in your head so to speak. You would see the utilization and then say 'ok that matches the number of milliseconds I expected to run...'

If we take away the steal time (as long as it is equal to or less than the expected amount of steal time) then the user running top will see the 100% utilization.
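The adjustment discussed in this thread can be sketched as plain arithmetic. This is a rough illustration only, not code from the RFC patches: the function name, the tick-based units, and the choice to report only the excess once steal exceeds the expected amount are all assumptions (the thread only says steal within the expected range shows up as 0).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the RFC patches.
 *
 * steal           - steal time observed over the interval (ticks)
 * elapsed         - length of the interval (ticks)
 * entitlement_pct - share of a CPU the guest is expected to receive (1..100)
 *
 * Returns the steal time that would be shown to the user: 0 while the
 * observed steal is within the amount implied by the entitlement.
 * Reporting only the excess above that amount is an assumption.
 */
static uint64_t reported_steal(uint64_t steal, uint64_t elapsed,
                               unsigned entitlement_pct)
{
    uint64_t expected;

    assert(entitlement_pct >= 1 && entitlement_pct <= 100);

    /* A guest entitled to 75% of a CPU expects 25% of the time stolen. */
    expected = elapsed * (100 - entitlement_pct) / 100;

    return steal > expected ? steal - expected : 0;
}
```

With the default entitlement of 100% the expected steal is 0, so the function returns the observed value unchanged, which matches the statement that the default leaves behaviour as before.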
Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:

<snip>

> +struct event_stats {
> +	u64 count;
> +	u64 time;
> +
> +	/* used to calculate stddev. */
> +	double mean;
> +	double M2;
> +};

How about moving the stats functions from builtin-stat.c to e.g. util/stats.c, and then reusing them? Then this struct (which I would rename to kvm_event_stats) would look like this

	struct kvm_event_stats {
		u64 time;
		struct stats stats;
	};

of course the get_event_ accessor generators would need tweaking

<snip>

> +static void update_event_stats(struct event_stats *stats, u64 time_diff)
> +{
> +	double delta;
> +
> +	stats->count++;
> +	stats->time += time_diff;
> +
> +	delta = time_diff - stats->mean;
> +	stats->mean += delta / stats->count;
> +	stats->M2 += delta*(time_diff - stats->mean);
> +}

Reusing stats would allow this to become just

	static void update_event_stats(struct kvm_event_stats *kvm_stats, u64 time_diff)
	{
		update_stats(&kvm_stats->stats, time_diff);
		kvm_stats->time += time_diff;
	}

> +
> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
> +{
> +	struct event_stats *stats = &event->total;
> +	double variance, variance_mean, stddev;
> +
> +	if (vcpu_id != -1)
> +		stats = &event->vcpu[vcpu_id];
> +
> +	BUG_ON(!stats->count);
> +
> +	variance = stats->M2 / (stats->count - 1);
> +	variance_mean = variance / stats->count;
> +	stddev = sqrt(variance_mean);
> +
> +	return stddev * 100 / stats->mean;

This function's name implies it returns the stddev, but it returns the relative stddev instead. Maybe rename it? This would be simplified with code reuse too to basically just

	return stddev_stats(&kvm_stats->stats) * 100 / kvm_stats->stats.mean;

Drew
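For readers following the mean/M2 bookkeeping in the quoted hunk: it looks like the usual single-pass (Welford-style) running variance. A minimal standalone sketch of that technique follows; the struct and function names are hypothetical and this is neither the patch's code nor perf's util code. The square root taken in the patch is left out to keep the sketch free of libm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct event_stats. */
struct running_stats {
    uint64_t count;
    double mean;
    double m2;      /* sum of squared deviations from the running mean */
};

/* Fold one sample into the running mean and M2 in a single pass. */
static void running_stats_update(struct running_stats *s, double sample)
{
    double delta = sample - s->mean;

    s->count++;
    s->mean += delta / (double)s->count;
    /* Uses the old delta and the already-updated mean. */
    s->m2 += delta * (sample - s->mean);
}

/* Sample variance; needs at least two samples to avoid dividing by zero. */
static double running_stats_variance(const struct running_stats *s)
{
    assert(s->count > 1);
    return s->m2 / (double)(s->count - 1);
}

/* Variance of the mean; its square root is the standard error. */
static double running_stats_variance_of_mean(const struct running_stats *s)
{
    return running_stats_variance(s) / (double)s->count;
}
```

Note that the check here is count > 1 rather than count != 0, since the division is by count - 1; whether the patch should guard the same way is a question for its author.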
Re: setting time in guest with ntpdate results in VM hang
On 8/24/2012 1:43 PM, Marcelo Tosatti wrote: On Fri, Aug 24, 2012 at 09:57:35AM -0600, Dale Swanston wrote: Hello. We are running a guest OS of CentOS 4.4 (kernel 2.6.12) for legacy reasons, upgrading is not an option. NTP is running on the host and synching with a local GPS NTP server. But due to frequency drift in the guest it restarts itself periodically and upon start up performs an ntpdate to force a time jump on the guest. I have seen 2 occasions now (over 2 months) where the VM hangs right as the ntpdate command alters the guest clock (based on output in /var/log/messages). From the host's perspective the VM is still running but it appears to be using a very high CPU percentage (more than typical). The only recovery option is to force shutdown of the VM and restart it. This should not happen. 1. Are there any known issues with ntpdate and VMs hanging? Any workarounds? 2. Are there any debugging tools to further characterise the problem? Upgrading the guest kernel is not an option? At least install recent kernel in guest to confirm that it's not an already fixed bug. Good idea. I'll try that. But are there any tools available to determine what the VM is doing when it appears hung? I've looked but haven't found much on debug or diagnostics on a running VM. Any links? Is it possible the guest kernel is panicking? What would the VM do if that happened? Would it shut down? Thanks again.
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On Mon, Aug 27, 2012 at 04:59:40PM +0200, Jan Kiszka wrote: On 2012-08-27 16:53, Michael S. Tsirkin wrote: On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote: On 2012-08-27 14:42, Peter Maydell wrote: On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition. Jan But why did you add a header to qemu without adding it to git? That's a cleaner solution and needs no magic scripting. Yes, this would have been appropriate. Still, a simple git status -s linux-headers run at the end of the update script can help reminding people in the future. Jan Yes. But it would be better if instead of duplicating a list of files/directories, update-linux-headers.sh would just look at what is under linux-headers and update exactly that. This removes any chance of error, and avoids the need to tweak shell scripts each time we add a header. As a bonus we do not blow away random stuff developer might have under linux-headers. Thoughts? WFM --- scripts: better update headers Be more careful when updating headers: only update files we already have in git. Also remove need to list files in this script. Signed-off-by: Michael S. 
Tsirkin m...@redhat.com -- diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 9d2a4bc..6607e56 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -28,23 +28,33 @@ if [ -z $output ]; then output=$PWD fi -for arch in x86 powerpc s390; do -make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install - -rm -rf $output/linux-headers/asm-$arch -mkdir -p $output/linux-headers/asm-$arch -for header in kvm.h kvm_para.h; do -cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch -done -if [ $arch = x86 ]; then -cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86 -fi -done +IFS=$'\n' + +#get list of files +dirs=`git ls-tree HEAD -- linux-headers/|grep tree|cut -f 2` +if [ -z $dirs ]; then +echo Unable to get list of directories under linux-headers/ to update +fi -rm -rf $output/linux-headers/linux -mkdir -p $output/linux-headers/linux -for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do -cp $tmpdir/include/linux/$header $output/linux-headers/linux +for d in $dirs; do +a=${d/#linux-headers\//} +case $a in +asm-*) +arch=${a/asm-/} +make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +files=`git ls-tree -r HEAD -- $d |cut -f 2` +for dst in $files; do +src=include/asm/${dst/linux-headers\/asm-$arch\//} +cp -f $tmpdir/$src $output/$dst || exit 2 +done ;; +*) +make -C $linux INSTALL_HDR_PATH=$tmpdir headers_install +files=`git ls-tree -r HEAD -- $d |cut -f 2` +for dst in $files; do +src=include/${dst/linux-headers\//} +cp -f $tmpdir/$src $output/$dst || exit 2 +done ;; +esac done if [ -L $linux/source ]; then cp $linux/source/COPYING $output/linux-headers -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/27/2012 08:50 AM, Michael Wolf wrote:
> On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
>> On 08/24/2012 11:11 AM, Michael Wolf wrote:
>>> On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
>>>> On 08/24/2012 03:14 AM, Michael Wolf wrote:
>>>>> This is an RFC regarding the reporting of stealtime. In the case of where you have a system that is running with partial processors such as KVM the user may see steal time being reported in accounting tools such as top or vmstat. This can cause confusion for the end user. To ease the confusion this patch set adds a sysctl interface to set the cpu entitlement. This is the percentage of cpu that the guest system is expected to receive. As long as the steal time is within its expected range it will show up as 0 in /proc/stat. The user will then see in the accounting tools that they are getting a full utilization of the cpu resources assigned to them.
>>>>
>>>> And how is such a knob not confusing? Steal time is pretty well defined in meaning and is shown in top for ages. I really don't see the point for this.
>>>
>>> Currently you can see the steal time but you have no way of knowing if the cpu utilization you are seeing on the guest is the expected amount. I decided on making it a knob because a guest could be migrated to another system and its entitlement could change because of hardware or load differences. It could simply be a /proc file and report the current entitlement if needed. As things are currently implemented I don't see how someone knows if the guest is running as expected or whether there is a problem.
>>
>> Turning off steal time display won't get even close to displaying the information you want. What you probably want is a guest-visible way to say how many milliseconds you are expected to run each second. Right?
>
> It is not clear to me how knowing how many milliseconds you are expecting to run will help the user. Currently the users will run top to see how well the guest is running. If they see _any_ steal time some users think they are not getting the full use of their processor entitlement.

And your plan is just to selectively lie about it, by disabling it with a knob?

> Maybe I'm missing what you are proposing, but even if you knew the milliseconds that you were expecting for your processor you would have to adjust the top output in your head so to speak. You would see the utilization and then say 'ok that matches the number of milliseconds I expected to run...'
>
> If we take away the steal time (as long as it is equal to or less than the expected amount of steal time) then the user running top will see the 100% utilization.
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/23/2012 04:14 PM, Michael Wolf wrote:
> This is an RFC regarding the reporting of stealtime. In the case of where you have a system that is running with partial processors such as KVM the user may see steal time being reported in accounting tools such as top or vmstat. This can cause confusion for the end user. To ease the confusion this patch set adds a sysctl interface to set the cpu entitlement. This is the percentage of cpu that the guest system is expected to receive. As long as the steal time is within its expected range it will show up as 0 in /proc/stat. The user will then see in the accounting tools that they are getting a full utilization of the cpu resources assigned to them.
>
> This patchset is changing the contents/output of /proc/stat and could affect user tools. However the default setting is that the cpu is entitled to 100% so the code will act as before. Also another field could be added to the /proc/stat output and show the unaltered steal time. Since this additional field could cause more confusion than it would clear up I have left it out for now.

How would a guest know what its entitlement is?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On Mon, Aug 27, 2012 at 06:56:38PM +, Blue Swirl wrote:
>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>> +{
>> +    AssignedDevRegion *d = opaque;
>> +    uint8_t *in = d->u.r_virtbase + addr;
>
> Don't perform arithmetic with void pointers.

Why not? We require gcc and it's a documented extension there.
Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote: In preparation for adding PV EOI support, disable PV EOI by default for 1.1 and older machine types, to avoid CPUID changing during migration. PV EOI can still be enabled/disabled by specifying it explicitly. Enable for 1.1 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi Disable for 1.2 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/Makefile.objs | 2 +- hw/cpu_flags.c| 32 hw/cpu_flags.h| 9 + hw/pc_piix.c | 2 ++ target-i386/cpu.c | 8 5 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 hw/cpu_flags.c create mode 100644 hw/cpu_flags.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 850b87b..3f2532a 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -1,5 +1,5 @@ hw-obj-y = usb/ ide/ -hw-obj-y += loader.o +hw-obj-y += loader.o cpu_flags.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c new file mode 100644 index 000..2422d20 --- /dev/null +++ b/hw/cpu_flags.c @@ -0,0 +1,32 @@ +/* + * CPU compatibility flags. + * + * Copyright (c) 2012 Red Hat Inc. + * Author: Michael S. Tsirkin. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ +#include hw/cpu_flags.h + +static bool __kvm_pv_eoi_disabled; Don't use identifiers with leading underscores. C99 spec says Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second underscore. what are chances of compiler predefining macro __kvm_pv_eoi_disabled? But OK, will rename _kvm_pv_eoi_disabled. _ + lower case is guaranteed OK. + +void disable_kvm_pv_eoi(void) +{ + __kvm_pv_eoi_disabled = true; +} + +bool kvm_pv_eoi_disabled(void) +{ + return __kvm_pv_eoi_disabled; +} diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h new file mode 100644 index 000..05777b6 --- /dev/null +++ b/hw/cpu_flags.h @@ -0,0 +1,9 @@ +#ifndef HW_CPU_FLAGS_H +#define HW_CPU_FLAGS_H + +#include stdbool.h + +void disable_kvm_pv_eoi(void); +bool kvm_pv_eoi_disabled(void); + +#endif diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 008d42f..bdbceda 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -46,6 +46,7 @@ #ifdef CONFIG_XEN # include xen/hvm/hvm_info_table.h #endif +#include cpu_flags.h #define MAX_IDE_BUS 2 @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = { static void pc_machine_v1_1_compat(void) { +disable_kvm_pv_eoi(); } static void pc_init_pci_v1_1(ram_addr_t ram_size, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 120a2e3..0d02fd1 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -23,6 +23,7 @@ #include cpu.h #include kvm.h +#include asm/kvm_para.h #include qemu-option.h #include qemu-config.h @@ -33,6 +34,7 @@ #include hyperv.h #include hw/hw.h +#include hw/cpu_flags.h /* feature flags taken from Intel Processor Identification and the CPUID * Instruction and AMD's CPUID Specification. In cases of disagreement @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model) plus_kvm_features = ~0; /* not supported bits will be filtered out later */ +/* Disable PV EOI for old machine types. + * Feature flags can still override. 
*/ +if (kvm_pv_eoi_disabled()) { +plus_kvm_features = ~(0x1 KVM_FEATURE_PV_EOI); +} + add_flagname_to_bitmaps(hypervisor, plus_features, plus_ext_features, plus_ext2_features, plus_ext3_features, plus_kvm_features, plus_svm_features); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
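The last hunk quoted above clears one bit in the default KVM feature mask for old machine types and then lets explicit -cpu flags override it. The shift and and-assign operators appear to have been lost in this archive copy, so here is a small standalone sketch of that masking step. It is a simplified model, not QEMU's code: the function and macro names are hypothetical, and only the bit number 6 comes from the KVM_FEATURE_PV_EOI definition in the header patch earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit number as defined for KVM_FEATURE_PV_EOI in the header update. */
#define EXAMPLE_FEATURE_PV_EOI 6
#define EXAMPLE_PV_EOI_MASK (UINT32_C(1) << EXAMPLE_FEATURE_PV_EOI)

/*
 * Hypothetical model of the flow: start with every feature bit set,
 * drop PV EOI when the machine type asks for it, then apply the
 * user's explicit +feature / -feature requests on top.
 */
static uint32_t effective_kvm_features(int pv_eoi_disabled,
                                       uint32_t user_plus,
                                       uint32_t user_minus)
{
    uint32_t features = ~UINT32_C(0);

    /* The two masks are not expected to overlap in this model. */
    assert((user_plus & user_minus) == 0);

    if (pv_eoi_disabled) {
        features &= ~EXAMPLE_PV_EOI_MASK;   /* clear only bit 6 */
    }
    features |= user_plus;                  /* e.g. +kvm_pv_eoi */
    features &= ~user_minus;                /* e.g. -kvm_pv_eoi */
    return features;
}
```

In this model, an old machine type with no flags ends up with bit 6 clear, while the same machine type with an explicit plus flag gets it back, which is the behaviour the commit message describes for -M pc-1.1 -cpu kvm64,+kvm_pv_eoi.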
Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote: In preparation for adding PV EOI support, disable PV EOI by default for 1.1 and older machine types, to avoid CPUID changing during migration. PV EOI can still be enabled/disabled by specifying it explicitly. Enable for 1.1 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi Disable for 1.2 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/Makefile.objs | 2 +- hw/cpu_flags.c| 32 hw/cpu_flags.h| 9 + hw/pc_piix.c | 2 ++ target-i386/cpu.c | 8 5 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 hw/cpu_flags.c create mode 100644 hw/cpu_flags.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 850b87b..3f2532a 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -1,5 +1,5 @@ hw-obj-y = usb/ ide/ -hw-obj-y += loader.o +hw-obj-y += loader.o cpu_flags.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c new file mode 100644 index 000..2422d20 --- /dev/null +++ b/hw/cpu_flags.c @@ -0,0 +1,32 @@ +/* + * CPU compatibility flags. + * + * Copyright (c) 2012 Red Hat Inc. + * Author: Michael S. Tsirkin. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ +#include hw/cpu_flags.h + +static bool __kvm_pv_eoi_disabled; Don't use identifiers with leading underscores. + +void disable_kvm_pv_eoi(void) +{ + __kvm_pv_eoi_disabled = true; +} + +bool kvm_pv_eoi_disabled(void) +{ + return __kvm_pv_eoi_disabled; +} diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h new file mode 100644 index 000..05777b6 --- /dev/null +++ b/hw/cpu_flags.h @@ -0,0 +1,9 @@ +#ifndef HW_CPU_FLAGS_H +#define HW_CPU_FLAGS_H + +#include stdbool.h + +void disable_kvm_pv_eoi(void); +bool kvm_pv_eoi_disabled(void); + +#endif diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 008d42f..bdbceda 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -46,6 +46,7 @@ #ifdef CONFIG_XEN # include xen/hvm/hvm_info_table.h #endif +#include cpu_flags.h #define MAX_IDE_BUS 2 @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = { static void pc_machine_v1_1_compat(void) { +disable_kvm_pv_eoi(); } static void pc_init_pci_v1_1(ram_addr_t ram_size, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 120a2e3..0d02fd1 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -23,6 +23,7 @@ #include cpu.h #include kvm.h +#include asm/kvm_para.h #include qemu-option.h #include qemu-config.h @@ -33,6 +34,7 @@ #include hyperv.h #include hw/hw.h +#include hw/cpu_flags.h /* feature flags taken from Intel Processor Identification and the CPUID * Instruction and AMD's CPUID Specification. In cases of disagreement @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model) plus_kvm_features = ~0; /* not supported bits will be filtered out later */ +/* Disable PV EOI for old machine types. + * Feature flags can still override. 
*/ +if (kvm_pv_eoi_disabled()) { +plus_kvm_features = ~(0x1 KVM_FEATURE_PV_EOI); +} + add_flagname_to_bitmaps(hypervisor, plus_features, plus_ext_features, plus_ext2_features, plus_ext3_features, plus_kvm_features, plus_svm_features); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On Mon, Aug 27, 2012 at 7:01 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Mon, Aug 27, 2012 at 06:56:38PM +, Blue Swirl wrote:
>>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>>> +{
>>> +    AssignedDevRegion *d = opaque;
>>> +    uint8_t *in = d->u.r_virtbase + addr;
>>
>> Don't perform arithmetic with void pointers.
>
> Why not? We require gcc and it's a documented extension there.

We don't require GCC, Clang can be used for some targets already. Though it supports this non-standard extension too.

It's a bad idea to introduce dependencies where it's not necessary. In this case it's not much effort to add the identifier for the struct and in fact the only benefit ever is that the lazy coder saves a few key presses.
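For context on the disagreement: ISO C does not define arithmetic on void pointers, while GCC (and Clang) accept it as an extension that treats the pointed-to size as 1. A minimal sketch of a portable spelling is below. The helper name is hypothetical, and the sketch assumes the base pointer is a plain void pointer, as the review comment suggests for r_virtbase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read one byte at `offset` from an untyped base.
 * Casting to a byte pointer first makes the addition well defined in
 * ISO C, instead of relying on the void-pointer arithmetic extension.
 */
static uint8_t read_byte_at(const void *base, size_t offset)
{
    const uint8_t *in;

    assert(base != NULL);
    in = (const uint8_t *)base + offset;
    return *in;
}
```

The cast costs a few characters and yields the same address the extension would compute, since the extension also steps in units of one byte.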
Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote: In preparation for adding PV EOI support, disable PV EOI by default for 1.1 and older machine types, to avoid CPUID changing during migration. PV EOI can still be enabled/disabled by specifying it explicitly. Enable for 1.1 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi Disable for 1.2 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/Makefile.objs | 2 +- hw/cpu_flags.c| 32 hw/cpu_flags.h| 9 + hw/pc_piix.c | 2 ++ target-i386/cpu.c | 8 5 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 hw/cpu_flags.c create mode 100644 hw/cpu_flags.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 850b87b..3f2532a 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -1,5 +1,5 @@ hw-obj-y = usb/ ide/ -hw-obj-y += loader.o +hw-obj-y += loader.o cpu_flags.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c new file mode 100644 index 000..2422d20 --- /dev/null +++ b/hw/cpu_flags.c @@ -0,0 +1,32 @@ +/* + * CPU compatibility flags. + * + * Copyright (c) 2012 Red Hat Inc. + * Author: Michael S. Tsirkin. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ +#include hw/cpu_flags.h + +static bool __kvm_pv_eoi_disabled; Don't use identifiers with leading underscores. C99 spec says Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second underscore. what are chances of compiler predefining macro __kvm_pv_eoi_disabled? Why do you even consider that since it's trivially easy to use something else? If a standard (and HACKING in our case) specifies something, why do you want to fight it? But OK, will rename _kvm_pv_eoi_disabled. _ + lower case is guaranteed OK. No, just use kvm_pv_eoi_disabled, the underscore is useless. + +void disable_kvm_pv_eoi(void) +{ + __kvm_pv_eoi_disabled = true; +} + +bool kvm_pv_eoi_disabled(void) +{ + return __kvm_pv_eoi_disabled; +} diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h new file mode 100644 index 000..05777b6 --- /dev/null +++ b/hw/cpu_flags.h @@ -0,0 +1,9 @@ +#ifndef HW_CPU_FLAGS_H +#define HW_CPU_FLAGS_H + +#include stdbool.h + +void disable_kvm_pv_eoi(void); +bool kvm_pv_eoi_disabled(void); + +#endif diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 008d42f..bdbceda 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -46,6 +46,7 @@ #ifdef CONFIG_XEN # include xen/hvm/hvm_info_table.h #endif +#include cpu_flags.h #define MAX_IDE_BUS 2 @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = { static void pc_machine_v1_1_compat(void) { +disable_kvm_pv_eoi(); } static void pc_init_pci_v1_1(ram_addr_t ram_size, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 120a2e3..0d02fd1 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -23,6 +23,7 @@ #include cpu.h #include kvm.h +#include asm/kvm_para.h #include qemu-option.h #include qemu-config.h @@ -33,6 +34,7 @@ #include hyperv.h #include hw/hw.h +#include hw/cpu_flags.h /* feature flags taken from Intel Processor Identification and 
the CPUID * Instruction and AMD's CPUID Specification. In cases of disagreement
@@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
     plus_kvm_features = ~0; /* not supported bits will be filtered out later */
 
+    /* Disable PV EOI for old machine types.
+     * Feature flags can still override. */
+    if (kvm_pv_eoi_disabled()) {
+        plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
+    }
+
     add_flagname_to_bitmaps("hypervisor", &plus_features, &plus_ext_features,
             &plus_ext2_features, &plus_ext3_features,
             &plus_kvm_features, &plus_svm_features);
--
MST
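The compat hunk above boils down to clearing one bit in the default-on KVM feature mask before any explicit +/- flags from -cpu are applied. A minimal sketch of that logic, assuming a plain 32-bit mask; the helper names `default_kvm_features` and `apply_plus_flag` are hypothetical and only `KVM_FEATURE_PV_EOI` (bit 6) comes from the header quoted in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number from linux-headers/asm-x86/kvm_para.h as quoted in this thread. */
#define KVM_FEATURE_PV_EOI 6

/*
 * Hypothetical helper: compute the default-on KVM feature mask.
 * Everything starts enabled (unsupported bits are filtered out later);
 * PV EOI is dropped when an old machine type requested it.
 */
static uint32_t default_kvm_features(bool pv_eoi_disabled)
{
    uint32_t plus_kvm_features = ~0u;

    if (pv_eoi_disabled) {
        plus_kvm_features &= ~(UINT32_C(1) << KVM_FEATURE_PV_EOI);
    }
    return plus_kvm_features;
}

/* Hypothetical helper: an explicit "+flag" on -cpu sets the bit again. */
static uint32_t apply_plus_flag(uint32_t features, unsigned bit)
{
    return features | (UINT32_C(1) << bit);
}
```

With this shape, `-M pc-1.1 -cpu kvm64,+kvm_pv_eoi` corresponds to calling `default_kvm_features(true)` and then `apply_plus_flag(..., KVM_FEATURE_PV_EOI)`, which is why the explicit flag can still override the machine-type default.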
Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific
On Mon, Aug 27, 2012 at 11:41:08PM +0900, Takuya Yoshikawa wrote:
> On Fri, 24 Aug 2012 15:54:59 -0300
> Marcelo Tosatti mtosa...@redhat.com wrote:
>
> > Other arches do not need this.
> >
> > Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> >
> > Index: kvm/arch/x86/kvm/x86.c
> > ===
> > --- kvm.orig/arch/x86/kvm/x86.c
> > +++ kvm/arch/x86/kvm/x86.c
> > @@ -6455,6 +6455,14 @@ void kvm_arch_commit_memory_region(struc
> >  		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> >  	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
> >  	spin_unlock(&kvm->mmu_lock);
> > +	/*
> > +	 * If the new memory slot is created, we need to clear all
> > +	 * mmio sptes.
> > +	 */
> > +	if (old.npages == 0 && npages) {
> > +		kvm_mmu_zap_all(kvm);
> > +		kvm_reload_remote_mmus(kvm);
> > +	}
> >  }
>
> Any explanation why (old.base_gfn != new.base_gfn) case can be
> omitted?

(old.base_gfn != new.base_gfn) check covers the cases

1. old.base_gfn = 0, new.base_gfn = !0 (slot creation)

and

2. old.base_gfn = x, new.base_gfn = y, where x != 0, y != 0, x != y
   (gpa base change)

Patch 2 covers case 2, so it's only necessary to cover case 1 here.

Makes sense?

> 	Takuya
>
> >  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> > Index: kvm/virt/kvm/kvm_main.c
> > ===
> > --- kvm.orig/virt/kvm/kvm_main.c
> > +++ kvm/virt/kvm/kvm_main.c
> > @@ -849,13 +849,6 @@ int __kvm_set_memory_region(struct kvm *
> >
> >  	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
> >
> > -	/*
> > -	 * If the new memory slot is created, we need to clear all
> > -	 * mmio sptes.
> > -	 */
> > -	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > -		kvm_arch_flush_shadow_all(kvm);
> > -
> >  	kvm_free_physmem_slot(&old, &new);
> >  	kfree(old_memslots);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
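The two cases in the reply can be written down as a small classifier. This is only a sketch of the reasoning, not kernel code: `struct slot`, `enum slot_change` and `classify()` are hypothetical names, and the creation test follows the `old.npages == 0 && npages` condition from the x86 hunk.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct kvm_memory_slot. */
struct slot {
    uint64_t base_gfn;
    uint64_t npages;
};

enum slot_change {
    SLOT_OTHER,    /* anything not discussed in this thread */
    SLOT_CREATED,  /* case 1: slot creation, flushed in the x86 commit hook */
    SLOT_MOVED     /* case 2: gpa base change, covered by patch 2 */
};

static enum slot_change classify(const struct slot *old,
                                 const struct slot *new_slot)
{
    if (old->npages == 0 && new_slot->npages != 0)
        return SLOT_CREATED;
    if (old->npages != 0 && new_slot->npages != 0 &&
        old->base_gfn != new_slot->base_gfn)
        return SLOT_MOVED;
    return SLOT_OTHER;
}
```

Under this reading, the generic `base_gfn` comparison removed from kvm_main.c matched both `SLOT_CREATED` and `SLOT_MOVED`, while the new x86-only check needs to match just `SLOT_CREATED`.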
Re: [PATCHv2] kvm: Fix nonsense handling of compat ioctl
On Wed, Aug 22, 2012 at 02:34:11PM +0100, Alan Cox wrote:
> From: Alan Cox a...@linux.intel.com
>
> KVM_SET_SIGNAL_MASK passed a NULL argument leaves the on stack signal
> sets uninitialized. It then passes them through to
> kvm_vcpu_ioctl_set_sigmask.
>
> We should be passing a NULL in this case not translated garbage.
>
> Signed-off-by: Alan Cox a...@linux.intel.com

Applied, thanks.
Re: setting time in guest with ntpdate results in VM hang
On 8/27/12 10:58 AM, Dale Swanston wrote:
> Good idea. I'll try that. But are there any tools available to
> determine what the VM is doing when it appears hung? I've looked but
> haven't found much on debug or diagnostics on a running VM. Any links?

If you have the vmlinux, enable the gdbserver stub via Qemu's monitor.
Then use 'gdb vmlinux', connect to the VM 'target remote host:port' and
look at the backtrace.

I have seen something similar using kvm-clock in a guest running 2.6.27.

David
Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
On Mon, Aug 27, 2012 at 07:12:27PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote: In preparation for adding PV EOI support, disable PV EOI by default for 1.1 and older machine types, to avoid CPUID changing during migration. PV EOI can still be enabled/disabled by specifying it explicitly. Enable for 1.1 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi Disable for 1.2 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/Makefile.objs | 2 +- hw/cpu_flags.c| 32 hw/cpu_flags.h| 9 + hw/pc_piix.c | 2 ++ target-i386/cpu.c | 8 5 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 hw/cpu_flags.c create mode 100644 hw/cpu_flags.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 850b87b..3f2532a 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -1,5 +1,5 @@ hw-obj-y = usb/ ide/ -hw-obj-y += loader.o +hw-obj-y += loader.o cpu_flags.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c new file mode 100644 index 000..2422d20 --- /dev/null +++ b/hw/cpu_flags.c @@ -0,0 +1,32 @@ +/* + * CPU compatibility flags. + * + * Copyright (c) 2012 Red Hat Inc. + * Author: Michael S. Tsirkin. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along
>>>> + * with this program; if not, see http://www.gnu.org/licenses/.
>>>> + */
>>>> +#include "hw/cpu_flags.h"
>>>> +
>>>> +static bool __kvm_pv_eoi_disabled;
>>>
>>> Don't use identifiers with leading underscores.
>>
>> C99 spec says
>> Any other predefined macro names shall begin with a leading underscore
>> followed by an uppercase letter or a second underscore.
>>
>> what are chances of compiler predefining macro __kvm_pv_eoi_disabled?
>
> Why do you even consider that since it's trivially easy to use
> something else? If a standard (and HACKING in our case) specifies
> something, why do you want to fight it?

I missed this in HACKING, you are right:

	2.4. Reserved namespaces in C and POSIX
	Underscore capital, double underscore, and underscore 't' suffixes
	should be avoided.

so _kvm_pv_eoi_disabled is ok __kvm_pv_eoi_disabled is not. Will fix.

>> But OK, will rename _kvm_pv_eoi_disabled. _ + lower case is guaranteed OK.
>
> No, just use kvm_pv_eoi_disabled, the underscore is useless.

It isn't useless, this avoids conflict with function name.
_ says it's an internal variable used to implement kvm_pv_eoi_disabled
in a very clear way.

--
MST
Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
On 8/27/12 9:53 AM, Andrew Jones wrote:
> On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:
>
> snip
>
>> +struct event_stats {
>> +	u64 count;
>> +	u64 time;
>> +
>> +	/* used to calculate stddev. */
>> +	double mean;
>> +	double M2;
>> +};
>
> How about moving the stats functions from builtin-stat.c to e.g.
> util/stats.c, and then reusing them? Then this struct (which I would
> rename to kvm_event_stats) would look like this
>
> struct kvm_event_stats {
> 	u64 time;
> 	struct stats stats;
> };
>
> of course the get_event_ accessor generators would need tweaking

Given the history of the command (first submitted back in February) code
refactoring can wait until there is a second user for the stats code.

David
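For readers wondering what the `mean` and `M2` fields are for: a running mean plus a running sum of squared deviations is the usual state for Welford's online variance algorithm. The sketch below assumes that is what the patch does (the update code is not quoted in this message); `update_event_stats` and `event_variance` are hypothetical names, and the struct uses `uint64_t` in place of the kernel-style `u64`.

```c
#include <stdint.h>

struct event_stats {
    uint64_t count;
    uint64_t time;

    /* used to calculate stddev. */
    double mean;
    double M2;
};

/* Fold one sample into the running statistics (Welford's update). */
static void update_event_stats(struct event_stats *s, uint64_t time_diff)
{
    double delta;

    s->count++;
    s->time += time_diff;

    delta = (double)time_diff - s->mean;
    s->mean += delta / (double)s->count;
    s->M2 += delta * ((double)time_diff - s->mean);
}

/* Sample variance; the standard deviation is its square root. */
static double event_variance(const struct event_stats *s)
{
    return s->count > 1 ? s->M2 / (double)(s->count - 1) : 0.0;
}
```

The point of keeping `M2` instead of a sum of squares is that the variance can be updated one sample at a time without storing the samples and without the cancellation problems of the naive formula.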
Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
On Mon, Aug 27, 2012 at 7:24 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Aug 27, 2012 at 07:12:27PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Aug 27, 2012 at 06:58:29PM +, Blue Swirl wrote: On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin m...@redhat.com wrote: In preparation for adding PV EOI support, disable PV EOI by default for 1.1 and older machine types, to avoid CPUID changing during migration. PV EOI can still be enabled/disabled by specifying it explicitly. Enable for 1.1 -M pc-1.1 -cpu kvm64,+kvm_pv_eoi Disable for 1.2 -M pc-1.2 -cpu kvm64,-kvm_pv_eoi Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/Makefile.objs | 2 +- hw/cpu_flags.c| 32 hw/cpu_flags.h| 9 + hw/pc_piix.c | 2 ++ target-i386/cpu.c | 8 5 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 hw/cpu_flags.c create mode 100644 hw/cpu_flags.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 850b87b..3f2532a 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -1,5 +1,5 @@ hw-obj-y = usb/ ide/ -hw-obj-y += loader.o +hw-obj-y += loader.o cpu_flags.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c new file mode 100644 index 000..2422d20 --- /dev/null +++ b/hw/cpu_flags.c @@ -0,0 +1,32 @@ +/* + * CPU compatibility flags. + * + * Copyright (c) 2012 Red Hat Inc. + * Author: Michael S. Tsirkin. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the
>>>>> + * GNU General Public License for more details.
>>>>> + *
>>>>> + * You should have received a copy of the GNU General Public License along
>>>>> + * with this program; if not, see http://www.gnu.org/licenses/.
>>>>> + */
>>>>> +#include "hw/cpu_flags.h"
>>>>> +
>>>>> +static bool __kvm_pv_eoi_disabled;
>>>>
>>>> Don't use identifiers with leading underscores.
>>>
>>> C99 spec says
>>> Any other predefined macro names shall begin with a leading underscore
>>> followed by an uppercase letter or a second underscore.
>>>
>>> what are chances of compiler predefining macro __kvm_pv_eoi_disabled?
>>
>> Why do you even consider that since it's trivially easy to use
>> something else? If a standard (and HACKING in our case) specifies
>> something, why do you want to fight it?
>
> I missed this in HACKING, you are right:
>
> 	2.4. Reserved namespaces in C and POSIX
> 	Underscore capital, double underscore, and underscore 't' suffixes
> 	should be avoided.
>
> so _kvm_pv_eoi_disabled is ok __kvm_pv_eoi_disabled is not. Will fix.

No leading underscores. They are not used in QEMU.

>>> But OK, will rename _kvm_pv_eoi_disabled. _ + lower case is guaranteed OK.
>>
>> No, just use kvm_pv_eoi_disabled, the underscore is useless.
>
> It isn't useless, this avoids conflict with function name.
> _ says it's an internal variable used to implement kvm_pv_eoi_disabled
> in a very clear way.

Sure, but there are infinite number of ways of making the identifiers
unique. Using leading underscores is a way to ever conflict with
compiler, linker, libc, POSIX etc. Don't do it. Where's your
imagination, can't you invent any other prefix or suffix?

> --
> MST
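One way to satisfy both sides of this exchange is to keep the accessor name and give the file-local variable a distinguishing suffix instead of a leading underscore. The sketch below is only an illustration of that option; the variable name `pv_eoi_disabled_flag` is made up here and is not necessarily what was eventually merged.

```c
#include <stdbool.h>

/*
 * File-local state. "static" already keeps it out of the global
 * namespace; the suffix only avoids clashing with the accessor below.
 * The name is illustrative, not taken from QEMU.
 */
static bool pv_eoi_disabled_flag;

void disable_kvm_pv_eoi(void)
{
    pv_eoi_disabled_flag = true;
}

bool kvm_pv_eoi_disabled(void)
{
    return pv_eoi_disabled_flag;
}
```

This keeps the public interface from the patch (`disable_kvm_pv_eoi()` and `kvm_pv_eoi_disabled()`) unchanged while staying clear of every identifier class the C standard reserves.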
/dev/kvm not sufficiently restricted, and in ways I didn't think were possible
I'm completely confused about access to /dev/kvm. In particular, it
looks like it is too open to access, but in a way that I don't
understand.

On my machine, /dev/kvm is owned by root.root and mode 660. Here is the
output of ls:

    % ls -l /dev/kvm
    crw-rw----+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm

Despite that, when a process is uid 1000 and group id 1000, and not in
any other groups, I can open /dev/kvm. I.e., here are the relevant lines
from /proc/pid/status:

    Uid:    1000    1000    1000    1000
    Gid:    1000    1000    1000    1000
    Groups: 1000

Note, just to show this isn't some weirdness in /etc/passwd or
/etc/group, here is the output of stat on /dev/kvm:

      File: `/dev/kvm'
      Size: 0    Blocks: 0    IO Block: 4096    character special file
    Device: 5h/5d    Inode: 2597329    Links: 1    Device type: a,e8
    Access: (0660/crw-rw----)  Uid: (0/root)  Gid: (0/root)
    Access: 2012-08-24 15:03:33.616998585 -0500
    Modify: 2012-08-24 15:03:33.616998585 -0500
    Change: 2012-08-24 15:03:33.616998585 -0500

Please note, I don't understand how this could really be. Regardless of
what the /dev/kvm driver does, I don't get how I can get to open it if
the file which `is' the device doesn't have the correct permissions.
The driver can make access more restrictive than the file permissions,
but not less restrictive, or so I thought.

Also, if I try opening /dev/kvm as uid 1001 and group id 1000, again not
in any other groups, it fails.

I don't understand how this could be. Also, it means that uid 1000/gid
1000 can run virtual processes. I want to be able to limit that, and I
would have thought that /dev/kvm having mode 660 and being owned by
root.root would have done it.

If it is any help, I am running a stock Debian Squeeze. The kernel is
2.6.32-5-amd64.

Any help or pointers explaining how /dev/kvm can be opened by uid
1000/gid 1000 would be greatly appreciated. Also any explanation about
why uid 1000 is different than 1001.

Thanks
Re: /dev/kvm not sufficiently restricted, and in ways I didn't think were possible
On 08/27/2012 01:11 PM, Henry Cejtin wrote: I'm completely confused about access to /dev/kvm. In particular, it looks like it is too open to access, but in a way that I don't understand. On my machine, /dev/kvm is owned by root.root and mode 660. Here is the output of ls: % ls -l /dev/kvm crw-rw+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm Despite that, when a process is uid 1000 and group id 1000, and not in any other groups, I can open /dev/kvm. I.e., here are the relevant lines from /proc/pid/status: Uid:1000100010001000 Gid:1000100010001000 Groups: 1000 Note, just to show this isn't some weirdness in /etc/passwd or /etc/groups, here is the output of stat on /dev/kvm: File: `/dev/kvm' Size: 0 Blocks: 0 IO Block: 4096 character special file Device: 5h/5d Inode: 2597329 Links: 1 Device type: a,e8 Access: (0660/crw-rw) Uid: (0/root) Gid: (0/root) Access: 2012-08-24 15:03:33.616998585 -0500 Modify: 2012-08-24 15:03:33.616998585 -0500 Change: 2012-08-24 15:03:33.616998585 -0500 Please note, I don't understand how this could really be. Regardless of what the /dev/kvm driver does, I don't get how I can get to open it if the file which `is' the device doesn't have the correct permissions. The driver can make access more restrictive than the file permissions, but not less restrictive, or so I thought. Also, if I try opening /dev/kvm as uid 1001 and group id 1000, again not in any other groups, it fails. I don't understand how this could be. Also, it means that uid 1000/gid 1000 can run virtual processes. I want to be able to limit that, and I would have thought that /dev/kvm having mode 660 and being owned by root.root would have done it. If it is any help, I am running a stock Debian Squeeze. The kernel is 2.6.32-5-amd64. Any help or pointers explaining how /dev/kvm can be opened by uid 1000/gid 1000 would be greatly appreciated. Also any explanation about why uid 1000 is different than 1001. Strange. 
Try changing the permissions to 600 or 060 to see if it's the user or
group that allows you access.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
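To run the suggested experiment repeatably, a tiny probe that attempts the open and reports the errno can be run as the unprivileged user after each chmod. This is a sketch only; `probe_open` is a hypothetical helper, and it uses nothing beyond the POSIX `open()` and `close()` calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a device node read/write as the calling user.
 * Returns 0 on success, or the errno value (e.g. EACCES) on failure.
 */
static int probe_open(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling `probe_open("/dev/kvm")` as uid 1000 after `chmod 600` and again after `chmod 060` shows which permission class, if either, is letting the open through. Separately from what was suggested in the thread: a trailing `+` in `ls -l` output normally means the node carries a POSIX ACL, which the mode bits printed by stat do not show, so `getfacl /dev/kvm` may also be worth a look.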
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On Mon, 2012-08-27 at 11:50 -0700, Glauber Costa wrote: On 08/27/2012 08:50 AM, Michael Wolf wrote: On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote: On 08/24/2012 11:11 AM, Michael Wolf wrote: On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote: On 08/24/2012 03:14 AM, Michael Wolf wrote: This is an RFC regarding the reporting of stealtime. In the case of where you have a system that is running with partial processors such as KVM the user may see steal time being reported in accounting tools such as top or vmstat. This can cause confusion for the end user. To ease the confusion this patch set adds a sysctl interface to set the cpu entitlement. This is the percentage of cpu that the guest system is expected to receive. As long as the steal time is within its expected range it will show up as 0 in /proc/stat. The user will then see in the accounting tools that they are getting a full utilization of the cpu resources assigned to them. And how is such a knob not confusing? Steal time is pretty well defined in meaning and is shown in top for ages. I really don't see the point for this. Currently you can see the steal time but you have no way of knowing if the cpu utilization you are seeing on the guest is the expected amount. I decided on making it a knob because a guest could be migrated to another system and it's entitlement could change because of hardware or load differences. It could simply be a /proc file and report the current entitlement if needed. As things are currently implemented I don't see how someone knows if the guest is running as expected or whether there is a problem. Turning off steal time display won't get even close to displaying the information you want. What you probably want is a guest-visible way to say how many miliseconds you are expected to run each second. Right? It is not clear to me how knowing how many milliseconds you are expecting to run will help the user. 
Currently the users will run top to see how well the guest is running. If they see _any_ steal time some users think they are not getting the full use of their processor entitlement. And your plan is just to selectively lie about it, but disabling it with a knob? It is about making it very obvious to the end user whether they are receiving their cpu entitlement. If there is more steal time than expected that will still show up. I have experimented, and it seems to work, to put the raw stealtime at the end of each cpu line in /proc/stat. That way the raw data is there as well. Do you have another suggestion to communicate to the user whether they are receiving their full entitlement? At the very least shouldn't the entitlement reside in a /proc file somewhere so that the user could look up the value and do the math? Maybe I'm missing what you are proposing, but even if you knew the milliseconds that you were expecting for your processor you would have to adjust the top output in your head so to speak. You would see the utilization and then say 'ok that matches the number of milliseconds I expected to run... If we take away the steal time (as long as it is equal to or less than the expected amount of steal time) then the user running top will see the 100% utilization. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
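The patches themselves are not quoted in this thread, so the following is only one plausible reading of "as long as the steal time is within its expected range it will show up as 0": steal time is reported only for the part that exceeds what the entitlement already implies. The function name, the nanosecond units and the formula are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Assumed model: a guest entitled to `entitlement_pct` percent of a CPU
 * expects to lose the remaining share of each interval to the host.
 * Only steal time beyond that expected share is reported.
 */
static uint64_t reported_steal(uint64_t raw_steal_ns, uint64_t interval_ns,
                               unsigned entitlement_pct)
{
    uint64_t expected_steal_ns;

    if (entitlement_pct > 100)
        entitlement_pct = 100;
    expected_steal_ns = interval_ns / 100 * (100 - entitlement_pct);

    return raw_steal_ns > expected_steal_ns
               ? raw_steal_ns - expected_steal_ns
               : 0;
}
```

With a 50% entitlement and a one-second interval, 400 ms of raw steal would be displayed as 0 and 600 ms as 100 ms; with the default 100% entitlement the raw value passes through unchanged, which matches the statement that the default behaves as before.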
Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand
On 08/26/2012 10:04 AM, Marcelo Tosatti wrote: On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote: On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote: On 08/17/2012 08:29 PM, Marcelo Tosatti wrote: On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote: Instead of populating the the entire register file, read in registers as they are accessed, and write back only the modified ones. This saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually used during emulation), and a two 128-byte copies for the registers. @@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, { int rc; + invalidate_registers(ctxt); ctxt-_eip = ctxt-eip; ctxt-dst.type = OP_NONE; rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason, has_error_code, error_code); - if (rc == X86EMUL_CONTINUE) + if (rc == X86EMUL_CONTINUE) { ctxt-eip = ctxt-_eip; + writeback_registers(ctxt); + } return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK; } No clear point when emulator register cache is active, when it is not (AFAICS this patch does not invalidate registers on emulation start (the above being one of the exceptions) does not clear valid bit on writeback-to-vcpu-cache on emulation exit). It is cleared when emulation starts. For the non-insn-emulation entry points, there is an explicit invalidate. For the emulation entry point, there is a memset() that clears everything up to _regs, which includes the cache. This discrepancy isn't nice, but it preexists. I don't know whether we should decompose the memset() or not, it is rather efficient. Concern is that emulator can start with cached registers marked as valid but in fact are invalid from previous emulation round. Maybe move invalidate() to init_emulate_ctxt? See the memset() in init_decode_cache(). Right. Applied, thanks. Actually, had to revert because autotest was failing. Was it failing because of this patch? Or what? 
> Now it rejects:
>
> 4 out of 49 hunks FAILED -- saving rejects to file
> arch/x86/kvm/emulate.c.rej
>
> Please regenerate.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On Mon, 2012-08-27 at 11:55 -0700, Avi Kivity wrote:
> On 08/23/2012 04:14 PM, Michael Wolf wrote:
> > This is an RFC regarding the reporting of stealtime. In the case of
> > where you have a system that is running with partial processors such
> > as KVM the user may see steal time being reported in accounting tools
> > such as top or vmstat. This can cause confusion for the end user. To
> > ease the confusion this patch set adds a sysctl interface to set the
> > cpu entitlement. This is the percentage of cpu that the guest system
> > is expected to receive. As long as the steal time is within its
> > expected range it will show up as 0 in /proc/stat. The user will then
> > see in the accounting tools that they are getting a full utilization
> > of the cpu resources assigned to them.
> >
> > This patchset is changing the contents/output of /proc/stat and could
> > affect user tools. However the default setting is that the cpu is
> > entitled to 100% so the code will act as before. Also another field
> > could be added to the /proc/stat output and show the unaltered steal
> > time. Since this additional field could cause more confusion than it
> > would clear up I have left it out for now.
>
> How would a guest know what its entitlement is?

Currently the Admin/management tool setting up the guests will put it on
the qemu commandline. From this it is passed via an ioctl to the host.
The guest will get the value from the host via a hypercall.

In the future the host could try and do some of it automatically in some
cases.
Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand
On 08/27/2012 01:22 PM, Avi Kivity wrote: On 08/26/2012 10:04 AM, Marcelo Tosatti wrote: On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote: On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote: On 08/17/2012 08:29 PM, Marcelo Tosatti wrote: On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote: Instead of populating the the entire register file, read in registers as they are accessed, and write back only the modified ones. This saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually used during emulation), and a two 128-byte copies for the registers. @@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, { int rc; +invalidate_registers(ctxt); ctxt-_eip = ctxt-eip; ctxt-dst.type = OP_NONE; rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason, has_error_code, error_code); -if (rc == X86EMUL_CONTINUE) +if (rc == X86EMUL_CONTINUE) { ctxt-eip = ctxt-_eip; +writeback_registers(ctxt); +} return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK; } No clear point when emulator register cache is active, when it is not (AFAICS this patch does not invalidate registers on emulation start (the above being one of the exceptions) does not clear valid bit on writeback-to-vcpu-cache on emulation exit). It is cleared when emulation starts. For the non-insn-emulation entry points, there is an explicit invalidate. For the emulation entry point, there is a memset() that clears everything up to _regs, which includes the cache. This discrepancy isn't nice, but it preexists. I don't know whether we should decompose the memset() or not, it is rather efficient. Concern is that emulator can start with cached registers marked as valid but in fact are invalid from previous emulation round. Maybe move invalidate() to init_emulate_ctxt? See the memset() in init_decode_cache(). Right. Applied, thanks. Actually, had to revert because autotest was failing. Was it failing because of this patch? Or what? 
I see, the rsp mask fix.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/27/2012 01:23 PM, Michael Wolf wrote:
>> How would a guest know what its entitlement is?
>
> Currently the Admin/management tool setting up the guests will put it
> on the qemu commandline. From this it is passed via an ioctl to the
> host. The guest will get the value from the host via a hypercall.
>
> In the future the host could try and do some of it automatically in
> some cases.

Seems to me it's a meaningless value for the guest. Suppose it is
migrated to a host that is more powerful, and as a result its relative
entitlement is reduced. The value needs to be adjusted.

This is best taken care of from the host side.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[PATCH v4] KVM: x86 emulator: access GPRs on demand
Instead of populating the entire register file, read in registers
as they are accessed, and write back only the modified ones. This
saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
used during emulation), and two 128-byte copies for the registers.

Signed-off-by: Avi Kivity a...@redhat.com
---

v4: rebased

v3: fix misplaced parentheses in em_loop() and em_jcxz(), unbreaking
    those instructions.

v2: add APIs for managing the register cache. This reduces the potential
    for confusion between ctxt->regs_dirty and vcpu->arch.regs_dirty.
    move cache management to the entry points
    add missing writebacks to int and task switch emulation

 arch/x86/include/asm/kvm_emulate.h |  20 ++-
 arch/x86/kvm/emulate.c             | 299 +++--
 arch/x86/kvm/x86.c                 |  45 +++---
 3 files changed, 220 insertions(+), 144 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c764f43..282aee5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -86,6 +86,19 @@ struct x86_instruction_info {
 
 struct x86_emulate_ops {
 	/*
+	 * read_gpr: read a general purpose register (rax - r15)
+	 *
+	 * @reg: gpr number.
+	 */
+	ulong (*read_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg);
+	/*
+	 * write_gpr: write a general purpose register (rax - r15)
+	 *
+	 * @reg: gpr number.
+	 * @val: value to write.
+	 */
+	void (*write_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg, ulong val);
+	/*
 	 * read_std: Read bytes of standard (non-emulated/special) memory.
 	 *           Used for descriptor reading.
 	 *  @addr:  [IN ] Linear address from which to read.
@@ -281,8 +294,10 @@ struct x86_emulate_ctxt {
 	bool rip_relative;
 	unsigned long _eip;
 	struct operand memop;
+	u32 regs_valid;  /* bitmaps of registers in _regs[] that can be read */
+	u32 regs_dirty;  /* bitmaps of registers in _regs[] that have been written */
 	/* Fields above regs are cleared together. */
-	unsigned long regs[NR_VCPU_REGS];
+	unsigned long _regs[NR_VCPU_REGS];
 	struct operand *memopp;
 	struct fetch_cache fetch;
 	struct read_cache io_read;
@@ -394,4 +409,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 			 u16 tss_selector, int idt_index, int reason,
 			 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
+void emulator_invalidate_register_cache(struct x86_emulate_ctxt *ctxt);
+void emulator_writeback_register_cache(struct x86_emulate_ctxt *ctxt);
+
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e8fb6c5..5e27ba5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -202,6 +202,42 @@ struct gprefix {
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
 #define EFLG_RESERVED_ONE_MASK 2
 
+static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+	if (!(ctxt->regs_valid & (1 << nr))) {
+		ctxt->regs_valid |= 1 << nr;
+		ctxt->_regs[nr] = ctxt->ops->read_gpr(ctxt, nr);
+	}
+	return ctxt->_regs[nr];
+}
+
+static ulong *reg_write(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+	ctxt->regs_valid |= 1 << nr;
+	ctxt->regs_dirty |= 1 << nr;
+	return &ctxt->_regs[nr];
+}
+
+static ulong *reg_rmw(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+	reg_read(ctxt, nr);
+	return reg_write(ctxt, nr);
+}
+
+static void writeback_registers(struct x86_emulate_ctxt *ctxt)
+{
+	unsigned reg;
+
+	for_each_set_bit(reg, (ulong *)&ctxt->regs_dirty, 16)
+		ctxt->ops->write_gpr(ctxt, reg, ctxt->_regs[reg]);
+}
+
+static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
+{
+	ctxt->regs_dirty = 0;
+	ctxt->regs_valid = 0;
+}
+
 /*
  * Instruction emulation:
  * Most instructions are emulated directly via a fragment of inline assembly
@@ -374,8 +410,8 @@ struct gprefix {
 #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex)			\
 	do {								\
 		unsigned long _tmp;					\
-		ulong *rax = &(ctxt)->regs[VCPU_REGS_RAX];		\
-		ulong *rdx = &(ctxt)->regs[VCPU_REGS_RDX];		\
+		ulong *rax = reg_rmw((ctxt), VCPU_REGS_RAX);		\
+		ulong *rdx = reg_rmw((ctxt), VCPU_REGS_RDX);		\
 									\
 		__asm__ __volatile__ (					\
 			_PRE_EFLAGS("0", "5", "1")			\
@@ -494,7 +530,7 @@ static void masked_increment(ulong *reg, ulong mask, int inc)
 static void
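The valid/dirty bitmap scheme in this patch can be tried outside the kernel with a small stand-alone model. The sketch below mirrors the helper names from the patch, but `struct backing` and `struct reg_cache` are hypothetical stand-ins for the vcpu and the emulation context, and the read/write counters exist only to make the saved accesses visible.

```c
#include <stdint.h>

#define NR_GPRS 16

/* Hypothetical backing store standing in for the vcpu's real registers. */
struct backing {
    unsigned long regs[NR_GPRS];
    unsigned reads;   /* number of (expensive) reads performed */
    unsigned writes;  /* number of (expensive) writes performed */
};

struct reg_cache {
    struct backing *hw;
    uint32_t valid;   /* bit n set: cache[n] holds a usable value */
    uint32_t dirty;   /* bit n set: cache[n] must be written back */
    unsigned long cache[NR_GPRS];
};

/* Fetch a register on first use, then serve it from the cache. */
static unsigned long reg_read(struct reg_cache *c, unsigned nr)
{
    if (!(c->valid & (1u << nr))) {
        c->valid |= 1u << nr;
        c->cache[nr] = c->hw->regs[nr];
        c->hw->reads++;
    }
    return c->cache[nr];
}

/* Write-only access: no need to fetch the old value first. */
static unsigned long *reg_write(struct reg_cache *c, unsigned nr)
{
    c->valid |= 1u << nr;
    c->dirty |= 1u << nr;
    return &c->cache[nr];
}

/* Read-modify-write access: fetch, then mark dirty. */
static unsigned long *reg_rmw(struct reg_cache *c, unsigned nr)
{
    reg_read(c, nr);
    return reg_write(c, nr);
}

/* Push only the modified registers back to the backing store. */
static void writeback_registers(struct reg_cache *c)
{
    for (unsigned nr = 0; nr < NR_GPRS; nr++) {
        if (c->dirty & (1u << nr)) {
            c->hw->regs[nr] = c->cache[nr];
            c->hw->writes++;
        }
    }
}

static void invalidate_registers(struct reg_cache *c)
{
    c->valid = 0;
    c->dirty = 0;
}
```

A typical round is `invalidate_registers()` at entry, any mix of `reg_read()`, `*reg_write() = ...` and `*reg_rmw() += ...` during emulation, and `writeback_registers()` on success. Registers that were never touched cost nothing, which is the saving the commit message describes for rsp.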
Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand
On Mon, Aug 27, 2012 at 05:53:32PM -0300, Marcelo Tosatti wrote:
> With the fix, it rejects. About to merge the big real mode patchset, so
> it's not a bad idea to wait for that before resending.

Nevermind this sentence.
Re: [PATCH v3] KVM: x86 emulator: access GPRs on demand
On Mon, Aug 27, 2012 at 01:22:55PM -0700, Avi Kivity wrote:
> On 08/26/2012 10:04 AM, Marcelo Tosatti wrote:
>> On Thu, Aug 23, 2012 at 05:14:27AM -0300, Marcelo Tosatti wrote:
>>> On Sun, Aug 19, 2012 at 12:32:36PM +0300, Avi Kivity wrote:
>>>> On 08/17/2012 08:29 PM, Marcelo Tosatti wrote:
>>>>> On Thu, Aug 16, 2012 at 05:54:49PM +0300, Avi Kivity wrote:
>>>>>> Instead of populating the entire register file, read in registers as they are accessed, and write back only the modified ones. This saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually used during emulation), and two 128-byte copies for the registers.
>>>>>>
>>>>>> @@ -2715,14 +2764,17 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
>>>>>>  {
>>>>>>      int rc;
>>>>>>
>>>>>> +    invalidate_registers(ctxt);
>>>>>>      ctxt->_eip = ctxt->eip;
>>>>>>      ctxt->dst.type = OP_NONE;
>>>>>>
>>>>>>      rc = emulator_do_task_switch(ctxt, tss_selector, idt_index, reason,
>>>>>>                                   has_error_code, error_code);
>>>>>>
>>>>>> -    if (rc == X86EMUL_CONTINUE)
>>>>>> +    if (rc == X86EMUL_CONTINUE) {
>>>>>>          ctxt->eip = ctxt->_eip;
>>>>>> +        writeback_registers(ctxt);
>>>>>> +    }
>>>>>>
>>>>>>      return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
>>>>>>  }
>>>>>
>>>>> No clear point when emulator register cache is active, when it is not (AFAICS this patch does not invalidate registers on emulation start (the above being one of the exceptions) does not clear valid bit on writeback-to-vcpu-cache on emulation exit).
>>>>
>>>> It is cleared when emulation starts. For the non-insn-emulation entry points, there is an explicit invalidate. For the emulation entry point, there is a memset() that clears everything up to _regs, which includes the cache. This discrepancy isn't nice, but it preexists. I don't know whether we should decompose the memset() or not, it is rather efficient.
>>>>
>>>>> Concern is that emulator can start with cached registers marked as valid but in fact are invalid from previous emulation round. Maybe move invalidate() to init_emulate_ctxt?
>>>>
>>>> See the memset() in init_decode_cache().
>>>
>>> Right. Applied, thanks.
>>
>> Actually, had to revert because autotest was failing.
>
> Was it failing because of this patch? Or what?

No, due to lack of stack size fix. With the fix, it rejects. About to merge the big real mode patchset, so it's not a bad idea to wait for that before resending.

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
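The on-demand register access discussed in this thread can be illustrated with a small user-space model. This is only a sketch, not the patch itself: the struct layouts, the `reads`/`writes` counters, and the `vcpu_model` type are assumptions made for illustration; only the names `invalidate_registers` and `writeback_registers` come from the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

#define NR_REGS 16

/* Hypothetical stand-in for the vcpu's architectural register file. */
struct vcpu_model {
    uint64_t regs[NR_REGS];
    unsigned reads;   /* registers fetched from the vcpu */
    unsigned writes;  /* registers written back to the vcpu */
};

/* Hypothetical emulator context holding the register cache. */
struct emu_ctxt {
    struct vcpu_model *vcpu;
    uint64_t regs[NR_REGS];
    uint32_t regs_valid;  /* bit n set: regs[n] holds a current copy */
    uint32_t regs_dirty;  /* bit n set: regs[n] must be written back */
};

/* Drop every cached value; meant to run when an emulation round starts. */
void invalidate_registers(struct emu_ctxt *c)
{
    c->regs_valid = 0;
    c->regs_dirty = 0;
}

/* Read a register, fetching it from the vcpu only on first use. */
uint64_t reg_read(struct emu_ctxt *c, unsigned nr)
{
    assert(nr < NR_REGS);
    if (!(c->regs_valid & (1u << nr))) {
        c->regs[nr] = c->vcpu->regs[nr];
        c->vcpu->reads++;
        c->regs_valid |= 1u << nr;
    }
    return c->regs[nr];
}

/* Overwrite a register without reading the old value first. */
void reg_write(struct emu_ctxt *c, unsigned nr, uint64_t val)
{
    assert(nr < NR_REGS);
    c->regs[nr] = val;
    c->regs_valid |= 1u << nr;
    c->regs_dirty |= 1u << nr;
}

/* Flush only the modified registers back to the vcpu. */
void writeback_registers(struct emu_ctxt *c)
{
    for (unsigned nr = 0; nr < NR_REGS; nr++) {
        if (c->regs_dirty & (1u << nr)) {
            c->vcpu->regs[nr] = c->regs[nr];
            c->vcpu->writes++;
        }
    }
    c->regs_dirty = 0;
}
```

In this model, the concern raised above corresponds to a context whose `regs_valid` still has bits set from a previous round: `reg_read()` would then return a stale copy, which is why every entry point has to clear the mask first.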
Re: [PATCH for-3.6] kvm: fix KVM_GET_MSR for PV EOI
On Sun, Aug 26, 2012 at 06:00:29PM +0300, Michael S. Tsirkin wrote:
> KVM_GET_MSR was missing support for PV EOI, which is needed for migration.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>
> Please consider this bugfix patch for 3.6. Thanks!
>
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 91a5958..ff5e985 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1993,6 +1993,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>      case MSR_KVM_STEAL_TIME:
>          data = vcpu->arch.st.msr_val;
>          break;
> +    case MSR_KVM_PV_EOI_EN:
> +        data = vcpu->arch.pv_eoi.msr_val;
> +        break;
>      case MSR_IA32_P5_MC_ADDR:
>      case MSR_IA32_P5_MC_TYPE:
>      case MSR_IA32_MCG_CAP:

Should increase KVM_SAVE_MSRS_BEGIN.
Re: [PATCH RFC 1/2] KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
On Sat, Aug 25, 2012 at 10:40:40PM +1000, Paul Mackerras wrote: Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants since the fields in it are now only compiled in for Book3S HV. This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E. Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM. We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV. Signed-off-by: Paul Mackerras pau...@samba.org --- This is on top of Alex's kvm-ppc-next branch with the KVM tree's next branch merged in and then Marcelo's set of 3 patches on that. arch/powerpc/include/asm/kvm_host.h |9 +-- arch/powerpc/include/asm/kvm_ppc.h |5 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +- arch/powerpc/kvm/book3s_hv.c| 104 --- arch/powerpc/kvm/book3s_hv_rm_mmu.c |2 +- arch/powerpc/kvm/book3s_pr.c| 12 arch/powerpc/kvm/booke.c| 12 arch/powerpc/kvm/powerpc.c | 13 + 8 files changed, 102 insertions(+), 61 deletions(-) Regarding generic memslot code, looks fine. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm/book3s: fix build error caused by gfn_to_hva_memslot()
On Fri, Aug 24, 2012 at 07:03:14PM +1000, Paul Mackerras wrote:
> On Fri, Aug 24, 2012 at 04:50:28PM +0800, Gavin Shan wrote:
>> The build error was caused by builtin functions calling functions implemented in modules. That was introduced by the following commit.
>>
>> commit 4d8b81abc47b83a1939e59df2fdb0e98dfe0eedd
>>
>> The patch fixes the build error by moving function __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making it inline so that the builtin function (kvmppc_h_enter) can use it.
>>
>> Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
>
> Acked-by: Paul Mackerras pau...@samba.org
>
> By the way, when you give a commit ID it's a good idea to give the headline of the commit as well, something like this:
>
> This error was introduced by commit 4d8b81abc4 (KVM: introduce readonly memslot).
>
> Paul.

Applied, thanks (with suggested changelog modification).
Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
> Although returning -1 should be likely according to the likely(), the ASSERT in apic_find_highest_irr() will be triggered in such a case. It seems that this optimization is not working as expected.
>
> This patch simplifies the logic to mitigate this issue: search for the first non-zero word in a for loop and then use __fls() if found. When nothing is found, we are out of the loop, so we can just return -1.

Numbers please?

> Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
> ---
>  arch/x86/kvm/lapic.c | 18 ++
>  1 files changed, 10 insertions(+), 8 deletions(-)
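The simplified search described in the patch (scan for the first non-zero word from the top, then take its highest set bit) can be sketched in user-space C. This is not the kernel code: it assumes a flat array of eight 32-bit words instead of the APIC register page, and `fls_index()` is a hypothetical stand-in for `__fls()` built on the GCC/Clang builtin `__builtin_clz`.

```c
#include <stdint.h>

#define APIC_WORDS 8  /* 8 x 32 bits = 256 vectors */

/* Index of the most significant set bit; word must be non-zero. */
unsigned fls_index(uint32_t word)
{
    return 31u - (unsigned)__builtin_clz(word);
}

/* Highest set vector in a 256-bit bitmap, or -1 if no bit is set. */
int find_highest_vector(const uint32_t bitmap[APIC_WORDS])
{
    for (int i = APIC_WORDS - 1; i >= 0; i--) {
        if (bitmap[i])
            return (int)fls_index(bitmap[i]) + i * 32;
    }
    return -1;  /* fell out of the loop: nothing pending */
}
```

The loop carries no branch hint at all, so neither the "found" nor the "not found" path is declared likely; whether that is measurably faster is exactly what the request for numbers is about.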
Re: setting time in guest with ntpdate results in VM hang
On Mon, Aug 27, 2012 at 01:23:05PM -0600, David Ahern wrote:
> On 8/27/12 10:58 AM, Dale Swanston wrote:
>> Good idea. I'll try that. But are there any tools available to determine what the VM is doing when it appears hung? I've looked but haven't found much on debug or diagnostics on a running VM. Any links?
>
> If you have the vmlinux, enable the gdbserver stub via Qemu's monitor. Then use 'gdb vmlinux', connect to the VM 'target remote host:port' and look at the backtrace.

Another option is to boot the host with profile=kvm, wait for the guest to hang, then do:

readprofile -r ; readprofile -m System-map-of-guest.map

> I have seen something similar using kvm-clock in a guest running 2.6.27.
>
> David
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/27/2012 01:19 PM, Michael Wolf wrote: On Mon, 2012-08-27 at 11:50 -0700, Glauber Costa wrote: On 08/27/2012 08:50 AM, Michael Wolf wrote: On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote: On 08/24/2012 11:11 AM, Michael Wolf wrote: On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote: On 08/24/2012 03:14 AM, Michael Wolf wrote: This is an RFC regarding the reporting of steal time. In the case where you have a system that is running with partial processors such as KVM the user may see steal time being reported in accounting tools such as top or vmstat. This can cause confusion for the end user. To ease the confusion this patch set adds a sysctl interface to set the cpu entitlement. This is the percentage of cpu that the guest system is expected to receive. As long as the steal time is within its expected range it will show up as 0 in /proc/stat. The user will then see in the accounting tools that they are getting a full utilization of the cpu resources assigned to them. And how is such a knob not confusing? Steal time is pretty well defined in meaning and has been shown in top for ages. I really don't see the point for this. Currently you can see the steal time but you have no way of knowing if the cpu utilization you are seeing on the guest is the expected amount. I decided on making it a knob because a guest could be migrated to another system and its entitlement could change because of hardware or load differences. It could simply be a /proc file and report the current entitlement if needed. As things are currently implemented I don't see how someone knows if the guest is running as expected or whether there is a problem. Turning off steal time display won't get even close to displaying the information you want. What you probably want is a guest-visible way to say how many milliseconds you are expected to run each second. Right? It is not clear to me how knowing how many milliseconds you are expecting to run will help the user.
Currently the users will run top to see how well the guest is running. If they see _any_ steal time some users think they are not getting the full use of their processor entitlement. And your plan is just to selectively lie about it, by disabling it with a knob? It is about making it very obvious to the end user whether they are receiving their cpu entitlement. If there is more steal time than expected that will still show up. I have experimented, and it seems to work, with putting the raw steal time at the end of each cpu line in /proc/stat. That way the raw data is there as well. Do you have another suggestion to communicate to the user whether they are receiving their full entitlement? At the very least shouldn't the entitlement reside in a /proc file somewhere so that the user could look up the value and do the math? I personally believe Avi is right here. This is something to be done at the host side. The user can learn this from any tool he is using to manage his VMs. Now if you absolutely must inform him from inside the guest, I would go with the latter, informing him in another location. (I am not saying I agree with this, just that this is less bad)
Re: [PATCH for-3.6] kvm: fix KVM_GET_MSR for PV EOI
On Mon, Aug 27, 2012 at 05:47:42PM -0300, Marcelo Tosatti wrote:
> On Sun, Aug 26, 2012 at 06:00:29PM +0300, Michael S. Tsirkin wrote:
>> KVM_GET_MSR was missing support for PV EOI, which is needed for migration.
>>
>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>> ---
>>
>> Please consider this bugfix patch for 3.6. Thanks!
>>
>>  arch/x86/kvm/x86.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 91a5958..ff5e985 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1993,6 +1993,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>>      case MSR_KVM_STEAL_TIME:
>>          data = vcpu->arch.st.msr_val;
>>          break;
>> +    case MSR_KVM_PV_EOI_EN:
>> +        data = vcpu->arch.pv_eoi.msr_val;
>> +        break;
>>      case MSR_IA32_P5_MC_ADDR:
>>      case MSR_IA32_P5_MC_TYPE:
>>      case MSR_IA32_MCG_CAP:
>
> Should increase KVM_SAVE_MSRS_BEGIN.

Already done by e115676e042f4d9268, applied.
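Why a missing read path breaks migration can be shown with a toy model. This is a sketch under stated assumptions, not KVM's implementation: `toy_vcpu`, `toy_get_msr`, `toy_set_msr` and `toy_migrate` are hypothetical names, and the save list is reduced to two entries. The MSR numbers are the ones defined in the kvm_para.h header quoted earlier in this archive.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_KVM_STEAL_TIME 0x4b564d03
#define MSR_KVM_PV_EOI_EN  0x4b564d04

/* Toy per-vcpu state; field names are illustrative only. */
struct toy_vcpu {
    uint64_t steal_time_msr;
    uint64_t pv_eoi_msr;
};

/* Returns 0 on success, 1 if the MSR cannot be read from the host side. */
int toy_get_msr(const struct toy_vcpu *v, uint32_t msr, uint64_t *data)
{
    switch (msr) {
    case MSR_KVM_STEAL_TIME:
        *data = v->steal_time_msr;
        return 0;
    case MSR_KVM_PV_EOI_EN:  /* the case the patch adds */
        *data = v->pv_eoi_msr;
        return 0;
    default:
        return 1;
    }
}

/* Returns 0 on success, 1 if the MSR is unknown. */
int toy_set_msr(struct toy_vcpu *v, uint32_t msr, uint64_t data)
{
    switch (msr) {
    case MSR_KVM_STEAL_TIME:
        v->steal_time_msr = data;
        return 0;
    case MSR_KVM_PV_EOI_EN:
        v->pv_eoi_msr = data;
        return 0;
    default:
        return 1;
    }
}

/* Copy every MSR on the save list from src to dst; returns how many moved. */
size_t toy_migrate(const struct toy_vcpu *src, struct toy_vcpu *dst)
{
    static const uint32_t msrs_to_save[] = {
        MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN,
    };
    size_t moved = 0;

    for (size_t i = 0; i < sizeof(msrs_to_save) / sizeof(msrs_to_save[0]); i++) {
        uint64_t data;

        /* An MSR that cannot be read is silently left at its reset value. */
        if (toy_get_msr(src, msrs_to_save[i], &data) == 0 &&
            toy_set_msr(dst, msrs_to_save[i], data) == 0)
            moved++;
    }
    return moved;
}
```

If the `MSR_KVM_PV_EOI_EN` case were removed from `toy_get_msr()`, the destination would keep `pv_eoi_msr == 0`, which matches the symptom reported in the cover letter: PV EOI stays disabled after migration until the next guest reset.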
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote: On 08/27/2012 01:23 PM, Michael Wolf wrote: How would a guest know what its entitlement is? Currently the Admin/management tool setting up the guests will put it on the qemu commandline. From this it is passed via an ioctl to the host. The guest will get the value from the host via a hypercall. In the future the host could try and do some of it automatically in some cases. Seems to me it's a meaningless value for the guest. Suppose it is migrated to a host that is more powerful, and as a result its relative entitlement is reduced. The value needs to be adjusted. This is why I chose to manage the value from the sysctl interface rather than just have it stored as a value in /proc. Whatever tool was used to migrate the vm could hopefully adjust the sysctl value on the guest. This is best taken care of from the host side. Not sure what you are getting at here. If you are running in a cloud environment, you purchase a VM with the understanding that you are getting certain resources. As this type of user I don't believe you have any access to the host to see this type of information. So the user still wouldnt have a way to confirm that they are receiving what they should be in the way of processor resources. Would you please elaborate a little more on this? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/27/2012 02:27 PM, Michael Wolf wrote: On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote: On 08/27/2012 01:23 PM, Michael Wolf wrote: How would a guest know what its entitlement is? Currently the Admin/management tool setting up the guests will put it on the qemu commandline. From this it is passed via an ioctl to the host. The guest will get the value from the host via a hypercall. In the future the host could try and do some of it automatically in some cases. Seems to me it's a meaningless value for the guest. Suppose it is migrated to a host that is more powerful, and as a result its relative entitlement is reduced. The value needs to be adjusted. This is why I chose to manage the value from the sysctl interface rather than just have it stored as a value in /proc. Whatever tool was used to migrate the vm could hopefully adjust the sysctl value on the guest. This is best taken care of from the host side. Not sure what you are getting at here. If you are running in a cloud environment, you purchase a VM with the understanding that you are getting certain resources. As this type of user I don't believe you have any access to the host to see this type of information. So the user still wouldnt have a way to confirm that they are receiving what they should be in the way of processor resources. Would you please elaborate a little more on this? What do you mean they have no access to the host? They have access to all sorts of tools that display information from the host. Speaking of a view-only resource, those are strictly equivalent. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On 08/27/2012 02:27 PM, Michael Wolf wrote:
> On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
>> On 08/27/2012 01:23 PM, Michael Wolf wrote:
>>>> How would a guest know what its entitlement is?
>>>
>>> Currently the Admin/management tool setting up the guests will put it on the qemu commandline. From this it is passed via an ioctl to the host. The guest will get the value from the host via a hypercall. In the future the host could try and do some of it automatically in some cases.
>>
>> Seems to me it's a meaningless value for the guest. Suppose it is migrated to a host that is more powerful, and as a result its relative entitlement is reduced. The value needs to be adjusted.
>
> This is why I chose to manage the value from the sysctl interface rather than just have it stored as a value in /proc. Whatever tool was used to migrate the vm could hopefully adjust the sysctl value on the guest.

We usually try to avoid this type of coupling. What if the guest is rebooting while this is happening? What if it's not running Linux at all?

>> This is best taken care of from the host side.
>
> Not sure what you are getting at here. If you are running in a cloud environment, you purchase a VM with the understanding that you are getting certain resources. As this type of user I don't believe you have any access to the host to see this type of information. So the user still wouldn't have a way to confirm that they are receiving what they should be in the way of processor resources. Would you please elaborate a little more on this?

I meant not reporting this time as steal time. But that cripples steal time reporting.

Looks like for each quantum we need to report how much real time has passed, how much the guest was actually using, and how much the guest was not using due to overcommit (with the remainder being unallocated time). The guest could then present it any way it wanted to.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
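The per-quantum breakdown suggested at the end of this message can be written down as a small data structure. This is purely a sketch: no such interface exists in the thread, and the struct, field, and function names are hypothetical.

```c
#include <stdint.h>

/* One report per scheduling quantum (hypothetical layout). */
struct quantum_report {
    uint64_t real_ns;        /* wall-clock time that passed */
    uint64_t used_ns;        /* time the guest actually ran */
    uint64_t overcommit_ns;  /* time lost because the host was overcommitted */
};

/*
 * Remainder of the quantum: time that was never allocated to the guest.
 * Returns 0 if the report is inconsistent (parts exceed the whole).
 */
uint64_t unallocated_ns(const struct quantum_report *q)
{
    uint64_t accounted = q->used_ns + q->overcommit_ns;

    return accounted > q->real_ns ? 0 : q->real_ns - accounted;
}

/* Integer percentage of the quantum taken by `part_ns`; 0 if empty. */
unsigned percent_of_quantum(const struct quantum_report *q, uint64_t part_ns)
{
    return q->real_ns ? (unsigned)(part_ns * 100 / q->real_ns) : 0;
}
```

With the three raw numbers available, a guest tool could choose to show only the overcommit share as steal time and treat the unallocated remainder as expected, without the host or the kernel hiding anything.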
Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting
On Mon, 2012-08-27 at 14:41 -0700, Glauber Costa wrote: On 08/27/2012 02:27 PM, Michael Wolf wrote: On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote: On 08/27/2012 01:23 PM, Michael Wolf wrote: How would a guest know what its entitlement is? Currently the Admin/management tool setting up the guests will put it on the qemu commandline. From this it is passed via an ioctl to the host. The guest will get the value from the host via a hypercall. In the future the host could try and do some of it automatically in some cases. Seems to me it's a meaningless value for the guest. Suppose it is migrated to a host that is more powerful, and as a result its relative entitlement is reduced. The value needs to be adjusted. This is why I chose to manage the value from the sysctl interface rather than just have it stored as a value in /proc. Whatever tool was used to migrate the vm could hopefully adjust the sysctl value on the guest. This is best taken care of from the host side. Not sure what you are getting at here. If you are running in a cloud environment, you purchase a VM with the understanding that you are getting certain resources. As this type of user I don't believe you have any access to the host to see this type of information. So the user still wouldnt have a way to confirm that they are receiving what they should be in the way of processor resources. Would you please elaborate a little more on this? What do you mean they have no access to the host? They have access to all sorts of tools that display information from the host. Speaking of a view-only resource, those are strictly equivalent. ok. I will go look at those resources. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: /dev/kvm not sufficiently restricted, and in ways I didn't think were possible
On Monday, August 27, 2012 04:11:11 PM Henry Cejtin wrote:
> I'm completely confused about access to /dev/kvm. In particular, it looks like it is too open to access, but in a way that I don't understand.
>
> On my machine, /dev/kvm is owned by root.root and mode 660. Here is the output of ls:
>
>     % ls -l /dev/kvm
>     crw-rw+ 1 root root 10, 232 Aug 24 15:03 /dev/kvm
>
> Despite that, when a process is uid 1000 and group id 1000, and not in any other groups, I can open /dev/kvm.
> ...
> Please note, I don't understand how this could really be.

I think the '+' indicates ACLs are in use; 'getfacl /dev/kvm' might be illuminating. It might be something udev does, or something your desktop software does when you log in.
KVM call agenda for Tuesday, August 28th
Hi

Please send in any agenda items you are interested in covering.

Thanks, Juan.
Reminder: KVM Forum 2012 Call For Participation
Just a reminder, the CFP ends this Friday. -- = KVM Forum 2012: Call For Participation November 7-9, 2012 - Hotel Fira Palace - Barcelona, Spain (All submissions must be received before midnight Aug 31st, 2012) = KVM is an industry leading open source hypervisor that provides an ideal platform for datacenter virtualization, virtual desktop infrastructure, and cloud computing. Once again, it's time to bring together the community of developers and users that define the KVM ecosystem for our annual technical conference. We will discuss the current state of affairs and plan for the future of KVM, its surrounding infrastructure, and management tools. We are also excited to announce the oVirt Workshop will run in parallel with the KVM Forum, bringing in a community focused on enterprise datacenter virtualization management built on KVM. For topics which overlap we will have shared sessions. So mark your calendar and join us in advancing KVM. http://events.linuxfoundation.org/events/kvm-forum/ Once again we are colocated with The Linux Foundation's LinuxCon, Based on feedback from last year, this time it's LinuxCon Europe! KVM Forum attendees will be able to attend oVirt Workshop sessions and are eligible to attend LinuxCon Europe for a discounted rate. http://events.linuxfoundation.org/events/kvm-forum/register We invite you to lead part of the discussion by submitting a speaking proposal for KVM Forum 2012. 
http://events.linuxfoundation.org/cfp Suggested topics: KVM - Scaling and performance - Nested virtualization - I/O improvements - PCI device assignment - Driver domains - Time keeping - Resource management (cpu, memory, i/o) - Memory management (page sharing, swapping, huge pages, etc) - VEPA, VN-Link, vswitch - Security - Architecture ports QEMU - Device model improvements - New devices and chipsets - Scaling and performance - Desktop virtualization - Spice - Increasing robustness and hardening - Security model - Management interfaces - QMP protocol and implementation - Image formats - Firmware (SeaBIOS, OVMF, UEFI, etc) - Live migration - Live snapshots and merging - Fault tolerance, high availability, continuous backup - Real-time guest support Virtio - Speeding up existing devices - Alternatives - Virtio on non-Linux or non-virtualized Management infrastructure - oVirt (shared track w/ oVirt Workshop) - Libvirt - KVM autotest - OpenStack - Network virtualization management - Enterprise storage management Cloud computing - Scalable storage - Virtual networking - Security - Provisioning SUBMISSION REQUIREMENTS Abstracts due: Aug 31st, 2012 Notification: Sep 14th, 2012 Please submit a short abstract (~150 words) describing your presentation proposal. In your submission please note how long your talk will take. Slots vary in length up to 45 minutes. Also include in your proposal the proposal type -- one of: - technical talk - end-user talk - birds of a feather (BOF) session Submit your proposal here: http://events.linuxfoundation.org/cfp You will receive a notification whether or not your presentation proposal was accepted by Sep 14th. END-USER COLLABORATION One of the big challenges as developers is to know what, where and how people actually use our software. We will reserve a few slots for end users talking about their deployment challenges and achievements. If you are using KVM in production you are encouraged submit a speaking proposal. 
Simply mark it as an end-user collaboration proposal. As an end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference tracks, for birds of a feather (BOF) sessions. These sessions will be less formal than presentation tracks and targeted at people who would like to discuss specific issues with other developers and/or users. If you are interested in getting developers and/or users together to discuss a specific problem, please submit a BOF proposal.

LIGHTNING TALKS

In addition to submitted talks we will also have some room for lightning talks. These are short (5 minute) discussions to highlight new work or ideas that aren't complete enough to warrant a full presentation slot. Lightning talk submissions and scheduling will be handled on-site at KVM Forum.

HOTEL / TRAVEL

The KVM Forum 2012 will be held in Barcelona, Spain at the Hotel Fira Palace.

http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM. We're looking forward to your submissions and seeing you at the KVM Forum 2012 in November!

Thanks,
your KVM Forum 2012 Program Committee

Please contact us with any questions or comments.
kvm-forum-2012...@redhat.com
Re: [patch 3/3] KVM: move postcommit flush to x86, as mmio sptes are x86 specific
On Mon, 27 Aug 2012 16:06:01 -0300 Marcelo Tosatti mtosa...@redhat.com wrote:
>> Any explanation why (old.base_gfn != new.base_gfn) case can be omitted?
>
> (old.base_gfn != new.base_gfn) check covers the cases
>
> 1. old.base_gfn = 0, new.base_gfn = !0 (slot creation)
>
> and
>
> 2. old.base_gfn = x, new.base_gfn = y, with x != 0, y != 0, x != y (gpa base change)
>
> Patch 2 covers case 2, so it's only necessary to cover case 1 here.
>
> Makes sense?

Yes. But didn't you change the flush in the if block modified by patch 2 to kvm_arch_flush_shadow_memslot()? Although the current implementation flushes everything, this may trigger a problem when we change it.

Takuya
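The two cases hidden behind the single `old.base_gfn != new.base_gfn` test can be spelled out as a classifier. This is a sketch only: the enum and function are hypothetical, and it follows the message's convention of treating a base gfn of 0 as "no slot yet", which is a simplification.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

enum slot_change {
    SLOT_UNCHANGED,  /* base gfn did not change */
    SLOT_CREATED,    /* case 1: 0 -> non-zero */
    SLOT_MOVED,      /* case 2: x -> y, both non-zero, x != y */
    SLOT_OTHER,      /* non-zero -> 0, not discussed in the thread */
};

/* Classify a memslot update by comparing old and new base gfn. */
enum slot_change classify_base_change(gfn_t old_base, gfn_t new_base)
{
    if (old_base == new_base)
        return SLOT_UNCHANGED;
    if (old_base == 0)
        return SLOT_CREATED;
    if (new_base == 0)
        return SLOT_OTHER;
    return SLOT_MOVED;
}
```

Read this way, the argument in the thread is that `SLOT_MOVED` is already handled by patch 2, so the post-commit flush only needs to cover `SLOT_CREATED`.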
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
Hi Blue,

thanks for the review. I addressed most of them, the others are commented below.

On 2012-08-27 20:56, Blue Swirl wrote:
>> +typedef struct AssignedDevice {
>> +    PCIDevice dev;
>> +    PCIHostDeviceAddress host;
>> +    uint32_t dev_id;
>> +    uint32_t features;
>> +    int intpin;
>> +    AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
>> +    PCIDevRegions real_device;
>> +    PCIINTxRoute intx_route;
>> +    AssignedIRQType assigned_irq_type;
>> +    struct {
>> +#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
>> +#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
>> +        uint32_t available;
>> +#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
>> +#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
>> +#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>> +        uint32_t state;
>> +    } cap;
>> +    uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
>> +    uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
>> +    int msi_virq_nr;
>> +    int *msi_virq;
>> +    MSIXTableEntry *msix_table;
>> +    target_phys_addr_t msix_table_addr;
>> +    uint16_t msix_max;
>> +    MemoryRegion mmio;
>> +    char *configfd_name;
>
> const?

Not if this would mean more casts. DEFINE_PROP_STRING, where this is used, doesn't allow this.

...

>> +    } else {
>> +        uint32_t port = addr + dev_region->u.r_baseport;
>> +
>> +        if (data) {
>> +            DEBUG("out data=%lx, size=%d, e_phys=%lx, host=%x\n",
>> +                  *data, size, addr, port);
>> +            switch (size) {
>> +            case 1:
>> +                outb(*data, port);
>> +                break;
>> +            case 2:
>> +                outw(*data, port);
>> +                break;
>> +            case 4:
>> +                outl(*data, port);
>> +                break;
>
> Maybe add case 8: and default: with abort(), also below.

PIO is never 8 bytes long, the generic layer protects us.

...

>> +
>> +    fclose(f);
>> +
>> +    /* read and fill vendor ID */
>> +    v = get_real_vendor_id(dir, &id);
>> +    if (v) {
>> +        return 1;
>> +    }
>> +    pci_dev->dev.config[0] = id & 0xff;
>> +    pci_dev->dev.config[1] = (id & 0xff00) >> 8;
>> +
>> +    /* read and fill device ID */
>> +    v = get_real_device_id(dir, &id);
>> +    if (v) {
>> +        return 1;
>> +    }
>> +    pci_dev->dev.config[2] = id & 0xff;
>> +    pci_dev->dev.config[3] = (id & 0xff00) >> 8;
>> +
>> +    pci_word_test_and_clear_mask(pci_dev->emulate_config_write + PCI_COMMAND,
>> +                                 PCI_COMMAND_MASTER | PCI_COMMAND_INTX_DISABLE);
>> +
>> +    dev->region_number = r;
>> +    return 0;
>> +}
>
> Pretty long function, how about refactoring?

Possibly, but I'd prefer to do such changes in-tree, after the more important refactoring on MSI[-X] is done.

...

>> +    if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
>> +        uint8_t *pos = pci_dev->config + pci_dev->msi_cap;
>> +        MSIMessage msg;
>> +        int virq;
>> +
>> +        msg.address = pci_get_long(pos + PCI_MSI_ADDRESS_LO);
>> +        msg.data = pci_get_word(pos + PCI_MSI_DATA_32);
>> +        virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> +        if (virq < 0) {
>> +            perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
>> +            return;
>> +        }
>> +
>> +        assigned_dev->msi_virq = g_malloc(sizeof(*assigned_dev->msi_virq));
>
> Is this ever freed?

Yep, in free_msi_virqs. If you think you spotted a path where this is not the case, let me know.

...

>> +
>> +static Property da_properties[] = {
>
> const?

Nope, properties must remain writable.

>> +    DEFINE_PROP_PCI_HOST_DEVADDR("host", AssignedDevice, host),
>> +    DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
>> +                    ASSIGNED_DEVICE_PREFER_MSI_BIT, false),
>> +    DEFINE_PROP_BIT("share_intx", AssignedDevice, features,
>> +                    ASSIGNED_DEVICE_SHARE_INTX_BIT, true),
>> +    DEFINE_PROP_INT32("bootindex", AssignedDevice, bootindex, -1),
>> +    DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +

Jan
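The vendor and device ID handling quoted above stores a 16-bit value into PCI config space one byte at a time, low byte first. A minimal standalone sketch of that byte split follows; `pci_store_id16` and `pci_load_id16` are hypothetical helper names, not QEMU functions.

```c
#include <stdint.h>

/* Store a 16-bit ID little-endian at `offset` in a config space image. */
void pci_store_id16(uint8_t *config, unsigned offset, uint16_t id)
{
    config[offset]     = id & 0xff;           /* low byte */
    config[offset + 1] = (id & 0xff00) >> 8;  /* high byte */
}

/* Read the same 16-bit ID back. */
uint16_t pci_load_id16(const uint8_t *config, unsigned offset)
{
    return (uint16_t)(config[offset] | (config[offset + 1] << 8));
}
```

In the quoted patch the vendor ID lands in bytes 0 and 1 and the device ID in bytes 2 and 3, so the two calls would use offsets 0 and 2.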
Re: KVM: MMU: Tracking guest writes through EPT entries ?
Xiao Guangrong xiaoguangrong at linux.vnet.ibm.com writes: On 07/31/2012 01:18 AM, Sunil wrote: Hello List, I am a KVM newbie and studying KVM mmu code. On the existing guest, I am trying to track all guest writes by marking page table entry as read-only in EPT entry [ I am using Intel machine with vmx and ept support ]. Looks like EPT support re-uses shadow page table(SPT) code and hence some of SPT routines. I was thinking of below possible approach. Use pte_list_walk() to traverse through list of sptes and use mmu_spte_update() to flip the PT_WRITABLE_MASK flag. But all SPTEs are not part of any single list; but on separate lists (based on gfn, page level, memory_slot). So, recording all the faulted guest GFN and then using above method work ? There are two ways to write-protect all sptes: - use kvm_mmu_slot_remove_write_access() on all memslots - walk the shadow page cache to get the shadow pages in the highest level (level = 4 on EPT), then write-protect its entries. If you just want to do it for the specified gfn, you can use rmap_write_protect(). Just inquisitive, what is your purpose? :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Hi, Guangrong, I have done similar things like Sunil did. Simply for study purpose. However, I found some very weird situations. Basically, in the guest vm, I allocate a chunk of memory (with size of a page) in a user level program. Through a guest kernel level module and my self defined hypercall, I pass the gva of this memory to kvm. Then I try different methods in the hypercall handler to write protect this page of memory. You can see that I want to write protect it through ETP instead of write protected in the guest page tables. 1. I use kvm_mmu_gva_to_gpa_read to translate the gva into gpa. 
Based on the function, kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I change the codes to read sptep (the pointer to spte) instead of spte, so I can modify the spte corresponding to this gpa. What I observe is that if I modify spte[0] (I think this is the lowest level page table entry corresponding to EPT table; I can successfully modify it as the changes are reflected in the result of calling kvm_mmu_get_spte_hierarchy again), but my user level program in vm can still write to this page. In your this blog post, you mentioned (the shadow pages in the highest level (level = 4 on EPT)), I don't understand this part. Does this mean I have to modify spte[3] instead of spte[0]? I just try modify spte[1] and spte[3], both can cause vmexit. So I am totally confused about the meaning of level used in shadow page table and its relations to shadow page table. Can you help me to understand this? 2. As suggested by this post, I also use rmap_write_protect() to write protect this page. With kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I still can see that spte[0] gives me xx005 such result, this means that the function is called successfully. But still I can write to this page. I even try the function kvm_age_hva() to remove this spte, this gives me 0 of spte[0], but I still can write to this page. So I am further confused about the level used in the shadow page? Really thanks and appreciate your reply. Felix -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
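The "level" terminology that causes confusion above can be made concrete with a little arithmetic. The sketch below assumes a 4-level EPT with 4 KiB pages and 9 index bits per level, and uses the architectural EPT permission bits (bit 0 read, bit 1 write, bit 2 execute); the helper names are hypothetical. It also assumes, as the poster does, that `spte[0]` holds the level-1 (leaf) entry and `spte[3]` the level-4 (root) entry.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define LEVEL_BITS  9

#define EPT_READ   (1ull << 0)
#define EPT_WRITE  (1ull << 1)
#define EPT_EXEC   (1ull << 2)

/* Index into the table at `level` (1 = leaf, 4 = root) for a gpa. */
unsigned ept_index(uint64_t gpa, int level)
{
    return (unsigned)((gpa >> (PAGE_SHIFT + LEVEL_BITS * (level - 1))) & 0x1ff);
}

/* Bytes of guest-physical address space covered by one entry at `level`. */
uint64_t ept_entry_span(int level)
{
    return 1ull << (PAGE_SHIFT + LEVEL_BITS * (level - 1));
}

/* Non-zero if the entry grants write permission. */
int ept_entry_writable(uint64_t entry)
{
    return (entry & EPT_WRITE) != 0;
}
```

Under these assumptions an entry ending in 005 has read and execute set and write clear, which is consistent with the write bit having been removed from the leaf entry, and an entry at level 4 covers 512 GiB, so clearing its write bit affects far more than one page.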
Re: KVM: MMU: Tracking guest writes through EPT entries
Hi,

I have done something similar to what was posted in http://article.gmane.org/gmane.comp.emulators.kvm.devel/95342/match=tracking+guest+writes+ept . However, I found some very weird behavior. Basically, in the guest VM, I allocate a chunk of memory (one page in size) in a user-level program. Through a guest kernel module and my own hypercall, I pass the gva of this memory to KVM. Then I try different methods in the hypercall handler to write-protect this page. Note that I want to write-protect it through EPT rather than in the guest page tables.

1. I use kvm_mmu_gva_to_gpa_read to translate the gva into a gpa. Based on the function kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I changed the code to read sptep (the pointer to the spte) instead of spte, so that I can modify the spte corresponding to this gpa. What I observe is that I can modify spte[0] (I think this is the lowest-level page table entry in the EPT table; the modification succeeds, since the change is reflected in the result of calling kvm_mmu_get_spte_hierarchy again), but my user-level program in the VM can still write to this page. That post mentioned "the shadow pages in the highest level (level = 4 on EPT)"; I don't understand this part. Does this mean I have to modify spte[3] instead of spte[0]? I tried modifying spte[1] and spte[3]; both can cause a vmexit. So I am totally confused about the meaning of "level" in the shadow page table and how it relates to the shadow page table. Can you help me understand this?

2. As suggested by that post, I also used rmap_write_protect() to write-protect this page. With kvm_mmu_get_spte_hierarchy(vcpu, gpa, spte[4]), I can still see that spte[0] gives me results like xx005, which means that the function was called successfully and the writable bit is cleared in the pte. But I can still write to this page. I even tried kvm_age_hva() to remove this spte; this gives me 0 for spte[0], but I can still write to this page. So I am further confused about the level used in the shadow page.

Thanks; I really appreciate your reply.

Hugo
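The xx005 value mentioned above can be decoded bit by bit. What follows is a minimal sketch, assuming the Intel EPT entry layout in which bits 0, 1 and 2 are the read, write and execute permissions (bit 1 is also the position of the PT_WRITABLE_MASK referred to earlier in the thread). The macro and function names are made up for illustration and are not KVM's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low permission bits of an EPT entry (illustrative names, not KVM macros). */
#define EPT_READ   (1ULL << 0)
#define EPT_WRITE  (1ULL << 1)
#define EPT_EXEC   (1ULL << 2)

/* True if the entry currently grants write access. */
static inline bool spte_is_writable(uint64_t spte)
{
    return (spte & EPT_WRITE) != 0;
}

/* Return a copy of the entry with write permission removed; other bits are kept. */
static inline uint64_t spte_clear_write(uint64_t spte)
{
    return spte & ~EPT_WRITE;
}
```

With this layout, an entry ending in 7 is readable, writable and executable, and clearing the write bit leaves it ending in 5 (read and execute only), which matches the values reported in the message.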
[PATCH] Documentation for kvm_stat.
Signed-off-by: Bo Yang boy...@suse.com
---
 Makefile      |    9 -
 kvm_stat.texi |   55 +++
 2 files changed, 63 insertions(+), 1 deletions(-)
 create mode 100644 kvm_stat.texi

diff --git a/Makefile b/Makefile
index 1cd5bc8..ee524b0 100644
--- a/Makefile
+++ b/Makefile
@@ -40,7 +40,7 @@ LIBS+=-lz $(LIBS_TOOLS)
 HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
 
 ifdef BUILD_DOCS
-DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt
+DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 kvm_stat.1 QMP/qmp-commands.txt
 ifdef CONFIG_VIRTFS
 DOCS+=fsdev/virtfs-proxy-helper.1
 endif
@@ -283,6 +283,7 @@ ifdef CONFIG_POSIX
 	$(INSTALL_DATA) qemu.1 qemu-img.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL_DIR) $(DESTDIR)$(mandir)/man8
 	$(INSTALL_DATA) qemu-nbd.8 $(DESTDIR)$(mandir)/man8
+	$(INSTALL_DATA) kvm_stat.1 $(DESTDIR)$(mandir)/man1
 endif
 ifdef CONFIG_VIRTFS
 	$(INSTALL_DIR) $(DESTDIR)$(mandir)/man1
@@ -387,6 +388,12 @@ qemu-nbd.8: qemu-nbd.texi
 	  $(POD2MAN) --section=8 --center=" " --release=" " qemu-nbd.pod > $@, \
 	  "  GEN   $@")
 
+kvm_stat.1: kvm_stat.texi
+	$(call quiet-command, \
+	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< kvm_stat.pod && \
+	  $(POD2MAN) --section=1 --center=" " --release=" " kvm_stat.pod > $@, \
+	  "  GEN   $@")
+
 dvi: qemu-doc.dvi qemu-tech.dvi
 html: qemu-doc.html qemu-tech.html
 info: qemu-doc.info qemu-tech.info
diff --git a/kvm_stat.texi b/kvm_stat.texi
new file mode 100644
index 000..ff7d414
--- /dev/null
+++ b/kvm_stat.texi
@@ -0,0 +1,55 @@
+@example
+@c man begin SYNOPSIS
+
+usage: kvm_stat [OPTIONS]
+
+@c man end
+@end example
+
+@c man begin DESCRIPTION
+
+This is a utility to watch kvm statistics.
+
+@c man end
+@c man begin OPTIONS
+
+@table @option
+
+@item -h, --help
+
+Show help message.
+
+@item -1, --once, --batch
+
+Run in batch mode for one second.
+
+@item -l, --log
+
+Run in logging mode (like vmstat).
+
+@item -f @var{FIELDS}, --fields=@var{FIELDS}
+
+Fields to display (regex). regex expression can be accepted here. Fields include:
+@samp{size}, @samp{config}, @samp{sample_freq}, @samp{sample_type}, @samp{read_format}, @samp{flags}, @samp{wakeup_events}, @samp{bp_type}, @samp{bp_addr}, @samp{bp_len}
+
+@end table
+@c man end
+
+@ignore
+
+@setfilename kvm_stat
+@settitle kvm statistics utility
+
+@c man begin SEE ALSO
+
+vmstat
+
+@c man end
+
+@c man begin AUTHOR
+
+Copyright (C) 2012 Bo Yang boy...@suse.com.
+This is free software; see the source for copying conditions. There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+@c man end
+@end ignore
--
1.6.0.2
Re: [PATCH RFC 1/2] KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
On Sat, Aug 25, 2012 at 10:40:40PM +1000, Paul Mackerras wrote:

Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants since the fields in it are now only compiled in for Book3S HV.

This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E.

Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM.

We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV.

Signed-off-by: Paul Mackerras pau...@samba.org
---
This is on top of Alex's kvm-ppc-next branch with the KVM tree's next branch merged in and then Marcelo's set of 3 patches on that.

 arch/powerpc/include/asm/kvm_host.h |   9 +--
 arch/powerpc/include/asm/kvm_ppc.h  |   5 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   6 +-
 arch/powerpc/kvm/book3s_hv.c        | 104 ---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   2 +-
 arch/powerpc/kvm/book3s_pr.c        |  12
 arch/powerpc/kvm/booke.c            |  12
 arch/powerpc/kvm/powerpc.c          |  13 +
 8 files changed, 102 insertions(+), 61 deletions(-)

Regarding generic memslot code, looks fine.