[COMMIT master] make-release: fix mtime for a wider range of git versions

2010-11-23 Thread Avi Kivity
From: Bernhard Kohl bernhard.k...@nsn.com

With the latest git versions, e.g. 1.7.2.3, git still prints out
the tag info in addition to the requested format. So let's simply
fetch the first line from the output.

In addition, use the --pretty=format: option instead of --format, which
very old git versions (e.g. 1.5.5.6) do not recognize.

Tested with git versions 1.5.5.6 and 1.7.2.3.

Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 56302c3..2d050fc 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -51,7 +51,7 @@ cd $(dirname $0)/../..
 mkdir -p $(dirname $tarball)
 git archive --prefix=$name/ --format=tar $commit > $tarball
 
-mtime=`git show --format=%ct $commit^{commit} --`
+mtime=`git show --pretty=format:%ct $commit^{commit} -- | head -n 1`
 tarargs="--owner=root --group=root"
 
 mkdir -p $tmpdir/$name
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] device-assignment: register a reset function

2010-11-23 Thread Avi Kivity
From: Bernhard Kohl bernhard.k...@nsn.com

This is necessary because during reboot of a VM the assigned devices
continue DMA transfers which causes memory corruption.
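
A minimal userspace sketch of the idea, assuming a fake 256-byte config space: writing 0 to the 16-bit PCI command register clears the bus-master bit, so the device can no longer start DMA. The `PCI_COMMAND*` values are the standard ones; `struct fake_pci_dev` and the `fake_*` helpers are hypothetical stand-ins, not QEMU's API.

```c
#include <stdint.h>

/* Standard PCI configuration-space constants. */
#define PCI_COMMAND         0x04  /* offset of the 16-bit command register */
#define PCI_COMMAND_IO      0x1   /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x2   /* respond to memory space accesses */
#define PCI_COMMAND_MASTER  0x4   /* device may act as bus master (DMA) */

/* Hypothetical stand-in for a device's config space. */
struct fake_pci_dev {
    uint8_t config[256];
};

/* Hypothetical helper: little-endian write of len bytes (1, 2 or 4). */
void fake_write_config(struct fake_pci_dev *d, uint32_t addr,
                       uint32_t val, int len)
{
    for (int i = 0; i < len; i++)
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
}

/* Read back the 16-bit command register. */
uint16_t fake_read_command(const struct fake_pci_dev *d)
{
    return (uint16_t)(d->config[PCI_COMMAND] |
                      (d->config[PCI_COMMAND + 1] << 8));
}

/* Same shape as the reset hook in the patch: zero the command register. */
void fake_reset(struct fake_pci_dev *d)
{
    fake_write_config(d, PCI_COMMAND, 0, 2);
}
```

After `fake_reset()`, I/O decoding, memory decoding and bus mastering are all off at once, which is what "logically disconnected from the PCI bus" means in the patch comment.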

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Thomas Ostler thomas.ost...@nsn.com
Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index c2a7b27..369bff9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1438,6 +1438,17 @@ static const VMStateDescription vmstate_assigned_device = {
     .name = "pci-assign"
 };
 
+static void reset_assigned_device(DeviceState *dev)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
+
+    /*
+     * When a 0 is written to the command register, the device is logically
+     * disconnected from the PCI bus. This avoids further DMA transfers.
+     */
+    assigned_dev_pci_write_config(d, PCI_COMMAND, 0, 2);
+}
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1555,6 +1566,7 @@ static PCIDeviceInfo assign_info = {
     .qdev.name    = "pci-assign",
     .qdev.desc    = "pass through host pci devices to the guest",
     .qdev.size    = sizeof(AssignedDevice),
+    .qdev.reset   = reset_assigned_device,
     .init         = assigned_initfn,
     .exit         = assigned_exitfn,
     .config_read  = assigned_dev_pci_read_config,


[COMMIT master] device-assignment: Register as un-migratable

2010-11-23 Thread Avi Kivity
From: Alex Williamson alex.william...@redhat.com

Use register_device_unmigratable() to declare ourselves as
non-migratable.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 5f5bde1..c2a7b27 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
     dev->msix_table_page = NULL;
 }
 
+static const VMStateDescription vmstate_assigned_device = {
+    .name = "pci-assign"
+};
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1495,6 +1499,12 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
+
+    /* Register a vmsd so that we can mark it unmigratable. */
+    vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);
+    register_device_unmigratable(&dev->dev.qdev,
+                                 vmstate_assigned_device.name, dev);
+
     return 0;
 
 assigned_out:
@@ -1508,6 +1518,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+    vmstate_unregister(&dev->dev.qdev, &vmstate_assigned_device, dev);
     QLIST_REMOVE(dev, next);
     deassign_device(dev);
     free_assigned_device(dev);


[COMMIT master] KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

This allows Linux to mask cpuid bits if, for example, nx is enabled on only
some cpus.
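
The masking step can be sketched in userspace as follows. This is an illustration only: `struct host_caps` and `NCAPINTS_SKETCH` are hypothetical stand-ins for the kernel's `boot_cpu_data.x86_capability[]`, and the helper takes the host data as an explicit parameter, which the kernel function does not.

```c
#include <stdint.h>

#define NCAPINTS_SKETCH 8  /* hypothetical number of capability words */

/* Hypothetical stand-in for boot_cpu_data.x86_capability[]: the feature
 * bits the host kernel considers usable on every CPU. */
struct host_caps {
    uint32_t x86_capability[NCAPINTS_SKETCH];
};

/* Keep only the bits of *word that the host also reports in word wordnum. */
void cpuid_mask(const struct host_caps *host, uint32_t *word, int wordnum)
{
    *word &= host->x86_capability[wordnum];
}
```

For example, if the host kernel has cleared NX (bit 20 of word 1) because only some CPUs support it, a candidate EDX value for leaf 0x80000001 loses that bit after `cpuid_mask(&host, &edx, 1)`, so the guest is never offered a feature Linux itself refuses to use.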

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 003a0ca..410d2d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2232,6 +2232,11 @@ out:
 	return r;
 }
 
+static void cpuid_mask(u32 *word, int wordnum)
+{
+	*word &= boot_cpu_data.x86_capability[wordnum];
+}
+
 static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			   u32 index)
 {
@@ -2306,7 +2311,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	case 1:
 		entry->edx &= kvm_supported_word0_x86_features;
+		cpuid_mask(&entry->edx, 0);
 		entry->ecx &= kvm_supported_word4_x86_features;
+		cpuid_mask(&entry->ecx, 4);
 		/* we support x2apic emulation even if host does not support
 		 * it since we emulate x2apic in software */
 		entry->ecx |= F(X2APIC);
@@ -2397,7 +2404,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	case 0x80000001:
 		entry->edx &= kvm_supported_word1_x86_features;
+		cpuid_mask(&entry->edx, 1);
 		entry->ecx &= kvm_supported_word6_x86_features;
+		cpuid_mask(&entry->ecx, 6);
 		break;
 	}
 
 


[COMMIT master] KVM: VMX: Fix host userspace gsbase corruption

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately.  This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.

Fix by
- saving guest MSR_KERNEL_GS_BASE before we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
  long mode

Things can be cleaned up further, but this is the minimal fix for now.
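
The ordering described above can be modelled with a single fake register. This is a sketch under stated assumptions: `struct fake_cpu`, `struct fake_vmx` and `fake_load_gs_index()` are hypothetical, and the only behaviour modelled is that reloading gs overwrites the MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-register model of MSR_KERNEL_GS_BASE. */
struct fake_cpu {
    uint64_t kernel_gs_base;
};

struct fake_vmx {
    uint64_t msr_host_kernel_gs_base;
    uint64_t msr_guest_kernel_gs_base;
    bool guest_long_mode;
};

/* Models the side effect from the commit message: reloading gs also
 * overwrites the kernel gs base MSR. */
void fake_load_gs_index(struct fake_cpu *cpu, uint64_t base_from_selector)
{
    cpu->kernel_gs_base = base_from_selector;
}

/* Guest entry: always remember the host value, whatever mode the guest is in. */
void save_host_state(struct fake_cpu *cpu, struct fake_vmx *vmx)
{
    vmx->msr_host_kernel_gs_base = cpu->kernel_gs_base;
    if (vmx->guest_long_mode)
        cpu->kernel_gs_base = vmx->msr_guest_kernel_gs_base;
}

/* Return to host: save the guest value before gs is reloaded, then
 * restore the host value unconditionally. */
void load_host_state(struct fake_cpu *cpu, struct fake_vmx *vmx,
                     uint64_t gs_selector_base)
{
    if (vmx->guest_long_mode)
        vmx->msr_guest_kernel_gs_base = cpu->kernel_gs_base;
    fake_load_gs_index(cpu, gs_selector_base);
    cpu->kernel_gs_base = vmx->msr_host_kernel_gs_base;
}
```

With this ordering the host value survives a round trip even when the guest is not in long mode, which is the case the conditional save/restore missed.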

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..0badeac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -821,10 +821,9 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif
 
 #ifdef CONFIG_X86_64
-	if (is_long_mode(&vmx->vcpu)) {
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	if (is_long_mode(&vmx->vcpu))
 		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-	}
 #endif
 	for (i = 0; i < vmx->save_nmsrs; ++i)
 		kvm_set_shared_msr(vmx->guest_msrs[i].index,
@@ -839,11 +838,14 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 
 	++vmx->vcpu.stat.host_state_reload;
 	vmx->host_state.loaded = 0;
+#ifdef CONFIG_X86_64
+	if (is_long_mode(&vmx->vcpu))
+		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+#endif
 	if (vmx->host_state.gs_ldt_reload_needed) {
 		kvm_load_ldt(vmx->host_state.ldt_sel);
 #ifdef CONFIG_X86_64
 		load_gs_index(vmx->host_state.gs_sel);
-		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
 #else
 		loadsegment(gs, vmx->host_state.gs_sel);
 #endif
@@ -852,10 +854,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		loadsegment(fs, vmx->host_state.fs_sel);
 	reload_tss();
 #ifdef CONFIG_X86_64
-	if (is_long_mode(&vmx->vcpu)) {
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
-	}
+	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
 	if (current_thread_info()->status & TS_USEDFPU)
 		clts();


[COMMIT master] KVM: Clear assigned guest IRQ on release

2010-11-23 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

When we deassign a guest IRQ, clear the potentially asserted guest line.
There might be no chance for the guest to do this, specifically if we
switch from INTx to MSI mode.

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..ecc4419 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -141,6 +141,9 @@ static void deassign_guest_irq(struct kvm *kvm,
 	kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
 	assigned_dev->ack_notifier.gsi = -1;
 
+	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
+		    assigned_dev->guest_irq, 0);
+
 	if (assigned_dev->irq_source_id != -1)
 		kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
 	assigned_dev->irq_source_id = -1;


[COMMIT master] KVM: Refactor IRQ names of assigned devices

2010-11-23 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

Cosmetic change, but it helps to correlate IRQs with PCI devices.
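
As a rough illustration of the naming scheme, the sketch below formats a name the same way the patch does, with `snprintf` into a 32-byte buffer. `format_irq_name()` and the example PCI address are hypothetical; in the kernel the string comes from `pci_name()`.

```c
#include <stdio.h>

#define IRQ_NAME_LEN 32  /* same size as the irq_name[32] field in the patch */

/* Build "kvm:<pci name>". snprintf() never writes more than len bytes,
 * so an overlong name is truncated rather than overflowing the buffer. */
int format_irq_name(char *buf, size_t len, const char *pci_name)
{
    return snprintf(buf, len, "kvm:%s", pci_name);
}
```

Called with a device name such as "0000:00:19.0", this yields "kvm:0000:00:19.0", which is the string that then shows up for the interrupt in /proc/interrupts in place of a generic per-type label.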

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9fe7fef..4bd663d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -489,6 +489,7 @@ struct kvm_assigned_dev_kernel {
 	struct pci_dev *dev;
 	struct kvm *kvm;
 	spinlock_t intx_lock;
+	char irq_name[32];
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 1d77ce1..7623408 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -231,8 +231,7 @@ static int assigned_device_enable_host_intx(struct kvm *kvm,
 	 * are going to be long delays in accepting, acking, etc.
 	 */
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 IRQF_ONESHOT, "kvm_assigned_intx_device",
-				 (void *)dev))
+				 IRQF_ONESHOT, dev->irq_name, (void *)dev))
 		return -EIO;
 	return 0;
 }
@@ -251,7 +250,7 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 
 	dev->host_irq = dev->dev->irq;
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 0, "kvm_assigned_msi_device", (void *)dev)) {
+				 0, dev->irq_name, (void *)dev)) {
 		pci_disable_msi(dev->dev);
 		return -EIO;
 	}
@@ -278,8 +277,7 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 	for (i = 0; i < dev->entries_nr; i++) {
 		r = request_threaded_irq(dev->host_msix_entries[i].vector,
 					 NULL, kvm_assigned_dev_thread,
-					 0, "kvm_assigned_msix_device",
-					 (void *)dev);
+					 0, dev->irq_name, (void *)dev);
 		if (r)
 			goto err;
 	}
@@ -336,6 +334,9 @@ static int assign_host_irq(struct kvm *kvm,
 	if (dev->irq_requested_type & KVM_DEV_IRQ_HOST_MASK)
 		return r;
 
+	snprintf(dev->irq_name, sizeof(dev->irq_name), "kvm:%s",
+		 pci_name(dev->dev));
+
 	switch (host_irq_type) {
 	case KVM_DEV_IRQ_HOST_INTX:
 		r = assigned_device_enable_host_intx(kvm, dev);


[COMMIT master] KVM: Save/restore state of assigned PCI device

2010-11-23 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

The guest may change state that pci_reset_function does not touch, so
we had better save and restore the assigned device across guest usage.
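
A toy model of the save/restore pairing, assuming a 64-byte config header kept in memory. `struct cfg_dev` and both helpers are hypothetical stand-ins for the kernel's `pci_save_state()` and `pci_restore_state()`, which operate on real hardware.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 64  /* hypothetical: only the standard header in this sketch */

struct cfg_dev {
    uint8_t config[CFG_SIZE];  /* live registers, writable by the guest */
    uint8_t saved[CFG_SIZE];   /* snapshot taken when the device is assigned */
};

/* Taken once at assignment time, right after the initial reset. */
void cfg_save_state(struct cfg_dev *d)
{
    memcpy(d->saved, d->config, CFG_SIZE);
}

/* Applied on release (or on a failed assignment) to undo guest changes. */
void cfg_restore_state(struct cfg_dev *d)
{
    memcpy(d->config, d->saved, CFG_SIZE);
}
```

Any register the guest rewrote while it owned the device, for instance the cache line size at offset 0x0c, returns to its pre-assignment value on release, independent of what a function-level reset happens to cover.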

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7623408..d389207 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,7 +197,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
-	pci_reset_function(assigned_dev->dev);
+	__pci_reset_function(assigned_dev->dev);
+	pci_restore_state(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
@@ -514,6 +515,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	}
 
 	pci_reset_function(dev);
+	pci_save_state(dev);
 
 	match->assigned_dev_id = assigned_dev->assigned_dev_id;
 	match->host_segnr = assigned_dev->segnr;
@@ -544,6 +546,7 @@ out:
 	mutex_unlock(&kvm->lock);
 	return r;
 out_list_del:
+	pci_restore_state(dev);
 	list_del(&match->list);
 	pci_release_regions(dev);
 out_disable:


[COMMIT master] KVM: Document device assignment API

2010-11-23 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE,
KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and
KVM_ASSIGN_SET_MSIX_ENTRY.
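
The documentation added for KVM_ASSIGN_DEV_IRQ states that at most one host type and at most one guest type may be given in `flags`. A hypothetical userspace pre-check for that rule might look like the sketch below. The six type flags are copied from the documentation; the two mask values are assumed to match linux/kvm.h of that era, and `irq_flags_valid()` is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_DEV_IRQ_HOST_INTX   (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI    (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX   (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX  (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI   (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX  (1 << 10)

/* Assumed mask values: low byte is the host type, next byte the guest type. */
#define KVM_DEV_IRQ_HOST_MASK   0x00ff
#define KVM_DEV_IRQ_GUEST_MASK  0xff00

/* True if v has zero or one bit set. */
static bool at_most_one_bit(uint32_t v)
{
    return (v & (v - 1)) == 0;
}

/* Hypothetical pre-check: one type per side at most; either side may be
 * empty, matching "can even be null" in the documentation. */
bool irq_flags_valid(uint32_t flags)
{
    return at_most_one_bit(flags & KVM_DEV_IRQ_HOST_MASK) &&
           at_most_one_bit(flags & KVM_DEV_IRQ_GUEST_MASK);
}
```

Mixing a host MSI with a guest INTx passes this check, while requesting host INTx and host MSI together does not.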

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index b336266..e1a9297 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1085,6 +1085,184 @@ of 4 instructions that make up a hypercall.
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+4.47 KVM_ASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_ASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Assigns a host PCI device to the VM.
+
+struct kvm_assigned_pci_dev {
+   __u32 assigned_dev_id;
+   __u32 busnr;
+   __u32 devfn;
+   __u32 flags;
+   __u32 segnr;
+   union {
+   __u32 reserved[11];
+   };
+};
+
+The PCI device is specified by the triple segnr, busnr, and devfn.
+Identification in succeeding service requests is done via assigned_dev_id. The
+following flags are specified:
+
+/* Depends on KVM_CAP_IOMMU */
+#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
+
+4.48 KVM_DEASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_DEASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Ends PCI device assignment, releasing all associated resources.
+
+See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is
+used in kvm_assigned_pci_dev to identify the device.
+
+4.49 KVM_ASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Assigns an IRQ to a passed-through device.
+
+struct kvm_assigned_irq {
+   __u32 assigned_dev_id;
+   __u32 host_irq;
+   __u32 guest_irq;
+   __u32 flags;
+   union {
+   struct {
+   __u32 addr_lo;
+   __u32 addr_hi;
+   __u32 data;
+   } guest_msi;
+   __u32 reserved[12];
+   };
+};
+
+The following flags are defined:
+
+#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
+#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
+#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
+
+#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
+#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
+#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)
+
+It is not valid to specify multiple types per host or guest IRQ. However, the
+IRQ type of host and guest can differ or can even be null.
+
+4.50 KVM_DEASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Ends an IRQ assignment to a passed-through device.
+
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id, flags must correspond to the IRQ type specified on
+KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
+
+4.51 KVM_SET_GSI_ROUTING
+
+Capability: KVM_CAP_IRQ_ROUTING
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_irq_routing (in)
+Returns: 0 on success, -1 on error
+
+Sets the GSI routing table entries, overwriting any previously set entries.
+
+struct kvm_irq_routing {
+   __u32 nr;
+   __u32 flags;
+   struct kvm_irq_routing_entry entries[0];
+};
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_entry {
+   __u32 gsi;
+   __u32 type;
+   __u32 flags;
+   __u32 pad;
+   union {
+   struct kvm_irq_routing_irqchip irqchip;
+   struct kvm_irq_routing_msi msi;
+   __u32 pad[8];
+   } u;
+};
+
+/* gsi routing entry types */
+#define KVM_IRQ_ROUTING_IRQCHIP 1
+#define KVM_IRQ_ROUTING_MSI 2
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_irqchip {
+   __u32 irqchip;
+   __u32 pin;
+};
+
+struct kvm_irq_routing_msi {
+   __u32 address_lo;
+   __u32 address_hi;
+   __u32 data;
+   __u32 pad;
+};
+
+4.52 KVM_ASSIGN_SET_MSIX_NR
+
+Capability: KVM_CAP_DEVICE_MSIX
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_nr (in)
+Returns: 0 on success, -1 on error
+
+Set the number of MSI-X interrupts for an assigned device. This service can
+only be called once in the lifetime of an assigned device.
+
+struct kvm_assigned_msix_nr {
+   __u32 assigned_dev_id;
+   __u16 entry_nr;
+   __u16 padding;
+};
+
+#define KVM_MAX_MSIX_PER_DEV   256
+
+4.53 

[COMMIT master] KVM: MMU: don't mark spte notrap if reserved bit set

2010-11-23 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

If a reserved bit is set, we need to inject the #PF with PFEC.RSVD=1,
but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only.
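
The resulting decision in sync_page can be written as a small pure function. This is an illustrative sketch: the enum and `pick_nonpresent()` are hypothetical names, and only the boolean condition is taken from the patch.

```c
#include <stdbool.h>

enum nonpresent_kind {
    SHADOW_TRAP_NONPRESENT,   /* fault exits to the hypervisor, which can
                                 build the correct error code */
    SHADOW_NOTRAP_NONPRESENT  /* fault goes straight to the guest with
                                 PFEC.RSVD=0 */
};

/* A gpte with reserved bits set must never take the notrap shortcut,
 * because that path cannot report PFEC.RSVD=1. */
enum nonpresent_kind pick_nonpresent(bool rsvd_bits_set, bool gpte_present,
                                     bool clear_unsync)
{
    if (rsvd_bits_set || gpte_present || !clear_unsync)
        return SHADOW_TRAP_NONPRESENT;
    return SHADOW_NOTRAP_NONPRESENT;
}
```

Only a gpte that is not present, has no reserved bits set, and belongs to a page being fully resynced still gets the notrap value.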

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ba00eef..590bf12 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -395,8 +395,10 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 
 		gpte = gptep[i];
 
-		if (!is_present_gpte(gpte) ||
-		      is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL)) {
+		if (is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL))
+			continue;
+
+		if (!is_present_gpte(gpte)) {
 			if (!sp->unsync)
 				__set_spte(spte, shadow_notrap_nonpresent_pte);
 			continue;
@@ -760,6 +762,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		pt_element_t gpte;
 		gpa_t pte_gpa;
 		gfn_t gfn;
+		bool rsvd_bits_set;
 
 		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
@@ -771,12 +774,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			return -EINVAL;
 
 		gfn = gpte_to_gfn(gpte);
-		if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL)
-		      || gfn != sp->gfns[i] || !is_present_gpte(gpte)
-		      || !(gpte & PT_ACCESSED_MASK)) {
+		rsvd_bits_set = is_rsvd_bits_set(&vcpu->arch.mmu, gpte,
+						 PT_PAGE_TABLE_LEVEL);
+		if (rsvd_bits_set || gfn != sp->gfns[i] ||
+		      !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) {
 			u64 nonpresent;
 
-			if (is_present_gpte(gpte) || !clear_unsync)
+			if (rsvd_bits_set || is_present_gpte(gpte) ||
+			      !clear_unsync)
 				nonpresent = shadow_trap_nonpresent_pte;
 			else
 				nonpresent = shadow_notrap_nonpresent_pte;


[COMMIT master] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-23 Thread Avi Kivity
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

In kvm_cpu_hotplug(), only the CPU_STARTING case is protected by kvm_lock.
This patch adds the missing protection for the CPU_DYING case.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 339dd43..0fdd911 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
 	case CPU_DYING:
 		printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
 		       cpu);
+		spin_lock(&kvm_lock);
 		hardware_disable(NULL);
+		spin_unlock(&kvm_lock);
 		break;
 	case CPU_STARTING:
 		printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",


[COMMIT master] KVM: x86 emulator: drop unused #ifndef __KERNEL__

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 38b6e8d..ffd6e01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -20,16 +20,9 @@
  * From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4
  */
 
-#ifndef __KERNEL__
-#include <stdio.h>
-#include <stdint.h>
-#include <public/xen.h>
-#define DPRINTF(_f, _a ...) printf(_f , ## _a)
-#else
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
 #define DPRINTF(x...) do {} while (0)
-#endif
 #include <linux/module.h>
 #include <asm/kvm_emulate.h>
 


[COMMIT master] KVM: rename hardware_[dis|en]able() to *_nolock() and add locking wrappers

2010-11-23 Thread Avi Kivity
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

The naming convention of the hardware_[dis|en]able family is a little bit
confusing because only hardware_[dis|en]able_all are using the _nolock suffix.

Renaming the current hardware_[dis|en]able() to *_nolock() and using
hardware_[dis|en]able() as wrapper functions which take kvm_lock for them
reduces the confusion.
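
The convention can be illustrated in userspace with a C11 `atomic_flag` used as a spinlock. Everything here is a sketch: the lock helpers, the `enabled_cpus` counter and `disable_and_count()` are hypothetical, and only the split into `*_nolock()` bodies plus locking wrappers comes from the patch.

```c
#include <stdatomic.h>

/* Userspace stand-in for the kernel's kvm_lock spinlock. */
static atomic_flag kvm_lock = ATOMIC_FLAG_INIT;
/* Hypothetical shared state that kvm_lock protects in this sketch. */
static int enabled_cpus;

static void kvm_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&kvm_lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void kvm_lock_release(void)
{
    atomic_flag_clear_explicit(&kvm_lock, memory_order_release);
}

/* *_nolock(): the caller already holds kvm_lock or does not need it. */
void hardware_enable_nolock(void)  { enabled_cpus++; }
void hardware_disable_nolock(void) { enabled_cpus--; }

/* Plain names: take kvm_lock around the _nolock body. */
void hardware_enable(void)
{
    kvm_lock_acquire();
    hardware_enable_nolock();
    kvm_lock_release();
}

void hardware_disable(void)
{
    kvm_lock_acquire();
    hardware_disable_nolock();
    kvm_lock_release();
}

/* A caller that holds the lock across several steps must use the _nolock
 * variant; calling hardware_disable() here would spin on its own lock. */
int disable_and_count(void)
{
    int n;

    kvm_lock_acquire();
    hardware_disable_nolock();
    n = enabled_cpus;
    kvm_lock_release();
    return n;
}
```

With the suffix applied consistently, the function name alone tells a reader whether the lock is taken inside the call.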

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0fdd911..fb93ff9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2067,7 +2067,7 @@ static struct miscdevice kvm_dev = {
 	&kvm_chardev_ops,
 };
 
-static void hardware_enable(void *junk)
+static void hardware_enable_nolock(void *junk)
 {
 	int cpu = raw_smp_processor_id();
 	int r;
@@ -2087,7 +2087,14 @@ static void hardware_enable(void *junk)
 	}
 }
 
-static void hardware_disable(void *junk)
+static void hardware_enable(void *junk)
+{
+	spin_lock(&kvm_lock);
+	hardware_enable_nolock(junk);
+	spin_unlock(&kvm_lock);
+}
+
+static void hardware_disable_nolock(void *junk)
 {
 	int cpu = raw_smp_processor_id();
 
@@ -2097,13 +2104,20 @@ static void hardware_disable(void *junk)
 	kvm_arch_hardware_disable(NULL);
 }
 
+static void hardware_disable(void *junk)
+{
+	spin_lock(&kvm_lock);
+	hardware_disable_nolock(junk);
+	spin_unlock(&kvm_lock);
+}
+
 static void hardware_disable_all_nolock(void)
 {
 	BUG_ON(!kvm_usage_count);
 
 	kvm_usage_count--;
 	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable, NULL, 1);
+		on_each_cpu(hardware_disable_nolock, NULL, 1);
 }
 
 static void hardware_disable_all(void)
@@ -2122,7 +2136,7 @@ static int hardware_enable_all(void)
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
-		on_each_cpu(hardware_enable, NULL, 1);
+		on_each_cpu(hardware_enable_nolock, NULL, 1);
 
 		if (atomic_read(&hardware_enable_failed)) {
 			hardware_disable_all_nolock();
@@ -2148,16 +2162,12 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
 	case CPU_DYING:
 		printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
 		       cpu);
-		spin_lock(&kvm_lock);
 		hardware_disable(NULL);
-		spin_unlock(&kvm_lock);
 		break;
 	case CPU_STARTING:
 		printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
 		       cpu);
-		spin_lock(&kvm_lock);
 		hardware_enable(NULL);
-		spin_unlock(&kvm_lock);
 		break;
 	}
 	return NOTIFY_OK;
@@ -2188,7 +2198,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 	 */
 	printk(KERN_INFO "kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable, NULL, 1);
+	on_each_cpu(hardware_disable_nolock, NULL, 1);
 	return NOTIFY_OK;
 }
 
@@ -2358,7 +2368,7 @@ static void kvm_exit_debug(void)
 static int kvm_suspend(struct sys_device *dev, pm_message_t state)
 {
 	if (kvm_usage_count)
-		hardware_disable(NULL);
+		hardware_disable_nolock(NULL);
 	return 0;
 }
 
@@ -2366,7 +2376,7 @@ static int kvm_resume(struct sys_device *dev)
 {
 	if (kvm_usage_count) {
 		WARN_ON(spin_is_locked(&kvm_lock));
-		hardware_enable(NULL);
+		hardware_enable_nolock(NULL);
 	}
 	return 0;
 }
@@ -2543,7 +2553,7 @@ void kvm_exit(void)
 	sysdev_class_unregister(&kvm_sysdev_class);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
 	unregister_cpu_notifier(&kvm_cpu_notifier);
-	on_each_cpu(hardware_disable, NULL, 1);
+	on_each_cpu(hardware_disable_nolock, NULL, 1);
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	free_cpumask_var(cpus_hardware_enabled);


[COMMIT master] KVM: Switch assigned device IRQ forwarding to threaded handler

2010-11-23 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lengthy operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, and is therefore renamed to intx_lock.

Acked-by: Alex Williamson alex.william...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d63f2c..9fe7fef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -470,16 +470,8 @@ struct kvm_irq_ack_notifier {
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
-#define KVM_ASSIGNED_MSIX_PENDING		0x1
-struct kvm_guest_msix_entry {
-	u32 vector;
-	u16 entry;
-	u16 flags;
-};
-
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
-	struct work_struct interrupt_work;
 	struct list_head list;
 	int assigned_dev_id;
 	int host_segnr;
@@ -490,13 +482,13 @@ struct kvm_assigned_dev_kernel {
 	bool host_irq_disabled;
 	struct msix_entry *host_msix_entries;
 	int guest_irq;
-	struct kvm_guest_msix_entry *guest_msix_entries;
+	struct msix_entry *guest_msix_entries;
 	unsigned long irq_requested_type;
 	int irq_source_id;
 	int flags;
 	struct pci_dev *dev;
 	struct kvm *kvm;
-	spinlock_t assigned_dev_lock;
+	spinlock_t intx_lock;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ecc4419..1d77ce1 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -55,58 +55,31 @@ static int find_index_from_host_irq(struct kvm_assigned_dev_kernel
 	return index;
 }
 
-static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
 {
-	struct kvm_assigned_dev_kernel *assigned_dev;
-	int i;
+	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+	u32 vector;
+	int index;
 
-	assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
-				    interrupt_work);
+	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
+		spin_lock(&assigned_dev->intx_lock);
+		disable_irq_nosync(irq);
+		assigned_dev->host_irq_disabled = true;
+		spin_unlock(&assigned_dev->intx_lock);
+	}
 
-	spin_lock_irq(&assigned_dev->assigned_dev_lock);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-		struct kvm_guest_msix_entry *guest_entries =
-			assigned_dev->guest_msix_entries;
-		for (i = 0; i < assigned_dev->entries_nr; i++) {
-			if (!(guest_entries[i].flags &
-					KVM_ASSIGNED_MSIX_PENDING))
-				continue;
-			guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
+		index = find_index_from_host_irq(assigned_dev, irq);
+		if (index >= 0) {
+			vector = assigned_dev->
+					guest_msix_entries[index].vector;
 			kvm_set_irq(assigned_dev->kvm,
-				    assigned_dev->irq_source_id,
-				    guest_entries[i].vector, 1);
+				    assigned_dev->irq_source_id, vector, 1);
 		}
 	} else
 		kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
 			    assigned_dev->guest_irq, 1);
 
-	spin_unlock_irq(&assigned_dev->assigned_dev_lock);
-}
-
-static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
-{
-	unsigned long flags;
-	struct kvm_assigned_dev_kernel *assigned_dev =
-		(struct kvm_assigned_dev_kernel *) dev_id;
-
-	spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
-	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-		int index = find_index_from_host_irq(assigned_dev, irq);
-		if (index < 0)
-			goto out;
-		assigned_dev->guest_msix_entries[index].flags |=
-			KVM_ASSIGNED_MSIX_PENDING;
-	}
-
-	schedule_work(&assigned_dev->interrupt_work);
-
-	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) {
-		disable_irq_nosync(irq);
-		assigned_dev->host_irq_disabled = true;
-	}
-
-out:
-   

[COMMIT master] KVM: x86 emulator: drop DPRINTF()

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Failed emulation is reported via a tracepoint; the cmps printk is pointless.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ffd6e01..3325b47 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -22,7 +22,6 @@
 
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
-#define DPRINTF(x...) do {} while (0)
 #include <linux/module.h>
 #include <asm/kvm_emulate.h>
 
@@ -2796,10 +2795,8 @@ done_prefixes:
 	c->execute = opcode.u.execute;
 
 	/* Unrecognised? */
-	if (c->d == 0 || (c->d & Undefined)) {
-		DPRINTF("Cannot emulate %02x\n", c->b);
+	if (c->d == 0 || (c->d & Undefined))
 		return -1;
-	}
 
 	if (mode == X86EMUL_MODE_PROT64 && (c->d & Stack))
 		c->op_bytes = 8;
@@ -3261,7 +3258,6 @@ special_insn:
 		break;
 	case 0xa6 ... 0xa7:	/* cmps */
 		c->dst.type = OP_NONE; /* Disable writeback. */
-		DPRINTF("cmps: mem1=0x%p mem2=0x%p\n", c->src.addr.mem, c->dst.addr.mem);
 		goto cmp;
 	case 0xa8 ... 0xa9:	/* test ax, imm */
 		goto test;
@@ -3778,6 +3774,5 @@ twobyte_insn:
 	goto writeback;
 
 cannot_emulate:
-	DPRINTF("Cannot emulate %02x\n", c->b);
 	return -1;
 }


[COMMIT master] KVM: x86 emulator: do not perform address calculations on linear addresses

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Linear addresses are supposed to already have segment checks performed on them;
if we play with these addresses the checks become invalid.
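
One concrete way the ordering matters can be sketched in user-space C. This is
only an illustration under stated assumptions: struct seg_addr, to_linear(),
offset_after() and offset_before() are hypothetical stand-ins, not the
emulator's own code. The 32-bit wrap mirrors the truncation that the
emulator's linear() helper applies outside 64-bit mode.

```c
#include <stdint.h>

/* Hypothetical stand-in for the emulator's segmented address. */
struct seg_addr {
    uint64_t ea;   /* effective address (offset within the segment) */
    uint64_t base; /* segment base; the real code looks this up by segment */
};

/* Form a linear address; outside 64-bit mode the result wraps at 4 GiB. */
uint64_t to_linear(struct seg_addr a, int ad_bytes)
{
    uint64_t la = a.base + a.ea;

    if (ad_bytes != 8)
        la &= 0xffffffffu;
    return la;
}

/* Offset added after the conversion: the sum escapes the 32-bit wrap. */
uint64_t offset_after(struct seg_addr a, int ad_bytes, uint64_t off)
{
    return to_linear(a, ad_bytes) + off;
}

/* Offset added to the effective address first, then converted. */
uint64_t offset_before(struct seg_addr a, int ad_bytes, uint64_t off)
{
    a.ea += off;
    return to_linear(a, ad_bytes);
}
```

With ea = 0xfffffffe, base = 0 and 4-byte addressing, offset_after() yields
0x100000000 while offset_before() wraps to 0, so only the second form stays
inside the address space the conversion was meant to enforce.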

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e967055..bdbbb18 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -568,7 +568,8 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+   addr.ea += 2;
+   rc = ops->read_std(linear(ctxt, addr), address, op_bytes,
   ctxt->vcpu, NULL);
return rc;
 }


[COMMIT master] KVM: x86 emulator: preserve an operand's segment identity

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Currently the x86 emulator converts the segment register associated with
an operand into a segment base which is added into the operand address.
This loss of information results in us not doing segment limit checks properly.

Replace struct operand's addr.mem field by a segmented_address structure
which holds both the effective address and segment.  This will allow us to
do the limit check at the point of access.
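
A minimal sketch of why keeping the segment matters, assuming a simplified
model with byte-granular, expand-up segments only. struct seg_desc, enum seg,
access_ok() and seg_linear() are hypothetical names for illustration; only the
ea/seg pair mirrors the structure introduced by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_COUNT }; /* hypothetical */

/* Simplified descriptor: limit is the last valid offset in the segment. */
struct seg_desc {
    uint32_t base;
    uint32_t limit;
};

/* Same idea as the patch: keep offset and segment together. */
struct segmented_address {
    uint32_t ea;
    unsigned seg;
};

/* True if the bytes [ea, ea + size) all lie within the segment limit. */
bool access_ok(const struct seg_desc *segs, struct segmented_address a,
               uint32_t size)
{
    uint32_t limit;

    if (a.seg >= SEG_COUNT)
        return false;
    if (size == 0)
        return true;
    limit = segs[a.seg].limit;
    return a.ea <= limit && size - 1 <= limit - a.ea;
}

/* Only at the point of access is the base folded in. */
uint32_t seg_linear(const struct seg_desc *segs, struct segmented_address a)
{
    return segs[a.seg].base + a.ea;
}
```

Once the base has been added into a bare address, the check in access_ok()
can no longer be written, because the offset and the limit it must be compared
against are gone.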

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index b36c6b3..b48c133 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,7 +159,10 @@ struct operand {
};
union {
unsigned long *reg;
-   unsigned long mem;
+   struct segmented_address {
+   ulong ea;
+   unsigned seg;
+   } mem;
} addr;
union {
unsigned long val;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3325b47..e967055 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -410,9 +410,9 @@ address_mask(struct decode_cache *c, unsigned long reg)
 }
 
 static inline unsigned long
-register_address(struct decode_cache *c, unsigned long base, unsigned long reg)
+register_address(struct decode_cache *c, unsigned long reg)
 {
-   return base + address_mask(c, reg);
+   return address_mask(c, reg);
 }
 
 static inline void
@@ -444,26 +444,26 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
-static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
-  struct x86_emulate_ops *ops,
-  struct decode_cache *c)
+static unsigned seg_override(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+struct decode_cache *c)
 {
if (!c->has_seg_override)
return 0;
 
-   return seg_base(ctxt, ops, c->seg_override);
+   return c->seg_override;
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
+static ulong linear(struct x86_emulate_ctxt *ctxt,
+   struct segmented_address addr)
 {
-   return seg_base(ctxt, ops, VCPU_SREG_ES);
-}
+   struct decode_cache *c = &ctxt->decode;
+   ulong la;
 
-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
-struct x86_emulate_ops *ops)
-{
-   return seg_base(ctxt, ops, VCPU_SREG_SS);
+   la = seg_base(ctxt, ctxt->ops, addr.seg) + addr.ea;
+   if (c->ad_bytes != 8)
+   la &= (u32)-1;
+   return la;
 }
 
 static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
@@ -556,7 +556,7 @@ static void *decode_register(u8 modrm_reg, unsigned long 
*regs,
 
 static int read_descriptor(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops,
-  ulong addr,
+  struct segmented_address addr,
   u16 *size, unsigned long *address, int op_bytes)
 {
int rc;
@@ -564,10 +564,12 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
if (op_bytes == 2)
op_bytes = 3;
*address = 0;
-   rc = ops->read_std(addr, (unsigned long *)size, 2, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr), (unsigned long *)size, 2,
+  ctxt->vcpu, NULL);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = ops->read_std(addr + 2, address, op_bytes, ctxt->vcpu, NULL);
+   rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+  ctxt->vcpu, NULL);
return rc;
 }
 
@@ -760,7 +762,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
break;
}
}
-   op->addr.mem = modrm_ea;
+   op->addr.mem.ea = modrm_ea;
 done:
return rc;
 }
@@ -775,13 +777,13 @@ static int decode_abs(struct x86_emulate_ctxt *ctxt,
op->type = OP_MEM;
switch (c->ad_bytes) {
case 2:
-   op->addr.mem = insn_fetch(u16, 2, c->eip);
+   op->addr.mem.ea = insn_fetch(u16, 2, c->eip);
break;
case 4:
-   op->addr.mem = insn_fetch(u32, 4, c->eip);
+   op->addr.mem.ea = insn_fetch(u32, 4, c->eip);
break;
case 8:
-   op->addr.mem = insn_fetch(u64, 8, c->eip);
+   op->addr.mem.ea = insn_fetch(u64, 8, c->eip);
break;
}
 done:
@@ -800,7 +802,7 @@ static void fetch_bit_operand(struct decode_cache *c)
else if (c->src.bytes == 4)
  

[COMMIT master] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/

2010-11-23 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Conflicts:
arch/x86/kvm/svm.c
kernel/sched.c

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com


[COMMIT master] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.
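
As a rough user-space illustration of the attribute in question: GCC's
noclone function attribute tells the compiler not to create specialised
copies of a function, and noinline keeps it from being merged into callers.
NO_DUPLICATE and run_once() are hypothetical names, and the fallback branches
for other compilers are an assumption made for portability, not something the
patch does.

```c
/* Hypothetical shim: GCC understands noclone; other compilers get less. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_DUPLICATE __attribute__((noinline, noclone))
#elif defined(__clang__)
#define NO_DUPLICATE __attribute__((noinline))
#else
#define NO_DUPLICATE
#endif

/*
 * Stand-in for a function whose body defines an assembler-level label.
 * If the compiler emitted a second copy of the body (by inlining it or by
 * cloning it for constant arguments), the label would be defined twice and
 * the assembler would reject the output.
 */
NO_DUPLICATE int run_once(int x)
{
    /* an asm block defining a global label would sit here */
    return x + 1;
}
```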

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a9ad174..58e5913 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3904,17 +3904,33 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 #define Q l
 #endif
 
-/*
- * We put this into a separate noinline function to prevent the compiler
- * from duplicating the code. This is needed because this code
- * uses non local labels that cannot be duplicated.
- * Do not put any flow control into this function.
- * Better would be to put this whole monstrosity into a .S file.
- */
-static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu)
+static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   asm volatile(
+
+   /* Record the guest's net vcpu time for enforced NMI injections. */
+   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
+   vmx->entry_time = ktime_get();
+
+   /* Don't enter VMX if guest state is invalid, let the exit handler
+  start emulation until we arrive back to a valid state */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
+   return;
+
+   if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
+   vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
+   if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
+   vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
+
+   /* When single-stepping over STI and MOV SS, we must clear the
+* corresponding interruptibility bits in the guest state. Otherwise
+* vmentry fails as it then expects bit 14 (BS) in pending debug
+* exceptions being set, but that's not correct for the guest debugging
+* case. */
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   vmx_set_interrupt_shadow(vcpu, 0);
+
+   asm(
/* Store host registers */
"push %%"R"dx; push %%"R"bp;"
"push %%"R"cx \n\t"
@@ -4009,35 +4025,6 @@ static void noinline do_vmx_vcpu_run(struct kvm_vcpu 
*vcpu)
, "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
 #endif
  );
-}
-
-static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
-{
-   struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-   /* Record the guest's net vcpu time for enforced NMI injections. */
-   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
-   vmx->entry_time = ktime_get();
-
-   /* Don't enter VMX if guest state is invalid, let the exit handler
-  start emulation until we arrive back to a valid state */
-   if (vmx->emulation_required && emulate_invalid_guest_state)
-   return;
-
-   if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
-   vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-   if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
-   vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
-
-   /* When single-stepping over STI and MOV SS, we must clear the
-* corresponding interruptibility bits in the guest state. Otherwise
-* vmentry fails as it then expects bit 14 (BS) in pending debug
-* exceptions being set, but that's not correct for the guest debugging
-* case. */
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   vmx_set_interrupt_shadow(vcpu, 0);
-
-   do_vmx_vcpu_run(vcpu);
 
vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP)
  | (1 << VCPU_EXREG_PDPTR));


[COMMIT master] KVM: VMX: Inform user about INTEL_TXT dependency

2010-11-23 Thread Avi Kivity
From: Shane Wang shane.w...@intel.com

Inform user to either disable TXT in the BIOS or do TXT launch
with tboot before enabling KVM since some BIOSes do not set
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.

Signed-off-by: Shane Wang shane.w...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0badeac..a9ad174 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1305,8 +1305,11 @@ static __init int vmx_disabled_by_bios(void)
 && tboot_enabled())
return 1;
if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
-   && !tboot_enabled())
+   && !tboot_enabled()) {
+   printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
+   " activate TXT before enabling KVM\n");
return 1;
+   }
}
 
return 0;


[COMMIT master] KVM: Add instruction-set-specific exit qualifications to kvm_exit trace

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

The exit reason alone is insufficient to understand exactly why an exit
occurred; add ISA-specific trace parameters for additional information.

Because fetching these parameters is expensive on vmx, and because these
parameters are fetched even if tracing is disabled, we fetch the
parameters via a callback instead of as traditional trace arguments.
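
The trade-off can be sketched in plain C. This is a simplified model, not the
kernel's tracepoint machinery: struct vcpu, struct isa_ops, trace_exit() and
the fetches counter are hypothetical. The point it shows is that the callback
runs only inside the "tracing enabled" branch, whereas values passed as
ordinary arguments would have to be computed by the caller every time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vcpu: two values that are costly to read, plus a counter. */
struct vcpu {
    uint64_t qual;
    uint64_t intr;
    unsigned fetches; /* how often the expensive read actually happened */
};

/* Per-ISA operations table with the new callback. */
struct isa_ops {
    void (*get_exit_info)(struct vcpu *v, uint64_t *info1, uint64_t *info2);
};

void demo_get_exit_info(struct vcpu *v, uint64_t *info1, uint64_t *info2)
{
    v->fetches++; /* stands in for the expensive hardware reads */
    *info1 = v->qual;
    *info2 = v->intr;
}

struct trace_record {
    unsigned reason;
    uint64_t info1, info2;
};

/* Record an exit; the callback is only reached when tracing is enabled. */
bool trace_exit(bool enabled, const struct isa_ops *ops, struct vcpu *v,
                unsigned reason, struct trace_record *out)
{
    if (!enabled)
        return false;
    out->reason = reason;
    ops->get_exit_info(v, &out->info1, &out->info2);
    return true;
}
```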

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b04c0fa..54e42c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -594,6 +594,7 @@ struct kvm_x86_ops {
 
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
+   void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
const struct trace_print_flags *exit_reasons_str;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b83954e..2fd2f4d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2974,6 +2974,14 @@ void dump_vmcb(struct kvm_vcpu *vcpu)
 
 }
 
+static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
+{
+   struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
+
+   *info1 = control->exit_info_1;
+   *info2 = control->exit_info_2;
+}
+
 static int handle_exit(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3678,7 +3686,9 @@ static struct kvm_x86_ops svm_x86_ops = {
.get_tdp_level = get_npt_level,
.get_mt_mask = svm_get_mt_mask,
 
+   .get_exit_info = svm_get_exit_info,
.exit_reasons_str = svm_exit_reasons_str,
+
.get_lpage_level = svm_get_lpage_level,
 
.cpuid_update = svm_cpuid_update,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 1061022..1357d7c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -192,18 +192,22 @@ TRACE_EVENT(kvm_exit,
__field(unsigned int,   exit_reason )
__field(unsigned long,  guest_rip   )
__field(u32,isa )
+   __field(u64,info1   )
+   __field(u64,info2   )
),
 
TP_fast_assign(
__entry->exit_reason= exit_reason;
__entry->guest_rip  = kvm_rip_read(vcpu);
__entry->isa= isa;
+   kvm_x86_ops->get_exit_info(vcpu, &__entry->info1,
+  &__entry->info2);
),
 
-   TP_printk("reason %s rip 0x%lx",
+   TP_printk("reason %s rip 0x%lx info %llx %llx",
 ftrace_print_symbols_seq(p, __entry->exit_reason,
  kvm_x86_ops->exit_reasons_str),
-__entry->guest_rip)
+__entry->guest_rip, __entry->info1, __entry->info2)
 );
 
 /*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4e2b8f3..caa967e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3690,6 +3690,12 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
ARRAY_SIZE(kvm_vmx_exit_handlers);
 
+static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
+{
+   *info1 = vmcs_readl(EXIT_QUALIFICATION);
+   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
+}
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -4339,7 +4345,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.get_tdp_level = get_ept_level,
.get_mt_mask = vmx_get_mt_mask,
 
+   .get_exit_info = vmx_get_exit_info,
.exit_reasons_str = vmx_exit_reasons_str,
+
.get_lpage_level = vmx_get_lpage_level,
 
.cpuid_update = vmx_cpuid_update,


[COMMIT master] KVM: Record instruction set in kvm_exit tracepoint

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

exit_reason's meaning depends on the instruction set; record it so a trace
taken on one machine can be interpreted on another.
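
A small sketch of what a trace consumer can do once the ISA is recorded.
KVM_ISA_VMX and KVM_ISA_SVM are the values defined by this patch; the lookup
function and its tables are hypothetical, and the tables list only a few
well-known exit codes for illustration rather than the full sets.

```c
#include <stddef.h>

#define KVM_ISA_VMX 1
#define KVM_ISA_SVM 2

struct reason_name {
    unsigned code;
    const char *name;
};

/* Illustrative subsets only; the real tables are much longer. */
static const struct reason_name vmx_names[] = {
    { 1,  "EXTERNAL_INTERRUPT" },
    { 12, "HLT" },
};

static const struct reason_name svm_names[] = {
    { 0x060, "INTR" },
    { 0x078, "HLT" },
};

/* Decode an exit code using the table that matches the recorded ISA. */
const char *exit_reason_name(unsigned isa, unsigned code)
{
    const struct reason_name *tbl;
    size_t n, i;

    switch (isa) {
    case KVM_ISA_VMX:
        tbl = vmx_names;
        n = sizeof(vmx_names) / sizeof(vmx_names[0]);
        break;
    case KVM_ISA_SVM:
        tbl = svm_names;
        n = sizeof(svm_names) / sizeof(svm_names[0]);
        break;
    default:
        return "UNKNOWN";
    }
    for (i = 0; i < n; i++)
        if (tbl[i].code == code)
            return tbl[i].name;
    return "UNKNOWN";
}
```

The same numeric code means different things per ISA: 12 is HLT under VMX but
is not in the SVM subset above, which is why the decoder needs the isa field.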

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c6a7798..b83954e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2980,7 +2980,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;
u32 exit_code = svm->vmcb->control.exit_code;
 
-   trace_kvm_exit(exit_code, vcpu);
+   trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR0_MASK))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a6544b8..1061022 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -178,21 +178,26 @@ TRACE_EVENT(kvm_apic,
 #define trace_kvm_apic_read(reg, val)  trace_kvm_apic(0, reg, val)
 #define trace_kvm_apic_write(reg, val) trace_kvm_apic(1, reg, val)
 
+#define KVM_ISA_VMX   1
+#define KVM_ISA_SVM   2
+
 /*
  * Tracepoint for kvm guest exit:
  */
 TRACE_EVENT(kvm_exit,
-   TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
-   TP_ARGS(exit_reason, vcpu),
+   TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa),
+   TP_ARGS(exit_reason, vcpu, isa),
 
TP_STRUCT__entry(
__field(unsigned int,   exit_reason )
__field(unsigned long,  guest_rip   )
+   __field(u32,isa )
),
 
TP_fast_assign(
__entry->exit_reason= exit_reason;
__entry->guest_rip  = kvm_rip_read(vcpu);
+   __entry->isa= isa;
),
 
TP_printk("reason %s rip 0x%lx",
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 58e5913..4e2b8f3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3700,7 +3700,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
 
-   trace_kvm_exit(exit_reason, vcpu);
+   trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
 
/* If guest state is invalid, start emulating */
if (vmx->emulation_required && emulate_invalid_guest_state)


[COMMIT master] KVM: fast-path msi injection with irqfd

2010-11-23 Thread Avi Kivity
From: Michael S. Tsirkin m...@redhat.com

Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

This also adds some comments about locking rules and rcu usage in code.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway, however,
  it seems easier to keep the code around as
  it has been thought through and debugged, and fix level later than
  rip out and re-instate it later.
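
The fast-path decision described in these notes can be modelled in user space.
This is only an analogy under stated assumptions: it uses C11 atomics in place
of RCU, so it shows the pointer publish/read pattern but not RCU's deferred
freeing of the old routing table. struct route_entry, struct irqfd_sketch and
both functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical routing entry that the irqfd points into. */
struct route_entry {
    unsigned gsi;
    unsigned data;
};

struct irqfd_sketch {
    /* NULL means "no MSI route": take the deferred (workqueue-like) path. */
    _Atomic(struct route_entry *) entry;
    unsigned fast_injections;     /* stands in for direct MSI injection */
    unsigned deferred_injections; /* stands in for scheduling work */
};

/* Update side: publish a new entry pointer (or NULL) for readers. */
void irqfd_update(struct irqfd_sketch *fd, struct route_entry *e)
{
    atomic_store_explicit(&fd->entry, e, memory_order_release);
}

/* Signal side: inject directly when a route is cached, else defer. */
void irqfd_signal(struct irqfd_sketch *fd)
{
    struct route_entry *e =
        atomic_load_explicit(&fd->entry, memory_order_acquire);

    if (e)
        fd->fast_injections++;
    else
        fd->deferred_injections++;
}
```

Holding a pointer rather than a copy is what lets the update side swap routes
without the signal side taking a lock.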

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Acked-by: Marcelo Tosatti mtosa...@redhat.com
Acked-by: Gregory Haskins ghask...@novell.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4bd663d..f17beae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -17,6 +17,7 @@
 #include <linux/preempt.h>
 #include <linux/msi.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -240,6 +241,10 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+   /*
+* Update side is protected by irq_lock and,
+* if configured, irqfds.lock.
+*/
struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
@@ -511,6 +516,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
+   int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
@@ -652,17 +659,26 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) 
{}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
 
 static inline void kvm_eventfd_init(struct kvm *kvm) {}
+
 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 {
return -EINVAL;
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table *irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..2ca4535 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,19 @@
  */
 
 struct _irqfd {
-   struct kvm   *kvm;
-   struct eventfd_ctx   *eventfd;
-   int   gsi;
-   struct list_head  list;
-   poll_table    pt;
-   wait_queue_t  wait;
-   struct work_struct    inject;
-   struct work_struct    shutdown;
+   /* Used for MSI fast-path */
+   struct kvm *kvm;
+   wait_queue_t wait;
+   /* Update side is protected by irqfds.lock */
+   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+   /* Used for level IRQ fast-path */
+   int gsi;
+   struct work_struct inject;
+   /* Used for setup/shutdown */
+   struct eventfd_ctx *eventfd;
+   struct list_head list;
+   poll_table pt;
+   struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,14 +130,22 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
unsigned long flags = (unsigned long)key;
+   struct kvm_kernel_irq_routing_entry *irq;
+   struct kvm *kvm = irqfd->kvm;
 
-   if (flags & POLLIN)
+   if (flags & POLLIN) {
+   rcu_read_lock();
+   irq = rcu_dereference(irqfd->irq_entry);
/* An event has been signaled, inject an interrupt */
-   schedule_work(&irqfd->inject);
+   if (irq)
+   kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+   else
+   schedule_work(&irqfd->inject);
+   rcu_read_unlock();
+   }
 
if (flags & POLLHUP) 

[COMMIT master] apic: test nmi-after-sti

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

While not required by the spec, some guests (Linux)
rely on nmi being blocked by an IF-enabling sti.  Add
a unit test for this condition.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/x86/apic.c b/x86/apic.c
index 165f820..2207040 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -1,6 +1,7 @@
 #include "libcflat.h"
 #include "apic.h"
 #include "vm.h"
+#include "smp.h"
 
 typedef struct {
 unsigned short offset0;
@@ -274,9 +275,74 @@ static void test_ioapic_simultaneous(void)
g_66 && g_78 && g_66_after_78 && g_66_rip == g_78_rip);
 }
 
+volatile int nmi_counter_private, nmi_counter, nmi_hlt_counter, 
sti_loop_active;
+
+void sti_nop(char *p)
+{
+asm volatile (
+ ".globl post_sti \n\t"
+ "sti \n"
+ /*
+  * vmx won't exit on external interrupt if blocked-by-sti,
+  * so give it a reason to exit by accessing an unmapped page.
+  */
+ "post_sti: testb $0, %0 \n\t"
+ "nop \n\t"
+ "cli"
+ : : "m"(*p)
+ );
+nmi_counter = nmi_counter_private;
+}
+
+static void sti_loop(void *ignore)
+{
+unsigned k = 0;
+
+while (sti_loop_active) {
+   sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024)));
+}
+}
+
+static void nmi_handler(isr_regs_t *regs)
+{
+extern void post_sti(void);
+++nmi_counter_private;
+nmi_hlt_counter += regs->rip == (ulong)post_sti;
+}
+
+static void update_cr3(void *cr3)
+{
+write_cr3((ulong)cr3);
+}
+
+static void test_sti_nmi(void)
+{
+unsigned old_counter;
+
+if (cpu_count() < 2) {
+   return;
+}
+
+set_idt_entry(2, nmi_handler);
+on_cpu(1, update_cr3, (void *)read_cr3());
+
+sti_loop_active = 1;
+on_cpu_async(1, sti_loop, 0);
+while (nmi_counter < 3) {
+   old_counter = nmi_counter;
+   apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_NMI | APIC_INT_ASSERT, 1);
+   while (nmi_counter == old_counter) {
+   ;
+   }
+}
+sti_loop_active = 0;
+report("nmi-after-sti", nmi_hlt_counter == 0);
+}
+
 int main()
 {
 setup_vm();
+smp_init();
 
 test_lapic_existence();
 
@@ -288,6 +354,7 @@ int main()
 
 test_ioapic_intr();
 test_ioapic_simultaneous();
+test_sti_nmi();
 
 printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
 


[COMMIT master] apic: use boot idt instead of a locally allocated idt

2010-11-23 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

This allows the smp support, which uses the boot idt, to work.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/x86/apic.c b/x86/apic.c
index 48fa0f7..165f820 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -89,7 +89,7 @@ asm (
 #endif
 );
 
-static idt_entry_t idt[256];
+static idt_entry_t *idt = 0;
 
 static int g_fail;
 static int g_tests;
@@ -127,19 +127,6 @@ void test_enable_x2apic(void)
 }
 }
 
-static void init_idt(void)
-{
-struct {
-u16 limit;
-ulong idt;
-} __attribute__((packed)) idt_ptr = {
-sizeof(idt_entry_t) * 256 - 1,
-(ulong)idt,
-};
-
-asm volatile("lidt %0" : : "m"(idt_ptr));
-}
-
 static void set_idt_entry(unsigned vec, void (*func)(isr_regs_t *regs))
 {
 u8 *thunk = vmalloc(50);
@@ -296,7 +283,6 @@ int main()
 mask_pic_interrupts();
 enable_apic();
 test_enable_x2apic();
-init_idt();
 
 test_self_ipi();
 


Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Dor Laor

On 11/23/2010 08:41 AM, Avi Kivity wrote:

On 11/23/2010 01:00 AM, Anthony Liguori wrote:

qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of teaching
them to respond to these signals, introduce monitor commands that stop and
start individual vcpus.

The purpose of these commands is to implement CPU hard limits using an
external tool that watches the CPU consumption and stops the CPU as
appropriate.


Why not use cgroup for that?



The monitor commands provide a more elegant solution than signals because
they ensure that a stopped vcpu isn't holding the qemu_mutex.



 From signal(7):

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?

If we could catch SIGSTOP, then it would be easy to unblock it only
while running in guest context. It would then stop on exit to userspace.

Using monitor commands is fairly heavyweight for something as high
frequency as this. What control period do you see people using? Maybe we
should define USR1 for vcpu start/stop.

What happens if one vcpu is stopped while another is running? Spin
loops, synchronous IPIs will take forever. Maybe we need to stop the
entire process.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mask bit support's API

2010-11-23 Thread Yang, Sheng
On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
 On 11/23/2010 08:35 AM, Yang, Sheng wrote:
  On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
On 11/23/2010 08:09 AM, Yang, Sheng wrote:
  Hi Avi,
  
  I've proposed the following API for mask bit support.
  
  The main point is, QEmu can know which entries are enabled(by
  pci_enable_msix()). And for enabled entries, kernel own it,
  including MSI data/address and mask bit(routing table and mask
  bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and
  it can sync with them if it want to do so).
  
  Before entries are enabled, QEmu can still use it's own MSI
  table(because we didn't contain these kind of information in
  kernel, and it's unnecessary for kernel).
  
  The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query
  one entry didn't exist in kernel - or we can simply return -EINVAL
  for it.
  
  I suppose it would be rare for QEmu to use this interface to get
  the context of entry(the only case I think is when MSI-X disable
  and QEmu need to sync the context), so performance should not be
  an issue.
  
  What's your opinion?
  
 #define KVM_GET_MSIX_ENTRY  _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)

Need SET_MSIX_ENTRY for live migration as well.
  
  Current we don't support LM with VT-d...
 
 Isn't this work useful for virtio as well?

Yeah, but won't be included in this patchset.
 
 #define KVM_UPDATE_MSIX_MMIO  _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
 
 #define KVM_MSIX_TYPE_ASSIGNED_DEV  1
 
 #define KVM_MSIX_FLAG_MASKBIT   (1 << 0)
 #define KVM_MSIX_FLAG_QUERY_MASKBIT (1 << 0)
 #define KVM_MSIX_FLAG_ENTRY (1 << 1)
 #define KVM_MSIX_FLAG_QUERY_ENTRY   (1 << 1)

Why is there a need for the flag?  If we simply get/set entire
entries, that includes the mask bits?
  
  We still want QEmu to cover a part of entries which hasn't been enabled
  yet(which won't existed in routing table), but kernel would cover all
  mask bit regardless of if it's enabled. So QEmu can query any entry to
  check the maskbit, but not address/data.
 
 Don't understand.  If we support reading/writing entire entries, that
 works for both enabled and disabled entries?
 
What about the pending bits?
  
  We didn't cover it here - and it's in another MMIO space(PBA). Of course
  we can add more flags for it later.
 
 When an entry is masked, we need to set the pending bit for it
 somewhere.  I guess this is broken in the existing code (without your
 patches)?

Even with my patch, we don't support the pending bit; it would always return 0
for now. What we are supposed to do (after my patch is checked in) is check the
IRQ_PENDING flag of irq_desc->status (if the entry is masked) and return the
result to userspace.

That would involve some core changes, such as exporting irq_to_desc(). I don't
think it would be accepted soon, so I would push the mask bit first.
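
For readers following the mask/pending discussion, here is a minimal model of
the behaviour the MSI-X specification expects: bit 0 of an entry's vector
control word is the mask bit, and the PBA holds one pending bit per vector.
Everything else (struct msix_state, the 64-vector limit, the delivered[]
counters, both function names) is a hypothetical sketch, not KVM or QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_VECTORS      64   /* assumption: small table so the PBA fits in 64 bits */
#define MSIX_CTRL_MASKBIT 0x1u /* vector control, bit 0 */

/* One 16-byte MSI-X table entry. */
struct msix_entry {
    uint32_t addr_lo, addr_hi, data, ctrl;
};

struct msix_state {
    struct msix_entry table[MSIX_VECTORS];
    uint64_t pba;                     /* pending bit array, one bit per vector */
    unsigned delivered[MSIX_VECTORS]; /* stands in for actual injection */
};

/* Device raises a vector: deliver it, or latch it as pending if masked. */
void msix_raise(struct msix_state *s, unsigned vec)
{
    if (s->table[vec].ctrl & MSIX_CTRL_MASKBIT)
        s->pba |= UINT64_C(1) << vec;
    else
        s->delivered[vec]++;
}

/* Guest writes the mask bit: unmasking flushes a latched pending bit. */
void msix_set_mask(struct msix_state *s, unsigned vec, bool masked)
{
    if (masked) {
        s->table[vec].ctrl |= MSIX_CTRL_MASKBIT;
        return;
    }
    s->table[vec].ctrl &= ~MSIX_CTRL_MASKBIT;
    if (s->pba & (UINT64_C(1) << vec)) {
        s->pba &= ~(UINT64_C(1) << vec);
        s->delivered[vec]++;
    }
}
```

A pending bit that always reads 0 means an interrupt raised while masked is
not visible to a guest that polls the PBA, which is the gap discussed above.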

 
Also need a new exit reason to tell userspace that an msix entry has
changed, so userspace can update mappings.
  
  I think we don't need it. Whenever userspace want to get one mapping
  which is an enabled MSI-X entry, it can check it with the API
  above(which is quite rare, because kernel would handle all of them when
  guest is accessing them). If it's a disabled entry, the context inside
  userspace MMIO record is the correct one(and only one). The only place I
  think QEmu need to sync is when MSI-X is about to disabled, QEmu need to
  update it's own MMIO record.
 
 So in-kernel handling of mmio would be decided per entry?  I'm trying to
 simplify this, and simplest thing is - all or nothing.

So you would like to handle all MSI-X MMIO in kernel?

--
regards
Yang, Sheng


Re: buildbot for kvm.git

2010-11-23 Thread Avi Kivity

On 11/23/2010 02:11 AM, Daniel Gollub wrote:

On Monday, November 22, 2010 10:37:05 pm Avi Kivity wrote:
  On 11/11/2010 11:22 AM, Daniel Gollub wrote:
On Thursday, November 11, 2010 02:31:06 am Avi Kivity wrote:
   Daniel, the buildbot has been fairly effective in keeping qemu-kvm.git
   building.  I'd like to extend that to kvm.git, especially for non-x86
   architectures.
  
[...]
  
   Can you help with this?
  
Sure. I'll look into that next week.

  Daniel, any news about this?

Currently I'm applying your recipe on the buildmaster configuration.

Beside that, buildmaster and a small x86_64 buildslave got setup and is
available on:
http://buildbot.b1-systems.de/kvm/

Once I'm done with the buildmaster configuration (and some more testing)
kvm.git continuous build testing could be ready within the next days.
(I'm travelling right now, but shouldn't block me to get this done)

If you like you can already setup the git post-receive hook in the kvm.git
repo to trigger the buildmaster.

Like for qemu-kvm.git you need to copy git_buildbot.py (preferably a copy of
the one which is used for qemu-kvm.git) and change the master-port to 9991
( master = :9991 )

In hooks/post-receive you add:
/path/to/git_buildbot.py $1 $2 $3



Thanks, done.

Will you set up crossbuilders for ppc/ia64/s390, or will I contribute a 
builder?



--
error compiling committee.c: too many arguments to function



Re: trace_printk() support in trace-cmd

2010-11-23 Thread Avi Kivity

On 11/16/2010 05:12 PM, Steven Rostedt wrote:


  Hmm, I'll try it out on the latest kernel. Would you be able to upload
  the trace.dat that does not work someplace that I can get it. I'd like
  to take a look at it. If you don't have a place to put it, I could give
  you access to my box, and you can scp it there.

Hmm, I still can not reproduce. But as a workaround, here's what you can
do for now. Instead of using trace_printk() use:


__trace_printk(_THIS_IP_, format, args);

This forces the snprintf into the buffer and skips the bprintk trick of
post-processing at read time.


I see a trace_printk() commit in trace-cmd.git.  Is that related?  If 
not, I'll work on getting a small sample of the problem.
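
The difference between the two calls can be modelled in user space. This is only a sketch, assuming the deferred path stores the format's address plus the raw argument and resolves that address at read time from a table playing the role of printk_formats. All struct and function names below are made up for illustration; they are not the kernel's or trace-cmd's.

```c
#include <stdio.h>
#include <stddef.h>

/* What a deferred (bprintk-style) event stores: format address + raw arg. */
struct deferred_event {
	const char *fmt_addr;
	long arg;
};

/* Reader-side lookup table, standing in for printk_formats. */
struct fmt_entry {
	const char *addr;
	const char *text;
};

/* Trace time, deferred: nothing is formatted here. */
static void record_deferred(struct deferred_event *ev, const char *fmt, long arg)
{
	ev->fmt_addr = fmt;
	ev->arg = arg;
}

/* Trace time, eager: the text is formatted straight into the buffer. */
static void record_eager(char *buf, size_t len, const char *fmt, long arg)
{
	snprintf(buf, len, fmt, arg);
}

/* Read time: returns -1 when the format address cannot be resolved. */
static int render_deferred(const struct deferred_event *ev,
			   const struct fmt_entry *tbl, size_t n,
			   char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].addr == ev->fmt_addr) {
			snprintf(buf, len, tbl[i].text, ev->arg);
			return 0;
		}
	return -1;
}
```

With an empty lookup table the deferred event cannot be rendered, while the eagerly formatted text is always readable, which is why the workaround sidesteps the problem.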


--
error compiling committee.c: too many arguments to function



[PATCH trace-cmd 0/3] kvm plugin updates

2010-11-23 Thread Avi Kivity
Currently the kvm plugin only decodes vmx exit reasons; the first patch
in this series adds support for the svm instruction set.

Second patch fixes a typo.

A couple of fields were added to the kvm_exit tracepoint; the third patch
prints them out.

Avi Kivity (3):
  kvm: parse svm exit reason
  kvm: fix typo UNKOWN
  kvm: display the new kvm_exit info1 and info2 fields, if available

 plugin_kvm.c |  121 ++
 1 files changed, 113 insertions(+), 8 deletions(-)
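
For readers unfamiliar with the table layout this series builds on, here is a minimal stand-alone sketch of the per-ISA lookup, based on the structures visible in patch 1/3. The tables are cut down to a few entries, and the body of find_exit_reason() (including the "UNKNOWN-ISA" string) is an assumption, since the posted diff is truncated in this archive.

```c
#include <stddef.h>

/* Cut-down tables; the real plugin lists many more reasons. */
#define VMX_EXIT_REASONS \
	_ER(EPT_MISCONFIG,	49) \
	_ER(WBINVD,		54)

#define SVM_EXIT_REASONS \
	_ER(EXIT_CPUID,		0x072) \
	_ER(EXIT_HLT,		0x078) \
	_ER(EXIT_NPF,		0x400)

/* Each _ER() line expands to one { "NAME", value } initializer. */
#define _ER(reason, val)	{ #reason, val },
struct str_values {
	const char	*str;
	int		val;
};

static struct str_values vmx_exit_reasons[] = {
	VMX_EXIT_REASONS
	{ NULL, -1 }
};

static struct str_values svm_exit_reasons[] = {
	SVM_EXIT_REASONS
	{ NULL, -1 }
};

static struct isa_exit_reasons {
	unsigned isa;
	struct str_values *strings;
} isa_exit_reasons[] = {
	{ .isa = 1, .strings = vmx_exit_reasons },
	{ .isa = 2, .strings = svm_exit_reasons },
	{ 0, NULL }
};

static const char *find_exit_reason(unsigned isa, int val)
{
	struct str_values *strings = NULL;
	int i;

	/* First pick the table for the instruction set ... */
	for (i = 0; isa_exit_reasons[i].strings; i++)
		if (isa_exit_reasons[i].isa == isa) {
			strings = isa_exit_reasons[i].strings;
			break;
		}
	if (!strings)
		return "UNKNOWN-ISA";	/* assumed fallback string */

	/* ... then scan it up to the NULL terminator. */
	for (i = 0; strings[i].str; i++)
		if (strings[i].val == val)
			return strings[i].str;
	return "UNKNOWN";
}
```

The same exit code can mean different things on vmx and svm (54 is WBINVD on vmx but has no entry in the svm sketch above), which is why the isa field has to select the table first.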



[PATCH trace-cmd 2/3] kvm: fix typo UNKOWN

2010-11-23 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index c8e8b8c..659b27f 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -236,7 +236,7 @@ static const char *find_exit_reason(unsigned isa, int val)
 			break;
 	if (strings[i].str)
 		return strings[i].str;
-	return "UNKOWN";
+	return "UNKNOWN";
 }
 
 static int kvm_exit_handler(struct trace_seq *s, struct record *record,
-- 
1.7.1



[PATCH trace-cmd 1/3] kvm: parse svm exit reason

2010-11-23 Thread Avi Kivity
svm exit reasons use different code than vmx; use the new isa trace field
to select the instruction set and display the strings accordingly.

Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c |  114 ++
 1 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 724143d..c8e8b8c 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -120,6 +120,80 @@ static const char *disassemble(unsigned char *insn, int len, uint64_t rip,
_ER(EPT_MISCONFIG,  49) \
_ER(WBINVD, 54)
 
+#define SVM_EXIT_REASONS \
+   _ER(EXIT_READ_CR0,  0x000)  \
+   _ER(EXIT_READ_CR3,  0x003)  \
+   _ER(EXIT_READ_CR4,  0x004)  \
+   _ER(EXIT_READ_CR8,  0x008)  \
+   _ER(EXIT_WRITE_CR0, 0x010)  \
+   _ER(EXIT_WRITE_CR3, 0x013)  \
+   _ER(EXIT_WRITE_CR4, 0x014)  \
+   _ER(EXIT_WRITE_CR8, 0x018)  \
+   _ER(EXIT_READ_DR0,  0x020)  \
+   _ER(EXIT_READ_DR1,  0x021)  \
+   _ER(EXIT_READ_DR2,  0x022)  \
+   _ER(EXIT_READ_DR3,  0x023)  \
+   _ER(EXIT_READ_DR4,  0x024)  \
+   _ER(EXIT_READ_DR5,  0x025)  \
+   _ER(EXIT_READ_DR6,  0x026)  \
+   _ER(EXIT_READ_DR7,  0x027)  \
+   _ER(EXIT_WRITE_DR0, 0x030)  \
+   _ER(EXIT_WRITE_DR1, 0x031)  \
+   _ER(EXIT_WRITE_DR2, 0x032)  \
+   _ER(EXIT_WRITE_DR3, 0x033)  \
+   _ER(EXIT_WRITE_DR4, 0x034)  \
+   _ER(EXIT_WRITE_DR5, 0x035)  \
+   _ER(EXIT_WRITE_DR6, 0x036)  \
+   _ER(EXIT_WRITE_DR7, 0x037)  \
+   _ER(EXIT_EXCP_BASE, 0x040)  \
+   _ER(EXIT_INTR,  0x060)  \
+   _ER(EXIT_NMI,   0x061)  \
+   _ER(EXIT_SMI,   0x062)  \
+   _ER(EXIT_INIT,  0x063)  \
+   _ER(EXIT_VINTR, 0x064)  \
+   _ER(EXIT_CR0_SEL_WRITE, 0x065)  \
+   _ER(EXIT_IDTR_READ, 0x066)  \
+   _ER(EXIT_GDTR_READ, 0x067)  \
+   _ER(EXIT_LDTR_READ, 0x068)  \
+   _ER(EXIT_TR_READ,   0x069)  \
+   _ER(EXIT_IDTR_WRITE,0x06a)  \
+   _ER(EXIT_GDTR_WRITE,0x06b)  \
+   _ER(EXIT_LDTR_WRITE,0x06c)  \
+   _ER(EXIT_TR_WRITE,  0x06d)  \
+   _ER(EXIT_RDTSC, 0x06e)  \
+   _ER(EXIT_RDPMC, 0x06f)  \
+   _ER(EXIT_PUSHF, 0x070)  \
+   _ER(EXIT_POPF,  0x071)  \
+   _ER(EXIT_CPUID, 0x072)  \
+   _ER(EXIT_RSM,   0x073)  \
+   _ER(EXIT_IRET,  0x074)  \
+   _ER(EXIT_SWINT, 0x075)  \
+   _ER(EXIT_INVD,  0x076)  \
+   _ER(EXIT_PAUSE, 0x077)  \
+   _ER(EXIT_HLT,   0x078)  \
+   _ER(EXIT_INVLPG,0x079)  \
+   _ER(EXIT_INVLPGA,   0x07a)  \
+   _ER(EXIT_IOIO,  0x07b)  \
+   _ER(EXIT_MSR,   0x07c)  \
+   _ER(EXIT_TASK_SWITCH,   0x07d)  \
+   _ER(EXIT_FERR_FREEZE,   0x07e)  \
+   _ER(EXIT_SHUTDOWN,  0x07f)  \
+   _ER(EXIT_VMRUN, 0x080)  \
+   _ER(EXIT_VMMCALL,   0x081)  \
+   _ER(EXIT_VMLOAD,0x082)  \
+   _ER(EXIT_VMSAVE,0x083)  \
+   _ER(EXIT_STGI,  0x084)  \
+   _ER(EXIT_CLGI,  0x085)  \
+   _ER(EXIT_SKINIT,0x086)  \
+   _ER(EXIT_RDTSCP,0x087)  \
+   _ER(EXIT_ICEBP, 0x088)  \
+   _ER(EXIT_WBINVD,0x089)  \
+   _ER(EXIT_MONITOR,   0x08a)  \
+   _ER(EXIT_MWAIT, 0x08b)  \
+   _ER(EXIT_MWAIT_COND,0x08c)  \
+   _ER(EXIT_NPF,   0x400)  \
+   _ER(EXIT_ERR,   -1)
+
 #define _ER(reason, val)   { #reason, val },
 struct str_values {
const char  *str;
@@ -131,27 +205,53 @@ static struct str_values vmx_exit_reasons[] = {
{ NULL, -1}
 };
 
-static const char *find_vmx_reason(int val)
+static struct str_values svm_exit_reasons[] = {
+   SVM_EXIT_REASONS
+   { NULL, -1}
+};
+
+static struct isa_exit_reasons {
+   unsigned isa;
+   struct str_values *strings;
+} isa_exit_reasons[] = {
+   { .isa = 1, .strings = vmx_exit_reasons },
+   { .isa = 2, .strings = svm_exit_reasons },
+   { }
+};
+
+static const char *find_exit_reason(unsigned isa, int val)
 {
+   struct str_values *strings = NULL;
int i;
 
-	for (i = 0; vmx_exit_reasons[i].val >= 0; i++)
-  

[PATCH trace-cmd 3/3] kvm: display the new kvm_exit info1 and info2 fields, if available

2010-11-23 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 659b27f..c1cb2e4 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -244,6 +244,7 @@ static int kvm_exit_handler(struct trace_seq *s, struct record *record,
 {
 	unsigned long long isa;
 	unsigned long long val;
+	unsigned long long info1 = 0, info2 = 0;
 
 	if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0)
 		return -1;
@@ -255,6 +256,10 @@ static int kvm_exit_handler(struct trace_seq *s, struct record *record,
 
 	pevent_print_num_field(s, " rip 0x%lx", event, "guest_rip", record, 1);
 
+	if (pevent_get_field_val(s, event, "info1", record, &info1, 1) >= 0
+	    && pevent_get_field_val(s, event, "info2", record, &info2, 1) >= 0)
+		trace_seq_printf(s, " info %llx %llx\n", info1, info2);
+
 	return 0;
 }
 
-- 
1.7.1



Re: trace_printk() support in trace-cmd

2010-11-23 Thread Avi Kivity

On 11/16/2010 05:13 PM, Steven Rostedt wrote:

BTW, what does /debug/tracing/printk_formats show?



Empty.

--
error compiling committee.c: too many arguments to function



Re: Performance test result between per-vhost kthread disable and enable

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
 I test the performance between per-vhost kthread disable and enable.
 
 Test method:
 Send the same traffic load between per-vhost kthread disable and
 enable, and compare the cpu rate of host os.
 I run five vm on kvm, each of them have five nic.
 the vhost version which per-vhost kthread disable we used is rhel6
 beta 2(2.6.32.60).
 the vhost version which per-vhost kthread enable we used is rhel6 (2.6.32-71).

At this point, I'd suggest testing vhost-net on the upstream kernel,
not on rhel kernels. The change that introduced per-device threads is:
c23f3445e68e1db0e74099f264bc5ff5d55ebdeb

 Test result:
 with per-vhost kthread disable, the cpu rate of host os is 110%.
 with per-vhost kthread enable, the cpu rate of host os is 130%.

Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with
vhost-net.

 In 2.6.32.60,the whole system only have a kthread.
 [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
 root   973 2  0 Nov22 ?00:00:00 [vhost]
 
 In 2.6.32.71,the whole system have 25 kthread.
 [r...@kvm-4slot ~]# ps -ef | grep vhost-
 root 12896 2  0 10:26 ?00:00:00 [vhost-12842]
 root 12897 2  0 10:26 ?00:00:00 [vhost-12842]
 root 12898 2  0 10:26 ?00:00:00 [vhost-12842]
 root 12899 2  0 10:26 ?00:00:00 [vhost-12842]
 root 12900 2  0 10:26 ?00:00:00 [vhost-12842]
 
 root 13022 2  0 10:26 ?00:00:00 [vhost-12981]
 root 13023 2  0 10:26 ?00:00:00 [vhost-12981]
 root 13024 2  0 10:26 ?00:00:00 [vhost-12981]
 root 13025 2  0 10:26 ?00:00:00 [vhost-12981]
 root 13026 2  0 10:26 ?00:00:00 [vhost-12981]
 
 root 13146 2  0 10:26 ?00:00:00 [vhost-13088]
 root 13147 2  0 10:26 ?00:00:00 [vhost-13088]
 root 13148 2  0 10:26 ?00:00:00 [vhost-13088]
 root 13149 2  0 10:26 ?00:00:00 [vhost-13088]
 root 13150 2  0 10:26 ?00:00:00 [vhost-13088]
 ...
 
 Code difference:
 In 2.6.32.60,in function vhost_init, create the kthread for vhost.
 vhost_workqueue = create_singlethread_workqueue("vhost");
 
 In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for
 each nic interface.
 dev->wq = create_singlethread_workqueue(vhost_name);
 
 Conclusion:
 With per-vhost kthreads enabled, the system can achieve more throughput,
 but handling the same traffic load with per-vhost kthreads enabled costs
 more CPU.
 
 In my application scene, the cpu resource is more important, and one
 kthread for deal with traffic load is enough.
 
 So i think we should add a param to control this.
 for the CPU-bound system, this param disable per-vhost kthread.
 for the I/O-bound system, this param enable per-vhost kthread.
 the default value of this param is enable.
 
 If my opinion is right, i will give a patch for this.

Let's try to figure out what the issue is, first.

-- 
MST


Re: Mask bit support's API

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 02:09:52PM +0800, Yang, Sheng wrote:
 Hi Avi,
 
 I've proposed the following API for mask bit support.
 
 The main point is, QEmu can know which entries are enabled(by 
 pci_enable_msix()). 

Unfortunately, I don't think it can, unless all your guests are Linux.
"Enabled entries" is a Linux kernel concept; the MSI-X spec only tells you
which entries are masked and which are unmasked.
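
To illustrate: as far as I know, an MSI-X table entry is 16 bytes (address low, address high, data, vector control), and the only per-entry state bit is the mask bit, bit 0 of the vector control dword. A minimal sketch of what can actually be queried follows. The macro and function names are illustrative (the values mirror Linux's pci_regs.h), and the table is assumed to be a plain little-endian byte image.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE		16	/* bytes per table entry */
#define MSIX_ENTRY_VECTOR_CTRL	12	/* offset of the vector control dword */
#define MSIX_CTRL_MASKBIT	0x1	/* bit 0: entry is masked */

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the given entry is masked, 0 if it is unmasked. */
static int msix_entry_masked(const uint8_t *table, unsigned int entry)
{
	const uint8_t *ctrl = table + entry * MSIX_ENTRY_SIZE +
			      MSIX_ENTRY_VECTOR_CTRL;

	return (read_le32(ctrl) & MSIX_CTRL_MASKBIT) ? 1 : 0;
}
```

There is no corresponding "enabled" query to write: nothing in the entry itself records whether the guest driver has claimed it.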

-- 
MST


Re: [Qemu-devel] KVM call agenda for Nov 23

2010-11-23 Thread Luiz Capitulino
On Mon, 22 Nov 2010 17:00:41 -0600
Anthony Liguori anth...@codemonkey.ws wrote:

 On 11/22/2010 03:45 PM, Chris Wright wrote:
  * Juan Quintela (quint...@redhat.com) wrote:
 
  Please send in any agenda items you are interested in covering.
   
  usb-ccid
 
 
 - vcpu hard limits

- 0.14 (release date, bug day, -rc planning, etc)


buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/643

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/644

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_x86_64_debian_5_0

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/653

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_x86_64_out_of_tree

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of default_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/594

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



Re: Mask bit support's API

2010-11-23 Thread Avi Kivity

On 11/23/2010 10:30 AM, Yang, Sheng wrote:

On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
  On 11/23/2010 08:35 AM, Yang, Sheng wrote:
On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
   On 11/23/2010 08:09 AM, Yang, Sheng wrote:
  Hi Avi,
   
  I've proposed the following API for mask bit support.
   
  The main point is, QEmu can know which entries are enabled(by
  pci_enable_msix()). And for enabled entries, kernel own it,
  including MSI data/address and mask bit(routing table and mask
  bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and
  it can sync with them if it want to do so).
   
  Before entries are enabled, QEmu can still use it's own MSI
  table(because we didn't contain these kind of information in
  kernel, and it's unnecessary for kernel).
   
  The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query
  one entry didn't exist in kernel - or we can simply return -EINVAL
  for it.
   
  I suppose it would be rare for QEmu to use this interface to get
  the context of entry(the only case I think is when MSI-X disable
  and QEmu need to sync the context), so performance should not be
  an issue.
   
  What's your opinion?
   
  #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)

   Need SET_MSIX_ENTRY for live migration as well.
  
Current we don't support LM with VT-d...

  Isn't this work useful for virtio as well?

Yeah, but won't be included in this patchset.


What API changes are needed?  I'd like to see the complete API.


   What about the pending bits?
  
We didn't cover it here - and it's in another MMIO space(PBA). Of course
we can add more flags for it later.

  When an entry is masked, we need to set the pending bit for it
  somewhere.  I guess this is broken in the existing code (without your
  patches)?

Even with my patch, we don't support the pending bit; it would always return 0
for now. What we are supposed to do (after my patch is checked in) is to check
the IRQ_PENDING flag of irq_desc->status (if the entry is masked), and return
the result to userspace.

That would involve some core changes, such as exporting irq_to_desc(). I don't
think that would be accepted soon, so I would push the mask bit first.


The API needs to be compatible with the pending bit, even if we don't 
implement it now.  I want to reduce the rate of API changes.




   Also need a new exit reason to tell userspace that an msix entry has
   changed, so userspace can update mappings.
  
I think we don't need it. Whenever userspace want to get one mapping
which is an enabled MSI-X entry, it can check it with the API
above(which is quite rare, because kernel would handle all of them when
guest is accessing them). If it's a disabled entry, the context inside
userspace MMIO record is the correct one(and only one). The only place I
think QEmu need to sync is when MSI-X is about to disabled, QEmu need to
update it's own MMIO record.

  So in-kernel handling of mmio would be decided per entry?  I'm trying to
  simplify this, and simplest thing is - all or nothing.

So you would like to handle all MSI-X MMIO in kernel?


Yes.  Writes to address or data would be handled by:
- recording it into the shadow msix table
- notifying userspace that msix entry x changed
Reads would be handled in kernel from the shadow msix table.

So instead of

- guest reads/writes msix
- kvm filters mmio, implements some, passes others to userspace

we have

- guest reads/writes msix
- kvm implements all
- some writes generate an additional notification to userspace
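
The all-in-kernel scheme above can be sketched as a small user-space model. It assumes aligned 32-bit accesses, a fixed hypothetical table size, and that only address/data writes (not vector control writes) raise the notification; the struct and function names are invented for illustration and are not a proposed KVM interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE		16	/* bytes per table entry */
#define MSIX_VECTOR_CTRL	12	/* offset of vector control in an entry */
#define NR_ENTRIES		4	/* hypothetical table size */

struct shadow_msix {
	uint32_t regs[NR_ENTRIES][MSIX_ENTRY_SIZE / 4];
	int notify_entry;	/* entry to report to userspace, or -1 */
};

static void shadow_init(struct shadow_msix *t)
{
	memset(t->regs, 0, sizeof(t->regs));
	t->notify_entry = -1;
}

/* Guest read: always served from the shadow table. */
static uint32_t shadow_read(const struct shadow_msix *t, uint32_t offset)
{
	assert(offset % 4 == 0 && offset < NR_ENTRIES * MSIX_ENTRY_SIZE);
	return t->regs[offset / MSIX_ENTRY_SIZE][(offset % MSIX_ENTRY_SIZE) / 4];
}

/*
 * Guest write: always recorded in the shadow table. Address and data
 * writes additionally flag the entry for a userspace notification;
 * vector control (mask bit) writes are handled without one.
 */
static void shadow_write(struct shadow_msix *t, uint32_t offset, uint32_t val)
{
	unsigned int entry = offset / MSIX_ENTRY_SIZE;
	unsigned int reg = offset % MSIX_ENTRY_SIZE;

	assert(offset % 4 == 0 && offset < NR_ENTRIES * MSIX_ENTRY_SIZE);
	t->regs[entry][reg / 4] = val;
	t->notify_entry = (reg < MSIX_VECTOR_CTRL) ? (int)entry : -1;
}
```

In this model userspace never has to emulate a table access itself; it only learns, after the fact, which entry's routing information changed.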


--
error compiling committee.c: too many arguments to function



buildbot failure in qemu-kvm on default_i386_debian_5_0

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/655

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_i386_out_of_tree

2010-11-23 Thread qemu-kvm
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot



Re: [Qemu-devel] KVM call agenda for Nov 23

2010-11-23 Thread Michael Tokarev
23.11.2010 15:08, Luiz Capitulino wrote:
[]
 - 0.14 (release date, bug day, -rc planning, etc)

Um, can we have some 0.13.x before, please?.. :)

/mjt


Re: Mask bit support's API

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 02:47:33PM +0200, Avi Kivity wrote:
 On 11/23/2010 10:30 AM, Yang, Sheng wrote:
 On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
   On 11/23/2010 08:35 AM, Yang, Sheng wrote:
 On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
On 11/23/2010 08:09 AM, Yang, Sheng wrote:
   Hi Avi,

   I've proposed the following API for mask bit support.

   The main point is, QEmu can know which entries are enabled(by
   pci_enable_msix()). And for enabled entries, kernel own it,
   including MSI data/address and mask bit(routing table and mask
   bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get 
  them(and
   it can sync with them if it want to do so).

   Before entries are enabled, QEmu can still use it's own MSI
   table(because we didn't contain these kind of information in
   kernel, and it's unnecessary for kernel).

   The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to 
  query
   one entry didn't exist in kernel - or we can simply return 
  -EINVAL
   for it.

   I suppose it would be rare for QEmu to use this interface to 
  get
   the context of entry(the only case I think is when MSI-X 
  disable
   and QEmu need to sync the context), so performance should not 
  be
   an issue.

   What's your opinion?

   #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
 
Need SET_MSIX_ENTRY for live migration as well.
   
 Current we don't support LM with VT-d...
 
   Isn't this work useful for virtio as well?
 
 Yeah, but won't be included in this patchset.
 
 What API changes are needed?  I'd like to see the complete API.
 
What about the pending bits?
   
 We didn't cover it here - and it's in another MMIO space(PBA). Of 
  course
 we can add more flags for it later.
 
   When an entry is masked, we need to set the pending bit for it
   somewhere.  I guess this is broken in the existing code (without your
   patches)?
 
 Even with my patch, we don't support the pending bit; it would always return 0
 for now. What we are supposed to do (after my patch is checked in) is to check
 the IRQ_PENDING flag of irq_desc->status (if the entry is masked), and return
 the result to userspace.
 
 That would involve some core changes, such as exporting irq_to_desc(). I don't
 think that would be accepted soon, so I would push the mask bit first.
 
 The API needs to be compatible with the pending bit, even if we
 don't implement it now.  I want to reduce the rate of API changes.
 
 
Also need a new exit reason to tell userspace that an msix entry 
  has
changed, so userspace can update mappings.
   
 I think we don't need it. Whenever userspace want to get one mapping
 which is an enabled MSI-X entry, it can check it with the API
 above(which is quite rare, because kernel would handle all of them when
 guest is accessing them). If it's a disabled entry, the context inside
 userspace MMIO record is the correct one(and only one). The only place 
  I
 think QEmu need to sync is when MSI-X is about to disabled, QEmu need 
  to
 update it's own MMIO record.
 
   So in-kernel handling of mmio would be decided per entry?  I'm trying to
   simplify this, and simplest thing is - all or nothing.
 
 So you would like to handle all MSI-X MMIO in kernel?
 
 Yes.  Writes to address or data would be handled by:
 - recording it into the shadow msix table
 - notifying userspace that msix entry x changed
 Reads would be handled in kernel from the shadow msix table.
 
 So instead of
 
 - guest reads/writes msix
 - kvm filters mmio, implements some, passes others to userspace
 
 we have
 
 - guest reads/writes msix
 - kvm implements all
 - some writes generate an additional notification to userspace

One small proposal in addition: since all accesses are done from guest
anyway, the shadow table can/should be stored using userspace memory,
reducing the kernel memory overhead of the feature from up to 4K per
MSIX table to just 8 bytes.

Active entries can be cached in kernel memory.

 
 -- 
 error compiling committee.c: too many arguments to function


Re: Performance test result between per-vhost kthread disable and enable

2010-11-23 Thread lidong chen
At this point, I'd suggest testing vhost-net on the upstream kernel,
not on rhel kernels. The change that introduced per-device threads is:
c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
i will try this tomorrow.

Is CONFIG_SCHED_DEBUG set?
yes. CONFIG_SCHED_DEBUG=y.

2010/11/23 Michael S. Tsirkin m...@redhat.com:
 On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
 I test the performance between per-vhost kthread disable and enable.

 Test method:
 Send the same traffic load between per-vhost kthread disable and
 enable, and compare the cpu rate of host os.
 I run five vm on kvm, each of them have five nic.
 the vhost version which per-vhost kthread disable we used is rhel6
 beta 2(2.6.32.60).
 the vhost version which per-vhost kthread enable we used is rhel6 
 (2.6.32-71).

 At this point, I'd suggest testing vhost-net on the upstream kernel,
 not on rhel kernels. The change that introduced per-device threads is:
 c23f3445e68e1db0e74099f264bc5ff5d55ebdeb

 Test result:
 with per-vhost kthread disable, the cpu rate of host os is 110%.
 with per-vhost kthread enable, the cpu rate of host os is 130%.

 Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with
 vhost-net.

 In 2.6.32.60,the whole system only have a kthread.
 [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
 root       973     2  0 Nov22 ?        00:00:00 [vhost]

 In 2.6.32.71,the whole system have 25 kthread.
 [r...@kvm-4slot ~]# ps -ef | grep vhost-
 root     12896     2  0 10:26 ?        00:00:00 [vhost-12842]
 root     12897     2  0 10:26 ?        00:00:00 [vhost-12842]
 root     12898     2  0 10:26 ?        00:00:00 [vhost-12842]
 root     12899     2  0 10:26 ?        00:00:00 [vhost-12842]
 root     12900     2  0 10:26 ?        00:00:00 [vhost-12842]

 root     13022     2  0 10:26 ?        00:00:00 [vhost-12981]
 root     13023     2  0 10:26 ?        00:00:00 [vhost-12981]
 root     13024     2  0 10:26 ?        00:00:00 [vhost-12981]
 root     13025     2  0 10:26 ?        00:00:00 [vhost-12981]
 root     13026     2  0 10:26 ?        00:00:00 [vhost-12981]

 root     13146     2  0 10:26 ?        00:00:00 [vhost-13088]
 root     13147     2  0 10:26 ?        00:00:00 [vhost-13088]
 root     13148     2  0 10:26 ?        00:00:00 [vhost-13088]
 root     13149     2  0 10:26 ?        00:00:00 [vhost-13088]
 root     13150     2  0 10:26 ?        00:00:00 [vhost-13088]
 ...

 Code difference:
 In 2.6.32.60,in function vhost_init, create the kthread for vhost.
 vhost_workqueue = create_singlethread_workqueue("vhost");

 In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for
 each nic interface.
 dev->wq = create_singlethread_workqueue(vhost_name);

 Conclusion:
 With per-vhost kthreads enabled, the system can achieve more throughput,
 but handling the same traffic load with per-vhost kthreads enabled costs
 more CPU.

 In my application scene, the cpu resource is more important, and one
 kthread for deal with traffic load is enough.

 So i think we should add a param to control this.
 for the CPU-bound system, this param disable per-vhost kthread.
 for the I/O-bound system, this param enable per-vhost kthread.
 the default value of this param is enable.

 If my opinion is right, i will give a patch for this.

 Let's try to figure out what the issue is, first.

 --
 MST



[PATCH v3 02/22] bitops: rename generic little-endian bitops functions

2010-11-23 Thread Akinobu Mita
In preparation for providing little-endian bitops for all architectures,
this removes the generic_ prefix from the little-endian bitops function names
in asm-generic/bitops/le.h.

s/generic_find_next_le_bit/find_next_le_bit/
s/generic_find_next_zero_le_bit/find_next_zero_le_bit/
s/generic_find_first_zero_le_bit/find_first_zero_le_bit/
s/generic___test_and_set_le_bit/__test_and_set_le_bit/
s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/
s/generic_test_le_bit/test_le_bit/
s/generic___set_le_bit/__set_le_bit/
s/generic___clear_le_bit/__clear_le_bit/
s/generic_test_and_set_le_bit/test_and_set_le_bit/
s/generic_test_and_clear_le_bit/test_and_clear_le_bit/

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
Acked-by: Arnd Bergmann a...@arndb.de
Acked-by: Hans-Christian Egtvedt hans-christian.egtv...@atmel.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Roman Zippel zip...@linux-m68k.org
Cc: Andreas Schwab sch...@linux-m68k.org
Cc: linux-m...@lists.linux-m68k.org
Cc: Greg Ungerer g...@uclinux.org
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Andy Grover andy.gro...@oracle.com
Cc: rds-de...@oss.oracle.com
Cc: David S. Miller da...@davemloft.net
Cc: net...@vger.kernel.org
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
---
No change from previous submission
 arch/avr32/kernel/avr32_ksyms.c  |4 ++--
 arch/avr32/lib/findbit.S |4 ++--
 arch/m68k/include/asm/bitops_mm.h|8 
 arch/m68k/include/asm/bitops_no.h|2 +-
 arch/powerpc/include/asm/bitops.h|   11 ++-
 include/asm-generic/bitops/ext2-non-atomic.h |   12 ++--
 include/asm-generic/bitops/le.h  |   26 +-
 include/asm-generic/bitops/minix-le.h|   10 +-
 lib/find_next_bit.c  |9 -
 net/rds/cong.c   |6 +++---
 virt/kvm/kvm_main.c  |2 +-
 11 files changed, 47 insertions(+), 47 deletions(-)
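
As a reminder of what "little-endian bitops" means for the renamed functions, here is a byte-wise user-space sketch: bit nr always lives in bit (nr % 8) of byte (nr / 8), whatever the host byte order. This is an illustration only, not the kernel implementation (which works on unsigned longs); set_le_bit() here is a non-atomic stand-in for __set_le_bit().

```c
#include <stddef.h>

/* Test bit nr in a little-endian bitmap. */
static int test_le_bit(unsigned long nr, const void *addr)
{
	const unsigned char *p = addr;

	return (p[nr / 8] >> (nr % 8)) & 1;
}

/* Set bit nr in a little-endian bitmap (non-atomic). */
static void set_le_bit(unsigned long nr, void *addr)
{
	unsigned char *p = addr;

	p[nr / 8] |= (unsigned char)(1u << (nr % 8));
}

/*
 * Return the index of the first zero bit at or after offset,
 * or size if there is none below size.
 */
static unsigned long find_next_zero_le_bit(const void *addr,
					   unsigned long size,
					   unsigned long offset)
{
	if (offset >= size)
		return size;
	while (offset < size && test_le_bit(offset, addr))
		offset++;
	return offset;
}
```

Because the layout is defined per byte, a bitmap written this way reads back identically on big-endian and little-endian hosts, which is what on-disk formats such as ext2 and minix rely on.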

diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c
index 11e310c..c63b943 100644
--- a/arch/avr32/kernel/avr32_ksyms.c
+++ b/arch/avr32/kernel/avr32_ksyms.c
@@ -58,8 +58,8 @@ EXPORT_SYMBOL(find_first_zero_bit);
 EXPORT_SYMBOL(find_next_zero_bit);
 EXPORT_SYMBOL(find_first_bit);
 EXPORT_SYMBOL(find_next_bit);
-EXPORT_SYMBOL(generic_find_next_le_bit);
-EXPORT_SYMBOL(generic_find_next_zero_le_bit);
+EXPORT_SYMBOL(find_next_le_bit);
+EXPORT_SYMBOL(find_next_zero_le_bit);
 
 /* I/O primitives (lib/io-*.S) */
 EXPORT_SYMBOL(__raw_readsb);
diff --git a/arch/avr32/lib/findbit.S b/arch/avr32/lib/findbit.S
index 997b33b..6880d85 100644
--- a/arch/avr32/lib/findbit.S
+++ b/arch/avr32/lib/findbit.S
@@ -123,7 +123,7 @@ ENTRY(find_next_bit)
brgt    1b
retal   r11
 
-ENTRY(generic_find_next_le_bit)
+ENTRY(find_next_le_bit)
lsr r8, r10, 5
sub r9, r11, r10
retle   r11
@@ -153,7 +153,7 @@ ENTRY(generic_find_next_le_bit)
brgt    1b
retal   r11
 
-ENTRY(generic_find_next_zero_le_bit)
+ENTRY(find_next_zero_le_bit)
lsr r8, r10, 5
sub r9, r11, r10
retle   r11
diff --git a/arch/m68k/include/asm/bitops_mm.h b/arch/m68k/include/asm/bitops_mm.h
index b4ecdaa..f1010ab 100644
--- a/arch/m68k/include/asm/bitops_mm.h
+++ b/arch/m68k/include/asm/bitops_mm.h
@@ -366,9 +366,9 @@ static inline int minix_test_bit(int nr, const void *vaddr)
 #define ext2_clear_bit(nr, addr)   __test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr))
 #define ext2_clear_bit_atomic(lock, nr, addr)  test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr))
 #define ext2_find_next_zero_bit(addr, size, offset) \
-   generic_find_next_zero_le_bit((unsigned long *)addr, size, offset)
+   find_next_zero_le_bit((unsigned long *)addr, size, offset)
 #define ext2_find_next_bit(addr, size, offset) \
-   generic_find_next_le_bit((unsigned long *)addr, size, offset)
+   find_next_le_bit((unsigned long *)addr, size, offset)
 
 static inline int ext2_test_bit(int nr, const void *vaddr)
 {
@@ -398,7 +398,7 @@ static inline int ext2_find_first_zero_bit(const void *vaddr, unsigned size)
return (p - addr) * 32 + res;
 }
 
-static inline unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
+static inline unsigned long find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset)
 {
const unsigned long *p = addr + (offset >> 5);
@@ -440,7 +440,7 @@ static inline int ext2_find_first_bit(const void *vaddr, unsigned size)
return (p - addr) * 32 + res;
 }
 
-static inline unsigned long generic_find_next_le_bit(const unsigned long *addr,
+static inline unsigned long find_next_le_bit(const unsigned long *addr,
unsigned long size, 
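
For readers unfamiliar with the helpers being renamed in the patch above: the *_le_bit functions number the bits of a bitmap in little-endian order regardless of host byte order, so bit nr lives in byte nr / 8 at position nr % 8. The following is a minimal user-space sketch of that numbering, assuming a plain byte array; it is not the kernel implementation, and the demo_ names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: test bit nr using little-endian bit numbering. */
static int demo_test_le_bit(size_t nr, const unsigned char *addr)
{
    return (addr[nr >> 3] >> (nr & 7)) & 1;
}

/* Hypothetical helper: set bit nr using little-endian bit numbering. */
static void demo_set_le_bit(size_t nr, unsigned char *addr)
{
    addr[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}

/* Index of the first set bit at or after offset, or size if there is none. */
static size_t demo_find_next_le_bit(const unsigned char *addr,
                                    size_t size, size_t offset)
{
    for (size_t nr = offset; nr < size; nr++)
        if (demo_test_le_bit(nr, addr))
            return nr;
    return size;
}

/* Index of the first clear bit at or after offset, or size if there is none. */
static size_t demo_find_next_zero_le_bit(const unsigned char *addr,
                                         size_t size, size_t offset)
{
    for (size_t nr = offset; nr < size; nr++)
        if (!demo_test_le_bit(nr, addr))
            return nr;
    return size;
}
```

Because the sketch indexes bytes rather than unsigned long words, it gives the same answer on big-endian and little-endian hosts, which is the property on-disk formats such as ext2 and minix bitmaps rely on.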

Re: Performance test result between per-vhost kthread disable and enable

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 09:23:41PM +0800, lidong chen wrote:
 At this point, I'd suggest testing vhost-net on the upstream kernel,
 not on rhel kernels. The change that introduced per-device threads is:
 c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
 i will try this tomorrow.
 
 Is CONFIG_SCHED_DEBUG set?
 yes. CONFIG_SCHED_DEBUG=y.

Disable it. Either debug scheduler or perf-test it :)

 2010/11/23 Michael S. Tsirkin m...@redhat.com:
  On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
  I test the performance between per-vhost kthread disable and enable.
 
  Test method:
  Send the same traffic load between per-vhost kthread disable and
  enable, and compare the cpu rate of host os.
  I run five vm on kvm, each of them have five nic.
  the vhost version which per-vhost kthread disable we used is rhel6
  beta 2(2.6.32.60).
  the vhost version which per-vhost kthread enable we used is rhel6 
  (2.6.32-71).
 
  At this point, I'd suggest testing vhost-net on the upstream kernel,
  not on rhel kernels. The change that introduced per-device threads is:
  c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
 
  Test result:
  with per-vhost kthread disable, the cpu rate of host os is 110%.
  with per-vhost kthread enable, the cpu rate of host os is 130%.
 
  Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with
  vhost-net.
 
  In 2.6.32.60,the whole system only have a kthread.
  [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
  root       973     2  0 Nov22 ?        00:00:00 [vhost]
 
  In 2.6.32.71,the whole system have 25 kthread.
  [r...@kvm-4slot ~]# ps -ef | grep vhost-
  root     12896     2  0 10:26 ?        00:00:00 [vhost-12842]
  root     12897     2  0 10:26 ?        00:00:00 [vhost-12842]
  root     12898     2  0 10:26 ?        00:00:00 [vhost-12842]
  root     12899     2  0 10:26 ?        00:00:00 [vhost-12842]
  root     12900     2  0 10:26 ?        00:00:00 [vhost-12842]
 
  root     13022     2  0 10:26 ?        00:00:00 [vhost-12981]
  root     13023     2  0 10:26 ?        00:00:00 [vhost-12981]
  root     13024     2  0 10:26 ?        00:00:00 [vhost-12981]
  root     13025     2  0 10:26 ?        00:00:00 [vhost-12981]
  root     13026     2  0 10:26 ?        00:00:00 [vhost-12981]
 
  root     13146     2  0 10:26 ?        00:00:00 [vhost-13088]
  root     13147     2  0 10:26 ?        00:00:00 [vhost-13088]
  root     13148     2  0 10:26 ?        00:00:00 [vhost-13088]
  root     13149     2  0 10:26 ?        00:00:00 [vhost-13088]
  root     13150     2  0 10:26 ?        00:00:00 [vhost-13088]
  ...
 
  Code difference:
  In 2.6.32.60,in function vhost_init, create the kthread for vhost.
  vhost_workqueue = create_singlethread_workqueue("vhost");
 
  In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for
  each nic interface.
  dev->wq = create_singlethread_workqueue(vhost_name);
 
  Conclusion:
  with per-vhost kthread enabled, the system can handle more throughput,
  but handling the same traffic load with per-vhost kthread enabled wastes
  more cpu resource.
 
  In my application scene, the cpu resource is more important, and one
  kthread for deal with traffic load is enough.
 
  So i think we should add a param to control this.
  for the CPU-bound system, this param disable per-vhost kthread.
  for the I/O-bound system, this param enable per-vhost kthread.
  the default value of this param is enable.
 
  If my opinion is right, i will give a patch for this.
 
  Let's try to figure out what the issue is, first.
 
  --
  MST
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/22] kvm: stop including asm-generic/bitops/le.h

2010-11-23 Thread Akinobu Mita
No need to include asm-generic/bitops/le.h as all architectures
provide little-endian bit operations now.

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
---
No change from previous submission
 virt/kvm/kvm_main.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index da16155..57a7e3d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -52,7 +52,6 @@
 #include <asm/io.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
-#include <asm-generic/bitops/le.h>
 
 #include "coalesced_mmio.h"
 
-- 
1.7.3.2



Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Anthony Liguori

On 11/23/2010 12:41 AM, Avi Kivity wrote:

On 11/23/2010 01:00 AM, Anthony Liguori wrote:
qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of
teaching them to respond to these signals, introduce monitor commands
that stop and start individual vcpus.

The purpose of these commands is to implement CPU hard limits using an
external tool that watches the CPU consumption and stops the CPU as
appropriate.

The monitor commands provide a more elegant solution than signals
because they ensure that a stopped vcpu isn't holding the qemu_mutex.



From signal(7):

  The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?


I need to dig deeper then.

Maybe it's something about sending SIGSTOP to a process?



If we could catch SIGSTOP, then it would be easy to unblock it only 
while running in guest context. It would then stop on exit to userspace.


Yeah, that's not a bad idea.

Using monitor commands is fairly heavyweight for something as high 
frequency as this.  What control period do you see people using?  
Maybe we should define USR1 for vcpu start/stop.


What happens if one vcpu is stopped while another is running?  Spin 
loops, synchronous IPIs will take forever.  Maybe we need to stop the 
entire process.


It's the same problem if a VCPU is descheduled while another is 
running.  The problem with stopping the entire process is that a big 
motivation for this is to ensure that benchmarks have consistent results 
regardless of CPU capacity.  If you just monitor the full process, then 
one VCPU may dominate the entitlement resulting in very erratic 
benchmarking.


Regards,

Anthony Liguori




Re: Mask bit support's API

2010-11-23 Thread Yang, Sheng
On Tuesday 23 November 2010 20:47:33 Avi Kivity wrote:
 On 11/23/2010 10:30 AM, Yang, Sheng wrote:
  On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
On 11/23/2010 08:35 AM, Yang, Sheng wrote:
  On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
 On 11/23/2010 08:09 AM, Yang, Sheng wrote:
Hi Avi,

I've proposed the following API for mask bit support.

The main point is, QEmu can know which entries are
enabled(by pci_enable_msix()). And for enabled entries,
kernel own it, including MSI data/address and mask
bit(routing table and mask bitmap). QEmu should use
KVM_GET_MSIX_ENTRY ioctl to get them(and it can sync with
them if it want to do so).

Before entries are enabled, QEmu can still use it's own MSI
table(because we didn't contain these kind of information
in kernel, and it's unnecessary for kernel).

The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to
query one entry didn't exist in kernel - or we can simply
return -EINVAL for it.

I suppose it would be rare for QEmu to use this interface
to get the context of entry(the only case I think is when
MSI-X disable and QEmu need to sync the context), so
performance should not be an issue.

What's your opinion?

#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
 
 Need SET_MSIX_ENTRY for live migration as well.
  
  Current we don't support LM with VT-d...

Isn't this work useful for virtio as well?
  
  Yeah, but won't be included in this patchset.
 
 What API changes are needed?  I'd like to see the complete API.

I am not sure about it. But I suppose the structure should be the same? In fact
it's pretty hard for me to imagine what's needed for virtio in the future,
especially since there is no such code now. I really prefer to deal with
assigned devices and virtio separately, which would make the work much easier.
But it seems you won't agree on that.

 
 What about the pending bits?
  
  We didn't cover it here - and it's in another MMIO space(PBA). Of
  course we can add more flags for it later.

When an entry is masked, we need to set the pending bit for it
somewhere.  I guess this is broken in the existing code (without your
patches)?
  
  Even with my patch, we didn't support the pending bit. It would always
  return 0 now. What we supposed to do(after my patch checked in) is to
  check IRQ_PENDING flag of irq_desc->status(if the entry is masked), and
  return the result to userspace.
  
  That would involve some core change, like to export irq_to_desc(). I
  don't think it would be accepted soon, so would push mask bit first.
 
 The API needs to be compatible with the pending bit, even if we don't
 implement it now.  I want to reduce the rate of API changes.

This can be implemented by this API, just adding a flag for it. And I would
still take this into consideration in the next API proposal.
 
 Also need a new exit reason to tell userspace that an msix
 entry has changed, so userspace can update mappings.
  
  I think we don't need it. Whenever userspace want to get one
  mapping which is an enabled MSI-X entry, it can check it with the
  API above(which is quite rare, because kernel would handle all of
  them when guest is accessing them). If it's a disabled entry, the
  context inside userspace MMIO record is the correct one(and only
  one). The only place I think QEmu need to sync is when MSI-X is
  about to disabled, QEmu need to update it's own MMIO record.

So in-kernel handling of mmio would be decided per entry?  I'm trying
to simplify this, and simplest thing is - all or nothing.
  
  So you would like to handle all MSI-X MMIO in kernel?
 
 Yes.  Writes to address or data would be handled by:
 - recording it into the shadow msix table
 - notifying userspace that msix entry x changed
 Reads would be handled in kernel from the shadow msix table.
 
 So instead of
 
 - guest reads/writes msix
 - kvm filters mmio, implements some, passes others to userspace
 
 we have
 
 - guest reads/writes msix
 - kvm implements all
 - some writes generate an additional notification to userspace

I suppose we don't need to generate notification to userspace? Because every
read/write is handled by kernel, and userspace just need interface to kernel to
get/set the entry - and well, does userspace need to do it when kernel can
handle all of them? Maybe not...

--
regards
Yang, Sheng

Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Anthony Liguori

On 11/23/2010 02:16 AM, Dor Laor wrote:

On 11/23/2010 08:41 AM, Avi Kivity wrote:

On 11/23/2010 01:00 AM, Anthony Liguori wrote:

qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of
teaching them to respond to these signals, introduce monitor commands
that stop and start individual vcpus.

The purpose of these commands is to implement CPU hard limits using an
external tool that watches the CPU consumption and stops the CPU as
appropriate.


Why not use cgroup for that?


This is a stop-gap.

The cgroup solution isn't perfect.  It doesn't know anything about guest 
time versus hypervisor time so it can't account just the guest time like 
we do with this implementation.  Also, since it may deschedule the vcpu 
thread while it's holding the qemu_mutex, it may unfairly tax other vcpu 
threads by creating additional lock contention.


This is all solvable but if there's an alternative that just requires a 
small change to qemu, it's worth doing in the short term.


Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Avi Kivity

On 11/23/2010 03:51 PM, Anthony Liguori wrote:

On 11/23/2010 12:41 AM, Avi Kivity wrote:

On 11/23/2010 01:00 AM, Anthony Liguori wrote:
qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of
teaching them to respond to these signals, introduce monitor commands
that stop and start individual vcpus.

The purpose of these commands is to implement CPU hard limits using an
external tool that watches the CPU consumption and stops the CPU as
appropriate.

The monitor commands provide a more elegant solution than signals
because they ensure that a stopped vcpu isn't holding the qemu_mutex.



From signal(7):

  The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?


I need to dig deeper then.


Signals are a bottomless pit.


Maybe it's something about sending SIGSTOP to a process?


AFAIK sending SIGSTOP to a process should stop all of its threads?  
SIGSTOPping a thread should also work.




If we could catch SIGSTOP, then it would be easy to unblock it only 
while running in guest context. It would then stop on exit to userspace.


Yeah, that's not a bad idea.


Except we can't.



Using monitor commands is fairly heavyweight for something as high 
frequency as this.  What control period do you see people using?  
Maybe we should define USR1 for vcpu start/stop.


What happens if one vcpu is stopped while another is running?  Spin 
loops, synchronous IPIs will take forever.  Maybe we need to stop the 
entire process.


It's the same problem if a VCPU is descheduled while another is running. 


We can fix that with directed yield or lock holder preemption 
prevention.  But if a vcpu is stopped by qemu, we suddenly can't.


The problem with stopping the entire process is that a big motivation 
for this is to ensure that benchmarks have consistent results 
regardless of CPU capacity.  If you just monitor the full process, 
then one VCPU may dominate the entitlement resulting in very erratic 
benchmarking.


What's the desired behaviour?  Give each vcpu 300M cycles per second, or 
give a 2vcpu guest 600M cycles per second?


You could monitor threads separately but stop the entire process.  
Stopping individual threads will break apart as soon as they start 
taking locks.


--
error compiling committee.c: too many arguments to function
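
The signal(7) excerpt quoted in this thread can be checked directly: on POSIX systems, sigaction() refuses to install a handler for SIGSTOP and fails with EINVAL, while SIGUSR1 (the alternative floated in the discussion) can be caught normally. A minimal sketch, with hypothetical demo_ names that are not part of qemu-kvm:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical no-op handler, only used to probe whether a signal is catchable. */
static void demo_handler(int sig)
{
    (void)sig;
}

/*
 * Try to install demo_handler for sig.
 * Returns 0 on success, or the errno value reported by sigaction().
 */
static int demo_try_catch(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(sig, &sa, NULL) == -1)
        return errno;
    return 0;
}
```

Calling demo_try_catch(SIGSTOP) reports EINVAL, whereas demo_try_catch(SIGUSR1) succeeds, which is why any signal-based vcpu stop/start scheme would have to use a catchable signal.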



Re: Mask bit support's API

2010-11-23 Thread Yang, Sheng
On Tuesday 23 November 2010 20:04:16 Michael S. Tsirkin wrote:
 On Tue, Nov 23, 2010 at 02:09:52PM +0800, Yang, Sheng wrote:
  Hi Avi,
  
  I've proposed the following API for mask bit support.
  
  The main point is, QEmu can know which entries are enabled(by
  pci_enable_msix()).
 
 Unfortunately, it can't I think, unless all your guests are linux.
 enabled entries is a linux kernel concept.
 The MSIX spec only tells you which entries are masked and which are
 unmasked.

I can't understand what you are talking about, or how it relates to the guest
OS. I was talking about pci_enable_msix() in the host Linux.

--
regards
Yang, Sheng


Re: Mask bit support's API

2010-11-23 Thread Avi Kivity

On 11/23/2010 03:57 PM, Yang, Sheng wrote:

  
Yeah, but won't be included in this patchset.

  What API changes are needed?  I'd like to see the complete API.

I am not sure about it. But I suppose the structure should be the same? In fact
it's pretty hard for me to imagine what's needed for virtio in the future,
especially since there is no such code now. I really prefer to deal with
assigned devices and virtio separately, which would make the work much easier.
But it seems you won't agree on that.


First, I don't really see why the two cases are different (but I don't 
do a lot in this space).  Surely between you and Michael, you have all 
the information?


Second, my worry is a huge number of ABI variants that come from 
incrementally adding features.  I want to implement bigger chunks of 
functionality.  So I'd like to see all potential users addressed, at 
least from the ABI point of view if not the implementation.



  The API needs to be compatible with the pending bit, even if we don't
  implement it now.  I want to reduce the rate of API changes.

This can be implemented by this API, just adding a flag for it. And I would
still take this into consideration in the next API proposal.


Shouldn't kvm also service reads from the pending bitmask?



  So instead of

  - guest reads/writes msix
  - kvm filters mmio, implements some, passes others to userspace

  we have

  - guest reads/writes msix
  - kvm implements all
  - some writes generate an additional notification to userspace

I suppose we don't need to generate notification to userspace? Because every
read/write is handled by kernel, and userspace just need interface to kernel to
get/set the entry - and well, does userspace need to do it when kernel can
handle all of them? Maybe not...


We could have the kernel handle addr/data writes by setting up an 
internal interrupt routing.  A disadvantage is that more work is needed 
if we emulate interrupt remapping in qemu.


--
error compiling committee.c: too many arguments to function
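
The "kvm implements all" scheme discussed in this thread can be modelled in a few lines. The sketch below assumes the standard MSI-X table layout (16-byte entries: address low, address high, data, vector control with the mask bit in bit 0) and aligned 32-bit accesses; every demo_ name, the table size, and the return-value convention are hypothetical and are not a proposed kvm interface. Writes to address or data are recorded in the shadow table and flagged for a userspace notification, mask-bit writes stay in the kernel model, and reads are served from the shadow table.

```c
#include <stdint.h>

#define DEMO_MSIX_ENTRY_SIZE   16u  /* bytes per MSI-X table entry */
#define DEMO_MSIX_FIELD_CTRL   3u   /* dword index of Vector Control */
#define DEMO_MSIX_CTRL_MASKBIT 1u   /* bit 0 of Vector Control */
#define DEMO_MSIX_NR_ENTRIES   8u   /* arbitrary size for the sketch */

/* Shadow copy of the table: addr_lo, addr_hi, data, ctrl per entry. */
struct demo_msix_shadow {
    uint32_t table[DEMO_MSIX_NR_ENTRIES][4];
};

/*
 * Handle an aligned 32-bit guest write at byte offset into the table.
 * Returns 1 if userspace should be told that the entry changed
 * (address or data write), 0 if the write is handled entirely here
 * (vector control / mask bit), or -1 for an invalid offset.
 */
static int demo_msix_write(struct demo_msix_shadow *s,
                           uint32_t offset, uint32_t val)
{
    uint32_t entry = offset / DEMO_MSIX_ENTRY_SIZE;
    uint32_t field = (offset % DEMO_MSIX_ENTRY_SIZE) / 4;

    if (entry >= DEMO_MSIX_NR_ENTRIES || (offset & 3))
        return -1;

    s->table[entry][field] = val;
    return field == DEMO_MSIX_FIELD_CTRL ? 0 : 1;
}

/* Serve an aligned 32-bit guest read from the shadow table. */
static int demo_msix_read(const struct demo_msix_shadow *s,
                          uint32_t offset, uint32_t *val)
{
    uint32_t entry = offset / DEMO_MSIX_ENTRY_SIZE;
    uint32_t field = (offset % DEMO_MSIX_ENTRY_SIZE) / 4;

    if (entry >= DEMO_MSIX_NR_ENTRIES || (offset & 3))
        return -1;

    *val = s->table[entry][field];
    return 0;
}

/* Nonzero if the entry's mask bit is set. */
static int demo_msix_masked(const struct demo_msix_shadow *s, uint32_t entry)
{
    return (s->table[entry][DEMO_MSIX_FIELD_CTRL] & DEMO_MSIX_CTRL_MASKBIT) != 0;
}
```

The pending-bit array (PBA) is deliberately left out, as it is in the proposal under discussion; a fuller model would also need to record a pending interrupt when an entry is masked.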



Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Anthony Liguori

On 11/23/2010 08:00 AM, Avi Kivity wrote:


If we could catch SIGSTOP, then it would be easy to unblock it only 
while running in guest context. It would then stop on exit to 
userspace.


Yeah, that's not a bad idea.


Except we can't.


Yeah, I s:SIGSTOP:SIGUSR1:g.



Using monitor commands is fairly heavyweight for something as high 
frequency as this.  What control period do you see people using?  
Maybe we should define USR1 for vcpu start/stop.


What happens if one vcpu is stopped while another is running?  Spin 
loops, synchronous IPIs will take forever.  Maybe we need to stop 
the entire process.


It's the same problem if a VCPU is descheduled while another is running. 


We can fix that with directed yield or lock holder preemption 
prevention.  But if a vcpu is stopped by qemu, we suddenly can't.


That only works for spin locks.

Here's the scenario:

1) VCPU 0 drops to userspace and acquires qemu_mutex
2) VCPU 0 gets descheduled
3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets 
blocked and yields
4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy the 
system is


With CFS hard limits, once (2) happens, we're boned for (3) because (4) 
cannot happen.  By having QEMU know about (2), it can choose to run just 
a little bit longer in order to drop qemu_mutex such that (3) never happens.




The problem with stopping the entire process is that a big motivation 
for this is to ensure that benchmarks have consistent results 
regardless of CPU capacity.  If you just monitor the full process, 
then one VCPU may dominate the entitlement resulting in very erratic 
benchmarking.


What's the desired behaviour?  Give each vcpu 300M cycles per second, 
or give a 2vcpu guest 600M cycles per second?


Each vcpu gets 300M cycles per second.

You could monitor threads separately but stop the entire process.  
Stopping individual threads will break apart as soon as they start 
taking locks.


I don't think so..  PLE should work as expected.  It's no different than 
a normally contended system.


Regards,

Anthony Liguori
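
To make the "300M cycles per vcpu" versus "600M cycles per 2-vcpu guest" distinction concrete, here is a toy model. It assumes a worst case in which a shared per-process budget is consumed in vcpu order, which is only an illustration of how one vcpu could dominate the entitlement, not a description of how any real scheduler divides time. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-vcpu limit: each vcpu is capped independently at cap cycles. */
static void demo_cap_per_vcpu(const uint64_t *demand, uint64_t *granted,
                              size_t n, uint64_t cap)
{
    for (size_t i = 0; i < n; i++)
        granted[i] = demand[i] < cap ? demand[i] : cap;
}

/*
 * Per-process limit, worst case: a single shared budget is handed out
 * in index order, so an early busy vcpu can starve the later ones.
 */
static void demo_cap_per_process(const uint64_t *demand, uint64_t *granted,
                                 size_t n, uint64_t budget)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t g = demand[i] < budget ? demand[i] : budget;

        granted[i] = g;
        budget -= g;
    }
}
```

With demands of 600M and 100M cycles, the per-vcpu cap of 300M grants 300M and 100M, while the 600M per-process budget in this worst case grants 600M and 0, which is the erratic behaviour the per-vcpu approach is meant to avoid.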




Re: trace_printk() support in trace-cmd

2010-11-23 Thread Steven Rostedt
On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
 On 11/16/2010 05:13 PM, Steven Rostedt wrote:
  BTW, what does /debug/tracing/printk_formats show?
 
 
 Empty.
 

So you have real trace_printk's not bprintk's?

That is, if the format is not a const, then we fall back to
__trace_printk(_THIS_IP_, fmt, args);

And this is a different object. I have not tested these in a while, I'll
give it a try.

But if your printks are bprintks, then the bug is in the kernel, since
that printk_formats needs to show something.

-- Steve



Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Avi Kivity

On 11/23/2010 04:24 PM, Anthony Liguori wrote:




Using monitor commands is fairly heavyweight for something as high 
frequency as this.  What control period do you see people using?  
Maybe we should define USR1 for vcpu start/stop.


What happens if one vcpu is stopped while another is running?  Spin 
loops, synchronous IPIs will take forever.  Maybe we need to stop 
the entire process.


It's the same problem if a VCPU is descheduled while another is 
running. 


We can fix that with directed yield or lock holder preemption 
prevention.  But if a vcpu is stopped by qemu, we suddenly can't.


That only works for spin locks.

Here's the scenario:

1) VCPU 0 drops to userspace and acquires qemu_mutex
2) VCPU 0 gets descheduled
3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets 
blocked and yields
4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy 
the system is


With CFS hard limits, once (2) happens, we're boned for (3) because 
(4) cannot happen.  By having QEMU know about (2), it can choose to 
run just a little bit longer in order to drop qemu_mutex such that (3) 
never happens.


There's some support for futex priority inheritance, perhaps we can 
leverage that.  It's supposed to be for realtime threads, but perhaps we 
can hook the priority booster to directed yield.


It's really the same problem -- preempted lock holder -- only in 
userspace.  We should be able to use the same solution.






The problem with stopping the entire process is that a big 
motivation for this is to ensure that benchmarks have consistent 
results regardless of CPU capacity.  If you just monitor the full 
process, then one VCPU may dominate the entitlement resulting in 
very erratic benchmarking.


What's the desired behaviour?  Give each vcpu 300M cycles per second, 
or give a 2vcpu guest 600M cycles per second?


Each vcpu gets 300M cycles per second.

You could monitor threads separately but stop the entire process.  
Stopping individual threads will break apart as soon as they start 
taking locks.


I don't think so..  PLE should work as expected.  It's no different 
than a normally contended system.




PLE without directed yield is useless.  With directed yield, it may 
work, but if the vcpu is stopped, it becomes ineffective.


Directed yield allows the scheduler to follow a bouncing lock around by 
increasing the priority (or decreasing vruntime) of the immediate lock 
holder at the expense of waiters.  SIGSTOP may drop the priority of the 
lock holder to zero without giving PLE a way to adjust.


--
error compiling committee.c: too many arguments to function



Re: KVM call agenda for Nov 23

2010-11-23 Thread Kevin Wolf
Am 22.11.2010 14:55, schrieb Stefan Hajnoczi:
 On Mon, Nov 22, 2010 at 1:38 PM, Juan Quintela quint...@redhat.com wrote:

 Please send in any agenda items you are interested in covering.
 
 QCOW2 performance roadmap:
 * What can be done to achieve near-raw image format performance?
 * Benchmark results from an ideal QCOW2 model.

Some thoughts on qcow2 performance:

== Fully allocated image ==
Should be able to perform similar to raw because there is very little
handling of metadata. Additional I/O only if an L2 table must be read
from the disk.

* Should we increase the L2 table cache size to make it happen less
often? (Currently 16 * 512 MB, QED uses more)

Known problems:
* Synchronous read of L2 tables; should be made async
** General thought on making things async: Coroutines? What happened to
that proposal?
* We may want to have online defragmentation eventually
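
The "16 * 512 MB" figure above follows from the qcow2 layout, assuming the default 64 KiB cluster size, one cluster per L2 table, and 8-byte L2 entries: a table holds 8192 entries, each mapping one 64 KiB cluster, so one table covers 512 MiB and 16 cached tables cover 8 GiB of guest address space. A small sketch of the arithmetic (the demo_ names are made up for illustration):

```c
#include <stdint.h>

#define DEMO_L2_ENTRY_SIZE 8u  /* bytes per qcow2 L2 entry */

/* Guest bytes mapped by one L2 table that occupies exactly one cluster. */
static uint64_t demo_l2_table_coverage(uint64_t cluster_size)
{
    uint64_t entries = cluster_size / DEMO_L2_ENTRY_SIZE;

    return entries * cluster_size;
}

/* Guest bytes mapped by a cache holding the given number of L2 tables. */
static uint64_t demo_l2_cache_coverage(uint64_t cluster_size, unsigned tables)
{
    return demo_l2_table_coverage(cluster_size) * tables;
}
```

For random I/O spread over an image much larger than the cached coverage, L2 lookups would frequently miss, which is the case where a larger cache would pay off.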

== Growing stand-alone image ==
Stand-alone images (i.e. images without a backing file) aren't that
interesting because you would use raw for them anyway if you needed
optimal performance. We need to be good enough here.

However, all of the problems that arise from dealing with metadata apply
for the really interesting third case, so optimizing them is an
important step on the way.

Known problems:
* Needs a bdrv_flush between refcount table and L2 table write
* Synchronous metadata updates
* Both to be solved by block-queue
** Batches writes and makes them async, can greatly reduce number of
bdrv_flush calls
** Except for cache=writethrough, but this is secondary
** Should we make cache=off the default caching mode in qemu?
writethrough seems to be a bit too much anyway irrespective of the image
format.
* Synchronous refcount table reads
** How frequent are cache misses?
** Making this one async is much harder than L2 table reads. We can make
it a goal for mid-term, but short term we should make it hurt less if
it's a problem in practice.
*** It's probably not, because (without internal snapshots or
compression) we never free clusters, so we fill it sequentially and only
load a new one when the old one is full - and that one we don't even
read, but write, so block-queue will help
* Things like refcount table growth are completely synchronous.
** Not a real problem, because it happens approximately never.

== Growing image with backing file ==
This is the really interesting scenario where you need an image format
that provides some features. For qcow2, it's mostly the same as above.

See stand-alone, plus:
* Needs a bdrv_flush between COW and writing to the L2 table
** qcow2 has already one after refcount table write, so no additional
overhead
* Synchronous COW
** Should be fairly easy to make async
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trace_printk() support in trace-cmd

2010-11-23 Thread Avi Kivity

On 11/23/2010 04:30 PM, Steven Rostedt wrote:

On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
  On 11/16/2010 05:13 PM, Steven Rostedt wrote:
BTW, what does /debug/tracing/printk_formats show?
  

  Empty.


So you have real trace_printk's not bprintk's?



What are bprintk()s?


That is, if the format is not a const, then we fall back to
__trace_printk(_THIS_IP_, fmt, args);

And this is a different object. I have not tested these in a while, I'll
give it a try.

But if your printks are bprintks, then the bug is in the kernel, since
that printk_formats needs to show something.


What I do is sprinkle trace_printk()s around my code and expect to see 
them interspersed with enabled tracepoints in 'trace-cmd report'.  Is 
that not the intended behaviour?


--
error compiling committee.c: too many arguments to function



Re: KVM call agenda for Nov 23

2010-11-23 Thread Stefan Hajnoczi
On Tue, Nov 23, 2010 at 2:37 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 22.11.2010 14:55, schrieb Stefan Hajnoczi:
 On Mon, Nov 22, 2010 at 1:38 PM, Juan Quintela quint...@redhat.com wrote:

 Please send in any agenda items you are interested in covering.

 QCOW2 performance roadmap:
 * What can be done to achieve near-raw image format performance?
 * Benchmark results from an ideal QCOW2 model.

Performance figures from a series of I/O scenarios:
http://wiki.qemu.org/Qcow2/PerformanceRoadmap

Stefan


[PATCH] qemu-kvm: remove unused setupcpuid

2010-11-23 Thread Michael S. Tsirkin
kvm_setup_cpuid seems unused, so remove it.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/kvm/libkvm/libkvm-x86.c b/kvm/libkvm/libkvm-x86.c
index f1aef76..2b12408 100644
--- a/kvm/libkvm/libkvm-x86.c
+++ b/kvm/libkvm/libkvm-x86.c
@@ -466,45 +466,6 @@ __u64 kvm_get_cr8(kvm_context_t kvm, int vcpu)
 	return kvm->run[vcpu]->cr8;
 }
 
-int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
-		    struct kvm_cpuid_entry *entries)
-{
-	struct kvm_cpuid *cpuid;
-	int r;
-
-	cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-	if (!cpuid)
-		return -ENOMEM;
-
-	cpuid->nent = nent;
-	memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-	r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID, cpuid);
-
-	free(cpuid);
-	return r;
-}
-
-int kvm_setup_cpuid2(kvm_context_t kvm, int vcpu, int nent,
-		     struct kvm_cpuid_entry2 *entries)
-{
-	struct kvm_cpuid2 *cpuid;
-	int r;
-
-	cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-	if (!cpuid)
-		return -ENOMEM;
-
-	cpuid->nent = nent;
-	memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-	r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID2, cpuid);
-	if (r == -1) {
-		fprintf(stderr, "kvm_setup_cpuid2: %m\n");
-		r = -errno;
-	}
-	free(cpuid);
-	return r;
-}
-
 int kvm_set_shadow_pages(kvm_context_t kvm, unsigned int nrshadow_pages)
 {
 #ifdef KVM_CAP_MMU_SHADOW_CACHE_CONTROL
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index 4821a1e..a70945d 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -359,36 +359,6 @@ int kvm_set_guest_debug(kvm_context_t, int vcpu, struct kvm_guest_debug *dbg);
 
 #if defined(__i386__) || defined(__x86_64__)
 /*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.\n
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should be initialized
- * \param nent number of entries to be installed
- * \param entries cpuid function entries table
- * \return 0 on success, or -errno on error
- */
-int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
-   struct kvm_cpuid_entry *entries);
-
-/*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.
- * This call replaces the older kvm_setup_cpuid interface by adding a few
- * parameters to support cpuid functions that have sub-leaf values.
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should be initialized
- * \param nent number of entries to be installed
- * \param entries cpuid function entries table
- * \return 0 on success, or -errno on error
- */
-int kvm_setup_cpuid2(kvm_context_t kvm, int vcpu, int nent,
-struct kvm_cpuid_entry2 *entries);
-
-/*!
  * \brief Setting the number of shadow pages to be allocated to the vm
  *
  * \param kvm pointer to kvm_context
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 20b7d6d..672bcbf 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -418,37 +418,6 @@ static void kvm_set_cr8(CPUState *env, uint64_t cr8)
     env->kvm_run->cr8 = cr8;
 }
 
-int kvm_setup_cpuid(CPUState *env, int nent,
-                    struct kvm_cpuid_entry *entries)
-{
-    struct kvm_cpuid *cpuid;
-    int r;
-
-    cpuid = qemu_malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-
-    cpuid->nent = nent;
-    memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-    r = kvm_vcpu_ioctl(env, KVM_SET_CPUID, cpuid);
-
-    free(cpuid);
-    return r;
-}
-
-int kvm_setup_cpuid2(CPUState *env, int nent,
-                     struct kvm_cpuid_entry2 *entries)
-{
-    struct kvm_cpuid2 *cpuid;
-    int r;
-
-    cpuid = qemu_malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-
-    cpuid->nent = nent;
-    memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-    r = kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid);
-    free(cpuid);
-    return r;
-}
-
 int kvm_set_shadow_pages(kvm_context_t kvm, unsigned int nrshadow_pages)
 {
 #ifdef KVM_CAP_MMU_SHADOW_CACHE_CONTROL
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 0f3fb50..7e6edfb 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -219,6 +219,7 @@ int kvm_get_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 #endif
 
+#if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Simulate an external vectored interrupt
  *
@@ -231,36 +232,6 @@ int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
  */
 int kvm_inject_irq(CPUState *env, unsigned irq);
 
-#if defined(__i386__) || defined(__x86_64__)
-/*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.\n
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which 

Re: Mask bit support's API

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>> So instead of
>>>
>>> - guest reads/writes msix
>>> - kvm filters mmio, implements some, passes others to userspace
>>>
>>> we have
>>>
>>> - guest reads/writes msix
>>> - kvm implements all
>>> - some writes generate an additional notification to userspace
>>
>> I suppose we don't need to generate notification to userspace? Because every
>> read/write is handled by kernel, and userspace just need interface to kernel to
>> get/set the entry - and well, does userspace need to do it when kernel can handle
>> all of them? Maybe not...
>
> We could have the kernel handle addr/data writes by setting up an
> internal interrupt routing.  A disadvantage is that more work is
> needed if we emulate interrupt remapping in qemu.

As an alternative, interrupt remapping will need some API rework, right?
Existing APIs only pass address/data for msi.
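
To make the quoted proposal concrete (kvm implements every MSI-X table
access, and only some writes also notify userspace), here is a minimal
sketch in C. The 16-byte entry layout and the mask bit in the vector control
word follow the PCI MSI-X table format; the function name, the struct, and
the choice of which writes notify userspace are assumptions made for
illustration and are not taken from any posted patch.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE    16    /* addr_lo, addr_hi, data, vector control */
#define MSIX_CTRL_MASKBIT  0x1u  /* bit 0 of vector control masks the vector */

/* Hypothetical in-kernel copy of one MSI-X table entry. */
struct msix_entry {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t ctrl;
};

/*
 * Handle a 32-bit guest write at byte 'offset' into the table.
 * Every write is applied here; the return value says whether userspace
 * should additionally be notified (assumed here: only address/data
 * changes, since those affect interrupt routing).
 */
static int msix_table_write(struct msix_entry *table, unsigned nent,
                            uint32_t offset, uint32_t val)
{
    unsigned idx = offset / MSIX_ENTRY_SIZE;
    uint32_t field = offset % MSIX_ENTRY_SIZE;

    if (idx >= nent || (offset & 3)) {
        return 0;               /* out of range or unaligned: ignore */
    }

    switch (field) {
    case 0:
        table[idx].addr_lo = val;
        return 1;
    case 4:
        table[idx].addr_hi = val;
        return 1;
    case 8:
        table[idx].data = val;
        return 1;
    default:                    /* vector control: mask bit stays in kernel */
        table[idx].ctrl = val & MSIX_CTRL_MASKBIT;
        return 0;
    }
}
```

In this sketch a mask or unmask never leaves the kernel, which is the hot
path the thread is concerned with, while address/data updates are both
stored and reported.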

-- 
MST


Re: Mask bit support's API

2010-11-23 Thread Gleb Natapov
On Tue, Nov 23, 2010 at 05:11:19PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>>> So instead of
>>>>
>>>> - guest reads/writes msix
>>>> - kvm filters mmio, implements some, passes others to userspace
>>>>
>>>> we have
>>>>
>>>> - guest reads/writes msix
>>>> - kvm implements all
>>>> - some writes generate an additional notification to userspace
>>>
>>> I suppose we don't need to generate notification to userspace? Because every
>>> read/write is handled by kernel, and userspace just need interface to kernel to
>>> get/set the entry - and well, does userspace need to do it when kernel can handle
>>> all of them? Maybe not...
>>
>> We could have the kernel handle addr/data writes by setting up an
>> internal interrupt routing.  A disadvantage is that more work is
>> needed if we emulate interrupt remapping in qemu.
>
> As an alternative, interrupt remapping will need some API rework, right?
> Existing APIs only pass address/data for msi.

IIRC interrupt remapping works with address/data too. It just interprets
it differently from apic.

--
Gleb.


Re: [PATCHv6 00/16] boot order specification

2010-11-23 Thread Gleb Natapov
Anthony, Blue

No comments on this patch series for almost a week. Can it be applied?

On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
 I am using open firmware naming scheme to specify device path names.
 In this version: added SCSI bus support. Pass boot order list as file
 to firmware.
 
 Names look like this on pci machine:
 /p...@i0cf8/i...@1,1/dr...@1/d...@0
 /p...@i0cf8/i...@1/f...@03f1/flo...@1
 /p...@i0cf8/i...@1/f...@03f1/flo...@0
 /p...@i0cf8/i...@1,1/dr...@1/d...@1
 /p...@i0cf8/i...@1,1/dr...@0/d...@0
 /p...@i0cf8/s...@3/d...@0,0
 /p...@i0cf8/ether...@4/ethernet-...@0
 /p...@i0cf8/ether...@5/ethernet-...@0
 /p...@i0cf8/i...@1,1/dr...@0/d...@1
 /p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
 /p...@i0cf8/u...@1,2/netw...@0/ether...@0
 /p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
 /r...@genroms/linuxboot.bin
 
 and on isa machine:
 /isa/i...@0170/dr...@0/d...@0
 /isa/f...@03f1/flo...@1
 /isa/f...@03f1/flo...@0
 /isa/i...@0170/dr...@0/d...@1
 
 Instead of using the get_dev_path() callback I introduce another one,
 get_fw_dev_path. Unfortunately the way get_dev_path() callback is used
 in migration code makes it hard to reuse it for other purposes. First
 of all it is not called recursively so caller expects it to provide
 unique name by itself. Device path though is inherently recursive. Each
 individual element may not be unique, but the whole path will be. On
 the other hand to call get_dev_path() recursively in migration code we
 should implement it for all possible buses first. Other problem is
 compatibility. If we change get_dev_path() output format now we will not
 be able to migrate from old qemu to new one without some additional
 compatibility layer.
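
As an illustration of why a firmware device path is naturally recursive,
the following C sketch builds a path by walking from a device up through its
parents and emitting one "name@unit-address" element per level. It is not the
code from this series: struct fw_node, fw_path() and the example names are
hypothetical, and only the overall path shape follows the listing above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical device-tree node: each node knows only its own element. */
struct fw_node {
    const struct fw_node *parent;  /* NULL for the root bus */
    const char *name;              /* element name, e.g. "bus" */
    const char *unit;              /* unit address, or NULL if none */
};

/*
 * Write the full path of 'n' into 'buf'. Parents are emitted first, so a
 * single element need not be unique as long as the whole path is.
 * Returns the path length, or -1 if 'buf' is too small.
 */
static int fw_path(const struct fw_node *n, char *buf, size_t size)
{
    int off = 0;
    int len;

    if (n->parent) {
        off = fw_path(n->parent, buf, size);
        if (off < 0) {
            return -1;
        }
    }

    if (n->unit) {
        len = snprintf(buf + off, size - off, "/%s@%s", n->name, n->unit);
    } else {
        len = snprintf(buf + off, size - off, "/%s", n->name);
    }
    if (len < 0 || (size_t)len >= size - off) {
        return -1;              /* element did not fit */
    }
    return off + len;
}
```

With three made-up nodes bus@0, ctrl@1,1 and disk@0 chained through their
parent pointers, fw_path() on the last one yields "/bus@0/ctrl@1,1/disk@0".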
 
 Gleb Natapov (16):
   Introduce fw_name field to DeviceInfo structure.
   Introduce new BusInfo callback get_fw_dev_path.
   Keep track of ISA ports ISA device is using in qdev.
   Add get_fw_dev_path callback to ISA bus in qdev.
   Store IDE bus id in IDEBus structure for easy access.
   Add get_fw_dev_path callback to IDE bus.
   Add get_dev_path callback for system bus.
   Add get_fw_dev_path callback for pci bus.
   Record which USBDevice USBPort belongs too.
   Add get_dev_path callback for usb bus.
   Add get_dev_path callback to scsi bus.
   Add bootindex parameter to net/block/fd device
   Change fw_cfg_add_file() to get full file path as a parameter.
   Add bootindex for option roms.
   Add notifier that will be called when machine is fully created.
   Pass boot device list to firmware.
 
  block_int.h   |4 +-
  hw/cs4231a.c  |1 +
  hw/e1000.c|4 ++
  hw/eepro100.c |3 +
  hw/fdc.c  |   12 ++
  hw/fw_cfg.c   |   30 --
  hw/fw_cfg.h   |4 +-
  hw/gus.c  |4 ++
  hw/ide/cmd646.c   |4 +-
  hw/ide/internal.h |3 +-
  hw/ide/isa.c  |5 ++-
  hw/ide/piix.c |4 +-
  hw/ide/qdev.c |   22 ++-
  hw/ide/via.c  |4 +-
  hw/isa-bus.c  |   42 +++
  hw/isa.h  |4 ++
  hw/lance.c|1 +
  hw/loader.c   |   32 ---
  hw/loader.h   |8 ++--
  hw/m48t59.c   |1 +
  hw/mc146818rtc.c  |1 +
  hw/multiboot.c|3 +-
  hw/ne2000-isa.c   |3 +
  hw/ne2000.c   |5 ++-
  hw/nseries.c  |4 +-
  hw/palm.c |6 +-
  hw/parallel.c |5 ++
  hw/pc.c   |7 ++-
  hw/pci.c  |  110 ---
  hw/pci_host.c |2 +
  hw/pckbd.c|3 +
  hw/pcnet.c|6 ++-
  hw/piix_pci.c |1 +
  hw/qdev.c |   32 +++
  hw/qdev.h |9 
  hw/rtl8139.c  |4 ++
  hw/sb16.c |4 ++
  hw/scsi-bus.c |   23 +++
  hw/scsi-disk.c|2 +
  hw/serial.c   |1 +
  hw/sysbus.c   |   30 ++
  hw/sysbus.h   |4 ++
  hw/usb-bus.c  |   45 -
  hw/usb-hub.c  |3 +-
  hw/usb-musb.c |2 +-
  hw/usb-net.c  |3 +
  hw/usb-ohci.c |2 +-
  hw/usb-uhci.c |2 +-
  hw/usb.h  |3 +-
  hw/virtio-blk.c   |2 +
  hw/virtio-net.c   |2 +
  hw/virtio-pci.c   |1 +
  net.h |4 +-
  qemu-config.c |   17 
  sysemu.h  |   11 +-
  vl.c  |  114 
 -
  56 files changed, 588 insertions(+), 80 deletions(-)
 
 -- 
 1.7.2.3
 

--
Gleb.




Re: trace_printk() support in trace-cmd

2010-11-23 Thread Steven Rostedt
On Tue, 2010-11-23 at 16:37 +0200, Avi Kivity wrote:
> On 11/23/2010 04:30 PM, Steven Rostedt wrote:
>> On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
>>> On 11/16/2010 05:13 PM, Steven Rostedt wrote:
>>>> BTW, what does /debug/tracing/printk_formats show?
>>>
>>> Empty.
>>
>> So you have real trace_printk's not bprintk's?
>
> What are bprintk()s?

trace_printk() tries to be clever. If it detects that the format is
constant, instead of doing the sprintf at the tracepoint, it copies a
pointer to the format, and then copies the args to the stack. (although,
I'm not sure how much quicker this is). It just saves on the format in
the ring buffer.

If the format is not static, then it just simply calls __trace_printk()
that does the sprintf() and writes that output into the buffer.

 
>> That is, if the format is not a const, then we fall back to
>> __trace_printk(_THIS_IP_, fmt, args);
>>
>> And this is a different object. I have not tested these in a while, I'll
>> give it a try.
>>
>> But if your printks are bprintks, then the bug is in the kernel, since
>> that printk_formats needs to show something.
>
> What I do is sprinkle trace_printk()s around my code and expect to see
> them interspersed with enabled tracepoints in 'trace-cmd report'.  Is
> that not the intended behaviour?
 

No, that is exactly the intended behavior. But the problem is that, for some
reason, the bprintk's (the default that trace_printk() uses) are not
having their format exported. Remember, only the pointer to the format is
stored in the ring buffer (and thus exported by trace-cmd). If that
format is not shown in printk_formats, then trace-cmd has no way to
determine what that trace_printk's format was.

I guess the question is, why did it not show up?

Again, the work around is to replace your trace_printks() with
__trace_printk(_THIS_IP_, ...) or just modify the trace_printk() macro
in include/linux/kernel.h to always use the __trace_printk() version.
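
The two paths described above can be sketched in user space roughly as
follows. This is not the kernel's implementation: the record layout, the
rec_*() helpers and sketch_trace_printk() are hypothetical, and a single
long argument stands in for the real argument list. It relies on the GCC and
Clang builtin __builtin_constant_p() to tell a literal format from one built
at run time.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical ring-buffer record, not the kernel's layout. */
struct trace_rec {
    int is_bprintk;   /* 1: only the format pointer and raw arg are stored */
    const char *fmt;  /* bprintk path: pointer to the constant format */
    long arg;         /* bprintk path: unformatted argument */
    char text[64];    /* printk path: text formatted at the trace point */
};

/* Cheap path: remember where the format lives, format later. */
static void rec_bprintk(struct trace_rec *r, const char *fmt, long arg)
{
    r->is_bprintk = 1;
    r->fmt = fmt;
    r->arg = arg;
    r->text[0] = '\0';
}

/* Fallback path: format immediately and store the resulting text. */
static void rec_printk(struct trace_rec *r, const char *fmt, long arg)
{
    r->is_bprintk = 0;
    r->fmt = NULL;
    r->arg = 0;
    snprintf(r->text, sizeof(r->text), fmt, arg);
}

/* Pick the path at compile time, based on whether fmt is a constant. */
#define sketch_trace_printk(r, fmt, arg)        \
    do {                                        \
        if (__builtin_constant_p(fmt))          \
            rec_bprintk((r), (fmt), (arg));     \
        else                                    \
            rec_printk((r), (fmt), (arg));      \
    } while (0)

/* Reader side: a bprintk record is only readable if fmt can be resolved. */
static int rec_render(const struct trace_rec *r, char *out, size_t size)
{
    if (r->is_bprintk) {
        return snprintf(out, size, r->fmt, r->arg);
    }
    return snprintf(out, size, "%s", r->text);
}
```

In the sketch the reader can follow r->fmt directly because it lives in the
same process. A tool reading a recorded trace only has the pointer value,
which is why it depends on the exported format list to turn that pointer
back into a string.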

-- Steve




Re: Performance test result between per-vhost kthread disable and enable

2010-11-23 Thread Sridhar Samudrala

On 11/23/2010 5:41 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 23, 2010 at 09:23:41PM +0800, lidong chen wrote:
>>> At this point, I'd suggest testing vhost-net on the upstream kernel,
>>> not on rhel kernels. The change that introduced per-device threads is:
>>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>> i will try this tomorrow.
>>
>>> Is CONFIG_SCHED_DEBUG set?
>> yes. CONFIG_SCHED_DEBUG=y.
>
> Disable it. Either debug scheduler or perf-test it :)

Another debug option to disable is CONFIG_WORKQUEUE_TRACER if it is set
when using old rhel6 kernels.

-Sridhar

>> 2010/11/23 Michael S. Tsirkin m...@redhat.com:
>>> On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
>>>> I test the performance between per-vhost kthread disable and enable.
>>>>
>>>> Test method:
>>>> Send the same traffic load between per-vhost kthread disable and
>>>> enable, and compare the cpu rate of host os.
>>>> I run five vm on kvm, each of them have five nic.
>>>> the vhost version which per-vhost kthread disable we used is rhel6
>>>> beta 2(2.6.32.60).
>>>> the vhost version which per-vhost kthread enable we used is rhel6 (2.6.32-71).
>>>
>>> At this point, I'd suggest testing vhost-net on the upstream kernel,
>>> not on rhel kernels. The change that introduced per-device threads is:
>>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>>>
>>>> Test result:
>>>> with per-vhost kthread disable, the cpu rate of host os is 110%.
>>>> with per-vhost kthread enable, the cpu rate of host os is 130%.
>>>
>>> Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with
>>> vhost-net.
>>>
>>>> In 2.6.32.60,the whole system only have a kthread.
>>>> [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
>>>> root       973     2  0 Nov22 ?        00:00:00 [vhost]
>>>>
>>>> In 2.6.32.71,the whole system have 25 kthread.
>>>> [r...@kvm-4slot ~]# ps -ef | grep vhost-
>>>> root     12896     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12897     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12898     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12899     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12900     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>>
>>>> root     13022     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13023     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13024     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13025     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13026     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>>
>>>> root     13146     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13147     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13148     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13149     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13150     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> ...
>>>>
>>>> Code difference:
>>>> In 2.6.32.60,in function vhost_init, create the kthread for vhost.
>>>> vhost_workqueue = create_singlethread_workqueue("vhost");
>>>>
>>>> In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for
>>>> each nic interface.
>>>> dev->wq = create_singlethread_workqueue(vhost_name);
>>>>
>>>> Conclusion:
>>>> with per-vhost kthread enable, the system can more throughput.
>>>> but deal the same traffic load with per-vhost kthread enable, it waste
>>>> more cpu resource.
>>>>
>>>> In my application scene, the cpu resource is more important, and one
>>>> kthread for deal with traffic load is enough.
>>>>
>>>> So i think we should add a param to control this.
>>>> for the CPU-bound system, this param disable per-vhost kthread.
>>>> for the I/O-bound system, this param enable per-vhost kthread.
>>>> the default value of this param is enable.
>>>>
>>>> If my opinion is right, i will give a patch for this.
>>>
>>> Let's try to figure out what the issue is, first.
>>>
>>> --
>>> MST





KVM call minutes for Nov 23

2010-11-23 Thread Chris Wright
qcow2 performance roadmap
- What can be done to achieve near-raw image format performance?
  - some discussion points from Kevin on list
http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg02126.html
  - please follow up on the list
- some perf numbers (latest upstream qcow2 compared with qed)
  - qed is fully async, added unconditional flush to model qcow2
  - http://wiki.qemu.org/Qcow2/PerformanceRoadmap 
  - qcow2 not scaling as well
- metadata handling still quite sync
- sequential reads not scaling at all (a
- only serialization point is two accesses to same block and need to
  allocate
- template based backing file is common (esp. in cloud)
- perf data suggests that data/table format dictates performance ceiling
  - barriers off on underlying fs, cache=writethrough
  - raw backing file (sparse) grows with basic tools like cp
  - suggestion: qed == qcow2 v3
- wouldn't support encryption and compression (Kevin won't do this)

usb-ccid
- concern about external library implementation
  - hard to add device features, enhancements, live migration protocol changes
- external library
- will resend patch to 

vcpu hard limits
- will continue discussion on list

0.14 (release date, bug day, -rc planning, etc)
- aiming for dec 15th
- will send note out after call with release schedule

0.13.x
- will connect with jforbes regarding -stable maintenance

gPXE vs. iPXE
- ipxe is new fork
- ipxe looking more active (including original gpxe developers)
- which is a better choice?
  - iPXE more active, gPXE stalled
  - some concern about where the community sits (gPXE has irc, bug
reports, etc)
  - some concern about boot delay with iPXE
- qemu not updating roms that frequently, next time we need to update,
  can evaluate
- syslinux still using gPXE


Re: Mask bit support's API

2010-11-23 Thread Michael S. Tsirkin
On Tue, Nov 23, 2010 at 05:24:44PM +0200, Gleb Natapov wrote:
> On Tue, Nov 23, 2010 at 05:11:19PM +0200, Michael S. Tsirkin wrote:
>> On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>>>> So instead of
>>>>>
>>>>> - guest reads/writes msix
>>>>> - kvm filters mmio, implements some, passes others to userspace
>>>>>
>>>>> we have
>>>>>
>>>>> - guest reads/writes msix
>>>>> - kvm implements all
>>>>> - some writes generate an additional notification to userspace
>>>>
>>>> I suppose we don't need to generate notification to userspace? Because every
>>>> read/write is handled by kernel, and userspace just need interface to kernel to
>>>> get/set the entry - and well, does userspace need to do it when kernel can handle
>>>> all of them? Maybe not...
>>>
>>> We could have the kernel handle addr/data writes by setting up an
>>> internal interrupt routing.  A disadvantage is that more work is
>>> needed if we emulate interrupt remapping in qemu.
>>
>> As an alternative, interrupt remapping will need some API rework, right?
>> Existing APIs only pass address/data for msi.
>
> IIRC interrupt remapping works with address/data too. It just interprets
> it differently from apic.

Yes. So since our APIs use address/data, this is an argument for doing
the remapping in kernel.

 --
   Gleb.


Re: [PATCHv6 00/16] boot order specification

2010-11-23 Thread Anthony Liguori

On 11/23/2010 09:31 AM, Gleb Natapov wrote:
> Anthony, Blue
>
> No comments on this patch series for almost a week. Can it be applied?

Does that mean everyone's happy or have folks not gotten around to
reviewing it?

IOW, last call if you have objections :-)

Regards,

Anthony Liguori


On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
   

I am using open firmware naming scheme to specify device path names.
In this version: added SCSI bus support. Pass boot order list as file
to firmware.

Names look like this on pci machine:
/p...@i0cf8/i...@1,1/dr...@1/d...@0
/p...@i0cf8/i...@1/f...@03f1/flo...@1
/p...@i0cf8/i...@1/f...@03f1/flo...@0
/p...@i0cf8/i...@1,1/dr...@1/d...@1
/p...@i0cf8/i...@1,1/dr...@0/d...@0
/p...@i0cf8/s...@3/d...@0,0
/p...@i0cf8/ether...@4/ethernet-...@0
/p...@i0cf8/ether...@5/ethernet-...@0
/p...@i0cf8/i...@1,1/dr...@0/d...@1
/p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
/p...@i0cf8/u...@1,2/netw...@0/ether...@0
/p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
/r...@genroms/linuxboot.bin

and on isa machine:
/isa/i...@0170/dr...@0/d...@0
/isa/f...@03f1/flo...@1
/isa/f...@03f1/flo...@0
/isa/i...@0170/dr...@0/d...@1

Instead of using the get_dev_path() callback I introduce another one,
get_fw_dev_path. Unfortunately the way get_dev_path() callback is used
in migration code makes it hard to reuse it for other purposes. First
of all it is not called recursively so caller expects it to provide
unique name by itself. Device path though is inherently recursive. Each
individual element may not be unique, but the whole path will be. On
the other hand to call get_dev_path() recursively in migration code we
should implement it for all possible buses first. Other problem is
compatibility. If we change get_dev_path() output format now we will not
be able to migrate from old qemu to new one without some additional
compatibility layer.

Gleb Natapov (16):
   Introduce fw_name field to DeviceInfo structure.
   Introduce new BusInfo callback get_fw_dev_path.
   Keep track of ISA ports ISA device is using in qdev.
   Add get_fw_dev_path callback to ISA bus in qdev.
   Store IDE bus id in IDEBus structure for easy access.
   Add get_fw_dev_path callback to IDE bus.
   Add get_dev_path callback for system bus.
   Add get_fw_dev_path callback for pci bus.
   Record which USBDevice USBPort belongs too.
   Add get_dev_path callback for usb bus.
   Add get_dev_path callback to scsi bus.
   Add bootindex parameter to net/block/fd device
   Change fw_cfg_add_file() to get full file path as a parameter.
   Add bootindex for option roms.
   Add notifier that will be called when machine is fully created.
   Pass boot device list to firmware.

  block_int.h   |4 +-
  hw/cs4231a.c  |1 +
  hw/e1000.c|4 ++
  hw/eepro100.c |3 +
  hw/fdc.c  |   12 ++
  hw/fw_cfg.c   |   30 --
  hw/fw_cfg.h   |4 +-
  hw/gus.c  |4 ++
  hw/ide/cmd646.c   |4 +-
  hw/ide/internal.h |3 +-
  hw/ide/isa.c  |5 ++-
  hw/ide/piix.c |4 +-
  hw/ide/qdev.c |   22 ++-
  hw/ide/via.c  |4 +-
  hw/isa-bus.c  |   42 +++
  hw/isa.h  |4 ++
  hw/lance.c|1 +
  hw/loader.c   |   32 ---
  hw/loader.h   |8 ++--
  hw/m48t59.c   |1 +
  hw/mc146818rtc.c  |1 +
  hw/multiboot.c|3 +-
  hw/ne2000-isa.c   |3 +
  hw/ne2000.c   |5 ++-
  hw/nseries.c  |4 +-
  hw/palm.c |6 +-
  hw/parallel.c |5 ++
  hw/pc.c   |7 ++-
  hw/pci.c  |  110 ---
  hw/pci_host.c |2 +
  hw/pckbd.c|3 +
  hw/pcnet.c|6 ++-
  hw/piix_pci.c |1 +
  hw/qdev.c |   32 +++
  hw/qdev.h |9 
  hw/rtl8139.c  |4 ++
  hw/sb16.c |4 ++
  hw/scsi-bus.c |   23 +++
  hw/scsi-disk.c|2 +
  hw/serial.c   |1 +
  hw/sysbus.c   |   30 ++
  hw/sysbus.h   |4 ++
  hw/usb-bus.c  |   45 -
  hw/usb-hub.c  |3 +-
  hw/usb-musb.c |2 +-
  hw/usb-net.c  |3 +
  hw/usb-ohci.c |2 +-
  hw/usb-uhci.c |2 +-
  hw/usb.h  |3 +-
  hw/virtio-blk.c   |2 +
  hw/virtio-net.c   |2 +
  hw/virtio-pci.c   |1 +
  net.h |4 +-
  qemu-config.c |   17 
  sysemu.h  |   11 +-
  vl.c  |  114 -
  56 files changed, 588 insertions(+), 80 deletions(-)

--
1.7.2.3

 

--
Gleb.
   



[PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Anthony Liguori
qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
them to respond to these signals (which cannot be trapped), use SIGUSR1 to
approximate the behavior of SIGSTOP/SIGCONT.

The purpose of this is to implement CPU hard limits using an external tool that
watches the CPU consumption and stops the VCPU as appropriate.

This provides a more elegant solution in that it allows the VCPU thread to
release qemu_mutex before going to sleep.

This current implementation uses a single signal.  I think this is too racy
in the long term so I think we should introduce a second signal.  If two signals
get coalesced into one, it could confuse the monitoring tool into giving the
VCPU the inverse of its entitlement.

It might be better to simply move this logic entirely into QEMU to make this
more robust--the question is whether we think this is a good long term feature
to carry in QEMU?

Signed-off-by: Anthony Liguori aligu...@us.ibm.com

diff --git a/cpu-defs.h b/cpu-defs.h
index 51533c6..6434dca 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -220,6 +220,7 @@ struct KVMCPUState {
 const char *cpu_model_str;  \
 struct KVMState *kvm_state; \
 struct kvm_run *kvm_run;\
+int sigusr1_fd; \
 int kvm_fd; \
 int kvm_vcpu_dirty; \
 struct KVMCPUState kvm_cpu_state;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 471306b..354109f 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1351,6 +1351,29 @@ static void pause_all_threads(void)
     }
 }
 
+static void vcpu_stop(CPUState *env)
+{
+    if (env != cpu_single_env) {
+        env->stop = 1;
+        pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+    } else {
+        env->stop = 0;
+        env->stopped = 1;
+        cpu_exit(env);
+    }
+
+    while (!env->stopped) {
+        qemu_cond_wait(&qemu_pause_cond);
+    }
+}
+
+static void vcpu_start(CPUState *env)
+{
+    env->stop = 0;
+    env->stopped = 0;
+    pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+}
+
 static void resume_all_threads(void)
 {
     CPUState *penv = first_cpu;
@@ -1426,6 +1449,37 @@ static int kvm_main_loop_cpu(CPUState *env)
     return 0;
 }
 
+static __thread int sigusr1_wfd;
+
+static void on_sigusr1(int signo)
+{
+    char ch = 0;
+    if (write(sigusr1_wfd, &ch, 1) < 0) {
+        /* who cares */
+    }
+}
+
+static void sigusr1_read(void *opaque)
+{
+    CPUState *env = opaque;
+    ssize_t len;
+    int caught_signal = 0;
+
+    do {
+        char buffer[256];
+        len = read(env->sigusr1_fd, buffer, sizeof(buffer));
+        caught_signal = 1;
+    } while (len > 0);
+
+    if (caught_signal) {
+        if (env->stopped) {
+            vcpu_start(env);
+        } else {
+            vcpu_stop(env);
+        }
+    }
+}
+
 static void *ap_main_loop(void *_env)
 {
     CPUState *env = _env;
@@ -1433,10 +1487,12 @@ static void *ap_main_loop(void *_env)
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
     struct ioperm_data *data = NULL;
 #endif
+    int fds[2];
 
     current_env = env;
     env->thread_id = kvm_get_thread_id();
     sigfillset(&signals);
+    sigdelset(&signals, SIGUSR1);
     sigprocmask(SIG_BLOCK, &signals, NULL);
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
@@ -1451,6 +1507,18 @@ static void *ap_main_loop(void *_env)
     kvm_create_vcpu(env, env->cpu_index);
     setup_kernel_sigmask(env);
 
+    if (pipe(fds) == -1) {
+        /* do nothing */
+    }
+
+    fcntl(fds[0], F_SETFL, O_NONBLOCK);
+    fcntl(fds[1], F_SETFL, O_NONBLOCK);
+
+    env->sigusr1_fd = fds[0];
+    sigusr1_wfd = fds[1];
+
+    qemu_set_fd_handler2(fds[0], NULL, sigusr1_read, NULL, env);
+
     /* signal VCPU creation */
     current_env->created = 1;
     pthread_cond_signal(&qemu_vcpu_cond);
@@ -1463,6 +1531,8 @@ static void *ap_main_loop(void *_env)
     /* re-initialize cpu_single_env after re-acquiring qemu_mutex */
     cpu_single_env = env;
 
+    signal(SIGUSR1, on_sigusr1);
+
     kvm_main_loop_cpu(env);
     return NULL;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 0f3fb50..3addc77 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -783,6 +783,7 @@ struct KVMState {
     int irqchip_in_kernel;
     int pit_in_kernel;
     int xsave, xcrs;
+    int sigusr2_fd;
 
     struct kvm_context kvm_context;
 };
-- 
1.7.0.4



Re: [PATCH trace-cmd 0/3] kvm plugin updates

2010-11-23 Thread Steven Rostedt
On Tue, 2010-11-23 at 12:58 +0200, Avi Kivity wrote:
> Currently the kvm plugin only decodes vmx exit reasons; the first patch
> in this series adds support for the svm instruction set.
>
> Second patch fixes a typo.
>
> A couple of fields were added to the kvm_exit tracepoint; the third patch
> prints them out.
>
> Avi Kivity (3):
>   kvm: parse svm exit reason
>   kvm: fix typo UNKOWN
>   kvm: display the new kvm_exit info1 and info2 fields, if available
>
>  plugin_kvm.c |  121 ++
>  1 files changed, 113 insertions(+), 8 deletions(-)

Applied, Thanks Avi!

-- Steve




Re: [PATCHv6 00/16] boot order specification

2010-11-23 Thread Blue Swirl
On Tue, Nov 23, 2010 at 4:12 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
> On 11/23/2010 09:31 AM, Gleb Natapov wrote:
>>
>> Anthony, Blue
>>
>> No comments on this patch series for almost a week. Can it be applied?
>
> Does that mean everyone's happy or have folks not gotten around to review
> it?
>
> IOW, last call if you have objections :-)

I'm happy with the patch set in general, I've just been very busy IRL.
More experiments with Sparc32 device paths would not hurt, but bugs
(if any) can be fixed later.


Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Blue Swirl
On Tue, Nov 23, 2010 at 4:49 PM, Anthony Liguori aligu...@us.ibm.com wrote:
> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
> them to respond to these signals (which cannot be trapped), use SIGUSR1 to
> approximate the behavior of SIGSTOP/SIGCONT.
>
> The purpose of this is to implement CPU hard limits using an external tool that
> watches the CPU consumption and stops the VCPU as appropriate.
>
> This provides a more elegant solution in that it allows the VCPU thread to
> release qemu_mutex before going to sleep.
>
> This current implementation uses a single signal.  I think this is too racy
> in the long term so I think we should introduce a second signal.  If two signals
> get coalesced into one, it could confuse the monitoring tool into giving the
> VCPU the inverse of its entitlement.
>
> It might be better to simply move this logic entirely into QEMU to make this
> more robust--the question is whether we think this is a good long term feature
> to carry in QEMU?
>
> +static __thread int sigusr1_wfd;

While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
series, thread local storage is still not supported:

$ cat thread.c
static __thread int sigusr1_wfd;
$ gcc thread.c -c
thread.c:1: error: thread-local storage not supported for this target
$ gcc -v
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd4.8/4.2.1/specs
Target: sparc64-unknown-openbsd4.8
Configured with: OpenBSD/sparc64 system compiler
Thread model: posix
gcc version 4.2.1 20070719


Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Anthony Liguori

On 11/23/2010 01:35 PM, Blue Swirl wrote:

On Tue, Nov 23, 2010 at 4:49 PM, Anthony Liguori aligu...@us.ibm.com wrote:

qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
them to respond to these signals (which cannot be trapped), use SIGUSR1 to
approximate the behavior of SIGSTOP/SIGCONT.

The purpose of this is to implement CPU hard limits using an external tool that
watches the CPU consumption and stops the VCPU as appropriate.

This provides a more elegant solution in that it allows the VCPU thread to
release qemu_mutex before going to sleep.

This current implementation uses a single signal.  I think this is too racy
in the long term so I think we should introduce a second signal.  If two signals
get coalesced into one, it could confuse the monitoring tool into giving the
VCPU the inverse of its entitlement.

It might be better to simply move this logic entirely into QEMU to make this
more robust--the question is whether we think this is a good long term feature
to carry in QEMU?
 
   

+static __thread int sigusr1_wfd;
 

While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
series, thread local storage is still not supported:
   


Hrm, is there a portable way to do this (distinguish a signal on a 
particular thread)?


Regards,

Anthony Liguori


$ cat thread.c
static __thread int sigusr1_wfd;
$ gcc thread.c -c
thread.c:1: error: thread-local storage not supported for this target
$ gcc -v
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd4.8/4.2.1/specs
Target: sparc64-unknown-openbsd4.8
Configured with: OpenBSD/sparc64 system compiler
Thread model: posix
gcc version 4.2.1 20070719
   




Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Paolo Bonzini

On 11/23/2010 10:46 PM, Anthony Liguori wrote:

+static __thread int sigusr1_wfd;

While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
series, thread local storage is still not supported:


Hrm, is there a portable way to do this (distinguish a signal on a
particular thread)?


You can use pthread_getspecific/pthread_setspecific.
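
A minimal sketch of what that could look like as a replacement for the
__thread variable, assuming a POSIX threads environment. Only sigusr1_wfd
comes from the patch; the key, the helper names and the fd + 1 encoding
(so that an unset slot reads back as -1) are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static pthread_key_t sigusr1_wfd_key;
static pthread_once_t sigusr1_wfd_once = PTHREAD_ONCE_INIT;

static void sigusr1_wfd_key_init(void)
{
    int rc = pthread_key_create(&sigusr1_wfd_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Store the calling thread's write fd; returns 0 on success. */
static int set_sigusr1_wfd(int fd)
{
    pthread_once(&sigusr1_wfd_once, sigusr1_wfd_key_init);
    /* Stored as fd + 1 so the initial NULL value decodes to -1. */
    return pthread_setspecific(sigusr1_wfd_key, (void *)(intptr_t)(fd + 1));
}

/* Read the calling thread's write fd, or -1 if none was stored. */
static int get_sigusr1_wfd(void)
{
    pthread_once(&sigusr1_wfd_once, sigusr1_wfd_key_init);
    return (int)(intptr_t)pthread_getspecific(sigusr1_wfd_key) - 1;
}

/* Thread body for the demonstration: store a value and read it back. */
static void *wfd_worker(void *arg)
{
    set_sigusr1_wfd((int)(intptr_t)arg);
    return (void *)(intptr_t)get_sigusr1_wfd();
}

/* Returns the value a second thread sees after storing fd there. */
static int wfd_seen_by_other_thread(int fd)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, wfd_worker, (void *)(intptr_t)fd) != 0)
        return -2;
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

Each thread gets its own slot behind the same key, so a value stored by one
thread is not visible to another, which is the property __thread provided.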

Paolo


Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Anthony Liguori

On 11/23/2010 05:43 PM, Paolo Bonzini wrote:

On 11/23/2010 10:46 PM, Anthony Liguori wrote:

+static __thread int sigusr1_wfd;

While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
series, thread local storage is still not supported:


Hrm, is there a portable way to do this (distinguish a signal on a
particular thread)?


You can use pthread_getspecific/pthread_setspecific.


Is it signal safe?

BTW, this is all only theoretical.  This is in the KVM io thread code 
which is already highly unportable.


Regards,

Anthony Liguori


Paolo


Re: [PATCHv6 00/16] boot order specification

2010-11-23 Thread Kevin O'Connor
Hi Gleb,

On Tue, Nov 23, 2010 at 05:31:41PM +0200, Gleb Natapov wrote:
 Anthony, Blue
 
 No comments on this patch series for almost a week. Can it be applied?

My apologies - I haven't had time to review.

 On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
  I am using open firmware naming scheme to specify device path names.
  In this version: added SCSI bus support. Pass boot order list as file
  to firmware.
  
  Names look like this on pci machine:
[...]
  /p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
  /r...@genroms/linuxboot.bin

What's the plan for handling optionroms (ie, BCVs and BEVs)?  This is
an area which is a bit tricky - mainly due to legacy BIOS crud.

An option rom can register either a BEV (eg, gpxe on a network card),
or it can register one or more BCVs (eg, a scsi card registering two
drives).  How do we say boot from the optionrom on the second nic
card?  If you have a scsi card, how do we communicate that its second
drive should be the c: drive?

The ugly thing about BCVs is that they are not necessarily registered
in the rom for the device that controls it.  So, if you have two of
the same type of scsi card, each with two drives, it's possible for
the optionrom to put all four drives in the rom of the first scsi
card.

  Gleb Natapov (16):
[...]
Pass boot device list to firmware.

It looks like you went with a newline separated list.  Thanks.
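
For reference, a minimal sketch of how a consumer might split such a
newline-separated list, assuming the whole file has been read into a
writable, NUL-terminated buffer. The function name split_boot_list is
hypothetical and is not taken from the patch series or from the firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a newline-separated list in place: each '\n' is overwritten with
 * '\0' and pointers to the non-empty lines are stored in out[].
 * Returns the number of pointers stored (at most max).
 */
static size_t split_boot_list(char *buf, char **out, size_t max)
{
    size_t n = 0;
    char *line = buf;

    assert(buf != NULL && out != NULL);

    while (line != NULL && *line != '\0' && n < max) {
        char *nl = strchr(line, '\n');

        if (nl != NULL)
            *nl = '\0';
        if (*line != '\0')          /* skip empty lines */
            out[n++] = line;
        line = nl != NULL ? nl + 1 : NULL;
    }
    return n;
}
```

The order of out[] is the order of the lines in the file, so the first
entry would be the first device to try.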

-Kevin


Re: Mask bit support's API

2010-11-23 Thread Yang, Sheng
On Tuesday 23 November 2010 22:06:20 Avi Kivity wrote:
 On 11/23/2010 03:57 PM, Yang, Sheng wrote:
  Yeah, but won't be included in this patchset.

What API changes are needed?  I'd like to see the complete API.
  
  I am not sure about it. But I suppose the structure should be the same?
  In fact it's pretty hard for me to imagine what's needed for virtio in the
  future, especially since there is no such code now. I really prefer to deal
  with assigned devices and virtio separately, which would make the work
  much easier. But it seems you won't agree on that.
 
 First, I don't really see why the two cases are different (but I don't
 do a lot in this space).  Surely between you and Michael, you have all
 the information?
 
 Second, my worry is a huge number of ABI variants that come from
 incrementally adding features.  I want to implement bigger chunks of
 functionality.  So I'd like to see all potential users addressed, at
 least from the ABI point of view if not the implementation.
 
The API needs to be compatible with the pending bit, even if we don't
implement it now.  I want to reduce the rate of API changes.
  
  This can be implemented by this API, just adding a flag for it. And I
  would still take this into consideration in the next API proposal.
 
 Shouldn't kvm also service reads from the pending bitmask?

Of course KVM should service reads from the pending bitmask. For an assigned
device it is the kernel that sets the pending bit, but I am not sure about
virtio. This interface is GET_ENTRY, so reading is fine with it.
 
So instead of

- guest reads/writes msix
- kvm filters mmio, implements some, passes others to userspace

we have

- guest reads/writes msix
- kvm implements all
- some writes generate an additional notification to userspace
  
  I suppose we don't need to generate notification to userspace? Because
  every read/write is handled by kernel, and userspace just need interface
  to kernel to get/set the entry - and well, does userspace need to do it
  when kernel can handle all of them? Maybe not...
 
 We could have the kernel handle addr/data writes by setting up an
 internal interrupt routing.  A disadvantage is that more work is needed
 if we emulate interrupt remapping in qemu.

In fact modifying irq routing in the kernel is also the thing I want to avoid.

So, the flow would be:

- the kernel gets an MMIO write and records it in its own MSI table
- KVM exits to QEMU with a specific exit reason
- QEMU knows it has to sync the MSI table, so it reads the entries from the
  kernel
- QEMU finds that it was a write, so it reprograms the irq routing table
  using the entries above
- done

But wait, why should QEMU read the entries from the kernel? On a default exit
we already have the information about which entry to modify and what to write,
so we can use it directly. This way we also don't need a specific exit reason;
exiting to QEMU in the normal way is fine.

Then it would be:

- the kernel gets an MMIO write and records it in its own MSI table
- KVM exits to QEMU, indicating an MMIO exit
- QEMU finds that it was a write, so it updates its own MSI table (it may need
  to query the mask bit from the kernel) and reprograms the irq routing table
  using the entries above
- done

Then why should the kernel keep its own MSI table? I think the only reason is
that we can speed up reads that way, but the reads we want to speed up are
mostly on enabled entries (the first entry), which are already in the IRQ
routing table...

As for enabled/disabled entries, you can see it like this: entries inside the
routing table are considered enabled; all others are disabled. Then you don't
need to bother with pci_enable_msix().

So our strategy for accelerating reads can be:

If the entry is contained in the irq routing table, use it; otherwise let QEMU
deal with it. Because it is QEMU that owns the irq routing table,
synchronization is guaranteed. We don't need the MSI table in the kernel then.

And for writes, we just want to cover all mask bits, but nothing else.
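
A minimal sketch of the read and write rule above, not actual KVM code. The
names msix_classify, HANDLE_IN_KERNEL, EXIT_TO_QEMU and the routed_mask
bitmap (bit i set means entry i is in the irq routing table) are made up for
illustration. The 16-byte entry size and the vector control dword at offset
12, where bit 0 is the mask bit, follow the PCI MSI-X table layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE       16u  /* addr lo, addr hi, data, vector control */
#define MSIX_VECTOR_CTRL_OFF  12u  /* dword that holds the mask bit (bit 0) */

enum msix_handler { HANDLE_IN_KERNEL, EXIT_TO_QEMU };

/*
 * Decide who services an access at byte offset `offset` into the MSI-X
 * table.  This sketch supports at most 64 entries.
 */
static enum msix_handler msix_classify(uint64_t routed_mask,
                                       unsigned nr_entries,
                                       uint32_t offset, bool is_write)
{
    unsigned entry = offset / MSIX_ENTRY_SIZE;
    unsigned field = offset % MSIX_ENTRY_SIZE;

    assert(nr_entries <= 64);
    if (entry >= nr_entries)
        return EXIT_TO_QEMU;

    if (is_write) {
        /* Writes: only the vector control (mask bit) dword stays in kernel. */
        return field == MSIX_VECTOR_CTRL_OFF ? HANDLE_IN_KERNEL : EXIT_TO_QEMU;
    }

    /* Reads: fast path only for entries present in the irq routing table. */
    return ((routed_mask >> entry) & 1u) ? HANDLE_IN_KERNEL : EXIT_TO_QEMU;
}
```

For example, with only entry 0 routed, a read of entry 0 stays in the kernel,
a read of entry 1 goes to QEMU, and a mask bit write to either entry stays in
the kernel.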

I think the concept here is more acceptable?

The issue here is that the MSI table and the irq routing table hold duplicate
information for some entries. My initial proposal is to use the irq routing
table in the kernel, so we don't need to duplicate the information.


--
regards
Yang, Sheng


Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-11-23 Thread Paolo Bonzini

On 11/24/2010 02:15 AM, Anthony Liguori wrote:


Is it signal safe?


Yes, at heart it is just a somewhat more expensive access to
pthread_self()->some_array[key].
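
A minimal sketch of a handler that relies on this, assuming the
implementation behaves as described; note that POSIX itself does not list
pthread_getspecific() among the async-signal-safe functions. The names
wfd_key, sigusr1_handler and notify_self_via_sigusr1 are hypothetical, and
the fd is stored as fd + 1 so that an unset slot decodes to -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

static pthread_key_t wfd_key;   /* must be created before any signal arrives */

/* Look up the receiving thread's write fd and push one byte into it. */
static void sigusr1_handler(int signo)
{
    int saved_errno = errno;
    int wfd = (int)(intptr_t)pthread_getspecific(wfd_key) - 1;
    char byte = 1;

    (void)signo;
    if (wfd >= 0) {
        ssize_t n = write(wfd, &byte, 1);
        (void)n;
    }
    errno = saved_errno;
}

/*
 * Give the calling thread a pipe, signal it, and check that the handler
 * wrote to that pipe.  Returns 1 on success, 0 if the byte did not arrive,
 * -1 on setup failure.
 */
static int notify_self_via_sigusr1(void)
{
    struct sigaction sa;
    int fds[2];
    char byte = 0;
    ssize_t n;

    if (pthread_key_create(&wfd_key, NULL) != 0 || pipe(fds) != 0)
        return -1;
    pthread_setspecific(wfd_key, (void *)(intptr_t)(fds[1] + 1));

    sa.sa_handler = sigusr1_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    raise(SIGUSR1);   /* handler runs on this thread before raise() returns */

    n = read(fds[0], &byte, 1);
    close(fds[0]);
    close(fds[1]);
    return (n == 1 && byte == 1) ? 1 : 0;
}
```

An external tool would use pthread_kill() or tgkill on the VCPU thread
instead of raise(); the lookup in the handler is the same.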



BTW, this is all only theoretical.  This is in the KVM io thread code
which is already highly unportable.


True, and newer versions of GCC emulate __thread even on Windows.

Paolo