tun/tap and Vlans (was: Re: Network I/O performance)

2009-05-19 Thread Lukas Kolbe
Hi all,

On a sidenote:

> > I have also realized that when using the tun/tap configuration with
> > a bridge, packets are replicated on all tap devices when QEMU writes
> > packets to the tun interface. I guess this is a limitation of
> > tun/tap as it does not know to which tap device the packet has to go
> > to. The tap device then eventually drops packets when the
> > destination MAC is not its own, but it still receives the packet
> > which causes more overhead in the system overall.
>
> Right, I guess you'd see this with a real switch as well?  Maybe have
> your guest send a packet out once in a while so the bridge can learn its
> MAC address (we do this after migration, for example).

Does this mean that it is not possible to have each tun device in a
separate bridge that serves a separate VLAN? We have experienced a
strange problem that we have not yet been able to explain. Given this setup:

Guest              Host
kvm1 --- eth0 -+- bridge0 --- vlan1 \
               |                     +-- eth0
kvm2 -+- eth0 -/                    /
      \- eth1 --- bridge1 --- vlan2 +

Packets sent through kvm2/eth0 appear on both bridges and on both
VLANs, and the same happens for packets sent through kvm2/eth1. When the
guest has only one interface, packets appear on only one bridge and one
VLAN, as they should.

Can this be worked around?
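
For illustration, the MAC learning mentioned in the quoted reply can be
sketched as a toy forwarding table. This is not the Linux bridge code;
struct toy_bridge, bridge_learn() and bridge_lookup() are made-up names,
and the table size is arbitrary. Until a source address has been seen on
a port, a lookup for it yields "flood", which is the replication
described above.

```c
#include <stdint.h>
#include <string.h>

#define TOY_FDB_SIZE 16
#define TOY_FLOOD    (-1)	/* destination unknown: send to every other port */

struct toy_fdb_entry {
	uint8_t mac[6];
	int port;
	int used;
};

struct toy_bridge {
	struct toy_fdb_entry fdb[TOY_FDB_SIZE];
};

/* Remember that source address 'mac' was last seen on 'port'. */
static void bridge_learn(struct toy_bridge *br, const uint8_t mac[6], int port)
{
	int i, free_slot = -1;

	for (i = 0; i < TOY_FDB_SIZE; i++) {
		if (br->fdb[i].used && !memcmp(br->fdb[i].mac, mac, 6)) {
			br->fdb[i].port = port;	/* refresh, or station moved */
			return;
		}
		if (!br->fdb[i].used && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0) {
		memcpy(br->fdb[free_slot].mac, mac, 6);
		br->fdb[free_slot].port = port;
		br->fdb[free_slot].used = 1;
	}
}

/* Return the port for destination 'mac', or TOY_FLOOD if not learned yet. */
static int bridge_lookup(const struct toy_bridge *br, const uint8_t mac[6])
{
	int i;

	for (i = 0; i < TOY_FDB_SIZE; i++)
		if (br->fdb[i].used && !memcmp(br->fdb[i].mac, mac, 6))
			return br->fdb[i].port;
	return TOY_FLOOD;
}
```

In this model a guest that has never transmitted is reached only by
flooding; one outgoing frame is enough for later lookups to return a
single port.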

-- 
Lukas


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6] kvm: Use a bitmap for tracking used GSIs

2009-05-19 Thread Avi Kivity

Alex Williamson wrote:

> Perhaps we should update the bitmap on entry points that everyone uses
> so we don't have to worry about preallocating.  We could set the bitmap
> in kvm_add_routing_entry() and clear it in kvm_del_routing_entry().
> This would mean that kvm_del_routing_entry() implicitly gives up a GSI
> obtained via kvm_get_irq_route_gsi(), which seems to be the assumption
> already.

Much better.
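
A minimal sketch of that idea, assuming a fixed-size table: the bitmap
is updated only inside the add/delete entry points, so callers never
touch it directly. toy_add_routing_entry() and toy_del_routing_entry()
are stand-ins for kvm_add_routing_entry() and kvm_del_routing_entry(),
not the real libkvm functions, and TOY_MAX_GSI is an assumed size.

```c
#include <limits.h>

#define TOY_MAX_GSI	256	/* assumed table size, not taken from the patch */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

static unsigned long used_gsi[(TOY_MAX_GSI + BITS_PER_WORD - 1) / BITS_PER_WORD];

static void gsi_mark_used(unsigned int gsi)
{
	used_gsi[gsi / BITS_PER_WORD] |= 1UL << (gsi % BITS_PER_WORD);
}

static void gsi_mark_free(unsigned int gsi)
{
	used_gsi[gsi / BITS_PER_WORD] &= ~(1UL << (gsi % BITS_PER_WORD));
}

/* Return the lowest GSI whose bit is clear, or -1 if all are in use. */
static int gsi_find_free(void)
{
	unsigned int gsi;

	for (gsi = 0; gsi < TOY_MAX_GSI; gsi++)
		if (!(used_gsi[gsi / BITS_PER_WORD] &
		      (1UL << (gsi % BITS_PER_WORD))))
			return (int)gsi;
	return -1;
}

/* Stand-in for kvm_add_routing_entry(): adding an entry claims the GSI. */
static int toy_add_routing_entry(unsigned int gsi)
{
	if (gsi >= TOY_MAX_GSI)
		return -1;
	gsi_mark_used(gsi);
	return 0;
}

/* Stand-in for kvm_del_routing_entry(): deleting implicitly frees the GSI. */
static int toy_del_routing_entry(unsigned int gsi)
{
	if (gsi >= TOY_MAX_GSI)
		return -1;
	gsi_mark_free(gsi);
	return 0;
}
```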


> That would eliminate any need for proliferating KVM_CAP_IRQ_ROUTING
> ifdefs or doing anything based on KVM_IOAPIC_NUM_PINS, but should I keep
> the KVM_CAP_IRQ_ROUTING around the new code for documentation purposes


Only around code which directly uses the routing facilities (i.e. only 
in the libkvm wrappers).  Code in qemu should only do runtime detection.


I really should write Documentation/kvm/extensions.txt.  And ioctls.txt, 
and intro.txt...


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Christian Bornträger
On Monday 18 May 2009 16:26:15, Avi Kivity wrote:
> Christian Borntraeger wrote:
> > Sorry for the late question, but I missed your first version. Is there a
> > way to change that code to use virtio instead of PCI? That would allow us
> > to use this driver on s390 and maybe other virtio transports.
>
> Opinion differs.  See the discussion in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.
>
> To summarize, Anthony thinks it should use virtio, while I believe
> virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory, 
but we can change that. If you look at the dcssblk driver for s390, it allows 
a guest to map shared memory segments via a diagnose (hypercall). This driver 
uses PCI regions to map memory.

My point is, that the method to map memory is completely irrelevant, we just 
need something like mmap/shmget between the guest and the host. We could 
define an interface in virtio, that can be used by any transport. In case of 
pci this could be a simple pci map operation. 

What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer should be mapped (hint only)
+ *	length: The length of the mapping
+ *	identifier: the token that identifies the host buffer
+ *	Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer is mapped
+ *	Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+	void * (*map_region)(struct virtio_device *vdev,
+			     void *addr,
+			     size_t length,
+			     int identifier);
+	int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
+
+
+/**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @dev: underlying device.
@@ -85,6 +110,7 @@ struct virtio_device
 	struct device dev;
 	struct virtio_device_id id;
 	struct virtio_config_ops *config;
+	struct virtio_device_ops *ops;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
 	void *priv;
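
To make the proposal concrete, here is a userspace sketch of how a
driver might call such an interface. Only the two callbacks come from
the patch above; struct virtio_device is reduced to one field, and
toy_map_region(), the toy_transport_* functions and toy_backing are
hypothetical. For simplicity the sketch returns NULL on failure rather
than the error pointer the proposal describes.

```c
#include <stddef.h>

struct virtio_device;

struct virtio_device_ops {
	void *(*map_region)(struct virtio_device *vdev, void *addr,
			    size_t length, int identifier);
	int (*unmap_region)(struct virtio_device *vdev, void *addr);
};

struct virtio_device {		/* reduced to the one field used here */
	struct virtio_device_ops *ops;
};

/* Hypothetical driver-side helper: works the same on any transport. */
static void *toy_map_region(struct virtio_device *vdev, size_t length,
			    int identifier)
{
	if (!vdev->ops || !vdev->ops->map_region)
		return NULL;	/* this transport cannot import host memory */
	return vdev->ops->map_region(vdev, NULL, length, identifier);
}

/* Fake transport backed by a static buffer instead of a PCI region. */
static char toy_backing[4096];

static void *toy_transport_map(struct virtio_device *vdev, void *addr,
			       size_t length, int identifier)
{
	(void)vdev;
	(void)addr;		/* the address is only a hint */
	(void)identifier;
	return length <= sizeof(toy_backing) ? toy_backing : NULL;
}

static int toy_transport_unmap(struct virtio_device *vdev, void *addr)
{
	(void)vdev;
	return addr == (void *)toy_backing ? 0 : -1;
}

static struct virtio_device_ops toy_ops = {
	.map_region	= toy_transport_map,
	.unmap_region	= toy_transport_unmap,
};
```

The driver only ever sees map_region/unmap_region; whether the backing
is a PCI BAR or an s390 diagnose call stays inside the transport.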





Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Avi Kivity

Christian Bornträger wrote:

> > To summarize, Anthony thinks it should use virtio, while I believe
> > virtio is useful for exporting guest memory, not for importing host memory.
>
> I think the current virtio interface is not ideal for importing host memory,
> but we can change that. If you look at the dcssblk driver for s390, it allows
> a guest to map shared memory segments via a diagnose (hypercall). This driver
> uses PCI regions to map memory.
>
> My point is, that the method to map memory is completely irrelevant, we just
> need something like mmap/shmget between the guest and the host. We could
> define an interface in virtio, that can be used by any transport. In case of
> pci this could be a simple pci map operation.
>
> What do you think about something like: (CCed Rusty)


Exactly.


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v9] kvm: add support for irqfd

2009-05-19 Thread Avi Kivity

Gregory Haskins wrote:

> More slop.  I shouldn't send patches out first thing Monday morning, I
> guess.
>
> Here is my current delta queued for v10.  I will wait for some feedback
> on v9 before cutting it:
  


With this, v10 looks good to go.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] If interrupt injection is not possible do not scan IRR.

2009-05-19 Thread Avi Kivity

Gleb Natapov wrote:

> Forgot to remove debug output before submitting. Resending.
>
> Signed-off-by: Gleb Natapov g...@redhat.com
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index 1ccb50c..d32ceac 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -218,6 +218,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
>  	struct kvm_pic *s = pic_irqchip(kvm);
>  
>  	pic_lock(s);
> +	if (!s->output) {
> +		pic_unlock(s);
> +		return -1;
> +	}
> +	s->output = 0;
>  	irq = pic_get_irq(&s->pics[0]);
>  	if (irq >= 0) {
>  		pic_intack(&s->pics[0], irq);
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 96dfbb6..e93405a 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -78,7 +78,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>  	if (vector == -1) {
>  		if (kvm_apic_accept_pic_intr(v)) {
>  			s = pic_irqchip(v->kvm);
> -			s->output = 0;		/* PIC */
>  			vector = kvm_pic_read_irq(v->kvm);
>  		}
>  	}
  


Please split into a different patch.  Even though it is a lot simpler, 
it contains non-local changes and is therefore relatively dangerous.



> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 44e87a5..854e8c9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3174,10 +3174,10 @@ static void inject_pending_irq(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  			vcpu->arch.nmi_injected = true;
>  			kvm_x86_ops->set_nmi(vcpu);
>  		}
> -	} else if (kvm_cpu_has_interrupt(vcpu)) {
> -		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> -			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> -					    false);
> +	} else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> +		int vec = kvm_cpu_get_interrupt(vcpu);
> +		if (vec != -1) {
> +			kvm_queue_interrupt(vcpu, vec, false);
>  			kvm_x86_ops->set_irq(vcpu);
>  		}
>  	}
  


Again, I don't think this is a win.  Usually ->interrupt_allowed() ==
true so we'll execute the rest anyway.


Perhaps we could move the call to has_interrupt into get_interrupt.
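
A toy model of that suggestion, to show the shape only: the
"anything pending?" test lives inside the getter, which returns -1 when
there is nothing to deliver, so the caller makes a single call. The
toy_* names and the 32-vector bitmask are assumptions for illustration,
not KVM code.

```c
#include <stdint.h>

struct toy_vcpu {
	uint32_t pending;	/* bit n set: vector n is pending (toy, 32 vectors) */
};

static int toy_has_interrupt(const struct toy_vcpu *v)
{
	return v->pending != 0;
}

/*
 * Return the highest pending vector and acknowledge it,
 * or -1 if nothing is pending.
 */
static int toy_get_interrupt(struct toy_vcpu *v)
{
	int vec;

	if (!toy_has_interrupt(v))	/* the check now lives in here */
		return -1;

	for (vec = 31; vec >= 0; vec--)
		if (v->pending & (UINT32_C(1) << vec))
			break;

	v->pending &= ~(UINT32_C(1) << vec);
	return vec;
}

/* Caller shaped like the inject_pending_irq() hunk: one call, one test. */
static int toy_inject(struct toy_vcpu *v, int interrupt_allowed)
{
	if (!interrupt_allowed)
		return -1;
	return toy_get_interrupt(v);
}
```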

--
error compiling committee.c: too many arguments to function



Re: [PATCH] Drop interrupt shadow when single stepping should be done only on VMX.

2009-05-19 Thread Avi Kivity

Gleb Natapov wrote:

> The problem exists only on VMX. Also currently we skip this step if
> there is a pending exception. The patch fixes this too.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [APIC] Optimize searching for highest IRR

2009-05-19 Thread Avi Kivity

Gleb Natapov wrote:

> Most of the time IRR is empty, so instead of scanning the whole IRR on
> each VM entry keep a variable that tells us if IRR is not empty. IRR
> will have to be scanned twice on each IRQ delivery, but this is much more
> rare than VM entry.
>
>  static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  {
>  	int result;
>  
> -	result = find_highest_vector(apic->regs + APIC_IRR);
> +	if (!apic->irr_pending)
> +		return -1;

smp_mb__before_clear_bit(), to prevent the cpu speculating the IRR.

> +
> +	result = apic_search_irr(apic);
>  	ASSERT(result == -1 || result >= 16);
>  
>  	return result;
>  }
>  
> +static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
> +{
> +	apic->irr_pending = false;
> +	apic_clear_vector(vec, apic->regs + APIC_IRR);

smp_rmb()

> +	if (apic_search_irr(apic) != -1)
> +		apic->irr_pending = true;
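
The caching scheme in the quoted patch can be modelled in a
single-threaded sketch like the one below. It deliberately ignores the
memory-ordering questions raised in the review comments, and
struct toy_lapic and the toy_* functions are illustrative names, not the
kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_IRR_WORDS 8		/* 256 vectors / 32 bits per word */

struct toy_lapic {
	uint32_t irr[TOY_IRR_WORDS];
	bool irr_pending;	/* false means the IRR is known to be empty */
};

/* Full scan: highest vector set in the IRR, or -1 if none. */
static int toy_search_irr(const struct toy_lapic *apic)
{
	int word, bit;

	for (word = TOY_IRR_WORDS - 1; word >= 0; word--) {
		if (!apic->irr[word])
			continue;
		for (bit = 31; bit >= 0; bit--)
			if (apic->irr[word] & (UINT32_C(1) << bit))
				return word * 32 + bit;
	}
	return -1;
}

static void toy_set_irr(int vec, struct toy_lapic *apic)
{
	apic->irr_pending = true;
	apic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Fast path on every VM entry: skip the scan when nothing is pending. */
static int toy_find_highest_irr(const struct toy_lapic *apic)
{
	if (!apic->irr_pending)
		return -1;
	return toy_search_irr(apic);
}

/* Slow path on delivery: clear the bit, then rescan to refresh the flag. */
static void toy_clear_irr(int vec, struct toy_lapic *apic)
{
	apic->irr_pending = false;
	apic->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
	if (toy_search_irr(apic) != -1)
		apic->irr_pending = true;
}
```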
  




--
error compiling committee.c: too many arguments to function



Re: virtio net regression

2009-05-19 Thread Antoine Martin
Avi Kivity wrote:
> Antoine Martin wrote:
> > Hi,
> >
> > Here is another one, any ideas?
> > These oopses do look quite deep. Is it normal to end up in tcp_send_ack
> > from pdflush??
>
> I think it can happen anywhere, part of the net softirq.
Hah, gotcha.

> > Cheers
> > Antoine
> >
> > [929492.154634] pdflush: page allocation failure. order:0, mode:0x20
>
> You're out of memory.
That's quite odd, the guest wasn't even hitting the swap at the time.
> How much memory did you allocate to the guest?  did you balloon it?
512MB, no ballooning.


 [929492.154637] Pid: 291, comm: pdflush Not tainted 2.6.29.2 #5
 [929492.154639] Call Trace:
 [929492.154641]  <IRQ>  [8027e8bc] __alloc_pages_internal+0x3e1/0x401
 [929492.154649]  [8055b5ea] try_fill_recv+0xa1/0x182
 [929492.154652]  [8055c1fc] virtnet_poll+0x533/0x5ab
 [929492.154655]  [80632bba] net_rx_action+0x70/0x143
 [929492.154658]  [8023f18c] __do_softirq+0x83/0x123
 [929492.154661]  [8020d35c] call_softirq+0x1c/0x28
 [929492.154664]  [8020e2c0] do_softirq+0x3c/0x85
 [929492.154666]  [8023eea3] irq_exit+0x3f/0x7a
 [929492.154668]  [8020e59c] do_IRQ+0x12b/0x14f
 [929492.154670]  [8020cad3] ret_from_intr+0x0/0x29
 [929492.154672]  <EOI>  [802c22b1] __set_page_dirty_buffers+0x0/0x8f
 [929492.154677]  [8031702b] bget_one+0x0/0xb
 [929492.154680]  [80316fa2] walk_page_buffers+0x2/0x8b
 [929492.154682]  [803185bc] ext3_ordered_writepage+0xae/0x134
 [929492.154685]  [8027ea46] __writepage+0xa/0x25
 [929492.154687]  [8027f19f] write_cache_pages+0x206/0x322
 [929492.154689]  [8027ea3c] __writepage+0x0/0x25
 [929492.154691]  [8027f2fe] do_writepages+0x27/0x2d
 [929492.154694]  [802bd3f6] __writeback_single_inode+0x1a7/0x3b5
 [929492.154696]  [8020a68c] __switch_to+0xb4/0x38c
 [929492.154698]  [802bda76] generic_sync_sb_inodes+0x2a7/0x458
 [929492.154701]  [802bde00] writeback_inodes+0x8d/0xe6
 [929492.154704]  [807296e2] _spin_lock+0x5/0x7
 [929492.155056]  [8027f432] wb_kupdate+0x9f/0x116
 [929492.155058]  [80280095] pdflush+0x14b/0x202
 [929492.155061]  [8027f393] wb_kupdate+0x0/0x116
 [929492.155063]  [8027ff4a] pdflush+0x0/0x202
 [929492.155065]  [8027ff4a] pdflush+0x0/0x202
 [929492.155068]  [8024c127] kthread+0x47/0x73
 [929492.155070]  [8020d25a] child_rip+0xa/0x20
 [929492.155072]  [8024c0e0] kthread+0x0/0x73
 [929492.183142]  [8020d250] child_rip+0x0/0x20
 [929492.183145] Mem-Info:
 [929492.183147] DMA per-cpu:
 [929492.183149] CPU0: hi:0, btch:   1 usd:   0
 [929492.183151] DMA32 per-cpu:
 [929492.183154] CPU0: hi:  186, btch:  31 usd: 184
 [929492.183158] Active_anon:2755 active_file:39849 inactive_anon:2972
 [929492.183159]  inactive_file:70353 unevictable:0 dirty:4172
 writeback:1580 unstable:0
 [929492.183161]  free:734 slab:5619 mapped:15047 pagetables:927 bounce:0
 [929492.183166] DMA free:1968kB min:28kB low:32kB high:40kB
 active_anon:0kB inactive_anon:40kB active_file:2116kB
 inactive_file:1880kB unevictable:0kB present:5448kB pages_scanned:0
 all_unreclaimable? no
 [929492.183169] lowmem_reserve[]: 0 489 489 489
 [929492.183176] DMA32 free:968kB min:2812kB low:3512kB high:4216kB
 active_anon:11020kB inactive_anon:11848kB active_file:157280kB
 inactive_file:279532kB unevictable:0kB present:500896kB pages_scanned:0
 all_unreclaimable? no
 [929492.183180] lowmem_reserve[]: 0 0 0 0
 [929492.183183] DMA: 6*4kB 2*8kB 3*16kB 1*32kB 1*64kB 2*128kB 0*256kB
 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
 [929492.183235] DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB
 0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB
 [929492.183244] 110992 total pagecache pages
 [929492.183246] 739 pages in swap cache
 [929492.183248] Swap cache stats: add 8996, delete 8257, find
 92604/93191
 [929492.183250] Free swap  = 1040016kB
 [929492.183252] Total swap = 1048568kB
 [929492.186003] 131056 pages RAM
 [929492.186006] 4799 pages reserved
 [929492.186007] 44697 pages shared
 [929492.186008] 90516 pages non-shared
 [930274.380075] eth0: no IPv6 routers present

   
 Strange, seems to be a bit of free memory here.
There should be lots, all this host is doing is apache+sftp...

Assuming I can make it re-occur (stress testing it?), how would I dig
further to find the cause of this memory exhaustion? /proc/meminfo and
friends?
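
As a small cross-check of the numbers in the log above, the per-order
free lists can be summed by hand or with a sketch like this one. It
assumes 4kB pages, as the log lines themselves show, and
buddy_free_kb() is a made-up helper name.

```c
/*
 * counts[i] is the number of free blocks of order i, i.e. blocks of
 * (4 << i) kB each, as printed in the "N*4kB N*8kB ..." log lines.
 */
static unsigned long buddy_free_kb(const unsigned int counts[], int orders)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < orders; i++)
		total += (unsigned long)counts[i] * (4UL << i);
	return total;
}
```

Feeding it the DMA line (6, 2, 3, 1, 1, 2, 0, 1, 1, 0, 0) gives 1976kB
and the DMA32 line (0, 1, 0, 0, 1, 3, 2, 0, 0, 0, 0) gives 968kB, which
matches the totals the kernel printed.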

Cheers
Antoine



Re: virtio net regression

2009-05-19 Thread Avi Kivity

Antoine Martin wrote:

> > You're out of memory.
>
> That's quite odd, the guest wasn't even hitting the swap at the time.

But you do have swap enabled?

> > Strange, seems to be a bit of free memory here.
>
> There should be lots, all this host is doing is apache+sftp...
>
> Assuming I can make it re-occur (stress testing it?), how would I dig
> further to find the cause of this memory exhaustion? /proc/meminfo and
> friends?


Yes please.  Maybe virtio is leaking memory.

--
error compiling committee.c: too many arguments to function



[PATCH v2 0/2] Intel-IOMMU: source-id checking for interrupt remapping

2009-05-19 Thread Weidong Han
Support source-id checking for interrupt remapping, which isolates
interrupts for guests/VMs with assigned devices.

v1 -> v2 change log:
Access PCI config space directly (read_pci_config_byte) to parse the
IOAPIC scope, instead of using PCI discovery, because the PCI subsystem
is not initialized at that time.

Weidong Han (2):
  Intel-IOMMU, intr-remap: set the whole 128bits of irte when
modify/free it
  Intel-IOMMU, intr-remap: source-id checking

 arch/x86/kernel/apic/io_apic.c |6 ++
 drivers/pci/intr_remapping.c   |  100 +--
 drivers/pci/intr_remapping.h   |2 +
 include/linux/dmar.h   |   11 
 4 files changed, 113 insertions(+), 6 deletions(-)



[PATCH v2 1/2] Intel-IOMMU, intr-remap: set the whole 128bits of irte when modify/free it

2009-05-19 Thread Weidong Han
An interrupt remapping table entry (IRTE) is 128 bits wide. Currently,
modify_irte and free_irte only set the low 64 bits, so the high 64 bits,
which hold the source-id setting, are ignored. This patch sets the whole
128 bits of the IRTE when modifying or freeing it. The following
source-id checking patch depends on this.

Signed-off-by: Weidong Han weidong@intel.com
---
 drivers/pci/intr_remapping.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f5e0ea7..946e170 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ -309,7 +309,8 @@ int modify_irte(int irq, struct irte *irte_modified)
 	index = irq_iommu->irte_index + irq_iommu->sub_handle;
 	irte = &iommu->ir_table->base[index];
 
-	set_64bit((unsigned long *)irte, irte_modified->low);
+	set_64bit((unsigned long *)&irte->low, irte_modified->low);
+	set_64bit((unsigned long *)&irte->high, irte_modified->high);
 	__iommu_flush_cache(iommu, irte, sizeof(*irte));
 
 	rc = qi_flush_iec(iommu, index, 0);
@@ -386,8 +387,11 @@ int free_irte(int irq)
 	irte = &iommu->ir_table->base[index];
 
 	if (!irq_iommu->sub_handle) {
-		for (i = 0; i < (1 << irq_iommu->irte_mask); i++)
-			set_64bit((unsigned long *)(irte + i), 0);
+		for (i = 0; i < (1 << irq_iommu->irte_mask); i++) {
+			set_64bit((unsigned long *)&irte->low, 0);
+			set_64bit((unsigned long *)&irte->high, 0);
+			irte++;
+		}
 		rc = qi_flush_iec(iommu, index, irq_iommu->irte_mask);
 	}
 
-- 
1.6.0.4
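
The effect of the change can be shown with a plain userspace model. It
leaves out what the kernel code also has to do (set_64bit() stores,
cache flush, qi_flush_iec()), and struct toy_irte and the toy_*
functions are illustrative names only.

```c
#include <stdint.h>

struct toy_irte {
	uint64_t low;
	uint64_t high;	/* the source-id fields live in this half */
};

/* Copy both halves, so a source-id set in 'high' is not lost. */
static void toy_modify_irte(struct toy_irte *dst, const struct toy_irte *src)
{
	dst->low = src->low;
	dst->high = src->high;
}

/* Clear (1 << mask) consecutive entries, both halves of each. */
static void toy_free_irtes(struct toy_irte *first, unsigned int mask)
{
	unsigned int i;

	for (i = 0; i < (1u << mask); i++) {
		first[i].low = 0;
		first[i].high = 0;
	}
}
```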



RE: [PATCH 2/2] Intel-IOMMU, intr-remap: source-id checking

2009-05-19 Thread Han, Weidong
Ingo Molnar wrote:
 * Han, Weidong weidong@intel.com wrote:
 
 Ingo Molnar wrote:
 * Han, Weidong weidong@intel.com wrote:
 
 Siddha, Suresh B wrote:
 On Wed, 2009-05-06 at 23:16 -0700, Han, Weidong wrote:
 @@ -634,6 +694,44 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
 				" 0x%Lx\n", scope->enumeration_id, drhd->address);
 
 +			bus = pci_find_bus(drhd->segment, scope->bus);
 +			path = (struct acpi_dmar_pci_path *)(scope + 1);
 +			count = (scope->length -
 +				 sizeof(struct acpi_dmar_device_scope))
 +				/ sizeof(struct acpi_dmar_pci_path);
 +
 +			while (count) {
 +				if (pdev)
 +					pci_dev_put(pdev);
 +
 +				if (!bus)
 +					break;
 +
 +				pdev = pci_get_slot(bus,
 +					PCI_DEVFN(path->dev, path->fn));
 +				if (!pdev)
 +					break;
 
 ir_parse_ioapic_scope() happens very early in the boot. So, I
 don't think we can do the pci related discovery here.
 
 
 Thanks for your pointing it out. It should enable the source-id
 checking for io-apic's after the pci subsystem is up. I will
 change it.
 
 Note, there's ways to do early PCI quirks too, check
 arch/x86/kernel/early-quirks.c. It's done by reading the PCI
 configuration space directly via a careful early-capable subset of
 the PCI config space APIs. 
 
 But it's a method of last resort.
 
 
 Thanks for your reminder. It can use direct PCI access here as
 follows. It's easy and clean. I think it's better than adding the
 source-id checking for io-apic's after the pci subsystem is up. I
 will send out updated patches after some tests.   
 
 @@ -634,6 +695,24 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
 				" 0x%Lx\n", scope->enumeration_id, drhd->address);
 
 +			bus = scope->bus;
 +			path = (struct acpi_dmar_pci_path *)(scope + 1);
 +			count = (scope->length -
 +				 sizeof(struct acpi_dmar_device_scope))
 +				/ sizeof(struct acpi_dmar_pci_path);
 +
 +			while (--count > 0) {
 +				/* Access PCI directly due to the PCI
 +				 * subsystem isn't initialized yet.
 +				 */
 +				bus = read_pci_config_byte(bus, path->dev,
 +					path->fn, PCI_SECONDARY_BUS);
 +				path++;
 +			}
 +
 +			ir_ioapic[ir_ioapic_num].bus = bus;
 +			ir_ioapic[ir_ioapic_num].devfn =
 +				PCI_DEVFN(path->dev, path->fn);
 
 looks good IMO, beyond the obligatory comment-style nitpick [*] :-)
 Also, the function above seems to be way too large - please split it
 into a couple of natural helper functions.
 
 Thanks,
 
   Ingo
 
 [*]
 
 Please use the customary comment style:
 
   /*
* Comment .
* .. goes here:
*/
 
 specified in Documentation/CodingStyle.

I have sent out the updated patches. Thanks!
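
For readers following the thread, the bus walk in the quoted hunk can be
tried out in userspace against a fake config space. Everything prefixed
toy_ is made up for this sketch, including the single bridge at 00:1e.0;
only the loop shape, the devfn encoding and the secondary-bus register
offset (0x19) follow the quoted code and the PCI headers.

```c
#include <stdint.h>

#define TOY_PCI_SECONDARY_BUS 0x19	/* same offset as PCI_SECONDARY_BUS */

struct toy_pci_path {
	uint8_t dev;
	uint8_t fn;
};

struct toy_ioapic_id {
	uint8_t bus;
	uint8_t devfn;
};

/* Toy config space: one bridge at 00:1e.0 whose secondary bus is 5. */
static uint8_t toy_read_config_byte(uint8_t bus, uint8_t dev, uint8_t fn,
				    uint8_t offset)
{
	if (bus == 0 && dev == 0x1e && fn == 0 &&
	    offset == TOY_PCI_SECONDARY_BUS)
		return 5;
	return 0;
}

/*
 * Follow 'count' path entries (count >= 1) starting at 'start_bus'.
 * Every entry but the last is a bridge; reading its secondary bus
 * number tells us which bus the next entry sits on.
 */
static struct toy_ioapic_id toy_resolve_scope(uint8_t start_bus,
					      const struct toy_pci_path *path,
					      int count)
{
	struct toy_ioapic_id id;
	uint8_t bus = start_bus;

	while (--count > 0) {
		bus = toy_read_config_byte(bus, path->dev, path->fn,
					   TOY_PCI_SECONDARY_BUS);
		path++;
	}

	id.bus = bus;
	/* Same encoding as PCI_DEVFN(): 5 bits of slot, 3 bits of function. */
	id.devfn = (uint8_t)(((path->dev & 0x1f) << 3) | (path->fn & 0x07));
	return id;
}
```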

Regards,
Weidong


Re: virtio net regression

2009-05-19 Thread Antoine Martin
Avi Kivity wrote:
> Antoine Martin wrote:
> > > You're out of memory.
> >
> > That's quite odd, the guest wasn't even hitting the swap at the time.
>
> But you do have swap enabled?
Yes.

I always do this on the guests as it seems fairer to let the guests use
swap when they need the extra memory rather than over-committing too
much memory on the host. Although it would probably be more efficient
overall to let the host manage all swapping.
It consumes more I/O bandwidth, but most of each guest's memory stays
warm no matter what other guests are doing.
Does that sound reasonable?
> > > Strange, seems to be a bit of free memory here.
> >
> > There should be lots, all this host is doing is apache+sftp...
> >
> > Assuming I can make it re-occur (stress testing it?), how would I dig
> > further to find the cause of this memory exhaustion? /proc/meminfo and
> > friends?
>
> Yes please.  Maybe virtio is leaking memory.
Will report if I find anything.


[PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Alexander Graf
SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

v2 makes invlpga do the same as invlpg, not flush the whole mmu

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4b4eadd..fa2a710 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,19 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+
+	/* Let's treat INVLPGA the same as INVLPG */
+	kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
   struct kvm_run *kvm_run)
 {
@@ -2130,7 +2143,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
[SVM_EXIT_INVD] = emulate_on_interception,
[SVM_EXIT_HLT]  = halt_interception,
[SVM_EXIT_INVLPG]   = invlpg_interception,
-   [SVM_EXIT_INVLPGA]  = invalid_op_interception,
+   [SVM_EXIT_INVLPGA]  = invlpga_interception,
[SVM_EXIT_IOIO] = io_interception,
[SVM_EXIT_MSR]  = msr_interception,
[SVM_EXIT_TASK_SWITCH]  = task_switch_interception,
-- 
1.6.0.2



[PATCH 0/4] Add rudimentary Hyper-V guest support v2

2009-05-19 Thread Alexander Graf
Now that we have nested SVM in place, let's make use of it and virtualize
something non-kvm.
The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which already
runs the dom0 in virtualized mode. It hangs somewhere in IDE code once
booted, so I haven't been able to run a second VM inside it yet.

Please keep in mind that Hyper-V won't work unless you also apply the
userspace patches and the PAT bit patch.

v2 changes:
  - remove reserved PAT check patch (Avi will do this)
  - remove #PF inject on emulated_read
  - take comments from v1 into account (listed individually)

Alexander Graf (4):
  Add definition for IGNNE MSR
  Implement Hyper-V MSRs v2
  Nested SVM: Implement INVLPGA v2
  Nested SVM: Improve interrupt injection v2

 arch/x86/include/asm/msr-index.h |1 +
 arch/x86/kvm/svm.c   |   59 +++--
 2 files changed, 44 insertions(+), 16 deletions(-)



[PATCH 4/4] Nested SVM: Improve interrupt injection v2

2009-05-19 Thread Alexander Graf
While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a
real machine.

v2 calls BUG_ON when svm_set_irq is called with GIF=0

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |   39 ---
 1 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index fa2a710..5b14c9d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
 	/* Kill any pending exceptions */
 	if (svm->vcpu.arch.exception.pending == true)
 		nsvm_printk("WARNING: Pending Exception\n");
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
@@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
 	svm->nested_vmcb = svm->vmcb->save.rax;
 
 	/* Clear internal status */
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Save the old vmcb, so we don't need to pick what we save, but
 	   can restore everything when a VMEXIT occurs */
@@ -2277,21 +2279,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
 }
 
-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.event_inj = nr |
-		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
 static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	nested_svm_intr(svm);
+	BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK));
 
-	svm_queue_irq(vcpu, irq);
+	svm->vmcb->control.event_inj = irq |
+		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2319,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 	struct vmcb *vmcb = svm->vmcb;
 	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
 		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		(svm->vcpu.arch.hflags & HF_GIF_MASK);
+		(svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+		!is_nested(svm);
 }
 
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
-	svm_set_vintr(to_svm(vcpu));
-	svm_inject_irq(to_svm(vcpu), 0x0);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	nsvm_printk("Trying to open IRQ window\n");
+
+	nested_svm_intr(svm);
+
+	/* In case GIF=0 we can't rely on the CPU to tell us when
+	 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+	 * The next time we get that intercept, this function will be
+	 * called again though and we'll get the vintr intercept. */
+	if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+		svm_set_vintr(svm);
+		svm_inject_irq(svm, 0x0);
+	}
 }
 
 static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2393,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	case SVM_EXITINTINFO_TYPE_EXEPT:
 		/* In case of software exception do not reinject an exception
 		   vector, but re-execute and instruction instead */
+		if (is_nested(svm))
+			break;
 		if (vector == BP_VECTOR || vector == OF_VECTOR)
 			break;
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
-- 
1.6.0.2
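
The interrupt-allowed condition after this patch boils down to four
tests. The sketch below restates it over a flat toy structure; the
struct, the toy_* names and the bit positions chosen for the
interrupt-shadow and GIF flags are assumptions for illustration (only
RFLAGS.IF being bit 9 is architectural).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_EFLAGS_IF	(UINT64_C(1) << 9)	/* RFLAGS.IF is bit 9 */
#define TOY_INT_SHADOW	(1u << 0)		/* assumed bit position */
#define TOY_HF_GIF	(1u << 0)		/* assumed bit position */

struct toy_svm {
	uint64_t rflags;
	uint32_t int_state;
	uint32_t hflags;
	bool nested;		/* true while the nested guest is running */
};

/*
 * An interrupt may be injected directly only if the guest has
 * interrupts enabled, is not in an interrupt shadow, GIF is set,
 * and we are not currently running a nested guest.
 */
static bool toy_interrupt_allowed(const struct toy_svm *s)
{
	return (s->rflags & TOY_EFLAGS_IF) &&
	       !(s->int_state & TOY_INT_SHADOW) &&
	       (s->hflags & TOY_HF_GIF) &&
	       !s->nested;
}
```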



[PATCH 1/4] Add definition for IGNNE MSR

2009-05-19 Thread Alexander Graf
Hyper-V tried to access MSR_IGNNE, so let's at least have a definition
for it in our headers.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/include/asm/msr-index.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ec41fc1..e273549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -372,6 +372,7 @@
 /* AMD-V MSRs */
 
 #define MSR_VM_CR                       0xc0010114
+#define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
1.6.0.2



RE: [PATCH v2] Shared memory device with interrupt support

2009-05-19 Thread Jayaraman, Bhaskar
Cam, is it somehow possible to generate a local APIC interrupt from one VM to 
another? I guess it shouldn't be as the LAPIC interrupts generated in one VM 
will go to the VCPUs of the same VM...
Regards,
Bhaskar.

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Kumar, Venkat
Sent: Tuesday, May 19, 2009 9:22 AM
To: Cam Macdonell
Cc: kvm@vger.kernel.org list
Subject: RE: [PATCH v2] Shared memory device with interrupt support

I had tried all syntaxes other than this :).
Interrupts work now.

Thx,

Venkat

-----Original Message-----
From: Cam Macdonell [mailto:c...@cs.ualberta.ca]
Sent: Monday, May 18, 2009 9:51 PM
To: Kumar, Venkat
Cc: kvm@vger.kernel.org list
Subject: Re: [PATCH v2] Shared memory device with interrupt support

Kumar, Venkat wrote:
 Cam - I got your patch to work but without notifications. I could share 
 memory using the patch but notifications aren't working.

 I bring up two VM's with option -ivshmem shrmem,1024,/dev/shm/shrmem,server 
 and -ivshmem shrmem,1024,/dev/shm/shrmem respectively.

Ok, I guess I need to do more error checking of arguments :)  You need
to specify unix: on the path.  So your options should look like this

-ivshmem shrmem,1024,unix:/dev/shm/shrmem,server

-ivshmem shrmem,1024,unix:/dev/shm/shrmem

That should help.

Cam


 When I make an ioctl from one of the VM's to inject an interrupt to the 
 other VM, I get an error in qemu_chr_write and return value is -1. 
 write call in send_all is failing with return value -1.

 Am I missing something here?

 Thx,

 Venkat


 -Original Message-
 From: Cam Macdonell [mailto:c...@cs.ualberta.ca]
 Sent: Saturday, May 16, 2009 9:01 AM
 To: Kumar, Venkat
 Cc: kvm@vger.kernel.org list
 Subject: Re: [PATCH v2] Shared memory device with interrupt support


 On 15-May-09, at 8:54 PM, Kumar, Venkat wrote:

 Cam,

 A question on interrupts as well.
 What is unix:path that needs to be passed in the argument list?
 Can it be any string?

 It has to be a valid path on the host.  It will create a unix domain
 socket on that path.

 If my understanding is correct, both VMs that want to
 communicate would give this path on the command line, with one of
 them specified as server.

 Exactly, the one with the server in the parameter list will wait for
 a connection before booting.

 Cam

 Thx,
 Venkat






Support an inter-vm shared memory device that maps a shared-
 memory object
 as a PCI device in the guest.  This patch also supports interrupts
 between
 guest by communicating over a unix domain socket.  This patch
 applies to the
 qemu-kvm repository.

 This device now creates a qemu character device and sends 1-byte
 messages to
 trigger interrupts.  Writes are triggered by writing to the Doorbell
 register
 on the shared memory PCI device.  The lower 8-bits of the value
 written to this
 register are sent as the 1-byte message so different meanings of
 interrupts can
 be supported.

 Interrupts are only supported between 2 VMs currently.  One VM must
 act as the
 server by adding server to the command-line argument.  Shared
 memory devices
 are created with the following command-line:

 -ivshmem <shm object>,<size in MB>,[unix:<path>][,server]

 Interrupts can also be used between host and guest as well by
 implementing a
 listener on the host.

 Cam
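
The description above says the device sends the low 8 bits of the Doorbell value as a 1-byte message, and that a host program could listen for these. A minimal sketch of that idea follows. The helper names doorbell_message() and read_doorbell() are hypothetical and not part of the patch, and the file descriptor is assumed to be an already-connected stream (for example the unix domain socket given on the command line).

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Low 8 bits of the value written to the Doorbell register. Per the
 * patch description this is the 1-byte message that gets sent.
 * (Hypothetical helper, not taken from hw/ivshmem.c.) */
static uint8_t doorbell_message(uint32_t doorbell_value)
{
	return (uint8_t)(doorbell_value & 0xff);
}

/* Hypothetical host-side listener step: block until one doorbell byte
 * arrives on an already-connected descriptor.
 * Returns 1 if a byte was read, 0 on end-of-stream, -1 on error. */
static int read_doorbell(int fd, uint8_t *msg)
{
	ssize_t n;

	assert(msg != NULL);
	n = read(fd, msg, 1);
	if (n < 0)
		return -1;
	return n == 1 ? 1 : 0;
}
```

A listener would call read_doorbell() in a loop and dispatch on the byte value, since the patch leaves the meaning of each value to the guests.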

 ---
 Makefile.target |3 +
 hw/ivshmem.c|  421 ++
 +
 hw/pc.c |6 +
 hw/pc.h |3 +
 qemu-options.hx |   14 ++
 sysemu.h|8 +
 vl.c|   14 ++
 7 files changed, 469 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

 diff --git a/Makefile.target b/Makefile.target
 index b68a689..3190bba 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -643,6 +643,9 @@ OBJS += pcnet.o
 OBJS += rtl8139.o
 OBJS += e1000.o

 +# Inter-VM PCI shared memory
 +OBJS += ivshmem.o
 +
 # Generic watchdog support and some watchdog devices
 OBJS += watchdog.o
 OBJS += wdt_ib700.o wdt_i6300esb.o
 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 new file mode 100644
 index 000..95e2268
 --- /dev/null
 +++ b/hw/ivshmem.c
 @@ -0,0 +1,421 @@
 +/*
 + * Inter-VM Shared Memory PCI device.
 + *
 + * Author:
 + *  Cam Macdonell c...@cs.ualberta.ca
 + *
 + * Based On: cirrus_vga.c and rtl8139.c
 + *
 + * This code is licensed under the GNU GPL v2.
 + */
 +
 +#include "hw.h"
 +#include "console.h"
 +#include "pc.h"
 +#include "pci.h"
 +#include "sysemu.h"
 +
 +#include "qemu-common.h"
 +#include <sys/mman.h>
 +
 +#define PCI_COMMAND_IOACCESS                0x0001
 +#define PCI_COMMAND_MEMACCESS               0x0002
 +#define PCI_COMMAND_BUSMASTER               0x0004
 +
 +//#define DEBUG_IVSHMEM
 +
 +#ifdef DEBUG_IVSHMEM
 +#define IVSHMEM_DPRINTF(fmt, args...)        \
 +    do {printf("IVSHMEM: " fmt, ##args); } while (0)
 +#else
 +#define IVSHMEM_DPRINTF(fmt, args...)
 +#endif
 +
 +typedef struct IVShmemState {
 +uint16_t intrmask;
 +uint16_t 

Re: [PATCH v2] Shared memory device with interrupt support

2009-05-19 Thread Gregory Haskins
Jayaraman, Bhaskar wrote:
 Cam, is it somehow possible to generate a local APIC interrupt from one VM to 
 another? I guess it shouldn't be as the LAPIC interrupts generated in one VM 
 will go to the VCPUs of the same VM...
 Regards,
 Bhaskar.
   

The closest thing to this is the irqfd+iosignalfd thing I mentioned the
other day.  With this model, a PIO/MMIO write in the src guest will
directly inject an interrupt into the dst guest's LAPIC.  However, as
Avi points out, this is just an optimization.  You can also do it by
first taking a hop through each guest's userspace as well.

HTH
-Greg
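
The irqfd mechanism mentioned above is built around an eventfd as the signalling primitive. The sketch below shows only that primitive between a signalling side and a waiting side; it does not perform any KVM ioctl, and the helper names signal_event() and consume_events() are made up for illustration.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signalling side: add 1 to the eventfd counter. Returns 0 on success. */
static int signal_event(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Waiting side: block until the counter is non-zero, then return its
 * value; the read resets the counter to zero. Returns 0 on error. */
static uint64_t consume_events(int efd)
{
	uint64_t count = 0;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

With a descriptor from eventfd(0, 0), several signals that arrive before the reader runs are coalesced into one wakeup carrying the count.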






Re: [PATCH v2 2/2] Intel-IOMMU, intr-remap: source-id checking

2009-05-19 Thread Ingo Molnar

* Weidong Han weidong@intel.com wrote:

 To support domain-isolation usages, the platform hardware must be 
 capable of uniquely identifying the requestor (source-id) for each 
 interrupt message. Without source-id checking for interrupt 
 remapping, a rogue guest/VM with assigned devices can launch 
 interrupt attacks to bring down another guest/VM or the VMM itself.
 
 This patch adds source-id checking for interrupt remapping, and 
 then really isolates interrupts for guests/VMs with assigned 
 devices.
 
 Because the PCI subsystem is not yet initialized when the IOAPIC 
 entries are set up, use read_pci_config_byte to access PCI config 
 space directly.
 
 Signed-off-by: Weidong Han weidong@intel.com
 ---
  arch/x86/kernel/apic/io_apic.c |6 +++
  drivers/pci/intr_remapping.c   |   90 ++-
  drivers/pci/intr_remapping.h   |2 +
  include/linux/dmar.h   |   11 +
  4 files changed, 106 insertions(+), 3 deletions(-)

Code structure looks nice now. (and i suspect you have tested this on 
real and relevant hardware?) I've Cc:-ed Eric too ... does this 
direction look good to you too Eric?

Have a few minor nits only:

 diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
 index 30da617..3d10c68 100644
 --- a/arch/x86/kernel/apic/io_apic.c
 +++ b/arch/x86/kernel/apic/io_apic.c
 @@ -1559,6 +1559,9 @@ int setup_ioapic_entry(int apic_id, int irq,
  		irte.vector = vector;
  		irte.dest_id = IRTE_DEST(destination);
  
 +		/* Set source-id of interrupt request */
 +		set_ioapic_sid(&irte, apic_id);
 +
  		modify_irte(irq, &irte);
  
  		ir_entry->index2 = (index >> 15) & 0x1;
 @@ -3329,6 +3332,9 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
  		irte.vector = cfg->vector;
  		irte.dest_id = IRTE_DEST(dest);
  
 +		/* Set source-id of interrupt request */
 +		set_msi_sid(&irte, pdev);
 +
  		modify_irte(irq, &irte);
  
  		msg->address_hi = MSI_ADDR_BASE_HI;
 diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
 index 946e170..9ef7b0d 100644
 --- a/drivers/pci/intr_remapping.c
 +++ b/drivers/pci/intr_remapping.c
 @@ -10,6 +10,8 @@
  #include <linux/intel-iommu.h>
  #include "intr_remapping.h"
  #include <acpi/acpi.h>
 +#include <asm/pci-direct.h>
 +#include "pci.h"
  
  static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
  static int ir_ioapic_num;
 @@ -405,6 +407,61 @@ int free_irte(int irq)
  	return rc;
  }
  
 +int set_ioapic_sid(struct irte *irte, int apic)
 +{
 +	int i;
 +	u16 sid = 0;
 +
 +	if (!irte)
 +		return -1;
 +
 +	for (i = 0; i < MAX_IO_APICS; i++)
 +		if (ir_ioapic[i].id == apic) {
 +			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
 +			break;
 +		}

Please generally put extra curly braces around each multi-line loop 
body. One reason beyond readability is robustness: the above 
structure can be easily extended in a buggy way via [mockup patch 
hunk]:

  			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
  			break;
  		}
 +		if (!sid)
 +			break;

And note that if this slips in by accident how unobvious this bug is 
during patch review - the loop head context is not present in the 
3-line default context and the code looks correct at a glance.

With extra braces, such typos or mismerges:

   }
   }
 + if (!sid)
 + break;

stick out during review like a sore thumb :-)
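
For reference, the braced form being asked for could look like the following userspace sketch. struct ioapic_entry and lookup_sid() are stand-ins invented here (the driver uses ir_ioapic[] and set_ioapic_sid()); only the loop shape is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's per-IOAPIC scope entry; the field names
 * mirror the patch but the type itself is hypothetical. */
struct ioapic_entry {
	int id;
	uint8_t bus;
	uint8_t devfn;
};

/* Return the 16-bit source-id (bus << 8 | devfn) for the given apic id,
 * or 0 when no entry matches. */
static uint16_t lookup_sid(const struct ioapic_entry *tbl, size_t n, int apic)
{
	uint16_t sid = 0;
	size_t i;

	/* Braces around the multi-line loop body make a stray statement
	 * added after the inner block visibly part of the loop. */
	for (i = 0; i < n; i++) {
		if (tbl[i].id == apic) {
			sid = (uint16_t)((tbl[i].bus << 8) | tbl[i].devfn);
			break;
		}
	}

	return sid;
}
```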

 +	if (sid == 0) {
 +		printk(KERN_WARNING "Failed to set source-id of "
 +			"I/O APIC (%d), because it is not under "
 +			"any DRHD\n", apic);
 +		return -1;
 +	}

please try to keep kernel messages on a single line - even if 
checkpatch complains. Also, it's a good idea to use pr_warning(), 
it's shorter by 8 characters.

 +
 +	irte->svt = 1; /* requestor ID verification SID/SQ */
 +	irte->sq = 0;  /* comparing all 16-bit of SID */
 +	irte->sid = sid;

this is a borderline suggestion:

Note how you already lined up the _comments_ vertically, so you did 
notice that it makes sense to align vertically. The same visual 
arguments can be made for the initialization itself too:

 +
 +	irte->svt   = 1;    /* requestor ID verification SID/SQ */
 +	irte->sq    = 0;    /* comparing all 16-bit of SID */
 +	irte->sid   = sid;
 +
 +	return 0;

But ... it might make even more sense to introduce a set_irte() 
helper method, so it can all be written in a single line as:

set_irte(irte, 1, 0, sid);

and explain common values in the set_irte() function's description - 
that way those comments above (and below) dont have to be made at 
the usage sites.

 +}
 +
 +int set_msi_sid(struct irte 

Re: [PATCH v2 1/2] Intel-IOMMU, intr-remap: set the whole 128bits of irte when modify/free it

2009-05-19 Thread Ingo Molnar

* Weidong Han weidong@intel.com wrote:

 Interrupt remapping table entry is 128bits. Currently, it only sets low
 64bits of irte in modify_irte and free_irte. This ignores high 64bits
 setting of irte, that means source-id setting will be ignored. This patch
 sets the whole 128bits of irte when modify/free it. Following source-id
 checking patch depends on this.
 
 Signed-off-by: Weidong Han weidong@intel.com
 ---
  drivers/pci/intr_remapping.c |   10 +++---
  1 files changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
 index f5e0ea7..946e170 100644
 --- a/drivers/pci/intr_remapping.c
 +++ b/drivers/pci/intr_remapping.c
 @@ -309,7 +309,8 @@ int modify_irte(int irq, struct irte *irte_modified)
  	index = irq_iommu->irte_index + irq_iommu->sub_handle;
  	irte = &iommu->ir_table->base[index];
  
 -	set_64bit((unsigned long *)irte, irte_modified->low);
 +	set_64bit((unsigned long *)&irte->low, irte_modified->low);
 +	set_64bit((unsigned long *)&irte->high, irte_modified->high);

  	__iommu_flush_cache(iommu, irte, sizeof(*irte));
  
  	rc = qi_flush_iec(iommu, index, 0);
 @@ -386,8 +387,11 @@ int free_irte(int irq)
  	irte = &iommu->ir_table->base[index];
  
  	if (!irq_iommu->sub_handle) {
 -		for (i = 0; i < (1 << irq_iommu->irte_mask); i++)
 -			set_64bit((unsigned long *)(irte + i), 0);
 +		for (i = 0; i < (1 << irq_iommu->irte_mask); i++) {
 +			set_64bit((unsigned long *)&irte->low, 0);
 +			set_64bit((unsigned long *)&irte->high, 0);
 +			irte++;
 +		}

The loop is a bit unclean. It has a side-effect on 'irte' - and 
other patterns in the driver usually treat 'irte' as a generally 
available variable.

So the above code, while correct, opens up the possibility of later 
code added to this function relying on 'irte', thinking that it's 
set to &iommu->ir_table->base[index], and then breaking because 
'irte' has been iterated to the end of it in certain circumstances.

It's better to factor out the whole loop into a helper function, 
which does something like:

int flush_entries(struct irq_2_iommu *irq_iommu)
{
	struct irte *start, *entry, *end;
	struct intel_iommu *iommu;
	int index;

	if (irq_iommu->sub_handle)
		return 0;

	iommu = irq_iommu->iommu;
	index = irq_iommu->irte_index + irq_iommu->sub_handle;

	start = iommu->ir_table->base + index;
	end = start + (1 << irq_iommu->irte_mask);

	for (entry = start; entry < end; entry++) {
		set_64bit((unsigned long *)&entry->low,  0);
		set_64bit((unsigned long *)&entry->high, 0);
	}

	return qi_flush_iec(iommu, index, irq_iommu->irte_mask);
}

Note how much clearer this is - the new method has one purpose and 
'entry' is a clear iterator.

( And note how much clearer the flow of 'rc' has become as well as a 
  side-effect: it is clear now that it's set to 0 when 
  irq_iommu->sub_handle is still present. )

Thanks,

Ingo


Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Avi Kivity

Alexander Graf wrote:

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

v2 makes invlpga do the same as invlpg, not flush the whole mmu

 
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+
+	/* Let's treat INVLPGA the same as INVLPG */
+	kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+	return 1;
+}
  


I think that for ASID!=0 you can actually do nothing.  The guest entry 
is a cr3 switch, so we'll both get a tlb flush and a resync on any 
modified ptes.


For ASID==0 you can do the invlpg thing.

Marcelo?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/4] Implement Hyper-V MSRs v2

2009-05-19 Thread Avi Kivity

Alexander Graf wrote:

Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and have it its way, because otherwise it fails
terribly.

v2 changes:
  - remove the 0x4081 MSR definition
  - add pr_unimpl() on unimplemented writes

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..4b4eadd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2034,6 +2034,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 	case MSR_VM_HSAVE_PA:
 		svm->hsave_msr = data;
 		break;
+	case MSR_VM_CR:
+	case MSR_VM_IGNNE:
+	case MSR_K8_HWCR:
+		pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
+		break;
  


We can be nicer, if the write doesn't set bits which we don't implement, 
we can let it proceed silently.  See for example MSR_IA32_DEBUGCTLMSR.  
Most likely the values written are already correctly implemented (by 
doing nothing).
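
The check being suggested can be sketched as a mask test. This is only an illustration: check_msr_write() and the implemented_bits parameter are hypothetical, and which bits count as implemented for each MSR is not stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: decide how to treat a guest MSR write.
 * Returns 0 when every set bit is one we already implement (possibly by
 * doing nothing), so the write can proceed silently; returns 1 when the
 * write sets a bit we do not implement and should be reported. */
static int check_msr_write(uint64_t data, uint64_t implemented_bits)
{
	return (data & ~implemented_bits) ? 1 : 0;
}
```

A caller would log the unimplemented write only when this returns 1, so a guest that writes zero or only known bits produces no noise.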


--
error compiling committee.c: too many arguments to function



Re: virtio net regression

2009-05-19 Thread Avi Kivity

Antoine Martin wrote:


But you do have swap enabled?


Yes.

I always do this on the guests as it seems fairer to let the guests use
swap when they need the extra memory rather than over-committing too
much memory on the host. Although it would probably be more efficient
overall to let the host manage all swapping.
It consumes more I/O bandwidth, but most of a guest's memory stays warm
no matter what other guests are doing.
Does that sound reasonable?
  


Yes, it also provides better isolation.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Alexander Graf


On 19.05.2009, at 14:58, Avi Kivity wrote:


Alexander Graf wrote:

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

v2 makes invlpga do the same as invlpg, not flush the whole mmu

+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+
+	/* Let's treat INVLPGA the same as INVLPG */
+	kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+	return 1;
+}



I think that for ASID!=0 you can actually do nothing.  The guest  
entry is a cr3 switch, so we'll both get a tlb flush and a resync on  
any modified ptes.


Right, the only situation I can imagine this isn't fulfilled is when  
INVLPGA isn't trapped in the 1st level guest, but issued in the 2nd  
level one. That should be rather rare though ;-).


Alex



Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Avi Kivity

Alexander Graf wrote:
I think that for ASID!=0 you can actually do nothing.  The guest 
entry is a cr3 switch, so we'll both get a tlb flush and a resync on 
any modified ptes.


Right, the only situation I can imagine this isn't fulfilled is when 
INVLPGA isn't trapped in the 1st level guest, but issued in the 2nd 
level one. That should be rather rare though ;-).


Good catch.

Would be better to get it right; changing the test to asid != 
current_asid should suffice.
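
Spelled out, the proposed condition would be something like the sketch below. The function name and the idea of passing the current ASID as a plain parameter are assumptions made for illustration; how the handler would obtain either value is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate for the INVLPGA intercept handler.
 * If the instruction targets an ASID other than the one currently
 * running, the later guest entry (a cr3 switch) is expected to flush
 * and resync anyway, so nothing needs to be done now. Only a request
 * for the current ASID needs an immediate invlpg-style invalidation. */
static bool invlpga_needs_invalidate(uint32_t asid, uint32_t current_asid)
{
	return asid == current_asid;
}
```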


--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/4] Nested SVM: Improve interrupt injection v2

2009-05-19 Thread Gleb Natapov
On Tue, May 19, 2009 at 12:54:03PM +0200, Alexander Graf wrote:
 While trying to get Hyper-V running, I realized that the interrupt injection
 mechanisms that are in place right now are not 100% correct.
 
 This patch makes nested SVM's interrupt injection behave more like on a
 real machine.
 
 v2 calls BUG_ON when svm_set_irq is called with GIF=0
 
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  arch/x86/kvm/svm.c |   39 ---
  1 files changed, 24 insertions(+), 15 deletions(-)
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index fa2a710..5b14c9d 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
  	/* Kill any pending exceptions */
  	if (svm->vcpu.arch.exception.pending == true)
  		nsvm_printk("WARNING: Pending Exception\n");
 -	svm->vcpu.arch.exception.pending = false;
 +	kvm_clear_exception_queue(&svm->vcpu);
 +	kvm_clear_interrupt_queue(&svm->vcpu);
  
What about pending NMI here?
 
  	/* Restore selected save entries */
  	svm->vmcb->save.es = hsave->save.es;
 @@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
  	svm->nested_vmcb = svm->vmcb->save.rax;
  
  	/* Clear internal status */
 -	svm->vcpu.arch.exception.pending = false;
 +	kvm_clear_exception_queue(&svm->vcpu);
 +	kvm_clear_interrupt_queue(&svm->vcpu);
  
And here.

  	/* Save the old vmcb, so we don't need to pick what we save, but
  	   can restore everything when a VMEXIT occurs */
 @@ -2277,21 +2279,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
  		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
  }
  
 -static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
 -{
 -	struct vcpu_svm *svm = to_svm(vcpu);
 -
 -	svm->vmcb->control.event_inj = nr |
 -		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 -}
 -
  static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
  {
  	struct vcpu_svm *svm = to_svm(vcpu);
  
 -	nested_svm_intr(svm);
 +	BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK));
  
 -	svm_queue_irq(vcpu, irq);
 +	svm->vmcb->control.event_inj = irq |
 +		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
  }
  
  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 @@ -2319,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
  	struct vmcb *vmcb = svm->vmcb;
  	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
  		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
 -		(svm->vcpu.arch.hflags & HF_GIF_MASK);
 +		(svm->vcpu.arch.hflags & HF_GIF_MASK) &&
 +		!is_nested(svm);
  }
  
  static void enable_irq_window(struct kvm_vcpu *vcpu)
  {
 -	svm_set_vintr(to_svm(vcpu));
 -	svm_inject_irq(to_svm(vcpu), 0x0);
 +	struct vcpu_svm *svm = to_svm(vcpu);
 +	nsvm_printk("Trying to open IRQ window\n");
 +
 +	nested_svm_intr(svm);
 +
 +	/* In case GIF=0 we can't rely on the CPU to tell us when
 +	 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
 +	 * The next time we get that intercept, this function will be
 +	 * called again though and we'll get the vintr intercept. */
 +	if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
 +		svm_set_vintr(svm);
 +		svm_inject_irq(svm, 0x0);
 +	}
  }
  
  static void enable_nmi_window(struct kvm_vcpu *vcpu)
 @@ -2393,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
  	case SVM_EXITINTINFO_TYPE_EXEPT:
  		/* In case of software exception do not reinject an exception
  		   vector, but re-execute and instruction instead */
 +		if (is_nested(svm))
 +			break;
  		if (vector == BP_VECTOR || vector == OF_VECTOR)
  			break;
  		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 -- 
 1.6.0.2
 

--
Gleb.


Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Marcelo Tosatti
On Tue, May 19, 2009 at 03:58:52PM +0300, Avi Kivity wrote:
 Alexander Graf wrote:
 SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
 so let's implement it!

 For now we just do the same thing invlpg does, as asid switching
 means we flush the mmu anyways. That might change one day though.

 v2 makes invlpga do the same as invlpg, not flush the whole mmu

 +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 +{
 +	struct kvm_vcpu *vcpu = &svm->vcpu;
 +	nsvm_printk("INVLPGA\n");
 +
 +	/* Let's treat INVLPGA the same as INVLPG */
 +	kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
 +
 +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
 +	skip_emulated_instruction(&svm->vcpu);
 +	return 1;
 +}
   

 I think that for ASID!=0 you can actually do nothing.  The guest entry  
 is a cr3 switch, so we'll both get a tlb flush and a resync on any  
 modified ptes.

 For ASID==0 you can do the invlpg thing.

 Marcelo?

kvm_mmu_invlpg is cheap, better just invalidate the entry. If hyper-v
uses invlpga to invalidate TLB entries which it has updated pte's in
memory for, and you skip the invalidation now and somehow later use an
unsync spte, you're toast.




Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Avi Kivity

Marcelo Tosatti wrote:
I think that for ASID!=0 you can actually do nothing.  The guest entry  
is a cr3 switch, so we'll both get a tlb flush and a resync on any  
modified ptes.


For ASID==0 you can do the invlpg thing.

Marcelo?



kvm_mmu_invlpg is cheap, better just invalidate the entry. If hyper-v
uses invlpga to invalidate TLB entries which it has updated pte's in
memory for, and you skip the invalidation now and somehow later use an
unsync spte, you're toast.
  


But won't the guest entry cause a resync?

Doing nothing is even cheaper.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Marcelo Tosatti
On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 I think that for ASID!=0 you can actually do nothing.  The guest 
 entry  is a cr3 switch, so we'll both get a tlb flush and a resync on 
 any  modified ptes.

 For ASID==0 you can do the invlpg thing.

 Marcelo?
 

 kvm_mmu_invlpg is cheap, better just invalidate the entry. If hyper-v
 uses invlpga to invalidate TLB entries which it has updated pte's in
 memory for, and you skip the invalidation now and somehow later use an
 unsync spte, you're toast.
   

 But won't the guest entry cause a resync?

If its a cr3/cr4 exit, yes. 

 Doing nothing is even cheaper.

My brain is nested.




Re: virtio-net zero-copy

2009-05-19 Thread Raju Srivastava
On Mon, May 18, 2009 at 10:00 PM, Avi Kivity a...@redhat.com wrote:
 Raju Srivastava wrote:

 Greetings,

 Could someone let me know if current virtio-net supports zero-copy? I
 see some discussion here:
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/28061/
 (copyless virtio net thoughts) and it looks like the copyless
 virtio-net is not supported by KVM yet.

 That is correct.

Thank you for letting me know this.

 If this is true, then is there
 any plan to add the zero copy to the virtio-net?


 Yes, but it will be a difficult journey.

That's great. I'm looking forward to it. Xen NetChannel 2 is said to
have some new features, including zero-copy. Though it would be a
difficult journey, it's really worth it, right?

Thanks & Regards,
Raju


qemu-kvm.git regression in configure

2009-05-19 Thread Beth Kon
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.



Beth Kon


Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Alexander Graf


On 19.05.2009, at 15:58, Marcelo Tosatti wrote:


On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:

Marcelo Tosatti wrote:

I think that for ASID!=0 you can actually do nothing.  The guest
entry  is a cr3 switch, so we'll both get a tlb flush and a  
resync on

any  modified ptes.

For ASID==0 you can do the invlpg thing.

Marcelo?



kvm_mmu_invlpg is cheap, better just invalidate the entry. If  
hyper-v

uses invlpga to invalidate TLB entries which it has updated pte's in
memory for, and you skip the invalidation now and somehow later  
use an

unsync spte, you're toast.



But won't the guest entry cause a resync?


If its a cr3/cr4 exit, yes.


Well it has to be. Either we're switching from one NPT to the other  
(todo) or do a normal cr3+cr4 switch.


So I guess we can optimize here. Is it worth it?

Alex



Re: qemu-kvm.git regression in configure

2009-05-19 Thread Avi Kivity

Beth Kon wrote:
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.




Works for me.  What error do you get?

--
error compiling committee.c: too many arguments to function



Re: qemu-kvm.git regression in configure

2009-05-19 Thread Beth Kon

Avi Kivity wrote:

Beth Kon wrote:
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.




Works for me.  What error do you get?



./configure: 1364: Syntax error: "(" unexpected (expecting "fi")



Re: qemu-kvm.git regression in configure

2009-05-19 Thread Avi Kivity

Beth Kon wrote:

Avi Kivity wrote:

Beth Kon wrote:
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.




Works for me.  What error do you get?



./configure: 1364: Syntax error: "(" unexpected (expecting "fi")



Ah, a non-bash shell, no arrays.  I'll sort it out.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Marcelo Tosatti
On Tue, May 19, 2009 at 05:18:07PM +0200, Alexander Graf wrote:

 On 19.05.2009, at 15:58, Marcelo Tosatti wrote:

 On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 I think that for ASID!=0 you can actually do nothing.  The guest
 entry  is a cr3 switch, so we'll both get a tlb flush and a  
 resync on
 any  modified ptes.

 For ASID==0 you can do the invlpg thing.

 Marcelo?


 kvm_mmu_invlpg is cheap, better just invalidate the entry. If  
 hyper-v
 uses invlpga to invalidate TLB entries which it has updated pte's in
 memory for, and you skip the invalidation now and somehow later  
 use an
 unsync spte, you're toast.


 But won't the guest entry cause a resync?

 If its a cr3/cr4 exit, yes.

 Well it has to be. Either we're switching from one NPT to the other  
 (todo) or do a normal cr3+cr4 switch.

 So I guess we can optimize here. Is it worth it?

IMHO better leave it the way it is, perhaps add a comment that 
the optimization is possible, and do it later if worthwhile.



Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2

2009-05-19 Thread Avi Kivity

Alexander Graf wrote:

kvm_mmu_invlpg is cheap, better just invalidate the entry. If hyper-v
uses invlpga to invalidate TLB entries which it has updated pte's in
memory for, and you skip the invalidation now and somehow later use an
unsync spte, you're toast.



But won't the guest entry cause a resync?


If its a cr3/cr4 exit, yes.


Well it has to be. Either we're switching from one NPT to the other 
(todo) or do a normal cr3+cr4 switch.


So I guess we can optimize here. Is it worth it?



I think so.  We also need to make sure the entry causes a resync, even 
if cr3 doesn't change.


Oh, exit needs to force a resync as well, in case the guest foolishly 
let its guest touch its page tables and issue invlpga asid=0.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Cam Macdonell

Avi Kivity wrote:

Christian Bornträger wrote:

To summarize, Anthony thinks it should use virtio, while I believe
virtio is useful for exporting guest memory, not for importing host 
memory.



I think the current virtio interface is not ideal for importing host 
memory, but we can change that. If you look at the dcssblk driver for 
s390, it allows a guest to map shared memory segments via a diagnose 
(hypercall). This driver uses PCI regions to map memory.


My point is, that the method to map memory is completely irrelevant, 
we just need something like mmap/shmget between the guest and the 
host. We could define an interface in virtio, that can be used by any 
transport. In case of pci this could be a simple pci map operation.

What do you think about something like: (CCed Rusty)
  


Exactly.



Agreed.




Re: XP smp using a lot of CPU

2009-05-19 Thread Erik Rull

Hi Avi,

here is the cpuinfo - what do you mean by workload? The CPU usage is 
around 33%.


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5600  @ 1.83GHz
stepping: 2
cpu MHz : 1833.554
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 
cx16 xtpr lahf_lm

bogomips: 3667.98
clflush size: 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5600  @ 1.83GHz
stepping: 2
cpu MHz : 1833.554
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 
cx16 xtpr lahf_lm

bogomips: 3666.43
clflush size: 64

Best regards,

Erik


Avi Kivity wrote:

Erik Rull wrote:

Hi all,

very very interesting.

I have a similar problem but the other way round.
If my XP runs up to 100% CPU usage, top on the Linux host reports 
only 33% CPU usage. I would expect around 50% because I only provide 
one core for the guest. I already increased the process priority of 
qemu and the io priority, nothing helped. The rest of the CPU is 
nearly idle, no excessive disk access this time :-)


Any Idea what this could be?


What workload is the guest running?

What is your host cpu type (/proc/cpuinfo)?







Re: [PATCH 11/17] net: drop packet from tap device if all NICs are down

2009-05-19 Thread Mark McLoughlin
On Sun, 2009-05-17 at 10:43 -0500, Anthony Liguori wrote:
 From: Mark McLoughlin mar...@redhat.com
 
 If you do e.g. set_link virtio.0 down and there are packets
 pending on the tap interface, we currently buffer a packet
 and constantly try and send it until the link is up again.
 
 We actually just want to drop the packet if the NIC is down.
 Upstream qemu already does this, we just differ because we
 buffer packets from the tap interface.
 
 [aliguori: rebased this patch on stable.  Mark, please review and Ack]
 
 Reported-by: Yan Vugenfirer yvuge...@redhat.com
 Signed-off-by: Mark McLoughlin mar...@redhat.com
 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com

Looks good to me.

Cheers,
Mark.
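
For readers following the thread, the behaviour described in the patch
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the qemu-kvm code: struct nic, tap_deliver()
and the result enum are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a guest NIC; only the link state matters. */
struct nic {
    bool link_down;
    size_t rx_packets;   /* packets handed to the guest so far */
};

enum tap_result { TAP_DELIVERED, TAP_DROPPED };

/*
 * Handle one packet read from the tap interface. The packet goes to
 * every NIC whose link is up. If all links are down it is dropped
 * rather than buffered and retried until a link comes back.
 */
static enum tap_result tap_deliver(struct nic *nics, size_t n_nics)
{
    bool delivered = false;

    assert(nics != NULL || n_nics == 0);
    for (size_t i = 0; i < n_nics; i++) {
        if (nics[i].link_down)
            continue;            /* skip NICs whose link is down */
        nics[i].rx_packets++;
        delivered = true;
    }
    return delivered ? TAP_DELIVERED : TAP_DROPPED;
}
```

The point of the sketch is that a down link never leaves a packet
pending, so nothing has to be retried while the link stays down.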



[ kvm-Bugs-2793994 ] kvm (command) doesn't work when vboxdrv module is loaded

2009-05-19 Thread SourceForge.net
Bugs item #2793994, was opened at 2009-05-19 19:52
Message generated for change (Tracker Item Submitted) made by benb
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Interface (example)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ben Bucksch (benb)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm (command) doesn't work when vboxdrv module is loaded

Initial Comment:
Reproduction:
1. Load kvm_amd kernel module
2. Start kvm VM with kvm command
3. Stop kvm VM
4. Load VirtualBox vboxdrv kernel module
5. Start VirtualBox GUI, start VM, exit VM, close VirtualBox GUI
6. Start kvm VM with kvm -vnc ... command

Actual result:
All steps up to step 5 work.
In Step 6, kvm starts and keeps running, I can connect to VNC, but I only get a 
black screen. No error message.

Expected result:
In step 6, kvm command immediately exits, with an error message:
Another virtual machine manager like VirtualBox, Xen or VMWare is running at 
the moment. Check 'lsmod' that no virtual machine manager modules other than 
'kvm*' are loaded.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599


[ kvm-Bugs-2793994 ] kvm doesn't work when VirtualBox' vboxdrv module is loaded

2009-05-19 Thread SourceForge.net
Bugs item #2793994, was opened at 2009-05-19 19:52
Message generated for change (Settings changed) made by benb
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Interface (example)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ben Bucksch (benb)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm doesn't work when VirtualBox' vboxdrv module is loaded

Initial Comment:
Reproduction:
1. Load kvm_amd kernel module
2. Start kvm VM with kvm command
3. Stop kvm VM
4. Load VirtualBox vboxdrv kernel module
5. Start VirtualBox GUI, start VM, exit VM, close VirtualBox GUI
6. Start kvm VM with kvm -vnc ... command

Actual result:
All steps up to step 5 work.
In Step 6, kvm starts and keeps running, I can connect to VNC, but I only get a 
black screen. No error message.

Expected result:
In step 6, kvm command immediately exits, with an error message:
Another virtual machine manager like VirtualBox, Xen or VMWare is running at 
the moment. Check 'lsmod' that no virtual machine manager modules other than 
'kvm*' are loaded.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599


Re: [PATCH v2] Shared memory device with interrupt support

2009-05-19 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:
I'd strongly recommend working these patches on qemu-devel and lkml.  
I suspect Avi may disagree with me, but in order for this to be 
eventually merged in either place, you're going to have additional 
requirements put on you.


I don't disagree with the fact that there will be additional 
requirements, but I might disagree with some of those additional 
requirements themselves.


It actually works out better than I think you expect it to...

We can't use mmap() directly.  With the new RAM allocation scheme, I 
think it's pretty reasonable to now allow portions of RAM to come from 
files that get mmap()ed (sort of like -mem-path).


This RAM area could be set up as a BAR.

  In particular I think your proposal was unimplementable; I would 
like to see how you can address my concerns.


I don't remember what my proposal was to be perfectly honest :-)  I 
think I suggested registering a guest allocated portion of memory as a 
sharable region via virtio?  Why is that unimplementable?


I don't think bulk memory sharing and the current transactional virtio 
mechanisms are a good fit for each other; but if we were to add a 
BAR-like capability to virtio that would address the compatibility 
requirement (though it might be difficult to implement on s390 with 
its requirement on contiguous host virtual address space).


It doesn't necessarily have to be virtio if that's not what makes sense.

The QEMU bits and the device model bits are actually relatively simple.  
The part that I think needs more deep thought is the guest-visible 
interface.


A char device is probably not the best interface.  I think you want 
something like tmpfs/hugetlbfs.  Another question is whether you want a 
guest to be able to share a portion of its memory with another guest or 
have everything set up by the host.


If everything is set up by the host, hot plug is important.

Regards,

Anthony Liguori
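
As a rough illustration of the "RAM backed by an mmap()ed file" idea
above, here is a user-space sketch. It only shows that two MAP_SHARED
mappings of one file see each other's writes; the temporary path and
the helper names map_shared() and shared_file_demo() are assumptions
made for the example and have nothing to do with QEMU's actual RAM
allocation code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of fd as shared, writable memory; NULL on failure. */
static void *map_shared(int fd, size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Create a temporary backing file, map it twice, write through one
 * mapping and read the data back through the other.
 * Returns 0 if the second mapping observed the write.
 */
static int shared_file_demo(void)
{
    char path[] = "/tmp/shm-sketch-XXXXXX";   /* hypothetical location */
    const size_t len = 4096;
    char *a, *b;
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);            /* the file lives on until fd and maps go away */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    a = map_shared(fd, len);
    b = map_shared(fd, len);
    if (a && b) {
        strcpy(a, "hello");
        ok = strcmp(b, "hello") == 0 ? 0 : -1;
    }
    if (a)
        munmap(a, len);
    if (b)
        munmap(b, len);
    close(fd);
    return ok;
}
```

In the shared-memory device discussed here the two mappings would live
in different QEMU processes, and the guest would see the region through
a BAR or similar; that part is not modelled.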


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Anthony Liguori

Christian Bornträger wrote:

On Monday, 18 May 2009, 16:26:15, Avi Kivity wrote:
  

Christian Borntraeger wrote:


Sorry for the late question, but I missed your first version. Is there a
way to change that code to use virtio instead of PCI? That would allow us
to use this driver on s390 and maybe other virtio transports.
  

Opinion differs.  See the discussion in
http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.

To summarize, Anthony thinks it should use virtio, while I believe
virtio is useful for exporting guest memory, not for importing host memory.



I think the current virtio interface is not ideal for importing host memory, 
but we can change that. If you look at the dcssblk driver for s390, it allows 
a guest to map shared memory segments via a diagnose (hypercall). This driver 
uses PCI regions to map memory.


My point is, that the method to map memory is completely irrelevant, we just 
need something like mmap/shmget between the guest and the host. We could 
define an interface in virtio, that can be used by any transport. In case of 
pci this could be a simple pci map operation. 


What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**

+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer should be mapped (hint only)
+ * length: The length of the mapping
+ * identifier: the token that identifies the host buffer
+ *  Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer is mapped
+ *  Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+	void * (*map_region)(struct virtio_device *vdev,
+			     void *addr,
+			     size_t length,
+			     int identifier);
+	int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
  


Perhaps something that maps closer to the current add_buf/get_buf API.  
Something like:


struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num,
                         unsigned int *in_num);
void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov,
                  unsigned int out_num, unsigned int in_num);


There's symmetry here, which is good.  The one bad thing about it is 
that it forces certain memory to be read-only and other memory to be 
read-write.  I don't see that as a bad thing though.


I think we'll need an interface like this to support driver domains too 
since backend.  To put it another way, in QEMU, map_buf == 
virtqueue_pop and unmap_buf == virtqueue_push.


Regards,

Anthony Liguori


[PATCH][KVM][retry 3] Add support for Pause Filtering to AMD SVM

2009-05-19 Thread Mark Langsdorf
From 67f831e825b64be5dedae9936ff8a60b884959f2 Mon Sep 17 00:00:00 2001
From: mark.langsd...@amd.com 
Date: Tue, 19 May 2009 07:46:11 -0500
Subject: [PATCH]

This feature creates a new field in the VMCB called Pause
Filter Count.  If Pause Filter Count is greater than 0 and
intercepting PAUSEs is enabled, the processor will increment
an internal counter when a PAUSE instruction occurs instead
of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.  Perform
the reschedule by increasing the credited time on
the VCPU.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 5000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 1%.  Further performance
improvement may be possible with a more sophisticated
yield algorithm.

-Mark Langsdorf
Operating System Research Center
AMD

Signed-off-by: Mark Langsdorf mark.langsd...@amd.com
---
 arch/x86/include/asm/svm.h |3 ++-
 arch/x86/kvm/svm.c |   13 +
 include/linux/sched.h  |7 +++
 kernel/sched.c |5 +
 4 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u16 intercept_dr_write;
 	u32 intercept_exceptions;
 	u64 intercept;
-	u8 reserved_1[44];
+	u8 reserved_1[42];
+	u16 pause_filter_count;
 	u64 iopm_base_pa;
 	u64 msrpm_base_pa;
 	u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..86df191 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -45,6 +45,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
@@ -575,6 +576,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	svm->nested_vmcb = 0;
 	svm->vcpu.arch.hflags = HF_GIF_MASK;
+
+	if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+		control->pause_filter_count = 3000;
+		control->intercept |= (1ULL << INTERCEPT_PAUSE);
+	}
 }
 
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -2087,6 +2093,12 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
 	return 1;
 }
 
+static int pause_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	set_task_delay(current, 100);
+	return 1;
+}
+
 static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 				      struct kvm_run *kvm_run) = {
 	[SVM_EXIT_READ_CR0]	= emulate_on_interception,
@@ -2123,6 +2135,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_CPUID]	= cpuid_interception,
 	[SVM_EXIT_IRET]		= iret_interception,
 	[SVM_EXIT_INVD]		= emulate_on_interception,
+	[SVM_EXIT_PAUSE]	= pause_interception,
 	[SVM_EXIT_HLT]		= halt_interception,
 	[SVM_EXIT_INVLPG]	= invlpg_interception,
 	[SVM_EXIT_INVLPGA]	= invalid_op_interception,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..683bc65 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2283,6 +2283,9 @@ static inline unsigned int task_cpu(const struct task_struct *p)
 	return task_thread_info(p)->cpu;
 }
 
+extern void set_task_delay(struct task_struct *p, unsigned int delay);
+
+
 extern void set_task_cpu(struct task_struct *p, unsigned int cpu);
 
 #else
@@ -2292,6 +2295,10 @@ static inline unsigned int task_cpu(const struct task_struct *p)
 	return 0;
 }
 
+void set_task_delay(struct task_struct *p, unsigned int delay)
+{
+}
+
 static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
 }
diff --git a/kernel/sched.c b/kernel/sched.c
index b902e58..3174620 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1947,6 +1947,11 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
 	return delta < (s64)sysctl_sched_migration_cost;
 }
 
+void set_task_delay(struct task_struct *p, unsigned int delay)
+{
+	p->se.vruntime += delay;
+}
+EXPORT_SYMBOL(set_task_delay);
 
 void 

kvm guest debug using gdb on x86

2009-05-19 Thread Aneesh Kumar K.V
Hi,

With the latest qemu-kvm and a 2.6.30-rc6 kernel I am not able to get
guest debugging with gdb to work. I get the following error.

$gdb ./vmlinux
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) b do_fork
Breakpoint 1 at 0xc106cfc8: file kernel/fork.c, line 1347.
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
[New Thread 1]
Remote 'g' packet reply is too long:
7fa557e209c10400c8b3d0c1c03fd1c1a83fd1c1912d03c10202600068007b007b00d8f60b8015407f03
(gdb)

any patches that i can try ?

-aneesh


Re: kvm guest debug using gdb on x86

2009-05-19 Thread Aneesh Kumar K.V
On Wed, May 20, 2009 at 12:23:12AM +0530, Aneesh Kumar K.V wrote:
 Hi,
 
 With the latest qemu-kvm and 2.6.30-rc6 kernel i am not able to get
 the guest debugging with gdb. I get the following error.
 
 $gdb ./vmlinux
 GNU gdb 6.8-debian
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "i486-linux-gnu"...
 (gdb) b do_fork
 Breakpoint 1 at 0xc106cfc8: file kernel/fork.c, line 1347.
 (gdb) target remote localhost:1234
 Remote debugging using localhost:1234
 [New Thread 1]
 Remote 'g' packet reply is too long:
 7fa557e209c10400c8b3d0c1c03fd1c1a83fd1c1912d03c10202600068007b007b00d8f60b8015407f03
 (gdb)
 
 any patches that i can try ?

Works better with the four patches found at

http://git.kiszka.org/?p=kvm-userspace.git;a=shortlog;h=refs/heads/queues/gdb

But "next" and "continue" don't get the prompt back in gdb. The guest
does stop execution.

-aneesh


Re: [PATCH v2 2/2] Intel-IOMMU, intr-remap: source-id checking

2009-05-19 Thread Eric W. Biederman
Ingo Molnar mi...@elte.hu writes:

 * Weidong Han weidong@intel.com wrote:

 To support domain-isolation usages, the platform hardware must be 
 capable of uniquely identifying the requestor (source-id) for each 
 interrupt message. Without source-id checking for interrupt 
 remapping, a rogue guest/VM with assigned devices can launch 
 interrupt attacks to bring down another guest/VM or the VMM itself.
 
 This patch adds source-id checking for interrupt remapping, and 
 then really isolates interrupts for guests/VMs with assigned 
 devices.
 
 Because PCI subsystem is not initialized yet when set up IOAPIC 
 entries, use read_pci_config_byte to access PCI config space 
 directly.
 
 Signed-off-by: Weidong Han weidong@intel.com
 ---
  arch/x86/kernel/apic/io_apic.c |6 +++
  drivers/pci/intr_remapping.c   |   90 ++-
  drivers/pci/intr_remapping.h   |2 +
  include/linux/dmar.h   |   11 +
  4 files changed, 106 insertions(+), 3 deletions(-)

 Code structure looks nice now. (and I suspect you have tested this on 
 real and relevant hardware?) I've Cc:-ed Eric too ... does this 
 direction look good to you too Eric?

Being a major nitpick, I have to point out that the code is not
structured to support other iommus, and I think AMD has one that can
do this as well. 

The early pci reading of the bus is just wrong.  What happens if the
pci layer decided to renumber things?  It looks like we have a real
dependency on pci there and are avoiding sorting it out with this.

Hmm.  But that is what we use in setup_ioapic_sid.
I expect the right solution is to delay enabling ioapic entries
until drivers enable them.   That could also reduce screaming
irqs during bootup in the kdump case.

set_msi_sid looks wrong.  The comments are unhelpful. irte->svt should
get an enum value or a define (removing the repeated explanations of
the magic value) and then we could have room to explain why we
are doing what we are doing.

Not finding an upstream pcie_bridge and then concluding we are a pcie
device seems bogus.

Why, if we do have an upstream pcie bridge, do we only want to do a bus
range verification instead of checking just for the bus:devfn?

The legacy PCI case seems even stranger.



The table of apic information by apic_id also seems wrong.   Don't
we have chip_data or something that should point to it that we can
get from the irq?

Eric
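
To make the remark about irte->svt concrete, a named enum could look
something like the sketch below. The enum names, the helper functions
and the simplified checks are assumptions for illustration; the values
and the bus-range encoding follow my reading of the VT-d interrupt
remapping table entry format and should be checked against the
specification. The SQ qualifier bits are ignored entirely here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Source validation type; names are illustrative, not from the patch. */
enum irte_svt {
    SVT_NO_VERIFY     = 0,   /* accept any requester */
    SVT_VERIFY_SID_SQ = 1,   /* compare requester-id with the SID field */
    SVT_VERIFY_BUS    = 2,   /* check requester bus against a bus range */
};

/* Requester-id layout: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t make_sid(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

static bool source_id_ok(enum irte_svt svt, uint16_t sid, uint16_t requester)
{
    switch (svt) {
    case SVT_NO_VERIFY:
        return true;
    case SVT_VERIFY_SID_SQ:
        /* Simplified: full 16-bit comparison, SQ not modelled. */
        return requester == sid;
    case SVT_VERIFY_BUS: {
        /* Here sid holds the start bus (15:8) and end bus (7:0). */
        uint8_t bus = (uint8_t)(requester >> 8);
        return bus >= (uint8_t)(sid >> 8) && bus <= (uint8_t)(sid & 0xff);
    }
    }
    return false;
}

/* Usage: an exact bus:devfn match versus a bus-range match. */
static void svt_example(void)
{
    uint16_t dev = make_sid(0x02, 0x18);

    assert(source_id_ok(SVT_VERIFY_SID_SQ, dev, dev));
    assert(source_id_ok(SVT_VERIFY_BUS, 0x0205, make_sid(0x04, 0x00)));
}
```

With names like these, the reason for choosing bus-range checking over
an exact bus:devfn match can be explained once, next to the enum,
instead of next to every magic number.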


Re: tun/tap and Vlans

2009-05-19 Thread Lukas Kolbe
On Tuesday, 19.05.2009, at 10:45 +0300, Avi Kivity wrote:

Hi,

  Guest                Host
  kvm1 --- eth0 -+- bridge0 --- vlan1 \
                 |                     +-- eth0
  kvm2 -+- eth0 -/                    /
        \- eth1 --- bridge1 --- vlan2 +
 
  When sending packets through kvm2/eth0, they appear on both bridges and
  also vlans, also when sending packets through kvm2/eth1. When the guest
  has only one interface, the packets only appear on one bridge and one
  vlan as it's supposed to be.
 
  Can this be worked around?

 
 This is strange.  Can you post the command line you used to start kvm2?

Please bear with me - this was a few weeks ago and we didn't investigate
further as we had other problems to solve. I'll set up a testbed next
week and hope to report back with more details.

-- 
Lukas





[PATCH v7] kvm: Use a bitmap for tracking used GSIs

2009-05-19 Thread Alex Williamson
We're currently using a counter to track the most recent GSI we've
handed out.  This quickly hits KVM_MAX_IRQ_ROUTES when using device
assignment with a driver that regularly toggles the MSI enable bit
(such as Linux kernels 2.6.21-26).  This can mean only a few minutes
of usable run time.  Instead, track used GSIs in a bitmap.

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 v2: Added mutex to protect gsi bitmap
 v3: Updated for comments from Michael Tsirkin
 No longer depends on [PATCH] kvm: device-assignment: Catch GSI overflow
 v4: Fix gsi_bytes calculation noted by Sheng Yang
 v5: Remove mutex per Avi
 Fix negative gsi_count path per Michael
 Remove KVM_CAP_IRQ_ROUTING per Michael, ppc should still be protected
 by the KVM_IOAPIC_NUM_PINS check
 v6: Make use of ALIGN macro, per Michael
 Define KVM_IOAPIC_NUM_PINS if not already, per Michael
 Fix comment indent, per Michael
 Remove unused BITMAP_SIZE macro
 v7: Don't define KVM_IOAPIC_NUM_PINS, mark bitmap in common
 paths so we can stay blissfully ignorant of ioapics

 kvm/libkvm/kvm-common.h |3 +-
 kvm/libkvm/libkvm.c |   91 +--
 2 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 591fb53..c95c591 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -67,7 +67,8 @@ struct kvm_context {
 	struct kvm_irq_routing *irq_routes;
 	int nr_allocated_irq_routes;
 #endif
-	int max_used_gsi;
+	void *used_gsi_bitmap;
+	int max_gsi;
 };
 
 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ba0a5d1..c5d6a7f 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -61,10 +61,32 @@
 #define DPRINTF(fmt, args...) do {} while (0)
 #endif
 
+#define MIN(x,y) ((x) < (y) ? (x) : (y))
+#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 
 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;
 
+static inline void set_gsi(kvm_context_t kvm, unsigned int gsi)
+{
+	uint32_t *bitmap = kvm->used_gsi_bitmap;
+
+	if (gsi < kvm->max_gsi)
+		bitmap[gsi / 32] |= 1U << (gsi % 32);
+	else
+		DPRINTF("Invalid GSI %d\n");
+}
+
+static inline void clear_gsi(kvm_context_t kvm, unsigned int gsi)
+{
+	uint32_t *bitmap = kvm->used_gsi_bitmap;
+
+	if (gsi < kvm->max_gsi)
+		bitmap[gsi / 32] &= ~(1U << (gsi % 32));
+	else
+		DPRINTF("Invalid GSI %d\n");
+}
+
 struct slot_info {
 	unsigned long phys_addr;
 	unsigned long len;
@@ -285,7 +307,7 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 {
 	int fd;
 	kvm_context_t kvm;
-	int r;
+	int r, gsi_count;
 
 	fd = open("/dev/kvm", O_RDWR);
 	if (fd == -1) {
@@ -323,6 +345,23 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	kvm->no_irqchip_creation = 0;
 	kvm->no_pit_creation = 0;
 
+	gsi_count = kvm_get_gsi_count(kvm);
+	if (gsi_count > 0) {
+		int gsi_bits, i;
+
+		/* Round up so we can search ints using ffs */
+		gsi_bits = ALIGN(gsi_count, 32);
+		kvm->used_gsi_bitmap = malloc(gsi_bits / 8);
+		if (!kvm->used_gsi_bitmap)
+			goto out_close;
+		memset(kvm->used_gsi_bitmap, 0, gsi_bits / 8);
+		kvm->max_gsi = gsi_bits;
+
+		/* Mark any over-allocated bits as already in use */
+		for (i = gsi_count; i < gsi_bits; i++)
+			set_gsi(kvm, i);
+	}
+
 	return kvm;
  out_close:
 	close(fd);
@@ -626,9 +665,6 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
 	return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
-#define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
-#define BITMAP_SIZE(m) (ALIGN(((m)/PAGE_SIZE), sizeof(long) * 8) / 8)
-
 int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
 			      unsigned long len, void *buf, void *opaque,
 			      int (*cb)(unsigned long start, unsigned long len,
@@ -1298,8 +1334,8 @@ int kvm_add_routing_entry(kvm_context_t kvm,
 	new->flags = entry->flags;
 	new->u = entry->u;
 
-	if (entry->gsi > kvm->max_used_gsi)
-		kvm->max_used_gsi = entry->gsi;
+	set_gsi(kvm, entry->gsi);
+
 	return 0;
 #else
 	return -ENOSYS;
@@ -1327,12 +1363,14 @@ int kvm_del_routing_entry(kvm_context_t kvm,
 {
 #ifdef KVM_CAP_IRQ_ROUTING
 	struct kvm_irq_routing_entry *e, *p;
-	int i, found = 0;
+	int i, gsi, found = 0;
+
+	gsi = entry->gsi;
 
 	for (i = 0; i < kvm->irq_routes->nr; ++i) {
 		e = &kvm->irq_routes->entries[i];
 		if (e->type == entry->type
-		    && e->gsi == entry->gsi) {
+		    && e->gsi == gsi) {
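
The hunks visible above set and clear bits; the part that hands out a
free GSI is not shown in this excerpt. A stand-alone sketch of how a
free bit might be searched with ffs(), as hinted at by the "Round up so
we can search ints using ffs" comment, could look like this. The struct
and function names are hypothetical and this is not the libkvm code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>   /* ffs() */

/* Hypothetical stand-alone bitmap; one bit per GSI, 1 = in use. */
struct gsi_map {
    uint32_t *bits;
    unsigned int max;    /* number of bits, always a multiple of 32 */
};

static int gsi_map_init(struct gsi_map *m, unsigned int gsi_count)
{
    unsigned int bits = (gsi_count + 31u) & ~31u;   /* round up to 32 */

    m->bits = calloc(bits / 32, sizeof(uint32_t));
    if (!m->bits)
        return -1;
    m->max = bits;
    /* Mark the padding bits as used so they are never handed out. */
    for (unsigned int i = gsi_count; i < bits; i++)
        m->bits[i / 32] |= 1u << (i % 32);
    return 0;
}

/* Returns the lowest free GSI and marks it used, or -1 if none is left. */
static int gsi_alloc(struct gsi_map *m)
{
    for (unsigned int w = 0; w < m->max / 32; w++) {
        int bit = ffs((int)~m->bits[w]);   /* 1-based index of first 0 bit */

        if (bit) {
            m->bits[w] |= 1u << (bit - 1);
            return (int)(w * 32 + (unsigned int)(bit - 1));
        }
    }
    return -1;
}

static void gsi_free(struct gsi_map *m, unsigned int gsi)
{
    assert(gsi < m->max);
    m->bits[gsi / 32] &= ~(1u << (gsi % 32));
}
```

Unlike a monotonically growing counter, a freed GSI becomes available
again, which is what keeps a guest that toggles the MSI enable bit from
running into KVM_MAX_IRQ_ROUTES.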

Does KVM suffer from ACK-compression as you increase the number of VMs?

2009-05-19 Thread Andrew de Andrade
I recently read the following paper from 2004 that discusses
ACK-compression in a VMware GSX 2.5.1 environment.

http://www.cs.clemson.edu/~jmarty/papers/ccn2004.pdf

I was wondering if anyone had checked to see if KVM also suffers from  
ACK-compression as you increase the number of VMs on each host  
(increasing virtualization overhead)?


If it does suffer delays, what solutions exist for remedying this?

In addition to that, I was also curious what the maximum number of VMs  
people have been able to fit on a host, and what bottlenecks they  
encountered as they reached a maximum level of VMs before things fell  
apart.


thanks,

andrew



kvm building issue

2009-05-19 Thread Nitin A Kamble
Hi Avi,
  I am trying to build kvm with the changed repositories. I was trying
to follow the instructions from here:
http://www.linux-kvm.org/page/Code, especially the section "building
an external module with older kernels" on that page.

 I find the kernel directory is missing in the qemu-kvm.git repository.
Hence make sync is not working anymore. 


With the new repository setup, how do I build latest qemu-kvm with
latest kvm modules for fedora 10 kernel?

Thanks,
Nitin





[PATCH] qemu-kvm: Handle -no-shutdown

2009-05-19 Thread Daniel Gollub
Plain QEMU has the parameter -no-shutdown. It avoids termination of the qemu
process when the VM is shut down (e.g. to still use the QEMU monitor with a
stopped VM). This parameter has no effect on qemu-kvm today.

This patch introduces handling of -no-shutdown for qemu-kvm identical to
that in qemu:

 * termination of the qemu-kvm process on a VM shutdown is avoided only once
 * a second shutdown of the VM causes termination of qemu-kvm (like in qemu)

Signed-off-by: Daniel Gollub gol...@b1-systems.de

--- 
 qemu-kvm.c |9 ++---
 sysemu.h   |1 +
 vl.c   |7 +++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 5e4002b..b9926eb 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -597,9 +597,12 @@ int kvm_main_loop(void)
 
 while (1) {
 main_loop_wait(1000);
-if (qemu_shutdown_requested())
-break;
-else if (qemu_powerdown_requested())
+if (qemu_shutdown_requested()) {
+if (qemu_no_shutdown()) {
+vm_stop(0);
+} else
+break;
+   } else if (qemu_powerdown_requested())
 qemu_system_powerdown();
 else if (qemu_reset_requested())
qemu_kvm_system_reset();
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..0dd184d 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -35,6 +35,7 @@ void cpu_disable_ticks(void);
 void qemu_system_reset_request(void);
 void qemu_system_shutdown_request(void);
 void qemu_system_powerdown_request(void);
+int qemu_no_shutdown(void);
 int qemu_shutdown_requested(void);
 int qemu_reset_requested(void);
 int qemu_powerdown_requested(void);
diff --git a/vl.c b/vl.c
index d9f0607..9b2a420 100644
--- a/vl.c
+++ b/vl.c
@@ -3644,6 +3644,13 @@ static int powerdown_requested;
 static int debug_requested;
 static int vmstop_requested;
 
+int qemu_no_shutdown(void)
+{
+int r = no_shutdown;
+no_shutdown = 0;
+return r;
+}
+
 int qemu_shutdown_requested(void)
 {
 int r = shutdown_requested;

-- 
Daniel Gollub                 Managing Director:   Ralph Dehner
FOSS Developer                Registered office:   Vohburg
B1 Systems GmbH               District court:      Ingolstadt
Mobile: +49-(0)-160 47 73 970 Commercial register: HRB 3537
Email: gol...@b1-systems.de   http://www.b1-systems.de

Address: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
http://pgpkeys.pca.dfn.de/pks/lookup?op=get&search=0xED14B95C2F8CA78D
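
The one-shot semantics described in the changelog (first shutdown stops
the VM, second one exits) can be modelled in isolation as below. This is
a simplified sketch, not the qemu-kvm main loop: take_no_shutdown(),
on_shutdown_request() and the enum are hypothetical names, and only the
read-and-clear idea is taken from the patch.

```c
#include <assert.h>

static int no_shutdown;          /* set when -no-shutdown was given */

/* Read-and-clear, mirroring the one-shot behaviour described above. */
static int take_no_shutdown(void)
{
    int r = no_shutdown;

    no_shutdown = 0;
    return r;
}

enum loop_action { LOOP_STOP_VM, LOOP_EXIT };

/* What the main loop would do for one shutdown request. */
static enum loop_action on_shutdown_request(void)
{
    return take_no_shutdown() ? LOOP_STOP_VM : LOOP_EXIT;
}

/* Usage: the first request stops the VM, the second one exits. */
static void no_shutdown_example(void)
{
    no_shutdown = 1;
    assert(on_shutdown_request() == LOOP_STOP_VM);
    assert(on_shutdown_request() == LOOP_EXIT);
}
```

Because the flag is cleared on first use, a VM that is continued from
the monitor and shut down again terminates the process.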


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Rusty Russell
On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
 Avi Kivity wrote:
  Christian Bornträger wrote:
  To summarize, Anthony thinks it should use virtio, while I believe
  virtio is useful for exporting guest memory, not for importing host
  memory.

Yes, precisely.

But what's it *for*, this shared memory?  Implementing shared memory is 
trivial.  Using it is harder.  For example, inter-guest networking: you'd have 
to copy packets in and out, making it slow as well as losing abstraction.

The only interesting idea I can think of is exposing it to userspace, and 
having that run some protocol across it for fast app <-> app comms.  But if 
that's your plan, you still have a lot of code to write!

So I guess I'm missing the big picture here?

Thanks,
Rusty.
