Question: data consistency on fail-over using shared disk

2010-07-20 Thread Takuya Yoshikawa
Hi,


We are now looking into what we should do on VM fail-over.

In this context, does anybody know of any data consistency risks
when we are using a shared disk?


My concern is that if the host of the crashed VM is still holding
buffered data, starting a new VM instance on another node may
result in file system corruption.

  This problem may be similar to live migration, but it is a little
  different in the sense that the VM has crashed, so nothing more can
  be done on that side from that point.


How about the combination of an old or new guest OS and the following
cache settings?

 - writethrough
 - writeback
 - none

If needed, we'll run sync from the HA-side scripts before starting a
new VM instance.


Thanks,
  Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Freezing Windows 2008 x64bit guest

2010-07-20 Thread Harri Olin

Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 10:17:02AM +0300, Harri Olin wrote:

Gleb Natapov wrote:

On Thu, Jul 15, 2010 at 03:19:44PM +0200, Christoph Adomeit wrote:

But one Windows 2008 64 Bit Server Standard is freezing regularly.
This happens sometimes 3 times a day, sometimes it takes 2 days
until freeze. The Windows Machine is a clean fresh install.

I think I have seen the same problem occur on my Windows 2008 SBS SP2
64-bit system, but a bit less often, only about once a week.
So far I haven't seen crashes, only freezes with qemu at 100% CPU and
the virtual system unresponsive.

qemu command line:
  /usr/local/qemu-kvm-0.11.1/bin/qemu-system-x86_64 -drive
file=/dev/rigelvg/w2k8system,cache=none,boot=on -drive
file=/dev/rigelvg/w2k8data,cache=none -m 6144 -vnc :1 -net
nic,macaddr=C0:FF:12:FB:AA:01,model=e1000 -net tap -smp 4 -localtime



Try a different model than e1000, please. The default driver that comes
with Windows is known to have problems.


Didn't help. I changed to the default Realtek emulation, but the system froze
again. Command line:
/usr/local/qemu-kvm-0.11.1/bin/qemu-system-x86_64 -drive 
file=/dev/rigelvg/w2k8system,cache=none,boot=on -drive 
file=/dev/rigelvg/w2k8data,cache=none -m 6144 -vnc :1 -net 
nic,macaddr=C0:FF:12:FB:AA:01 -net tap -smp 4 -localtime


This time the virtual system was not totally unresponsive: the
system pinged just fine and I think the DNS server worked too. However, on the
console the mouse moved but nothing reacted to clicking, Ctrl-Alt-Del, etc.





When the hang occurs, ensure that the problematic VM is the only one on the server and run
kvm_stat. Send the output here. Also do the following:

# mount -t debugfs debugfs /sys/kernel/debug
# echo kvm > /sys/kernel/debug/tracing/set_event
# sleep 1
# cat /sys/kernel/debug/tracing/trace > /tmp/trace

Send /tmp/trace here too, but it may be huge, so send only last 1000
lines.


Stats from this hang:
log style http://mizar.remote.agasha.com/k/kvm/kvm_stat_2_log.txt

kvm statistics

 efer_reload  0   0
 exits   5368720296   21258
 fpu_reload  28267507212216
 halt_exits   695286653   0
 halt_wakeup  685187661   0
 host_state_reload   30103996402216
 hypercalls   0   0
 insn_emulation  27653520326542
 insn_emulation_fail477   0
 invlpg42417110 204
 io_exits 818764026 122
 irq_exits1096595729870
 irq_injections   7889920324643
 irq_window221702771718
 largepages   0   0
 mmio_exits  1490912756  69
 mmu_cache_miss 2471963   0
 mmu_flooded 905690   0
 mmu_pde_zapped 1821325   0
 mmu_pte_updated3256583   0
 mmu_pte_write  2896199   0
 mmu_recycled259845   0


full trace: http://mizar.remote.agasha.com/k/kvm/kvm_trace_2_full.txt

 qemu-system-x86-5155  [002] 1248198.487147: kvm_entry: vcpu 3
 qemu-system-x86-5152  [000] 1248198.487147: kvm_inj_virq: irq 209
 qemu-system-x86-5152  [000] 1248198.487148: kvm_entry: vcpu 0
 qemu-system-x86-5154  [001] 1248198.487148: kvm_exit: reason apic_access rip 0xf80001c9df13
 qemu-system-x86-5155  [002] 1248198.487148: kvm_exit: reason apic_access rip 0xf80001c9df13
 qemu-system-x86-5152  [000] 1248198.487150: kvm_exit: reason apic_access rip 0xf80001c9d5c4
 qemu-system-x86-5154  [001] 1248198.487150: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
 qemu-system-x86-5154  [001] 1248198.487150: kvm_apic: apic_write APIC_EOI = 0x0
 qemu-system-x86-5155  [002] 1248198.487150: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
 qemu-system-x86-5155  [002] 1248198.487151: kvm_apic: apic_write APIC_EOI = 0x0
 qemu-system-x86-5154  [001] 1248198.487151: kvm_ack_irq: irqchip IOAPIC pin 2
 qemu-system-x86-5155  [002] 1248198.487151: kvm_ack_irq: irqchip IOAPIC pin 2
 qemu-system-x86-5152  [000] 1248198.487151: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
 qemu-system-x86-5152  [000] 1248198.487152: kvm_apic: apic_write APIC_EOI = 0x0
 qemu-system-x86-5154  [001] 1248198.487152: kvm_entry: vcpu 2
 qemu-system-x86-5155  [002] 1248198.487152: kvm_entry: vcpu 3
 qemu-system-x86-5152  [000] 1248198.487152: kvm_ack_irq: irqchip IOAPIC pin 2
 qemu-system-x86-5153  [003] 1248198.487153: kvm_exit: reason ext_irq rip 0xf80001d04a00
 qemu-system-x86-5152  [000] 1248198.487153: kvm_entry: vcpu 0
 qemu-system-x86-5153  [003] 1248198.487154: kvm_inj_virq: irq 209
 qemu-system-x86-5153  [003] 1248198.487155: kvm_entry: vcpu 1
 qemu-system-x86-5153  [003] 1248198.487157: kvm_exit: reason apic_access rip 0xf80001c9df13
 qemu-system-x86-5153  [003] 1248198.487159: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
 qemu-system-x86-5153  [003] 1248198.487159: kvm_apic: apic_write APIC_EOI = 0x0
 qemu-system-x86-5153  [003] 1248198.487160: kvm_ack_irq: irqchip IOAPIC pin 2

 qemu-system-x86-5153  [003] 1248198.

Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity

On 07/21/2010 03:55 AM, Marcelo Tosatti wrote:



--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;

+   inject_pending_event(vcpu);
+
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);
+
+   if (kvm_lapic_enabled(vcpu)) {
+   update_cr8_intercept(vcpu);
+   kvm_lapic_sync_to_vapic(vcpu);
+   }
+
preempt_disable();

kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
smp_wmb();
local_irq_enable();
preempt_enable();
+   kvm_x86_ops->cancel_injection(vcpu);
r = 1;
goto out;
}

-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
-
-   if (kvm_lapic_enabled(vcpu)) {
-   update_cr8_intercept(vcpu);
-   kvm_lapic_sync_to_vapic(vcpu);
-   }
-
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

kvm_guest_enter();
 

This breaks

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
 struct kvm_lapic *apic = vcpu->arch.apic;
 int highest_irr;

 /* This may race with setting of irr in __apic_accept_irq() and
  * value returned may be wrong, but kvm_vcpu_kick() in
  * __apic_accept_irq
  * will cause vmexit immediately and the value will be
  * recalculated
  * on the next vmentry.
  */

(also valid for nmi_pending and PIC). Can't simply move
atomic_set(guest_mode, 1) in preemptible section as that would make it
possible for kvm_vcpu_kick to IPI stale vcpu->cpu.
   


Right.  Can fix by adding a kvm_make_request() to force the retry loop.


Also should undo vmx.rmode.* ?
   


Elaborate?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH v2] device-assignment: Use PCI I/O port sysfs resource file when available

2010-07-20 Thread Alex Williamson
When supported by the host kernel, we can use read/write on the
PCI sysfs resource file for I/O port regions.  This allows us to
avoid raw in/out commands and works with deprivileged guests via
libvirt.  For uid 0 callers, we use in/out directly to avoid any
compatibility issues.

Signed-off-by: Alex Williamson 
---

 Required kernel patch pending here:
 http://www.spinics.net/lists/linux-pci/msg09389.html

 v2: Drop getuid() since it doesn't guarantee permissions
 Don't use in/out as a fallback since we don't have permissions
 Consolidate ioport read/write functions

 hw/device-assignment.c |  205 
 hw/device-assignment.h |1 
 2 files changed, 120 insertions(+), 86 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2bba22f..2e141ac 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -62,93 +62,100 @@ static void assigned_dev_load_option_rom(AssignedDevice 
*dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
 
-static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr)
+static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
+   uint32_t addr, int len, uint32_t *val)
 {
-return region->u.r_baseport + (addr - region->e_physbase);
+uint32_t ret = 0;
+uint32_t offset = addr - dev_region->e_physbase;
+int fd = dev_region->region->resource_fd;
+
+if (fd >= 0) {
+if (val) {
+DEBUG("pwrite val=%x, len=%d, e_phys=%x, offset=%x\n",
+  *val, len, addr, offset);
+if (pwrite(fd, val, len, offset) != len) {
+fprintf(stderr, "%s - pwrite failed %s\n",
+__func__, strerror(errno));
+}
+} else {
+if (pread(fd, &ret, len, offset) != len) {
+fprintf(stderr, "%s - pread failed %s\n",
+__func__, strerror(errno));
+ret = (1UL << (len * 8)) - 1;
+}
+DEBUG("pread ret=%x, len=%d, e_phys=%x, offset=%x\n",
+  ret, len, addr, offset);
+}
+} else {
+uint32_t port = offset + dev_region->u.r_baseport;
+
+if (val) {
+DEBUG("out val=%x, len=%d, e_phys=%x, host=%x\n",
+  *val, len, addr, port);
+switch (len) {
+case 1:
+outb(*val, port);
+break;
+case 2:
+outw(*val, port);
+break;
+case 4:
+outl(*val, port);
+break;
+}
+} else {
+switch (len) {
+case 1:
+ret = inb(port);
+break;
+case 2:
+ret = inw(port);
+break;
+case 4:
+ret = inl(port);
+break;
+}
+DEBUG("in val=%x, len=%d, e_phys=%x, host=%x\n",
+  ret, len, addr, port);
+}
+}
+return ret;
 }
 
 static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
uint32_t value)
 {
-AssignedDevRegion *r_access = opaque;
-uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
- r_pio, (int)r_access->e_physbase,
- (unsigned long)r_access->u.r_baseport, value);
-
-outb(value, r_pio);
+assigned_dev_ioport_rw(opaque, addr, 1, &value);
+return;
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
uint32_t value)
 {
-AssignedDevRegion *r_access = opaque;
-uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
-  r_pio, (int)r_access->e_physbase,
- (unsigned long)r_access->u.r_baseport, value);
-
-outw(value, r_pio);
+assigned_dev_ioport_rw(opaque, addr, 2, &value);
+return;
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
uint32_t value)
 {
-AssignedDevRegion *r_access = opaque;
-uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
- r_pio, (int)r_access->e_physbase,
-  (unsigned long)r_access->u.r_baseport, value);
-
-outl(value, r_pio);
+assigned_dev_ioport_rw(opaque, addr, 4, &value);
+return;
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
 {
-AssignedDevRegion *r_access = opaque;
-uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-uint32_t value;
-
-value = inb(r_pio);
-
-DEBUG("r_pio=%08x e_physbase=%08x r_=%08lx value=%08x\n",
-  r_pio, (int)r_
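
For readers following the patch, here is a minimal, hedged sketch of the
resource-file path in isolation. It is not the code from the patch: the
helper names port_read, port_write, all_ones and roundtrip_demo are
hypothetical, and an anonymous temporary file stands in for the PCI sysfs
resource file so that the sketch runs without an assigned device. As in the
patch, a failed read yields all ones for the access width. The mask is
computed without shifting by the full width of the type, because with a
32-bit unsigned long the expression (1UL << (len * 8)) - 1 is undefined
for len == 4.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* All-ones value for an access of len bytes (1, 2 or 4). Avoids shifting
 * a 32-bit value by 32 bits, which C leaves undefined. */
static uint32_t all_ones(int len)
{
    return len >= 4 ? UINT32_MAX : (((uint32_t)1 << (len * 8)) - 1);
}

/* Write the first len bytes of val at offset in the file behind fd.
 * On a little-endian host such as x86 these are the low-order bytes.
 * Returns 0 on success, -1 on a failed or short write. */
static int port_write(int fd, uint32_t offset, int len, uint32_t val)
{
    assert(len == 1 || len == 2 || len == 4);
    return pwrite(fd, &val, len, offset) == len ? 0 : -1;
}

/* Read len bytes at offset from the file behind fd.
 * A failed or short read returns all ones for that width. */
static uint32_t port_read(int fd, uint32_t offset, int len)
{
    uint32_t ret = 0;

    assert(len == 1 || len == 2 || len == 4);
    if (pread(fd, &ret, len, offset) != len) {
        ret = all_ones(len);
    }
    return ret;
}

/* Round trip against an anonymous temporary file (a hypothetical stand-in
 * for the resource file). Returns 0 if the value read back matches. */
static int roundtrip_demo(void)
{
    FILE *f = tmpfile();
    int fd, ok;

    if (!f) {
        return -1;
    }
    fd = fileno(f);
    ok = port_write(fd, 8, 4, 0x11223344u) == 0 &&
         port_read(fd, 8, 4) == 0x11223344u;
    fclose(f);
    return ok ? 0 : -1;
}
```

roundtrip_demo() returns 0 when the 4-byte value written is read back
unchanged. port_read() on an invalid descriptor such as -1 returns 0xff,
0xffff or 0xffffffff depending on len.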

Re: [BUG?] vhost assert error with < 4GB of RAM

2010-07-20 Thread Michael S. Tsirkin
On Tue, Jul 20, 2010 at 02:42:19PM -0600, Cam Macdonell wrote:
> I think I've found a bug when running a guest with vhost with less
> than 4GB of RAM.
> 
> If a guest has less than 4GB of RAM, then above_4g_mem_size is 0 for
> this call to cpu_register_physical_memory() in pc_memory_init() from
> hw/pc.c:922
> 
> #if TARGET_PHYS_ADDR_BITS > 32
> cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
>  ram_addr + below_4g_mem_size);
> #endif

Yes, the fix is in qemu already, it's a matter of merging into qemu-kvm.

> this leads to vhost_client_set_memory being called with size == 0
> 
> #3  0x004301f3 in vhost_client_set_memory (client=0x113b010,
> start_addr=4294967296, size=0, phys_offset=3221225472)
> at /home/cam/research/KVM/qemu-kvm/hw/vhost.c:312
> 
> which trips the assert at hw/vhost.c:312
> 
> static void vhost_client_set_memory(CPUPhysMemoryClient *client,
> target_phys_addr_t start_addr,
> ram_addr_t size,
> ram_addr_t phys_offset)
> {
> 
> ..
> 
> assert(size);
> ...
> 
> something like the following fixes the problem but I'm not sure if
> it's the proper way to handle it.
> 
> diff --git a/exec.c b/exec.c
> index 5e9a5b7..991abfc 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2592,7 +2592,9 @@ void
> cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
>  ram_addr_t orig_size = size;
>  subpage_t *subpage;
> 
> -cpu_notify_set_memory(start_addr, size, phys_offset);
> +if (size > 0) {
> +cpu_notify_set_memory(start_addr, size, phys_offset);
> +}
> 
>  if (phys_offset == IO_MEM_UNASSIGNED) {
>  region_offset = start_addr;


Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Marcelo Tosatti
On Tue, Jul 20, 2010 at 04:17:07PM +0300, Avi Kivity wrote:
> Change the interrupt injection code to work from preemptible, interrupts
> enabled context.  This works by adding a ->cancel_injection() operation
> that undoes an injection in case we were not able to actually enter the guest
> (this condition could never happen with atomic injection).
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/include/asm/kvm_host.h |1 +
>  arch/x86/kvm/svm.c  |   12 
>  arch/x86/kvm/vmx.c  |   11 +++
>  arch/x86/kvm/x86.c  |   27 ++-
>  4 files changed, 38 insertions(+), 13 deletions(-)
> 

> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   if (unlikely(r))
>   goto out;
>  
> + inject_pending_event(vcpu);
> +
> + /* enable NMI/IRQ window open exits if needed */
> + if (vcpu->arch.nmi_pending)
> + kvm_x86_ops->enable_nmi_window(vcpu);
> + else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> + kvm_x86_ops->enable_irq_window(vcpu);
> +
> + if (kvm_lapic_enabled(vcpu)) {
> + update_cr8_intercept(vcpu);
> + kvm_lapic_sync_to_vapic(vcpu);
> + }
> +
>   preempt_disable();
>  
>   kvm_x86_ops->prepare_guest_switch(vcpu);
> @@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   smp_wmb();
>   local_irq_enable();
>   preempt_enable();
> + kvm_x86_ops->cancel_injection(vcpu);
>   r = 1;
>   goto out;
>   }
>  
> - inject_pending_event(vcpu);
> -
> - /* enable NMI/IRQ window open exits if needed */
> - if (vcpu->arch.nmi_pending)
> - kvm_x86_ops->enable_nmi_window(vcpu);
> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> - kvm_x86_ops->enable_irq_window(vcpu);
> -
> - if (kvm_lapic_enabled(vcpu)) {
> - update_cr8_intercept(vcpu);
> - kvm_lapic_sync_to_vapic(vcpu);
> - }
> -
>   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>  
>   kvm_guest_enter();

This breaks

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
int highest_irr;

/* This may race with setting of irr in __apic_accept_irq() and
 * value returned may be wrong, but kvm_vcpu_kick() in
 * __apic_accept_irq
 * will cause vmexit immediately and the value will be
 * recalculated
 * on the next vmentry.
 */

(also valid for nmi_pending and PIC). Can't simply move
atomic_set(guest_mode, 1) in preemptible section as that would make it
possible for kvm_vcpu_kick to IPI stale vcpu->cpu.

Also should undo vmx.rmode.* ?




Re: [PATCH v2 1/6] KVM: MMU: fix forgot reserved bits check in speculative path

2010-07-20 Thread Xiao Guangrong


Xiao Guangrong wrote:
> In the speculative path, we should check guest pte's reserved bits just as
> the real processor does
> 

Ping..?


Re: [PATCH] device-assignment: Use PCI I/O port sysfs resource file when available

2010-07-20 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
> When supported by the host kernel, we can use read/write on the
> PCI sysfs resource file for I/O port regions.  This allows us to
> avoid raw in/out commands and works with deprivileged guests via
> libvirt.  For uid 0 callers, we use in/out directly to avoid any
> compatibility issues.

Won't the uid 0 test fail if libvirt launches qemu with the user set to
root (capabilities still get dropped)?

thanks,
-chris


[PATCH] device-assignment: Use PCI I/O port sysfs resource file when available

2010-07-20 Thread Alex Williamson
When supported by the host kernel, we can use read/write on the
PCI sysfs resource file for I/O port regions.  This allows us to
avoid raw in/out commands and works with deprivileged guests via
libvirt.  For uid 0 callers, we use in/out directly to avoid any
compatibility issues.

Signed-off-by: Alex Williamson 
---

 Required kernel patch pending here:
 http://www.spinics.net/lists/linux-pci/msg09389.html

 hw/device-assignment.c |  131 
 hw/device-assignment.h |1 
 2 files changed, 99 insertions(+), 33 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2bba22f..37c1278 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -67,6 +67,28 @@ static uint32_t guest_to_host_ioport(AssignedDevRegion 
*region, uint32_t addr)
 return region->u.r_baseport + (addr - region->e_physbase);
 }
 
+static int assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
+  uint32_t addr, int len, uint32_t *val,
+  int write)
+{
+if (dev_region->region->resource_fd == -1)
+return -1;
+
+if (write) {
+if (pwrite(dev_region->region->resource_fd, val, len,
+  (addr - dev_region->e_physbase)) != len) {
+return -1;
+}
+} else {
+if (pread(dev_region->region->resource_fd, val, len,
+  (addr - dev_region->e_physbase)) != len) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
uint32_t value)
 {
@@ -77,7 +99,9 @@ static void assigned_dev_ioport_writeb(void *opaque, uint32_t 
addr,
  r_pio, (int)r_access->e_physbase,
  (unsigned long)r_access->u.r_baseport, value);
 
-outb(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 1) != 0) {
+outb(value, r_pio);
+}
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
@@ -90,7 +114,9 @@ static void assigned_dev_ioport_writew(void *opaque, 
uint32_t addr,
   r_pio, (int)r_access->e_physbase,
  (unsigned long)r_access->u.r_baseport, value);
 
-outw(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 1) != 0) {
+outw(value, r_pio);
+}
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
@@ -103,7 +129,9 @@ static void assigned_dev_ioport_writel(void *opaque, 
uint32_t addr,
  r_pio, (int)r_access->e_physbase,
   (unsigned long)r_access->u.r_baseport, value);
 
-outl(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 1) != 0) {
+outl(value, r_pio);
+}
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
@@ -112,7 +140,9 @@ static uint32_t assigned_dev_ioport_readb(void *opaque, 
uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inb(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 0) != 0) {
+value = inb(r_pio);
+}
 
 DEBUG("r_pio=%08x e_physbase=%08x r_=%08lx value=%08x\n",
   r_pio, (int)r_access->e_physbase,
@@ -127,7 +157,9 @@ static uint32_t assigned_dev_ioport_readw(void *opaque, 
uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inw(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 0) != 0) {
+value = inw(r_pio);
+}
 
 DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
   r_pio, (int)r_access->e_physbase,
@@ -142,7 +174,9 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, 
uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inl(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 0) != 0) {
+value = inl(r_pio);
+}
 
 DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
   r_pio, (int)r_access->e_physbase,
@@ -305,7 +339,7 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int 
region_num,
 DEBUG("e_phys=0x%" FMT_PCIBUS " r_baseport=%x type=0x%x len=%" FMT_PCIBUS 
" region_num=%d \n",
   addr, region->u.r_baseport, type, size, region_num);
 
-if (first_map) {
+if (first_map && region->region->resource_fd < 0) {
struct ioperm_data *data;
 
data = qemu_mallocz(sizeof(struct ioperm_data));
@@ -586,19 +620,46 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
  slow_map ? assigned_dev_iomem_map_slow
   : assigned_dev_iomem_map);
 continue;
+} else {
+/* handle port io regions */
+uint32_t val;
+int ret;
+
+/* Test kernel support for ioport resource read/write.  Old
+ * kerne

Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU

2010-07-20 Thread Zachary Amsden

On 07/19/2010 10:53 PM, Avi Kivity wrote:

On 07/19/2010 11:06 PM, Zachary Amsden wrote:

+static void tsc_khz_changed(void *data)
  {
-/* nothing */
+struct cpufreq_freqs *freq = data;
+unsigned long khz = 0;
+
+if (data)
+khz = freq->new;
+else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+khz = cpufreq_quick_get(raw_smp_processor_id());
+if (!khz)
+khz = tsc_khz;
+__get_cpu_var(cpu_tsc_khz) = khz;
  }


Do we really need to cache cpufreq_quick_get()?  If it's really
quick, why not just use it everywhere instead of caching it?  Not a
comment on this patch.





If cpufreq is compiled in, but disabled, it returns zero, so we need 
some sort of logic.


Maybe it's better to put it into cpufreq_quick_get().  Inconsistent 
APIs that appear to work are bad.




I don't think it's quite so simple; cpufreq is platform independent and 
tsc_khz is a platform specific export.  It seems cpufreq is designed to 
return zero when disabled and we're the unusual ones for wanting to use it.


Zach
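
The fallback order discussed in this thread can be sketched outside the
kernel as a plain function. This is only an illustration under stated
assumptions: pick_tsc_khz and its parameters are hypothetical names, the
cpufreq query result is passed in as a number instead of calling
cpufreq_quick_get(), and the boot-time value is assumed to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* notified_khz: new frequency from a cpufreq notification, or NULL if none
 * constant_tsc: whether the CPU advertises a constant-rate TSC
 * quick_khz:    result of a cpufreq query (0 when cpufreq is disabled)
 * boot_tsc_khz: frequency measured at boot, used as the last resort */
static unsigned long pick_tsc_khz(const unsigned long *notified_khz,
                                  bool constant_tsc,
                                  unsigned long quick_khz,
                                  unsigned long boot_tsc_khz)
{
    unsigned long khz = 0;

    assert(boot_tsc_khz != 0); /* assumption: the last resort is valid */

    if (notified_khz)
        khz = *notified_khz;   /* a notification always wins */
    else if (!constant_tsc)
        khz = quick_khz;       /* may be 0 if cpufreq is disabled */
    if (!khz)
        khz = boot_tsc_khz;    /* zero means "unknown", so fall back */
    return khz;
}
```

With no notification, a non-constant TSC and a query result of 0, the
function returns the boot-time value, which is the case the zero return of
a disabled cpufreq makes necessary.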


[BUG?] vhost assert error with < 4GB of RAM

2010-07-20 Thread Cam Macdonell
I think I've found a bug when running a guest with vhost with less
than 4GB of RAM.

If a guest has less than 4GB of RAM, then above_4g_mem_size is 0 for
this call to cpu_register_physical_memory() in pc_memory_init() from
hw/pc.c:922

#if TARGET_PHYS_ADDR_BITS > 32
cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
 ram_addr + below_4g_mem_size);
#endif

this leads to vhost_client_set_memory being called with size == 0

#3  0x004301f3 in vhost_client_set_memory (client=0x113b010,
start_addr=4294967296, size=0, phys_offset=3221225472)
at /home/cam/research/KVM/qemu-kvm/hw/vhost.c:312

which trips the assert at hw/vhost.c:312

static void vhost_client_set_memory(CPUPhysMemoryClient *client,
target_phys_addr_t start_addr,
ram_addr_t size,
ram_addr_t phys_offset)
{

..

assert(size);
...

something like the following fixes the problem but I'm not sure if
it's the proper way to handle it.

diff --git a/exec.c b/exec.c
index 5e9a5b7..991abfc 100644
--- a/exec.c
+++ b/exec.c
@@ -2592,7 +2592,9 @@ void
cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
 ram_addr_t orig_size = size;
 subpage_t *subpage;

-cpu_notify_set_memory(start_addr, size, phys_offset);
+if (size > 0) {
+cpu_notify_set_memory(start_addr, size, phys_offset);
+}

 if (phys_offset == IO_MEM_UNASSIGNED) {
 region_offset = start_addr;
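
To make the failure mode concrete, here is a small stand-alone sketch of how
a zero-sized region above 4GB can arise and how the size guard from the diff
above avoids notifying a client about it. Everything in it is illustrative:
split_ram, client_set_memory and register_memory are hypothetical names, and
the 0xe0000000 limit for RAM mapped below 4GB is an assumption that is not
stated in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed top of guest RAM mapped below 4GB; not stated in the report. */
#define BELOW_4G_LIMIT 0xe0000000ULL
/* Guest-physical address where RAM above 4GB starts (4294967296). */
#define ADDR_4G        0x100000000ULL

struct ram_split {
    uint64_t below_4g;
    uint64_t above_4g;
};

/* Split the guest RAM size into the part below and the part above 4GB. */
static struct ram_split split_ram(uint64_t ram_size)
{
    struct ram_split s;

    if (ram_size >= BELOW_4G_LIMIT) {
        s.below_4g = BELOW_4G_LIMIT;
        s.above_4g = ram_size - BELOW_4G_LIMIT;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;       /* small guests have nothing above 4GB */
    }
    return s;
}

static unsigned notify_calls;

/* Stand-in for a memory client that, like the vhost one, rejects size 0. */
static void client_set_memory(uint64_t start, uint64_t size)
{
    (void)start;
    assert(size);
    notify_calls++;
}

/* Registration with the same guard as the proposed diff. */
static void register_memory(uint64_t start, uint64_t size)
{
    if (size > 0) {
        client_set_memory(start, size);
    }
}
```

For a 3GB guest, split_ram() reports nothing above 4GB, so
register_memory(ADDR_4G, 0) returns without calling the client and the
assert is never reached.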


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-20 Thread Greg KH
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszyński wrote:
> On 16 July 2010 23:58, Tom Lyon  wrote:
> > The VFIO "driver" is used to allow privileged AND non-privileged processes to
> > implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> > devices.
> 
> Thanks for working on that! I wonder whether it's possible to say what
> are the chances of it being merged to mainline and which version we
> might be talking about?

We still have a long way to go before you need to worry about what
kernel version it's going to show up in...

thanks,

greg k-h


Re: Swap usage with KVM

2010-07-20 Thread David Weber

> Yes, we are using Virtio drivers for networking and storage in both VMs
> with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64 from
> Lenny Backports repositories. For VMHost, we are using a stable version
> of KVM with Linux 2.6.32.12 compiled from source code of kernel.org and
> qemu-kvm 0.12.3 compiled with the source code obtained from the official
> site of KVM.
> 

AFAIK this should be the following bug:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599

Try upgrading to 0.12.4, or backport this commit:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=012d4869c1eb195e83f159ed7b2bced33f37f960

David


Re: [Qemu-devel] KVM call minutes for July 20

2010-07-20 Thread Anthony Liguori

On 07/20/2010 11:29 AM, Aurelien Jarno wrote:

It's a pity I can't easily attend this conference call, as it seems
a lot of decisions are taken there. Anyway, let me comment on the part
concerning 0.12 stable:
   


Is it a matter of time zone or conflict?  The call has historically been 
centered around KVM issues but these days it's hard to make such a clear 
distinction..



On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:
   

0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?
 

I think someone should actively follow the patches committed to HEAD and
backport them when they seem to be stable material. I guess that's what
Justin plans to do.

OTOH, it might be useful if people sending patches to HEAD add a small
comment about cherry-picking the patch to stable if it applies.
   


My big concern with -stable is testing.  For folks interested in helping 
out, what I'd really like to see is people explicitly testing their 
patches on -stable.  IOW, just saying "this is probably stable material" 
is not nearly as helpful as saying, "I've verified this cherry picks 
cleanly to stable and tested there."



- anthony's concern with -stable is the testing (upstream tree gets more
   testing than -stable)
 

Debian gets regular uploads with the contents of the -stable tree
between two releases. Also patches from trunk are all cherry-picked from
HEAD.
   


That's good to know.  My main point was that proportionately speaking, 
the master branch gets considerably more testing than the stable 
branch.  Considering that there is a higher expectation of stable too, 
the testing requirement for it is pretty high in my opinion.


Regards,

Anthony Liguori


- 0.12.5?
   - planning to do next w/ 0.13 release
   - aurelien may cut a release
 

Following the minutes from last week, I sent a call for release, with a
deadline today. I only got the patch series from Kevin. There are
currently 44 patches waiting in the stable tree, so I guess we can go
for a release. I plan to do that later this week if nobody opposes.

   




Re: KVM call minutes for July 20

2010-07-20 Thread David S. Ahern


On 07/20/10 08:45, Chris Wright wrote:
> 0.13
> - rc RSN (hopefully this week, top priority for anthony)

Can Cam's inter-vm shared memory device get committed for 0.13? It's
been stagnant on the list for a while now waiting for inclusion (or NAK
comments).

David


Re: [Qemu-devel] KVM call minutes for July 20

2010-07-20 Thread Aurelien Jarno
It's a pity I can't easily attend this conference call, as it seems
a lot of decisions are taken there. Anyway, let me comment on the part
concerning 0.12 stable:

On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:
> 0.12.stable
> - start w/ git tree + pull requests
> - release process is separate from commit access
> - justin will put up a tree for pull requests
> - there's current backlog, what about that?

I think someone should actively follow the patches committed to HEAD and
backport them when they seem to be stable material. I guess that's what
Justin plans to do.

OTOH, it might be useful if people sending patches to HEAD add a small
comment about cherry-picking the patch to stable if it applies.

> - anthony's concern with -stable is the testing (upstream tree gets more
>   testing than -stable)

Debian gets regular uploads with the contents of the -stable tree
between two releases. Also patches from trunk are all cherry-picked from
HEAD.

> - 0.12.5?
>   - planning to do next w/ 0.13 release
>   - aurelien may cut a release

Following the minutes from last week, I sent a call for release, with a
deadline today. I only got the patch series from Kevin. There are
currently 44 patches waiting in the stable tree, so I guess we can go
for a release. I plan to do that later this week if nobody opposes.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Michael Goldish
On 07/20/2010 04:44 PM, Amos Kong wrote:
> On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote:
>>
> 
> Michael,
> 
> Thanks for your comments. Let's simplify this method together.
> 
>> On 07/20/2010 04:34 AM, Amos Kong wrote:
>>> The old method uses the mac address from the configuration files, which
>>> could lead to serious problems when multiple tests run on different hosts.
>>>
>>> This patch adds a new macaddress pool algorithm. It generates the mac prefix
>>> based on the mac address of the host, which could eliminate duplicated mac
>>> addresses between machines.
>>>
>>> When the user has set mac_prefix in the configuration file, we should use it
>>> instead of the dynamically generated mac prefix.
>>>
>>> Other changes:
>>> . Fix random mac address generation so that it corresponds to IEEE802.
>>> . Update clone function to decide whether to clone the mac address or not.
>>> . Update get_macaddr function.
>>> . Add set_mac_address function.
>>>
>>> New auto mac address pool algorithm:
>>> If address_index is defined, the VM will get its mac from the config file and
>>> then record it into address_pool. If address_index is not defined, the VM will
>>> call get_mac_from_pool to auto-create a mac and then record it into
>>> address_pool in the following format:
>>> {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
>>>
>>>   AE:9D:94:6A:9b:f9: mac address
>>>   20100310-165222-Wt7l : instance attribute of VM
>>>   0: index of NIC
>>
>> Why do you use the mac address as a key, instead of the instance string
>> + nic index?  When the mac address is used as a key, each key has a list
>> of values instead of just one value.  This order seems unnatural.  If it
>> were the other way around (i.e. key = VM instance + nic index, value =
>> mac address), then each key would have exactly one value, and I think
>> this patch would be shorter and simpler.
> 
> One mac address may be used by two VMs, e.g. during migration.

Sure, that's why I thought the opposite direction would be better: keys
= VMs (nics), values = mac addresses.  That way we have one value per
key, instead of a list of values per key.

To clarify, instead of using:

{'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0',
'20100310-165222-Wt7l:1', '20100310-165222-Wt7l:2']}

I suggest:

{'20100310-165222-Wt7l:0': 'AE:9D:94:6A:9b:f9',
 '20100310-165222-Wt7l:1': 'AE:9D:94:6A:9b:f9',
 '20100310-165222-Wt7l:2': 'AE:9D:94:6A:9b:f9'}
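
A minimal sketch of the difference between the two layouts, for
illustration only. The helper name invert_pool is hypothetical and is not
part of the patch; the keys and the mac address are the sample values used
above.

```python
def invert_pool(mac_to_nics):
    """Turn {mac: [vm:nic, ...]} into {vm:nic: mac} (hypothetical helper)."""
    nic_to_mac = {}
    for mac, nics in mac_to_nics.items():
        for nic in nics:
            nic_to_mac[nic] = mac
    return nic_to_mac


# Layout used by the patch: one mac address maps to a list of NICs.
old_layout = {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0',
                                    '20100310-165222-Wt7l:1']}

# Suggested layout: each NIC maps to exactly one mac address.
new_layout = invert_pool(old_layout)

# Looking up the mac of a NIC is now a plain dictionary access.
mac = new_layout['20100310-165222-Wt7l:0']
```

With the inverted layout there is no list to search or append to, and a
migrated VM simply shows up as a second key with the same value.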

>>> Signed-off-by: Jason Wang 
>>> Signed-off-by: Feng Yang 
>>> Signed-off-by: Amos Kong 
>>> ---
>>>  0 files changed, 0 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
>>> index fb2d1c2..7c0946e 100644
>>> --- a/client/tests/kvm/kvm_utils.py
>>> +++ b/client/tests/kvm/kvm_utils.py
>>> @@ -5,6 +5,7 @@ KVM test utility functions.
>>>  """
>>>  
>>>  import time, string, random, socket, os, signal, re, logging, commands, cPickle
>>> +import fcntl, shelve
>>>  from autotest_lib.client.bin import utils
>>>  from autotest_lib.client.common_lib import error, logging_config
>>>  import kvm_subprocess
>>> @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
>>>  
>>>  # Functions related to MAC/IP addresses
>>>  
>>> +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):
>>
>> The name of this function is confusing because it does the exact
>> opposite: it puts a mac address in address_pool.  Maybe the pool you're
>> referring to in the name isn't address_pool, but still a less confusing
>> name should probably be used.
> 
> How about allocate_mac(...) ?
> address_pool -> address_container
> 
> Allocate mac address and record into address_container.

Yes, something like that, sounds less confusing.

>>> +"""
>>> +random generated mac address.
>>> +
>>> +1) First try to generate macaddress based on the mac address prefix.
>>> +2) And then try to use total random generated mac address.
>>> +
>>> +@param root_dir: Root dir for kvm
>>> +@param vm: Here we use instance of vm
>>> +@param nic_index: The index of nic.
>>> +@param prefix: Prefix of mac address.
>>> +@Return: Return mac address.
>>> +"""
>>> +
>>> +lock_filename = os.path.join(root_dir, "mac_lock")
>>> +lock_file = open(lock_filename, 'w')
>>> +fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
>>> +mac_filename = os.path.join(root_dir, "address_pool")
>>
>> Maybe it makes sense to put address_pool and the lock file in /tmp,
>> where they can be shared by more than a single autotest instance running
>> on the same host (unlikely, but theoretically possible).
> 
> good idea.
>  
>>> +mac_shelve = shelve.open(mac_filename, writeback=False)
>>> +
>>> +mac_pool = mac_shelve.get("macpool")
>>
>> Why is this 'macpool' needed?  Why not put the keys directly in the
>> shelve object?
>  
> Yes, putting the keys directly in the shelve object is better.
> 
>>> +if not mac_pool:
>>> +mac_pool = {}
>>> +found = False
>>> +
>>> +v

Re: Swap usage with KVM

2010-07-20 Thread Daniel Bareiro
On Sunday, 11 July 2010 19:08:58 -0300,
Daniel Bareiro wrote:

> > > I have an installation with Debian GNU/Linux 5.0.4 amd64 with
> > > qemu-kvm 0.12.3 compiled with the source code obtained from the
> > > official site of KVM and Linux 2.6.32.12 compiled from source code
> > > of kernel.org. All this is installed on an HP Proliant DL380 G6
> > > with two Xeon E5530 quadcore processors and 16 GiB of RAM which
> > > has two VMs with the following configuration of memory:

> > Are you using virtio drivers in the VMs?
> > 
> > There was an issue with KVM-72 and virtio that leaks memory in the
> > host until all RAM and swap is used (inside the VMs, no swap is
> > used). It was supposed to be fixed in KVM-80-something, though.
> > 
> > Perhaps something similar is happening again?  If you switch the
> > disks to scsi instead of virtio, does the problem go away?
> > 
> > We are running KVM-72 on Debian 5.0 and have run into this issue.
> > We'll be upgrading our hosts this month to fix this.

> Yes, we are using Virtio drivers for networking and storage in both
> VMs with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64
> from Lenny Backports repositories. For VMHost, we are using a stable
> version of KVM with Linux 2.6.32.12 compiled from source code of
> kernel.org and qemu-kvm 0.12.3 compiled with the source code obtained
> from the official site of KVM.
> 
> This is the syntax I'm using to boot the virtual machines:
> 
> 
>  8587 ?Sl   6515:25 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive
> file=/dev/vm/aps4-raiz,cache=none,if=virtio,boot=on -drive
> file=/dev/vm/aps4-cache,cache=none,if=virtio -drive 
> file=/dev/vm/aps4-index,cache=none,if=virtio
> -drive file=/dev/vm/aps4-space,cache=none,if=virtio -m 7168 -smp 4 -net
> nic,model=virtio,macaddr=00:16:3e:00:00:95 -net tap -daemonize -vnc :3 -k es 
> -localtime -monitor
> telnet:localhost:4003,server,nowait -serial 
> telnet:localhost:4043,server,nowait
> 
>  9769 ?Rl   11968:47 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive
> file=/dev/vm/leela-raiz,cache=none,if=virtio,boot=on -drive
> file=/dev/vm/leela-u01,cache=none,if=virtio -drive 
> file=/dev/vm/leela-u02,cache=none,if=virtio
> -drive file=/dev/vm/leela-u03,cache=none,if=virtio -drive
> file=/dev/vm/leela-u04,cache=none,if=virtio -drive 
> file=/dev/vm/leela-u05,cache=none,if=virtio
> -drive file=/dev/vm/leela-u06,cache=none,if=virtio -drive
> file=/dev/vm/leela-u07,cache=none,if=virtio -drive 
> file=/dev/vm/leela-u08,cache=none,if=virtio
> -drive file=/dev/vm/leela-u09,cache=none,if=virtio -drive
> file=/dev/vm/leela-space,cache=none,if=virtio -m 7168 -smp 8 -net
> nic,model=virtio,macaddr=00:16:3e:00:00:96 -net tap -daemonize -vnc :4 -k es 
> -localtime -monitor
> telnet:localhost:4004,server,nowait -serial 
> telnet:localhost:4044,server,nowait

> To make the switch from Virtio to SCSI I would have to shut down the
> hosts, which would not be a good idea since they are two production
> systems. At the very least, before doing so I would like to be sure
> of what the problem might be.
> 
> Taking a current measurement in VMHost with free, I got the following:
> 
> 
> ss04:~# free
>              total       used       free     shared    buffers     cached
> Mem:      16461588   16406504      55084          0       2920      21504
> -/+ buffers/cache:   16382080      79508
> Swap:      2028492     983140    1045352
> 
> 
> It catches my attention that, although I initially planned to leave a
> margin of 2 GB of RAM for the VMHost, it has already used almost half
> of its swap.

This is a current measurement I've taken in both the VMs and in VMHost:

* VMHost:


ss04:~# free
             total       used       free     shared    buffers     cached
Mem:      16461588   16405140      56448          0       3496      18604
-/+ buffers/cache:   16383040      78548
Swap:      5174220    2401552    2772668


* Aps4:

aps4:~# free
             total       used       free     shared    buffers     cached
Mem:       7164300    7120192      44108          0      23108     239076
-/+ buffers/cache:    6858008     306292
Swap:      2931820      14084    2917736


* Leela:

leela:~# free
             total       used       free     shared    buffers     cached
Mem:       7163836    6905224     258612          0     123380    6282816
-/+ buffers/cache:     499028    6664808
Swap:       979924      35640     944284


As you can see, I added more swap in VMHost for more margin, but
currently only 54% is free.
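
For reference, the percentage can be rechecked from the VMHost swap line
in the free output above. This is just the arithmetic written out in
Python; the variable names are only illustrative.

```python
# Swap figures (in KiB) from the VMHost `free` output.
swap_total = 5174220
swap_used = 2401552
swap_free = swap_total - swap_used

# Share of swap that is still free, as a percentage.
percent_free = 100.0 * swap_free / swap_total
print("swap free: %d KiB (%.1f%%)" % (swap_free, percent_free))
```

The result rounds to the 54% mentioned above.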



Thanks in advance for your replies.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598




KVM call minutes for July 20

2010-07-20 Thread Chris Wright
0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?
- anthony's concern with -stable is the testing (upstream tree gets more
  testing than -stable)
- 0.12.5?
  - planning to do next w/ 0.13 release
  - aurelien may cut a release
  - justin will do some sanity testing, most patches are in fedora anyway

0.13
- rc RSN (hopefully this week, top priority for anthony)

kvm testsuite
- was planning to clean up and contribute to qemu
- now thinking perhaps just split it out to its own repo
  - not really qemu code, not really kvm code, not cross compile, etc..
  - could use std serial device
  - could use vga (needs mmio space)
- would like to add nested svm and (more important) nested vmx
  - small bit to copy l1 to l2 state, to make guest nested
  - need framework, can then require nested patches come w/ regression tests
- current testsuite failing on qemu (shows softmmu issues, any takers?)

fw_cfg issues
- mostly on list
- concerns about dma interface (too close to use case specific hack)
- rep could be optimized in general
  - each byte == function call
- possible pull in 4k (instead of 1k) on each exit
- bar for changes should be no new interfaces


Re: does sidt get correct start address of IDT in guest?

2010-07-20 Thread Avi Kivity

On 07/20/2010 05:04 PM, 吴忠远 wrote:

> In the guest OS, a module containing the sidt instruction was executed
> to get the start address of the IDT. Does this return the correct
> address of the IDT in the guest OS? Thanks.

Yes.

--
error compiling committee.c: too many arguments to function



does sidt get correct start address of IDT in guest?

2010-07-20 Thread 吴忠远
In the guest OS, a module containing the sidt instruction was executed
to get the start address of the IDT. Does this return the correct
address of the IDT in the guest OS? Thanks.


Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Amos Kong
On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote:
>

Michael,

Thanks for your comments. Let's simplify this method together.

> On 07/20/2010 04:34 AM, Amos Kong wrote:
> > The old method uses the mac address from the configuration files, which
> > could lead to serious problems when multiple tests run on different hosts.
> >
> > This patch adds a new macaddress pool algorithm. It generates the mac prefix
> > based on the mac address of the host, which could eliminate duplicated mac
> > addresses between machines.
> >
> > When the user has set mac_prefix in the configuration file, we should use it
> > instead of the dynamically generated mac prefix.
> >
> > Other changes:
> > . Fix random mac address generation so that it corresponds to IEEE802.
> > . Update clone function to decide whether to clone the mac address or not.
> > . Update get_macaddr function.
> > . Add set_mac_address function.
> >
> > New auto mac address pool algorithm:
> > If address_index is defined, the VM will get its mac from the config file and
> > then record it into address_pool. If address_index is not defined, the VM will
> > call get_mac_from_pool to auto-create a mac and then record it into
> > address_pool in the following format:
> > {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
> > 
> >   AE:9D:94:6A:9b:f9: mac address
> >   20100310-165222-Wt7l : instance attribute of VM
> >   0: index of NIC
> 
> Why do you use the mac address as a key, instead of the instance string
> + nic index?  When the mac address is used as a key, each key has a list
> of values instead of just one value.  This order seems unnatural.  If it
> were the other way around (i.e. key = VM instance + nic index, value =
> mac address), then each key would have exactly one value, and I think
> this patch would be shorter and simpler.

One mac address may be used by two VMs, e.g. during migration.
 
> > Signed-off-by: Jason Wang 
> > Signed-off-by: Feng Yang 
> > Signed-off-by: Amos Kong 
> > ---
> >  0 files changed, 0 insertions(+), 0 deletions(-)
> > 
> > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> > index fb2d1c2..7c0946e 100644
> > --- a/client/tests/kvm/kvm_utils.py
> > +++ b/client/tests/kvm/kvm_utils.py
> > @@ -5,6 +5,7 @@ KVM test utility functions.
> >  """
> >  
> >  import time, string, random, socket, os, signal, re, logging, commands, cPickle
> > +import fcntl, shelve
> >  from autotest_lib.client.bin import utils
> >  from autotest_lib.client.common_lib import error, logging_config
> >  import kvm_subprocess
> > @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
> >  
> >  # Functions related to MAC/IP addresses
> >  
> > +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):
> 
> The name of this function is confusing because it does the exact
> opposite: it puts a mac address in address_pool.  Maybe the pool you're
> referring to in the name isn't address_pool, but still a less confusing
> name should probably be used.

How about allocate_mac(...) ?
address_pool -> address_container

Allocate mac address and record into address_container.

 
> > +"""
> > +random generated mac address.
> > +
> > +1) First try to generate macaddress based on the mac address prefix.
> > +2) And then try to use total random generated mac address.
> > +
> > +@param root_dir: Root dir for kvm
> > +@param vm: Here we use instance of vm
> > +@param nic_index: The index of nic.
> > +@param prefix: Prefix of mac address.
> > +@Return: Return mac address.
> > +"""
> > +
> > +lock_filename = os.path.join(root_dir, "mac_lock")
> > +lock_file = open(lock_filename, 'w')
> > +fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
> > +mac_filename = os.path.join(root_dir, "address_pool")
> 
> Maybe it makes sense to put address_pool and the lock file in /tmp,
> where they can be shared by more than a single autotest instance running
> on the same host (unlikely, but theoretically possible).

good idea.
 
> > +mac_shelve = shelve.open(mac_filename, writeback=False)
> > +
> > +mac_pool = mac_shelve.get("macpool")
> 
> Why is this 'macpool' needed?  Why not put the keys directly in the
> shelve object?
 
Yes, putting the keys directly in the shelve object is better.
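
Putting the review comments so far together (rename to allocate_mac, files
in a shared temp directory, keys stored directly in the shelve, key = VM
instance + nic index), the function might look roughly like the sketch
below. This is not the code from the patch: it is written for Python 3
while kvm_utils.py targets Python 2, the pool_dir parameter is an
assumption added so the location can be overridden, and it does not handle
an exhausted address range.

```python
import fcntl
import os
import random
import shelve
import tempfile


def allocate_mac(vm, nic_index, prefix='00:11:22:33:', pool_dir=None):
    """Return the mac recorded for vm:nic_index, allocating one if needed."""
    # pool_dir is a hypothetical parameter; it defaults to the temp dir.
    pool_dir = pool_dir or tempfile.gettempdir()
    key = "%s:%s" % (vm, nic_index)
    lock_file = open(os.path.join(pool_dir, "mac_lock"), 'w')
    # Serialize access between processes sharing the same pool directory.
    fcntl.lockf(lock_file.fileno(), fcntl.LOCK_EX)
    try:
        pool = shelve.open(os.path.join(pool_dir, "address_pool"))
        try:
            if key in pool:
                # This NIC already has an address recorded; reuse it.
                return pool[key]
            used = set(pool.values())
            while True:
                mac = prefix + "%02x:%02x" % (random.randint(0x00, 0xfe),
                                              random.randint(0x00, 0xfe))
                if mac not in used:
                    break
            pool[key] = mac
            return mac
        finally:
            pool.close()
    finally:
        fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
        lock_file.close()
```

Calling allocate_mac("20100310-165222-Wt7l", 0) twice returns the same
address, because the second call finds the key already in the shelve.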

> > +if not mac_pool:
> > +mac_pool = {}
> > +found = False
> > +
> > +val = "%s:%s" % (vm, nic_index)
> > +for key in mac_pool.keys():
> > +if val in mac_pool[key]:
> > +mac_pool[key].append(val)
> 
> Why append val to mac_pool[key] if val is already in mac_pool[key]?

Right, we need to drop it.

> > +found = True
> > +mac = key
> > +
> > +while not found:
> > +postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
> > +random.randint(0x00,0xfe))
> > +mac = prefix + postfix
> > +mac_list = mac.split(":")
> > +# Clear multicast bit
> > +mac_list[0] = int(mac_list[0],16) 

[PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity
Change the interrupt injection code to work from preemptible, interrupts
enabled context.  This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   12 
 arch/x86/kvm/vmx.c  |   11 +++
 arch/x86/kvm/x86.c  |   27 ++-
 4 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..5dd797c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_x86_ops {
void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code,
bool reinject);
+   void (*cancel_injection)(struct kvm_vcpu *vcpu);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 56c9b6b..46d068e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3135,6 +3135,17 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
}
 }
 
+static void svm_cancel_injection(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   struct vmcb_control_area *control = &svm->vmcb->control;
+
+   control->exit_int_info = control->event_inj;
+   control->exit_int_info_err = control->event_inj_err;
+   control->event_inj = 0;
+   svm_complete_interrupts(svm);
+}
+
 #ifdef CONFIG_X86_64
 #define R "r"
 #else
@@ -3493,6 +3504,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_irq = svm_set_irq,
.set_nmi = svm_inject_nmi,
.queue_exception = svm_queue_exception,
+   .cancel_injection = svm_cancel_injection,
.interrupt_allowed = svm_interrupt_allowed,
.nmi_allowed = svm_nmi_allowed,
.get_nmi_mask = svm_get_nmi_mask,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 53b6fc0..72381b7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3906,6 +3906,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
  IDT_VECTORING_ERROR_CODE);
 }
 
+static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
+{
+   __vmx_complete_interrupts(to_vmx(vcpu),
+ vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+ VM_ENTRY_INSTRUCTION_LEN,
+ VM_ENTRY_EXCEPTION_ERROR_CODE);
+
+   vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
@@ -4360,6 +4370,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_irq = vmx_inject_irq,
.set_nmi = vmx_inject_nmi,
.queue_exception = vmx_queue_exception,
+   .cancel_injection = vmx_cancel_injection,
.interrupt_allowed = vmx_interrupt_allowed,
.nmi_allowed = vmx_nmi_allowed,
.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84bfb51..1040d3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
 
+   inject_pending_event(vcpu);
+
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);
+
+   if (kvm_lapic_enabled(vcpu)) {
+   update_cr8_intercept(vcpu);
+   kvm_lapic_sync_to_vapic(vcpu);
+   }
+
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
smp_wmb();
local_irq_enable();
preempt_enable();
+   kvm_x86_ops->cancel_injection(vcpu);
r = 1;
goto out;
}
 
-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
-
-   if (kvm_lapic_enabled(vcpu)) {
-   update_cr8_intercept(vcpu);
-   kvm_lapic_sync_to_vapic(vcpu);
-   }
-
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
kvm_guest_enter();
-- 
1.7.1


[PATCH v2 1/3] KVM: VMX: Split up vmx_complete_interrupts()

2010-07-20 Thread Avi Kivity
vmx_complete_interrupts() does too much, split it up:
 - vmx_vcpu_run() gets the "cache important vmcs fields" part
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   39 +++
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fdcc98..1a35964 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,6 +125,7 @@ struct vcpu_vmx {
unsigned long host_rsp;
int   launched;
u8fail;
+   u32   exit_intr_info;
u32   idt_vectoring_info;
struct shared_msr_entry *guest_msrs;
int   nmsrs;
@@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
vmcs_write32(TPR_THRESHOLD, irr);
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
-   u32 exit_intr_info;
-   u32 idt_vectoring_info = vmx->idt_vectoring_info;
-   bool unblock_nmi;
-   u8 vector;
-   int type;
-   bool idtv_info_valid;
-
-   exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
-   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   u32 exit_intr_info = vmx->exit_intr_info;
 
/* Handle machine checks before interrupts are enabled */
if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
asm("int $2");
kvm_after_handle_nmi(&vmx->vcpu);
}
+}
 
-   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
+{
+   u32 exit_intr_info = vmx->exit_intr_info;
+   bool unblock_nmi;
+   u8 vector;
+   bool idtv_info_valid;
+
+   idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
if (cpu_has_virtual_nmis()) {
unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
@@ -3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
} else if (unlikely(vmx->soft_vnmi_blocked))
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
+}
+
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+   u32 idt_vectoring_info = vmx->idt_vectoring_info;
+   u8 vector;
+   int type;
+   bool idtv_info_valid;
+
+   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
vmx->vcpu.arch.nmi_injected = false;
kvm_clear_exception_queue(&vmx->vcpu);
@@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
vmx->launched = 1;
 
+   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+   vmx_complete_atomic_exit(vmx);
+   vmx_recover_nmi_blocking(vmx);
vmx_complete_interrupts(vmx);
 }
 
-- 
1.7.1



[PATCH v2 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry

2010-07-20 Thread Avi Kivity
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues.  Make it able to decode
the information from the entry fields as well by parametrizing it.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a35964..53b6fc0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
+ u32 idt_vectoring_info,
+ int instr_len_field,
+ int error_code_field)
 {
-   u32 idt_vectoring_info = vmx->idt_vectoring_info;
u8 vector;
int type;
bool idtv_info_valid;
@@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
break;
case INTR_TYPE_SOFT_EXCEPTION:
vmx->vcpu.arch.event_exit_inst_len =
-   vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   vmcs_read32(instr_len_field);
/* fall through */
case INTR_TYPE_HARD_EXCEPTION:
if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-   u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+   u32 err = vmcs_read32(error_code_field);
kvm_queue_exception_e(&vmx->vcpu, vector, err);
} else
kvm_queue_exception(&vmx->vcpu, vector);
break;
case INTR_TYPE_SOFT_INTR:
vmx->vcpu.arch.event_exit_inst_len =
-   vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   vmcs_read32(instr_len_field);
/* fall through */
case INTR_TYPE_EXT_INTR:
kvm_queue_interrupt(&vmx->vcpu, vector,
@@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
}
 }
 
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+   __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
+ VM_EXIT_INSTRUCTION_LEN,
+ IDT_VECTORING_ERROR_CODE);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
-- 
1.7.1



[PATCH v2 0/3] Nonatomic interrupt injection

2010-07-20 Thread Avi Kivity
This patchset changes interrupt injection to be done from normal process
context instead of interrupts disabled context.  This is useful for real
mode interrupt injection on Intel without the current hacks (injecting as
a software interrupt of a vm86 task), reducing latencies, and later, for
allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest()
instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has
already taken place.
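
The inject-then-cancel idea can be illustrated with a toy model. This is
not kernel code and does not mirror the real data structures: ToyVcpu and
its fields are invented for illustration, and only the method names
inject_pending_event and cancel_injection echo the names used in the
series.

```python
from collections import deque


class ToyVcpu:
    """Toy stand-in for a vcpu; not kernel code."""

    def __init__(self):
        self.pending = deque()   # events waiting to be injected
        self.injected = None     # event programmed for the next entry

    def inject_pending_event(self):
        # Program the next pending event for delivery on guest entry.
        if self.injected is None and self.pending:
            self.injected = self.pending.popleft()

    def cancel_injection(self):
        # Undo the injection: put the event back at the head of the queue.
        if self.injected is not None:
            self.pending.appendleft(self.injected)
            self.injected = None

    def enter_guest(self, entry_aborted):
        # Injection happens first, before the point of no return.
        self.inject_pending_event()
        if entry_aborted:
            # The entry was abandoned, so the injection must be undone.
            self.cancel_injection()
            return None
        delivered, self.injected = self.injected, None
        return delivered
```

An aborted entry leaves the queue exactly as it was, so the same event is
delivered on the next successful entry.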

v2: svm support (easier than expected)
fix silly vmx warning

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and
entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   12 +++
 arch/x86/kvm/vmx.c  |   65 ++-
 arch/x86/kvm/x86.c  |   27 
 4 files changed, 77 insertions(+), 28 deletions(-)



Re: [Qemu-devel] Re: KVM call agenda for July 20

2010-07-20 Thread Luiz Capitulino
On Tue, 20 Jul 2010 09:07:11 +0300
Avi Kivity  wrote:

> On 07/20/2010 12:46 AM, Chris Wright wrote:
> > Please send in any agenda items you are interested in covering.
> >
> >
>   Last week's agenda, minus the item that we started to discuss.

(includes 0.13)


Re: Allow a user to stop and start one guest VM

2010-07-20 Thread Daniel P. Berrange
On Tue, Jul 20, 2010 at 08:01:15AM -0500, Neil Aggarwal wrote:
> Hello:
> 
> One of my customers asked for access to stop and start
> their guest VM.  
> 
> Right now, I can do that using virsh, but I do not want
> to give this customer the ability to stop and start
> all VMs running on the host.
> 
> Is there a way to give stop and start control of one
> VM to someone?

Fine-grained, role-based access control is not available at the
libvirt/virsh level. It is currently something that must be
provided by the management layer above libvirt. We intend to
add this capability directly into libvirt in the future, but
there's no firm ETA. So in the immediate term you'd need to
write a small tool using libvirt APIs to delegate stop/start
operations to the users you choose.
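
A rough sketch of such a tool, using the libvirt Python bindings. The
ALLOWED table, the user and domain names, and the function names are all
made up for illustration. How the tool itself is run with enough privilege
to talk to libvirt (a sudo rule, a small daemon, etc.) is left out and is
an assumption you would have to fill in for your setup.

```python
# Hypothetical mapping: user name -> set of domain names they may control.
ALLOWED = {"alice": {"alice-vm"}}


def control_domain(conn, user, domain_name, action, allowed=ALLOWED):
    """Start or stop one domain on behalf of `user`, if permitted."""
    if domain_name not in allowed.get(user, set()):
        raise PermissionError("%s may not control %s" % (user, domain_name))
    dom = conn.lookupByName(domain_name)
    if action == "start":
        dom.create()       # boot a defined, inactive domain
    elif action == "stop":
        dom.shutdown()     # ask the guest to shut down cleanly
    else:
        raise ValueError("unknown action: %r" % action)


def main(user, domain_name, action):
    import libvirt  # libvirt Python bindings, only needed for real use
    conn = libvirt.open("qemu:///system")
    try:
        control_domain(conn, user, domain_name, action)
    finally:
        conn.close()
```

The permission check happens before any libvirt call, so a user who is not
listed for a domain never reaches lookupByName for it.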


Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Allow a user to stop and start one guest VM

2010-07-20 Thread Neil Aggarwal
Hello:

One of my customers asked for access to stop and start
their guest VM.  

Right now, I can do that using virsh, but I do not want
to give this customer the ability to stop and start
all VMs running on the host.

Is there a way to give stop and start control of one
VM to someone?

I am using KVM on a CentOS 5.5 host.

Thanks,
Neil

--
Neil Aggarwal, (281)846-8957
FREE trial: Virtualmin VPS with unmetered bandwidth
http://UnmeteredVPS.net/virtualmin



[PATCH 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry

2010-07-20 Thread Avi Kivity
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues.  Make it able to decode
the information from the entry fields as well by parametrizing it.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a35964..53b6fc0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
+ u32 idt_vectoring_info,
+ int instr_len_field,
+ int error_code_field)
 {
-   u32 idt_vectoring_info = vmx->idt_vectoring_info;
u8 vector;
int type;
bool idtv_info_valid;
@@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
break;
case INTR_TYPE_SOFT_EXCEPTION:
vmx->vcpu.arch.event_exit_inst_len =
-   vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   vmcs_read32(instr_len_field);
/* fall through */
case INTR_TYPE_HARD_EXCEPTION:
if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-   u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+   u32 err = vmcs_read32(error_code_field);
kvm_queue_exception_e(&vmx->vcpu, vector, err);
} else
kvm_queue_exception(&vmx->vcpu, vector);
break;
case INTR_TYPE_SOFT_INTR:
vmx->vcpu.arch.event_exit_inst_len =
-   vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   vmcs_read32(instr_len_field);
/* fall through */
case INTR_TYPE_EXT_INTR:
kvm_queue_interrupt(&vmx->vcpu, vector,
@@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
}
 }
 
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+   __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
+ VM_EXIT_INSTRUCTION_LEN,
+ IDT_VECTORING_ERROR_CODE);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
-- 
1.7.1



[PATCH 1/3] KVM: VMX: Split up vmx_complete_interrupts()

2010-07-20 Thread Avi Kivity
vmx_complete_interrupts() does too much; split it up:
 - vmx_vcpu_run() gets the "cache important vmcs fields" part
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   39 +++
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fdcc98..1a35964 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,6 +125,7 @@ struct vcpu_vmx {
unsigned long host_rsp;
int   launched;
u8fail;
+   u32   exit_intr_info;
u32   idt_vectoring_info;
struct shared_msr_entry *guest_msrs;
int   nmsrs;
@@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
vmcs_write32(TPR_THRESHOLD, irr);
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
-   u32 exit_intr_info;
-   u32 idt_vectoring_info = vmx->idt_vectoring_info;
-   bool unblock_nmi;
-   u8 vector;
-   int type;
-   bool idtv_info_valid;
-
-   exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
-   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   u32 exit_intr_info = vmx->exit_intr_info;
 
/* Handle machine checks before interrupts are enabled */
if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
asm("int $2");
kvm_after_handle_nmi(&vmx->vcpu);
}
+}
 
-   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
+{
+   u32 exit_intr_info = vmx->exit_intr_info;
+   bool unblock_nmi;
+   u8 vector;
+   bool idtv_info_valid;
+
+   idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
if (cpu_has_virtual_nmis()) {
unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
@@ -3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
} else if (unlikely(vmx->soft_vnmi_blocked))
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
+}
+
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+   u32 idt_vectoring_info = vmx->idt_vectoring_info;
+   u8 vector;
+   int type;
+   bool idtv_info_valid;
+
+   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
vmx->vcpu.arch.nmi_injected = false;
kvm_clear_exception_queue(&vmx->vcpu);
@@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
vmx->launched = 1;
 
+   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+   vmx_complete_atomic_exit(vmx);
+   vmx_recover_nmi_blocking(vmx);
vmx_complete_interrupts(vmx);
 }
 
-- 
1.7.1



[PATCH 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity
Change the interrupt injection code to work from preemptible,
interrupts-enabled context.  This works by adding a ->cancel_injection()
operation that undoes an injection in case we were not able to actually enter
the guest (this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |   10 ++
 arch/x86/kvm/x86.c  |   27 ++-
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..5dd797c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_x86_ops {
void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code,
bool reinject);
+   void (*cancel_injection)(struct kvm_vcpu *vcpu);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 53b6fc0..a039af2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3906,6 +3906,15 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
  IDT_VECTORING_ERROR_CODE);
 }
 
+static void vmx_cancel_injection(struct vcpu_vmx *vmx)
+{
+   __vmx_complete_interrupts(vmx, vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+ VM_ENTRY_INSTRUCTION_LEN,
+ VM_ENTRY_EXCEPTION_ERROR_CODE);
+
+   vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
@@ -4360,6 +4369,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_irq = vmx_inject_irq,
.set_nmi = vmx_inject_nmi,
.queue_exception = vmx_queue_exception,
+   .cancel_injection = vmx_cancel_injection,
.interrupt_allowed = vmx_interrupt_allowed,
.nmi_allowed = vmx_nmi_allowed,
.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84bfb51..1040d3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
 
+   inject_pending_event(vcpu);
+
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);
+
+   if (kvm_lapic_enabled(vcpu)) {
+   update_cr8_intercept(vcpu);
+   kvm_lapic_sync_to_vapic(vcpu);
+   }
+
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
smp_wmb();
local_irq_enable();
preempt_enable();
+   kvm_x86_ops->cancel_injection(vcpu);
r = 1;
goto out;
}
 
-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
-
-   if (kvm_lapic_enabled(vcpu)) {
-   update_cr8_intercept(vcpu);
-   kvm_lapic_sync_to_vapic(vcpu);
-   }
-
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
kvm_guest_enter();
-- 
1.7.1



[PATCH 0/3] Nonatomic interrupt injection

2010-07-20 Thread Avi Kivity
This patchset changes interrupt injection to be done from normal process
context instead of interrupts-disabled context.  This is useful for real-mode
interrupt injection on Intel without the current hacks (injecting as
a software interrupt of a vm86 task), for reducing latencies, and, later, for
allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest()
instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has
already taken place.

TODO: svm support, more complicated due to debug and nsvm handling

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and
entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |   64 +-
 arch/x86/kvm/x86.c  |   27 
 3 files changed, 64 insertions(+), 28 deletions(-)
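
For readers following the series, here is a toy Python model of the idea in
the cover letter: inject from preemptible context, and undo the injection if
the guest is not entered after all. It is only a sketch under loud
assumptions. The Vcpu class, the dict standing in for the VMCS, and
enter_guest() are hypothetical; only the field name and the operation names
are taken from the patches, and real hardware behaviour is far more involved.

```python
# Toy model only: the dict-based "vmcs" and the Vcpu class are hypothetical.
VALID = 1 << 31  # stands in for the "valid" bit of the interruption-info word


class Vcpu:
    def __init__(self):
        self.vmcs = {}      # field name -> value
        self.pending = []   # generic event queue (vectors awaiting injection)

    def inject_pending_event(self):
        """Move one queued event into the VM-entry field (preemptible context)."""
        if self.pending:
            vector = self.pending.pop(0)
            self.vmcs["VM_ENTRY_INTR_INFO_FIELD"] = VALID | vector

    def _complete_interrupts(self, info):
        """Decode an info word back into the generic queue.

        One decoder, parameterized by the info word, serves both the exit
        path and the cancel path, which is the point of patch 2/3.
        """
        if info & VALID:
            self.pending.insert(0, info & 0xFF)

    def cancel_injection(self):
        """Undo an injection when the guest was not actually entered."""
        self._complete_interrupts(self.vmcs.get("VM_ENTRY_INTR_INFO_FIELD", 0))
        self.vmcs["VM_ENTRY_INTR_INFO_FIELD"] = 0


def enter_guest(vcpu, must_bail_out):
    """Return True if the guest was entered, False if the entry was cancelled."""
    vcpu.inject_pending_event()
    if must_bail_out:       # e.g. a signal or request arrived after injection
        vcpu.cancel_injection()
        return False
    vcpu.vmcs["VM_ENTRY_INTR_INFO_FIELD"] = 0   # model: event was delivered
    return True
```

Calling enter_guest(vcpu, True) leaves the event queue exactly as it was
before the call, which is the property the "cancel every 16th entry" hack
mentioned above exercises.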



Re: [Autotest][RFC PATCH 00/14] Patchset of network related subtests

2010-07-20 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:34 +0800, Amos Kong wrote:
> The following series contains 11 network-related subtests. Suggestions about
> correctness, design, and enhancements are welcome.

Awesome work, will start to review them today. Thanks!

> Thank you so much!
> 
> ---
> 
> Amos Kong (14):
>   KVM-test: Add a new macaddress pool algorithm
>   KVM Test: Add a function get_interface_name() to kvm_net_utils.py
>   KVM Test: Add a common ping module for network related tests
>   KVM-test: Add a new subtest ping
>   KVM-test: Add a subtest jumbo
>   KVM-test: Add basic file transfer test
>   KVM-test: Add a subtest of load/unload nic driver
>   KVM-test: Add a subtest of nic promisc
>   KVM-test: Add a subtest of multicast
>   KVM-test: Add a subtest of pxe
>   KVM-test: Add a subtest of changing mac address
>   KVM-test: Add a subtest of netperf
>   KVM-test: Improve vlan subtest
>   KVM-test: Add subtest of testing offload by ethtool
> 
> 
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 




Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Michael Goldish
On 07/20/2010 04:34 AM, Amos Kong wrote:
> The old method uses the mac address from the configuration files, which could
> lead to serious problems when multiple tests run on different hosts.
> 
> This patch adds a new macaddress pool algorithm. It generates the mac prefix
> based on the mac address of the host, which could eliminate duplicated mac
> addresses between machines.
> 
> When the user has set mac_prefix in the configuration file, we should use it
> instead of the dynamically generated mac prefix.
> 
> Other changes:
> . Fix random mac address generation so that it conforms to IEEE802.
> . Update clone function to decide whether to clone the mac address.
> . Update get_macaddr function.
> . Add set_mac_address function.
> 
> New auto mac address pool algorithm:
> If address_index is defined, the VM gets its mac from the config file and
> records it in address_pool. If address_index is not defined, the VM calls
> get_mac_from_pool to create a mac automatically and records it in
> address_pool in the following format:
> {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
> 
>   AE:9D:94:6A:9b:f9: mac address
>   20100310-165222-Wt7l : instance attribute of VM
>   0: index of NIC

Why do you use the mac address as a key, instead of the instance string
+ nic index?  When the mac address is used as a key, each key has a list
of values instead of just one value.  This order seems unnatural.  If it
were the other way around (i.e. key = VM instance + nic index, value =
mac address), then each key would have exactly one value, and I think
this patch would be shorter and simpler.
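
A minimal sketch of the inverted layout suggested above, assuming a plain
dict as the pool. The names allocate_mac() and release_mac() are hypothetical
and not part of the patch; the key format and default prefix are copied from
it.

```python
import random


def allocate_mac(pool, vm_instance, nic_index, prefix="00:11:22:33:"):
    """Return the MAC for (vm_instance, nic_index), generating one if absent.

    pool maps "instance:nic_index" -> MAC string, exactly one value per key.
    """
    key = "%s:%s" % (vm_instance, nic_index)
    if key in pool:
        return pool[key]
    in_use = set(pool.values())
    while True:
        mac = prefix + "%02x:%02x" % (random.randint(0x00, 0xfe),
                                      random.randint(0x00, 0xfe))
        if mac not in in_use:
            pool[key] = mac
            return mac


def release_mac(pool, vm_instance, nic_index):
    """Forget the MAC recorded for this NIC, if any."""
    pool.pop("%s:%s" % (vm_instance, nic_index), None)
```

With this layout a lookup is a single dict access, and freeing an address
does not require scanning lists. The sketch does not handle an exhausted
prefix; a real implementation would need a bound on the loop.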

> Signed-off-by: Jason Wang 
> Signed-off-by: Feng Yang 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index fb2d1c2..7c0946e 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -5,6 +5,7 @@ KVM test utility functions.
>  """
>  
>  import time, string, random, socket, os, signal, re, logging, commands, cPickle
> +import fcntl, shelve
>  from autotest_lib.client.bin import utils
>  from autotest_lib.client.common_lib import error, logging_config
>  import kvm_subprocess
> @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
>  
>  # Functions related to MAC/IP addresses
>  
> +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):

The name of this function is confusing because it does the exact
opposite: it puts a mac address in address_pool.  Maybe the pool you're
referring to in the name isn't address_pool, but still a less confusing
name should probably be used.

> +"""
> +random generated mac address.
> +
> +1) First try to generate macaddress based on the mac address prefix.
> +2) And then try to use total random generated mac address.
> +
> +@param root_dir: Root dir for kvm
> +@param vm: Here we use instance of vm
> +@param nic_index: The index of nic.
> +@param prefix: Prefix of mac address.
> +@Return: Return mac address.
> +"""
> +
> +lock_filename = os.path.join(root_dir, "mac_lock")
> +lock_file = open(lock_filename, 'w')
> +fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
> +mac_filename = os.path.join(root_dir, "address_pool")

Maybe it makes sense to put address_pool and the lock file in /tmp,
where they can be shared by more than a single autotest instance running
on the same host (unlikely, but theoretically possible).
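
One way the shared location could look, as a rough sketch only. The helper
name locked_pool_path() is hypothetical; the file names mac_lock and
address_pool come from the patch, and fcntl makes this Unix-only.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def locked_pool_path(base_dir=None):
    """Hold an exclusive lock and yield the path of the shared pool file.

    base_dir defaults to the system temporary directory so that several
    autotest instances on one host serialize on the same lock file.
    """
    base_dir = base_dir or tempfile.gettempdir()
    lock_file = open(os.path.join(base_dir, "mac_lock"), "w")
    try:
        fcntl.lockf(lock_file.fileno(), fcntl.LOCK_EX)
        yield os.path.join(base_dir, "address_pool")
    finally:
        # Always drop the lock, even if the caller raised.
        fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
        lock_file.close()
```

A caller would wrap the shelve access in "with locked_pool_path() as path:",
so the lock is released on every exit path, including exceptions.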

> +mac_shelve = shelve.open(mac_filename, writeback=False)
> +
> +mac_pool = mac_shelve.get("macpool")

Why is this 'macpool' needed?  Why not put the keys directly in the
shelve object?

> +
> +if not mac_pool:
> +mac_pool = {}
> +found = False
> +
> +val = "%s:%s" % (vm, nic_index)
> +for key in mac_pool.keys():
> +if val in mac_pool[key]:
> +mac_pool[key].append(val)

Why append val to mac_pool[key] if val is already in mac_pool[key]?

> +found = True
> +mac = key
> +
> +while not found:
> +postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
> +random.randint(0x00,0xfe))
> +mac = prefix + postfix
> +mac_list = mac.split(":")
> +# Clear multicast bit
> +mac_list[0] = int(mac_list[0],16) & 0xfe
> +# Set local assignment bit (IEEE802)
> +mac_list[0] = mac_list[0] | 0x02
> +mac_list[0] = "%02x" % mac_list[0]

Why is this needed?  Most mac addresses begin with 00. If the mac
address is generated from the address of eth0 (using the method in this
patch) it begins with 00, which is fine. If the prefix is set by the
user using mac_prefix, I don't think we should modify it.
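
To make the effect of those two masks concrete, here is a small sketch that
applies the same bit operations to the first octet. The function name
force_local_unicast() is hypothetical; the masks 0xfe and 0x02 are the ones
in the quoted code.

```python
def force_local_unicast(mac):
    """Apply the patch's two masks to the first octet of a MAC string."""
    octets = mac.split(":")
    first = int(octets[0], 16)
    first &= 0xfe   # clear the multicast (group) bit
    first |= 0x02   # set the locally administered bit
    octets[0] = "%02x" % first
    return ":".join(octets)


def is_multicast(mac):
    """True if the group bit of the first octet is set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)
```

Note that a prefix starting with 00, such as the default 00:11:22:33:, comes
out as 02:11:22:33:, so a user-supplied mac_prefix would be altered by this
step.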

> +mac = ":".join(mac_list)
> +if mac not in mac_pool.keys() or 0 == len(mac_pool[mac]):
> +mac_pool[mac] = ["%s:%s" % (vm,nic_index)]
> +found = True
> +mac_shelve["macpool"] = 

Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU

2010-07-20 Thread Avi Kivity

On 07/19/2010 11:06 PM, Zachary Amsden wrote:

+static void tsc_khz_changed(void *data)
  {
-/* nothing */
+struct cpufreq_freqs *freq = data;
+unsigned long khz = 0;
+
+if (data)
+khz = freq->new;
+else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+khz = cpufreq_quick_get(raw_smp_processor_id());
+if (!khz)
+khz = tsc_khz;
+__get_cpu_var(cpu_tsc_khz) = khz;
  }


Do we really need to cache cpufreq_quick_get()?  If it's really 
quick, why not just use it everywhere instead of caching it?  Not a 
comment on this patch.





If cpufreq is compiled in, but disabled, it returns zero, so we need 
some sort of logic.


Maybe it's better to put it into cpufreq_quick_get().  Inconsistent APIs 
that appear to work are bad.
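
For reference, the fallback order in the quoted tsc_khz_changed() can be
modelled as below. This is a Python restatement of the selection logic only,
with hypothetical parameter names; it is not kernel code and says nothing
about where the fallback should finally live.

```python
def pick_tsc_khz(new_freq_khz, constant_tsc, quick_get_khz, boot_tsc_khz):
    """Model of the selection order in the quoted tsc_khz_changed().

    new_freq_khz   -- freq->new when a notifier passed data, else None
    constant_tsc   -- True if the CPU has X86_FEATURE_CONSTANT_TSC
    quick_get_khz  -- what cpufreq_quick_get() would return (0 if disabled)
    boot_tsc_khz   -- the global tsc_khz measured at boot
    """
    khz = 0
    if new_freq_khz is not None:
        khz = new_freq_khz
    elif not constant_tsc:
        khz = quick_get_khz
    if not khz:
        # cpufreq compiled in but disabled reports 0, so fall back.
        khz = boot_tsc_khz
    return khz
```

The last branch is the "some sort of logic" discussed above: every caller
that wants a usable frequency has to repeat it unless the API itself handles
the zero case.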


--
error compiling committee.c: too many arguments to function



Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled

2010-07-20 Thread Alexander Graf

On 20.07.2010, at 09:27, Milton Miller wrote:

> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>> Milton Miller wrote:
>>> I wrote:
>>> 
>>> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
>>> in arch/powerpc/kernel/exceptions-64s.h down to the real 0, but the rest
>>> of the kernel is at some disjointed address.  The interrupt will go to
>>> the copy at the real zero.  Any references to code outside that region
>>> must be done via a full indirect branch (not a relative one), similar to
>>> the secondary startup (via following the function pointer in a descriptor
>>> set in very low memory), or syscall entry and exception vectors via paca.
>>> 
>> 
>> That would still break on normal PPC boxes, as any address accessed in
>> real mode has to be inside the RMA. And the #include for
>> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>> with code that gets executed outside of the RMA after a relocation, right?
>> 
>> Alex
>> 
> 
> Whether it's outside of the RMA or not, DO_KVM is creating a branch outside
> of code copied to lowmem.
> 
> This is BROKEN.
> 
> We have a hard limit that we can't extend __end_interrupts past 0x7000, and
> a soft limit that we can't exceed 0x6000.  If there is space, we could
> move the real mode handler extensions inside __end_interrupts in
> exceptions-64s.S, and store the full address in a .quad so it gets
> relocated properly.  Don't subtract the start, we have designed the kernel
> to run with start at a VA that can be used as an EA in real mode.

Moving everything to exceptions-64s.S sounds like the best thing to do. All the 
code in real mode really is there so it stays inside the RMA. I don't think we 
can guarantee that for any code that is not copied, right?

> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
> BROKEN) for 2.6.35 until we get fixes.

Well - it's only broken when really getting relocated. But I agree, the current 
state doesn't cope with Linux's relocation logic.

> I took a read through the book3s code as of 2.6.34.   A few things I noticed:
> 
> (1) The code is using slb large to control the segment size.   It should
> be using SLB B field (or just implement 256M segments only).

I'm not sure I understand this part? We only use 256MB segments for now.

> (2) It appears that the mtspr and mfspr code is using the same storage for
> bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

Yes, that one is fixed in more recent versions already.

> (3) It's not clear to me that you clear RI when transitioning to the guest
> but its obviously required because you place state in srr0 & srr1.

Uh - do I have to clear RI? I'm not prepared to take an interrupt anyways and 
RI is just a soft flag for Linux's handlers, right?

> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
> __kvmppc_vcpu_entry can turn them back off.   Something to do with
> irq trace annotations?

__kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry turns them 
off in MSR. This is so that when enabling interrupts again on guest exit, we 
have the soft enable bit set.


Alex
