[COMMIT stable-2.6.30] Define true and false bool constants

2009-06-10 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Needed by iommu.h.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index 581d867..d473ae7 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -199,6 +199,9 @@ uint64_t div64_u64(uint64_t dividend, uint64_t divisor);
 
 typedef _Bool bool;
 
+#define false 0
+#define true 1
+
 #endif
 
 #endif
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT stable-2.6.30] Include asm/uaccess.h on pre-2.6.18 kernels

2009-06-10 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index d473ae7..e2342db 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -302,7 +302,11 @@ static inline void pagefault_enable(void)
 
 #endif
 
+#if LINUX_VERSION_CODE  KERNEL_VERSION(2,6,18)
+#include asm/uaccess.h
+#else
 #include linux/uaccess.h
+#endif
 
 /* vm ops -fault() was introduced in 2.6.23. */
 #include linux/mm.h
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT stable-2.6.30] Update source link for v2.6.30

2009-06-10 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/linux-2.6 b/linux-2.6
index 9fa7eb2..07a2039 16
--- a/linux-2.6
+++ b/linux-2.6
@@ -1 +1 @@
-Subproject commit 9fa7eb283c5cdc2b0f4a8cfe6387ed82e5e9a3d3
+Subproject commit 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] bios: Fix MADT table creation

2009-06-10 Thread Avi Kivity
From: Beth Kon e...@us.ibm.com

Correct MADT table size calculation.  Based on patch from Vincent Minet.

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..cdae363 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -86,6 +86,8 @@ typedef unsigned long long uint64_t;
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
+#define MAX_INT_OVERRIDES 16
+
 static inline void outl(int addr, int val)
 {
 asm volatile (outl %1, %w0 : : d (addr), a (val));
@@ -1600,7 +1602,7 @@ void acpi_bios_init(void)
 uint32_t hpet_addr;
 #endif
 uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, 
ssdt_addr;
-uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size;
+uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end;
 uint32_t srat_addr,srat_size;
 uint16_t i, external_tables;
 int nb_numa_nodes;
@@ -1668,7 +1670,7 @@ void acpi_bios_init(void)
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
-sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */;
+sizeof(struct madt_io_apic)  + sizeof(struct madt_int_override) * 
MAX_INT_OVERRIDES;
 #else
 sizeof(struct madt_io_apic);
 #endif
@@ -1786,8 +1788,9 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
+madt_end = (uint32_t)int_override;
+madt_size = madt_end - madt_addr;
 acpi_build_table_header((struct acpi_table_header *)madt,
 APIC, madt_size, 1);
 }
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MCE: Fix compiling without CONFIG_X86_MCE

2009-06-10 Thread Avi Kivity
From: Huang Ying ying.hu...@intel.com

Enclose do_machine_check with #ifdef CONFIG_X86_MCE.

Reported-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Huang Ying ying.hu...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 673bcb3..8c60db6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2681,12 +2681,14 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
  */
 static void kvm_machine_check(void)
 {
+#if defined(CONFIG_X86_MCE)  defined(CONFIG_X86_64)
struct pt_regs regs = {
.cs = 3, /* Fake ring 3 no matter what the guest ran on */
.flags = X86_EFLAGS_IF,
};
 
do_machine_check(regs, 0);
+#endif
 }
 
 static int handle_machine_check(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte

2009-06-10 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Since the guest and host ptes can have wildly different format, adjust
the pte accessor names to indicate on which type of pte they operate on.

No functional changes.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8e5003c..415630d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -240,12 +240,12 @@ static int is_writeble_pte(unsigned long pte)
return pte  PT_WRITABLE_MASK;
 }
 
-static int is_dirty_pte(unsigned long pte)
+static int is_dirty_gpte(unsigned long pte)
 {
return pte  PT_DIRTY_MASK;
 }
 
-static int is_rmap_pte(u64 pte)
+static int is_rmap_spte(u64 pte)
 {
return is_shadow_present_pte(pte);
 }
@@ -498,7 +498,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
gfn_t gfn, int lpage)
unsigned long *rmapp;
int i;
 
-   if (!is_rmap_pte(*spte))
+   if (!is_rmap_spte(*spte))
return;
gfn = unalias_gfn(vcpu-kvm, gfn);
sp = page_header(__pa(spte));
@@ -560,7 +560,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
unsigned long *rmapp;
int i;
 
-   if (!is_rmap_pte(*spte))
+   if (!is_rmap_spte(*spte))
return;
sp = page_header(__pa(spte));
pfn = spte_to_pfn(*spte);
@@ -1747,7 +1747,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 __func__, *shadow_pte, pt_access,
 write_fault, user_fault, gfn);
 
-   if (is_rmap_pte(*shadow_pte)) {
+   if (is_rmap_spte(*shadow_pte)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -1783,7 +1783,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
page_header_update_slot(vcpu-kvm, shadow_pte, gfn);
if (!was_rmapped) {
rmap_add(vcpu, shadow_pte, gfn, largepage);
-   if (!is_rmap_pte(*shadow_pte))
+   if (!is_rmap_spte(*shadow_pte))
kvm_release_pfn_clean(pfn);
} else {
if (was_writeble)
@@ -1960,7 +1960,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
ASSERT(!VALID_PAGE(root));
if (vcpu-arch.mmu.root_level == PT32E_ROOT_LEVEL) {
pdptr = kvm_pdptr_read(vcpu, i);
-   if (!is_present_pte(pdptr)) {
+   if (!is_present_gpte(pdptr)) {
vcpu-arch.mmu.pae_root[i] = 0;
continue;
}
@@ -2451,7 +2451,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu 
*vcpu, gpa_t gpa,
if ((bytes == 4)  (gpa % 4 == 0))
memcpy((void *)gpte, new, 4);
}
-   if (!is_present_pte(gpte))
+   if (!is_present_gpte(gpte))
return;
gfn = (gpte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3494a2f..016bf71 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -75,7 +75,7 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
return vcpu-arch.cr0  X86_CR0_PG;
 }
 
-static inline int is_present_pte(unsigned long pte)
+static inline int is_present_gpte(unsigned long pte)
 {
return pte  PT_PRESENT_MASK;
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4cb1dbf..238a193 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -132,7 +132,7 @@ walk:
 #if PTTYPE == 64
if (!is_long_mode(vcpu)) {
pte = kvm_pdptr_read(vcpu, (addr  30)  3);
-   if (!is_present_pte(pte))
+   if (!is_present_gpte(pte))
goto not_present;
--walker-level;
}
@@ -155,7 +155,7 @@ walk:
 
kvm_read_guest(vcpu-kvm, pte_gpa, pte, sizeof(pte));
 
-   if (!is_present_pte(pte))
+   if (!is_present_gpte(pte))
goto not_present;
 
rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker-level);
@@ -205,7 +205,7 @@ walk:
--walker-level;
}
 
-   if (write_fault  !is_dirty_pte(pte)) {
+   if (write_fault  !is_dirty_gpte(pte)) {
bool ret;
 
mark_page_dirty(vcpu-kvm, table_gfn);
@@ -252,7 +252,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *page,
 
gpte = *(const pt_element_t *)pte;
if (~gpte  (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-   if (!is_present_pte(gpte))
+   if (!is_present_gpte(gpte))
set_shadow_pte(spte, shadow_notrap_nonpresent_pte);
return;
}
@@ -289,7 +289,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
pt_element_t curr_pte;
struct 

[COMMIT master] KVM: MMU: s/shadow_pte/spte/

2009-06-10 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

We use shadow_pte and spte inconsistently, switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its lowlevelness.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 415630d..abd3a17 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -143,7 +143,7 @@ module_param(oos_shadow, bool, 0644);
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {
-   u64 *shadow_ptes[RMAP_EXT];
+   u64 *sptes[RMAP_EXT];
struct kvm_rmap_desc *more;
 };
 
@@ -262,7 +262,7 @@ static gfn_t pse36_gfn_delta(u32 gpte)
return (gpte  PT32_DIR_PSE36_MASK)  shift;
 }
 
-static void set_shadow_pte(u64 *sptep, u64 spte)
+static void __set_spte(u64 *sptep, u64 spte)
 {
 #ifdef CONFIG_X86_64
set_64bit((unsigned long *)sptep, spte);
@@ -510,21 +510,21 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
gfn_t gfn, int lpage)
} else if (!(*rmapp  1)) {
rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte);
desc = mmu_alloc_rmap_desc(vcpu);
-   desc-shadow_ptes[0] = (u64 *)*rmapp;
-   desc-shadow_ptes[1] = spte;
+   desc-sptes[0] = (u64 *)*rmapp;
+   desc-sptes[1] = spte;
*rmapp = (unsigned long)desc | 1;
} else {
rmap_printk(rmap_add: %p %llx many-many\n, spte, *spte);
desc = (struct kvm_rmap_desc *)(*rmapp  ~1ul);
-   while (desc-shadow_ptes[RMAP_EXT-1]  desc-more)
+   while (desc-sptes[RMAP_EXT-1]  desc-more)
desc = desc-more;
-   if (desc-shadow_ptes[RMAP_EXT-1]) {
+   if (desc-sptes[RMAP_EXT-1]) {
desc-more = mmu_alloc_rmap_desc(vcpu);
desc = desc-more;
}
-   for (i = 0; desc-shadow_ptes[i]; ++i)
+   for (i = 0; desc-sptes[i]; ++i)
;
-   desc-shadow_ptes[i] = spte;
+   desc-sptes[i] = spte;
}
 }
 
@@ -535,14 +535,14 @@ static void rmap_desc_remove_entry(unsigned long *rmapp,
 {
int j;
 
-   for (j = RMAP_EXT - 1; !desc-shadow_ptes[j]  j  i; --j)
+   for (j = RMAP_EXT - 1; !desc-sptes[j]  j  i; --j)
;
-   desc-shadow_ptes[i] = desc-shadow_ptes[j];
-   desc-shadow_ptes[j] = NULL;
+   desc-sptes[i] = desc-sptes[j];
+   desc-sptes[j] = NULL;
if (j != 0)
return;
if (!prev_desc  !desc-more)
-   *rmapp = (unsigned long)desc-shadow_ptes[0];
+   *rmapp = (unsigned long)desc-sptes[0];
else
if (prev_desc)
prev_desc-more = desc-more;
@@ -587,8 +587,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
desc = (struct kvm_rmap_desc *)(*rmapp  ~1ul);
prev_desc = NULL;
while (desc) {
-   for (i = 0; i  RMAP_EXT  desc-shadow_ptes[i]; ++i)
-   if (desc-shadow_ptes[i] == spte) {
+   for (i = 0; i  RMAP_EXT  desc-sptes[i]; ++i)
+   if (desc-sptes[i] == spte) {
rmap_desc_remove_entry(rmapp,
   desc, i,
   prev_desc);
@@ -619,10 +619,10 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
*rmapp, u64 *spte)
prev_desc = NULL;
prev_spte = NULL;
while (desc) {
-   for (i = 0; i  RMAP_EXT  desc-shadow_ptes[i]; ++i) {
+   for (i = 0; i  RMAP_EXT  desc-sptes[i]; ++i) {
if (prev_spte == spte)
-   return desc-shadow_ptes[i];
-   prev_spte = desc-shadow_ptes[i];
+   return desc-sptes[i];
+   prev_spte = desc-sptes[i];
}
desc = desc-more;
}
@@ -644,7 +644,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
BUG_ON(!(*spte  PT_PRESENT_MASK));
rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte);
if (is_writeble_pte(*spte)) {
-   set_shadow_pte(spte, *spte  ~PT_WRITABLE_MASK);
+   __set_spte(spte, *spte  ~PT_WRITABLE_MASK);
write_protected = 1;
}
spte = rmap_next(kvm, rmapp, spte);
@@ -668,7 +668,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
if (is_writeble_pte(*spte)) {
rmap_remove(kvm, spte);
--kvm-stat.lpages;
-   set_shadow_pte(spte, 

Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Dor Laor

Rusty Russell wrote:

On Fri, 5 Jun 2009 02:13:20 am Michael S. Tsirkin wrote:
  

I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as
well, and intend to dump updates there from time to time.



Hi Michael,

  Sorry for the delay.  I'm weaning myself off my virtio work, but virtio_net 
performance is an issue which still needs lots of love.  

BTW a non-wiki on the wiki?.  You should probably rename it to 
MST_Networking_Performance or allow editing :)


  

- skbs in flight are kept in send queue linked list,
  so that we can flush them when device is removed
  [ mst: optimization idea: virtqueue already tracks
posted buffers. Add flush/purge operation and use that instead?



Interesting idea,  but not really an optimization.  (flush_buf() which does a 
get_buf() but for unused buffers).


  

] - skb is reformatted to scattergather format
  [ mst: idea to try: this does a copy for skb head,
which might be costly especially for small/linear packets.
Try to avoid this? Might need to tweak virtio interface.
  ]



There's no copy here that I can see?

  

- network driver adds the packet buffer on TX ring
- network driver does a kick which causes a VM exit
  [ mst: any way to mitigate # of VM exits here?
Possibly could be done on host side as well. ]
  [ markmc: All of our efforts there have been on the host side, I think
that's preferable than trying to do anything on the guest side.
]



The current theoretical hole is that the host suppresses notifications using 
the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in 
before it gets to that suppression.  You can use a counter to improve this: 
you only notify when they're equal, and inc when you notify.  That way you 
suppress further notifications even if the other side takes ages to wake up.  
In practice, this shouldn't be played with until we have full aio (or equiv in 
kernel) for other side: host xmit tends to be too fast at the moment and we 
get a notification per packet anyway.
  
Xen ring has the exact optimization for ages. imho we should have it 
too, regardless of aio.

It reduces #vmexits/spurious wakeups and it is very simple to implement.

  

- Full queue:
we keep a single extra skb around:
if we fail to transmit, we queue it
[ mst: idea to try: what does it do to
  performance if we queue more packets? ]



Bad idea!!  We already have two queues, this is a third.  We should either 
stop the queue before it gets full, or fix TX_BUSY handling.  I've been arguing 
on netdev for the latter (see thread[PATCH 2/4] virtio_net: return 
NETDEV_TX_BUSY instead of queueing an extra skb.).


  

[ markmc: the queue might soon be going away:
   200905292346.04815.ru...@rustcorp.com.au



Ah, yep, that one.

  

http://archive.netbsd.se/?ml=linux-netdeva=2009-05m=10788575 ]

- We get each buffer from host as it is completed and free it
- TX interrupts are only enabled when queue is stopped,
  and when it is originally created (we disable them on completion)
  [ mst: idea: second part is probably unintentional.
todo: we probably should disable interrupts when device is
created. ]



Yep, minor wart.

  

- We poll for buffer completions:
  1. Before each TX 2. On a timer tasklet (unless 3 is supported)
  3. When host sends us interrupt telling us that the queue is
empty [ mst: idea to try: instead of empty, enable send interrupts on xmit
when buffer is almost full (e.g. at least half empty): we are running out
of buffers, it's important to free them ASAP. Can be done from host or from
guest. ]
  [ Rusty proposing that we don't need (2) or (3) if the skbs are
orphaned before start_xmit(). See subj net: skb_orphan on
dev_hard_start_xmit.] [ rusty also seems to be suggesting that disabling
VIRTIO_F_NOTIFY_ON_EMPTY on the host should help the case where the host
out-paces the guest ]



Yes, that's more fruitful.

  

- Each skb has a 128 byte buffer at head and a single page for
data. Only full pages are passed to virtio buffers.
  [ mst: for large packets, managing the 128 head buffers is wasted
effort. Try allocating skbs on rcv path when needed. ].
[ mst: to clarify the previos suggestion: I am talking about
merging here.  We currently allocate skbs and pages for them. If a
packet spans multiple pages, we discard the extra skbs.  Instead, let's
allocate pages but not skbs. Allocate and fill skbs on receive path. ]



Yep.  There's another issue here, which is alignment: packets which get placed 
into pages are misaligned (that 14 byte ethernet header).  We should add a 
feature to allow the 

RE: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Han, Weidong
Gerd Hoffmann wrote:
 On 06/09/09 16:51, Paul Brook wrote:
 On Tuesday 09 June 2009, Han, Weidong wrote:
 Paul Brook wrote:
 On Monday 08 June 2009, Weidong Han wrote:
 When hot remove an assigned device, segmentation fault was
 triggered by qemu_free(pci_dev-qdev) in pci_unregister_device().
 pci_register_device() doesn't initialize or set pci_dev-qdev.
 For an assigned device, qdev variable isn't touched at all. So
 segmentation fault happens when to free a non-initialized qdev.
 Better would be to just disable hot remove for devices still using
 the legacy pci_register_device API.
 PCI passthrough uses pci_register_device to register assigned
 device to qemu. Is there newer API to do so?
 
 Yes. See e.g. LSI scsi emulation.
 
 Well.  Except that you can't (yet) register pci config read/write
 callbacks using the qdev-based API.
 

So pci passthrough have to use pci_register_device now. I cooked a new patch 
(post below) to fix this issue. Thanks.


When hot remove an assigned device, segmentation fault was triggered
by qdev_free(pci_dev-qdev) in pci_unregister_device().
pci_register_device() doesn't initialize or set pci_dev-qdev. For an
assigned device, qdev variable isn't touched at all. So segmentation
fault happens when to free a non-initialized qdev.

This patch adds a parameter in pci_unregister_device to check if
it's an assigned device. For assgined device, free pci_dev instead of
qdev. No changes for other devices.


Signed-off-by: Weidong Han weidong@intel.com
---
 hw/device-assignment.c |2 +-
 hw/pci-hotplug.c   |2 +-
 hw/pci.c   |8 ++--
 hw/pci.h   |2 +-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 624d15a..65920d0 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -581,7 +581,7 @@ static void free_assigned_device(AssignedDevInfo *adev)
 dev-real_device.config_fd = 0;
 }
 
-pci_unregister_device(dev-dev);
+pci_unregister_device(dev-dev, 1);
 #ifdef KVM_CAP_IRQ_ROUTING
 free_dev_irq_entries(dev);
 #endif
diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index d7c8b84..18a4912 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -271,6 +271,6 @@ void pci_device_hot_remove_success(int pcibus, int slot)
 break;
 }
 
-pci_unregister_device(d);
+pci_unregister_device(d, 0);
 }
 
diff --git a/hw/pci.c b/hw/pci.c
index 25581a4..35fd064 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -363,7 +363,7 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev)
 }
 }
 
-int pci_unregister_device(PCIDevice *pci_dev)
+int pci_unregister_device(PCIDevice *pci_dev, int assigned)
 {
 int ret = 0;
 
@@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev)
 qemu_free_irqs(pci_dev-irq);
 pci_irq_index--;
 pci_dev-bus-devices[pci_dev-devfn] = NULL;
-qdev_free(pci_dev-qdev);
+
+if (assigned)
+qemu_free(pci_dev);
+else
+qdev_free(pci_dev-qdev);
 return 0;
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index f2dccb5..f658e78 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -199,7 +199,7 @@ PCIDevice *pci_register_device(PCIBus *bus, const char 
*name,
int instance_size, int devfn,
PCIConfigReadFunc *config_read,
PCIConfigWriteFunc *config_write);
-int pci_unregister_device(PCIDevice *pci_dev);
+int pci_unregister_device(PCIDevice *pci_dev, int assigned);
 
 void pci_register_io_region(PCIDevice *pci_dev, int region_num,
 uint32_t size, int type,
-- 
1.6.0.4--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG at mmu.c:615 from localhost migration using ept+hugetlbfs

2009-06-10 Thread Avi Kivity

Avi Kivity wrote:


Not really.  One thing, migration should transition the shadow 
pagetables from large pages to small ones, maybe that bit is broken.


Maybe we're looking at a largepage spte and interpreting it as a 
normal L2 spte, and interpreting a guest page as the L1 spt.


I tried to find where we drop the mmu (or at least large sptes for the 
slot) when we enable dirty logging, and failed.  Maybe 
remove_write_access() is sufficient.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Avi Kivity

Han, Weidong wrote:
 
-int pci_unregister_device(PCIDevice *pci_dev)

+int pci_unregister_device(PCIDevice *pci_dev, int assigned)
 {
 int ret = 0;
 
@@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev)

 qemu_free_irqs(pci_dev-irq);
 pci_irq_index--;
 pci_dev-bus-devices[pci_dev-devfn] = NULL;
-qdev_free(pci_dev-qdev);
+
+if (assigned)
+qemu_free(pci_dev);
+else
+qdev_free(pci_dev-qdev);
 return 0;
 }
  


Can you check pci_dev-qdev instead of assigned?  A little less ugly.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-10 Thread Yolkfull Chow

On 06/09/2009 05:44 PM, Michael Goldish wrote:

The test looks pretty nicely written. Comments:

1. Consider making all the cloned VMs use image snapshots:

curr_vm = vm1.clone()
curr_vm.get_params()[extra_params] +=  -snapshot
   
I'm not sure it's a good idea to let all VMs use the same disk image.

Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
file for the first VM, and then all cloned VMs will have -snapshot as well.
   

Yes I use 'image_snapshot = yes' in config file.

2. Consider changing the message
 Booting the %dth guest % num
to
Booting guest #%d % num
(because there's no such thing as 2th and 3th)
   
3. Consider changing the message

Cannot boot vm anylonger
to
Cannot create VM #%d % num

4. Why not add curr_vm to vms immediately after cloning it?
That way you can kill it in the exception handler later, without having
to send it a 'quit' if you can't login ('if not curr_vm_session').
   

Yes, good idea.

5.  %dth guest boots up successfully % num --  again, 2th and 3th make no 
sense.
Also, I wonder why you add those spaces before every info message.

6. %dth guest's session is not responsive --  same
(maybe use Guest session #%d is not responsive % num)

7. Shut down the %dth guest --  same
(maybe Shutting down guest #%d? or destroying/killing?)

8. Shouldn't we fail the test when we find an unresponsive session?
It seems you just display an error message. You can simply replace
logging.error( with raise error.TestFail(.
   



9. Consider using a stricter test than just vm_session.is_responsive().
vm_session.is_responsive() just sends ENTER to the sessions and returns
True if it gets anything as a result (usually a prompt, or even just a
newline echoed back). If the session passes this test it is indeed
responsive, so it's a decent test, but maybe you can send some command
(user configurable?) and test for some output. I'm really not sure this
is important, because I can't imagine a session would respond to a newline
but not to other commands, but who knows. Maybe you can send the first VM
a user-specified command when the test begins, remember the output, and
then send all other VMs the same command and make sure the output is the
same.
   
maybe use 'info status' and send command 'help' via session to vms and 
compare their output?

10. I'm not sure you should use the param kill_vm_gracefully because that's
a postprocessor param (probably not your business). You can just call
destroy() in the exception handler with gracefully=False, because if the VMs
are non- responsive, I don't expect them to shutdown nicely with an SSH
command (that's what gracefully does). Also, we're using -snapshot, so
there's no reason to shut them down nicely.
   

Yes,  I agree. :)

11. Total number booted successfully: %d % (num - 1) --  why not just num?
We really have num VMs including the first one.
Or you can say: Total number booted successfully in addition to the first one
but that's much longer.
   
Since after the first guest booted, I set num = 1 and then  'num += 1' 
at first in while loop ( for the purpose of getting a new vm ).
So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up, 
the num booted successfully should be (num - 1).
I would use enumerate(vms) that Uri suggested to make number easier to 
count.

12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
test will always fail (which is depressing). If params.get(threshold) is
None or , or in short -- 'if not params.get(threshold)', disable this
feature and keep adding VMs forever. The user can enable the feature with:
max_vms = 50
or disable it with:
max_vms =
   

This is a good idea for hardware resource limit of host.

13. Why are you catching OSError? If you get OSError it might be a framework 
bug.
   
Since sometimes, vm.create() successfully but failed to ssh-login since 
the running python cannot allocate physical memory (OSError).

Add max_vms could fix this problem I think.

14. At the end of the exception handler you should proably re-raise the 
exception
you caught. Otherwise the user won't see the error message. You can simply 
replace
'break' with 'raise' (no parameters), and it should work, hopefully.
   

Yes I should if add a 'max_vms'.

I know these are quite a few comments, but they're all rather minor and the test
is well written in my opinion.
   
Thank you,  I will do modification according to your and Uri's comments, 
and will re-submit it here later. :)


Thanks and Best Regards,
Yolkfull

Thanks,
Michael

- Original Message -
From: Yolkfull Chowyz...@redhat.com
To:kvm@vger.kernel.org
Cc: Uri Lublinu...@redhat.com
Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes 
unresponsive


Hi,

This test will boot VMs until one of them becomes unresponsive, and
records the maximum 

Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-10 Thread Yolkfull Chow

On 06/09/2009 08:45 PM, Uri Lublin wrote:

On 06/09/2009 11:41 AM, Yolkfull Chow wrote:


Hi,

This test will boot VMs until one of them becomes unresponsive, and
records the maximum number of VMs successfully started.




Hello,

Some more comments (in addition to previous comments by others)
1. Do not just send monitor command quit but use vm.destroy
   * This was mentioned by Michael, but in a different context.
2. Do not destroy main_vm (or vm1). We may want to run other tests on it.
3. You can use enumerate(vms) instead of looking for vm with index.
4. It would be nice to close all ssh sessions too.

OK, I will do modification according to your comments, thank you so much. :)

Best Regards,
Yolkfull



Regards,
Uri.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Han, Weidong
Avi Kivity wrote:
 Han, Weidong wrote:
 
 -int pci_unregister_device(PCIDevice *pci_dev)
 +int pci_unregister_device(PCIDevice *pci_dev, int assigned)  {
  int ret = 0;
 
 @@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev)
  qemu_free_irqs(pci_dev-irq);
  pci_irq_index--;
  pci_dev-bus-devices[pci_dev-devfn] = NULL;
 -qdev_free(pci_dev-qdev);
 +
 +if (assigned)
 +qemu_free(pci_dev);
 +else
 +qdev_free(pci_dev-qdev);
  return 0;
  }
 
 
 Can you check pci_dev-qdev instead of assigned?  A little less ugly.

I tried to find an easy and clean way to check it, but I found the members of 
struct PCIDevice and DeviceState were not suitable for this checking, and qdev 
is not pointer in struct PCIDevice.

Regards,
Weidong--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Avi Kivity

Han, Weidong wrote:

I tried to find an easy and clean way to check it, but I found the members of 
struct PCIDevice and DeviceState were not suitable for this checking, and qdev 
is not pointer in struct PCIDevice.
  


Yes, of course.  I applied the patch, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Gerd Hoffmann

On 06/10/09 10:31, Han, Weidong wrote:

Avi Kivity wrote:

Can you check pci_dev-qdev instead of assigned?  A little less
ugly.


I tried to find an easy and clean way to check it, but I found the
members of struct PCIDevice and DeviceState were not suitable for
this checking, and qdev is not pointer in struct PCIDevice.


Well, certain DeviceState pointers (type for example) are set only in 
case the device was created using the qdev api.  I think you certainly 
should get away without adding a new parameter to pci_unregister_device.


Also I've just sent a patch to the qemu-devel fixing the qdev register 
API for pci devices, allowing to register config space callbacks.  Once 
this is merged you can switch pci passthrough to the new qdev API.


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-10 Thread Han, Weidong
Gerd Hoffmann wrote:
 On 06/10/09 10:31, Han, Weidong wrote:
 Avi Kivity wrote:
 Can you check pci_dev-qdev instead of assigned?  A little less
 ugly.
 
 I tried to find an easy and clean way to check it, but I found the
 members of struct PCIDevice and DeviceState were not suitable for
 this checking, and qdev is not pointer in struct PCIDevice.
 
 Well, certain DeviceState pointers (type for example) are set only in
 case the device was created using the qdev api.  I think you certainly
 should get away without adding a new parameter to
 pci_unregister_device. 
 
 Also I've just sent a patch to the qemu-devel fixing the qdev register
 API for pci devices, allowing to register config space callbacks. 
 Once this is merged you can switch pci passthrough to the new qdev
 API. 
 

Good. Extending the qdev APIs is the most elegant solution. Thanks.

Regards,
Weidong
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MCE: Fix compiling without CONFIG_X86_MCE

2009-06-10 Thread Avi Kivity

Huang Ying wrote:

Enclose do_machine_check with #ifdef CONFIG_X86_MCE.

Reported-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Huang Ying ying.hu...@intel.com

  


Applied, thanks.

I would have preferred to select X86_MCE instead, but that doesn't do 
anything (need to select X86_MCE_INTEL as well, and that becomes fragile).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/5] VMX EPT misconfigurtion handler

2009-06-10 Thread Yang, Sheng
On Wednesday 10 June 2009 05:30:09 Marcelo Tosatti wrote:
 From the Intel docs:

 An EPT misconfiguration occurs when, in the course of translation
 a guest-physical address, the logical processor encounters an EPT
 paging-structure entry that contains an unsupported value.

 Handle this event and print useful information for diagnostics.

Looks fine to me, thanks!

-- 
regards
Yang, Sheng


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages

2009-06-10 Thread Avi Kivity

Marcelo Tosatti wrote:

This way there is no need to add explicit checks in every
for_each_shadow_entry user.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_
 {
if (iterator-level  PT_PAGE_TABLE_LEVEL)
return false;
+
+   if (iterator-level == PT_PAGE_TABLE_LEVEL)
+   if (is_large_pte(*iterator-sptep))
+   return false;

  

s/==//?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/4] udp: Handle large UFO packets from untrusted sources

2009-06-10 Thread Herbert Xu
On Tue, Jun 09, 2009 at 10:51:23PM -0700, Sridhar Samudrala wrote:

 Unfortunately, this doesn't work for UDP without any changes.  
 skb_segment() currently adds transport header to
 each segmented skb. But when we are using IP fragmentation, only the  
 first fragment should include the UDP
 header.
 We either need to fix skb_segment() to handle IP fragmentation or write  
 a new skb_fragment(). I will look into
 this when i get some time.

Couldn't you get around this by setting skb-transport_header to
skb-network_header before getting into skb_segment?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] KVM: VMX: conditionally disable 2M pages

2009-06-10 Thread Avi Kivity

Marcelo Tosatti wrote:

Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -85,6 +85,8 @@ static long kvm_vcpu_ioctl(struct file *
 
 static bool kvm_rebooting;
 
+static bool largepages_disabled = false;

+
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head 
*head,
  int assigned_dev_id)
@@ -1184,6 +1186,10 @@ int __kvm_set_memory_region(struct kvm *
if ((base_gfn ^ ugfn)  (KVM_PAGES_PER_HPAGE - 1))
for (i = 0; i  largepages; ++i)
new.lpage_info[i].write_count = 1;
+
+   if (largepages_disabled)
+   for (i = 0; i  largepages; ++i)
+   new.lpage_info[i].write_count = 1;
}
  


Please reuse the loop above, it's exactly the same (think of 1GB pages).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages

2009-06-10 Thread Avi Kivity

Avi Kivity wrote:

Marcelo Tosatti wrote:

This way there is no need to add explicit checks in every
for_each_shadow_entry user.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_
 {
 if (iterator-level  PT_PAGE_TABLE_LEVEL)
 return false;
+
+if (iterator-level == PT_PAGE_TABLE_LEVEL)
+if (is_large_pte(*iterator-sptep))
+return false;

  

s/==//?



Ah, it's actually fine.  But changing == to = will make it 1GBpage-ready.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 12:19:42AM +0100, Paul Brook wrote:
 On Monday 25 May 2009, Michael S. Tsirkin wrote:
  Add functions implementing MSI-X support. First user will be virtio-pci.
  Note that platform must set a flag to declare MSI supported.
  For PC this will be set by APIC.
 
 This sounds wrong. The device shouldn't know or care whether the system has a 
 MSI capable interrupt controller. That's for the guest OS to figure out.
 
 Paul 

You are right of course. In theory there's nothing that breaks if I
set this flag to on, on all platforms. OTOH if qemu emulates some
controller incorrectly, guest might misdetect MSI support in the
controller, and things will break horribly.

It seems safer to have a flag that can be enabled by people
that know about a specific platform.

What do you think?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote:
 On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote:
  Add routines to manage PCI capability list. First user will be MSI-X.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/pci.c |   98 
  --
   hw/pci.h |   18 +++-
   2 files changed, 106 insertions(+), 10 deletions(-)
  
  diff --git a/hw/pci.c b/hw/pci.c
  index 361d741..ed011b5 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
   int version = s-cap_present ? 3 : 2;
   int i;
   
  -qemu_put_be32(f, version); /* PCI device version */
  +/* PCI device version and capabilities */
  +qemu_put_be32(f, version);
  +if (version = 3)
  +qemu_put_be32(f, s-cap_present);
   qemu_put_buffer(f, s-config, 256);
   for (i = 0; i  4; i++)
   qemu_put_be32(f, s-irq_state[i]);
  -if (version = 3)
  -qemu_put_be32(f, s-cap_present);
   }
 What is it doing here?
 You should just do it right in the first patch, instead of doing in
 one way there, and fixing here.
 
   
   int pci_device_load(PCIDevice *s, QEMUFile *f)
  @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
   version_id = qemu_get_be32(f);
   if (version_id  3)
   return -EINVAL;
  -qemu_get_buffer(f, s-config, 256);
  -pci_update_mappings(s);
  -
  -if (version_id = 2)
  -for (i = 0; i  4; i ++)
  -s-irq_state[i] = qemu_get_be32(f);
   if (version_id = 3)
   s-cap_present = qemu_get_be32(f);
   else
 ditto.
  @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
   if (s-cap_present  ~s-cap_supported)
   return -EINVAL;
   
  +qemu_get_buffer(f, s-config, 256);
  +pci_update_mappings(s);
  +
  +if (version_id = 2)
  +for (i = 0; i  4; i ++)
  +s-irq_state[i] = qemu_get_be32(f);
  +/* Clear wmask and used bits for capabilities.
  +   Must be restored separately, since capabilities can
  +   be placed anywhere in config space. */
  +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE);
  +for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
  +s-wmask[i] = 0xff;
   return 0;
   }
 Sorry, I don't exactly understand it. Although it can be anywhere, what do we 
 actually
 lose by keeping it at the same place in config space?

We lose the ability to let user control the capabilities exposed
by the device.

And generally, I dislike arbitrary limitations. The PCI spec says the
capability can be anywhere, implementing a linked list of caps is simple
enough to not invent abritrary restrictions.

   
  @@ -870,3 +877,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, 
  const char *name)
   
   return (PCIDevice *)dev;
   }
  +
  +static int pci_find_space(PCIDevice *pdev, uint8_t size)
  +{
  +int offset = PCI_CONFIG_HEADER_SIZE;
  +int i;
  +for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
  +if (pdev-used[i])
  +offset = i + 1;
  +else if (i - offset + 1 == size)
  +return offset;
  +return 0;
  +}
  +
  +static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
  +uint8_t *prev_p)
  +{
  +uint8_t next, prev;
  +
  +if (!(pdev-config[PCI_STATUS]  PCI_STATUS_CAP_LIST))
  +return 0;
  +
  +for (prev = PCI_CAPABILITY_LIST; (next = pdev-config[prev]);
  + prev = next + PCI_CAP_LIST_NEXT)
  +if (pdev-config[next + PCI_CAP_LIST_ID] == cap_id)
  +break;
  +
  +*prev_p = prev;
  +return next;
  +}
 I'd prefer to do:
   if (prev_p)
   *prev_p = prev;
 so we don't have to always pass a prev_p pointer. You have yourself a user
 where you don't need it in this very patch.


Good idea.

  +
  +/* Reserve space and add capability to the linked list in pci config space 
  */
  +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
  +{
  +uint8_t offset = pci_find_space(pdev, size);
  +uint8_t *config = pdev-config + offset;
  +if (!offset)
  +return -ENOSPC;
  +config[PCI_CAP_LIST_ID] = cap_id;
  +config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
  +pdev-config[PCI_CAPABILITY_LIST] = offset;
  +pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
  +memset(pdev-used + offset, 0xFF, size);
  +/* Make capability read-only by default */
  +memset(pdev-wmask + offset, 0, size);
  +return offset;
  +}
  +
  +/* Unlink capability from the pci config space. */
  +void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
  +{
  +uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, prev);
  +if (!offset)
  +return;
  +pdev-config[prev] = pdev-config[offset + 

Re: [Qemu-devel] [PATCHv3 05/13] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:26:27PM -0300, Glauber Costa wrote:
 On Fri, Jun 05, 2009 at 01:23:31PM +0300, Michael S. Tsirkin wrote:
  Add functions implementing MSI-X support. First user will be virtio-pci.
  Note that platform must set a flag to declare MSI supported.
  For PC this will be set by APIC.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   Makefile.target |2 +-
   hw/msix.c   |  423 
  +++
   hw/msix.h   |   35 +
   hw/pci.h|   20 +++
   4 files changed, 479 insertions(+), 1 deletions(-)
   create mode 100644 hw/msix.c
   create mode 100644 hw/msix.h
  
  diff --git a/Makefile.target b/Makefile.target
  index 664a1e3..87b2859 100644
  --- a/Makefile.target
  +++ b/Makefile.target
  @@ -486,7 +486,7 @@ endif #CONFIG_BSD_USER
   ifndef CONFIG_USER_ONLY
   
   OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \
  - gdbstub.o gdbstub-xml.o
  + gdbstub.o gdbstub-xml.o msix.o
   # virtio has to be here due to weird dependency between PCI and virtio-net.
   # need to fix this properly
   OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
  diff --git a/hw/msix.c b/hw/msix.c
  new file mode 100644
  index 000..1b5aec8
  --- /dev/null
  +++ b/hw/msix.c
  @@ -0,0 +1,423 @@
  +/*
  + * MSI-X device support
  + *
  + * This module includes support for MSI-X in pci devices.
  + *
  + * Author: Michael S. Tsirkin m...@redhat.com
  + *
  + *  Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com)
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2.  See
  + * the COPYING file in the top-level directory.
  + */
  +
  +#include hw.h
  +#include msix.h
  +#include pci.h
  +
  +/* Declaration from linux/pci_regs.h */
  +#define  PCI_CAP_ID_MSIX 0x11 /* MSI-X */
  +#define  PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */
  +#define  PCI_MSIX_FLAGS_QSIZE  0x7FF
  +#define  PCI_MSIX_FLAGS_ENABLE (1  15)
  +#define  PCI_MSIX_FLAGS_BIRMASK(7  0)
  +
  +/* MSI-X capability structure */
  +#define MSIX_TABLE_OFFSET 4
  +#define MSIX_PBA_OFFSET 8
  +#define MSIX_CAP_LENGTH 12
  +
  +/* MSI enable bit is in byte 1 in FLAGS register */
  +#define MSIX_ENABLE_OFFSET (PCI_MSIX_FLAGS + 1)
  +#define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE  8)
  +
  +/* MSI-X table format */
  +#define MSIX_MSG_ADDR 0
  +#define MSIX_MSG_UPPER_ADDR 4
  +#define MSIX_MSG_DATA 8
  +#define MSIX_VECTOR_CTRL 12
  +#define MSIX_ENTRY_SIZE 16
  +#define MSIX_VECTOR_MASK 0x1
  +
  +/* How much space does an MSIX table need. */
  +/* The spec requires giving the table structure
  + * a 4K aligned region all by itself. Align it to
  + * target pages so that drivers can do passthrough
  + * on the rest of the region. */
  +#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000)
  +/* Reserve second half of the page for pending bits */
  +#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
  +#define MSIX_MAX_ENTRIES 32
  +
  +
  +#ifdef MSIX_DEBUG
  +#define DEBUG(fmt, ...)   \
  +do {  \
  +  fprintf(stderr, %s:  fmt, __func__ , __VA_ARGS__);\
  +} while (0)
  +#else
  +#define DEBUG(fmt, ...) do { } while(0)
  +#endif
  +
  +/* Flag to globally disable MSI-X support */
  +int msix_disable;
  +
  +/* Flag for interrupt controller to declare MSI-X support */
  +int msix_supported;
 maybe better to make it static,

It's not read-only either.

 and provide msi_state() returning -1 for disabled,
 0 for supported, etc... 

Matter of taste, I prefer a set of binary flags rather than yet another enum:
msix_disable is controlled by user, msix_supported is a safety valve
for non-PC platforms. It's easier to keep them separate IMO.

  +
  +/* Add MSI-X capability to the config space for the device. */
  +/* Given a bar and its size, add MSI-X table on top of it
  + * and fill MSI-X capability in the config space.
  + * Original bar size must be a power of 2 or 0.
  + * New bar size is returned. */
  +static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
  +   unsigned bar_nr, unsigned bar_size)
  +{
  +int config_offset;
  +uint8_t *config;
  +uint32_t new_size;
  +
  +if (nentries  1 || nentries  PCI_MSIX_FLAGS_QSIZE + 1)
  +return -EINVAL;
  +if (bar_size  0x8000)
  +return -ENOSPC;
  +
  +/* Add space for MSI-X structures */
  +if (!bar_size)
  +new_size = MSIX_PAGE_SIZE;
  +else if (bar_size  MSIX_PAGE_SIZE) {
  +bar_size = MSIX_PAGE_SIZE;
  +new_size = MSIX_PAGE_SIZE * 2;
  +} else
  +new_size = bar_size * 2;
  +
  +pdev-msix_bar_size = new_size;
  +config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 
  MSIX_CAP_LENGTH);
  +if (config_offset  0)
  +return config_offset;
  +config = pdev-config + config_offset;
  +
  +

Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-10 Thread Michael Goldish

- Yolkfull Chow yz...@redhat.com wrote:

 On 06/09/2009 05:44 PM, Michael Goldish wrote:
  The test looks pretty nicely written. Comments:
 
  1. Consider making all the cloned VMs use image snapshots:
 
  curr_vm = vm1.clone()
  curr_vm.get_params()[extra_params] +=  -snapshot
 
  I'm not sure it's a good idea to let all VMs use the same disk
 image.
  Or maybe you shouldn't add -snapshot yourself, but rather do it in
 the config
  file for the first VM, and then all cloned VMs will have -snapshot
 as well.
 
 Yes I use 'image_snapshot = yes' in config file.
  2. Consider changing the message
   Booting the %dth guest % num
  to
  Booting guest #%d % num
  (because there's no such thing as 2th and 3th)
 
  3. Consider changing the message
  Cannot boot vm anylonger
  to
  Cannot create VM #%d % num
 
  4. Why not add curr_vm to vms immediately after cloning it?
  That way you can kill it in the exception handler later, without
 having
  to send it a 'quit' if you can't login ('if not curr_vm_session').
 
 Yes, good idea.
  5.  %dth guest boots up successfully % num --  again, 2th and 3th
 make no sense.
  Also, I wonder why you add those spaces before every info message.
 
  6. %dth guest's session is not responsive --  same
  (maybe use Guest session #%d is not responsive % num)
 
  7. Shut down the %dth guest --  same
  (maybe Shutting down guest #%d? or destroying/killing?)
 
  8. Shouldn't we fail the test when we find an unresponsive session?
  It seems you just display an error message. You can simply replace
  logging.error( with raise error.TestFail(.
 
 
  9. Consider using a stricter test than just
 vm_session.is_responsive().
  vm_session.is_responsive() just sends ENTER to the sessions and
 returns
  True if it gets anything as a result (usually a prompt, or even just
 a
  newline echoed back). If the session passes this test it is indeed
  responsive, so it's a decent test, but maybe you can send some
 command
  (user configurable?) and test for some output. I'm really not sure
 this
  is important, because I can't imagine a session would respond to a
 newline
  but not to other commands, but who knows. Maybe you can send the
 first VM
  a user-specified command when the test begins, remember the output,
 and
  then send all other VMs the same command and make sure the output is
 the
  same.
 
 maybe use 'info status' and send command 'help' via session to vms and
 compare their output?

I'm not sure I understand. What does 'info status' do? We're talking about
an SSH shell, not the monitor. You can do whatever you like, like 'uname -a',
and 'ls /', but you should leave it up to the user to decide, so he/she
can specify different commands for different guests. Linux commands won't
work under Windows, so Linux and Windows must have different commands in
the config file. In the Linux section, under '- @Linux:' you can add
something like:

stress_boot:
stress_boot_test_command = uname -a

and under '- @Windows:':

stress_boot:
stress_boot_test_command = ver  vol

These commands are just naive suggestions. I'm sure someone can think of
much more informative commands.

  10. I'm not sure you should use the param kill_vm_gracefully
 because that's
  a postprocessor param (probably not your business). You can just
 call
  destroy() in the exception handler with gracefully=False, because if
 the VMs
  are non- responsive, I don't expect them to shutdown nicely with an
 SSH
  command (that's what gracefully does). Also, we're using -snapshot,
 so
  there's no reason to shut them down nicely.
 
 Yes,  I agree. :)
  11. Total number booted successfully: %d % (num - 1) --  why not
 just num?
  We really have num VMs including the first one.
  Or you can say: Total number booted successfully in addition to the
 first one
  but that's much longer.
 
 Since after the first guest booted, I set num = 1 and then  'num += 1'
 
 at first in while loop ( for the purpose of getting a new vm ).
 So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up,
 the num booted successfully should be (num - 1).
 I would use enumerate(vms) that Uri suggested to make number easier to
 count.

OK, I didn't notice that.

  12. Consider adding a 'max_vms' (or 'threshold') user param to the
 test. If
  num reaches 'max_vms', we stop adding VMs and pass the test.
 Otherwise the
  test will always fail (which is depressing). If
 params.get(threshold) is
  None or , or in short -- 'if not params.get(threshold)', disable
 this
  feature and keep adding VMs forever. The user can enable the feature
 with:
  max_vms = 50
  or disable it with:
  max_vms =
 
 This is a good idea for hardware resource limit of host.
  13. Why are you catching OSError? If you get OSError it might be a
 framework bug.
 
 Since sometimes, vm.create() successfully but failed to ssh-login
 since 
 the running python cannot allocate physical memory (OSError).
 Add max_vms could fix this problem I 

Re: [Qemu-devel] [PATCHv3 08/13] qemu: add support for resizing regions

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:36:21PM -0300, Glauber Costa wrote:
 On Fri, Jun 05, 2009 at 01:23:55PM +0300, Michael S. Tsirkin wrote:
  Make it possible to resize PCI regions.  This will be used by virtio
  with MSI-X, where the region size depends on whether MSI-X is enabled,
  and can change across load/save.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/pci.c |   54 --
   hw/pci.h |3 +++
   2 files changed, 39 insertions(+), 18 deletions(-)
  
  diff --git a/hw/pci.c b/hw/pci.c
  index ed011b5..042a216 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -392,6 +392,41 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
  region_num,
   *(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask);
   }
   
  +static void pci_unmap_region(PCIDevice *d, PCIIORegion *r)
  +{
  +if (r-addr == -1)
  +return;
  +if (r-type  PCI_ADDRESS_SPACE_IO) {
  +int class;
  +/* NOTE: specific hack for IDE in PC case:
  +   only one byte must be mapped. */
  +class = pci_get_word(d-config + PCI_CLASS_DEVICE);
  +if (class == 0x0101  r-size == 4) {
  +isa_unassign_ioport(r-addr + 2, 1);
  +} else {
  +isa_unassign_ioport(r-addr, r-size);
  +}
  +} else {
  +cpu_register_physical_memory(pci_to_cpu_addr(r-addr),
  + r-size,
  + IO_MEM_UNASSIGNED);
  +qemu_unregister_coalesced_mmio(r-addr, r-size);
  +}
  +}
  +
 this is a good cleanup...
 
  +void pci_resize_io_region(PCIDevice *pci_dev, int region_num,
  +  uint32_t size)
  +{
  +
  +PCIIORegion *r = pci_dev-io_regions[region_num];
  +if (r-size == size)
  +return;
  +r-size = size;
  +pci_unmap_region(pci_dev, r);
  +r-addr = -1;
  +pci_update_mappings(pci_dev);
  +}
  +
 but the only user of this one seem to be commented out, and later removed.
 Why is this needed?
 

Um, I think this needs to be called on load: virtio has a memmory region
if and only if it has MSI-X.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv3 12/13] qemu: virtio save/load bindings

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:45:54PM -0300, Glauber Costa wrote:
 duplicated save config.
 
  diff --git a/hw/virtio.h b/hw/virtio.h
  index 04a3c3d..ce05517 100644
  --- a/hw/virtio.h
  +++ b/hw/virtio.h
  @@ -72,6 +72,10 @@ typedef struct VirtQueueElement
   
   typedef struct {
   void (*notify)(void * opaque, uint16_t vector);
  +void (*save_config)(void * opaque, QEMUFile *f);
  +void (*save_queue)(void * opaque, int n, QEMUFile *f);
  +int (*load_config)(void * opaque, QEMUFile *f);
  +int (*load_queue)(void * opaque, int n, QEMUFile *f);
   } VirtIOBindings;
   
 So, what's the overall effect on a virtual machine that gets migrated,
 of a certain device not implementing one of those functions?

Those are implemented by a transport (e.g. virtio_pci) not the device.

 Will it work? Will it break?

It will work - assuming there's nothing transport-specific you need to
save and load. If there is - this patch is not breaking anything
that isn't already broken ...

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-10 Thread Yolkfull Chow

On 06/10/2009 06:03 PM, Michael Goldish wrote:

- Yolkfull Chowyz...@redhat.com  wrote:

   

On 06/09/2009 05:44 PM, Michael Goldish wrote:
 

The test looks pretty nicely written. Comments:

1. Consider making all the cloned VMs use image snapshots:

curr_vm = vm1.clone()
curr_vm.get_params()[extra_params] +=  -snapshot

I'm not sure it's a good idea to let all VMs use the same disk
   

image.
 

Or maybe you shouldn't add -snapshot yourself, but rather do it in
   

the config
 

file for the first VM, and then all cloned VMs will have -snapshot
   

as well.
 


   

Yes I use 'image_snapshot = yes' in config file.
 

2. Consider changing the message
 Booting the %dth guest % num
to
Booting guest #%d % num
(because there's no such thing as 2th and 3th)

3. Consider changing the message
Cannot boot vm anylonger
to
Cannot create VM #%d % num

4. Why not add curr_vm to vms immediately after cloning it?
That way you can kill it in the exception handler later, without
   

having
 

to send it a 'quit' if you can't login ('if not curr_vm_session').

   

Yes, good idea.
 

5.  %dth guest boots up successfully % num --   again, 2th and 3th
   

make no sense.
 

Also, I wonder why you add those spaces before every info message.

6. %dth guest's session is not responsive --   same
(maybe use Guest session #%d is not responsive % num)

7. Shut down the %dth guest --   same
(maybe Shutting down guest #%d? or destroying/killing?)

8. Shouldn't we fail the test when we find an unresponsive session?
It seems you just display an error message. You can simply replace
logging.error( with raise error.TestFail(.

   
 

9. Consider using a stricter test than just
   

vm_session.is_responsive().
 

vm_session.is_responsive() just sends ENTER to the sessions and
   

returns
 

True if it gets anything as a result (usually a prompt, or even just
   

a
 

newline echoed back). If the session passes this test it is indeed
responsive, so it's a decent test, but maybe you can send some
   

command
 

(user configurable?) and test for some output. I'm really not sure
   

this
 

is important, because I can't imagine a session would respond to a
   

newline
 

but not to other commands, but who knows. Maybe you can send the
   

first VM
 

a user-specified command when the test begins, remember the output,
   

and
 

then send all other VMs the same command and make sure the output is
   

the
 

same.

   

maybe use 'info status' and send command 'help' via session to vms and
compare their output?
 

I'm not sure I understand. What does 'info status' do? We're talking about
an SSH shell, not the monitor. You can do whatever you like, like 'uname -a',
and 'ls /', but you should leave it up to the user to decide, so he/she
can specify different commands for different guests. Linux commands won't
work under Windows, so Linux and Windows must have different commands in
the config file. In the Linux section, under '- @Linux:' you can add
something like:

stress_boot:
 stress_boot_test_command = uname -a

and under '- @Windows:':

stress_boot:
 stress_boot_test_command = ver  vol

These commands are just naive suggestions. I'm sure someone can think of
much more informative commands.
   
That's really good suggestions.  Thanks, Michael.  And can I use 
'migration_test_command' instead?
   

10. I'm not sure you should use the param kill_vm_gracefully
   

because that's
 

a postprocessor param (probably not your business). You can just
   

call
 

destroy() in the exception handler with gracefully=False, because if
   

the VMs
 

are non- responsive, I don't expect them to shutdown nicely with an
   

SSH
 

command (that's what gracefully does). Also, we're using -snapshot,
   

so
 

there's no reason to shut them down nicely.

   

Yes,  I agree. :)
 

11. Total number booted successfully: %d % (num - 1) --   why not
   

just num?
 

We really have num VMs including the first one.
Or you can say: Total number booted successfully in addition to the
   

first one
 

but that's much longer.

   

Since after the first guest booted, I set num = 1 and then  'num += 1'

at first in while loop ( for the purpose of getting a new vm ).
So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up,
the num booted successfully should be (num - 1).
I would use enumerate(vms) that Uri suggested to make number easier to
count.
 

OK, I didn't notice that.

   

12. Consider adding a 'max_vms' (or 'threshold') user param to the
   

test. If
 

num reaches 'max_vms', we stop adding VMs and pass the test.
   

Otherwise the
 

test will always fail (which is depressing). If
   

params.get(threshold) is
 

None or , or in short -- 'if not params.get(threshold)', disable

Re: [PATCH 0/4] Arguments to skip boot menu and to hide bios output

2009-06-10 Thread David Lindsay
Just wondered... what version of qemu was this for, has it gotten anywhere, etc?

I've tried to apply these patches to 0.10.5, 0.10.0, 0.9.1 and the
current Git (as of June 10 2009) but none seem to include rombios.c or
vgabios.c :(

Thanks!

-dav7
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] kvm-kmod-2.6.30

2009-06-10 Thread Avi Kivity

This is a package containing the kvm external module, based on the
2.6.30 series.  kvm-kmod-2.6.30 contains the kvm code that is
present in Linux 2.6.30, except that it can run on older kernels.
It is a good companion to the qemu-kvm-0.10 series.

Note that performance and features will vary with the host kernel used.

Changes from kvm-kmod-2.6.30-rc8
- update source to Linux 2.6.30
  - fix oops during host reboot if CONFIG_MAXSMP is selected
- backport new required API zalloc_cpumask_var()

http://www.linux-kvm.org

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv3 08/13] qemu: add support for resizing regions

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:36:21PM -0300, Glauber Costa wrote:
 On Fri, Jun 05, 2009 at 01:23:55PM +0300, Michael S. Tsirkin wrote:
  Make it possible to resize PCI regions.  This will be used by virtio
  with MSI-X, where the region size depends on whether MSI-X is enabled,
  and can change across load/save.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/pci.c |   54 --
   hw/pci.h |3 +++
   2 files changed, 39 insertions(+), 18 deletions(-)
  
  diff --git a/hw/pci.c b/hw/pci.c
  index ed011b5..042a216 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -392,6 +392,41 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
  region_num,
   *(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask);
   }
   
  +static void pci_unmap_region(PCIDevice *d, PCIIORegion *r)
  +{
  +if (r-addr == -1)
  +return;
  +if (r-type  PCI_ADDRESS_SPACE_IO) {
  +int class;
  +/* NOTE: specific hack for IDE in PC case:
  +   only one byte must be mapped. */
  +class = pci_get_word(d-config + PCI_CLASS_DEVICE);
  +if (class == 0x0101  r-size == 4) {
  +isa_unassign_ioport(r-addr + 2, 1);
  +} else {
  +isa_unassign_ioport(r-addr, r-size);
  +}
  +} else {
  +cpu_register_physical_memory(pci_to_cpu_addr(r-addr),
  + r-size,
  + IO_MEM_UNASSIGNED);
  +qemu_unregister_coalesced_mmio(r-addr, r-size);
  +}
  +}
  +
 this is a good cleanup...
 
  +void pci_resize_io_region(PCIDevice *pci_dev, int region_num,
  +  uint32_t size)
  +{
  +
  +PCIIORegion *r = pci_dev-io_regions[region_num];
  +if (r-size == size)
  +return;
  +r-size = size;
  +pci_unmap_region(pci_dev, r);
  +r-addr = -1;
  +pci_update_mappings(pci_dev);
  +}
  +
 but the only user of this one seem to be commented out, and later removed.
 Why is this needed?
 

This was the missing bit:

Set correct size for msi-x memory region when loading the device.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 589fbb1..f657364 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -133,6 +133,8 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
*f)
 return ret;
 if (msix_present(proxy-pci_dev))
 qemu_get_be16s(f, proxy-vdev-config_vector);
+
+pci_resize_io_region(proxy-pci_dev, 1, msix_bar_size(proxy-pci_dev));
 return 0;
 }
 

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [KVM-AUTOTEST PATCH 1/4] Make all programs on kvm test use /usr/bin/python

2009-06-10 Thread Alexey Eromenko

Even better would be to use /usr/bin/python2.

This is because future distros will include python3, which is incompatible with 
python2 code.

python will be symlink of python3.

-Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] pte accessor fixes

2009-06-10 Thread Avi Kivity
A minor fix to the pte accessors, followed by a cosmetic fix so we don't
hit the same problem next time, followed by a cosmetic patch just because
I had my search-and-replace warmed up.  Please review.

Avi Kivity (3):
  KVM: MMU: Fix is_dirty_pte()
  KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow
pte
  KVM: MMU: s/shadow_pte/spte/

 arch/x86/kvm/mmu.c |  116 ++--
 arch/x86/kvm/mmu.h |2 +-
 arch/x86/kvm/paging_tmpl.h |   38 +++---
 arch/x86/kvm/x86.c |2 +-
 4 files changed, 79 insertions(+), 79 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: MMU: Fix is_dirty_pte()

2009-06-10 Thread Avi Kivity
is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid
shadow_dirty_mask and use PT_DIRTY_MASK instead.

Misdetecting dirty pages could lead to unnecessarily setting the dirty bit
under EPT.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..8e5003c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -242,7 +242,7 @@ static int is_writeble_pte(unsigned long pte)
 
 static int is_dirty_pte(unsigned long pte)
 {
-   return pte  shadow_dirty_mask;
+   return pte  PT_DIRTY_MASK;
 }
 
 static int is_rmap_pte(u64 pte)
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte

2009-06-10 Thread Avi Kivity
Since the guest and host ptes can have wildly different format, adjust
the pte accessor names to indicate on which type of pte they operate on.

No functional changes.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |   16 
 arch/x86/kvm/mmu.h |2 +-
 arch/x86/kvm/paging_tmpl.h |   22 +++---
 arch/x86/kvm/x86.c |2 +-
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8e5003c..415630d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -240,12 +240,12 @@ static int is_writeble_pte(unsigned long pte)
return pte  PT_WRITABLE_MASK;
 }
 
-static int is_dirty_pte(unsigned long pte)
+static int is_dirty_gpte(unsigned long pte)
 {
return pte  PT_DIRTY_MASK;
 }
 
-static int is_rmap_pte(u64 pte)
+static int is_rmap_spte(u64 pte)
 {
return is_shadow_present_pte(pte);
 }
@@ -498,7 +498,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
gfn_t gfn, int lpage)
unsigned long *rmapp;
int i;
 
-   if (!is_rmap_pte(*spte))
+   if (!is_rmap_spte(*spte))
return;
gfn = unalias_gfn(vcpu-kvm, gfn);
sp = page_header(__pa(spte));
@@ -560,7 +560,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
unsigned long *rmapp;
int i;
 
-   if (!is_rmap_pte(*spte))
+   if (!is_rmap_spte(*spte))
return;
sp = page_header(__pa(spte));
pfn = spte_to_pfn(*spte);
@@ -1747,7 +1747,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 __func__, *shadow_pte, pt_access,
 write_fault, user_fault, gfn);
 
-   if (is_rmap_pte(*shadow_pte)) {
+   if (is_rmap_spte(*shadow_pte)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -1783,7 +1783,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
page_header_update_slot(vcpu-kvm, shadow_pte, gfn);
if (!was_rmapped) {
rmap_add(vcpu, shadow_pte, gfn, largepage);
-   if (!is_rmap_pte(*shadow_pte))
+   if (!is_rmap_spte(*shadow_pte))
kvm_release_pfn_clean(pfn);
} else {
if (was_writeble)
@@ -1960,7 +1960,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
ASSERT(!VALID_PAGE(root));
if (vcpu-arch.mmu.root_level == PT32E_ROOT_LEVEL) {
pdptr = kvm_pdptr_read(vcpu, i);
-   if (!is_present_pte(pdptr)) {
+   if (!is_present_gpte(pdptr)) {
vcpu-arch.mmu.pae_root[i] = 0;
continue;
}
@@ -2451,7 +2451,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu 
*vcpu, gpa_t gpa,
if ((bytes == 4)  (gpa % 4 == 0))
memcpy((void *)gpte, new, 4);
}
-   if (!is_present_pte(gpte))
+   if (!is_present_gpte(gpte))
return;
gfn = (gpte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3494a2f..016bf71 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -75,7 +75,7 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
return vcpu-arch.cr0  X86_CR0_PG;
 }
 
-static inline int is_present_pte(unsigned long pte)
+static inline int is_present_gpte(unsigned long pte)
 {
return pte  PT_PRESENT_MASK;
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4cb1dbf..238a193 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -132,7 +132,7 @@ walk:
 #if PTTYPE == 64
if (!is_long_mode(vcpu)) {
pte = kvm_pdptr_read(vcpu, (addr  30)  3);
-   if (!is_present_pte(pte))
+   if (!is_present_gpte(pte))
goto not_present;
--walker-level;
}
@@ -155,7 +155,7 @@ walk:
 
kvm_read_guest(vcpu-kvm, pte_gpa, pte, sizeof(pte));
 
-   if (!is_present_pte(pte))
+   if (!is_present_gpte(pte))
goto not_present;
 
rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker-level);
@@ -205,7 +205,7 @@ walk:
--walker-level;
}
 
-   if (write_fault  !is_dirty_pte(pte)) {
+   if (write_fault  !is_dirty_gpte(pte)) {
bool ret;
 
mark_page_dirty(vcpu-kvm, table_gfn);
@@ -252,7 +252,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *page,
 
gpte = *(const pt_element_t *)pte;
if (~gpte  (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-   if (!is_present_pte(gpte))
+   if (!is_present_gpte(gpte))

[PATCH 3/3] KVM: MMU: s/shadow_pte/spte/

2009-06-10 Thread Avi Kivity
We use shadow_pte and spte inconsistently, switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its lowlevelness.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |  102 ++--
 arch/x86/kvm/paging_tmpl.h |   16 +++---
 2 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 415630d..abd3a17 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -143,7 +143,7 @@ module_param(oos_shadow, bool, 0644);
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {
-   u64 *shadow_ptes[RMAP_EXT];
+   u64 *sptes[RMAP_EXT];
struct kvm_rmap_desc *more;
 };
 
@@ -262,7 +262,7 @@ static gfn_t pse36_gfn_delta(u32 gpte)
return (gpte  PT32_DIR_PSE36_MASK)  shift;
 }
 
-static void set_shadow_pte(u64 *sptep, u64 spte)
+static void __set_spte(u64 *sptep, u64 spte)
 {
 #ifdef CONFIG_X86_64
set_64bit((unsigned long *)sptep, spte);
@@ -510,21 +510,21 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
gfn_t gfn, int lpage)
} else if (!(*rmapp  1)) {
rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte);
desc = mmu_alloc_rmap_desc(vcpu);
-   desc-shadow_ptes[0] = (u64 *)*rmapp;
-   desc-shadow_ptes[1] = spte;
+   desc-sptes[0] = (u64 *)*rmapp;
+   desc-sptes[1] = spte;
*rmapp = (unsigned long)desc | 1;
} else {
rmap_printk(rmap_add: %p %llx many-many\n, spte, *spte);
desc = (struct kvm_rmap_desc *)(*rmapp  ~1ul);
-   while (desc-shadow_ptes[RMAP_EXT-1]  desc-more)
+   while (desc-sptes[RMAP_EXT-1]  desc-more)
desc = desc-more;
-   if (desc-shadow_ptes[RMAP_EXT-1]) {
+   if (desc-sptes[RMAP_EXT-1]) {
desc-more = mmu_alloc_rmap_desc(vcpu);
desc = desc-more;
}
-   for (i = 0; desc-shadow_ptes[i]; ++i)
+   for (i = 0; desc-sptes[i]; ++i)
;
-   desc-shadow_ptes[i] = spte;
+   desc-sptes[i] = spte;
}
 }
 
@@ -535,14 +535,14 @@ static void rmap_desc_remove_entry(unsigned long *rmapp,
 {
int j;
 
-   for (j = RMAP_EXT - 1; !desc-shadow_ptes[j]  j  i; --j)
+   for (j = RMAP_EXT - 1; !desc-sptes[j]  j  i; --j)
;
-   desc-shadow_ptes[i] = desc-shadow_ptes[j];
-   desc-shadow_ptes[j] = NULL;
+   desc-sptes[i] = desc-sptes[j];
+   desc-sptes[j] = NULL;
if (j != 0)
return;
if (!prev_desc  !desc-more)
-   *rmapp = (unsigned long)desc-shadow_ptes[0];
+   *rmapp = (unsigned long)desc-sptes[0];
else
if (prev_desc)
prev_desc-more = desc-more;
@@ -587,8 +587,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
desc = (struct kvm_rmap_desc *)(*rmapp  ~1ul);
prev_desc = NULL;
while (desc) {
-   for (i = 0; i  RMAP_EXT  desc-shadow_ptes[i]; ++i)
-   if (desc-shadow_ptes[i] == spte) {
+   for (i = 0; i  RMAP_EXT  desc-sptes[i]; ++i)
+   if (desc-sptes[i] == spte) {
rmap_desc_remove_entry(rmapp,
   desc, i,
   prev_desc);
@@ -619,10 +619,10 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
*rmapp, u64 *spte)
prev_desc = NULL;
prev_spte = NULL;
while (desc) {
-   for (i = 0; i  RMAP_EXT  desc-shadow_ptes[i]; ++i) {
+   for (i = 0; i  RMAP_EXT  desc-sptes[i]; ++i) {
if (prev_spte == spte)
-   return desc-shadow_ptes[i];
-   prev_spte = desc-shadow_ptes[i];
+   return desc-sptes[i];
+   prev_spte = desc-sptes[i];
}
desc = desc-more;
}
@@ -644,7 +644,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
BUG_ON(!(*spte  PT_PRESENT_MASK));
rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte);
if (is_writeble_pte(*spte)) {
-   set_shadow_pte(spte, *spte  ~PT_WRITABLE_MASK);
+   __set_spte(spte, *spte  ~PT_WRITABLE_MASK);
write_protected = 1;
}
spte = rmap_next(kvm, rmapp, spte);
@@ -668,7 +668,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
if (is_writeble_pte(*spte)) {
  

Re: [Qemu-devel] [PATCHv3 12/13] qemu: virtio save/load bindings

2009-06-10 Thread Michael S. Tsirkin
On Tue, Jun 09, 2009 at 02:45:54PM -0300, Glauber Costa wrote:
 duplicated save config.

Good catch. Fixed.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: KVM VServer coexistence

2009-06-10 Thread Martin Maurer
Should work, we use KVM and openVZ in Proxmox VE.

Best Regards,

Martin Maurer

mar...@proxmox.com
http://pve.proxmox.com

 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Ajay EP
 Sent: Dienstag, 09. Juni 2009 15:37
 To: kvm@vger.kernel.org
 Subject: KVM  VServer coexistence
 
 Hi ,
 
 I would like to know if both container based (Vserver ) and para
 virtualization (KVM) co-exist at the same time on the host machine (
 INTEL
 VT) ?
 
 
 Thanks
 Ajay
 
 __
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-10 Thread Michael Goldish

- Yolkfull Chow yz...@redhat.com wrote:

 On 06/10/2009 06:03 PM, Michael Goldish wrote:
  - Yolkfull Chowyz...@redhat.com  wrote:
 
 
  On 06/09/2009 05:44 PM, Michael Goldish wrote:
   
  The test looks pretty nicely written. Comments:
 
  1. Consider making all the cloned VMs use image snapshots:
 
  curr_vm = vm1.clone()
  curr_vm.get_params()[extra_params] +=  -snapshot
 
  I'm not sure it's a good idea to let all VMs use the same disk
 
  image.
   
  Or maybe you shouldn't add -snapshot yourself, but rather do it
 in
 
  the config
   
  file for the first VM, and then all cloned VMs will have
 -snapshot
 
  as well.
   
 
 
  Yes I use 'image_snapshot = yes' in config file.
   
  2. Consider changing the message
   Booting the %dth guest % num
  to
  Booting guest #%d % num
  (because there's no such thing as 2th and 3th)
 
  3. Consider changing the message
  Cannot boot vm anylonger
  to
  Cannot create VM #%d % num
 
  4. Why not add curr_vm to vms immediately after cloning it?
  That way you can kill it in the exception handler later, without
 
  having
   
  to send it a 'quit' if you can't login ('if not
 curr_vm_session').
 
 
  Yes, good idea.
   
  5.  %dth guest boots up successfully % num --   again, 2th and
 3th
 
  make no sense.
   
  Also, I wonder why you add those spaces before every info
 message.
 
  6. %dth guest's session is not responsive --   same
  (maybe use Guest session #%d is not responsive % num)
 
  7. Shut down the %dth guest --   same
  (maybe Shutting down guest #%d? or destroying/killing?)
 
  8. Shouldn't we fail the test when we find an unresponsive
 session?
  It seems you just display an error message. You can simply
 replace
  logging.error( with raise error.TestFail(.
 
 
   
  9. Consider using a stricter test than just
 
  vm_session.is_responsive().
   
  vm_session.is_responsive() just sends ENTER to the sessions and
 
  returns
   
  True if it gets anything as a result (usually a prompt, or even
 just
 
  a
   
  newline echoed back). If the session passes this test it is
 indeed
  responsive, so it's a decent test, but maybe you can send some
 
  command
   
  (user configurable?) and test for some output. I'm really not
 sure
 
  this
   
  is important, because I can't imagine a session would respond to
 a
 
  newline
   
  but not to other commands, but who knows. Maybe you can send the
 
  first VM
   
  a user-specified command when the test begins, remember the
 output,
 
  and
   
  then send all other VMs the same command and make sure the output
 is
 
  the
   
  same.
 
 
  maybe use 'info status' and send command 'help' via session to vms
 and
  compare their output?
   
  I'm not sure I understand. What does 'info status' do? We're talking
 about
  an SSH shell, not the monitor. You can do whatever you like, like
 'uname -a',
  and 'ls /', but you should leave it up to the user to decide, so
 he/she
  can specify different commands for different guests. Linux commands
 won't
  work under Windows, so Linux and Windows must have different
 commands in
  the config file. In the Linux section, under '- @Linux:' you can
 add
  something like:
 
  stress_boot:
   stress_boot_test_command = uname -a
 
  and under '- @Windows:':
 
  stress_boot:
   stress_boot_test_command = ver  vol
 
  These commands are just naive suggestions. I'm sure someone can
 think of
  much more informative commands.
 
 That's really good suggestions.  Thanks, Michael.  And can I use 
 'migration_test_command' instead?

Not really. Why would you want to use another test's param?

1. There's no guarantee that 'migration_test_command' is defined
for your boot stress test. In fact, it is probably only defined for
migration tests, so you probably won't be able to access it. Try
params.get('migration_test_command') in your test and you'll probably
get None.

2. The user may not want to run migration at all, and then he/she
will probably not define 'migration_test_command'.

3. The user might want to use different test commands for migration
and for the boot stress test.

  10. I'm not sure you should use the param kill_vm_gracefully
 
  because that's
   
  a postprocessor param (probably not your business). You can just
 
  call
   
  destroy() in the exception handler with gracefully=False, because
 if
 
  the VMs
   
  are non- responsive, I don't expect them to shutdown nicely with
 an
 
  SSH
   
  command (that's what gracefully does). Also, we're using
 -snapshot,
 
  so
   
  there's no reason to shut them down nicely.
 
 
  Yes,  I agree. :)
   
  11. Total number booted successfully: %d % (num - 1) --   why
 not
 
  just num?
   
  We really 

[PATCH 0/2] *** SUBJECT HERE ***

2009-06-10 Thread Izik Eidus
RFC: move the dirty page tracking to use dirty bit

Well, i was bored this morning and had this idea for a while, didnt test it to
much..., first i want to hear what ppl think?

Thanks.

Izik Eidus (2):
  kvm: fix dirty bit tracking for slots with large pages
  kvm: change the dirty page tracking to work with dirty bit instead of
page fault

 arch/ia64/kvm/kvm-ia64.c|4 
 arch/powerpc/kvm/powerpc.c  |4 
 arch/s390/kvm/kvm-s390.c|4 
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/mmu.c  |   32 +---
 arch/x86/kvm/svm.c  |7 +++
 arch/x86/kvm/vmx.c  |7 +++
 arch/x86/kvm/x86.c  |   21 ++---
 include/linux/kvm_host.h|1 +
 virt/kvm/kvm_main.c |   17 -
 10 files changed, 89 insertions(+), 11 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages

2009-06-10 Thread Izik Eidus
When slot is already allocted and being asked to be tracked we need to break the
large pages.

This code flush the mmu when someone ask a slot to start dirty bit tracking.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 virt/kvm/kvm_main.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 669eb4a..4a60c72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.userspace_addr = mem-userspace_addr;
else
new.userspace_addr = 0;
+
+   kvm_arch_flush_shadow(kvm);
}
if (npages  !new.lpage_info) {
largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
-- 
1.5.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] kvm: change the dirty page tracking to work with dirty bit instead of page fault

2009-06-10 Thread Izik Eidus
right now the dirty page tracking work with the help of page faults, when we
want to track a page for being dirty, we write protect it and we mark it dirty
when we have write page fault, this code move into looking at the dirty bit
of the spte.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c|4 
 arch/powerpc/kvm/powerpc.c  |4 
 arch/s390/kvm/kvm-s390.c|4 
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/mmu.c  |   32 +---
 arch/x86/kvm/svm.c  |7 +++
 arch/x86/kvm/vmx.c  |7 +++
 arch/x86/kvm/x86.c  |   21 ++---
 include/linux/kvm_host.h|1 +
 virt/kvm/kvm_main.c |   15 ++-
 10 files changed, 87 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3199221..5914128 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1809,6 +1809,10 @@ void kvm_arch_exit(void)
kvm_vmm_info = NULL;
 }
 
+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+}
+
 static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2cf915e..6beb368 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -418,6 +418,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
return -ENOTSUPP;
 }
 
+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 981ab04..ab6f115 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -130,6 +130,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
return 0;
 }
 
+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c7b0cc2..8a24149 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -527,6 +527,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+   int (*dirty_bit_support)(void);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..3ec6a7d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASKPT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#define SPTE_DONT_DIRTY (1ULL  PT_FIRST_AVAIL_BITS_SHIFT)
+
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {
@@ -629,6 +631,25 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
*rmapp, u64 *spte)
return NULL;
 }
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte;
+   int dirty = 0;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   if (*spte  PT_DIRTY_MASK) {
+   set_shadow_pte(spte, (*spte = ~PT_DIRTY_MASK) |
+  SPTE_DONT_DIRTY);
+   dirty = 1;
+   }
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+
+   return dirty;
+}
+
+
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
unsigned long *rmapp;
@@ -1676,7 +1697,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 * whether the guest actually used the pte (in order to detect
 * demand paging).
 */
-   spte = shadow_base_present_pte | shadow_dirty_mask;
+   spte = shadow_base_present_pte;
+   if (!(spte  SPTE_DONT_DIRTY))
+   spte |= shadow_dirty_mask;
+
if (!speculative)
spte |= shadow_accessed_mask;
if (!dirty)
@@ -1725,8 +1749,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
}
}
 
-   if (pte_access  ACC_WRITE_MASK)
-   mark_page_dirty(vcpu-kvm, gfn);
+   if (!shadow_dirty_mask) {
+   if (pte_access  ACC_WRITE_MASK)
+   mark_page_dirty(vcpu-kvm, gfn);
+   }
 
 set_pte:

Re: [PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages

2009-06-10 Thread Avi Kivity

Izik Eidus wrote:

When slot is already allocted and being asked to be tracked we need to break the
large pages.

This code flush the mmu when someone ask a slot to start dirty bit tracking.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 virt/kvm/kvm_main.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 669eb4a..4a60c72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.userspace_addr = mem-userspace_addr;
else
new.userspace_addr = 0;
+
+   kvm_arch_flush_shadow(kvm);
}
if (npages  !new.lpage_info) {
largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
  


Ryan, can you try this out with your large page migration failures?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bit instead of page fault

2009-06-10 Thread Izik Eidus

Few quick thoughts:


 
+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

+{
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c7b0cc2..8a24149 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -527,6 +527,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+   int (*dirty_bit_support)(void);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;

@@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp);

+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..3ec6a7d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASKPT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#define SPTE_DONT_DIRTY (1ULL  PT_FIRST_AVAIL_BITS_SHIFT)

+
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {

@@ -629,6 +631,25 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
*rmapp, u64 *spte)
return NULL;
 }
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp)

+{
+   u64 *spte;
+   int dirty = 0;
+
  


Here we should add:

if (!shadow_dirty_mask)
   return 0;



+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   if (*spte  PT_DIRTY_MASK) {
+   set_shadow_pte(spte, (*spte = ~PT_DIRTY_MASK) |
+  SPTE_DONT_DIRTY);
+   dirty = 1;
+   }
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+
+   return dirty;
+}
+




  */
@@ -1982,9 +1995,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	/* If nothing is dirty, don't bother messing with page tables. */

if (is_dirty) {
-   spin_lock(kvm-mmu_lock);
-   kvm_mmu_slot_remove_write_access(kvm, log-slot);
-   spin_unlock(kvm-mmu_lock);
+   if (kvm_x86_ops-dirty_bit_support()) {
  


This should be  if (kvm_x86_ops-dirty_bit_support() - if 
(!kvm_x86_ops-dirty_bit_support())



+   spin_lock(kvm-mmu_lock);
+   kvm_mmu_slot_remove_write_access(kvm, log-slot);
+   spin_unlock(kvm-mmu_lock);
+   }
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages

2009-06-10 Thread Izik Eidus

Avi Kivity wrote:

Izik Eidus wrote:
When slot is already allocted and being asked to be tracked we need 
to break the

large pages.

This code flush the mmu when someone ask a slot to start dirty bit 
tracking.


Signed-off-by: Izik Eidus iei...@redhat.com
---
 virt/kvm/kvm_main.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 669eb4a..4a60c72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 new.userspace_addr = mem-userspace_addr;
 else
 new.userspace_addr = 0;
+
+kvm_arch_flush_shadow(kvm);
 }
 if (npages  !new.lpage_info) {
 largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
  


Ryan, can you try this out with your large page migration failures?


Wait, i think it is in the wrong place., i am sending a second seires :(
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG at mmu.c:615 from localhost migration using ept+hugetlbfs

2009-06-10 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 11:08:14AM +0300, Avi Kivity wrote:
 Avi Kivity wrote:

 Not really.  One thing, migration should transition the shadow  
 pagetables from large pages to small ones, maybe that bit is broken.

 Maybe we're looking at a largepage spte and interpreting it as a  
 normal L2 spte, and interpreting a guest page as the L1 spt.

 I tried to find where we drop the mmu (or at least large sptes for the  
 slot) when we enable dirty logging, and failed.  Maybe  
 remove_write_access() is sufficient.

I believe you have to break down large pages into 4k pages for migration
to work reliably. Was tempted to copypaste the hugetlbfs file ram alloc
code into user/main.c to use with user/vm.c (which then can also be used
to test TLB flushes on 2M-4k transition which are lacking).

Regarding the bogus spte, could not reproduce yesterday with kvm.git,
but in the worst case the audit code will catch it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper

2009-06-10 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 12:17:09PM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 Required by EPT misconfiguration handler.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

 Index: kvm/arch/x86/kvm/mmu.c
 ===
 --- kvm.orig/arch/x86/kvm/mmu.c
 +++ kvm/arch/x86/kvm/mmu.c
 @@ -3013,6 +3013,26 @@ out:
  return r;
  }
  +void kvm_mmu_shadow_walk(struct kvm_vcpu *vcpu, u64 addr,
 + struct mmu_shadow_walk *walk)
 +{
 +struct kvm_shadow_walk_iterator iterator;
 +
 +spin_lock(vcpu-kvm-mmu_lock);
 +for_each_shadow_entry(vcpu, addr, iterator) {
 +int err;
 +
 +err = walk-fn(vcpu, iterator.sptep, iterator.level, walk);
 +if (err)
 +break;
 +
 +if (!is_shadow_present_pte(*iterator.sptep))
 +break;
 +}
 +spin_unlock(vcpu-kvm-mmu_lock);
 +}
 +EXPORT_SYMBOL(kvm_mmu_shadow_walk);
 +
   

 Isn't it simpler to invoke for_each_shadow_entry(), instead of defining  
 a callback and calling it?

 We had those callbacks once, then switched to for_each.

The point is its exported to use in a external module (kvm-intel.ko),
so you hide the details (such as locking) in the kvm_mmu_shadow_walk
helper. Let me know how do you prefer this to be.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [KVM-AUTOTEST PATCH 2/4] Make kvm_config.py to use internal/standard exeptions

2009-06-10 Thread Michael Goldish
Looks fine to me.

BTW, I think debug_print() should be prefixed by a single underscore, not two.
A double underscore should be used only when name mangling is required -- at 
least that's what I understood from PEP 8.
Let me know what you think.

Thanks,
Michael

- Original Message -
From: Lucas Meneghel Rodrigues l...@redhat.com
To: autot...@test.kernel.org
Cc: kvm@vger.kernel.org
Sent: Tuesday, June 9, 2009 7:33:27 PM (GMT+0200) Auto-Detected
Subject: [Autotest] [KVM-AUTOTEST PATCH 2/4] Make kvm_config.py to use 
internal/standard exeptions

Instead of resorting to internal autotest exception types, use
either python standard exceptions or an internally defined
ConfigError exception.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_config.py |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
index a3467a0..cce931a 100755
--- a/client/tests/kvm/kvm_config.py
+++ b/client/tests/kvm/kvm_config.py
@@ -8,6 +8,11 @@ KVM configuration file utility functions.
 @copyright: Red Hat 2008-2009
 
 
+
+class ConfigError(Exception):
+pass
+
+
 class config:
 
 Parse an input file or string that follows the KVM Test Config File format
@@ -47,7 +52,7 @@ class config:
 @param filename: Path of the configuration file.
 
 if not os.path.exists(filename):
-raise Exception, File %s not found % filename
+raise IOError(File %s not found % filename)
 self.filename = filename
 file = open(filename, r)
 self.list = self.parse(file, self.list)
@@ -357,7 +362,7 @@ class config:
 # (inside an exception or inside subvariants)
 if restricted:
 e_msg = Using variants in this context is not allowed
-raise error.AutotestError()
+raise ConfigError(e_msg)
 if self.debug and not restricted:
 self.__debug_print(indented_line,
  Entering variants block (%d dicts in
@@ -402,7 +407,7 @@ class config:
 words[1])
 if not os.path.exists(filename):
 e_msg = Cannot include %s -- file not found % 
filename
-raise error.AutotestError(e_msg)
+raise ConfigError(e_msg)
 new_file = open(filename, r)
 list = self.parse(new_file, list, restricted)
 new_file.close()
@@ -410,7 +415,7 @@ class config:
 self.__debug_print(, Leaving file %s % words[1])
 else:
 e_msg = Cannot include anything because no file is open
-raise error.AutotestError(e_msg)
+raise ConfigError(e_msg)
 
 # Parse multi-line exceptions
 # (the block is parsed for each dict separately)
-- 
1.6.2.2

___
Autotest mailing list
autot...@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper

2009-06-10 Thread Avi Kivity

Marcelo Tosatti wrote:
Isn't it simpler to invoke for_each_shadow_entry(), instead of defining  
a callback and calling it?


We had those callbacks once, then switched to for_each.



The point is its exported to use in a external module (kvm-intel.ko),
so you hide the details (such as locking) in the kvm_mmu_shadow_walk
helper. Let me know how do you prefer this to be.
  


Ah, you're right.

I don't think it's worthwhile to add all this just for debugging.  You 
can add a function that dumps the spte chain as well as the features 
MSR, and we can decode it by hand when we see it.  Disadvantage is more 
work for us when we hit the bug, but at least that function is reusable 
in other contexts.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Clean up MADT Table Creation (v2)

2009-06-10 Thread Avi Kivity

Beth Kon wrote:

This patch is based on the recent patch from Vincent Minet. I split Vincent's
changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by
Marcelo) and added a bit to them.

There has been much ado over the acpi_bios_init function recently. I had actually 
done a rewrite very similar to Gleb's, but Avi argued that the rewrite has to be 
more incremental. This patch contains minimal changes without any rewrite because 
the changes are kvm-only. The rewrite would better be a separate step, submitted to 
qemu and then merged into kvm.  


I am submitting the RSDT fix to kvm because the kvm and qemu RSDT 
implementation differs.
Again, as a separate rewrite effort, the kvm and qemu RSDT manipulation could be merged 
into one base as a later, separate step.


This patch will get MADT into reasonable enough shape for me to resubmit hpet 
patches
on top of it. After that, I'd be willing to submit incremental rewrite patches for 
acpi_bios_init to qemu, starting with MADT and RSDT.  

  


Applied both, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [KVM-AUTOTEST PATCH 4/4] Fix bad logging calls

2009-06-10 Thread Michael Goldish
Looks fine to me.

- Original Message -
From: Lucas Meneghel Rodrigues l...@redhat.com
To: autot...@test.kernel.org
Cc: kvm@vger.kernel.org
Sent: Tuesday, June 9, 2009 7:33:29 PM (GMT+0200) Auto-Detected
Subject: [Autotest] [KVM-AUTOTEST PATCH 4/4] Fix bad logging calls

During the conversion of kvm autotest to upstream coding standards,
some bad logging calls were left behind. This patch fixes them.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_utils.py |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 0f4c770..70a49c9 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -304,7 +304,7 @@ class kvm_spawn:
 
 # Print some debugging info
 if match == None and self.poll() != 0:
-logging.debug(Timeout elapsed or process terminated. Output:,
+logging.debug(Timeout elapsed or process terminated. Output:%s,
   format_str_for_message(data.strip()))
 
 return (match, data)
@@ -465,8 +465,8 @@ class kvm_spawn:
 
 # Print some debugging info
 if status != 0:
-logging.debug(Command failed; status: %d, output: % status \
-+ format_str_for_message(output.strip()))
+logging.debug(Command failed; status: %d, output:%s, status,
+  format_str_for_message(output.strip()))
 
 return (status, output)
 
-- 
1.6.2.2

___
Autotest mailing list
autot...@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 00/13] qemu: MSI-X support

2009-06-10 Thread Michael S. Tsirkin
Here is the port of MSI-X support patches to upstream qemu.
Please comment or commit.

This patchset adds generic support for MSI-X, adds implementation in
APIC, and uses MSI-X in virtio-net.

Changelog:
- since v3
  call to resize_region on load
  split patches a bit differently to address style comments by Glauber
  update commit message to clarify what msix_support flag does
- since v2
  rename mask - wmask to avoid conflict with work by Yamahata
- since v1
  At Paul's suggestion, use stl_phy to decouple APIC and MSI-X implementation

This uses the mask table patch that I posted previously, and which is
included in the series.

-- 
MST

Michael S. Tsirkin (13):
  qemu: make default_write_config use mask table
  qemu: capability bits in pci save/restore
  qemu: add routines to manage PCI capabilities
  qemu: helper routines for pci access
  qemu: MSI-X support functions
  qemu: add flag to disable MSI-X by default
  qemu: minimal MSI/MSI-X implementation for PC
  qemu: add support for resizing regions
  qemu: virtio support for many interrupt vectors
  qemu: MSI-X support in virtio PCI
  qemu: request 3 vectors in virtio-net
  qemu: virtio save/load bindings
  qemu: add pci_get/set_byte

 Makefile.target|2 +-
 hw/apic.c  |   43 +-
 hw/msix.c  |  420 
 hw/msix.h  |   35 +
 hw/pci.c   |  295 -
 hw/pci.h   |  105 +-
 hw/syborg_virtio.c |   13 ++-
 hw/virtio-net.c|1 +
 hw/virtio-pci.c|  216 ++-
 hw/virtio.c|   70 ++---
 hw/virtio.h|   14 ++-
 qemu-options.hx|2 +
 vl.c   |3 +
 13 files changed, 1005 insertions(+), 214 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 01/13] qemu: make default_write_config use mask table

2009-06-10 Thread Michael S. Tsirkin
Change much of hw/pci to use symbolic constants and a table-driven
design: add a mask table with writable bits set and readonly bits unset.
Detect change by comparing original and new registers.

This makes it easy to support capabilities where read-only/writeable
bit layout differs between devices, depending on capabilities present.

As a result, writing a single byte in BAR registers now works as
it should. Writing to upper limit registers in the bridge
also works as it should. Code is also shorter.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.c |  145 -
 hw/pci.h |   18 +++-
 2 files changed, 46 insertions(+), 117 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8c904ba..766e0c6 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -238,6 +238,17 @@ int pci_assign_devaddr(const char *addr, int *domp, int 
*busp, unsigned *slotp)
 return pci_parse_devaddr(devaddr, domp, busp, slotp);
 }
 
+static void pci_init_mask(PCIDevice *dev)
+{
+int i;
+dev-wmask[PCI_CACHE_LINE_SIZE] = 0xff;
+dev-wmask[PCI_INTERRUPT_LINE] = 0xff;
+dev-wmask[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY
+  | PCI_COMMAND_MASTER;
+for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
+dev-wmask[i] = 0xff;
+}
+
 /* -1 for devfn means auto assign */
 static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
  const char *name, int devfn,
@@ -257,6 +268,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pstrcpy(pci_dev-name, sizeof(pci_dev-name), name);
 memset(pci_dev-irq_state, 0, sizeof(pci_dev-irq_state));
 pci_set_default_subsystem_id(pci_dev);
+pci_init_mask(pci_dev);
 
 if (!config_read)
 config_read = pci_default_read_config;
@@ -328,6 +340,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 {
 PCIIORegion *r;
 uint32_t addr;
+uint32_t wmask;
 
 if ((unsigned int)region_num = PCI_NUM_REGIONS)
 return;
@@ -343,12 +356,17 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 r-size = size;
 r-type = type;
 r-map_func = map_func;
+
+wmask = ~(size - 1);
 if (region_num == PCI_ROM_SLOT) {
 addr = 0x30;
+/* ROM enable bit is writeable */
+wmask |= 1;
 } else {
 addr = 0x10 + region_num * 4;
 }
 *(uint32_t *)(pci_dev-config + addr) = cpu_to_le32(type);
+*(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask);
 }
 
 static void pci_update_mappings(PCIDevice *d)
@@ -457,118 +475,21 @@ uint32_t pci_default_read_config(PCIDevice *d,
 return val;
 }
 
-void pci_default_write_config(PCIDevice *d,
-  uint32_t address, uint32_t val, int len)
+void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
-int can_write, i;
-uint32_t end, addr;
-
-if (len == 4  ((address = 0x10  address  0x10 + 4 * 6) ||
- (address = 0x30  address  0x34))) {
-PCIIORegion *r;
-int reg;
+uint8_t orig[PCI_CONFIG_SPACE_SIZE];
+int i;
 
-if ( address = 0x30 ) {
-reg = PCI_ROM_SLOT;
-}else{
-reg = (address - 0x10)  2;
-}
-r = d-io_regions[reg];
-if (r-size == 0)
-goto default_config;
-/* compute the stored value */
-if (reg == PCI_ROM_SLOT) {
-/* keep ROM enable bit */
-val = (~(r-size - 1)) | 1;
-} else {
-val = ~(r-size - 1);
-val |= r-type;
-}
-*(uint32_t *)(d-config + address) = cpu_to_le32(val);
-pci_update_mappings(d);
-return;
-}
- default_config:
 /* not efficient, but simple */
-addr = address;
-for(i = 0; i  len; i++) {
-/* default read/write accesses */
-switch(d-config[0x0e]) {
-case 0x00:
-case 0x80:
-switch(addr) {
-case 0x00:
-case 0x01:
-case 0x02:
-case 0x03:
-case 0x06:
-case 0x07:
-case 0x08:
-case 0x09:
-case 0x0a:
-case 0x0b:
-case 0x0e:
-case 0x10 ... 0x27: /* base */
-case 0x2c ... 0x2f: /* read-only subsystem ID  vendor ID */
-case 0x30 ... 0x33: /* rom */
-case 0x3d:
-can_write = 0;
-break;
-default:
-can_write = 1;
-break;
-}
-break;
-default:
-case 0x01:
-switch(addr) {
-case 0x00:
-case 0x01:
-case 0x02:
-case 0x03:
-case 0x06:
-case 0x07:
-case 0x08:
-case 0x09:
-case 0x0a:
-case 0x0b:
-  

[PATCHv4 02/13] qemu: capability bits in pci save/restore

2009-06-10 Thread Michael S. Tsirkin
Add support for capability bits in save/restore for pci.
These will be used for MSI, where the capability might
be present or not as requested by user, which does not
map well into a single version number.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.c |   17 ++---
 hw/pci.h |4 
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 766e0c6..6740b07 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -126,9 +126,13 @@ int pci_bus_num(PCIBus *s)
 
 void pci_device_save(PCIDevice *s, QEMUFile *f)
 {
+int version = s-cap_present ? 3 : 2;
 int i;
 
-qemu_put_be32(f, 2); /* PCI device version */
+/* PCI device version and capabilities */
+qemu_put_be32(f, version);
+if (version = 3)
+qemu_put_be32(f, s-cap_present);
 qemu_put_buffer(f, s-config, 256);
 for (i = 0; i  4; i++)
 qemu_put_be32(f, s-irq_state[i]);
@@ -140,15 +144,22 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 int i;
 
 version_id = qemu_get_be32(f);
-if (version_id  2)
+if (version_id  3)
 return -EINVAL;
+if (version_id = 3)
+s-cap_present = qemu_get_be32(f);
+else
+s-cap_present = 0;
+
+if (s-cap_present  ~s-cap_supported)
+return -EINVAL;
+
 qemu_get_buffer(f, s-config, 256);
 pci_update_mappings(s);
 
 if (version_id = 2)
 for (i = 0; i  4; i ++)
 s-irq_state[i] = qemu_get_be32(f);
-
 return 0;
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index c50ea50..656badc 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -174,6 +174,10 @@ struct PCIDevice {
 
 /* Current IRQ levels.  Used internally by the generic PCI code.  */
 int irq_state[4];
+
+/* Capability bits for save/load */
+uint32_t cap_supported;
+uint32_t cap_present;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
-- 
1.6.3.1.56.g79e1.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Michael S. Tsirkin
Add routines to manage PCI capability list. First user will be MSI-X.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.c |   79 ++
 hw/pci.h |   18 +-
 2 files changed, 96 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6740b07..a3f3fd3 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -160,6 +160,12 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 if (version_id = 2)
 for (i = 0; i  4; i ++)
 s-irq_state[i] = qemu_get_be32(f);
+/* Clear wmask and used bits for capabilities.
+   Must be restored separately, since capabilities can
+   be placed anywhere in config space. */
+memset(s-used, 0, PCI_CONFIG_SPACE_SIZE);
+for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
+s-wmask[i] = 0xff;
 return 0;
 }
 
@@ -865,3 +871,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const 
char *name)
 
 return (PCIDevice *)dev;
 }
+
+static int pci_find_space(PCIDevice *pdev, uint8_t size)
+{
+int offset = PCI_CONFIG_HEADER_SIZE;
+int i;
+for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
+if (pdev-used[i])
+offset = i + 1;
+else if (i - offset + 1 == size)
+return offset;
+return 0;
+}
+
+static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
+uint8_t *prev_p)
+{
+uint8_t next, prev;
+
+if (!(pdev-config[PCI_STATUS]  PCI_STATUS_CAP_LIST))
+return 0;
+
+for (prev = PCI_CAPABILITY_LIST; (next = pdev-config[prev]);
+ prev = next + PCI_CAP_LIST_NEXT)
+if (pdev-config[next + PCI_CAP_LIST_ID] == cap_id)
+break;
+
+if (prev_p)
+*prev_p = prev;
+return next;
+}
+
+/* Reserve space and add capability to the linked list in pci config space */
+int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
+{
+uint8_t offset = pci_find_space(pdev, size);
+uint8_t *config = pdev-config + offset;
+if (!offset)
+return -ENOSPC;
+config[PCI_CAP_LIST_ID] = cap_id;
+config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
+pdev-config[PCI_CAPABILITY_LIST] = offset;
+pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+memset(pdev-used + offset, 0xFF, size);
+/* Make capability read-only by default */
+memset(pdev-wmask + offset, 0, size);
+return offset;
+}
+
+/* Unlink capability from the pci config space. */
+void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
+{
+uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, prev);
+if (!offset)
+return;
+pdev-config[prev] = pdev-config[offset + PCI_CAP_LIST_NEXT];
+/* Make capability writeable again */
+memset(pdev-wmask + offset, 0xff, size);
+memset(pdev-used + offset, 0, size);
+
+if (!pdev-config[PCI_CAPABILITY_LIST])
+pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST;
+}
+
+/* Reserve space for capability at a known offset (to call after load). */
+void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
+{
+memset(pdev-used + offset, 0xff, size);
+}
+
+uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
+{
+return pci_find_capability_list(pdev, cap_id, NULL);
+}
diff --git a/hw/pci.h b/hw/pci.h
index 656badc..2ae72f0 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -121,6 +121,10 @@ typedef struct PCIIORegion {
 #define PCI_MIN_GNT0x3e/* 8 bits */
 #define PCI_MAX_LAT0x3f/* 8 bits */
 
+/* Capability lists */
+#define PCI_CAP_LIST_ID0   /* Capability ID */
+#define PCI_CAP_LIST_NEXT  1   /* Next capability in the list */
+
 #define PCI_REVISION0x08/* obsolete, use PCI_REVISION_ID */
 #define PCI_SUBVENDOR_ID0x2c/* obsolete, use 
PCI_SUBSYSTEM_VENDOR_ID */
 #define PCI_SUBDEVICE_ID0x2e/* obsolete, use PCI_SUBSYSTEM_ID */
@@ -128,7 +132,7 @@ typedef struct PCIIORegion {
 /* Bits in the PCI Status Register (PCI 2.3 spec) */
 #define PCI_STATUS_RESERVED1   0x007
 #define PCI_STATUS_INT_STATUS  0x008
-#define PCI_STATUS_CAPABILITIES0x010
+#define PCI_STATUS_CAP_LIST0x010
 #define PCI_STATUS_66MHZ   0x020
 #define PCI_STATUS_RESERVED2   0x040
 #define PCI_STATUS_FAST_BACK   0x080
@@ -158,6 +162,9 @@ struct PCIDevice {
 /* Used to implement R/W bytes */
 uint8_t wmask[PCI_CONFIG_SPACE_SIZE];
 
+/* Used to allocate config space for capabilities. */
+uint8_t used[PCI_CONFIG_SPACE_SIZE];
+
 /* the following fields are read only */
 PCIBus *bus;
 int devfn;
@@ -190,6 +197,15 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 uint32_t size, int type,
 PCIMapIORegionFunc *map_func);
 
+int pci_add_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
+
+void 

[PATCHv4 04/13] qemu: helper routines for pci access

2009-06-10 Thread Michael S. Tsirkin
Add inline routines for convenient access to pci devices
with correct (little) endianness. Will be used by MSI-X support.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.h |   30 +++---
 1 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 2ae72f0..c603384 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -236,21 +236,45 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t 
vid, uint16_t did,
 pci_map_irq_fn map_irq, const char *name);
 
 static inline void
+pci_set_word(uint8_t *config, uint16_t val)
+{
+cpu_to_le16wu((uint16_t *)config, val);
+}
+
+static inline uint16_t
+pci_get_word(uint8_t *config)
+{
+return le16_to_cpupu((uint16_t *)config);
+}
+
+static inline void
+pci_set_long(uint8_t *config, uint32_t val)
+{
+cpu_to_le32wu((uint32_t *)config, val);
+}
+
+static inline uint32_t
+pci_get_long(uint8_t *config)
+{
+return le32_to_cpupu((uint32_t *)config);
+}
+
+static inline void
 pci_config_set_vendor_id(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)pci_config[PCI_VENDOR_ID], val);
+pci_set_word(pci_config[PCI_VENDOR_ID], val);
 }
 
 static inline void
 pci_config_set_device_id(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)pci_config[PCI_DEVICE_ID], val);
+pci_set_word(pci_config[PCI_DEVICE_ID], val);
 }
 
 static inline void
 pci_config_set_class(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)pci_config[PCI_CLASS_DEVICE], val);
+pci_set_word(pci_config[PCI_CLASS_DEVICE], val);
 }
 
 typedef void (*pci_qdev_initfn)(PCIDevice *dev);
-- 
1.6.3.1.56.g79e1.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 05/13] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
Add functions implementing MSI-X support. First user will be virtio-pci.
Note that platform must set a flag to declare MSI supported: this
is a safety measure to avoid breaking platforms which should support
MSI-X but currently lack this in the interrupt controller emulation.
For PC this will be set by APIC.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile.target |2 +-
 hw/msix.c   |  417 +++
 hw/msix.h   |   35 +
 hw/pci.h|   20 +++
 4 files changed, 473 insertions(+), 1 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h

diff --git a/Makefile.target b/Makefile.target
index 27de4b9..85d8a4a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -494,7 +494,7 @@ endif #CONFIG_BSD_USER
 ifndef CONFIG_USER_ONLY
 
 OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \
- gdbstub.o gdbstub-xml.o
+ gdbstub.o gdbstub-xml.o msix.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
diff --git a/hw/msix.c b/hw/msix.c
new file mode 100644
index 000..a448eab
--- /dev/null
+++ b/hw/msix.c
@@ -0,0 +1,417 @@
+/*
+ * MSI-X device support
+ *
+ * This module includes support for MSI-X in pci devices.
+ *
+ * Author: Michael S. Tsirkin m...@redhat.com
+ *
+ *  Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include hw.h
+#include msix.h
+#include pci.h
+
+/* Declaration from linux/pci_regs.h */
+#define  PCI_CAP_ID_MSIX 0x11 /* MSI-X */
+#define  PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */
+#define  PCI_MSIX_FLAGS_QSIZE  0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE (1  15)
+#define  PCI_MSIX_FLAGS_BIRMASK(7  0)
+
+/* MSI-X capability structure */
+#define MSIX_TABLE_OFFSET 4
+#define MSIX_PBA_OFFSET 8
+#define MSIX_CAP_LENGTH 12
+
+/* MSI enable bit is in byte 1 in FLAGS register */
+#define MSIX_ENABLE_OFFSET (PCI_MSIX_FLAGS + 1)
+#define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE  8)
+
+/* MSI-X table format */
+#define MSIX_MSG_ADDR 0
+#define MSIX_MSG_UPPER_ADDR 4
+#define MSIX_MSG_DATA 8
+#define MSIX_VECTOR_CTRL 12
+#define MSIX_ENTRY_SIZE 16
+#define MSIX_VECTOR_MASK 0x1
+
+/* How much space does an MSIX table need. */
+/* The spec requires giving the table structure
+ * a 4K aligned region all by itself. Align it to
+ * target pages so that drivers can do passthrough
+ * on the rest of the region. */
+#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000)
+/* Reserve second half of the page for pending bits */
+#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
+#define MSIX_MAX_ENTRIES 32
+
+
+#ifdef MSIX_DEBUG
+#define DEBUG(fmt, ...)   \
+do {  \
+  fprintf(stderr, %s:  fmt, __func__ , __VA_ARGS__);\
+} while (0)
+#else
+#define DEBUG(fmt, ...) do { } while(0)
+#endif
+
+/* Flag to globally disable MSI-X support */
+int msix_disable;
+
+/* Flag for interrupt controller to declare MSI-X support */
+int msix_supported;
+
+/* Add MSI-X capability to the config space for the device. */
+/* Given a bar and its size, add MSI-X table on top of it
+ * and fill MSI-X capability in the config space.
+ * Original bar size must be a power of 2 or 0.
+ * New bar size is returned. */
+static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
+   unsigned bar_nr, unsigned bar_size)
+{
+int config_offset;
+uint8_t *config;
+uint32_t new_size;
+
+if (nentries  1 || nentries  PCI_MSIX_FLAGS_QSIZE + 1)
+return -EINVAL;
+if (bar_size  0x8000)
+return -ENOSPC;
+
+/* Add space for MSI-X structures */
+if (!bar_size)
+new_size = MSIX_PAGE_SIZE;
+else if (bar_size  MSIX_PAGE_SIZE) {
+bar_size = MSIX_PAGE_SIZE;
+new_size = MSIX_PAGE_SIZE * 2;
+} else
+new_size = bar_size * 2;
+
+pdev-msix_bar_size = new_size;
+config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
+if (config_offset  0)
+return config_offset;
+config = pdev-config + config_offset;
+
+pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+/* Table on top of BAR */
+pci_set_long(config + MSIX_TABLE_OFFSET, bar_size | bar_nr);
+/* Pending bits on top of that */
+pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_PENDING) |
+ bar_nr);
+pdev-msix_cap = config_offset;
+/* Make flags bit writeable. */
+pdev-wmask[config_offset + MSIX_ENABLE_OFFSET] |= MSIX_ENABLE_MASK;
+return 0;
+}
+
+static void msix_free_irq_entries(PCIDevice *dev)
+{
+int vector;
+
+for (vector = 0; vector  dev-msix_entries_nr; ++vector)
+

[PATCHv4 06/13] qemu: add flag to disable MSI-X by default

2009-06-10 Thread Michael S. Tsirkin
Add global flag to disable MSI-X by default.  This is useful primarily
to make images loadable by older qemu (without msix).  Even when MSI-X
is disabled by flag, you can still load images that have MSI-X enabled.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/msix.c   |3 +++
 qemu-options.hx |2 ++
 vl.c|3 +++
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index a448eab..ce4e6ba 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -122,6 +122,9 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
uint32_t val, int len)
 {
 unsigned enable_pos = dev-msix_cap + MSIX_ENABLE_OFFSET;
+if (!(dev-cap_present  QEMU_PCI_CAP_MSIX))
+return;
+
 if (addr + len = enable_pos || addr  enable_pos)
 return;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index fa549c3..ed92ec2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1575,3 +1575,5 @@ DEF(semihosting, 0, QEMU_OPTION_semihosting,
 DEF(old-param, 0, QEMU_OPTION_old_param,
 -old-param  old param mode\n)
 #endif
+DEF(disable-msix, 0, QEMU_OPTION_disable_msix,
+-disable-msix disable msix support for PCI devices (enabled by 
default)\n)
diff --git a/vl.c b/vl.c
index f08f0f3..8c116c7 100644
--- a/vl.c
+++ b/vl.c
@@ -135,6 +135,7 @@ int main(int argc, char **argv)
 #include hw/usb.h
 #include hw/pcmcia.h
 #include hw/pc.h
+#include hw/msix.h
 #include hw/audiodev.h
 #include hw/isa.h
 #include hw/baum.h
@@ -5681,6 +5682,8 @@ int main(int argc, char **argv, char **envp)
 xen_mode = XEN_ATTACH;
 break;
 #endif
+case QEMU_OPTION_disable_msix:
+msix_disable = 1;
 }
 }
 }
-- 
1.6.3.1.56.g79e1.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 07/13] qemu: minimal MSI/MSI-X implementation for PC

2009-06-10 Thread Michael S. Tsirkin
Implement MSI support in APIC. Note that MSI and MMIO APIC registers
are at the same memory location, but actually not on the global bus: MSI
is on PCI bus, APIC is connected directly to the CPU. We map them on the
global bus at the same address which happens to work because MSI
registers are reserved in APIC MMIO and vice versa.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/apic.c |   43 +++
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 8c8b2de..ed03a36 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -19,6 +19,8 @@
  */
 #include hw.h
 #include pc.h
+#include pci.h
+#include msix.h
 #include qemu-timer.h
 #include host-utils.h
 
@@ -63,6 +65,19 @@
 #define MAX_APICS 255
 #define MAX_APIC_WORDS 8
 
+/* Intel APIC constants: from include/asm/msidef.h */
+#define MSI_DATA_VECTOR_SHIFT  0
+#define MSI_DATA_VECTOR_MASK   0x00ff
+#define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_TRIGGER_SHIFT 15
+#define MSI_DATA_LEVEL_SHIFT   14
+#define MSI_ADDR_DEST_MODE_SHIFT   2
+#define MSI_ADDR_DEST_ID_SHIFT 12
+#defineMSI_ADDR_DEST_ID_MASK   0x000
+
+#define MSI_ADDR_BASE   0xfee0
+#define MSI_ADDR_SIZE   0x10
+
 typedef struct APICState {
 CPUState *cpu_env;
 uint32_t apicbase;
@@ -712,11 +727,31 @@ static uint32_t apic_mem_readl(void *opaque, 
target_phys_addr_t addr)
 return val;
 }
 
+static void apic_send_msi(target_phys_addr_t addr, uint32 data)
+{
+uint8_t dest = (addr  MSI_ADDR_DEST_ID_MASK)  MSI_ADDR_DEST_ID_SHIFT;
+uint8_t vector = (data  MSI_DATA_VECTOR_MASK)  MSI_DATA_VECTOR_SHIFT;
+uint8_t dest_mode = (addr  MSI_ADDR_DEST_MODE_SHIFT)  0x1;
+uint8_t trigger_mode = (data  MSI_DATA_TRIGGER_SHIFT)  0x1;
+uint8_t delivery = (data  MSI_DATA_DELIVERY_MODE_SHIFT)  0x7;
+/* XXX: Ignore redirection hint. */
+apic_deliver_irq(dest, dest_mode, delivery, vector, 0, trigger_mode);
+}
+
 static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t 
val)
 {
 CPUState *env;
 APICState *s;
-int index;
+int index = (addr  4)  0xff;
+if (addr  0xfff || !index) {
+/* MSI and MMIO APIC are at the same memory location,
+ * but actually not on the global bus: MSI is on PCI bus
+ * APIC is connected directly to the CPU.
+ * Mapping them on the global bus happens to work because
+ * MSI registers are reserved in APIC MMIO and vice versa. */
+apic_send_msi(addr, val);
+return;
+}
 
 env = cpu_single_env;
 if (!env)
@@ -727,7 +762,6 @@ static void apic_mem_writel(void *opaque, 
target_phys_addr_t addr, uint32_t val)
 printf(APIC write: %08x = %08x\n, (uint32_t)addr, val);
 #endif
 
-index = (addr  4)  0xff;
 switch(index) {
 case 0x02:
 s-id = (val  24);
@@ -911,6 +945,7 @@ int apic_init(CPUState *env)
 s-cpu_env = env;
 
 apic_reset(s);
+msix_supported = 1;
 
 /* XXX: mapping more APICs at the same memory location */
 if (apic_io_memory == 0) {
@@ -918,7 +953,8 @@ int apic_init(CPUState *env)
on the global memory bus. */
 apic_io_memory = cpu_register_io_memory(0, apic_mem_read,
 apic_mem_write, NULL);
-cpu_register_physical_memory(s-apicbase  ~0xfff, 0x1000,
+/* XXX: what if the base changes? */
+cpu_register_physical_memory(MSI_ADDR_BASE, MSI_ADDR_SIZE,
  apic_io_memory);
 }
 s-timer = qemu_new_timer(vm_clock, apic_timer, s);
@@ -929,4 +965,3 @@ int apic_init(CPUState *env)
 local_apics[s-id] = s;
 return 0;
 }
-
-- 
1.6.3.1.56.g79e1.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 09/13] qemu: virtio support for many interrupt vectors

2009-06-10 Thread Michael S. Tsirkin
Extend virtio to support many interrupt vectors, and rearrange code in
preparation for multi-vector support (mostly move reset out to bindings,
because we will have to reset the vectors in transport-specific code).
Actual bindings in pci, and use in net, to follow.
Load and save are not connected to bindings yet, so they are left
stubbed out for now.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/syborg_virtio.c |   13 --
 hw/virtio-pci.c|   24 
 hw/virtio.c|   59 ++-
 hw/virtio.h|   10 +++-
 4 files changed, 81 insertions(+), 25 deletions(-)

diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index 37c219c..d8c978a 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -134,7 +134,10 @@ static void syborg_virtio_writel(void *opaque, 
target_phys_addr_t offset,
 vdev-features = value;
 break;
 case SYBORG_VIRTIO_QUEUE_BASE:
-virtio_queue_set_addr(vdev, vdev-queue_sel, value);
+if (value == 0)
+virtio_reset(vdev);
+else
+virtio_queue_set_addr(vdev, vdev-queue_sel, value);
 break;
 case SYBORG_VIRTIO_QUEUE_SEL:
 if (value  VIRTIO_PCI_QUEUE_MAX)
@@ -228,7 +231,7 @@ static CPUWriteMemoryFunc *syborg_virtio_writefn[] = {
  syborg_virtio_writel
 };
 
-static void syborg_virtio_update_irq(void *opaque)
+static void syborg_virtio_update_irq(void *opaque, uint16_t vector)
 {
 SyborgVirtIOProxy *proxy = opaque;
 int level;
@@ -239,7 +242,7 @@ static void syborg_virtio_update_irq(void *opaque)
 }
 
 static VirtIOBindings syborg_virtio_bindings = {
-.update_irq = syborg_virtio_update_irq
+.notify = syborg_virtio_update_irq
 };
 
 static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev)
@@ -248,6 +251,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, 
VirtIODevice *vdev)
 
 proxy-vdev = vdev;
 
+/* Don't support multiple vectors */
+proxy-vdev-nvectors = 0;
 sysbus_init_irq(proxy-busdev, proxy-irq);
 iomemtype = cpu_register_io_memory(0, syborg_virtio_readfn,
syborg_virtio_writefn, proxy);
@@ -255,6 +260,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, 
VirtIODevice *vdev)
 
 proxy-id = ((uint32_t)0x1af4  16) | vdev-device_id;
 
+qemu_register_reset(virtio_reset, 0, vdev);
+
 virtio_bind_device(vdev, syborg_virtio_bindings, proxy);
 }
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c072423..7dfdd80 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -78,13 +78,19 @@ typedef struct {
 
 /* virtio device */
 
-static void virtio_pci_update_irq(void *opaque)
+static void virtio_pci_notify(void *opaque, uint16_t vector)
 {
 VirtIOPCIProxy *proxy = opaque;
 
 qemu_set_irq(proxy-pci_dev.irq[0], proxy-vdev-isr  1);
 }
 
+static void virtio_pci_reset(void *opaque)
+{
+VirtIOPCIProxy *proxy = opaque;
+virtio_reset(proxy-vdev);
+}
+
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -108,7 +114,10 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 break;
 case VIRTIO_PCI_QUEUE_PFN:
 pa = (target_phys_addr_t)val  VIRTIO_PCI_QUEUE_ADDR_SHIFT;
-virtio_queue_set_addr(vdev, vdev-queue_sel, pa);
+if (pa == 0)
+virtio_pci_reset(proxy);
+else
+virtio_queue_set_addr(vdev, vdev-queue_sel, pa);
 break;
 case VIRTIO_PCI_QUEUE_SEL:
 if (val  VIRTIO_PCI_QUEUE_MAX)
@@ -120,7 +129,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 case VIRTIO_PCI_STATUS:
 vdev-status = val  0xFF;
 if (vdev-status == 0)
-virtio_reset(vdev);
+virtio_pci_reset(proxy);
 break;
 }
 }
@@ -158,7 +167,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t 
addr)
 /* reading from the ISR also clears it. */
 ret = vdev-isr;
 vdev-isr = 0;
-virtio_update_irq(vdev);
+qemu_set_irq(proxy-pci_dev.irq[0], 0);
 break;
 default:
 break;
@@ -243,7 +252,7 @@ static void virtio_map(PCIDevice *pci_dev, int region_num,
 }
 
 static const VirtIOBindings virtio_pci_bindings = {
-.update_irq = virtio_pci_update_irq
+.notify = virtio_pci_notify
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
@@ -255,6 +264,9 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, 
VirtIODevice *vdev,
 
 proxy-vdev = vdev;
 
+/* No support for multiple vectors yet. */
+proxy-vdev-nvectors = 0;
+
 config = proxy-pci_dev.config;
 pci_config_set_vendor_id(config, vendor);
 pci_config_set_device_id(config, device);
@@ -279,6 +291,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, 
VirtIODevice *vdev,
 pci_register_io_region(proxy-pci_dev, 0, 

Re: [KVM-AUTOTEST PATCH 3/4] Fix bad line breaks

2009-06-10 Thread Michael Goldish
Looks fine to me.

- Original Message -
From: Lucas Meneghel Rodrigues l...@redhat.com
To: autot...@test.kernel.org
Cc: kvm@vger.kernel.org, Lucas Meneghel Rodrigues l...@redhat.com
Sent: Tuesday, June 9, 2009 7:33:28 PM (GMT+0200) Auto-Detected
Subject: [KVM-AUTOTEST PATCH 3/4] Fix bad line breaks

During the conversion of the kvm_runtest_2 to the kvm test, some
bad line breaks were introduced. Started using parenthesis
implicit line continuation instead of backslash continuation in
assignments that were using it.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_guest_wizard.py |8 
 client/tests/kvm/kvm_tests.py|4 ++--
 client/tests/kvm/kvm_utils.py|   12 ++--
 client/tests/kvm/kvm_vm.py   |4 ++--
 client/tests/kvm/make_html_report.py |6 ++
 5 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/client/tests/kvm/kvm_guest_wizard.py 
b/client/tests/kvm/kvm_guest_wizard.py
index 01aeb97..2dd9be5 100644
--- a/client/tests/kvm/kvm_guest_wizard.py
+++ b/client/tests/kvm/kvm_guest_wizard.py
@@ -105,8 +105,8 @@ def barrier_2(vm, words, fail_if_stuck_for, 
stuck_detection_history,
 time.sleep(sleep_duration)
 
 # Failure
-message = Barrier failed at step %s after %.2f seconds (%s) % \
-(current_step_num, time.time() - start_time, failure_message)
+message = (Barrier failed at step %s after %.2f seconds (%s) %
+   (current_step_num, time.time() - start_time, failure_message))
 
 # What should we do with this failure?
 if words[-1] == optional:
@@ -201,9 +201,9 @@ def run_steps(test, params, env):
 logging.error(Variable not defined: %s % words[1])
 elif words[0] == barrier_2:
 if current_screendump:
-scrdump_filename = \
+scrdump_filename = (
 os.path.join(ppm_utils.get_data_dir(steps_filename),
- current_screendump)
+ current_screendump))
 else:
 scrdump_filename = None
 if not barrier_2(vm, words, fail_if_stuck_for,
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index 48e..54d2a7a 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -321,8 +321,8 @@ def run_autotest(test, params, env):
 status_fail = False
 if result_list == []:
 status_fail = True
-message_fail = Test '%s' did not produce any recognizable
- results % test_name
+message_fail = (Test '%s' did not produce any recognizable 
+results % test_name)
 for result in result_list:
 logging.info(str(result))
 if result[1] == FAIL:
diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 434190d..0f4c770 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -644,8 +644,8 @@ def scp_to_remote(host, port, username, password, 
local_path, remote_path,
 
 @return: True on success and False on failure.
 
-command = scp -o UserKnownHostsFile=/dev/null -r -P %s %s %...@%s:%s % \
-(port, local_path, username, host, remote_path)
+command = (scp -o UserKnownHostsFile=/dev/null -r -P %s %s %...@%s:%s %
+   (port, local_path, username, host, remote_path))
 return remote_scp(command, password, timeout)
 
 
@@ -664,8 +664,8 @@ def scp_from_remote(host, port, username, password, 
remote_path, local_path,
 
 @return: True on success and False on failure.
 
-command = scp -o UserKnownHostsFile=/dev/null -r -P %s %...@%s:%s %s % \
-(port, username, host, remote_path, local_path)
+command = (scp -o UserKnownHostsFile=/dev/null -r -P %s %...@%s:%s %s %
+   (port, username, host, remote_path, local_path))
 return remote_scp(command, password, timeout)
 
 
@@ -681,8 +681,8 @@ def ssh(host, port, username, password, prompt, timeout=10):
 
 @return: kvm_spawn object on success and None on failure.
 
-command = ssh -o UserKnownHostsFile=/dev/null -p %s %...@%s % \
-(port, username, host)
+command = (ssh -o UserKnownHostsFile=/dev/null -p %s %...@%s %
+   (port, username, host))
 return remote_login(command, password, prompt, \n, timeout)
 
 
diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index eb9717b..de21b2f 100644
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -118,8 +118,8 @@ class VM:
 # Find available monitor filename
 while True:
 # The monitor filename should be unique
-self.instance = time.strftime(%Y%m%d-%H%M%S-) + \
-kvm_utils.generate_random_string(4)
+self.instance = (time.strftime(%Y%m%d-%H%M%S-) +
+ kvm_utils.generate_random_string(4))
 self.monitor_file_name = 

[PATCHv4 11/13] qemu: request 3 vectors in virtio-net

2009-06-10 Thread Michael S. Tsirkin
Request up to 3 vectors in virtio-net. Actual bindings might supply
less.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-net.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 60aa6da..6118fe3 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -621,6 +621,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev)
 n-mac_table.macs = qemu_mallocz(MAC_TABLE_ENTRIES * ETH_ALEN);
 
 n-vlans = qemu_mallocz(MAX_VLAN  3);
+n-vdev.nvectors = 3;
 
 register_savevm(virtio-net, virtio_net_id++, VIRTIO_NET_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
-- 
1.6.3.1.56.g79e1.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv4 12/13] qemu: virtio save/load bindings

2009-06-10 Thread Michael S. Tsirkin
Implement bindings for virtio save/load. Use them in virtio pci.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-pci.c |   50 +-
 hw/virtio.c |   33 -
 hw/virtio.h |4 
 3 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 294f4c7..5a7bdcc 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -105,6 +105,50 @@ static void virtio_pci_notify(void *opaque, uint16_t 
vector)
 qemu_set_irq(proxy-pci_dev.irq[0], proxy-vdev-isr  1);
 }
 
+static void virtio_pci_save_config(void * opaque, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+pci_device_save(proxy-pci_dev, f);
+msix_save(proxy-pci_dev, f);
+if (msix_present(proxy-pci_dev))
+qemu_put_be16(f, proxy-vdev-config_vector);
+}
+
+static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+if (msix_present(proxy-pci_dev))
+qemu_put_be16(f, virtio_queue_vector(proxy-vdev, n));
+}
+
+static int virtio_pci_load_config(void * opaque, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+int ret;
+ret = pci_device_load(proxy-pci_dev, f);
+if (ret)
+return ret;
+ret = msix_load(proxy-pci_dev, f);
+if (ret)
+return ret;
+if (msix_present(proxy-pci_dev))
+qemu_get_be16s(f, proxy-vdev-config_vector);
+
+pci_resize_io_region(proxy-pci_dev, 1, msix_bar_size(proxy-pci_dev));
+return 0;
+}
+
+static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+uint16_t vector;
+if (!msix_present(proxy-pci_dev))
+return 0;
+qemu_get_be16s(f, vector);
+virtio_queue_set_vector(proxy-vdev, n, vector);
+return 0;
+}
+
 static void virtio_pci_reset(void *opaque)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -317,7 +361,11 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 }
 
 static const VirtIOBindings virtio_pci_bindings = {
-.notify = virtio_pci_notify
+.notify = virtio_pci_notify,
+.save_config = virtio_pci_save_config,
+.load_config = virtio_pci_load_config,
+.save_queue = virtio_pci_save_queue,
+.load_queue = virtio_pci_load_queue,
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
diff --git a/hw/virtio.c b/hw/virtio.c
index fe9f793..b773dff 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -568,9 +568,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 {
 int i;
 
-/* FIXME: load/save binding.  */
-//pci_device_save(vdev-pci_dev, f);
-//msix_save(vdev-pci_dev, f);
+if (vdev-binding-save_config)
+vdev-binding-save_config(vdev-binding_opaque, f);
 
 qemu_put_8s(f, vdev-status);
 qemu_put_8s(f, vdev-isr);
@@ -596,18 +595,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, vdev-vq[i].vring.num);
 qemu_put_be64(f, vdev-vq[i].pa);
 qemu_put_be16s(f, vdev-vq[i].last_avail_idx);
-if (vdev-nvectors)
-qemu_put_be16s(f, vdev-vq[i].vector);
+if (vdev-binding-save_queue)
+vdev-binding-save_queue(vdev-binding_opaque, i, f);
 }
 }
 
-void virtio_load(VirtIODevice *vdev, QEMUFile *f)
+int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 {
-int num, i;
+int num, i, ret;
 
-/* FIXME: load/save binding.  */
-//pci_device_load(vdev-pci_dev, f);
-//r = msix_load(vdev-pci_dev, f);
+if (vdev-binding-load_config) {
+ret = vdev-binding-load_config(vdev-binding_opaque, f);
+if (ret)
+return ret;
+}
 
 qemu_get_8s(f, vdev-status);
 qemu_get_8s(f, vdev-isr);
@@ -616,10 +617,6 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f)
 vdev-config_len = qemu_get_be32(f);
 qemu_get_buffer(f, vdev-config, vdev-config_len);
 
-if (vdev-nvectors) {
-qemu_get_be16s(f, vdev-config_vector);
-//msix_vector_use(vdev-pci_dev, vdev-config_vector);
-}
 num = qemu_get_be32(f);
 
 for (i = 0; i  num; i++) {
@@ -630,13 +627,15 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f)
 if (vdev-vq[i].pa) {
 virtqueue_init(vdev-vq[i]);
 }
-if (vdev-nvectors) {
-qemu_get_be16s(f, vdev-vq[i].vector);
-//msix_vector_use(vdev-pci_dev, vdev-config_vector);
+if (vdev-binding-load_queue) {
+ret = vdev-binding-load_queue(vdev-binding_opaque, i, f);
+if (ret)
+return ret;
 }
 }
 
 virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
+return 0;
 }
 
 void virtio_cleanup(VirtIODevice *vdev)
diff --git a/hw/virtio.h b/hw/virtio.h
index 04a3c3d..ce05517 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -72,6 +72,10 @@ typedef struct VirtQueueElement
 
 typedef struct {
 void (*notify)(void * opaque, uint16_t vector);
+void (*save_config)(void * 

[PATCHv4 13/13] qemu: add pci_get/set_byte

2009-06-10 Thread Michael S. Tsirkin
Add pci_get/set_byte to keep *_word and *_long access functions company.
They are unused for now.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.h |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 9455e4c..07404c5 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -259,6 +259,18 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t 
vid, uint16_t did,
 pci_map_irq_fn map_irq, const char *name);
 
 static inline void
+pci_set_byte(uint8_t *config, uint8_t val)
+{
+*config = val;
+}
+
+static inline uint8_t
+pci_get_byte(uint8_t *config)
+{
+return *config;
+}
+
+static inline void
 pci_set_word(uint8_t *config, uint16_t val)
 {
 cpu_to_le16wu((uint16_t *)config, val);
-- 
1.6.3.1.56.g79e1.dirty
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] pte accessor fixes

2009-06-10 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 02:28:19PM +0300, Avi Kivity wrote:
 A minor fix to the pte accessors, followed by a cosmetic fix so we don't
 hit the same problem next time, followed by a cosmetic patch just because
 I had my search-and-replace warmed up.  Please review.

Looks good.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] pte accessor fixes

2009-06-10 Thread Avi Kivity

Marcelo Tosatti wrote:

On Wed, Jun 10, 2009 at 02:28:19PM +0300, Avi Kivity wrote:
  

A minor fix to the pte accessors, followed by a cosmetic fix so we don't
hit the same problem next time, followed by a cosmetic patch just because
I had my search-and-replace warmed up.  Please review.



Looks good.
  


Thanks, merged.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper

2009-06-10 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 03:23:47PM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 Isn't it simpler to invoke for_each_shadow_entry(), instead of 
 defining  a callback and calling it?

 We had those callbacks once, then switched to for_each.
 

 The point is its exported to use in a external module (kvm-intel.ko),
 so you hide the details (such as locking) in the kvm_mmu_shadow_walk
 helper. Let me know how do you prefer this to be.
   

 Ah, you're right.

 I don't think it's worthwhile to add all this just for debugging.  You  
 can add a function that dumps the spte chain as well as the features  
 MSR, and we can decode it by hand when we see it.  Disadvantage is more  
 work for us when we hit the bug, but at least that function is reusable  
 in other contexts.

The problem is if someone hits the bug who has no possibility of
reporting the printks. Nice thing about the WARN_ON's there is you can
look up kerneloops.org, match the line number to the kernel version,
and narrow down what bits are wrong (you still need reporter to send
contents of dmesg for full spte value).

Also the bit-by-bit validity checks (the inspect_spte function in vmx.c)
can be used in the mmu audit code (thats the reason for print=0 in the
callback parameters), so it is reusable in other contexes.

What you dislike is hardcoding the bits checked in C code? Don't worry
about the level stuff, will be handled next.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Network throughput limits for local VM - VM communication

2009-06-10 Thread Arnd Bergmann
On Tuesday 09 June 2009, Fischer, Anna wrote:

 I have tried using virtio and using the emulated QEMU virtual NICs.
 It does not make a difference. It seems as if there is an overflow somewhere
 when QEMU/virtio cannot cope with the network load any more, and then the
 virtual interfaces don't seem to transmit anything anymore. It seems to
 mostly work again when I shut down and start up the interfaces of the router
 inside of the guest. I use two bridges (and VLANs) that pass packets between
 sending/receiving guests and the routing guest. The set-up works fine for
 simple ping and other communication that is low-throughput type traffic.

Have you tried eliminating VLAN to simplify the setup?

Does it change when the guests communicate over a -net socket interface
with your router instead of the -net tap + bridge in the host?

Arnd 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Network throughput limits for local VM - VM communication

2009-06-10 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 On Tuesday 09 June 2009, Fischer, Anna wrote:
 
  I have tried using virtio and using the emulated QEMU virtual NICs.
  It does not make a difference. It seems as if there is an overflow
 somewhere
  when QEMU/virtio cannot cope with the network load any more, and then
 the
  virtual interfaces don't seem to transmit anything anymore. It seems
 to
  mostly work again when I shut down and start up the interfaces of the
 router
  inside of the guest. I use two bridges (and VLANs) that pass packets
 between
  sending/receiving guests and the routing guest. The set-up works fine
 for
  simple ping and other communication that is low-throughput type
 traffic.
 
 Have you tried eliminating VLAN to simplify the setup?

No - but there is a relating bug in the tun/tap interface (well, it is not 
really a bug but simply the way tun/tap works) that will cause packets to be 
replicated on all the tap interfaces (across all bridges attached to those) if 
I do not configure VLANs. This will result in a system that is even more 
overloaded. I had discovered this a while back when running UDP stress tests 
under 10G.

 
 Does it change when the guests communicate over a -net socket interface
 with your router instead of the -net tap + bridge in the host?

I have not tried this - I need the bridge in the network data path for some 
testing, so using the -net socket interface would not solve my problem.

However, I have just today managed to get around this bug by using the e1000 
QEMU emulated NIC model and this seems to do the trick. Now the throughput is 
still very low, but that might simply be because my system is too weak. When 
using the e1000 model instead of rtl8139 or virtio, I do not have any network 
crashes any more.

I have been looking through the RedHat bug lists to search for some hints 
today, and it seems as if there are a lot of people seeing the network under 
KVM break down under heavy load. I think that this is something that needs some 
further investigation. I can provide more detailed system set-up etc, it should 
be easy to reproduce this.

Thanks for your help,
Anna
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
   Note that platform must set a flag to declare MSI supported.
   For PC this will be set by APIC.
 
  This sounds wrong. The device shouldn't know or care whether the system
  has a MSI capable interrupt controller. That's for the guest OS to figure
  out.

 You are right of course. In theory there's nothing that breaks if I
 set this flag to on, on all platforms. OTOH if qemu emulates some
 controller incorrectly, guest might misdetect MSI support in the
 controller, and things will break horribly.

 It seems safer to have a flag that can be enabled by people
 that know about a specific platform.

No. The solution is to fix whatever is broken.

If we really need to avoid MSI-X capable devices then that should be done 
explicity per-device. i.e. you have a different virtio-net device that does 
not use MSI-X.

Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [KVM-AUTOTEST PATCH 1/4] Make all programs on kvm test use /usr/bin/python

2009-06-10 Thread Martin Bligh
On Wed, Jun 10, 2009 at 4:01 AM, Alexey Eromenkoaerom...@redhat.com wrote:

 Even better would be to use /usr/bin/python2.

That doesn't seem to exist, on Ubuntu at least.

 This is because future distros will include python3, which is incompatible 
 with python2 code.

 python will be symlink of python3.

 -Alexey

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 03:07:34PM +0100, Paul Brook wrote:
Note that platform must set a flag to declare MSI supported.
For PC this will be set by APIC.
  
   This sounds wrong. The device shouldn't know or care whether the system
   has a MSI capable interrupt controller. That's for the guest OS to figure
   out.
 
  You are right of course. In theory there's nothing that breaks if I
  set this flag to on, on all platforms. OTOH if qemu emulates some
  controller incorrectly, guest might misdetect MSI support in the
  controller, and things will break horribly.
 
  It seems safer to have a flag that can be enabled by people
  that know about a specific platform.
 
 No. The solution is to fix whatever is broken.

That's easy enough then. Patch below.

 
 If we really need to avoid MSI-X capable devices then that should be done 
 explicity per-device. i.e. you have a different virtio-net device that does 
 not use MSI-X.
 
 Paul

Why should it be done per-device?

---

Don't add an option for platforms to disable MSI-X in all devices.
Paul Brook will find and fix all platforms that have broken MSI-X
emulation.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/hw/apic.c b/hw/apic.c
index ed03a36..9d44061 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -945,7 +945,6 @@ int apic_init(CPUState *env)
 s-cpu_env = env;
 
 apic_reset(s);
-msix_supported = 1;
 
 /* XXX: mapping more APICs at the same memory location */
 if (apic_io_memory == 0) {
diff --git a/hw/msix.c b/hw/msix.c
index ce4e6ba..16efb27 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -62,9 +62,6 @@
 /* Flag to globally disable MSI-X support */
 int msix_disable;
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -232,10 +229,7 @@ void msix_mmio_map(PCIDevice *d, int region_num,
 int msix_init(struct PCIDevice *dev, unsigned short nentries,
   unsigned bar_nr, unsigned bar_size)
 {
-int ret = -ENOMEM;
-/* Nothing to do if MSI is not supported by interrupt controller */
-if (!msix_supported)
-return -ENOTTY;
+int ret;
 
 if (nentries  MSIX_MAX_ENTRIES)
 return -EINVAL;
diff --git a/hw/msix.h b/hw/msix.h
index 79e84a3..2fcadd3 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -30,6 +30,5 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 void msix_reset(PCIDevice *dev);
 
 extern int msix_disable;
-extern int msix_supported;
 
 #endif
-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
  If we really need to avoid MSI-X capable devices then that should be done
  explicity per-device. i.e. you have a different virtio-net device that
  does not use MSI-X.
 
  Paul

 Why should it be done per-device?


Because otherwise you end up with the horrible hacks that you're currently 
tripping over: devices have to magically morph into a different device when 
you load a VM. That's seems just plain wrong to me. Loading a VM shouldn't not 
do anything that can't happen during normal operation.

Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Rusty Russell
On Wed, 10 Jun 2009 03:56:31 pm Dor Laor wrote:
 Rusty Russell wrote:
  The current theoretical hole is that the host suppresses notifications
  using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of
  notifications in before it gets to that suppression.  You can use a
  counter to improve this: you only notify when they're equal, and inc when
  you notify.  That way you suppress further notifications even if the
  other side takes ages to wake up. In practice, this shouldn't be played
  with until we have full aio (or equiv in kernel) for other side: host
  xmit tends to be too fast at the moment and we get a notification per
  packet anyway.

 Xen ring has the exact optimization for ages. imho we should have it
 too, regardless of aio.
 It reduces #vmexits/spurious wakeups and it is very simple to implement.

But look at number of wakeups received vs notifications sent: I just don't see 
any benefit there at the moment.  As I said, improving the host code might 
change that significantly.

And implementing it the other way is v. v. hard given the nature of interrupts 
(shared and coalesced).

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 03:39:05PM +0100, Paul Brook wrote:
   If we really need to avoid MSI-X capable devices then that should be done
   explicity per-device. i.e. you have a different virtio-net device that
   does not use MSI-X.
  
   Paul
 
  Why should it be done per-device?
 
 
 Because otherwise you end up with the horrible hacks that you're currently 
 tripping over: devices have to magically morph into a different device when 
 you load a VM.

No, the hacks are there so I that I can support loading and saving from
non-MSI setups in a backward-compatible way.

The flag we are discussing is set at qemu startup and can't change
across load/store.

 That's seems just plain wrong to me.
 Loading a VM shouldn't not 
 do anything that can't happen during normal operation.

At least wrt pci, we are very far from this state: load just overwrites
all registers, readonly or not, which can never happen during normal
operation. And if we fix it, and only edit BAR registers, the happy
result will be that we can't add functionality to PCI devices without
breaking guests and/or breaking loading from old images each time.

 Paul



-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Glauber Costa
On Wed, Jun 10, 2009 at 12:54:15PM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote:
  On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote:
   Add routines to manage PCI capability list. First user will be MSI-X.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
hw/pci.c |   98 
   --
hw/pci.h |   18 +++-
2 files changed, 106 insertions(+), 10 deletions(-)
   
   diff --git a/hw/pci.c b/hw/pci.c
   index 361d741..ed011b5 100644
   --- a/hw/pci.c
   +++ b/hw/pci.c
   @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
int version = s-cap_present ? 3 : 2;
int i;

   -qemu_put_be32(f, version); /* PCI device version */
   +/* PCI device version and capabilities */
   +qemu_put_be32(f, version);
   +if (version = 3)
   +qemu_put_be32(f, s-cap_present);
qemu_put_buffer(f, s-config, 256);
for (i = 0; i  4; i++)
qemu_put_be32(f, s-irq_state[i]);
   -if (version = 3)
   -qemu_put_be32(f, s-cap_present);
}
  What is it doing here?
  You should just do it right in the first patch, instead of doing in
  one way there, and fixing here.
  

int pci_device_load(PCIDevice *s, QEMUFile *f)
   @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
version_id = qemu_get_be32(f);
if (version_id  3)
return -EINVAL;
   -qemu_get_buffer(f, s-config, 256);
   -pci_update_mappings(s);
   -
   -if (version_id = 2)
   -for (i = 0; i  4; i ++)
   -s-irq_state[i] = qemu_get_be32(f);
if (version_id = 3)
s-cap_present = qemu_get_be32(f);
else
  ditto.
   @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
if (s-cap_present  ~s-cap_supported)
return -EINVAL;

   +qemu_get_buffer(f, s-config, 256);
   +pci_update_mappings(s);
   +
   +if (version_id = 2)
   +for (i = 0; i  4; i ++)
   +s-irq_state[i] = qemu_get_be32(f);
   +/* Clear wmask and used bits for capabilities.
   +   Must be restored separately, since capabilities can
   +   be placed anywhere in config space. */
   +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE);
   +for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
   +s-wmask[i] = 0xff;
return 0;
}
  Sorry, I don't exactly understand it. Although it can be anywhere, what do 
  we actually
  lose by keeping it at the same place in config space?
 
 We lose the ability to let user control the capabilities exposed
 by the device.
 
 And generally, I dislike arbitrary limitations. The PCI spec says the
 capability can be anywhere, implementing a linked list of caps is simple
 enough to not invent abritrary restrictions.
yes, but this is migration time, right?

caps can be anywhere, but we don't expect it to change during machine execution
lifetime.

Or I am just confused by the name pci_device_load ?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Michael S. Tsirkin
On Thu, Jun 11, 2009 at 12:09:33AM +0930, Rusty Russell wrote:
 On Wed, 10 Jun 2009 03:56:31 pm Dor Laor wrote:
  Rusty Russell wrote:
   The current theoretical hole is that the host suppresses notifications
   using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of
   notifications in before it gets to that suppression.  You can use a
   counter to improve this: you only notify when they're equal, and inc when
   you notify.  That way you suppress further notifications even if the
   other side takes ages to wake up. In practice, this shouldn't be played
   with until we have full aio (or equiv in kernel) for other side: host
   xmit tends to be too fast at the moment and we get a notification per
   packet anyway.
 
  Xen ring has the exact optimization for ages. imho we should have it
  too, regardless of aio.
  It reduces #vmexits/spurious wakeups and it is very simple to implement.
 
 But look at number of wakeups received vs notifications sent: I just don't 
 see 
 any benefit there at the moment.  As I said, improving the host code might 
 change that significantly.
 
 And implementing it the other way is v. v. hard given the nature of 
 interrupts 
 (shared and coalesced).

I agree it's not such a simple thing to implement race-free,
so I do buy the argument that we shouldn't unless it gives
a performance benefit.

But I don't understand how aio will make implementing it easier -
or are you merely saying that it will make it worthwhile?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] qemu-kvm BIOS move _PR to SSDT

2009-06-10 Thread Jes Sorensen

Hi,

Per Avi's request, here's a patch which moves the current _PR section
from the DSDT table into a new SSDT table.

The long term idea is to be able to have multiple different SSDTs for
different sized systems, and in particular allow to keep the legacy one
for supporting legacy operating systems which cannot handle more then 15
vcpus.

Cheers,
Jes

Move _PR block from the DSDT to a new SSDT in the KVM BIOS.

This will make it possible to plug in different SSDTs with different
processor counts in the future.

Signed-off-by: Jes Sorensen j...@sgi.com

---
 kvm/bios/Makefile  |   10 ++-
 kvm/bios/acpi-dsdt.dsl |  115 
 kvm/bios/acpi-ssdt.dsl |  140 +
 kvm/bios/rombios32.c   |   16 -
 4 files changed, 162 insertions(+), 119 deletions(-)

Index: qemu-kvm/kvm/bios/Makefile
===
--- qemu-kvm.orig/kvm/bios/Makefile
+++ qemu-kvm/kvm/bios/Makefile
@@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h
 rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld
 	ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o
 
-rombios32.o: rombios32.c acpi-dsdt.hex
+rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex
 	$(GCC) -m32 -O2 -Wall -c -o $@ $
 
 acpi-dsdt.hex: acpi-dsdt.dsl
 	cpp -P $ $.i
 	iasl -tc -p $@ $.i
-	sed -i -e's/^unsigned/const unsigned/' $@
+	sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@
+	rm $.i
+
+acpi-ssdt.hex: acpi-ssdt.dsl
+	cpp -P $ $.i
+	iasl -tc -p $@ $.i
+	sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@
 	rm $.i
 
 rombios32start.o: rombios32start.S
Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl
===
--- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl
+++ qemu-kvm/kvm/bios/acpi-dsdt.dsl
@@ -25,119 +25,6 @@ DefinitionBlock (
 0x1 // OEM Revision
 )
 {
-   Scope (\_PR)
-   {
-	/* pointer to first element of MADT APIC structures */
-	OperationRegion(ATPR, SystemMemory, 0x0514, 4)
-	Field (ATPR, DwordAcc, NoLock, Preserve)
-	{
-		ATP, 32
-	}
-
-#define madt_addr(nr)  Add (ATP, Multiply(nr, 8))
-
-#define gen_processor(nr, name) \
-	Processor (CPU##name, nr, 0xb010, 0x06) {   \
-	OperationRegion (MATR, SystemMemory, madt_addr(nr), 8)  \
-	Field (MATR, ByteAcc, NoLock, Preserve) \
-	{   \
-	MAT, 64 \
-	}   \
-	Field (MATR, ByteAcc, NoLock, Preserve) \
-	{   \
-	Offset(4),  \
-	FLG, 1  \
-	}   \
-Method(_MAT, 0) {   \
-		Return(MAT) \
-}   \
-Method (_STA) { \
-If (FLG) { Return(0xF) } Else { Return(0x9) }   \
-}   \
-}   \
-
-
-	gen_processor(0, 0)
-	gen_processor(1, 1)
-	gen_processor(2, 2)
-	gen_processor(3, 3)
-	gen_processor(4, 4)
-	gen_processor(5, 5)
-	gen_processor(6, 6)
-	gen_processor(7, 7)
-	gen_processor(8, 8)
-	gen_processor(9, 9)
-	gen_processor(10, A)
-	gen_processor(11, B)
-	gen_processor(12, C)
-	gen_processor(13, D)
-	gen_processor(14, E)
-
-	Method (NTFY, 2) {
-#define gen_ntfy(nr)\
-	If (LEqual(Arg0, 0x##nr)) { \
-		If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) {\
-			Store (Arg1, \_PR.CPU##nr.FLG)  \
-			If (LEqual(Arg1, 1)) {  \
-Notify(CPU##nr, 1)  \
-			} Else {\
-Notify(CPU##nr, 3)  \
-			}   \
-		}   \
-	}
-		gen_ntfy(0)
-		gen_ntfy(1)
-		gen_ntfy(2)
-		gen_ntfy(3)
-		gen_ntfy(4)
-		gen_ntfy(5)
-		gen_ntfy(6)
-		gen_ntfy(7)
-		gen_ntfy(8)
-		gen_ntfy(9)
-		gen_ntfy(A)
-		gen_ntfy(B)
-		gen_ntfy(C)
-		gen_ntfy(D)
-		gen_ntfy(E)
-		Return(One)
-	}
-
-	OperationRegion(PRST, SystemIO, 0xaf00, 32)
-	Field (PRST, ByteAcc, NoLock, Preserve)
-	{
-		PRS, 256
-	}
-
-	Method(PRSC, 0) {
-		Store(PRS, Local3)
-		Store(Zero, Local0)
-		While(LLess(Local0, 32)) {
-			Store(Zero, Local1)
-			Store(DerefOf(Index(Local3, Local0)), Local2)
-			While(LLess(Local1, 8)) {
-

Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 11:55:40AM -0300, Glauber Costa wrote:
 On Wed, Jun 10, 2009 at 12:54:15PM +0300, Michael S. Tsirkin wrote:
  On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote:
   On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote:
Add routines to manage PCI capability list. First user will be MSI-X.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci.c |   98 
--
 hw/pci.h |   18 +++-
 2 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 361d741..ed011b5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
 int version = s-cap_present ? 3 : 2;
 int i;
 
-qemu_put_be32(f, version); /* PCI device version */
+/* PCI device version and capabilities */
+qemu_put_be32(f, version);
+if (version = 3)
+qemu_put_be32(f, s-cap_present);
 qemu_put_buffer(f, s-config, 256);
 for (i = 0; i  4; i++)
 qemu_put_be32(f, s-irq_state[i]);
-if (version = 3)
-qemu_put_be32(f, s-cap_present);
 }
   What is it doing here?
   You should just do it right in the first patch, instead of doing in
   one way there, and fixing here.
   
 
 int pci_device_load(PCIDevice *s, QEMUFile *f)
@@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 version_id = qemu_get_be32(f);
 if (version_id  3)
 return -EINVAL;
-qemu_get_buffer(f, s-config, 256);
-pci_update_mappings(s);
-
-if (version_id = 2)
-for (i = 0; i  4; i ++)
-s-irq_state[i] = qemu_get_be32(f);
 if (version_id = 3)
 s-cap_present = qemu_get_be32(f);
 else
   ditto.
@@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 if (s-cap_present  ~s-cap_supported)
 return -EINVAL;
 
+qemu_get_buffer(f, s-config, 256);
+pci_update_mappings(s);
+
+if (version_id = 2)
+for (i = 0; i  4; i ++)
+s-irq_state[i] = qemu_get_be32(f);
+/* Clear wmask and used bits for capabilities.
+   Must be restored separately, since capabilities can
+   be placed anywhere in config space. */
+memset(s-used, 0, PCI_CONFIG_SPACE_SIZE);
+for (i = PCI_CONFIG_HEADER_SIZE; i  PCI_CONFIG_SPACE_SIZE; ++i)
+s-wmask[i] = 0xff;
 return 0;
 }
   Sorry, I don't exactly understand it. Although it can be anywhere, what 
   do we actually
   lose by keeping it at the same place in config space?
  
  We lose the ability to let user control the capabilities exposed
  by the device.
  
  And generally, I dislike arbitrary limitations. The PCI spec says the
  capability can be anywhere, implementing a linked list of caps is simple
  enough to not invent abritrary restrictions.
 yes, but this is migration time, right?

I think so, yes.

 
 caps can be anywhere, but we don't expect it to change during machine 
 execution
 lifetime.
 
 Or I am just confused by the name pci_device_load ?

Right. So I want to load an image and it has capability X at offset Y.
wmask has to match. I don't want to assume that we never change Y
for the device without breaking old images, so I clear wmask here
and set it up again after looking up capabilities that I loaded.

Maybe this explanation should go into the comment above?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [KVM-AUTOTEST PATCH] Make all programs on kvm test use /usr/bin/python - take 2

2009-06-10 Thread Martin Bligh
Looks good.

On Tue, Jun 9, 2009 at 5:57 PM, Lucas Meneghel Rodriguesl...@redhat.com wrote:
 All kvm modules that can be used as stand alone programs were
 updated to use #!/usr/bin/python instead of #!/usr/bin/env python,
 complying with the rest of the autotest code base. As suggested
 by Martin, common.py was added. With this, the stand alone
 programs will be able to use the autotest library namespace and
 choose the best python interpreter available in the system.

 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/common.py           |    8 
  client/tests/kvm/fix_cdkeys.py       |    3 ++-
  client/tests/kvm/kvm_config.py       |    4 +++-
  client/tests/kvm/make_html_report.py |    5 +++--
  client/tests/kvm/stepeditor.py       |    4 ++--
  client/tests/kvm/stepmaker.py        |    4 +++-
  6 files changed, 21 insertions(+), 7 deletions(-)
  create mode 100644 client/tests/kvm/common.py

 diff --git a/client/tests/kvm/common.py b/client/tests/kvm/common.py
 new file mode 100644
 index 000..ce78b85
 --- /dev/null
 +++ b/client/tests/kvm/common.py
 @@ -0,0 +1,8 @@
 +import os, sys
 +dirname = os.path.dirname(sys.modules[__name__].__file__)
 +client_dir = os.path.abspath(os.path.join(dirname, .., ..))
 +sys.path.insert(0, client_dir)
 +import setup_modules
 +sys.path.pop(0)
 +setup_modules.setup(base_path=client_dir,
 +                    root_module_name=autotest_lib.client)
 diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py
 index 4f7a824..7a821fa 100755
 --- a/client/tests/kvm/fix_cdkeys.py
 +++ b/client/tests/kvm/fix_cdkeys.py
 @@ -1,5 +1,6 @@
 -#!/usr/bin/env python
 +#!/usr/bin/python
  import shutil, os, sys
 +import common

  
  Program that replaces the CD keys present on a KVM autotest configuration 
 file.
 diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
 index 40f16f1..13fdac2 100755
 --- a/client/tests/kvm/kvm_config.py
 +++ b/client/tests/kvm/kvm_config.py
 @@ -1,4 +1,6 @@
 +#!/usr/bin/python
  import re, os, sys, StringIO
 +import common
  from autotest_lib.client.common_lib import error

  
 @@ -356,7 +358,7 @@ class config:
                 # (inside an exception or inside subvariants)
                 if restricted:
                     e_msg = Using variants in this context is not allowed
 -                    raise error.AutotestError()
 +                    raise error.AutotestError(e_msg)
                 if self.debug and not restricted:
                     self.__debug_print(indented_line,
                                      Entering variants block (%d dicts in
 diff --git a/client/tests/kvm/make_html_report.py 
 b/client/tests/kvm/make_html_report.py
 index 6aed39e..e69367b 100755
 --- a/client/tests/kvm/make_html_report.py
 +++ b/client/tests/kvm/make_html_report.py
 @@ -1,4 +1,7 @@
  #!/usr/bin/python
 +import os, sys, re, getopt, time, datetime, commands
 +import common
 +
  
  Script used to parse the test results and generate an HTML report.

 @@ -7,8 +10,6 @@ Script used to parse the test results and generate an HTML 
 report.
 �...@author: Dror Russo (dru...@redhat.com)
  

 -import os, sys, re, getopt, time, datetime, commands
 -

  format_css=
  html,body {
 diff --git a/client/tests/kvm/stepeditor.py b/client/tests/kvm/stepeditor.py
 index 9669200..e7794ac 100755
 --- a/client/tests/kvm/stepeditor.py
 +++ b/client/tests/kvm/stepeditor.py
 @@ -1,6 +1,6 @@
 -#!/usr/bin/env python
 +#!/usr/bin/python
  import pygtk, gtk, os, glob, shutil, sys, logging
 -import ppm_utils
 +import common, ppm_utils
  pygtk.require('2.0')

  
 diff --git a/client/tests/kvm/stepmaker.py b/client/tests/kvm/stepmaker.py
 index 2b7fd54..8f16ffd 100644
 --- a/client/tests/kvm/stepmaker.py
 +++ b/client/tests/kvm/stepmaker.py
 @@ -1,8 +1,10 @@
 -#!/usr/bin/env python
 +#!/usr/bin/python
  import pygtk, gtk, gobject, time, os, commands
 +import common
  from autotest_lib.client.common_lib import error
  import kvm_utils, logging, ppm_utils, stepeditor
  pygtk.require('2.0')
 +
  
  Step file creator/editor.

 --
 1.6.2.2

 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] qemu-kvm BIOS move _PR to SSDT

2009-06-10 Thread Gleb Natapov
On Wed, Jun 10, 2009 at 05:01:18PM +0200, Jes Sorensen wrote:
 Index: qemu-kvm/kvm/bios/acpi-ssdt.dsl
 ===
 --- /dev/null
 +++ qemu-kvm/kvm/bios/acpi-ssdt.dsl
 @@ -0,0 +1,140 @@
 +/*
 + * Bochs/QEMU ACPI SSDT ASL definition
 + *
 + * Copyright (c) 2006 Fabrice Bellard
I am not sure that there is even one bit of code below written by Fabrice.

 + * Copyright (c) 2009 SGI, Jes Sorensen j...@sgi.com
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License version 2 as published by the Free Software Foundation.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, write to the Free Software
 + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
 + */
 +DefinitionBlock (
 +acpi-ssdt.aml,// Output Filename
 +SSDT, // Signature
 +0x01,   // DSDT Compliance Revision
 +BXPC, // OEMID
 +BXSSDT,   // TABLE ID
 +0x1 // OEM Revision
 +)
 +{
 +   Scope (\_PR)
 +   {
 + /* pointer to first element of MADT APIC structures */
 + OperationRegion(ATPR, SystemMemory, 0x0514, 4)
 + Field (ATPR, DwordAcc, NoLock, Preserve)
 + {
 + ATP, 32
 + }
 +
 +#define madt_addr(nr)  Add (ATP, Multiply(nr, 8))
 +
 +#define gen_processor(nr, name)  \
 + Processor (C##name, nr, 0xb010, 0x06) {   \
 + OperationRegion (MATR, SystemMemory, madt_addr(nr), 8)  \
 + Field (MATR, ByteAcc, NoLock, Preserve) \
 + {   \
 + MAT, 64 \
 + }   \
 + Field (MATR, ByteAcc, NoLock, Preserve) \
 + {   \
 + Offset(4),  \
 + FLG, 1  \
 + }   \
 +Method(_MAT, 0) {   \
 + Return(MAT) \
 +}   \
 +Method (_STA) { \
 +If (FLG) { Return(0xF) } Else { Return(0x9) }   \
 +}   \
 +}   \
 +
 +
 + gen_processor(0, 0)
 + gen_processor(1, 1)
 + gen_processor(2, 2)
 + gen_processor(3, 3)
 + gen_processor(4, 4)
 + gen_processor(5, 5)
 + gen_processor(6, 6)
 + gen_processor(7, 7)
 + gen_processor(8, 8)
 + gen_processor(9, 9)
 + gen_processor(10, A)
 + gen_processor(11, B)
 + gen_processor(12, C)
 + gen_processor(13, D)
 + gen_processor(14, E)
 +
 + Method (NTFY, 2) {
 +#define gen_ntfy(nr)\
 + If (LEqual(Arg0, 0x##nr)) { \
 + If (LNotEqual(Arg1, \_PR.C##nr.FLG)) {  \
 + Store (Arg1, \_PR.C##nr.FLG)\
 + If (LEqual(Arg1, 1)) {  \
 + Notify(C##nr, 1)\
 + } Else {\
 + Notify(C##nr, 3)\
 + }   \
 + }   \
 + }
 + gen_ntfy(0)
 + gen_ntfy(1)
 + gen_ntfy(2)
 + gen_ntfy(3)
 + gen_ntfy(4)
 + gen_ntfy(5)
 + gen_ntfy(6)
 + gen_ntfy(7)
 + gen_ntfy(8)
 + gen_ntfy(9)
 + gen_ntfy(A)
 + gen_ntfy(B)
 + gen_ntfy(C)
 + gen_ntfy(D)
 + gen_ntfy(E)
 + Return(One)
 + }
 +
 + OperationRegion(PRST, SystemIO, 0xaf00, 32)
 + Field (PRST, ByteAcc, NoLock, Preserve)
 + {
 + PRS, 256
 + }
 +
 + Method(PRSC, 0) {
 + Store(PRS, Local3)
 + Store(Zero, Local0)
 + While(LLess(Local0, 

Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
  That's seems just plain wrong to me.
  Loading a VM shouldn't not
  do anything that can't happen during normal operation.

 At least wrt pci, we are very far from this state: load just overwrites
 all registers, readonly or not, which can never happen during normal
 operation.

IMO that code is wrong. We should only be loading things that the guest can 
change (directly or indirectly).

Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CentOS-devel] Latest kvm packages for CentOS 5.3

2009-06-10 Thread Dag Wieers

On Wed, 10 Jun 2009, Federico Simoncelli wrote:


I've been working quite extensively with kvm on CentOS 5.3 lately.
If you are interested in the latest rpm of kvm-kmod-2.6.30-rc8,
qemu-kvm-0.10.5 and libvirt-0.6.4 you can temporary find them here:

http://update.nethesis.it/kvm/

I've had no problem so far using these packages.
Feedback is welcome.


RHEL5.4 is expected to have KVM support, so it would be nice to know in 
advance which version is being included with RHEL 5.4. Then we can update 
our own CentOS kvm kmod for testing and reporting upstream the issue(s) we 
still find.


Today the kvm module is still missing from the upstream 2.6.18-152.el5 
kernel, just as fuse which is also expected to appear.


--
--   dag wieers,  d...@centos.org,  http://dag.wieers.com/   --
[Any errors in spelling, tact or fact are transmission errors]
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Avi Kivity

Michael S. Tsirkin wrote:

But I don't understand how aio will make implementing it easier -
or are you merely saying that it will make it worthwhile?
  


If you have aio, the the NIC and the guest proceed in parallel.  If the 
guest is faster (likely), then when it sends the next packet it will see 
that interrupts are disabled and not notify again.  Once aio complete we 
can recheck the queue; if it's empty we reenable notifications.  If 
there's still stuff in it we submit it with notifications disabled.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper

2009-06-10 Thread Avi Kivity

Marcelo Tosatti wrote:

 


Ah, you're right.

I don't think it's worthwhile to add all this just for debugging.  You  
can add a function that dumps the spte chain as well as the features  
MSR, and we can decode it by hand when we see it.  Disadvantage is more  
work for us when we hit the bug, but at least that function is reusable  
in other contexts.



The problem is if someone hits the bug who has no possibility of
reporting the printks. Nice thing about the WARN_ON's there is you can
look up kerneloops.org, match the line number to the kernel version,
and narrow down what bits are wrong (you still need reporter to send
contents of dmesg for full spte value).
  


Well, we can KERN_WARNING instead of KERN_DEBUG in that printout.


Also the bit-by-bit validity checks (the inspect_spte function in vmx.c)
can be used in the mmu audit code (thats the reason for print=0 in the
callback parameters), so it is reusable in other contexes.

What you dislike is hardcoding the bits checked in C code? Don't worry
about the level stuff, will be handled next.
  


I just don't want to introduce yet another function-callback pair for 
such a small thing.  No objection to the code itself.


Printing out the spte hierarchy seems a good idea in other bug contexts, 
so at least that function is reusable.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] qemu-kvm BIOS move _PR to SSDT

2009-06-10 Thread Jes Sorensen

On 06/10/2009 05:13 PM, Gleb Natapov wrote:

+ * Bochs/QEMU ACPI SSDT ASL definition
+ *
+ * Copyright (c) 2006 Fabrice Bellard

I am not sure that there is even one bit of code below written by Fabrice.


Well better safe than sorry, since I copied over a bunch of stuff,
I figured it was simpler to just leave it intact.


+Scope (\)
+{
+/* Debug Output */
+OperationRegion (DBG, SystemIO, 0xb044, 0x04)
+Field (DBG, DWordAcc, NoLock, Preserve)
+{
+DBGL,   32,
+}
+}

Except may be this one, but it should stay in DSDT.


Hmmm, good point. I think I just moved it when I had zero idea what was
going on at first in this AML hell :-)

Try this one.

Cheers,
Jes
Move _PR block from the DSDT to a new SSDT in the KVM BIOS.

This will make it possible to plug in different SSDTs with different
processor counts in the future.

Signed-off-by: Jes Sorensen j...@sgi.com

---
 kvm/bios/Makefile  |   10 +++
 kvm/bios/acpi-dsdt.dsl |  105 ---
 kvm/bios/acpi-ssdt.dsl |  130 +
 kvm/bios/rombios32.c   |   16 --
 4 files changed, 152 insertions(+), 109 deletions(-)

Index: qemu-kvm/kvm/bios/Makefile
===
--- qemu-kvm.orig/kvm/bios/Makefile
+++ qemu-kvm/kvm/bios/Makefile
@@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h
 rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld
 	ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o
 
-rombios32.o: rombios32.c acpi-dsdt.hex
+rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex
 	$(GCC) -m32 -O2 -Wall -c -o $@ $
 
 acpi-dsdt.hex: acpi-dsdt.dsl
 	cpp -P $ $.i
 	iasl -tc -p $@ $.i
-	sed -i -e's/^unsigned/const unsigned/' $@
+	sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@
+	rm $.i
+
+acpi-ssdt.hex: acpi-ssdt.dsl
+	cpp -P $ $.i
+	iasl -tc -p $@ $.i
+	sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@
 	rm $.i
 
 rombios32start.o: rombios32start.S
Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl
===
--- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl
+++ qemu-kvm/kvm/bios/acpi-dsdt.dsl
@@ -25,108 +25,6 @@ DefinitionBlock (
 0x1 // OEM Revision
 )
 {
-   Scope (\_PR)
-   {
-	/* pointer to first element of MADT APIC structures */
-	OperationRegion(ATPR, SystemMemory, 0x0514, 4)
-	Field (ATPR, DwordAcc, NoLock, Preserve)
-	{
-		ATP, 32
-	}
-
-#define madt_addr(nr)  Add (ATP, Multiply(nr, 8))
-
-#define gen_processor(nr, name) \
-	Processor (CPU##name, nr, 0xb010, 0x06) {   \
-	OperationRegion (MATR, SystemMemory, madt_addr(nr), 8)  \
-	Field (MATR, ByteAcc, NoLock, Preserve) \
-	{   \
-	MAT, 64 \
-	}   \
-	Field (MATR, ByteAcc, NoLock, Preserve) \
-	{   \
-	Offset(4),  \
-	FLG, 1  \
-	}   \
-Method(_MAT, 0) {   \
-		Return(MAT) \
-}   \
-Method (_STA) { \
-If (FLG) { Return(0xF) } Else { Return(0x9) }   \
-}   \
-}   \
-
-
-	gen_processor(0, 0)
-	gen_processor(1, 1)
-	gen_processor(2, 2)
-	gen_processor(3, 3)
-	gen_processor(4, 4)
-	gen_processor(5, 5)
-	gen_processor(6, 6)
-	gen_processor(7, 7)
-	gen_processor(8, 8)
-	gen_processor(9, 9)
-	gen_processor(10, A)
-	gen_processor(11, B)
-	gen_processor(12, C)
-	gen_processor(13, D)
-	gen_processor(14, E)
-
-	Method (NTFY, 2) {
-#define gen_ntfy(nr)\
-	If (LEqual(Arg0, 0x##nr)) { \
-		If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) {\
-			Store (Arg1, \_PR.CPU##nr.FLG)  \
-			If (LEqual(Arg1, 1)) {  \
-Notify(CPU##nr, 1)  \
-			} Else {\
-Notify(CPU##nr, 3)  \
-			}   \
-		}   \
-	}
-		gen_ntfy(0)
-		gen_ntfy(1)
-		gen_ntfy(2)
-		gen_ntfy(3)
-		gen_ntfy(4)
-		gen_ntfy(5)
-		gen_ntfy(6)
-		gen_ntfy(7)
-		gen_ntfy(8)
-		gen_ntfy(9)
-		gen_ntfy(A)
-		gen_ntfy(B)
-		

[patch 5/6] KVM: MMU audit: audit_mappings tweaks

2009-06-10 Thread Marcelo Tosatti
- Fail early in case gfn_to_pfn returns is_error_pfn.
- For the pre pte write case, avoid spurious gva is valid but spte is notrap 
  messages (the emulation code does the guest write first, so this particular
  case is OK).

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3092,6 +3092,11 @@ static void audit_mappings_page(struct k
pfn_t pfn = gfn_to_pfn(vcpu-kvm, gfn);
hpa_t hpa = (hpa_t)pfn  PAGE_SHIFT;
 
+   if (is_error_pfn(pfn)) {
+   kvm_release_pfn_clean(pfn);
+   continue;
+   }
+
if (is_shadow_present_pte(ent)
 (ent  PT64_BASE_ADDR_MASK) != hpa)
printk(KERN_ERR xx audit error: (%s) levels %d
@@ -3263,7 +3268,8 @@ static void kvm_mmu_audit(struct kvm_vcp
audit_msg = msg;
audit_rmap(vcpu);
audit_write_protection(vcpu);
-   audit_mappings(vcpu);
+   if (strcmp(pre pte write, audit_msg) != 0)
+   audit_mappings(vcpu);
audit_writable_sptes_have_rmaps(vcpu);
dbg = olddbg;
 }

-- 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 3/6] KVM: MMU audit: update audit_write_protection

2009-06-10 Thread Marcelo Tosatti
- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3235,20 +3235,28 @@ static void audit_write_protection(struc
struct kvm_mmu_page *sp;
struct kvm_memory_slot *slot;
unsigned long *rmapp;
+   u64 *spte;
gfn_t gfn;
 
list_for_each_entry(sp, vcpu-kvm-arch.active_mmu_pages, link) {
if (sp-role.direct)
continue;
+   if (sp-unsync)
+   continue;
 
gfn = unalias_gfn(vcpu-kvm, sp-gfn);
slot = gfn_to_memslot_unaliased(vcpu-kvm, sp-gfn);
rmapp = slot-rmap[gfn - slot-base_gfn];
-   if (*rmapp)
-   printk(KERN_ERR %s: (%s) shadow page has writable
-   mappings: gfn %lx role %x\n,
+
+   spte = rmap_next(vcpu-kvm, rmapp, NULL);
+   while (spte) {
+   if (*spte  PT_WRITABLE_MASK)
+   printk(KERN_ERR %s: (%s) shadow page has 
+   writable mappings: gfn %lx role %x\n,
   __func__, audit_msg, sp-gfn,
   sp-role.word);
+   spte = rmap_next(vcpu-kvm, rmapp, spte);
+   }
}
 }
 

-- 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 0/6] mmu audit update v4

2009-06-10 Thread Marcelo Tosatti
Addressing comments, introducing a new helper, handling largepages.

-- 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 2/6] KVM: MMU audit: update count_writable_mappings / count_rmaps

2009-06-10 Thread Marcelo Tosatti
Under testing, count_writable_mappings returns a value that is 2 integers
larger than what count_rmaps returns.

Suspicion is that either of the two functions is counting a duplicate (either
positively or negatively). 

Modifying check_writable_mappings_rmap to check for rmap existance on
all present MMU pages fails to trigger an error, which should keep Avi
happy.

Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu, might be useful in the future.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3020,6 +3020,55 @@ static gva_t canonicalize(gva_t gva)
return gva;
 }
 
+
+typedef void (*inspect_spte_fn) (struct kvm *kvm, struct kvm_mmu_page *sp,
+u64 *sptep);
+
+static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
+   inspect_spte_fn fn)
+{
+   int i;
+
+   for (i = 0; i  PT64_ENT_PER_PAGE; ++i) {
+   u64 ent = sp-spt[i];
+
+   if (is_shadow_present_pte(ent)) {
+   if (sp-role.level  1  !is_large_pte(ent)) {
+   struct kvm_mmu_page *child;
+   child = page_header(ent  PT64_BASE_ADDR_MASK);
+   __mmu_spte_walk(kvm, child, fn);
+   }
+   if (sp-role.level == 1)
+   fn(kvm, sp, sp-spt[i]);
+   }
+   }
+}
+
+static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
+{
+   int i;
+   struct kvm_mmu_page *sp;
+
+   if (!VALID_PAGE(vcpu-arch.mmu.root_hpa))
+   return;
+   if (vcpu-arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   hpa_t root = vcpu-arch.mmu.root_hpa;
+   sp = page_header(root);
+   __mmu_spte_walk(vcpu-kvm, sp, fn);
+   return;
+   }
+   for (i = 0; i  4; ++i) {
+   hpa_t root = vcpu-arch.mmu.pae_root[i];
+
+   if (root  VALID_PAGE(root)) {
+   root = PT64_BASE_ADDR_MASK;
+   sp = page_header(root);
+   __mmu_spte_walk(vcpu-kvm, sp, fn);
+   }
+   }
+   return;
+}
+
 static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
gva_t va, int level)
 {
@@ -3112,9 +3161,47 @@ static int count_rmaps(struct kvm_vcpu *
return nmaps;
 }
 
-static int count_writable_mappings(struct kvm_vcpu *vcpu)
+void inspect_spte_has_rmap(struct kvm *kvm, struct kvm_mmu_page *sp, u64 
*sptep)
+{
+   unsigned long *rmapp;
+   struct kvm_mmu_page *rev_sp;
+   gfn_t gfn;
+
+   if (*sptep  PT_WRITABLE_MASK) {
+   rev_sp = page_header(__pa(sptep));
+   gfn = rev_sp-gfns[sptep - rev_sp-spt];
+
+   if (!gfn_to_memslot(kvm, gfn)) {
+   if (!printk_ratelimit())
+   return;
+   printk(KERN_ERR %s: no memslot for gfn %ld\n,
+audit_msg, gfn);
+   printk(KERN_ERR %s: index %ld of sp (gfn=%lx)\n,
+   audit_msg, sptep - rev_sp-spt,
+   rev_sp-gfn);
+   dump_stack();
+   return;
+   }
+
+   rmapp = gfn_to_rmap(kvm, rev_sp-gfns[sptep - rev_sp-spt], 0);
+   if (!*rmapp) {
+   if (!printk_ratelimit())
+   return;
+   printk(KERN_ERR %s: no rmap for writable spte %llx\n,
+audit_msg, *sptep);
+   dump_stack();
+   }
+   }
+
+}
+
+void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu)
+{
+   mmu_spte_walk(vcpu, inspect_spte_has_rmap);
+}
+
+static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
 {
-   int nmaps = 0;
struct kvm_mmu_page *sp;
int i;
 
@@ -3131,20 +3218,16 @@ static int count_writable_mappings(struc
continue;
if (!(ent  PT_WRITABLE_MASK))
continue;
-   ++nmaps;
+   inspect_spte_has_rmap(vcpu-kvm, sp, pt[i]);
}
}
-   return nmaps;
+   return;
 }
 
 static void audit_rmap(struct kvm_vcpu *vcpu)
 {
-   int n_rmap = count_rmaps(vcpu);
-   int n_actual = count_writable_mappings(vcpu);
-
-   if (n_rmap != n_actual)
-   printk(KERN_ERR %s: (%s) rmap %d actual %d\n,
-  __func__, audit_msg, n_rmap, n_actual);
+   check_writable_mappings_rmap(vcpu);
+   count_rmaps(vcpu);
 }
 
 

[patch 1/6] KVM: MMU: introduce is_last_spte helper

2009-06-10 Thread Marcelo Tosatti
Hiding some of the last largepage / level interaction (which is useful
for gbpages and for zero based levels).

Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -250,6 +250,15 @@ static int is_rmap_spte(u64 pte)
return is_shadow_present_pte(pte);
 }
 
+static int is_last_spte(u64 pte, int level)
+{
+   if (level == PT_PAGE_TABLE_LEVEL)
+   return 1;
+   if (level == PT_DIRECTORY_LEVEL  is_large_pte(pte))
+   return 1;
+   return 0;
+}
+
 static pfn_t spte_to_pfn(u64 pte)
 {
return (pte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
@@ -1293,25 +1302,17 @@ static void kvm_mmu_page_unlink_children
 
pt = sp-spt;
 
-   if (sp-role.level == PT_PAGE_TABLE_LEVEL) {
-   for (i = 0; i  PT64_ENT_PER_PAGE; ++i) {
-   if (is_shadow_present_pte(pt[i]))
-   rmap_remove(kvm, pt[i]);
-   pt[i] = shadow_trap_nonpresent_pte;
-   }
-   return;
-   }
-
for (i = 0; i  PT64_ENT_PER_PAGE; ++i) {
ent = pt[i];
 
if (is_shadow_present_pte(ent)) {
-   if (!is_large_pte(ent)) {
+   if (!is_last_spte(ent, sp-role.level)) {
ent = PT64_BASE_ADDR_MASK;
mmu_page_remove_parent_pte(page_header(ent),
   pt[i]);
} else {
-   --kvm-stat.lpages;
+   if (is_large_pte(ent))
+   --kvm-stat.lpages;
rmap_remove(kvm, pt[i]);
}
}
@@ -2357,8 +2358,7 @@ static void mmu_pte_write_zap_pte(struct
 
pte = *spte;
if (is_shadow_present_pte(pte)) {
-   if (sp-role.level == PT_PAGE_TABLE_LEVEL ||
-   is_large_pte(pte))
+   if (is_last_spte(pte, sp-role.level))
rmap_remove(vcpu-kvm, spte);
else {
child = page_header(pte  PT64_BASE_ADDR_MASK);

-- 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 4/6] KVM: MMU audit: nontrapping ptes in nonleaf level

2009-06-10 Thread Marcelo Tosatti
It is valid to set non leaf sptes as notrap.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3084,12 +3084,7 @@ static void audit_mappings_page(struct k
 
va = canonicalize(va);
if (level  1) {
-   if (ent == shadow_notrap_nonpresent_pte)
-   printk(KERN_ERR audit: (%s) nontrapping pte
-   in nonleaf level: levels %d gva %lx
-   level %d pte %llx\n, audit_msg,
-  vcpu-arch.mmu.root_level, va, level, 
ent);
-   else
+   if (is_shadow_present_pte(ent))
audit_mappings_page(vcpu, ent, va, level - 1);
} else {
gpa_t gpa = vcpu-arch.mmu.gva_to_gpa(vcpu, va);

-- 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 04:24:28PM +0100, Paul Brook wrote:
   caps can be anywhere, but we don't expect it to change during machine
   execution lifetime.
  
   Or I am just confused by the name pci_device_load ?
 
  Right. So I want to load an image and it has capability X at offset Y.
  wmask has to match. I don't want to assume that we never change Y
  for the device without breaking old images, so I clear wmask here
  and set it up again after looking up capabilities that I loaded.
 
 We should not be loading state into a different device (or a similar device 
 with a different set of capabilities).
 
 If you want to provide backwards compatibility then you should do that by 
 creating a device that is the same as the original.  As I mentioned in my 
 earlier mail, loading a snapshot should never do anything that can not be 
 achieved through normal operation.
 
 Paul

Why shouldn't it? You don't load a snapshot while guest is running.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 06:18:01PM +0300, Avi Kivity wrote:
 Michael S. Tsirkin wrote:
 But I don't understand how aio will make implementing it easier -
 or are you merely saying that it will make it worthwhile?
   

 If you have aio, the the NIC and the guest proceed in parallel.  If the  
 guest is faster (likely), then when it sends the next packet it will see  
 that interrupts are disabled and not notify again.  Once aio complete we  
 can recheck the queue; if it's empty we reenable notifications.  If  
 there's still stuff in it we submit it with notifications disabled.

So you are saying that with aio we won't need this optimization at all?
I guess it's late in the day, and my mind is fuzzy...

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
   That's seems just plain wrong to me.
   Loading a VM shouldn't not
   do anything that can't happen during normal operation.
 
  At least wrt pci, we are very far from this state: load just overwrites
  all registers, readonly or not, which can never happen during normal
  operation.
 
 IMO that code is wrong. We should only be loading things that the guest can 
 change (directly or indirectly).
 
 Paul

Making it work this way will mean that minor changes to a device can
break backwards compatibility with old images, often in surprising ways.
What are the advantages?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
 On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
That's seems just plain wrong to me.
Loading a VM shouldn't not
do anything that can't happen during normal operation.
  
   At least wrt pci, we are very far from this state: load just overwrites
   all registers, readonly or not, which can never happen during normal
   operation.
 
  IMO that code is wrong. We should only be loading things that the guest
  can change (directly or indirectly).

 Making it work this way will mean that minor changes to a device can
 break backwards compatibility with old images, often in surprising ways.
 What are the advantages?

If you can't create an identical machine from scratch then I don't consider 
snapshot/migration to be a useful feature. i.e. as soon as you shutdown and 
restart the guest it is liable to break anyway.

It may be that the snapshot/migration code wants to include a machine config, 
and create a new machine from that. However this is a separate issue, and 
arguably something your VM manager should be handling for you.

Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TODO list for qemu+KVM networking performance v2

2009-06-10 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Wed, Jun 10, 2009 at 06:18:01PM +0300, Avi Kivity wrote:
  

Michael S. Tsirkin wrote:


But I don't understand how aio will make implementing it easier -
or are you merely saying that it will make it worthwhile?
  
  
If you have aio, the the NIC and the guest proceed in parallel.  If the  
guest is faster (likely), then when it sends the next packet it will see  
that interrupts are disabled and not notify again.  Once aio complete we  
can recheck the queue; if it's empty we reenable notifications.  If  
there's still stuff in it we submit it with notifications disabled.



So you are saying that with aio we won't need this optimization at all?
I guess it's late in the day, and my mind is fuzzy...
  


No, I'm saying with aio the optimization becomes worthwhile.  But I 
joined late in the thread so we may be talking about different things.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] qemu-kvm BIOS move _PR to SSDT

2009-06-10 Thread Gleb Natapov
On Wed, Jun 10, 2009 at 05:30:40PM +0200, Jes Sorensen wrote:
 On 06/10/2009 05:13 PM, Gleb Natapov wrote:
 + * Bochs/QEMU ACPI SSDT ASL definition
 + *
 + * Copyright (c) 2006 Fabrice Bellard
 I am not sure that there is even one bit of code below written by Fabrice.

 Well better safe than sorry, since I copied over a bunch of stuff,
 I figured it was simpler to just leave it intact.

 +Scope (\)
 +{
 +/* Debug Output */
 +OperationRegion (DBG, SystemIO, 0xb044, 0x04)
 +Field (DBG, DWordAcc, NoLock, Preserve)
 +{
 +DBGL,   32,
 +}
 +}
 Except may be this one, but it should stay in DSDT.

 Hmmm, good point. I think I just moved it when I had zero idea what was
 going on at first in this AML hell :-)

 Try this one.

WindowXP/Windows7 BSOD - The system is not (MS)ACPI compliant.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >