[questions] savevm|loadvm

2010-03-30 Thread Wenhao Xu
Hi, all,
   I am working on switching QEMU dynamically from running in KVM mode
to QEMU emulation mode.
   Intuitively, if a snapshot created with savevm in KVM mode could be
used by the loadvm command in QEMU emulation mode, the switch could
make use of this. I tried to do so; however, it does not work.
Any idea how to fix it?
   Thanks for the help.

regards,
Wenhao

--
~_~
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Clocksource tsc unstable (delta = -4398046474878 ns)

2010-03-30 Thread Sebastian Hetze
On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
 On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
  this message appeared in the KVM guest kern.log last night:
  
  Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable 
  (delta = -4398046474878 ns)
  
  The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
  hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
  
  If I understand things correctly, in kernel/time/clocksource.c
  clocksource_watchdog() checks all the
  /sys/devices/system/clocksource/clocksource0/available_clocksource
  every 0.5sec for a delta of more than 0.0625s. So the tsc must have
  changed by more than one hour between two subsequent calls of
  clocksource_watchdog. Neither any event on the host nor anything in the
  guest gives a reasonable cause for this step.
  
  However, the number 4398046474878 is only 36226 ns away from
  4*1024*1024*1024*1024
 
   I didn't see any such messages but I've had a recent experience with
 the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
 two separate incidents.  Eerily the exact jumps, as best I can tell from
 logs are of 17592 and 8796 seconds, give or take a second or two.  If
 you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
 nanoseconds.
   What I've done that seems to have avoided this happening again is drop
 KVM_CLOCK kernel option from the kvm guests' kernel.

To my understanding, kvm-clock is the best and most reliable clocksource
available, so I do not think it is a good idea to disable it.

There are a lot of bit-shift operations happening in the clocksource code,
so there may be a real bug hidden somewhere.
Could the NTP adjustment somehow be involved, and can it cause such huge steps?
In my case, I actually do have NTP running in the guest. However, the
statistics show a pretty stable timing here.
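The power-of-two pattern in both reports is easy to confirm. Here is a quick Python check of the numbers quoted above. The helper names are ad hoc and for illustration only.

```python
DELTA_NS = -4398046474878  # delta from the guest's "Clocksource tsc unstable" message


def distance_to_pow2(value_ns, exponent):
    """Return how far |value_ns| falls short of 2**exponent nanoseconds."""
    return (1 << exponent) - abs(value_ns)


def pow2_ns_to_seconds(exponent):
    """Convert 2**exponent nanoseconds to seconds."""
    return (1 << exponent) / 1e9


print((1 << 42) == 4 * 1024 ** 4)        # True: 4*1024^4 is 2^42
print(distance_to_pow2(DELTA_NS, 42))    # 36226
print(round(pow2_ns_to_seconds(44)))     # 17592 (about 4.9 hours)
print(round(pow2_ns_to_seconds(43)))     # 8796 (about 2.4 hours)
```

This only confirms the coincidence. It does not show where in the clocksource arithmetic such a bit could be gained or lost.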


[RFC PATCH 0/7] Beginning implementing the AMD IOMMU emulation

2010-03-30 Thread Eduard - Gabriel Munteanu
Hi everybody,

This patchset is intended to provide a start for implementing the
emulation of the AMD IOMMU. For those who aren't aware yet, I intend
to participate as a student in GSoC 2010.

The patches are meant to be applied on top of qemu-kvm.

In short, this demonstrates a mechanism for inserting ACPI tables without
modifying SeaBIOS or other BIOS implementations. I also have a SeaBIOS
equivalent, but I think this approach is better, at least at the moment.

To test, simply boot a Linux kernel inside KVM and look in dmesg for the
IVRS table.
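You can also check the table without grepping dmesg. A guest kernel that exports ACPI tables under /sys/firmware/acpi/tables lets you read the table directly and check its signature, length and checksum. Below is a minimal Python sketch. The path, and the fact that the file is usually readable only by root, are assumptions about the guest kernel; `check_acpi_table` is a hypothetical helper.

```python
import os
import struct

# Assumption: the guest kernel exports ACPI tables as files named by signature.
IVRS_PATH = "/sys/firmware/acpi/tables/IVRS"


def check_acpi_table(raw):
    """Return (signature, length, checksum_ok) for a raw ACPI table."""
    signature, length = struct.unpack_from("<4sI", raw, 0)
    checksum_ok = sum(raw[:length]) % 256 == 0
    return signature.decode("ascii"), length, checksum_ok


if os.access(IVRS_PATH, os.R_OK):
    with open(IVRS_PATH, "rb") as table_file:
        print(check_acpi_table(table_file.read()))
else:
    print("IVRS table not visible (missing, or not readable as this user)")
```

A failing checksum flag might point at the checksum-field issue that patch 7 addresses.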

I wouldn't merge this patchset yet, at least not the patches after the first
one, until more work has accumulated. I also haven't tested loading ACPI tables
from the command line after these modifications.

I'd appreciate comments on these patches.


Cheers,
Eduard

Eduard - Gabriel Munteanu (7):
  acpi: qemu_realloc() might return a different pointer
  acpi: split and rename acpi_table_add()
  acpi: move table header definition into pc.h
  sparc: rename hw/iommu.c
  x86-64: AMD IOMMU stub
  acpi: cleanup acpi_checksum()
  acpi: fix bug in acpi_checksum() caused by garbage in checksum field

 Makefile.target               |    3 +-
 hw/acpi.c                     |   83 +++-
 hw/amd_iommu.c                |  103 +
 hw/pc.c                       |    2 +
 hw/pc.h                       |   20 -
 hw/{iommu.c => sparc_iommu.c} |    0
 hw/sun4m.h                    |    2 +-
 vl.c                          |    2 +-
 8 files changed, 177 insertions(+), 38 deletions(-)
 create mode 100644 hw/amd_iommu.c
 rename hw/{iommu.c => sparc_iommu.c} (100%)



[RFC PATCH 1/7] acpi: qemu_realloc() might return a different pointer

2010-03-30 Thread Eduard - Gabriel Munteanu
We mustn't assume qemu_realloc() returns the same pointer in
acpi_table_add(). Therefore, 'p' might be invalid if it's relative to
the old value of acpi_tables.
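The hazard is general: any reference into a buffer taken before the buffer is resized may be stale afterwards. Python's `bytearray` makes the same point explicit by refusing to resize while a `memoryview` of it is exported. The sketch below is an analogy, not QEMU code, and `append_table` is a hypothetical helper. It follows the patch's fix: record an offset, grow the buffer, and only derive positions from the offset afterwards.

```python
def append_table(tables, payload):
    """Append a length-prefixed payload to `tables` (a bytearray).

    Like the fixed acpi_table_add(), remember an offset rather than a
    reference into the old storage, then grow the buffer.
    """
    off = len(tables)                                   # where the new entry starts
    tables.extend(len(payload).to_bytes(2, "little"))  # uint16 length prefix
    tables.extend(payload)
    return off


tables = bytearray(2)              # leading uint16 table count
off = append_table(tables, b"TBL1")
print(off, bytes(tables[off:]))    # 2 b'\x04\x00TBL1'

# Holding a view across a resize mirrors computing 'p' before
# qemu_realloc(); bytearray refuses to move its storage while a view exists.
view = memoryview(tables)
try:
    tables.extend(b"more")
except BufferError as exc:
    print("resize refused:", exc)
finally:
    view.release()
```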

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 hw/acpi.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index d293127..7c4e8d3 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -857,7 +857,7 @@ int acpi_table_add(const char *t)
 char buf[1024], *p, *f;
 struct acpi_table_header acpi_hdr;
 unsigned long val;
-size_t off;
+size_t newlen, off;
 
 memset(&acpi_hdr, 0, sizeof(acpi_hdr));
   
@@ -938,9 +938,10 @@ int acpi_table_add(const char *t)
 acpi_tables_len = sizeof(uint16_t);
 acpi_tables = qemu_mallocz(acpi_tables_len);
 }
+newlen = acpi_tables_len + sizeof(uint16_t) + acpi_hdr.length;
+acpi_tables = qemu_realloc(acpi_tables, newlen);
 p = acpi_tables + acpi_tables_len;
-acpi_tables_len += sizeof(uint16_t) + acpi_hdr.length;
-acpi_tables = qemu_realloc(acpi_tables, acpi_tables_len);
+acpi_tables_len = newlen;
 
 acpi_hdr.length = cpu_to_le32(acpi_hdr.length);
 *(uint16_t*)p = acpi_hdr.length;
-- 
1.6.4.4



[RFC PATCH 2/7] acpi: split and rename acpi_table_add()

2010-03-30 Thread Eduard - Gabriel Munteanu
We'd like to let emulation code build and insert ACPI tables at bootup,
without depending on hacking the BIOS code. This will be used to provide
an IVRS table for emulating the AMD IOMMU, for instance.

This splits acpi_table_add(), retaining the old behavior of inserting
cmdline-supplied tables under the name of acpi_table_cmdline_add(). The
other two resulting functions can be used for the aforementioned
purpose.
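Going by the code in this patch, `acpi_tables` is laid out as follows:

- a `uint16_t` table count;
- then a sequence of entries, each a `uint16_t` size followed by the table bytes.

`acpi_alloc_table()` reserves an entry and returns a pointer past the size prefix. `acpi_commit_table()` fills in the header's length and checksum and bumps the count.

The Python toy model below sketches that two-step protocol under a few assumptions:

- data is little-endian;
- new space is zero-filled, unlike with `qemu_realloc()`;
- the checksum is computed with the checksum byte treated as zero, as patch 7 later requires.

The class name is hypothetical.

```python
import struct

HEADER_LEN = 36       # sizeof(struct acpi_table_header), packed
CHECKSUM_OFFSET = 9   # signature[4] + length (4) + revision (1)


class AcpiTableBlob:
    """Toy model of acpi_tables: u16 table count, then (u16 size, table) entries."""

    def __init__(self):
        self.buf = bytearray(2)   # table count, initially zero
        self.pending = None       # offset of the table reserved by alloc()

    def alloc(self, size):
        """Like acpi_alloc_table(): append a size prefix and `size` zero bytes."""
        self.buf += struct.pack("<H", size)
        self.pending = len(self.buf)
        self.buf += bytes(size)
        return self.pending

    def commit(self):
        """Like acpi_commit_table(): set length and checksum, bump the count."""
        start = self.pending
        size = len(self.buf) - start
        struct.pack_into("<I", self.buf, start + 4, size)   # header length
        self.buf[start + CHECKSUM_OFFSET] = 0
        self.buf[start + CHECKSUM_OFFSET] = (-sum(self.buf[start:])) & 0xFF
        (count,) = struct.unpack_from("<H", self.buf, 0)
        struct.pack_into("<H", self.buf, 0, count + 1)
        self.pending = None


blob = AcpiTableBlob()
off = blob.alloc(HEADER_LEN)
blob.buf[off:off + 4] = b"TEST"          # caller fills in the table body
blob.commit()
print(struct.unpack_from("<H", blob.buf, 0)[0])   # 1
print(sum(blob.buf[off:]) % 256)                  # 0
```

Splitting allocation from commit is what lets emulation code fill a table in place before the length and checksum are finalized.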

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 hw/acpi.c |   64 -
 hw/pc.h   |4 ++-
 vl.c  |2 +-
 3 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index 7c4e8d3..8eb53da 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -840,7 +840,7 @@ struct acpi_table_header
 } __attribute__((packed));
 
 char *acpi_tables;
-size_t acpi_tables_len;
+size_t acpi_tables_len, acpi_tables_prev_len;
 
 static int acpi_checksum(const uint8_t *data, int len)
 {
@@ -851,13 +851,44 @@ static int acpi_checksum(const uint8_t *data, int len)
 return (-sum) & 0xff;
 }
 
-int acpi_table_add(const char *t)
+void *acpi_alloc_table(size_t size)
+{
+void *ptr;
+
+if (!acpi_tables) {
+acpi_tables_len = sizeof(uint16_t);
+acpi_tables = qemu_mallocz(acpi_tables_len);
+}
+acpi_tables_prev_len = acpi_tables_len;
+acpi_tables_len += sizeof(uint16_t) + size;
+acpi_tables = qemu_realloc(acpi_tables, acpi_tables_len);
+ptr = acpi_tables + acpi_tables_prev_len;
+
+*(uint16_t *) ptr = size;
+
+return ptr + sizeof(uint16_t);
+}
+
+void acpi_commit_table(void *buf)
+{
+struct acpi_table_header *acpi_hdr = buf;
+size_t size = acpi_tables_len - acpi_tables_prev_len - sizeof(uint16_t);
+
+acpi_hdr->length = cpu_to_le32(size);
+acpi_hdr->checksum = acpi_checksum(buf, size);
+
+/* increase number of tables */
+(*(uint16_t *) acpi_tables) =
+   cpu_to_le32(le32_to_cpu(*(uint16_t *) acpi_tables) + 1);
+}
+
+int acpi_table_cmdline_add(const char *t)
 {
 static const char *dfl_id = "QEMUQEMU";
 char buf[1024], *p, *f;
 struct acpi_table_header acpi_hdr;
 unsigned long val;
-size_t newlen, off;
+size_t size, off;
 
 memset(&acpi_hdr, 0, sizeof(acpi_hdr));
   
@@ -915,7 +946,7 @@ int acpi_table_add(const char *t)
  buf[0] = '\0';
 }
 
-acpi_hdr.length = sizeof(acpi_hdr);
+size = sizeof(acpi_hdr);
 
 f = buf;
 while (buf[0]) {
@@ -927,27 +958,17 @@ int acpi_table_add(const char *t)
 fprintf(stderr, "Can't stat file '%s': %s\n", f, strerror(errno));
 goto out;
 }
-acpi_hdr.length += s.st_size;
+size += s.st_size;
 if (!n)
 break;
 *n = ':';
 f = n + 1;
 }
 
-if (!acpi_tables) {
-acpi_tables_len = sizeof(uint16_t);
-acpi_tables = qemu_mallocz(acpi_tables_len);
-}
-newlen = acpi_tables_len + sizeof(uint16_t) + acpi_hdr.length;
-acpi_tables = qemu_realloc(acpi_tables, newlen);
-p = acpi_tables + acpi_tables_len;
-acpi_tables_len = newlen;
+p = acpi_alloc_table(size);
+off = sizeof(struct acpi_table_header);
 
-acpi_hdr.length = cpu_to_le32(acpi_hdr.length);
-*(uint16_t*)p = acpi_hdr.length;
-p += sizeof(uint16_t);
-memcpy(p, &acpi_hdr, sizeof(acpi_hdr));
-off = sizeof(acpi_hdr);
+memcpy(p, &acpi_hdr, off);
 
 f = buf;
 while (buf[0]) {
@@ -983,10 +1004,8 @@ int acpi_table_add(const char *t)
 f = n + 1;
 }
 
-((struct acpi_table_header*)p)->checksum = acpi_checksum((uint8_t*)p, off);
-/* increase number of tables */
-(*(uint16_t*)acpi_tables) =
-   cpu_to_le32(le32_to_cpu(*(uint16_t*)acpi_tables) + 1);
+acpi_commit_table(p);
+
 return 0;
 out:
 if (acpi_tables) {
@@ -995,3 +1014,4 @@ out:
 }
 return -1;
 }
+
diff --git a/hw/pc.h b/hw/pc.h
index b599564..0cef140 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -108,7 +108,9 @@ extern char *acpi_tables;
 extern size_t acpi_tables_len;
 
 void acpi_bios_init(void);
-int acpi_table_add(const char *table_desc);
+void *acpi_alloc_table(size_t size);
+void acpi_commit_table(void *buf);
+int acpi_table_cmdline_add(const char *table_desc);
 
 /* acpi_piix.c */
 i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
diff --git a/vl.c b/vl.c
index d959fdb..0efba90 100644
--- a/vl.c
+++ b/vl.c
@@ -5492,7 +5492,7 @@ int main(int argc, char **argv, char **envp)
 rtc_td_hack = 1;
 break;
 case QEMU_OPTION_acpitable:
-if(acpi_table_add(optarg) < 0) {
+if(acpi_table_cmdline_add(optarg) < 0) {
 fprintf(stderr, "Wrong acpi table provided\n");
 exit(1);
 }
-- 
1.6.4.4


[RFC PATCH 4/7] sparc: rename hw/iommu.c

2010-03-30 Thread Eduard - Gabriel Munteanu
hw/iommu.c concerns the SPARC IOMMU. However we intend to implement the
AMD IOMMU, which could lead to confusion unless we rename the former.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 Makefile.target               |    2 +-
 hw/{iommu.c => sparc_iommu.c} |    0
 hw/sun4m.h                    |    2 +-
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename hw/{iommu.c => sparc_iommu.c} (100%)

diff --git a/Makefile.target b/Makefile.target
index 4d88543..cbe19a6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -305,7 +305,7 @@ obj-sparc-y += vga.o vga-pci.o
 obj-sparc-y += fdc.o mc146818rtc.o serial.o
 obj-sparc-y += cirrus_vga.o parallel.o
 else
-obj-sparc-y = sun4m.o lance.o tcx.o iommu.o slavio_intctl.o
+obj-sparc-y = sun4m.o lance.o tcx.o sparc_iommu.o slavio_intctl.o
 obj-sparc-y += slavio_timer.o slavio_misc.o fdc.o sparc32_dma.o
 obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o
 endif
diff --git a/hw/iommu.c b/hw/sparc_iommu.c
similarity index 100%
rename from hw/iommu.c
rename to hw/sparc_iommu.c
diff --git a/hw/sun4m.h b/hw/sun4m.h
index ce97ee5..5007924 100644
--- a/hw/sun4m.h
+++ b/hw/sun4m.h
@@ -5,7 +5,7 @@
 
 /* Devices used by sparc32 system.  */
 
-/* iommu.c */
+/* sparc_iommu.c */
 void sparc_iommu_memory_rw(void *opaque, target_phys_addr_t addr,
  uint8_t *buf, int len, int is_write);
 static inline void sparc_iommu_memory_read(void *opaque,
-- 
1.6.4.4



[RFC PATCH 6/7] acpi: cleanup acpi_checksum()

2010-03-30 Thread Eduard - Gabriel Munteanu
This adds newlines in acpi_checksum() to separate the declarations, the
body and the return statement.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 hw/acpi.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index 3794f70..f067f85 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -832,9 +832,11 @@ size_t acpi_tables_len, acpi_tables_prev_len;
 static int acpi_checksum(const uint8_t *data, int len)
 {
 int sum, i;
+
 sum = 0;
 for(i = 0; i < len; i++)
 sum += data[i];
+
 return (-sum) & 0xff;
 }
 
-- 
1.6.4.4



[RFC PATCH 7/7] acpi: fix bug in acpi_checksum() caused by garbage in checksum field

2010-03-30 Thread Eduard - Gabriel Munteanu
The whole table must sum to zero. We need to ignore garbage in the
checksum field (i.e. consider it zero) when checksumming. It is
legitimate to have garbage there, as the checksum makes sense only when
the table has been filled.
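Here is why the fix matters. The checksum byte is chosen so that the table's bytes sum to zero modulo 256, so any value already sitting in that byte has to be left out of the sum. Below is a Python sketch of the corrected computation. The function name is reused for readability; this is not the QEMU source.

```python
import struct

CHECKSUM_OFFSET = 9  # byte offset of 'checksum' in struct acpi_table_header


def acpi_checksum(table):
    """Return the byte that makes `table` sum to 0 modulo 256.

    As in the patched C function, whatever currently sits in the checksum
    field is left out of the sum, so stale garbage there cannot skew it.
    """
    total = sum(table) - table[CHECKSUM_OFFSET]
    return (-total) & 0xFF


table = bytearray(struct.pack("<4sI", b"TEST", 36)) + bytearray(28)
table[CHECKSUM_OFFSET] = 0xAB              # garbage left over from allocation
table[CHECKSUM_OFFSET] = acpi_checksum(table)
print(sum(table) % 256)                    # 0
```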

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 hw/acpi.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index f067f85..bb015f3 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -832,11 +832,16 @@ size_t acpi_tables_len, acpi_tables_prev_len;
 static int acpi_checksum(const uint8_t *data, int len)
 {
 int sum, i;
+struct acpi_table_header *acpi_hdr;
 
 sum = 0;
 for(i = 0; i < len; i++)
 sum += data[i];
 
+/* Ignore preexisting garbage in checksum. */
+acpi_hdr = (struct acpi_table_header *) data;
+sum -= acpi_hdr->checksum;
+
 return (-sum) & 0xff;
 }
 
-- 
1.6.4.4



[RFC PATCH 5/7] x86-64: AMD IOMMU stub

2010-03-30 Thread Eduard - Gabriel Munteanu
This currently loads a non-functional IVRS ACPI table and provides a
skeleton for initializing the AMD IOMMU.
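Two details of `amd_iommu_init_ivrs()` are easy to check by hand: how `iv_info` packs the address-space sizes, and the size of the packed IVHD block. Below is a Python sketch. The field widths (8-bit virtual, 7-bit physical size) are an assumption based on the IVinfo layout in the AMD IOMMU specification, and the helper names are hypothetical.

```python
import struct

# Bit positions follow the patch's shifts; the masks (field widths) are an
# assumption taken from the AMD IOMMU specification's IVinfo layout.
VA_SHIFT, VA_MASK = 15, 0xFF
PA_SHIFT, PA_MASK = 8, 0x7F


def pack_iv_info(va_bits, pa_bits):
    """Build iv_info with the same shifts amd_iommu_init_ivrs() uses."""
    return ((va_bits & VA_MASK) << VA_SHIFT) | ((pa_bits & PA_MASK) << PA_SHIFT)


def unpack_iv_info(iv_info):
    """Recover (virtual, physical) address-space sizes in bits."""
    return (iv_info >> VA_SHIFT) & VA_MASK, (iv_info >> PA_SHIFT) & PA_MASK


# Packed struct ivrs_ivhd from the patch: type, flags, length, devid,
# capab_off, iommu_base_addr, pci_seg_group, iommu_info, reserved, entry.
IVHD_FMT = "<BBHHHQHHII"

print(hex(pack_iv_info(64, 48)))                # 0x203000
print(unpack_iv_info(0x203000))                 # (64, 48)
print(struct.calcsize(IVHD_FMT))                # 28
print(36 + 4 + 8 + struct.calcsize(IVHD_FMT))   # 76, i.e. sizeof(struct ivrs_table)
```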

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 Makefile.target |1 +
 hw/amd_iommu.c  |  103 +++
 hw/pc.c |2 +
 hw/pc.h |3 ++
 4 files changed, 109 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index cbe19a6..dfa4652 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,6 +230,7 @@ obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += extboot.o
 obj-i386-y += ne2000-isa.o debugcon.o multiboot.o
 obj-i386-y += testdev.o
+obj-i386-y += amd_iommu.o
 
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 000..b502430
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,103 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * IVRS (I/O Virtualization Reporting Structure) table.
+ *
+ * Describes the AMD IOMMU, as per:
+ * AMD I/O Virtualization Technology (IOMMU) Specification, rev 1.26
+ */
+
+#include <stdint.h>
+
+#include "pc.h"
+
+struct ivrs_ivhd
+{
+uint8_t    type;
+uint8_t    flags;
+uint16_t   length;
+uint16_t   devid;
+uint16_t   capab_off;
+uint64_t   iommu_base_addr;
+uint16_t   pci_seg_group;
+uint16_t   iommu_info;
+uint32_t   reserved;
+uint32_t   entry;
+} __attribute__ ((__packed__));
+
+struct ivrs_table
+{
+struct acpi_table_header    acpi_hdr;
+uint32_t                    iv_info;
+uint32_t                    reserved[2];
+struct ivrs_ivhd            ivhd;
+} __attribute__ ((__packed__));
+
+static const char ivrs_sig[]    = "IVRS";
+static const char dfl_id[]      = "QEMUQEMU";
+
+static void amd_iommu_init_ivrs(void)
+{
+int ivrs_size = sizeof(struct ivrs_table);
+struct ivrs_table *ivrs;
+struct ivrs_ivhd *ivhd;
+struct acpi_table_header *acpi_hdr;
+
+ivrs = acpi_alloc_table(ivrs_size);
+acpi_hdr = &ivrs->acpi_hdr;
+ivhd = &ivrs->ivhd;
+
+ivrs->iv_info = (64 << 15) | /* Virtual address space size. */
+                (48 << 8);   /* Physical address space size. */
+
+ivhd->type            = 0x10;
+ivhd->flags           = 0;
+ivhd->length          = sizeof(struct ivrs_ivhd);
+ivhd->devid           = 0;
+ivhd->capab_off       = 0;
+ivhd->iommu_base_addr = 0;
+ivhd->pci_seg_group   = 0;
+ivhd->iommu_info      = 0;
+ivhd->reserved        = 0;
+ivhd->entry           = 0;
+
+strncpy(acpi_hdr->signature, ivrs_sig, 4);
+acpi_hdr->revision = 1;
+strncpy(acpi_hdr->oem_id, dfl_id, 6);
+strncpy(acpi_hdr->oem_table_id, dfl_id, 8);
+acpi_hdr->oem_revision = 1;
+strncpy(acpi_hdr->asl_compiler_id, dfl_id, 4);
+acpi_hdr->asl_compiler_revision = 1;
+
+acpi_commit_table(ivrs);
+}
+
+int amd_iommu_init(void)
+{
+amd_iommu_init_ivrs();
+
+return 0;
+}
+
diff --git a/hw/pc.c b/hw/pc.c
index 0aebae9..89a7a30 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -906,6 +906,8 @@ static void pc_init1(ram_addr_t ram_size,
 cpu_register_physical_memory((uint32_t)(-bios_size),
  bios_size, bios_offset | IO_MEM_ROM);
 
+amd_iommu_init();
+
 fw_cfg = bochs_bios_init();
 rom_set_fw(fw_cfg);
 
diff --git a/hw/pc.h b/hw/pc.h
index 92954db..b91300e 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -194,4 +194,7 @@ int cpu_is_bsp(CPUState *env);
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
 
+/* amd_iommu.c */
+int amd_iommu_init(void);
+
 #endif
-- 
1.6.4.4


[RFC PATCH 3/7] acpi: move table header definition into pc.h

2010-03-30 Thread Eduard - Gabriel Munteanu
This moves the table header definition into pc.h to allow other code to
build ACPI tables.
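Because the structure is packed, its size (36 bytes) and field offsets are fixed. That is what lets other files, such as the IVRS code in patch 5, fill it in directly. A Python `struct` equivalent can be handy for checking offsets. The format string and the `pack_header` helper are illustrative, and the defaults merely mirror the `strncpy()` calls in patch 5.

```python
import struct

# Packed, little-endian layout of struct acpi_table_header:
# signature[4], length, revision, checksum, oem_id[6], oem_table_id[8],
# oem_revision, asl_compiler_id[4], asl_compiler_revision.
ACPI_HEADER_FMT = "<4sIBB6s8sI4sI"


def pack_header(signature, length, revision=1, oem_id=b"QEMUQE",
                oem_table_id=b"QEMUQEMU", oem_revision=1,
                asl_compiler_id=b"QEMU", asl_compiler_revision=1):
    """Pack a header with a zero checksum, to be filled in afterwards."""
    return struct.pack(ACPI_HEADER_FMT, signature, length, revision, 0,
                       oem_id, oem_table_id, oem_revision,
                       asl_compiler_id, asl_compiler_revision)


header = pack_header(b"IVRS", 76)
print(struct.calcsize(ACPI_HEADER_FMT))    # 36
print(header[:4], header[10:16])           # b'IVRS' b'QEMUQE'
```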

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
 hw/acpi.c |   13 -
 hw/pc.h   |   13 +
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index 8eb53da..3794f70 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -826,19 +826,6 @@ static int piix4_device_hotplug(PCIDevice *dev, int state)
 return 0;
 }
 
-struct acpi_table_header
-{
-char signature [4];/* ACPI signature (4 ASCII characters) */
-uint32_t length;  /* Length of table, in bytes, including header */
-uint8_t revision; /* ACPI Specification minor version # */
-uint8_t checksum; /* To make sum of entire table == 0 */
-char oem_id [6];   /* OEM identification */
-char oem_table_id [8]; /* OEM table identification */
-uint32_t oem_revision;/* OEM revision number */
-char asl_compiler_id [4]; /* ASL compiler vendor ID */
-uint32_t asl_compiler_revision; /* ASL compiler revision number */
-} __attribute__((packed));
-
 char *acpi_tables;
 size_t acpi_tables_len, acpi_tables_prev_len;
 
diff --git a/hw/pc.h b/hw/pc.h
index 0cef140..92954db 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -103,6 +103,19 @@ int ioport_get_a20(void);
 CPUState *pc_new_cpu(const char *cpu_model);
 
 /* acpi.c */
+struct acpi_table_header
+{
+char signature [4];/* ACPI signature (4 ASCII characters) */
+uint32_t length;  /* Length of table, in bytes, including header */
+uint8_t revision; /* ACPI Specification minor version # */
+uint8_t checksum; /* To make sum of entire table == 0 */
+char oem_id [6];   /* OEM identification */
+char oem_table_id [8]; /* OEM table identification */
+uint32_t oem_revision;/* OEM revision number */
+char asl_compiler_id [4]; /* ASL compiler vendor ID */
+uint32_t asl_compiler_revision; /* ASL compiler revision number */
+} __attribute__((packed));
+
 extern int acpi_enabled;
 extern char *acpi_tables;
 extern size_t acpi_tables_len;
-- 
1.6.4.4



Re: [questions] savevm|loadvm

2010-03-30 Thread Juan Quintela
Wenhao Xu xuwenhao2...@gmail.com wrote:
 Hi, all,
 I am working on switching QEMU dynamically from running in KVM mode
 to QEMU emulation mode.
 Intuitively, if a snapshot created with savevm in KVM mode could be
 used by the loadvm command in QEMU emulation mode, the switch could
 make use of this. I tried to do so; however, it does not work.
 Any idea how to fix it?
 Thanks for the help.

kvm uses a different memory layout (slots in qemu/kvm lingo), that means
that memory can't be migrated (that is a big problem).  Once that is
fixed, you need to work on the several in-kernel chips that don't
exist in qemu (kvm-irq-chip and the like).  Once that is fixed, you can
look for what more things are broken.

By the way, why do you want to make that switch?

Later, Juan.

 regards,
 Wenhao

 --
 ~_~


Re: Strange CPU usage pattern in SMP guest

2010-03-30 Thread Sebastian Hetze
On Tue, Mar 23, 2010 at 06:18:08PM -0300, Marcelo Tosatti wrote:
 On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote:
  On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
   On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
   On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
  
   On 03/21/2010 02:02 PM, Sebastian Hetze wrote:

   12:46:02     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
   12:46:03     all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
   12:46:03       0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
   12:46:03       1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
   12:46:03       2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
   12:46:03       3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
   12:46:03       4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
   12:46:03       5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00
  
   So it is only CPU4 that is showing this strange behaviour.
  
  
   Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
   /proc/interrupts' from that guest.
  
   Most likely the timer interrupt for cpu4 died.

   I've added two keys +/- to your irqtop to focus up and down
   in the row of available CPUs.
   The irqtop for CPU4 shows a constant number of 6 local timer interrupts
   per update, while the other CPUs show various higher values:
  
   irqtop for cpu 4
  
 eth0  188
 Rescheduling interrupts   162
 Local timer interrupts  6
 ata_piix3
 TLB shootdowns  1
 Spurious interrupts 0
 Machine check exceptions0
  
  
   irqtop for cpu 5
  
 eth0  257
 Local timer interrupts251
 Rescheduling interrupts   237
 Spurious interrupts 0
 Machine check exceptions0
  
   So the timer interrupt for cpu4 is not completely dead but somehow
   broken.
  
   That is incredibly weird.
  
   What can cause this problem? Any way to speed it up again?
  
  
   The host has 8 cpus and is only running this 6 vcpu guest, yes?
  
   Can you confirm the other vcpus are ticking at 250 Hz?
  
   What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
   last-used-cpu field in the display.
  
   Marcelo, any ideas?
  
  Just to let you know, right after startup, all vcpus work fine.
  
  The following message might be related to the problem:
  hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns
  
  The guest is a 32-bit system running on a 64-bit host.
 
 Sebastian,
 
 Please apply the attached patch to your guest kernel.
 

With this patch applied, the system has been running for 5 days without
hrtimer messages, and the timer interrupts look fine. However, I did get the
"Clocksource tsc unstable (delta = -4398046474878 ns)" message that I
reported on Sunday.

Actually, when restarting the system with the hrtimer patch applied,
we also changed the BIOS setting to disable Intel SmartStep on the host.
Since there are no hrtimer messages at all, it might be that the SmartStep
CPU frequency adjustment is the real cause for the slow interrupts in
the KVM guest. Anyone else experienced these problems?
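Avi's two checks (whether every vcpu ticks at 250 Hz, and what /proc/interrupts shows for cpu4) can both be answered by diffing the `LOC:` line of two /proc/interrupts snapshots taken a known interval apart. Below is a minimal Python sketch. The sample snapshots are made up for illustration and are not data from this guest.

```python
def local_timer_counts(proc_interrupts):
    """Per-CPU 'LOC:' (local timer interrupt) counts from /proc/interrupts text."""
    lines = proc_interrupts.strip().splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            return [int(n) for n in fields[1:1 + ncpus]]
    raise ValueError("no LOC: line found")


def timer_rates(before, after, interval_s):
    """Local timer interrupts per second, per CPU, between two snapshots."""
    first = local_timer_counts(before)
    second = local_timer_counts(after)
    return [(b - a) / interval_s for a, b in zip(first, second)]


# Made-up snapshots taken one second apart (not data from this guest).
SNAP1 = """
           CPU0       CPU1
LOC:       1000       1000   Local timer interrupts
"""
SNAP2 = """
           CPU0       CPU1
LOC:       1250       1006   Local timer interrupts
"""
print(timer_rates(SNAP1, SNAP2, 1.0))    # [250.0, 6.0]
```

On the guest itself, the two snapshots would come from reading /proc/interrupts twice with a `time.sleep()` of known length in between.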


[PATCH 1/2] KVM test: Make the profiler could be configurated

2010-03-30 Thread Jason Wang
This patch lets the profilers be specified through the configuration
file. kvm_stat is kept as the default profiler.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_utils.py  |   23 ++-
 client/tests/kvm/tests_base.cfg.sample |2 +-
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 8531c79..a73d5d4 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -866,24 +866,21 @@ def run_tests(test_list, job):
 if dependencies_satisfied:
 test_iterations = int(dict.get("iterations", 1))
 test_tag = dict.get("shortname")
-# Setting up kvm_stat profiling during test execution.
-# We don't need kvm_stat profiling on the build tests.
-if dict.get("run_kvm_stat") == "yes":
-profile = True
-else:
-# None because it's the default value on the base_test class
-# and the value None is specifically checked there.
-profile = None
+# Setting up profilers during test execution.
+profilers = dict.get("profilers")
+if profilers is not None:
+for profiler in profilers.split():
+job.profilers.add(profiler)
 
-if profile:
-job.profilers.add('kvm_stat')
 # We need only one execution, profiled, hence we're passing
 # the profile_only parameter to job.run_test().
 current_status = job.run_test("kvm", params=dict, tag=test_tag,
   iterations=test_iterations,
-  profile_only=profile)
-if profile:
-job.profilers.delete('kvm_stat')
+  profile_only= profilers is not None)
+
+if profilers is not None:
+for profiler in profilers.split():
+job.profilers.delete(profiler)
 
 if not current_status:
 failed = True
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index d162cf8..cc10713 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -41,7 +41,7 @@ nic_script = scripts/qemu-ifup
 address_index = 0
 
 # Misc
-run_kvm_stat = yes
+profilers = kvm_stat 
 
 
 # Tests
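
The new `profilers` parameter is a space-separated list, so several profilers can be enabled at once. Below is a small Python sketch of how the value is interpreted. `profilers_from_params` is a hypothetical helper, not part of kvm_utils.py.

```python
def profilers_from_params(params):
    """Return the profiler names from the space-separated 'profilers' value.

    As in the patch, a missing 'profilers' key means no profiling at all.
    """
    value = params.get("profilers")
    return value.split() if value is not None else []


print(profilers_from_params({"profilers": "kvm_stat "}))   # ['kvm_stat']
print(profilers_from_params({}))                           # []
```

One corner case is worth a look. Assuming an empty `profilers =` line reaches the code as an empty string, `split()` yields no profilers. Yet `profilers is not None` is still true, so the patch would pass `profile_only=True` without any profiler registered.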



[PATCH 2/2] KVM test: Add the hwclock test into guest test

2010-03-30 Thread Jason Wang
hwclock is useful for basic testing of the emulated RTC.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/autotest_control/hwclock.control |   12 
 client/tests/kvm/tests_base.cfg.sample|3 +++
 2 files changed, 15 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/autotest_control/hwclock.control

diff --git a/client/tests/kvm/autotest_control/hwclock.control b/client/tests/kvm/autotest_control/hwclock.control
new file mode 100644
index 000..bf3e9d3
--- /dev/null
+++ b/client/tests/kvm/autotest_control/hwclock.control
@@ -0,0 +1,12 @@
+AUTHOR = "Martin J. Bligh <mbl...@mbligh.org>"
+NAME = "Hwclock"
+TIME = "SHORT"
+TEST_CATEGORY = "Functional"
+TEST_CLASS = "General"
+TEST_TYPE = "client"
+
+DOC = """
+This test checks that we can set and read the hwclock successfully
+"""
+
+job.run_test('hwclock')
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index cc10713..00b80c3 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -145,6 +145,9 @@ variants:
 - scrashme:
 test_name = scrashme
 test_control_file = scrashme.control
+- hwclock:
+test_name = hwclock
+test_control_file = hwclock.control
 
 - linux_s3: install setup unattended_install
 type = linux_s3



Re: [RFC] vhost-blk implementation

2010-03-30 Thread Avi Kivity

On 03/30/2010 01:51 AM, Badari Pulavarty wrote:


   

Your io wait time is twice as long and your throughput is about half.
I think the qemu block submission makes an extra attempt at merging
requests.  Does blktrace tell you anything interesting?

   

Yes. I see that in my testcase (2M writes), QEMU is picking up 512K
requests from the virtio ring and merging them back into 2M before
submitting them.

Unfortunately, I can't do that easily in vhost-blk. QEMU
re-creates iovecs for the merged IO. I have to come up with
a scheme to do this :(
   


I don't think that either vhost-blk or virtio-blk should do this.  
Merging increases latency, and in the case of streaming writes makes it 
impossible for the guest to prepare new requests while earlier ones are 
being serviced (in effect it reduces the queue depth to 1).


qcow2 does benefit from merging, but it should do so itself without 
impacting raw.



It does.  I suggest using fio O_DIRECT random access patterns to avoid
such issues.
 

Well, I am not trying to come up with a test case where vhost-blk
performs better than virtio-blk. I am trying to understand where
and why vhost-blk performs worse than virtio-blk.
   


In this case qemu-virtio is making an incorrect tradeoff.  The guest 
could easily merge those requests itself.  If you want larger writes, 
tune the guest to issue them.


Another way to look at it:  merging improved bandwidth but increased 
latency, yet you are only measuring bandwidth.  If you measured only 
latency you'd find that vhost-blk is better.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] KVM MMU: thinking of shadow page cache

2010-03-30 Thread Avi Kivity

On 03/30/2010 04:59 AM, Xiao Guangrong wrote:

When we cache shadow page tables, one guest page table may have
many shadow pages; take the case below as an example:

  (RO+U) ---  |--| __ |--|
  (W+U ) ---  |  GP1 |   ||  GP2 |
  (W+P ) ---  |--|   |--  |--|

GP1 is mapped with 3 different kinds of permissions, so we have to
allocate 3 shadow pages for GP1 and 3 shadow pages for GP2.
x86 has 3 classes of permission bits (R/W, U/S, X/NX), so in the
worst case we would have to allocate 2^3 shadow pages at every
paging level.

This waste arises because we set the permission bits only in the PTE,
not in the intermediate paging levels.

So I think we can map the guest page table's permissions into the cached
shadow page table; then it can be shared among many shadow page tables
if they map the same guest physical address. For the case above, we
only need 2 pages.

Any comments?
   


We've considered this in the past, it makes sense.  The big question is 
whether any guests actually map the same page table through PDEs with 
different permissions (mapping the same page table through multiple PDEs 
is very common, but always with the same permissions).  Do you know of 
any such guest?
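A toy model of the shadow page count in Xiao's example. If a shadow page is identified by (guest page table frame, inherited access bits), each distinct permission combination needs its own copy; identified by the frame alone, the copies collapse. The types, the `ACC_*` bit values, and `count_shadow_pages()` are hypothetical and are not meant to reflect KVM's actual shadow page role layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits inherited from upper-level entries. */
#define ACC_WRITE 1u
#define ACC_USER  2u
#define ACC_EXEC  4u

/* One upper-level entry pointing at a guest page table. */
struct gpt_mapping {
    uint64_t gfn;     /* guest frame holding the guest page table */
    unsigned access;  /* combination of ACC_* bits */
};

/*
 * Count the shadow pages needed for n mappings. With key_on_access set,
 * a shadow page is identified by (gfn, access); otherwise by gfn alone.
 * Quadratic, which is fine for a toy.
 */
size_t count_shadow_pages(const struct gpt_mapping *m, size_t n,
                          int key_on_access)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = m[j].gfn == m[i].gfn &&
                   (!key_on_access || m[j].access == m[i].access);
        if (!seen)
            count++;
    }
    return count;
}
```

For the GP1/GP2 case this gives 6 shadow pages when keyed on access and 2 when not; with all 2^3 combinations of the three bits it gives 8 versus 1 per table. Whether the saving matters in practice depends on Avi's question of whether real guests ever map one page table with differing permissions.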


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH v3 1/1] Shared memory uio_pci driver

2010-03-30 Thread Cam Macdonell
On Mon, Mar 29, 2010 at 2:59 PM, Avi Kivity a...@redhat.com wrote:
 On 03/28/2010 10:48 PM, Cam Macdonell wrote:

 On Sat, Mar 27, 2010 at 11:48 AM, Avi Kivity a...@redhat.com wrote:


 On 03/26/2010 07:14 PM, Cam Macdonell wrote:




 I'm not familiar with the uio internals, but for the interface, an
 ioctl() on the fd to assign an eventfd to an MSI vector.  Similar to
 ioeventfd, but instead of mapping a doorbell to an eventfd, it maps a
 real MSI to an eventfd.



 uio will never support ioctls.


 Why not?


 Perhaps I spoke too strongly, but it was rejected before

 http://thread.gmane.org/gmane.linux.kernel/756481

 With a compelling case perhaps it could be added.


 Ah, the usual ioctls are ugly, go away.

 It could be done via sysfs:

  $ cat /sys/.../msix/max-interrupts
  256
  $ echo 4 > /sys/.../msix/allocate
  $ # subdirectories 0 1 2 3 magically appear
  $ # bind fd 13 to msix
  $ echo 13 > /sys/.../msix/2/bind-fd
  $ # from now on, msix interrupt 2 will call eventfd_signal() on fd 13

 Call me old fashioned, but I prefer ioctls.

Good point.  IIUC, the goal regarding ioctls in UIO was to avoid having
device drivers create their own device-specific ABIs and drivers
that are just massive switch statements.  Having ioctls that support
functions for UIO in general, such as pairing msi vectors to eventfds,
does not go against that goal.
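For readers unfamiliar with the eventfd half of this proposal: an eventfd is a 64-bit counter that the kernel's `eventfd_signal()` increments and that userspace drains with `read()`. The sketch below uses the real glibc `eventfd()` API but simulates the interrupt by writing to the fd from userspace. `simulate_interrupt()` and `wait_for_interrupts()` are hypothetical helpers, and no MSI-X binding (ioctl or sysfs) is shown.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Stand-in for the kernel calling eventfd_signal(): add 1 to the counter. */
int simulate_interrupt(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: read() returns the accumulated count and resets it to 0.
 * On a non-blocking eventfd with a zero counter, read() fails (EAGAIN)
 * and this returns -1.
 */
int64_t wait_for_interrupts(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

Because a non-semaphore eventfd accumulates, two interrupts that arrive before the consumer reads are reported as a single read returning 2. A consumer that needs per-interrupt granularity has to account for that.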


 --
 Do not meddle in the internals of kernels, for they are subtle and quick to
 panic.




KVM call minutes for Mar 30

2010-03-30 Thread Chris Wright
vhost-blk
- started w/ vhost-net, nice and modular
- qemu merges requests, so it outperforms vhost-blk (throughput) on sequential writes
- for random read/write and sequential reads, vhost-blk is comparable or better
- can't do e.g. qcow2
- spreading work across all cpu workqueues
- are there cases where we expect vhost-blk is needed?
  - not driven by poor perf benchmarks, more driven from interest/curiosity
  - hch's numbers (esp. large request) are solid
  - low-hanging fruit: no interrupt mitigation, address lookup on every request
  - get some performance numbers and profiles to see what we can do to
    improve existing virtio before moving to vhost

qemu optimizations (re: above addr lookup)
- no long lived ptr to guest memory
- ram memory patch allowed for stable mapping
  - objection was memory hotplug/remove
- how much do we want to gate qemu progress on memory hotplug?
- http://wiki.qemu.org/Features/RamAPI
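The per-request address lookup mentioned above amounts to translating a guest-physical range into a host pointer. A minimal sketch, assuming a simple table of RAM slots; `struct ram_slot` and `gpa_to_hva()` are hypothetical names, not qemu's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAM slot: a guest-physical range backed by host memory. */
struct ram_slot {
    uint64_t gpa_start;
    uint64_t size;
    uint8_t *hva;        /* host virtual address of gpa_start */
};

/*
 * Translate [gpa, gpa + len) to a host pointer, or NULL if the range is
 * not entirely inside one slot. Linear search, repeated for every request.
 */
void *gpa_to_hva(const struct ram_slot *slots, size_t nslots,
                 uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nslots; i++) {
        const struct ram_slot *s = &slots[i];

        /* Written to avoid overflow in gpa + len. */
        if (gpa >= s->gpa_start && len <= s->size &&
            gpa - s->gpa_start <= s->size - len)
            return s->hva + (gpa - s->gpa_start);
    }
    return NULL;
}
```

A stable mapping would let callers translate once and keep the pointer. The catch raised on the call is that memory hot-unplug would invalidate such long-lived pointers.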

ide emulation
- qemu virtio-blk vs. other hypervisors' ide emulation: ide was better
  - this was some time ago
  - with older qemu, even qemu's own ide emulation was better than
    virtio-blk in some cases
- ide still bounce buffering, and has some fundamental limitations
  (some of those may be fixable w/ ahci)
- anyone looked at this recently?


Re: [Qemu-devel] [RFC PATCH 7/7] acpi: fix bug in acpi_checksum() caused by garbage in checksum field

2010-03-30 Thread Richard Henderson
On 03/30/2010 01:20 AM, Eduard - Gabriel Munteanu wrote:
 +/* Ignore preexisting garbage in checksum. */
 +acpi_hdr = (struct acpi_table_header *) data;
 +sum -= acpi_hdr->checksum;
 +
  return (-sum) & 0xff;

Wouldn't it be cleaner to adjust the acpi_checksum definition to take
an acpi_table_header operand instead of uint8_t?  And given its only
usage, perhaps update the checksum instead of returning it?  E.g.

-((struct acpi_table_header*)p)->checksum = acpi_checksum((uint8_t*)p, off);
+acpi_update_checksum((struct acpi_table_header *)p, off);
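Here is a minimal sketch of what the suggested `acpi_update_checksum()` could look like. It assumes the standard 36-byte ACPI table header (checksum at byte offset 9) and the rule that all bytes of the table must sum to zero modulo 256. Clearing the field before summing has the same effect as the patch's `sum -= acpi_hdr->checksum`. The body is an illustration, not the code that was merged.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 36-byte ACPI system description table header. */
struct acpi_table_header {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;                 /* byte offset 9 */
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     asl_compiler_id[4];
    uint32_t asl_compiler_revision;
};

_Static_assert(sizeof(struct acpi_table_header) == 36,
               "ACPI table header must be 36 bytes");

/*
 * Set hdr->checksum so that the first len bytes of the table (header
 * included, len >= 10) sum to 0 mod 256. Zeroing the field first means
 * stale garbage in it cannot skew the result.
 */
void acpi_update_checksum(struct acpi_table_header *hdr, size_t len)
{
    const uint8_t *p = (const uint8_t *)hdr;
    uint8_t sum = 0;

    hdr->checksum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    hdr->checksum = (uint8_t)-sum;
}
```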


r~


device limit for kvm_io_bus

2010-03-30 Thread Cam Macdonell
Hi,

I'm trying to use ioeventfds for notification between guests.  After
assigning a handful of ioeventfds I was getting a "no space left on
device" error.  The culprit seems to be that only 6 devices are
allowed for a guest on the kvm IO bus.  The comment indicates a
somewhat low number was chosen.  Can the limit be increased to
something on the order of the number of file descriptors a process
could have?

/*
 * It would be nice to use something smarter than a linear search, TBD...
 * Thankfully we dont expect many devices to register (famous last words :),
 * so until then it will suffice.  At least its abstracted so we can change
 * in one place.
 */
struct kvm_io_bus {
int   dev_count;
#define NR_IOBUS_DEVS 6
struct kvm_io_device *devs[NR_IOBUS_DEVS];
};

from kvm_main.c
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
struct kvm_io_device *dev)
{
struct kvm_io_bus *new_bus, *bus;

bus = kvm->buses[bus_idx];
if (bus->dev_count > NR_IOBUS_DEVS-1)
return -ENOSPC;

...snip...

}
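The struct comment above invites something smarter than a linear search. One possible direction, sketched here as a standalone toy rather than kvm code, is to grow the device array on demand instead of capping it at a fixed `NR_IOBUS_DEVS`, and to keep it sorted by address so lookups can use `bsearch()`. `struct io_bus`, `io_bus_register()` and `io_bus_find()` are hypothetical. Real ioeventfds also match on access length and optional data, which this ignores.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device claiming the address range [addr, addr + len). */
struct io_dev {
    uint64_t addr;
    uint32_t len;
    int id;
};

/* Growable bus; devs is kept sorted by addr. */
struct io_bus {
    size_t count, capacity;
    struct io_dev *devs;
};

/* Insert dev in address order, doubling the array when it is full. */
int io_bus_register(struct io_bus *bus, struct io_dev dev)
{
    if (bus->count == bus->capacity) {
        size_t cap = bus->capacity ? bus->capacity * 2 : 4;
        struct io_dev *d = realloc(bus->devs, cap * sizeof(*d));

        if (!d)
            return -ENOMEM;
        bus->devs = d;
        bus->capacity = cap;
    }

    size_t i = bus->count;
    while (i > 0 && bus->devs[i - 1].addr > dev.addr)
        i--;
    memmove(&bus->devs[i + 1], &bus->devs[i],
            (bus->count - i) * sizeof(dev));
    bus->devs[i] = dev;
    bus->count++;
    return 0;
}

/* bsearch() comparator: is the key address below, inside, or above elem? */
static int cmp_addr(const void *key, const void *elem)
{
    uint64_t a = *(const uint64_t *)key;
    const struct io_dev *d = elem;

    if (a < d->addr)
        return -1;
    if (a >= d->addr + d->len)
        return 1;
    return 0;
}

/* O(log n) lookup; assumes registered ranges do not overlap. */
const struct io_dev *io_bus_find(const struct io_bus *bus, uint64_t addr)
{
    if (bus->count == 0)
        return NULL;
    return bsearch(&addr, bus->devs, bus->count, sizeof(*bus->devs), cmp_addr);
}
```

Registration stays O(n) because of the shift, but registration is rare compared with the per-access lookup.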

Thanks,
Cam


Re: Clocksource tsc unstable (delta = -4398046474878 ns)

2010-03-30 Thread Athanasius
On Tue, Mar 30, 2010 at 10:08:28AM +0200, Sebastian Hetze wrote:
 On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
I didn't see any such messages but I've had a recent experience with
  the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
  two separate incidents.  Eerily the exact jumps, as best I can tell from
  logs are of 17592 and 8796 seconds, give or take a second or two.  If
  you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
  nanoseconds.
What I've done that seems to have avoided this happening again is drop
  KVM_CLOCK kernel option from the kvm guests' kernel.
 
 To my understanding, kvm-clock is the best and most reliable clocksource
 available, so I do not think it is a good idea to disable it.
 
 There is a lot of bit shift operation happening with the clocksources,
 so there may be a real bug hidden somewhere in the code.
 Somehow ntp adjustment is involved, can this cause such huge steps?
 In my case, I actually have NTP running in the guest. However, the
 statistics show a pretty stable timing here.

  This is one thing to note: I *was* running ntpd in the affected
guest (and rather obviously, I still am).  If there's some bad
interaction between KVM_CLOCK and ntpd it needs documenting in the
first instance and preferably also fixing.
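You can check directly the arithmetic behind the suspicion that these jumps are bit-level artifacts. A small sketch, assuming the 0.0625 s watchdog threshold Sebastian describes (`NSEC_PER_SEC >> 4`). `exceeds_watchdog()` and `nearest_pow2()` are hypothetical helpers, and `__builtin_clzll` is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Watchdog threshold from the thread: 0.0625 s, i.e. NSEC_PER_SEC >> 4. */
#define NSEC_PER_SEC 1000000000LL
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

/* Would a skew of delta_ns between two watchdog runs be flagged? */
int exceeds_watchdog(int64_t delta_ns)
{
    int64_t mag = delta_ns < 0 ? -delta_ns : delta_ns;
    return mag > WATCHDOG_THRESHOLD;
}

/*
 * Find the power of two nearest to |x|, store its exponent in *exponent
 * and return the absolute distance. Assumes 0 < |x| < 2^62.
 */
int64_t nearest_pow2(int64_t x, int *exponent)
{
    uint64_t mag = (uint64_t)(x < 0 ? -x : x);
    int e = 63 - __builtin_clzll(mag);      /* floor(log2(mag)) */
    uint64_t lo = 1ULL << e;                /* lo <= mag < 2 * lo */
    uint64_t hi = lo << 1;

    if (mag - lo <= hi - mag) {
        *exponent = e;
        return (int64_t)(mag - lo);
    }
    *exponent = e + 1;
    return (int64_t)(hi - mag);
}
```

For the deltas in this thread, -4398046474878 ns is 36226 ns away from 2^42, and 17592 s and 8796 s land within a fraction of a second of 2^44 and 2^43 ns. The near-exact powers of two fit the bit-shift suspicion raised above but do not by themselves locate the bug.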

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
  Finger athan(at)fysh.org for PGP key
   And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence. Paula Cole - ME


Re: [questions] savevm|loadvm

2010-03-30 Thread Wenhao Xu
Hi, Juan,
   I am new to both QEMU and KVM. But so far, I notice that QEMU
uses KVM_SET_USER_MEMORY_REGION to set the memory region that KVM can
use, and uses cpu_register_physical_memory_offset to register the same
memory with the QEMU emulator, which means QEMU and KVM use the same host
virtual memory. Therefore, memory modified by KVM should be directly
visible to QEMU. I don't quite understand the memory layout difference
between the two, so I don't know exactly what you mean needs to be
fixed.

   As for why switching is useful: I am a master's student doing a
course project. What I am arguing is that QEMU could be useful for
many kinds of instrumentation and analysis, but it is rather slow. So,
by combining it with KVM, when the OS reaches a point we are interested
in, we switch to QEMU emulator mode, do the analysis, and then switch
back.
   FYI, there is a paper doing this in Xen, "Practical taint-based
protection using demand emulation". I want to do the same demand
emulation for KVM.

   I am trying to patch some code.  When kvm_run returns, I tried to
synchronize the CPU state and the memory dirty map, and then run in QEMU
emulator mode. However, I got an error, "qemu: fatal: invalid tss
type". I don't know exactly where the problem is.

   Thanks for helping me work this out. I am really stuck on this problem.

regards,
Wenhao

On Tue, Mar 30, 2010 at 1:22 AM, Juan Quintela quint...@redhat.com wrote:
 Wenhao Xu xuwenhao2...@gmail.com wrote:
 Hi, all,
    I am working on switching QEMU from running in KVM mode to QEMU
 emulation mode dynamically.
    Intuitively, if the snapshot created using savevm in kvm mode can be
 used by the loadvm command in QEMU emulator mode, the switch could
 make use of this.  I tried to do so. However, it does not work.  Any idea
 how to fix it?
    Thanks for the help.

 kvm uses a different memory layout (slots in qemu/kvm lingo), which means
 that memory can't be migrated (that is a big problem).  Once that is
 fixed, you need to work on the several in-kernel chips that don't
 exist in qemu (kvm-irq-chip and the like).  Once that is fixed, you can
 look for what more things are broken.

 While we're at it, why do you want to do that switch?

 Later, Juan.

 regards,
 Wenhao

 --
 ~_~




-- 
~_~


Re: [Qemu-devel] [RFC PATCH 4/7] sparc: rename hw/iommu.c

2010-03-30 Thread Blue Swirl
On 3/30/10, Eduard - Gabriel Munteanu eduard.munte...@linux360.ro wrote:
 hw/iommu.c concerns the SPARC IOMMU. However we intend to implement the
  AMD IOMMU, which could lead to confusion unless we rename the former.

I was also thinking of renaming the file some time ago. The correct
name would be sun4m_iommu.c. Sun4c (while still Sparc based) had a
different architecture (IIRC CPU MMU doubled as IOMMU) and Sun4d had
several IO-UNITs instead. All Sun4m machines had an IOMMU.

But the qdev name of the device is still iommu and we can't change
that. So I'm not so sure it's worth renaming. Can't AMD IOMMU reside
in amd_iommu.c?


Re: [PATCH 1/2] KVM test: Make the profiler could be configurated

2010-03-30 Thread Michael Goldish

- Jason Wang jasow...@redhat.com wrote:

 This patch lets the profilers be specified through the configuration
 file. kvm_stat is kept as the default profiler.

Looks good.  Some minor style comments:

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  client/tests/kvm/kvm_utils.py  |   23
 ++-
  client/tests/kvm/tests_base.cfg.sample |2 +-
  2 files changed, 11 insertions(+), 14 deletions(-)
 
 diff --git a/client/tests/kvm/kvm_utils.py
 b/client/tests/kvm/kvm_utils.py
 index 8531c79..a73d5d4 100644
 --- a/client/tests/kvm/kvm_utils.py
 +++ b/client/tests/kvm/kvm_utils.py
 @@ -866,24 +866,21 @@ def run_tests(test_list, job):
  if dependencies_satisfied:
  test_iterations = int(dict.get("iterations", 1))
  test_tag = dict.get("shortname")
 -# Setting up kvm_stat profiling during test execution.
 -# We don't need kvm_stat profiling on the build tests.
 -if dict.get("run_kvm_stat") == "yes":
 -profile = True
 -else:
 -# None because it's the default value on the base_test class
 -# and the value None is specifically checked there.
 -profile = None
 +# Setting up profilers during test execution.
 +profilers = dict.get("profilers")
 +if profilers is not None:

I think it's nicer and shorter to say "if profilers" instead of
"if profilers is not None".
Better yet, use 'profilers = dict.get("profilers", "")' so that if
profilers isn't defined, or if the user said 'profilers = ""', you can
still call profilers.split(), i.e.:

profilers = dict.get("profilers", "")
for profiler in profilers.split():
    job.profilers.add(profiler)

and then you don't need the 'if'.
This is also relevant to the job.profilers.delete() code below.

 +for profiler in profilers.split():
 +job.profilers.add(profiler)
  
 -if profile:
 -job.profilers.add('kvm_stat')
  # We need only one execution, profiled, hence we're passing
  # the profile_only parameter to job.run_test().
  current_status = job.run_test("kvm", params=dict, tag=test_tag,
iterations=test_iterations,
 -  profile_only=profile)
 -if profile:
 -job.profilers.delete('kvm_stat')
 +  profile_only= profilers is not None)

AFAIK, profile_only needs to be either True or None (Lucas, please correct
me if I'm wrong).
In that case, it would be appropriate to use

profile_only=bool(profilers) or None

so that if profilers is e.g. "kvm_stat", profile_only will be True,
and if profilers is "", profile_only will be None.

 +
 +if profilers is not None:
 +for profiler in profilers.split():
 +job.profilers.delete(profiler)
  
  if not current_status:
  failed = True
 diff --git a/client/tests/kvm/tests_base.cfg.sample
 b/client/tests/kvm/tests_base.cfg.sample
 index d162cf8..cc10713 100644
 --- a/client/tests/kvm/tests_base.cfg.sample
 +++ b/client/tests/kvm/tests_base.cfg.sample
 @@ -41,7 +41,7 @@ nic_script = scripts/qemu-ifup
  address_index = 0
  
  # Misc
 -run_kvm_stat = yes
 +profilers = "kvm_stat "

We don't need the quotes here now.  We'll need them later if we add
more profilers.  So it's OK to use

profilers = kvm_stat

and then later if we need another profiler:

profilers += " some_other_profiler"

  
  
  # Tests
 


CfP with Extended Deadline 5th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'10)

2010-03-30 Thread Michael Alexander
Apologies if you received multiple copies of this message.


=

CALL FOR PAPERS

5th Workshop on

Virtualization in High-Performance Cloud Computing

VHPC'10

as part of Euro-Par 2010, Island of Ischia-Naples, Italy

=

Date: August 31, 2010

Euro-Par 2010: http://www.europar2010.org/

Workshop URL: http://vhpc.org

SUBMISSION DEADLINE:

Abstracts: April 4, 2010 (extended)
Full Paper: June 19, 2010 (extended) 


Scope:

Virtualization has become a common abstraction layer in modern data
centers, enabling resource owners to manage complex infrastructure
independently of their applications. At the same time, virtualization is
becoming a driving technology for a wide range of industry-grade IT
services. Pioneered by the Amazon Elastic Compute Cloud, the
cloud concept includes the notion of a separation between resource
owners and users, adding services such as hosted application
frameworks and queuing. Utilizing the same infrastructure, clouds
carry significant potential for use in high-performance scientific
computing. The ability of clouds to dynamically provide and release
vast computing resources, close to the marginal cost of providing
the services, is unprecedented in the history of scientific and
commercial computing.

Distributed computing concepts that leverage federated resource access
are popular within the grid community, but have not yet reached the
deployment levels previously hoped for. Also, many scientific
datacenters have not yet adopted virtualization or cloud concepts.

This workshop aims to bring together industrial providers with the
scientific community in order to foster discussion, collaboration and
mutual exchange of knowledge and experience.

The workshop will be one day in length, composed of 20 min paper
presentations, each followed by 10 min discussion sections.
Presentations may be accompanied by interactive demonstrations. It
concludes with a 30 min panel discussion by presenters.

TOPICS

Topics include, but are not limited to, the following subjects:

- Virtualization in cloud, cluster and grid HPC environments
- VM cloud, cluster load distribution algorithms
- Cloud, cluster and grid filesystems
- QoS and service level guarantees
- Cloud programming models, APIs and databases
- Software as a service (SaaS)
- Cloud provisioning
- Virtualized I/O
- VMMs and storage virtualization
- MPI, PVM on virtual machines
- High-performance network virtualization
- High-speed interconnects
- Hypervisor extensions
- Tools for cluster and grid computing
- Xen/other VMM cloud/cluster/grid tools
- Raw device access from VMs
- Cloud reliability, fault-tolerance, and security
- Cloud load balancing
- VMs - power efficiency
- Network architectures for VM-based environments
- VMMs/Hypervisors
- Hardware support for virtualization
- Fault tolerant VM environments
- Workload characterizations for VM-based environments
- Bottleneck management
- Metering
- VM-based cloud performance modeling
- Cloud security, access control and data integrity
- Performance management and tuning hosts and guest VMs
- VMM performance tuning on various load types
- Research and education use cases
- Cloud use cases
- Management of VM environments and clouds
- Deployment of VM-based environments



PAPER SUBMISSION

Papers submitted to the workshop will be reviewed by at least two
members of the program committee and external reviewers. Submissions
should include abstract, key words, the e-mail address of the
corresponding author, and must not exceed 10 pages, including tables
and figures at a main font size no smaller than 11 point. Submission
of a paper should be regarded as a commitment that, should the paper
be accepted, at least one of the authors will register and attend the
conference to present the work.

Accepted papers will be published in the Springer LNCS series; the
format must follow the Springer LNCS style. Initial submissions are
in PDF; authors of accepted papers will be asked to provide source
files.

Format Guidelines: http://www.springer.de/comp/lncs/authors.html

Submission Link: http://edas.info/newPaper.php?c=8553


IMPORTANT DATES

April 4 - Abstract submission due (extended)
May 19 - Full paper submission (extended)
July 14 - Acceptance notification
August 3 - Camera-ready version due
August 31 - September 3 - conference


CHAIR

Michael Alexander (chair), scaledinfra technologies GmbH, Austria
Gianluigi Zanetti (co-chair), CRS4, Italy


PROGRAM COMMITTEE

Padmashree Apparao, Intel Corp., USA
Volker Buege, University of Karlsruhe, Germany
Roberto Canonico, University of Napoli Federico II, Italy
Tommaso Cucinotta, Scuola Superiore Sant'Anna, Italy
Werner Fischer, Thomas Krenn AG, Germany
William Gardner, University of Guelph, Canada
Wolfgang Gentzsch, DEISA. Max Planck Gesellschaft, Germany
Derek Groen, UVA, The Netherlands
Marcus Hardt, 

Re: Clocksource tsc unstable (delta = -4398046474878 ns)

2010-03-30 Thread Beinicke, Thomas
On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
 On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
  On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
   this message appeared in the KVM guest kern.log last night:
   
   Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
   (delta = -4398046474878 ns)
   
   The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
   hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
   
   If I understand things correctly, in kernel/time/clocksource.c
   clocksource_watchdog() checks all the
   /sys/devices/system/clocksource/clocksource0/available_clocksource
   every 0.5sec for a delta of more than 0.0625s. So the tsc must have
   changed more than one hour within two subsequent calls of
   clocksource_watchdog. No event in the host nor anything in the
   guest gives reasonable cause for this step.
   
   However, the number 4398046474878 is only 36226 ns away from
   4*1024*1024*1024*1024
   
I didn't see any such messages but I've had a recent experience with
  the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
  two separate incidents.  Eerily the exact jumps, as best I can tell from
  logs are of 17592 and 8796 seconds, give or take a second or two.  If
  you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
  nanoseconds.
  
What I've done that seems to have avoided this happening again is drop
  KVM_CLOCK kernel option from the kvm guests' kernel.
 
 To my understanding, kvm-clock is the best and most reliable clocksource
 available, so I do not think it is a good idea to disable it.
 
 There is a lot of bit shift operation happening with the clocksources,
 so there may be a real bug hidden somewhere in the code.
 Somehow ntp adjustment is involved, can this cause such huge steps?
 In my case, I actually have NTP running in the guest. However, the
 statistics show a pretty stable timing here.

I am having the same problem occasionally.
It only occurs if the VM is under heavy IO or CPU load, but I can't reproduce
it 100%. It just never occurs on VMs that only serve a few web pages, though.
I also noticed that on a machine which has this problem even an ssh shell is
*very* laggy, so it's not just a cosmetic problem.

Would removing the hrtimer from the kernel config solve it, or is it necessary
for KVM?

I remember this problem has been posted here before, though there wasn't any
real conclusion or solution for it.


[PATCH][QEMU][VHOST]fix feature bit handling for mergeable rx buffers

2010-03-30 Thread David L Stevens

This patch adds mergeable receive buffer support to qemu-kvm,
to allow enabling it when vhost_net supports it.

It also adds a missing call to vhost_net_ack_features() to
push acked features to vhost_net.

The patch is relative to Michael Tsirkin's qemu-kvm git tree.

+-DLS

Signed-off-by: David L Stevens dlstev...@us.ibm.com

diff -ruNp qemu-kvm.mst/hw/vhost_net.c qemu-kvm.dls/hw/vhost_net.c
--- qemu-kvm.mst/hw/vhost_net.c 2010-03-03 13:39:07.0 -0800
+++ qemu-kvm.dls/hw/vhost_net.c 2010-03-29 20:37:34.0 -0700
@@ -5,6 +5,7 @@
 #include <sys/ioctl.h>
 #include <linux/vhost.h>
 #include <linux/virtio_ring.h>
+#include <linux/if_tun.h>
 #include <netpacket/packet.h>
 #include <net/ethernet.h>
 #include <net/if.h>
@@ -38,15 +39,6 @@ unsigned vhost_net_get_features(struct v
return features;
 }
 
-void vhost_net_ack_features(struct vhost_net *net, unsigned features)
-{
-   net->dev.acked_features = net->dev.backend_features;
-   if (features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY))
-       net->dev.acked_features |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
-   if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC))
-       net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
-}
-
 static int vhost_net_get_fd(VLANClientState *backend)
 {
switch (backend->info->type) {
@@ -58,6 +50,25 @@ static int vhost_net_get_fd(VLANClientSt
}
 }
 
+void vhost_net_ack_features(struct vhost_net *net, unsigned features)
+{
+   int vnet_hdr_sz = sizeof(struct virtio_net_hdr);
+
+   net->dev.acked_features = net->dev.backend_features;
+   if (features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY))
+       net->dev.acked_features |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
+   if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC))
+       net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+   if (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
+       net->dev.acked_features |= (1 << VIRTIO_NET_F_MRG_RXBUF);
+       vnet_hdr_sz = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   }
+#ifdef TUNSETVNETHDRSZ
+   if (ioctl(vhost_net_get_fd(net->vc), TUNSETVNETHDRSZ, &vnet_hdr_sz) < 0)
+       perror("TUNSETVNETHDRSZ");
+#endif /* TUNSETVNETHDRSZ */
+}
+
 struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
 {
int r;
diff -ruNp qemu-kvm.mst/hw/virtio-net.c qemu-kvm.dls/hw/virtio-net.c
--- qemu-kvm.mst/hw/virtio-net.c 2010-03-03 13:39:07.0 -0800
+++ qemu-kvm.dls/hw/virtio-net.c 2010-03-29 16:15:46.0 -0700
@@ -211,12 +211,16 @@ static void virtio_net_set_features(Virt
 n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
 
 if (n->has_vnet_hdr) {
+   struct vhost_net *vhost_net = tap_get_vhost_net(n->nic->nc.peer);
+
 tap_set_offload(n->nic->nc.peer,
 (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
 (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
 (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
 (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
 (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
+   if (vhost_net)
+   vhost_net_ack_features(vhost_net, features);
 }
 }
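The header-size decision in this patch can be separated from the tap/vhost plumbing. A sketch, assuming the virtio feature bit numbers (MRG_RXBUF = 15, NOTIFY_ON_EMPTY = 24, INDIRECT_DESC = 28) and the header layouts from `<linux/virtio_net.h>`, redeclared locally so the example compiles without kernel headers. `ack_features()` and `vnet_hdr_len()` are hypothetical helpers that mirror the patch's logic, not qemu functions.

```c
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
#define VIRTIO_NET_F_MRG_RXBUF      15
#define VIRTIO_F_NOTIFY_ON_EMPTY    24
#define VIRTIO_RING_F_INDIRECT_DESC 28

/* Local copies of the layouts in <linux/virtio_net.h>. */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {
    struct virtio_net_hdr hdr;
    uint16_t num_buffers;       /* number of rx buffers this packet spans */
};

/* Start from the backend's own features, then pass through the guest-
 * negotiated bits that vhost handles, as the patched function does. */
unsigned ack_features(unsigned guest_features, unsigned backend_features)
{
    const unsigned passthrough = (1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
                                 (1u << VIRTIO_RING_F_INDIRECT_DESC) |
                                 (1u << VIRTIO_NET_F_MRG_RXBUF);
    return backend_features | (guest_features & passthrough);
}

/* Header length the tap device must be told about for these features. */
int vnet_hdr_len(unsigned acked_features)
{
    if (acked_features & (1u << VIRTIO_NET_F_MRG_RXBUF))
        return (int)sizeof(struct virtio_net_hdr_mrg_rxbuf);
    return (int)sizeof(struct virtio_net_hdr);
}
```

With MRG_RXBUF acked, the header grows from 10 to 12 bytes because of the trailing `num_buffers` field. That is why the tap device has to be told the new size through TUNSETVNETHDRSZ.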
 


Re: [Qemu-devel] [RFC PATCH 4/7] sparc: rename hw/iommu.c

2010-03-30 Thread Joerg Roedel
On Tue, Mar 30, 2010 at 08:06:36PM +0300, Blue Swirl wrote:
 On 3/30/10, Eduard - Gabriel Munteanu eduard.munte...@linux360.ro wrote:
  hw/iommu.c concerns the SPARC IOMMU. However we intend to implement the
   AMD IOMMU, which could lead to confusion unless we rename the former.
 
 I was also thinking of renaming the file some time ago. The correct
 name would be sun4m_iommu.c. Sun4c (while still Sparc based) had a
 different architecture (IIRC CPU MMU doubled as IOMMU) and Sun4d had
 several IO-UNITs instead. All Sun4m machines had an IOMMU.
 
 But the qdev name of the device is still iommu and we can't change
 that. So I'm not so sure it's worth renaming. Can't AMD IOMMU reside
 in amd_iommu.c?

Keeping the plain name 'iommu' will likely cause confusion when more
iommu implementations are added. It is better to rename it so that the
name better describes what the file implements. So this change makes
sense to me.

Joerg



Re: Clocksource tsc unstable (delta = -4398046474878 ns)

2010-03-30 Thread Zachary Amsden
On 03/30/10 07:04, Beinicke, Thomas wrote:
 On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
   
 On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
 
 On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
   
 this message appeared in the KVM guest kern.log last night:

 Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
 (delta = -4398046474878 ns)

 The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
 hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.

 If I understand things correctly, in kernel/time/clocksource.c
 clocksource_watchdog() checks all the
 /sys/devices/system/clocksource/clocksource0/available_clocksource
 every 0.5sec for a delta of more than 0.0625s. So the tsc must have
 changed more than one hour within two subsequent calls of
 clocksource_watchdog. No event in the host nor anything in the
 guest gives reasonable cause for this step.

 However, the number 4398046474878 is only 36226 ns away from
 4*1024*1024*1024*1024

 
   I didn't see any such messages but I've had a recent experience with

 the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
 two separate incidents.  Eerily the exact jumps, as best I can tell from
 logs are of 17592 and 8796 seconds, give or take a second or two.  If
 you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
 nanoseconds.

   What I've done that seems to have avoided this happening again is drop

 KVM_CLOCK kernel option from the kvm guests' kernel.
   
 To my understanding, kvm-clock is the best and most reliable clocksource
 available, so I do not think it is a good idea to disable it.

 There is a lot of bit shift operation happening with the clocksources,
 so there may be a real bug hidden somewhere in the code.
 Somehow ntp adjustment is involved, can this cause such huge steps?
 In my case, I actually have NTP running in the guest. However, the
 statistics show a pretty stable timing here.
 
 I am having the same problem occasionally.
 It only occurs if the VM is under heavy IO or CPU load, but I can't reproduce 
 it 100%. It just never occurs on VMs that only serve a few web pages, though.
 I also noticed that on a machine which has this problem even an ssh shell is 
 *very* laggy, so it's not just a cosmetic problem.

 Would removing the hrtimer from the kernel config solve it, or is it necessary 
 for KVM?

 I remember this problem has been posted here before, though there wasn't any 
 real conclusion or solution for it.
   

Are you also running a 32-bit kernel?

Thanks,

Zach


Re: [RFC PATCH 0/7] Beginning implementing the AMD IOMMU emulation

2010-03-30 Thread Joerg Roedel
Hello Eduard,

On Tue, Mar 30, 2010 at 11:20:01AM +0300, Eduard - Gabriel Munteanu wrote:
 This patchset is intended to provide a start for implementing the
 emulation of the AMD IOMMU. For those who aren't aware yet, I intend
 to participate as a student in GSoC 2010.

Great. This is a good start.

 In short, this demonstrates a mechanism of inserting ACPI tables without
 modifying SeaBIOS or other BIOS implementations. I also have a SeaBIOS
 equivalent, but I think this approach is better, at least at the moment.

I like the approach implemented in this patchset because of its
simplicity. The right place for building acpi tables is the bios,
though. I am fine with both ways. Anthony, Avi, what do you
think about it?

 I wouldn't merge this patchset yet, at least stuff after the first patch,
 until it accumulates more work. I also didn't test loading ACPI tables from
 the command line after these modifications.

When Linux finds an IVRS table it expects that there is a working AMD
IOMMU so you should change this patchset so that the ACPI table is
enabled later when the hardware emulation is working. That will keep
this work bisectable.
As the next step I suggest you implement an AMD IOMMU PCI device
with the config space, capability and the mmio register space.

Regards,

Joerg



Re: [Qemu-devel] [RFC PATCH 4/7] sparc: rename hw/iommu.c

2010-03-30 Thread Blue Swirl
On 3/30/10, Joerg Roedel j...@8bytes.org wrote:
 On Tue, Mar 30, 2010 at 08:06:36PM +0300, Blue Swirl wrote:
   On 3/30/10, Eduard - Gabriel Munteanu eduard.munte...@linux360.ro wrote:
hw/iommu.c concerns the SPARC IOMMU. However we intend to implement the
 AMD IOMMU, which could lead to confusion unless we rename the former.
  
   I was also thinking of renaming the file some time ago. The correct
   name would be sun4m_iommu.c. Sun4c (while still Sparc based) had a
   different architecture (IIRC CPU MMU doubled as IOMMU) and Sun4d had
   several IO-UNITs instead. All Sun4m machines had an IOMMU.
  
   But the qdev name of the device is still iommu and we can't change
   that. So I'm not so sure it's worth renaming. Can't AMD IOMMU reside
   in amd_iommu.c?


 Keeping the plain name 'iommu' will likely cause confusion when more
  iommu implementations are added. It is better to rename it so that the
  name better describes what the file implements. So this change makes
  sense to me.

I see. I'm OK then with sun4m_iommu.c.


Re: [Qemu-devel] [RFC PATCH 4/7] sparc: rename hw/iommu.c

2010-03-30 Thread Eduard - Gabriel Munteanu
On Tue, Mar 30, 2010 at 11:00:10PM +0300, Blue Swirl wrote:
 On 3/30/10, Joerg Roedel j...@8bytes.org wrote:
  On Tue, Mar 30, 2010 at 08:06:36PM +0300, Blue Swirl wrote:
On 3/30/10, Eduard - Gabriel Munteanu eduard.munte...@linux360.ro wrote:
 hw/iommu.c concerns the SPARC IOMMU. However we intend to implement the
  AMD IOMMU, which could lead to confusion unless we rename the former.
   
I was also thinking of renaming the file some time ago. The correct
name would be sun4m_iommu.c. Sun4c (while still Sparc based) had a
different architecture (IIRC CPU MMU doubled as IOMMU) and Sun4d had
several IO-UNITs instead. All Sun4m machines had an IOMMU.
   
But the qdev name of the device is still iommu and we can't change
that. So I'm not so sure it's worth renaming. Can't AMD IOMMU reside
in amd_iommu.c?
 
 
  Keeping the plain name 'iommu' will likely cause confusion when more
   iommu implementations are added. It is better to rename it so that the
   name better describes what the file implements. So this change makes
   sense to me.
 
 I see. I'm OK then with sun4m_iommu.c.

Yes, I think it's enough to just change the filename, since multiple
such devices aren't likely to conflict in any configuration.

Then sun4m_iommu.c it is. Will resubmit with the next patchset.


Eduard



Re: Setting nx bit in virtual CPU

2010-03-30 Thread Richard Simpson
OK, thanks for that. Clearly something is wrong with my installation.
At least now that I know it is possible, I can keep fiddling until it
works.

Richard

On 30/03/10 03:12, Chris Wright wrote:
 * Richard Simpson (rs1...@huskydog.org.uk) wrote:
 So, is there any way of having the nx bit and the benefits of KVM
 acceleration?
 
 WFM here (both current git tree and 0.12.3) w/ either -cpu host or -cpu
 qemu64. The code definitely does what you'd expect in both those cases.
 
 thanks,
 -chris
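One quick way to see whether the guest actually got NX is to check CPUID from inside it: leaf 0x80000001 reports NX (execute disable) in EDX bit 20, which is what the nx flag in /proc/cpuinfo reflects. A minimal sketch, assuming GCC or Clang on x86 and the compilers' <cpuid.h> header; cpu_has_nx() and edx_has_nx() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID extended feature leaf; NX is reported in EDX bit 20. */
#define CPUID_EXT_FEATURES 0x80000001u
#define CPUID_EDX_NX (1u << 20)

static bool edx_has_nx(uint32_t edx)
{
    return (edx & CPUID_EDX_NX) != 0;
}

/* Returns 1 if the (virtual) CPU advertises NX, 0 if it does not, and
 * -1 if the leaf is unavailable or this is not an x86 build. */
static int cpu_has_nx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_nx(edx) ? 1 : 0;
#else
    return -1;
#endif
}
```

Running this in the guest with -cpu host and with -cpu qemu64 should give 1 in both cases if the setup behaves as described above.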



Re: [Qemu-devel] [RFC PATCH 5/7] x86-64: AMD IOMMU stub

2010-03-30 Thread Blue Swirl
On 3/30/10, Eduard - Gabriel Munteanu eduard.munte...@linux360.ro wrote:
 This currently loads a non-functional IVRS ACPI table and provides a
  skeleton for initializing the AMD IOMMU.

  Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
  ---
   Makefile.target |    1 +
   hw/amd_iommu.c  |  103 +++
   hw/pc.c         |    2 +
   hw/pc.h         |    3 ++
   4 files changed, 109 insertions(+), 0 deletions(-)
   create mode 100644 hw/amd_iommu.c

  diff --git a/Makefile.target b/Makefile.target
  index cbe19a6..dfa4652 100644
  --- a/Makefile.target
  +++ b/Makefile.target
  @@ -230,6 +230,7 @@ obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
   obj-i386-y += extboot.o
   obj-i386-y += ne2000-isa.o debugcon.o multiboot.o
   obj-i386-y += testdev.o
  +obj-i386-y += amd_iommu.o

   obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
   obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
  diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
  new file mode 100644
  index 000..b502430
  --- /dev/null
  +++ b/hw/amd_iommu.c
  @@ -0,0 +1,103 @@
  +/*
  + * AMD IOMMU emulation
  + *
  + * Copyright (c) 2010 Eduard - Gabriel Munteanu <eduard.munte...@linux360.ro>
  + *
  + * Permission is hereby granted, free of charge, to any person obtaining a copy
  + * of this software and associated documentation files (the "Software"), to deal
  + * in the Software without restriction, including without limitation the rights
  + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  + * copies of the Software, and to permit persons to whom the Software is
  + * furnished to do so, subject to the following conditions:
  + *
  + * The above copyright notice and this permission notice shall be included in
  + * all copies or substantial portions of the Software.
  + *
  + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
  + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  + * THE SOFTWARE.
  + */
  +
  +/*
  + * IVRS (I/O Virtualization Reporting Structure) table.
  + *
  + * Describes the AMD IOMMU, as per:
  + * AMD I/O Virtualization Technology (IOMMU) Specification, rev 1.26
  + */
  +
  +#include <stdint.h>
  +
  +#include "pc.h"
  +
  +struct ivrs_ivhd
  +{
  +    uint8_t     type;
  +    uint8_t     flags;
  +    uint16_t    length;
  +    uint16_t    devid;
  +    uint16_t    capab_off;
  +    uint64_t    iommu_base_addr;
  +    uint16_t    pci_seg_group;
  +    uint16_t    iommu_info;
  +    uint32_t    reserved;
  +    uint32_t    entry;
  +} __attribute__ ((__packed__));
  +
  +struct ivrs_table
  +{
  +    struct acpi_table_header    acpi_hdr;
  +    uint32_t                    iv_info;
  +    uint32_t                    reserved[2];
  +    struct ivrs_ivhd            ivhd;
  +} __attribute__ ((__packed__));
  +
  +static const char ivrs_sig[]    = "IVRS";
  +static const char dfl_id[]      = "QEMUQEMU";
  +
  +static void amd_iommu_init_ivrs(void)
  +{
  +    int ivrs_size = sizeof(struct ivrs_table);
  +    struct ivrs_table *ivrs;
  +    struct ivrs_ivhd *ivhd;
  +    struct acpi_table_header *acpi_hdr;
  +
  +    ivrs = acpi_alloc_table(ivrs_size);
  +    acpi_hdr = &ivrs->acpi_hdr;
  +    ivhd = &ivrs->ivhd;
  +
  +    ivrs->iv_info = (64 << 15) |    /* Virtual address space size. */
  +                    (48 << 8);      /* Physical address space size. */
  +
  +    ivhd->type              = 0x10;
  +    ivhd->flags             = 0;
  +    ivhd->length            = sizeof(struct ivrs_ivhd);
  +    ivhd->devid             = 0;
  +    ivhd->capab_off         = 0;
  +    ivhd->iommu_base_addr   = 0;
  +    ivhd->pci_seg_group     = 0;
  +    ivhd->iommu_info        = 0;
  +    ivhd->reserved          = 0;
  +    ivhd->entry             = 0;

The table would be filled with wrong data on a big-endian host. Please
use cpu_to_le64/32/16.
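To illustrate the review comment: ACPI tables are little-endian, but assigning to a uint16_t/uint32_t/uint64_t field stores the value in host byte order, so on a big-endian host (e.g. SPARC or PowerPC) every multi-byte field of the IVRS table ends up byte-swapped. In QEMU the fix is cpu_to_le16/32/64() on each assignment, as suggested. The sketch below avoids QEMU headers and uses hypothetical put_le16/put_le32/put_le64 helpers that serialize into a byte buffer by shifting, which yields the same bytes on any host; ivrs_iv_info() just reproduces the patch's (64 << 15) | (48 << 8) expression.

```c
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's cpu_to_le16/32/64: write 'v' into
 * 'p' least-significant byte first. Shifts act on the value, not on its
 * memory representation, so the result is the same on any host. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* The patch's IVinfo value: virtual address size shifted to bit 15,
 * physical address size shifted to bit 8. */
static uint32_t ivrs_iv_info(uint32_t va_bits, uint32_t pa_bits)
{
    return (va_bits << 15) | (pa_bits << 8);
}
```

For example, put_le32(buf, ivrs_iv_info(64, 48)) writes 0x00203000 as the bytes 00 30 20 00 on every host, whereas a plain uint32_t store on a big-endian host lays it out as 00 20 30 00.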

  +    strncpy(acpi_hdr->signature, ivrs_sig, 4);
  +    acpi_hdr->revision = 1;
  +    strncpy(acpi_hdr->oem_id, dfl_id, 6);
  +    strncpy(acpi_hdr->oem_table_id, dfl_id, 8);
  +    acpi_hdr->oem_revision = 1;
  +    strncpy(acpi_hdr->asl_compiler_id, dfl_id, 4);
  +    acpi_hdr->asl_compiler_revision = 1;
  +
  +    acpi_commit_table(ivrs);
  +}
  +
  +int amd_iommu_init(void)
  +{
  +    amd_iommu_init_ivrs();
  +
  +    return 0;
  +}
  +
  diff --git a/hw/pc.c b/hw/pc.c
  index 0aebae9..89a7a30 100644
  --- a/hw/pc.c
  +++ b/hw/pc.c
  @@ -906,6 +906,8 @@ static void pc_init1(ram_addr_t ram_size,
  cpu_register_physical_memory((uint32_t)(-bios_size),
   bios_size, bios_offset | IO_MEM_ROM);

 

Re: PCI passthrough resource remapping

2010-03-30 Thread Kenni Lund
2010/3/30 Chris Wright chr...@redhat.com:
 * Kenni Lund (ke...@kelu.dk) wrote:
 Client dmesg: http://pastebin.com/uNG4QK5j
 Host dmesg: http://pastebin.com/jZu3WKZW

 I just verified it and I do get the call trace in the host (which
 disables IRQ 19, used by the PCI USB card), exactly at the same second

 It looks like IRQ 19 is shared between the ehci controller and the
 ivtv tuner.  What do you see in /proc/interrupts on the host (before
 you unbind and after you bind to pci stub)?

Ahh, and even if the ivtv module is not loaded, I will still have a
shared IRQ, right? I didn't see ivtv in /proc/interrupts before, as I
unload the ivtv driver on boot in /etc/rc.local, before unbinding the
ivtv tuner and binding it to pci stub. (the ivtv tuner is normally
assigned to the same guest, but not now while testing the PCI USB
card).

If I don't unload (and unbind/bind) the ivtv driver/tuner on boot in
/etc/rc.local, I get the following in /proc/interrupts on a clean boot:
http://pastebin.com/SFQj58LC

If I now unbind and bind the PCI USB card to pci stub, I get no
changes in /proc/interrupts.

So I suppose I'll need to get rid of this shared IRQ before I can
conclude anything on the patch in git. Hmm, is there some clever way
of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
settings, disabling hardware and/or moving the hardware around in
various PCI slots?

Best Regards
Kenni


can't start qemu-kvm on 2.6.34-rc3

2010-03-30 Thread Tomasz Chmielewski
With qemu-kvm 0.12.3 running on 2.6.34-rc3, this command:

qemu-kvm -m 1500 \
  -drive file=/srv/kvm/images/im1.qcow2,if=virtio,cache=none,index=0,boot=on \
  -drive file=/srv/kvm/images/im1-backup.qcow2,if=virtio,cache=none,index=1 \
  -net nic,vlan=0,model=virtio,macaddr=F2:4A:51:41:B1:AA \
  -net tap,vlan=0,script=/etc/qemu-ifup -localtime -nographic

produces the oops below. Is this a known issue, or something specific to my
configuration?


[  282.364859] BUG: unable to handle kernel paging request at 00020001
[  282.364863] IP: [8111c805] __kmalloc_node+0x125/0x200
[  282.364869] PGD 17d967067 PUD 0 
[  282.364871] Oops:  [#1] SMP 
[  282.364873] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[  282.364875] CPU 3 
[  282.364876] Modules linked in: bridge stp radeon ttm drm_kms_helper drm 
i2c_algo_bit tun af_packet xt_tcpudp iptable_filter ip_tables x_tables ipv6 
coretemp binfmt_misc loop dm_mod cpufreq_conservative cpufreq_powersave 
acpi_cpufreq kvm_intel kvm snd_hda_codec_atihdmi snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer 
snd_mixer_oss joydev iTCO_wdt snd soundcore snd_page_alloc wmi processor evdev 
i2c_i801 iTCO_vendor_support i2c_core sr_mod e1000e sg pcspkr thermal button 
serio_raw ata_piix ahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 
uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[  282.364908] 
[  282.364909] Pid: 14874, comm: qemu-kvm Not tainted 2.6.34-rc3 #1 DX58SO/
[  282.364911] RIP: 0010:[8111c805]  [8111c805] __kmalloc_node+0x125/0x200
[  282.364914] RSP: 0018:88017e983ae8  EFLAGS: 00010046
[  282.364916] RAX: 880001a72568 RBX: 00020001 RCX: 81106153
[  282.364917] RDX:  RSI: 80d0 RDI: 0003
[  282.364918] RBP: 88017e983b38 R08: a02fff5c R09: 00d2
[  282.364920] R10: 0001 R11: 0001 R12: 8160db68
[  282.364921] R13: 80d0 R14: 80d0 R15: 0246
[  282.364923] FS:  7f838f566710() GS:880001a6() knlGS:
[  282.364925] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
[  282.364926] CR2: 00020001 CR3: 00017db7e000 CR4: 26e0
[  282.364928] DR0:  DR1:  DR2: 
[  282.364929] DR3:  DR6: 0ff0 DR7: 0400
[  282.364931] Process qemu-kvm (pid: 14874, threadinfo 88017e982000, task 88017e478000)
[  282.364932] Stack:
[  282.364933]   81106153  
0008
[  282.364935] 0 88023d94d440 88023d94d440  
a02fff5c
[  282.364937] 0 8163 00d2 88017e983b98 
81106153
[  282.364940] Call Trace:
[  282.364944]  [81106153] ? __vmalloc_area_node+0x63/0x190
[  282.364955]  [a02fff5c] ? __kvm_set_memory_region+0x61c/0x7a0 [kvm]
[  282.364957]  [81106153] __vmalloc_area_node+0x63/0x190
[  282.364963]  [a02fff5c] ? __kvm_set_memory_region+0x61c/0x7a0 [kvm]
[  282.364966]  [811060e2] __vmalloc_node+0xa2/0xb0
[  282.364971]  [a02fff5c] ? __kvm_set_memory_region+0x61c/0x7a0 [kvm]
[  282.364974]  [8110643c] vmalloc+0x2c/0x30
[  282.364979]  [a02fff5c] __kvm_set_memory_region+0x61c/0x7a0 [kvm]
[  282.364984]  [a02fc1c8] ? kvm_io_bus_write+0x68/0xa0 [kvm]
[  282.364991]  [a0300123] kvm_set_memory_region+0x43/0x70 [kvm]
[  282.364997]  [a030016d] kvm_vm_ioctl_set_memory_region+0x1d/0x30 [kvm]
[  282.365003]  [a03003f0] kvm_vm_ioctl+0x270/0x410 [kvm]
[  282.365009]  [a03014ee] ? kvm_dev_ioctl+0xbe/0x440 [kvm]
[  282.365011]  [8113572d] vfs_ioctl+0x3d/0xd0
[  282.365013]  [81135cba] do_vfs_ioctl+0x8a/0x5a0
[  282.365016]  [811c2a55] ? tomoyo_path_perm+0x45/0x110
[  282.365018]  [81136251] sys_ioctl+0x81/0xa0
[  282.365021]  [8100a002] system_call_fastpath+0x16/0x1b
[  282.365023] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 00 49 63 54 24 14 
31 f6 48 89 df e8 49 89 0d 00 eb bc 0f 1f 80 00 00 00 00 49 63 54 24 18 48 8b 
14 13 48 89 10 eb 92 66 90 48 89 4d b8 e8 77 ce 29 00 48 
[  282.365041] RIP  [8111c805] __kmalloc_node+0x125/0x200
[  282.365043]  RSP 88017e983ae8
[  282.365044] CR2: 00020001
[  282.365046] ---[ end trace 7fb0c79c903996ce ]---



-- 
Tomasz Chmielewski


Re: PCI passthrough resource remapping

2010-03-30 Thread Alexander Graf

On 31.03.2010, at 00:27, Kenni Lund wrote:

 2010/3/30 Chris Wright chr...@redhat.com:
 * Kenni Lund (ke...@kelu.dk) wrote:
 Client dmesg: http://pastebin.com/uNG4QK5j
 Host dmesg: http://pastebin.com/jZu3WKZW
 
 I just verified it and I do get the call trace in the host (which
 disables IRQ 19, used by the PCI USB card), exactly at the same second
 
 It looks like IRQ 19 is shared between the ehci controller and the
 ivtv tuner.  What do you see in /proc/interrupts on the host (before
 you unbind and after you bind to pci stub)?
 
 Ahh, and even if the ivtv module is not loaded, I will still have a
 shared IRQ, right? I didn't see ivtv in /proc/interrupts before, as I
 unload the ivtv driver on boot in /etc/rc.local, before unbinding the
 ivtv tuner and binding it to pci stub. (the ivtv tuner is normally
 assigned to the same guest, but not now while testing the PCI USB
 card).
 
 If I don't unload (and unbind/bind) the ivtv driver/tuner on boot in
 /etc/rc.local, I get the following in /proc/interrupts on a clean boot:
 http://pastebin.com/SFQj58LC
 
 If I now unbind and bind the PCI USB card to pci stub, I get no
 changes in /proc/interrupts.
 
 So I suppose I'll need to get rid of this shared IRQ before I can
 conclude anything on the patch in git. Hmm, is there some clever way
 of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
 settings, disabling hardware and/or moving the hardware around in
 various PCI slots?

The easiest thing coming to mind is to unplug the ivtv card for now. It's 
really only to verify that the patch does something useful :-).


Alex


KVM warning about uncertified CPU for SMP for AMD model 2, stepping 3

2010-03-30 Thread Jiri Kosina
Hi,

Booting a 32-bit guest on a 32-bit host on an AMD system gives me the following
warning when KVM is instructed to boot the guest as SMP:



CPU0: AMD QEMU Virtual CPU version 0.9.1 stepping 03
Booting Node   0, Processors  #1
Initializing CPU#1
Leaving ESR disabled.
Mapping cpu 1 to node 0
[ cut here ]
WARNING: at linux-2.6.32/arch/x86/kernel/cpu/amd.c:187 init_amd_k7+0x178/0x187()
Hardware name: 
WARNING: This combination of AMD processors is not suitable for SMP.
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.32.9-0.5-pae #1
Call Trace:
 [c02069a1] try_stack_unwind+0x1b1/0x1f0
 [c020596f] dump_trace+0x3f/0xe0
 [c02065ab] show_trace_log_lvl+0x4b/0x60
 [c02065d8] show_trace+0x18/0x20
 [c052dc19] dump_stack+0x6d/0x74
 [c023ebbf] warn_slowpath_common+0x6f/0xd0
 [c023ec6b] warn_slowpath_fmt+0x2b/0x30
 [c052875c] init_amd_k7+0x178/0x187
 [c052896f] init_amd+0x138/0x279
 [c0527d74] identify_cpu+0xc2/0x223
 [c0527ee1] identify_secondary_cpu+0xc/0x1a
 [c052b3bc] smp_callin+0xd4/0x1a1
 [c052b493] start_secondary+0xa/0xe7



The virtual CPU identifies itself as cpu family 6, model 2, stepping 3 in 
/proc/cpuinfo.

Model 2 is indeed not handled by amd_k7_smp_check(), and thus this warning
is emitted.
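For reference, the decision can be restated in a few lines. The following is a simplified user-space sketch based on my reading of amd_k7_smp_check() in 2.6.32's arch/x86/kernel/cpu/amd.c, not the kernel code itself; k7_smp_certified() is a hypothetical name, and has_mp stands in for the CPUID MP-capable bit that the kernel tests via cpu_has_mp. Only family 6 (K7) CPUs reach this check, and the kernel skips it for the boot CPU, which is consistent with the warning appearing only when CPU#1 is brought up.

```c
#include <stdbool.h>

/* Simplified restatement of the K7 SMP certification check.
 * 'model' and 'stepping' are the CPUID values seen in /proc/cpuinfo. */
static bool k7_smp_certified(int model, int stepping, bool has_mp)
{
    /* Athlon 660/661 */
    if (model == 6 && (stepping == 0 || stepping == 1))
        return true;

    /* Duron 670 */
    if (model == 7 && stepping == 0)
        return true;

    /* Athlon 662, Duron 671 and later models report the MP bit */
    if ((model == 6 && stepping >= 2) ||
        (model == 7 && stepping >= 1) ||
        model > 7)
        return has_mp;

    /* Everything else, including model 2, falls through: the kernel
     * prints the "not suitable for SMP" warning and taints itself. */
    return false;
}
```

With the virtual CPU's model 2, stepping 3, k7_smp_certified(2, 3, has_mp) returns false regardless of has_mp, so model 2 always takes the warning path.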

Is that correct? Model 2 refers to Pluto/Orion (K75) if I remember 
correctly, right? That one is not officially certified for SMP by AMD?

If it is not, maybe KVM should emulate a different CPU for
SMP-enabled configurations?
On the other hand, if it is certified (I have no idea), amd_k7_smp_check()
should handle this model properly.

Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


Re: KVM warning about uncertified CPU for SMP for AMD model 2, stepping3

2010-03-30 Thread Brian Jackson
On Tuesday 30 March 2010 06:03:02 pm Jiri Kosina wrote:
 Hi,
 
 Booting a 32-bit guest on a 32-bit host on an AMD system gives me the following
 warning when KVM is instructed to boot the guest as SMP:


This has been discussed before (fairly recently). The subject was "tainted Linux
kernel in default SMP QEMU/KVM guests". A few solutions were mentioned, so I
won't bore everyone by repeating them.



 
 
 
 CPU0: AMD QEMU Virtual CPU version 0.9.1 stepping 03
 Booting Node   0, Processors  #1
 Initializing CPU#1
 Leaving ESR disabled.
 Mapping cpu 1 to node 0
 [ cut here ]
 WARNING: at linux-2.6.32/arch/x86/kernel/cpu/amd.c:187 init_amd_k7+0x178/0x187()
 Hardware name:
 WARNING: This combination of AMD processors is not suitable for SMP.
 Modules linked in:
 Pid: 0, comm: swapper Not tainted 2.6.32.9-0.5-pae #1
 Call Trace:
  [c02069a1] try_stack_unwind+0x1b1/0x1f0
  [c020596f] dump_trace+0x3f/0xe0
  [c02065ab] show_trace_log_lvl+0x4b/0x60
  [c02065d8] show_trace+0x18/0x20
  [c052dc19] dump_stack+0x6d/0x74
  [c023ebbf] warn_slowpath_common+0x6f/0xd0
  [c023ec6b] warn_slowpath_fmt+0x2b/0x30
  [c052875c] init_amd_k7+0x178/0x187
  [c052896f] init_amd+0x138/0x279
  [c0527d74] identify_cpu+0xc2/0x223
  [c0527ee1] identify_secondary_cpu+0xc/0x1a
  [c052b3bc] smp_callin+0xd4/0x1a1
  [c052b493] start_secondary+0xa/0xe7
 
 
 
 The virtual CPU identifies itself as cpu family 6, model 2, stepping 3 in
 /proc/cpuinfo.
 
 Model 2 is indeed not handled by amd_k7_smp_check(), and thus this warning
 is emitted.
 
 Is that correct? Model 2 refers to Pluto/Orion (K75) if I remember
 correctly, right? That one is not officially certified for SMP by AMD?
 
 If it is not, maybe KVM should emulate a different CPU for
 SMP-enabled configurations?
 On the other hand, if it is certified (I have no idea), amd_k7_smp_check()
 should handle this model properly.
 
 Thanks,


[PATCH] kvm: Increase NR_IOBUS_DEVS limit to 200

2010-03-30 Thread Sridhar Samudrala
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than one virtio-net device using the vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from its rx/tx queues.
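A rough user-space model of why the fixed-size bus runs out: the bus is an array of NR_IOBUS_DEVS slots, and registering a device fails once it is full (in the kernel, kvm_io_bus_register_dev() returns -ENOSPC, as far as I can tell from the 2.6.34 source). The toy_* names below are hypothetical, and the number of slots already taken by other devices is a free parameter, not a measured value.

```c
#include <errno.h>

/* Pre-patch value; the patch raises it to 200. */
#define NR_IOBUS_DEVS 6

/* Toy model of struct kvm_io_bus: device pointers reduced to ints. */
struct toy_io_bus {
    int dev_count;
    int devs[NR_IOBUS_DEVS];
};

/* Registration fails once every slot is used. */
static int toy_bus_register(struct toy_io_bus *bus, int dev)
{
    if (bus->dev_count >= NR_IOBUS_DEVS)
        return -ENOSPC;
    bus->devs[bus->dev_count++] = dev;
    return 0;
}

/* Each vhost-net backed virtio-net NIC needs two bus entries (rx and tx
 * notifications). Register NICs until one no longer fits, undo the
 * half-registered one, and return how many fit. */
static int nics_that_fit(struct toy_io_bus *bus)
{
    int nics = 0;

    for (;;) {
        int saved = bus->dev_count;

        if (toy_bus_register(bus, 2 * nics) ||
            toy_bus_register(bus, 2 * nics + 1)) {
            bus->dev_count = saved;
            return nics;
        }
        nics++;
    }
}
```

With an empty 6-slot bus, three NICs fit; if other devices already occupy 3 slots, only one does. With 200 slots the same arithmetic leaves room for far more NICs.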

Signed-off-by: Sridhar Samudrala s...@us.ibm.com


diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a3fd0f9..7fb48d3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -54,7 +54,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
  */
 struct kvm_io_bus {
int   dev_count;
-#define NR_IOBUS_DEVS 6
+#define NR_IOBUS_DEVS 200
struct kvm_io_device *devs[NR_IOBUS_DEVS];
 };
 





Re: PCI passthrough resource remapping

2010-03-30 Thread Kenni Lund
2010/3/31 Alexander Graf ag...@suse.de:

 On 31.03.2010, at 00:27, Kenni Lund wrote:

 2010/3/30 Chris Wright chr...@redhat.com:
 * Kenni Lund (ke...@kelu.dk) wrote:
 Client dmesg: http://pastebin.com/uNG4QK5j
 Host dmesg: http://pastebin.com/jZu3WKZW

 I just verified it and I do get the call trace in the host (which
 disables IRQ 19, used by the PCI USB card), exactly at the same second

 It looks like IRQ 19 is shared between the ehci controller and the
 ivtv tuner.  What do you see in /proc/interrupts on the host (before
 you unbind and after you bind to pci stub)?

 Ahh, and even if the ivtv module is not loaded, I will still have a
 shared IRQ, right? I didn't see ivtv in /proc/interrupts before, as I
 unload the ivtv driver on boot in /etc/rc.local, before unbinding the
 ivtv tuner and binding it to pci stub. (the ivtv tuner is normally
 assigned to the same guest, but not now while testing the PCI USB
 card).

 If I don't unload (and unbind/bind) the ivtv driver/tuner on boot in
 /etc/rc.local, I get the following in /proc/interrupts on a clean boot:
 http://pastebin.com/SFQj58LC

 If I now unbind and bind the PCI USB card to pci stub, I get no
 changes in /proc/interrupts.

 So I suppose I'll need to get rid of this shared IRQ before I can
 conclude anything on the patch in git. Hmm, is there some clever way
 of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
 settings, disabling hardware and/or moving the hardware around in
 various PCI slots?

 The easiest thing coming to mind is to unplug the ivtv card for now. It's 
 really only to verify that the patch does something useful :-).

This was not sufficient. Same issue with the ivtv card unplugged... If
I interpret the contents of /proc/interrupts correctly, the USB PCI
card uses 4 IRQs, and three of these IRQs are still shared with
other components.

The card uses IRQ 16, 18, 19, 20:
pci-stub :02:01.0: PCI INT B -> GSI 18 (level, low) -> IRQ 18
pci-stub :02:01.1: PCI INT C -> GSI 16 (level, low) -> IRQ 16
pci-stub :02:01.2: PCI INT D -> GSI 20 (level, low) -> IRQ 20
pci-stub :02:01.3: PCI INT A -> GSI 19 (level, low) -> IRQ 19

Only IRQ 20 gets kvm_assigned_intx_device after
unbinding+binding+qemu-kvm launch:
interrupts before: http://pastebin.com/DdJEv29z
interrupts after: http://pastebin.com/zasWEZsL

It seems like I still have shared IRQs with the onboard USB controller
as well as the PATA controller. In the BIOS I have the option of
setting the IRQ of PCI Card 1 and PCI Card 2. I tried changing
these to IRQ 10 and 11 (which are free according to
/proc/interrupts), but it changed absolutely nothing in
/proc/interrupts after booting(?) :-(

Any kind of hint that could lead me in the right direction would be
highly appreciated... thanks.

Best Regards
Kenni


Re: PCI passthrough resource remapping

2010-03-30 Thread Chris Wright
* Kenni Lund (ke...@kelu.dk) wrote:
 2010/3/30 Chris Wright chr...@redhat.com:
  * Kenni Lund (ke...@kelu.dk) wrote:
  Client dmesg: http://pastebin.com/uNG4QK5j
  Host dmesg: http://pastebin.com/jZu3WKZW
 
  I just verified it and I do get the call trace in the host (which
  disables IRQ 19, used by the PCI USB card), exactly at the same second
 
  It looks like IRQ 19 is shared between the ehci controller and the
  ivtv tuner.  What do you see in /proc/interrupts on the host (before
  you unbind and after you bind to pci stub)?
 
 Ahh, and even if the ivtv module is not loaded, I will still have a
 shared IRQ, right? I didn't see ivtv in /proc/interrupts before, as I
 unload the ivtv driver on boot in /etc/rc.local, before unbinding the
 ivtv tuner and binding it to pci stub. (the ivtv tuner is normally
 assigned to the same guest, but not now while testing the PCI USB
 card).
 
 If I don't unload (and unbind/bind) the ivtv driver/tuner on boot in
 /etc/rc.local, I get the following in /proc/interrupts on a clean boot:
 http://pastebin.com/SFQj58LC
 
 If I now unbind and bind the PCI USB card to pci stub, I get no
 changes in /proc/interrupts.

Sorry, I meant after binding to pci_stub and launching the guest. IOW, you
should then see kvm_assigned_{msi,msix,intx}_device (from your lspci, I'd expect intx).

What's odd is that a device is asserting an interrupt on a line with no
handler acking it. IRQ 19 should have kvm handling the interrupt, in which
case the handler would always return IRQ_HANDLED. And for the case of shared
interrupts, we won't let you start the guest with an assigned device
that's sharing an interrupt. IOW, we call request_irq() without specifying
IRQF_SHARED (meaning we want an exclusive irq).
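The exclusivity rule can be modeled in a few lines. This is a simplified sketch of how the kernel decides whether a new handler may join an interrupt line, assuming the behavior of 2.6.3x __setup_irq(): sharing is only allowed when both the first existing handler and the new one pass IRQF_SHARED; otherwise the request fails with -EBUSY. toy_request_irq() and TOY_IRQF_SHARED are hypothetical stand-ins, and trigger-type matching is left out.

```c
#include <errno.h>

#define TOY_IRQF_SHARED 0x1UL   /* stand-in for the kernel's IRQF_SHARED */
#define TOY_MAX_ACTIONS 8

/* One interrupt line and the flags of the handlers registered on it. */
struct toy_irq_line {
    int nr_actions;
    unsigned long flags[TOY_MAX_ACTIONS];
};

/* Returns 0 if the handler was added, -EBUSY on a sharing mismatch. */
static int toy_request_irq(struct toy_irq_line *line, unsigned long flags)
{
    if (line->nr_actions > 0) {
        /* Both the first existing handler and the new one must agree
         * to share the line. */
        if (!(line->flags[0] & flags & TOY_IRQF_SHARED))
            return -EBUSY;
    }
    if (line->nr_actions >= TOY_MAX_ACTIONS)
        return -ENOMEM;   /* toy limit only; the kernel uses a list */
    line->flags[line->nr_actions++] = flags;
    return 0;
}
```

For example, if a host driver that asked for sharing already owns a line, an exclusive request (flags 0) for that same line is refused, which is what keeps an assigned device off a shared interrupt.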

 So I suppose I'll need to get rid of this shared IRQ before I can
 conclude anything on the patch in git. Hmm, is there some clever way
 of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
 settings, disabling hardware and/or moving the hardware around in
 various PCI slots?

The way I typically work around this is simply unbinding the driver from
the device in the host (and thus freeing the irq).
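Unbinding is done through sysfs. Below is a minimal C sketch of that step, and of handing the device to pci-stub afterwards, using the standard PCI sysfs files (driver/unbind, and pci-stub's new_id and bind). The function names are hypothetical, and the BDF and vendor/device ID strings in the comments are placeholders, not values from this thread; it needs root privileges to do anything on a real system.

```c
#include <stdio.h>

/* Write one string to a sysfs attribute. Returns 0 on success, -1 on
 * error. sysfs may only report a failed write when the stream is
 * flushed, so the fclose() result is checked as well. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Detach whatever host driver owns the PCI function 'bdf' (a full
 * address such as "0000:00:1d.0", placeholder), which releases that
 * driver's IRQ handler. 'sysfs_root' is normally "/sys". */
static int pci_unbind_driver(const char *sysfs_root, const char *bdf)
{
    char path[256];

    snprintf(path, sizeof path, "%s/bus/pci/devices/%s/driver/unbind",
             sysfs_root, bdf);
    return write_attr(path, bdf);
}

/* Hand a device to pci-stub: register its "vendor device" ID pair
 * (hex, e.g. "8086 10b9", placeholder) with pci-stub, unbind the host
 * driver, then bind the device to pci-stub explicitly. */
static int pci_claim_for_stub(const char *sysfs_root, const char *bdf,
                              const char *vendor_device)
{
    char path[256];

    snprintf(path, sizeof path, "%s/bus/pci/drivers/pci-stub/new_id",
             sysfs_root);
    if (write_attr(path, vendor_device))
        return -1;
    if (pci_unbind_driver(sysfs_root, bdf))
        return -1;
    snprintf(path, sizeof path, "%s/bus/pci/drivers/pci-stub/bind",
             sysfs_root);
    return write_attr(path, bdf);
}
```

Note that the unbind step fails if the device currently has no driver, because the driver link does not exist; in that case it can simply be skipped before binding to pci-stub.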

thanks,
-chris


[027/116] hrtimer: Tune hrtimer_interrupt hang logic

2010-03-30 Thread Greg KH
2.6.32-stable review patch.  If anyone has any objections, please let us know.

--

From: Thomas Gleixner t...@linutronix.de

commit 41d2e494937715d3150e5c75d01f0e75ae899337 upstream.

The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.

This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in turn can lead to a large min_delta_ns, rendering
the system unusable.

Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails, use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.
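The deferment rule can be sketched as follows. This is a simplified user-space model, not the kernel code: the toy_* names are hypothetical, times are plain nanosecond counts instead of ktime_t, and the retry loop (up to 3 attempts) that precedes it is omitted. It mirrors the hang_detected, nr_hangs and max_hang_time fields the patch adds to struct hrtimer_cpu_base.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL
/* Cap from the changelog: defer the next timer interrupt by at most 100 ms. */
#define HANG_DEFER_LIMIT_NS (100 * NSEC_PER_MSEC)

/* Toy version of the per-CPU bookkeeping the patch adds. */
struct toy_cpu_base {
    int hang_detected;
    unsigned long nr_hangs;
    int64_t max_hang_time;
};

/* Called once the retries are exhausted: 'entry_ns' is when the
 * interrupt handler started, 'now_ns' the current time. Returns the
 * absolute expiry to program for the next timer interrupt. */
static int64_t toy_hang_next_expiry(struct toy_cpu_base *b,
                                    int64_t entry_ns, int64_t now_ns)
{
    int64_t delta = now_ns - entry_ns;   /* time spent in the handler */

    b->nr_hangs++;
    b->hang_detected = 1;
    if (delta > b->max_hang_time)
        b->max_hang_time = delta;

    /* Defer by the time spent, but never by more than 100 ms. */
    if (delta > HANG_DEFER_LIMIT_NS)
        delta = HANG_DEFER_LIMIT_NS;
    return now_ns + delta;
}
```

For example, a 5 ms stay in the handler pushes the next interrupt 5 ms past "now", while a 300 ms stay is capped at 100 ms, and max_hang_time still records the full 300 ms.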

The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list.

Inspired by a patch from Marcelo.

Reported-by: Michael Tokarev m...@tls.msk.ru
Signed-off-by: Thomas Gleixner t...@linutronix.de
Tested-by: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Greg Kroah-Hartman gre...@suse.de

---
 include/linux/hrtimer.h  |   13 --
 kernel/hrtimer.c |   96 +++
 kernel/time/timer_list.c |5 +-
 3 files changed, 70 insertions(+), 44 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -162,10 +162,11 @@ struct hrtimer_clock_base {
  * @expires_next:  absolute time of the next event which was scheduled
  * via clock_set_next_event()
  * @hres_active:   State of high resolution mode
- * @check_clocks:  Indictator, when set evaluate time source and clock
- * event devices whether high resolution mode can be
- * activated.
- * @nr_events: Total number of timer interrupt events
+ * @hang_detected: The last hrtimer interrupt detected a hang
+ * @nr_events: Total number of hrtimer interrupt events
+ * @nr_retries:Total number of hrtimer interrupt retries
+ * @nr_hangs:  Total number of hrtimer interrupt hangs
+ * @max_hang_time: Maximum time spent in hrtimer_interrupt
  */
 struct hrtimer_cpu_base {
spinlock_t  lock;
@@ -173,7 +174,11 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
ktime_t expires_next;
int hres_active;
+   int hang_detected;
unsigned long   nr_events;
+   unsigned long   nr_retries;
+   unsigned long   nr_hangs;
+   ktime_t max_hang_time;
 #endif
 };
 
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -557,7 +557,7 @@ hrtimer_force_reprogram(struct hrtimer_c
 static int hrtimer_reprogram(struct hrtimer *timer,
 struct hrtimer_clock_base *base)
 {
-   ktime_t *expires_next = &__get_cpu_var(hrtimer_bases).expires_next;
+   struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
int res;
 
@@ -582,7 +582,16 @@ static int hrtimer_reprogram(struct hrti
if (expires.tv64 < 0)
return -ETIME;
 
-   if (expires.tv64 >= expires_next->tv64)
+   if (expires.tv64 >= cpu_base->expires_next.tv64)
+   return 0;
+
+   /*
+* If a hang was detected in the last timer interrupt then we
+* do not schedule a timer which is earlier than the expiry
+* which we enforced in the hang detection. We want the system
+* to make progress.
+*/
+   if (cpu_base->hang_detected)
return 0;
 
/*
@@ -590,7 +599,7 @@ static int hrtimer_reprogram(struct hrti
 */
res = tick_program_event(expires, 0);
if (!IS_ERR_VALUE(res))
-   *expires_next = expires;
+   cpu_base->expires_next = expires;
return res;
 }
 
@@ -1217,29 +1226,6 @@ static void __run_hrtimer(struct hrtimer
 
 #ifdef CONFIG_HIGH_RES_TIMERS
 
-static int force_clock_reprogram;
-
-/*
- * After 5 iteration's attempts, we consider that hrtimer_interrupt()
- * is hanging, which could happen with something that slows the interrupt
- * such as the tracing. Then we force the clock reprogramming for each future
- * hrtimer interrupts to avoid infinite loops and use the min_delta_ns
- * threshold that we will overwrite.
- * The next tick event will be scheduled to 3 times we currently spend on
- * hrtimer_interrupt(). This gives a good compromise, the cpus will spend
- * 1/4 of their time to process the hrtimer interrupts. This is enough to
- * 

Re: PCI passthrough resource remapping

2010-03-30 Thread Kenni Lund
2010/3/31 Chris Wright chr...@redhat.com:
 So I suppose I'll need to get rid of this shared IRQ before I can
 conclude anything on the patch in git. Hmm, is there some clever way
 of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
 settings, disabling hardware and/or moving the hardware around in
 various PCI slots?

 The way I typically work around this is simply unbinding the driver from
 the device in the host (and thus freeing the irq).

Doh... Anyway, I went all the way: I found a USB-to-PS/2 adapter,
disabled the onboard USB controller and the PATA controller in the BIOS,
and now I get kvm_assigned_intx_device for all 4 IRQs :)

Booting the guest and tuning to a DVB-T channel _works_ !!! :-D Thanks
a lot for your help. I have one more question, though: if two devices
(like the ivtv tuner and the USB card) share an IRQ and I assign BOTH of
them to the same guest, will that work?

Alexander, the patch works, I hope to see it in a stable release in
the near future ;)

Best Regards
Kenni


Re: PCI passthrough resource remapping

2010-03-30 Thread Chris Wright
* Kenni Lund (ke...@kelu.dk) wrote:
 2010/3/31 Alexander Graf <ag...@suse.de>:
  The easiest thing coming to mind is to unplug the ivtv card for now. It's 
  really only to verify that the patch does something useful :-).
 
 This was not sufficient. Same issue with the ivtv card unplugged...if
 I interpret the content of /proc/interrupts correctly, the USB PCI
 card uses 4 IRQs, and three of these IRQs are still shared with
 other components.
 
 The card uses IRQ 16, 18, 19, 20:
 pci-stub :02:01.0: PCI INT B -> GSI 18 (level, low) -> IRQ 18
 pci-stub :02:01.1: PCI INT C -> GSI 16 (level, low) -> IRQ 16
 pci-stub :02:01.2: PCI INT D -> GSI 20 (level, low) -> IRQ 20
 pci-stub :02:01.3: PCI INT A -> GSI 19 (level, low) -> IRQ 19
 
 Only IRQ 20 gets kvm_assigned_intx_device after
 unbinding+binding+qemu-kvm launch:

What is your qemu command line? It's acting as if function .2 (one of
the ohci controllers) is generating the interrupt rather than .3 (the
ehci controller).  And indeed, there is no handler to ack that interrupt
(just the likely on-board uhci one).

before:
19:  0  0  0  0   IO-APIC-fasteoi   ehci_hcd:usb3, uhci_hcd:usb9
20:  0  0  0  0   IO-APIC-fasteoi   ohci_hcd:usb12

after:
19:  75010  73438  76479  75074   IO-APIC-fasteoi   uhci_hcd:usb9
20:  0  0  0  0   IO-APIC-fasteoi   kvm_assigned_intx_device

Can you unbind uhci_hcd:usb9 and then give both .2 and .3 to the guest?

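Reading the before/after listings comes down to counting the handlers on each IRQ line: more than one name means the line is shared. Below is a minimal C sketch. It assumes the 2.6.3x layout shown in the listing above, with a single-token chip name; newer kernels print extra columns. `count_irq_handlers()` is a hypothetical helper.

```c
#include <ctype.h>
#include <string.h>

/*
 * Count the handlers attached to one /proc/interrupts line, e.g.
 *   "19:  0  0  0  0   IO-APIC-fasteoi   ehci_hcd:usb3, uhci_hcd:usb9"
 * Assumed layout: "IRQ:", one counter per CPU, a one-token chip name,
 * then comma-separated handler names.
 * Returns the handler count (0 if none), or -1 if the line is malformed.
 */
int count_irq_handlers(const char *line)
{
    const char *p = strchr(line, ':');
    int count;

    if (!p)
        return -1;
    p++;
    /* Skip the per-CPU counters. */
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        while (isdigit((unsigned char)*p))
            p++;
    }
    /* Skip the chip name (a single token in this layout). */
    if (*p == '\0')
        return -1;
    while (*p && *p != ' ' && *p != '\t')
        p++;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0' || *p == '\n')
        return 0;
    /* Every comma separates two handlers sharing this line. */
    count = 1;
    for (; *p; p++)
        if (*p == ',')
            count++;
    return count;
}
```

Applied to the "before" listing, IRQ 19 yields 2 (ehci_hcd and uhci_hcd share it). That shared line is why the assigned device cannot get it exclusively until uhci_hcd:usb9 is unbound.
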
thanks,
-chris


[PATCH v2] Add Mergeable RX buffer feature to vhost_net

2010-03-30 Thread David Stevens
This patch adds support for the Mergeable Receive Buffers feature to
vhost_net.

Changes:
1) generalize descriptor allocation functions to allow multiple
descriptors per packet
2) add socket peek to know datalen at buffer allocation time
3) change notification to enable a multi-buffer max packet, rather
than the single-buffer run until completely empty
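
Changes 1) and 2) boil down to one calculation: before any descriptor is consumed, work out how many guest buffers a single packet will need. The C sketch below shows that accounting. It is not the patch's code, and `rx_buffers_needed()` is a hypothetical helper. It assumes the vnet header sits at the start of the first buffer, as with virtio's mergeable-buffer header, whose `num_buffers` field reports the count to the guest.

```c
#include <stddef.h>

/*
 * datalen: length of the next packet (e.g. learned by peeking the socket)
 * hdr_len: per-packet vnet header, placed at the start of the first buffer
 * buf_len: sizes of the receive buffers the guest posted, in ring order
 * Returns how many buffers the packet occupies, or -1 if the posted
 * buffers cannot hold it (the caller would wait for the guest to refill).
 */
int rx_buffers_needed(size_t datalen, size_t hdr_len,
                      const size_t *buf_len, int nbufs)
{
    size_t remaining = datalen + hdr_len;
    int used = 0;

    while (remaining > 0) {
        if (used == nbufs)
            return -1;  /* not enough posted buffers */
        if (buf_len[used] >= remaining)
            remaining = 0;
        else
            remaining -= buf_len[used];
        used++;
    }
    return used;
}
```

With 4096-byte buffers and a 12-byte header, a 1514-byte frame fits in one buffer, while a 9000-byte frame needs three. The single-buffer scheme could not receive such a frame at all, which is the point of the feature.
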

Changes from previous revision:
1) incorporate review comments from Michael Tsirkin
2) assume use of TUNSETVNETHDRSZ ioctl by qemu, which simplifies vnet header
processing
3) fixed notification code to only affect the receive side
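
Under assumption 2), qemu itself tells the tap device how large a header to prepend. The following is a minimal sketch of that step. It assumes a tap fd that is already open with `IFF_VNET_HDR`. The structs are local mirrors of the virtio-net header layout: 10 bytes, or 12 bytes with mergeable RX buffers. `vnet_hdr_len()` and `set_tap_vnet_hdr_len()` are hypothetical names, not qemu's functions.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Local mirror of the basic virtio-net header (10 bytes, no padding). */
struct vnet_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Mergeable-RX variant: adds the count of buffers the packet spans. */
struct vnet_hdr_mrg_rxbuf {
    struct vnet_hdr hdr;
    uint16_t num_buffers;
};

/* Header size the tap device should use for the negotiated features. */
int vnet_hdr_len(int mergeable_rx_bufs)
{
    return mergeable_rx_bufs ? (int)sizeof(struct vnet_hdr_mrg_rxbuf)
                             : (int)sizeof(struct vnet_hdr);
}

/* Pass the size to the tap device; returns ioctl()'s result (-1 on error). */
int set_tap_vnet_hdr_len(int tap_fd, int mergeable_rx_bufs)
{
    int len = vnet_hdr_len(mergeable_rx_bufs);

    return ioctl(tap_fd, TUNSETVNETHDRSZ, &len);
}
```

Once the tap device adds and strips the right header size itself, vhost can hand whole buffers to the socket without splitting off the header first.
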

Signed-Off-By: David L Stevens <dlstev...@us.ibm.com>

[in-line for review, attached for applying w/o whitespace mangling]

diff -ruNp net-next-p0/drivers/vhost/net.c net-next-p3/drivers/vhost/net.c
--- net-next-p0/drivers/vhost/net.c 2010-03-22 12:04:38.0 -0700
+++ net-next-p3/drivers/vhost/net.c 2010-03-30 12:50:57.0 -0700
@@ -54,26 +54,6 @@ struct vhost_net {
enum vhost_net_poll_state tx_poll_state;
 };
 
-/* Pop first len bytes from iovec. Return number of segments used. */
-static int move_iovec_hdr(struct iovec *from, struct iovec *to,
- size_t len, int iov_count)
-{
-   int seg = 0;
-   size_t size;
-   while (len && seg < iov_count) {
-   size = min(from->iov_len, len);
-   to->iov_base = from->iov_base;
-   to->iov_len = size;
-   from->iov_len -= size;
-   from->iov_base += size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-   return seg;
-}
-
 /* Caller must have TX VQ lock */
 static void tx_poll_stop(struct vhost_net *net)
 {
@@ -97,7 +77,8 @@ static void tx_poll_start(struct vhost_n
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
-   unsigned head, out, in, s;
+   unsigned out, in;
+   struct iovec head;
struct msghdr msg = {
.msg_name = NULL,
.msg_namelen = 0,
@@ -108,8 +89,8 @@ static void handle_tx(struct vhost_net *
};
size_t len, total_len = 0;
int err, wmem;
-   size_t hdr_size;
struct socket *sock = rcu_dereference(vq->private_data);
+
if (!sock)
return;
 
@@ -127,22 +108,19 @@ static void handle_tx(struct vhost_net *
 
if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
-   hdr_size = vq->hdr_size;
 
for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-NULL, NULL);
+   head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
+   vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   if (head.iov_base == (void *)vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
tx_poll_start(net, sock);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
-   if (unlikely(vhost_enable_notify(vq))) {
+   if (unlikely(vhost_enable_notify(vq, 0))) {
vhost_disable_notify(vq);
continue;
}
@@ -154,27 +132,30 @@ static void handle_tx(struct vhost_net *
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
msg.msg_iovlen = out;
-   len = iov_length(vq->iov, out);
+   head.iov_len = len = iov_length(vq->iov, out);
+
/* Sanity check */
if (!len) {
-   vq_err(vq, "Unexpected header len for TX: "
-  "%zd expected %zd\n",
-  iov_length(vq->hdr, s), hdr_size);
+   vq_err(vq, "Unexpected buffer len for TX: %zd ", len);
break;
}
-   /* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq);
-   tx_poll_start(net, sock);
+   if (err == -EAGAIN) {
+   vhost_discard_vq_desc(vq, 1);
+   tx_poll_start(net, sock);
+   } else {
+  

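The `move_iovec_hdr()` helper removed at the top of the patch above split the guest's iovec so the header could be set aside, leaving `vq->iov` to describe only the payload. The sketch below is a userspace port of that helper. It assumes the POSIX `struct iovec` from `<sys/uio.h>` and keeps the logic unchanged, except that it uses portable pointer arithmetic instead of GCC's `void *` arithmetic.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Pop the first len bytes of "from" into "to" and return the number of
 * segments used. "from" is advanced in place: consumed bytes are trimmed
 * from the front, so afterwards it describes only what follows the header.
 */
int move_iovec_hdr(struct iovec *from, struct iovec *to,
                   size_t len, int iov_count)
{
    int seg = 0;
    size_t size;

    while (len && seg < iov_count) {
        size = from->iov_len < len ? from->iov_len : len;
        to->iov_base = from->iov_base;
        to->iov_len = size;
        from->iov_len -= size;
        from->iov_base = (char *)from->iov_base + size;
        len -= size;
        ++from;
        ++to;
        ++seg;
    }
    return seg;
}
```

For example, with a 10-byte header spread over a 4-byte and an 8-byte segment, the function returns 2. The header lands in `to[0]` (4 bytes) and `to[1]` (6 bytes), and the second `from` segment keeps its last 2 bytes as payload. With TUNSETVNETHDRSZ, the tap device handles the header itself, which is why the patch can drop this step.
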
Re: PCI passthrough resource remapping

2010-03-30 Thread Chris Wright
* Kenni Lund (ke...@kelu.dk) wrote:
 2010/3/31 Chris Wright <chr...@redhat.com>:
  So I suppose I'll need to get rid of this shared IRQ before I can
  conclude anything on the patch in git. Hmm, is there some clever way
  of fixing this in Linux, or do I have to fix it by changing BIOS IRQ
  settings, disabling hardware and/or moving the hardware around in
  various PCI slots?
 
  The way I typically work around this is simply unbinding the driver from
  the device in the host (and thus freeing the irq).
 
 Doh...anyway, I went all the way, found a USB-PS2 adaptor, disabled
 the onboard USB controller and PATA controller in BIOS, and now got
 kvm_assigned_intx_device for all 4 IRQs :)

A little extreme...but hey, that works ;-)  I'm still curious what's
going on w/ your PCI card and the irq routing.  Something is suspect.

 Booting the guest and tuning to a DVB-T channel _works_ !!! :-D Thanks
 a lot for your help...I have one more question, though: If I have two
 devices (like the ivtv tuner and the USB card) and they share an IRQ,
 if I then assign BOTH of them to the same guest, will it then work?

No, it won't.  The first one will request an exclusive interrupt, and the
second one will fail its request_irq.  There are two options: one is to
use a different driver for kvm (if the device is new enough to support
PCI 2.3 interrupt masking), the other is to use MSI/MSI-X based devices
(that works best ;)  Neither of those options is readily available in
your case.

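The failure can be modelled in a few lines of C. This is a simplified userspace model, not kernel code: `struct irq_line` and `model_request_irq()` are hypothetical. The model follows the kernel rule that a second handler may join a line only if both the existing handler and the newcomer request sharing, and that a mismatch fails with `-EBUSY`. It also assumes, as described above, that device assignment requests its line exclusively.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one host IRQ line. */
struct irq_line {
    int  nr_handlers;
    bool shared;     /* sharing mode chosen by the first handler */
};

/*
 * Simplified version of the request_irq() sharing rule: the first
 * request sets the mode; later requests succeed only if the line and
 * the newcomer both allow sharing, otherwise -EBUSY.
 */
int model_request_irq(struct irq_line *line, bool want_shared)
{
    if (line->nr_handlers > 0 && !(line->shared && want_shared))
        return -EBUSY;
    if (line->nr_handlers == 0)
        line->shared = want_shared;
    line->nr_handlers++;
    return 0;
}
```

Two exclusive requests on one line, as with two assigned devices sharing an IRQ, fail on the second call. Two shareable requests both succeed, which is what per-device masking (PCI 2.3) or per-device MSI vectors would allow.
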
thanks,
-chris


Re: KVM warning about uncertified CPU for SMP for AMD model 2, stepping 3

2010-03-30 Thread Andi Kleen
On Wed, Mar 31, 2010 at 01:03:02AM +0200, Jiri Kosina wrote:
 Hi,
 
 booting 32bit guest on 32bit host on AMD system gives me the following 
 warning when KVM is instructed to boot as SMP:

I guess these warnings could just be disabled. With nearly everyone
using multi-core these days, they are kind of obsolete anyway.

-Andi




How to debug problems when nothing shows up in kvm_stat on kvm-88?

2010-03-30 Thread Neo Jia
hi,

I am running the official kvm-88 release with my own 32-bit .so library
dlopen'ed by qemu-kvm, so this is a 32-bit qemu-kvm on a 64-bit kvm kernel
module. Everything works great, but after a while the guest (WinXP
32-bit) hard hangs and kvm_stat shows 0.

So, is there any way to trace back to when kvm_stat starts showing
0s? qemu-kvm is still alive.

Thanks,
Neo

-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


[KVM-AUTOTEST PATCH] KVM test: kvm_vm: destroy VM if hugepage setup fails

2010-03-30 Thread Michael Goldish
Signed-off-by: Michael Goldish <mgold...@redhat.com>
---
 client/tests/kvm/kvm_vm.py |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 921414d..047505a 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -486,6 +486,7 @@ class VM:
  "qemu command:\n%s" % qemu_command)
 logging.error("Output:" + kvm_utils.format_str_for_message(
   self.process.get_output()))
+self.destroy()
 return False
 
 logging.debug("VM appears to be alive with PID %d",
-- 
1.5.4.1



Re: [RFC PATCH 0/7] Beginning implementing the AMD IOMMU emulation

2010-03-30 Thread Avi Kivity

On 03/30/2010 10:40 PM, Joerg Roedel wrote:

In short, this demonstrates a mechanism of inserting ACPI tables without
modifying SeaBIOS or other BIOS implementations. I also have a SeaBIOS
equivalent, but I think this approach is better, at least at the moment.
 

I like the approach implemented in this patchset because of its
simplicity. The right place for building acpi tables is the bios,
though. I am fine with both ways. Anthony, Avi, what do you
think about it?

I agree.  Let qemu emulate the hardware, build the tables in the BIOS.
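
Whichever side builds the tables, each one needs a valid checksum. The ACPI specification requires all bytes of a table, header included, to sum to zero modulo 256. Below is a minimal C sketch that assumes the standard 36-byte table header. `acpi_sum()` and `acpi_set_checksum()` are hypothetical helper names, not SeaBIOS or qemu functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard ACPI system description table header (36 bytes, no padding). */
struct acpi_table_header {
    char     signature[4];
    uint32_t length;          /* whole table, header included */
    uint8_t  revision;
    uint8_t  checksum;        /* chosen so all bytes sum to 0 mod 256 */
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     creator_id[4];
    uint32_t creator_revision;
};
_Static_assert(sizeof(struct acpi_table_header) == 36, "ACPI header is 36 bytes");

/* Byte-wise sum of the table modulo 256; 0 means the checksum is valid. */
uint8_t acpi_sum(const void *table, size_t len)
{
    const uint8_t *p = table;
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Fill in the checksum byte (written byte-wise, so no alignment needed). */
void acpi_set_checksum(void *table, size_t len)
{
    uint8_t *p = table;
    size_t off = offsetof(struct acpi_table_header, checksum);

    p[off] = 0;
    p[off] = (uint8_t)(0 - acpi_sum(table, len));
}
```

The checksum has to be recomputed after the last byte of the table is final, which is one practical reason to build each table in a single place.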


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
