date:20141126

[Qemu-devel] [PATCH 1/1] qmp: extend QMP to provide read/write access to physical memory

2014-11-26 Thread Bryan D. Payne

This patch adds a new QMP command that sets up a domain socket. This
socket can then be used for fast read/write access to the guest's
physical memory. The key benefit to this system over existing solutions
is speed. Using this patch, guest memory can be copied out at a rate of
~200MB/sec, depending on the hardware. Existing solutions only achieve
a small fraction of this speed.
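
For readers who want to try the socket, here is a minimal client-side
sketch of a read request. It assumes the wire format is the raw
struct request from memory-access.c in this patch (same compiler and
ABI on both ends), and that a successful read replies with the data
bytes followed by one status byte set to 1. The ex_* helper names are
hypothetical and not part of the patch; the failure path is only
handled by treating a short or closed stream as an error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Same layout as struct request in memory-access.c from this patch. */
struct request {
    uint8_t type;      /* 0 quit, 1 read, 2 write */
    uint64_t address;  /* guest physical address */
    uint64_t length;   /* number of bytes */
};

/* Hypothetical helper: read exactly n bytes, or fail on EOF/error. */
static int ex_read_full(int fd, void *buf, size_t n)
{
    uint8_t *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) {
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Hypothetical client call: fetch len bytes of guest physical memory. */
static int ex_vmi_read(int fd, uint64_t paddr, void *out, uint64_t len)
{
    struct request req;
    uint8_t status;

    assert(out != NULL || len == 0);

    memset(&req, 0, sizeof(req));  /* also zeroes the padding bytes */
    req.type = 1;
    req.address = paddr;
    req.length = len;
    if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req)) {
        return -1;
    }

    /* Assumed success reply: len data bytes, then one status byte (1). */
    if (ex_read_full(fd, out, (size_t)len) != 0 ||
        ex_read_full(fd, &status, 1) != 0) {
        return -1;
    }
    return status == 1 ? 0 : -1;
}
```

The fd would come from connect() on the domain socket path passed to
the new QMP command. Looping in ex_read_full matters because a stream
socket may return fewer bytes than requested in a single read().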

Signed-off-by: Bryan D. Payne 
---
 Makefile.target |   2 +-
 memory-access.c | 200 
 memory-access.h |  11 
 monitor.c   |  10 +++
 qmp-commands.hx |  27 
 5 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 memory-access.c
 create mode 100644 memory-access.h

diff --git a/Makefile.target b/Makefile.target
index e9ff1ee..4b3cd99 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -127,7 +127,7 @@ endif #CONFIG_BSD_USER
 # System emulator target
 ifdef CONFIG_SOFTMMU
 obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
-obj-y += qtest.o bootdevice.o
+obj-y += qtest.o bootdevice.o memory-access.o
 obj-y += hw/
 obj-$(CONFIG_FDT) += device_tree.o
 obj-$(CONFIG_KVM) += kvm-all.o
diff --git a/memory-access.c b/memory-access.c
new file mode 100644
index 000..f696d7b
--- /dev/null
+++ b/memory-access.c
@@ -0,0 +1,200 @@
+/*
+ * Access guest physical memory via a domain socket.
+ *
+ * Copyright (c) 2014 Bryan D. Payne (bdpa...@acm.org)
+ */
+
+#include "memory-access.h"
+#include "qemu-common.h"
+#include "exec/cpu-common.h"
+#include "config.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct request {
+uint8_t type;  /* 0 quit, 1 read, 2 write, ... rest reserved */
+uint64_t address;  /* address to read from OR write to */
+uint64_t length;   /* number of bytes to read OR write */
+};
+
+static uint64_t
+connection_read_memory(uint64_t user_paddr, void *buf, uint64_t user_len)
+{
+hwaddr paddr = (hwaddr) user_paddr;
+hwaddr len = (hwaddr) user_len;
+void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
+if (!guestmem) {
+return 0;
+}
+memcpy(buf, guestmem, len);
+cpu_physical_memory_unmap(guestmem, len, 0, len);
+
+return len;
+}
+
+static uint64_t
+connection_write_memory(uint64_t user_paddr,
+const void *buf,
+uint64_t user_len)
+{
+hwaddr paddr = (hwaddr) user_paddr;
+hwaddr len = (hwaddr) user_len;
+void *guestmem = cpu_physical_memory_map(paddr, &len, 1);
+if (!guestmem) {
+return 0;
+}
+memcpy(guestmem, buf, len);
+cpu_physical_memory_unmap(guestmem, len, 1, len);
+
+return len;
+}
+
+static void
+send_success_ack(int connection_fd)
+{
+uint8_t success = 1;
+int nbytes = write(connection_fd, &success, 1);
+if (nbytes != 1) {
+printf("QemuMemoryAccess: failed to send success ack\n");
+}
+}
+
+static void
+send_fail_ack(int connection_fd)
+{
+uint8_t fail = 0;
+int nbytes = write(connection_fd, &fail, 1);
+if (nbytes != 1) {
+printf("QemuMemoryAccess: failed to send fail ack\n");
+}
+}
+
+static void
+connection_handler(int connection_fd)
+{
+int nbytes;
+struct request req;
+
+while (1) {
+/* client request should match the struct request format */
+nbytes = read(connection_fd, &req, sizeof(struct request));
+if (nbytes != sizeof(struct request)) {
+/* error */
+send_fail_ack(connection_fd);
+continue;
+} else if (req.type == 0) {
+/* request to quit, goodbye */
+break;
+} else if (req.type == 1) {
+/* request to read */
+char *buf = g_malloc(req.length + 1);
+nbytes = connection_read_memory(req.address, buf, req.length);
+if (nbytes != req.length) {
+/* read failure, return failure message */
+buf[req.length] = 0; /* set last byte to 0 for failure */
+nbytes = write(connection_fd, &buf[req.length], 1);
+} else {
+/* read success, return bytes */
+buf[req.length] = 1; /* set last byte to 1 for success */
+nbytes = write(connection_fd, buf, nbytes + 1);
+}
+g_free(buf);
+} else if (req.type == 2) {
+/* request to write */
+void *write_buf = g_malloc(req.length);
+nbytes = read(connection_fd, write_buf, req.length);
+if (nbytes != req.length) {
+/* failed reading the message to write */
+send_fail_ack(connection_fd);
+} else{
+/* do the write */
+nbytes = connection_write_memory(req.address,
+ write_buf,
+ req.length);
+if (nby

[Qemu-devel] [PATCH 0/1] qmp: extend QMP to provide read/write access to physical memory

2014-11-26 Thread Bryan D. Payne

Summary:
This patch improves Qemu support for virtual machine introspection.

Background:
Virtual machine introspection (VMI) is a technique where one accesses the
memory of a (usually) paused guest. This access is typically used to perform
security checks, debugging, or malware analysis. The LibVMI project provides
an open source library that simplifies VMI programming. LibVMI supports 
both Xen and KVM environments.

Under KVM, LibVMI can work on systems today (albeit slowly) using the human
monitor command functionality to extract memory with the xp command. This
access is too slow for performance sensitive applications, so the LibVMI
project has created and maintained a QEMU patch that enables faster access.
We have used this patch for about 3 years now and it appears to be working
nicely for our community.

The patch in this email is an updated version of the LibVMI patch that aims
to conform to the Qemu coding guidelines. It is my hope that we can include
this in Qemu so that LibVMI users can leverage this faster access without
needing to do custom Qemu builds on their KVM systems.


Bryan D. Payne (1):
  qmp: extend QMP to provide read/write access to physical memory

 Makefile.target |   2 +-
 memory-access.c | 200 
 memory-access.h |  11 
 monitor.c   |  10 +++
 qmp-commands.hx |  27 
 5 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 memory-access.c
 create mode 100644 memory-access.h

-- 
1.9.1

[Qemu-devel] [PATCH 3/3 V1] kvm: extend kvm_irqchip_add_msi_route to work on s390

2014-11-26 Thread Frank Blaschka

From: Frank Blaschka 

On s390, MSI-X IRQs are presented as thin or adapter interrupts.
For this we have to reorganize the routing entry so that it contains
valid information for the adapter interrupt code on s390.
To minimize the impact on existing code, we introduce an architecture
function to fix up the routing entry.
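
As a rough illustration of what the s390 fixup does with the MSI data
word, the sketch below splits it into a function ID (upper bits) and a
vector number (lower bits), mirroring the shift/mask in the s390
kvm_arch_fixup_msi_route() hunk. The EX_* constants and the struct are
hypothetical; in particular the 11-bit vector width is an assumption
made for this example, since the real ZPCI_MSI_VEC_BITS definition is
not shown in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the vector field; stands in for ZPCI_MSI_VEC_BITS. */
#define EX_MSI_VEC_BITS 11u
#define EX_MSI_VEC_MASK ((1u << EX_MSI_VEC_BITS) - 1u)

/* Hypothetical result type: which function, and which indicator bit. */
struct ex_msi_target {
    uint32_t fid;  /* PCI function ID, used to look up the device */
    uint32_t vec;  /* vector, used as the adapter indicator offset */
};

/* Split the MSI data word the same way the s390 fixup does. */
static struct ex_msi_target ex_decode_msi_data(uint32_t data)
{
    struct ex_msi_target t;

    t.fid = data >> EX_MSI_VEC_BITS;
    t.vec = data & EX_MSI_VEC_MASK;
    return t;
}
```

With the fid the code can find the S390PCIBusDevice, and the vec then
becomes the indicator offset in the adapter route.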

Signed-off-by: Frank Blaschka 
---
 include/sysemu/kvm.h |  4 
 kvm-all.c|  7 +++
 target-arm/kvm.c |  6 ++
 target-i386/kvm.c|  6 ++
 target-mips/kvm.c|  6 ++
 target-ppc/kvm.c |  6 ++
 target-s390x/kvm.c   | 26 ++
 7 files changed, 61 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b0cd657..702dc93 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -148,6 +148,7 @@ extern bool kvm_readonly_mem_allowed;
 
 struct kvm_run;
 struct kvm_lapic_state;
+struct kvm_irq_routing_entry;
 
 typedef struct KVMCapabilityInfo {
 const char *name;
@@ -259,6 +260,9 @@ int kvm_arch_on_sigbus(int code, void *addr);
 
 void kvm_arch_init_irq_routing(KVMState *s);
 
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data);
+
 int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
diff --git a/kvm-all.c b/kvm-all.c
index 596e7ce..38589b3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1208,6 +1208,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 kroute.u.msi.address_lo = (uint32_t)msg.address;
 kroute.u.msi.address_hi = msg.address >> 32;
 kroute.u.msi.data = le32_to_cpu(msg.data);
+if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data)) {
+kvm_irqchip_release_virq(s, virq);
+return -EINVAL;
+}
 
 kvm_add_routing_entry(s, &kroute);
 kvm_irqchip_commit_routes(s);
@@ -1233,6 +1237,9 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
 kroute.u.msi.address_lo = (uint32_t)msg.address;
 kroute.u.msi.address_hi = msg.address >> 32;
 kroute.u.msi.data = le32_to_cpu(msg.data);
+if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data)) {
+return -EINVAL;
+}
 
 return kvm_update_routing_entry(s, &kroute);
 }
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 319784d..3285f81 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -441,3 +441,9 @@ int kvm_arch_irqchip_create(KVMState *s)
 
 return 0;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data)
+{
+return 0;
+}
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ccf36e8..7bc818c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2707,3 +2707,9 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
 KVM_DEV_IRQ_HOST_MSIX);
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data)
+{
+return 0;
+}
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 97fd51a..c7eb1dc 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -688,3 +688,9 @@ int kvm_arch_get_registers(CPUState *cs)
 
 return ret;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data)
+{
+return 0;
+}
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 6843fa0..04c83cd 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2388,3 +2388,9 @@ out_close:
 error_out:
 return;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data)
+{
+return 0;
+}
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 1b62257..32af46b 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi-event.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-bus.h"
 
 /* #define DEBUG_KVM */
 
@@ -1510,3 +1511,28 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state)
 
 return ret;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+  uint64_t address, uint32_t data)
+{
+S390PCIBusDevice *pbdev;
+uint32_t fid = data >> ZPCI_MSI_VEC_BITS;
+uint32_t vec = data & ZPCI_MSI_VEC_MASK;
+
+pbdev = s390_pci_find_dev_by_fid(fid);
+if (!pbdev) {
+DPRINTF("add_msi_route no dev\n");
+return -ENODEV;
+}
+
+pbdev->routes.adapter.ind_offset = vec;
+
+route->type = KVM_IRQ_ROUTING_S390_ADAPTER;
+route->flags = 0;
+route->u.adapter.summary_addr = pbdev->routes.adapter.summary_addr;
+route->u.adapter.ind_addr = pbdev->routes.adapter.ind_addr;
+route->u.adapter.summary_offset = pbdev->routes.adapter.summary_offset

[Qemu-devel] [PATCH 0/3 V1] add PCI support for the s390 platform

2014-11-26 Thread Frank Blaschka

This set of patches implements PCI support for the s390 platform.
Now it is possible to run virtio-net-pci and potentially all
virtual PCI devices conforming to s390 platform constraints.

V1: added a lot of feedback from Alex Graf
    fixed tons of endian issues

Please review and consider for integration into 2.3

Thanks,

Frank

Frank Blaschka (3):
  s390: Add PCI bus support
  s390: implement pci instructions
  kvm: extend kvm_irqchip_add_msi_route to work on s390

 default-configs/s390x-softmmu.mak |   1 +
 hw/s390x/Makefile.objs|   1 +
 hw/s390x/css.c|   5 +
 hw/s390x/css.h|   1 +
 hw/s390x/s390-pci-bus.c   | 554 +
 hw/s390x/s390-pci-bus.h   | 217 
 hw/s390x/s390-pci-inst.c  | 711 ++
 hw/s390x/s390-pci-inst.h  | 287 +++
 hw/s390x/s390-virtio-ccw.c|   7 +
 hw/s390x/sclp.c   |  10 +-
 include/hw/s390x/sclp.h   |   8 +
 include/sysemu/kvm.h  |   4 +
 kvm-all.c |   7 +
 target-arm/kvm.c  |   6 +
 target-i386/kvm.c |   6 +
 target-mips/kvm.c |   6 +
 target-ppc/kvm.c  |   6 +
 target-s390x/ioinst.c |  52 +++
 target-s390x/ioinst.h |   1 +
 target-s390x/kvm.c| 174 ++
 20 files changed, 2063 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-bus.c
 create mode 100644 hw/s390x/s390-pci-bus.h
 create mode 100644 hw/s390x/s390-pci-inst.c
 create mode 100644 hw/s390x/s390-pci-inst.h

-- 
1.8.5.5

[Qemu-devel] [PATCH 1/3 V1] s390: Add PCI bus support

2014-11-26 Thread Frank Blaschka

From: Frank Blaschka 

This patch implements a pci bus for s390x together with infrastructure
to generate and handle hotplug events, to configure/unconfigure via
sclp instruction, to do iommu translations and provide s390 support for
MSI/MSI-X notification processing.

Signed-off-by: Frank Blaschka 
---
 default-configs/s390x-softmmu.mak |   1 +
 hw/s390x/Makefile.objs|   1 +
 hw/s390x/css.c|   5 +
 hw/s390x/css.h|   1 +
 hw/s390x/s390-pci-bus.c   | 554 ++
 hw/s390x/s390-pci-bus.h   | 217 +++
 hw/s390x/s390-virtio-ccw.c|   7 +
 hw/s390x/sclp.c   |  10 +-
 include/hw/s390x/sclp.h   |   8 +
 target-s390x/ioinst.c |  52 
 target-s390x/ioinst.h |   1 +
 11 files changed, 856 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-bus.c
 create mode 100644 hw/s390x/s390-pci-bus.h

diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 126d88d..6ee2ff8 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1,3 +1,4 @@
+include pci.mak
 CONFIG_VIRTIO=y
 CONFIG_SCLPCONSOLE=y
 CONFIG_S390_FLIC=y
diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 1ba6c3a..428d957 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -8,3 +8,4 @@ obj-y += ipl.o
 obj-y += css.o
 obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
+obj-y += s390-pci-bus.o
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..7553085 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t chpid)
 /* TODO */
 }
 
+void css_generate_css_crws(uint8_t cssid)
+{
+css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
+}
+
 int css_enable_mcsse(void)
 {
 trace_css_enable_facility("mcsse");
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7e53148 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid);
 void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
int hotplugged, int add);
 void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
+void css_generate_css_crws(uint8_t cssid);
 void css_adapter_interrupt(uint8_t isc);
 
 #define CSS_IO_ADAPTER_VIRTIO 1
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
new file mode 100644
index 000..06d153a
--- /dev/null
+++ b/hw/s390x/s390-pci-bus.c
@@ -0,0 +1,554 @@
+/*
+ * s390 PCI BUS
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka 
+ *Hong Bo Li 
+ *Yi Min Zhao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "qemu/error-report.h"
+#include "s390-pci-bus.h"
+
+/* #define DEBUG_S390PCI_BUS */
+#ifdef DEBUG_S390PCI_BUS
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+int chsc_sei_nt2_get_event(void *res)
+{
+ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
+PciCcdfAvail *accdf;
+PciCcdfErr *eccdf;
+int rc = 1;
+SeiContainer *sei_cont;
+S390pciState *s = S390_PCI_HOST_BRIDGE(
+object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
+
+if (!s) {
+return rc;
+}
+
+sei_cont = QTAILQ_FIRST(&s->pending_sei);
+if (sei_cont) {
+QTAILQ_REMOVE(&s->pending_sei, sei_cont, link);
+nt2_res->nt = 2;
+nt2_res->cc = sei_cont->cc;
+switch (sei_cont->cc) {
+case 1: /* error event */
+eccdf = (PciCcdfErr *)nt2_res->ccdf;
+eccdf->fid = cpu_to_be32(sei_cont->fid);
+eccdf->fh = cpu_to_be32(sei_cont->fh);
+break;
+case 2: /* availability event */
+accdf = (PciCcdfAvail *)nt2_res->ccdf;
+accdf->fid = cpu_to_be32(sei_cont->fid);
+accdf->fh = cpu_to_be32(sei_cont->fh);
+accdf->pec = cpu_to_be16(sei_cont->pec);
+break;
+default:
+abort();
+}
+g_free(sei_cont);
+rc = 0;
+}
+
+return rc;
+}
+
+int chsc_sei_nt2_have_event(void)
+{
+S390pciState *s = S390_PCI_HOST_BRIDGE(
+object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
+
+if (!s) {
+return 0;
+}
+
+return !QTAILQ_EMPTY(&s->pending_sei);
+}
+
+S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
+{
+S390PCIBusDevice *pbdev;
+int i;
+S390pciState *s = S390_PCI_HOST_BRIDGE(
+object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
+
+if (!s) {
+return NULL;
+}
+
+for (i = 0; i < PCI_SLOT_MAX; i++) {
+pbdev =

[Qemu-devel] [PATCH 2/3 V1] s390: implement pci instructions

2014-11-26 Thread Frank Blaschka

From: Frank Blaschka 

This patch implements the s390 PCI instructions in QEMU. It allows
accessing and driving PCI devices attached to the s390 PCI bus.
Because of platform constraints, devices using IO BARs are not
supported. Also, a device has to support MSI/MSI-X to run on s390.

Signed-off-by: Frank Blaschka 
---
 hw/s390x/Makefile.objs   |   2 +-
 hw/s390x/s390-pci-inst.c | 711 +++
 hw/s390x/s390-pci-inst.h | 287 +++
 target-s390x/kvm.c   | 148 ++
 4 files changed, 1147 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-inst.c
 create mode 100644 hw/s390x/s390-pci-inst.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 428d957..27cd75a 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -8,4 +8,4 @@ obj-y += ipl.o
 obj-y += css.o
 obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
-obj-y += s390-pci-bus.o
+obj-y += s390-pci-bus.o s390-pci-inst.o
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
new file mode 100644
index 000..e233046
--- /dev/null
+++ b/hw/s390x/s390-pci-inst.c
@@ -0,0 +1,711 @@
+/*
+ * s390 PCI instructions
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka 
+ *Hong Bo Li 
+ *Yi Min Zhao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "migration/qemu-file.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "cpu.h"
+#include "sysemu/device_tree.h"
+#include "monitor/monitor.h"
+#include "s390-pci-inst.h"
+
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_host.h"
+#include "hw/s390x/s390-pci-bus.h"
+#include "exec/exec-all.h"
+#include "exec/memory-internal.h"
+
+/* #define DEBUG_S390PCI_INST */
+#ifdef DEBUG_S390PCI_INST
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, "s390pci-inst: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+static void s390_set_status_code(CPUS390XState *env,
+ uint8_t r, uint64_t status_code)
+{
+env->regs[r] &= ~0xff000000ULL;
+env->regs[r] |= (status_code & 0xff) << 24;
+}
+
+static int list_pci(ClpReqRspListPci *rrb, uint8_t *cc)
+{
+S390PCIBusDevice *pbdev;
+uint32_t res_code, initial_l2, g_l2, finish;
+int rc, idx;
+uint64_t resume_token;
+
+rc = 0;
+if (lduw_p(&rrb->request.hdr.len) != 32) {
+res_code = CLP_RC_LEN;
+rc = -EINVAL;
+goto out;
+}
+
+if ((ldl_p(&rrb->request.fmt) & CLP_MASK_FMT) != 0) {
+res_code = CLP_RC_FMT;
+rc = -EINVAL;
+goto out;
+}
+
+if ((ldl_p(&rrb->request.fmt) & ~CLP_MASK_FMT) != 0 ||
+ldq_p(&rrb->request.reserved1) != 0 ||
+ldq_p(&rrb->request.reserved2) != 0) {
+res_code = CLP_RC_RESNOT0;
+rc = -EINVAL;
+goto out;
+}
+
+resume_token = ldq_p(&rrb->request.resume_token);
+
+if (resume_token) {
+pbdev = s390_pci_find_dev_by_idx(resume_token);
+if (!pbdev) {
+res_code = CLP_RC_LISTPCI_BADRT;
+rc = -EINVAL;
+goto out;
+}
+}
+
+if (lduw_p(&rrb->response.hdr.len) < 48) {
+res_code = CLP_RC_8K;
+rc = -EINVAL;
+goto out;
+}
+
+initial_l2 = lduw_p(&rrb->response.hdr.len);
+if ((initial_l2 - LIST_PCI_HDR_LEN) % sizeof(ClpFhListEntry)
+!= 0) {
+res_code = CLP_RC_LEN;
+rc = -EINVAL;
+*cc = 3;
+goto out;
+}
+
+stl_p(&rrb->response.fmt, 0);
+stq_p(&rrb->response.reserved1, 0);
+stq_p(&rrb->response.reserved2, 0);
+stl_p(&rrb->response.mdd, FH_VIRT);
+stw_p(&rrb->response.max_fn, PCI_MAX_FUNCTIONS);
+rrb->response.entry_size = sizeof(ClpFhListEntry);
+finish = 0;
+idx = resume_token;
+g_l2 = LIST_PCI_HDR_LEN;
+do {
+pbdev = s390_pci_find_dev_by_idx(idx);
+if (!pbdev) {
+finish = 1;
+break;
+}
+stw_p(&rrb->response.fh_list[idx - resume_token].device_id,
+pci_get_word(pbdev->pdev->config + PCI_DEVICE_ID));
+stw_p(&rrb->response.fh_list[idx - resume_token].vendor_id,
+pci_get_word(pbdev->pdev->config + PCI_VENDOR_ID));
+stl_p(&rrb->response.fh_list[idx - resume_token].config, 0x8000);
+stl_p(&rrb->response.fh_list[idx - resume_token].fid, pbdev->fid);
+stl_p(&rrb->response.fh_list[idx - resume_token].fh, pbdev->fh);
+
+g_l2 += sizeof(ClpFhListEntry);
+/* Add endian check for DPRINTF? */
+DPRINTF("g_l2 %d vendor id 0x%x device id 0x%x fid 0x%x fh 0x%x\

Re: [Qemu-devel] [PATCH v3 2/5] block: JSON filenames and relative backing files

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:57, Eric Blake wrote:

On 11/24/2014 02:43 AM, Max Reitz wrote:

When using a relative backing file name, qemu needs to know the
directory of the top image file. For JSON filenames, such a directory
cannot be easily determined (e.g. how do you determine the directory of
a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow
relative filenames for the backing file of BDSs only having a JSON
filename.

Furthermore, BDS::exact_filename should be used whenever possible. If
BDS::filename is not equal to BDS::exact_filename, the former will
always be a JSON object.
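
A simplified sketch of the rule described above, for illustration
only: a backing name that is empty, absolute, or carries a protocol
prefix is used as is; a relative name is joined with the directory of
the backed image, unless the backed image only has a JSON filename, in
which case there is no directory to use and the lookup fails. The
ex_* functions are hypothetical stand-ins, not the code in block.c,
and the "json:" prefix test and protocol check are assumptions made to
keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: "proto:..." with the colon before any slash. */
static int ex_has_protocol(const char *path)
{
    const char *colon = strchr(path, ':');
    const char *slash = strchr(path, '/');

    return colon && (!slash || colon < slash);
}

/* Copy src into dest; fail instead of silently truncating. */
static int ex_copy(char *dest, size_t sz, const char *src)
{
    int n = snprintf(dest, sz, "%s", src);

    return (n >= 0 && (size_t)n < sz) ? 0 : -1;
}

/*
 * Resolve the backing file name relative to the backed image.
 * Returns 0 on success, -1 if no full name can be determined.
 */
static int ex_resolve_backing(const char *backed, const char *backing,
                              char *dest, size_t sz)
{
    const char *slash;
    int dirlen, n;

    assert(backed && backing && dest && sz > 0);

    if (backing[0] == '\0' || backing[0] == '/' ||
        ex_has_protocol(backing)) {
        return ex_copy(dest, sz, backing);  /* nothing to combine */
    }
    if (strncmp(backed, "json:", 5) == 0) {
        return -1;  /* JSON filename: no directory to be relative to */
    }

    /* Keep everything up to and including the last '/' of backed. */
    slash = strrchr(backed, '/');
    dirlen = slash ? (int)(slash - backed) + 1 : 0;
    n = snprintf(dest, sz, "%.*s%s", dirlen, backed, backing);
    return (n >= 0 && (size_t)n < sz) ? 0 : -1;
}
```

For example, backed "/img/top.qcow2" with backing "base.qcow2" gives
"/img/base.qcow2", while the same backing name under a "json:{...}"
backed image is rejected.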

Signed-off-by: Max Reitz 
---
  block.c   | 27 +--
  block/qapi.c  |  7 ++-
  include/block/block.h |  5 +++--
  3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 0c1be37..a0cddcd 100644
--- a/block.c
+++ b/block.c
@@ -305,19 +305,28 @@ void path_combine(char *dest, int dest_size,
  
  void bdrv_get_full_backing_filename_from_filename(const char *backed,

const char *backing,
-  char *dest, size_t sz)
+  char *dest, size_t sz,
+  Error **errp)
  {
-if (backing[0] == '\0' || path_has_protocol(backing)) {
+if (backing[0] == '\0' || path_has_protocol(backing) ||
+path_is_absolute(backing))
+{

checkpatch.pl didn't complain about this?  The { should be on the
previous line.  With that fixed,
Reviewed-by: Eric Blake 


Oh, actually it does.

But there's no rule about that in CODING_STYLE. There is a rule about { 
being on the same line as the if, though ("The opening brace is on the 
line that contains the control
flow statement that introduces the new block"). With the condition split 
over multiple lines, that is no longer possible; therefore, we can place 
{ anywhere we want.


I always place { on the following line for multi-line conditions because 
I find that more readable[1]; maybe I'll change my mind in a year or 
two, but for now there is no rule about this case and I always do it 
this way; so far, nobody complained. :-)


[1] I don't like the following:

if (foo0() || foo1() || foo2() || foo3() ||
bar0() || bar1() || bar2() || bar3()) {
baz();

Because in my eyes it looks like baz() may be part of the condition list 
at first glance (because it is indented just as much as the last line of 
the condition list).


Max

Re: [Qemu-devel] [PATCH v3 2/5] block: JSON filenames and relative backing files

2014-11-26 Thread Max Reitz


On 2014-11-26 at 06:35, Fam Zheng wrote:

On Mon, 11/24 10:43, Max Reitz wrote:

@@ -1209,7 +1218,13 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
  QDECREF(options);
  goto free_exit;
  } else {
-bdrv_get_full_backing_filename(bs, backing_filename, PATH_MAX);
+bdrv_get_full_backing_filename(bs, backing_filename, PATH_MAX, &local_err);

Over 80 characters?


Oops, will fix, thanks.

Max

Re: [Qemu-devel] [PATCH 01/12] block: qcow2 driver may not be found

2014-11-26 Thread Max Reitz


On 2014-11-26 at 08:23, Markus Armbruster wrote:

Max Reitz  writes:


Albeit absolutely impossible right now, bdrv_find_format("qcow2") may
fail. bdrv_append_temp_snapshot() should heed that case.

Impossible because we always compile in bdrv_qcow2.


Right now we do, right.


Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
  block.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/block.c b/block.c
index 866c8b4..b31fb67 100644
--- a/block.c
+++ b/block.c
@@ -1320,6 +1320,12 @@ int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags, Error **errp)
  }
  
  bdrv_qcow2 = bdrv_find_format("qcow2");

+if (!bdrv_qcow2) {
+error_setg(errp, "Failed to locate qcow2 driver");
+ret = -ENOENT;
+goto out;
+}
+
  opts = qemu_opts_create(bdrv_qcow2->create_opts, NULL, 0,
  &error_abort);
  qemu_opt_set_number(opts, BLOCK_OPT_SIZE, total_size);

This dynamic qcow2 driver lookup business is silly.  Compiling without
qcow2 would be a massive loss of functionality, and I wouldn't bet a
nickel on the error paths it would bring to life.


True.


Even sillier lookups: "file" in bdrv_find_protocol(), and "raw" in
find_image_format().

Statically linking to them would be simpler and more honest.

Aside: similar silliness exists around QemuOpts.

Patch looks correct.


Well, at least it will silence Coverity...

Max

Re: [Qemu-devel] [PATCH v6 1/3] linux-aio: fix submit aio as a batch

2014-11-26 Thread Ming Lei

On Wed, Nov 26, 2014 at 12:18 AM, Stefan Hajnoczi  wrote:
>>
>> You mean the abort BH may not have chance to run before its deletion
>> in the detach callback?
>
> Exactly.  Any time you schedule a BH you need to be aware of things that
> may happen before the BH is invoked.
>
>> If so, bdrv_drain_all() from bdrv_set_aio_context() should have
>> handled the pending BH, right?
>
> I'm not sure if it's good to make subtle assumptions like that.  If the
> code changes they will break.

IMO, that should be the purpose of bdrv_drain_all(), at least from
its comment:

 /* ensure there are no in-flight requests */

If it changes in future, the general problem has to be considered.

> Since it is very easy to protect against this case (the code I posted
> before), it seems worthwhile to be on the safe side.

Given that the potential problem does not exist in the current tree,
could you agree to merge it first?

BTW, there isn't this sort of handling for 'completion_bh' of linux aio either, :-)

Thanks,
Ming Lei

Re: [Qemu-devel] [PATCH] target-i386: add feature flags for CPUID[EAX=0xd, ECX=1]

2014-11-26 Thread Paolo Bonzini

On 25/11/2014 21:02, Paolo Bonzini wrote:
> > > +static const char *cpuid_xsave_feature_name[] = {
> > > +"xsaveopt", "xsavec", "xgetbv1", "xsaves",
> > 
> > None of the above features introduce any new state that might need to be
> > migrated, or will require other changes in QEMU to work, right?
> > 
> > It looks like they don't introduce any extra state, but if they do, they
> > need to be added to unmigratable_flags until migration support is
> > implemented.
> > 
> > If they require other QEMU changes, it would be nice if KVM reported
> > them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
> > wouldn't break "-cpu host".
> 
> No, they don't.

Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
MSRs and new XSAVE areas (last example: avx512).

Since no hardware really exists for it, and KVM does not support it
anyway, I think it's simplest to leave xsaves out for now.  Is this right?
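
To make the bit assignment concrete, here is a small sketch that maps
the low bits of CPUID[EAX=0xd, ECX=1].EAX to the feature names quoted
above (bit 0 xsaveopt, bit 1 xsavec, bit 2 xgetbv1, bit 3 xsaves). The
ex_* names are hypothetical; this is not the table or lookup code from
the patch, only an illustration of the index-equals-bit convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index in the array is the bit number in EAX. */
static const char *const ex_xsave_feature_name[4] = {
    "xsaveopt", "xsavec", "xgetbv1", "xsaves",
};

/*
 * Collect the names of the features whose bits are set in eax.
 * Writes at most max entries to out and returns how many it wrote.
 */
static size_t ex_xsave_features(uint32_t eax, const char **out, size_t max)
{
    size_t n = 0;
    unsigned bit;

    assert(out != NULL || max == 0);

    for (bit = 0; bit < 4 && n < max; bit++) {
        if (eax & (1u << bit)) {
            out[n++] = ex_xsave_feature_name[bit];
        }
    }
    return n;
}
```

Leaving xsaves out for now would simply mean not advertising bit 3.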

Paolo

Re: [Qemu-devel] [PATCH v3 5/5] iotests: Add test for relative backing file names

2014-11-26 Thread Max Reitz


On 2014-11-25 at 23:06, Eric Blake wrote:

On 11/24/2014 02:43 AM, Max Reitz wrote:

Sometimes, qemu does not have a filename to work with, so it does not
know which directory to use for a backing file specified by a relative
filename. Add a test which tests that qemu exits with an appropriate
error message.

Additionally, add a test for qemu-img create with a backing filename
relative to the backed image's base directory while omitting the image
size.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/110 | 94 ++
  tests/qemu-iotests/110.out | 19 ++
  tests/qemu-iotests/group   |  1 +
  3 files changed, 114 insertions(+)
  create mode 100755 tests/qemu-iotests/110
  create mode 100644 tests/qemu-iotests/110.out

+echo
+echo '=== Backing name is always relative to the backed image ==='
+echo
+
+# omit the image size; it shoud work anyway

s/shoud/should/


As I made the same mistake twice in some other series, I'm beginning to 
want to blame my keyboard... (I know that certain key sequences work 
badly; for instance, the U in NULL is sometimes omitted (not in null, only 
in NULL))


Max


With the typo fix,
Reviewed-by: Eric Blake

[Qemu-devel] [PATCH RFC] block: fix spoiling all dirty bitmaps by mirror and migration

2014-11-26 Thread Vladimir Sementsov-Ogievskiy

Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change only the given bitmap. Also, to
prevent such mistakes in the future, the old functions
bdrv_(set,reset)_dirty are made static, for internal block layer usage.
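
The difference can be shown with a toy model, assuming each dirty
bitmap is just a 64-bit word covering 64 sectors. The ex_* types and
functions below are hypothetical and only mimic the shape of the real
API: the "all bitmaps" variant is what mirror and migration were
calling, the per-bitmap variant is what they should call.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_BITMAPS 4

/* Toy stand-in for a block device carrying several dirty bitmaps. */
struct ex_bds {
    uint64_t bitmaps[EX_MAX_BITMAPS];  /* one bit per sector */
    int nb_bitmaps;
};

/* Bit mask covering count sectors starting at sector start. */
static uint64_t ex_mask(int start, int count)
{
    uint64_t bits;

    assert(start >= 0 && count > 0 && start + count <= 64);
    bits = (count == 64) ? ~0ULL : ((1ULL << count) - 1);
    return bits << start;
}

/* Old-style call: clears the range in every bitmap of the device. */
static void ex_reset_dirty(struct ex_bds *bs, int start, int count)
{
    int i;

    for (i = 0; i < bs->nb_bitmaps; i++) {
        bs->bitmaps[i] &= ~ex_mask(start, count);
    }
}

/* New-style call: clears the range in the caller's own bitmap only. */
static void ex_reset_dirty_bitmap(uint64_t *bitmap, int start, int count)
{
    *bitmap &= ~ex_mask(start, count);
}
```

With two bitmaps on one device, ex_reset_dirty_bitmap() on the first
leaves the second untouched, whereas ex_reset_dirty() clears the range
in both, which is how an unrelated named bitmap gets spoiled.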

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block-migration.c |  4 ++--
 block.c   | 23 ---
 block/mirror.c|  8 
 include/block/block.h |  6 --
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 08db01a..2b077ad 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -303,7 +303,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
 blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
 nr_sectors, blk_mig_read_cb, blk);
 
-bdrv_reset_dirty(bs, cur_sector, nr_sectors);
+bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
 qemu_mutex_unlock_iothread();
 
 bmds->cur_sector = cur_sector + nr_sectors;
@@ -496,7 +496,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
 g_free(blk);
 }
 
-bdrv_reset_dirty(bmds->bs, sector, nr_sectors);
+bdrv_reset_dirty_bitmap(bmds->bs, bmds->dirty_bitmap, sector, nr_sectors);
 break;
 }
 sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
diff --git a/block.c b/block.c
index a612594..4d12c0d 100644
--- a/block.c
+++ b/block.c
@@ -97,6 +97,10 @@ static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
+static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+   int nr_sectors);
+static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
+ int nr_sectors);
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -5361,8 +5365,20 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
 hbitmap_iter_init(hbi, bitmap->bitmap, 0);
 }
 
-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-int nr_sectors)
+void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+   int64_t cur_sector, int nr_sectors)
+{
+hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+}
+
+void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+ int64_t cur_sector, int nr_sectors)
+{
+hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+}
+
+static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+   int nr_sectors)
 {
 BdrvDirtyBitmap *bitmap;
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
@@ -5370,7 +5386,8 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
 }
 }
 
-void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors)
+static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
+ int nr_sectors)
 {
 BdrvDirtyBitmap *bitmap;
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
diff --git a/block/mirror.c b/block/mirror.c
index 2c6dd2a..72011c4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -128,7 +128,7 @@ static void mirror_write_complete(void *opaque, int ret)
 BlockDriverState *source = s->common.bs;
 BlockErrorAction action;
 
-bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num, op->nb_sectors);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -145,7 +145,7 @@ static void mirror_read_complete(void *opaque, int ret)
 BlockDriverState *source = s->common.bs;
 BlockErrorAction action;
 
-bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num, op->nb_sectors);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -286,7 +286,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 next_sector += sectors_per_chunk;
 }
 
-bdrv_reset_dirty(source, sector_num, nb_sectors);
+bdrv_reset_dirty_bitmap(source, s->dirty_bitmap, sector_num, nb_sectors);
 
 /* Copy the dirty cl

[Qemu-devel] TCG Multithreading performance improvement

2014-11-26 Thread Mark Burton



Hi all,

We are now actively going to pursue TCG Multithreading to improve the 
performance of the TCG for Qemu models that include multiple cores.

We have set up a wiki page to track the project 
http://wiki.qemu.org/Features/tcg-multithread 


At this point, I would like to invite everybody to email us ideas about how the 
project should progress, and ideas that might be useful (e.g. people who have 
tried this before, source code that might be helpful, what order we should 
attack things in etc)…

So - PLEASE let us know if you have interest in this topic, or information that 
might help

Cheers

Mark.






Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support

2014-11-26 Thread Eric Auger

On 11/05/2014 11:29 AM, Alexander Graf wrote:
> 
> 
> On 31.10.14 15:05, Eric Auger wrote:
>> Minimal VFIO platform implementation supporting
>> - register space user mapping,
>> - IRQ assignment based on eventfds handled on qemu side.
>>
>> irqfd kernel acceleration comes in a subsequent patch.
>>
>> Signed-off-by: Kim Phillips 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v6 -> v7:
>> - compat is not exposed anymore as a user option. Rationale is
>>   the vfio device became abstract and a specialization is needed
>>   anyway. The derived device must set the compat string.
>> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>>   A new function dubbed vfio_register_irq_starter replaces it. It
>>   registers a machine init done notifier that programs & starts
>>   all dynamic VFIO device IRQs. This function is supposed to be
>>   called by the machine file. A set of static helper routines are
>>   added too. It must be called before the creation of the platform
>>   bus device.
>>
>> v5 -> v6:
>> - vfio_device property renamed into host property
>> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>>   and remove PCI related comment
>> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>>   property. Both belong to next patch (irqfd)
>> - remove declaration of vfio_intp_interrupt in vfio-platform.h
>> - functions that can be static get this characteristic
>> - remove declarations of vfio_region_ops, vfio_memory_listener,
>>   group_list, vfio_address_spaces. All are moved to vfio-common.h
>> - remove vfio_put_device declaration and definition
>> - print_regions removed. code moved into vfio_populate_regions
>> - replace DPRINTF by trace events
>> - new helper routine to set the trigger eventfd
>> - dissociate intp init from the injection enablement:
>>   vfio_enable_intp renamed into vfio_init_intp and new function
>>   named vfio_start_eventfd_injection
>> - injection start moved to vfio_start_irq_injection (not anymore
>>   in vfio_populate_interrupt)
>> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>>   the function that will be used for starting injection
>> - user handled eventfd:
>>   x add mutex to protect IRQ state & list manipulation,
>>   x correct misleading comment in vfio_intp_interrupt.
>>   x Fix bugs thanks to fake interrupt modality
>> - VFIOPlatformDeviceClass becomes abstract
>> - add error_setg in vfio_platform_realize
>>
>> v4 -> v5:
>> - vfio-platform.h included first
>> - cleanup error handling in *populate*, vfio_get_device,
>>   vfio_enable_intp
>> - vfio_put_device not called anymore
>> - add some includes to follow vfio policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> - merge of "vfio: Add initial IRQ support in platform device"
>>   to get a full functional patch although perfs are limited.
>> - removal of unrealize function since I currently understand
>>   it is only used with device hot-plug feature.
>>
>> v2 -> v3:
>> [Eric Auger]
>> - further factorization between PCI and platform (VFIORegion,
>>   VFIODevice). same level of functionality.
>>
>> <= v2:
>> [Kim Phillips]
>> - Initial Creation of the device supporting register space mapping
>> ---
>>  hw/vfio/Makefile.objs   |   1 +
>>  hw/vfio/platform.c  | 672 
>> 
>>  include/hw/vfio/vfio-common.h   |   1 +
>>  include/hw/vfio/vfio-platform.h |  87 ++
>>  trace-events|  12 +
>>  5 files changed, 773 insertions(+)
>>  create mode 100644 hw/vfio/platform.c
>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..c5c76fe 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,5 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>  endif
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> new file mode 100644
>> index 000..9f66610
>> --- /dev/null
>> +++ b/hw/vfio/platform.c
>> @@ -0,0 +1,672 @@
>> +/*
>> + * vfio based device assignment support - platform devices
>> + *
>> + * Copyright Linaro Limited, 2014
>> + *
>> + * Authors:
>> + *  Kim Phillips 
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on vfio based PCI device assignment support:
>> + *  Copyright Red Hat, Inc. 2012
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/range.h"
>> +#include "sysemu/sysemu.h"
>> +#include "exec/memory.h"
>> +#include "qemu/queue.h"
>> +#include "hw/sysbus.h"
>> +#include "trace.h"
>> +#include "hw/platform-bus.h"
>> +
>> +static void vfio_intp_interrupt(VFIOINTp *intp);
>> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>> +

[Qemu-devel] [Bug 1395217] Re: Networking in qemu 2.0.0 and beyond is not compatible with Open Solaris (Illumos) 5.11

2014-11-26 Thread Tim Dawson

Bisected merrily away, and this is where it definitively begins to fail
. . . To verify, I checked out both commits, and confirmed change in
function at this point.  I attempted a revoke of this commit on my clone
to test, but too many merge errors to make that a simple task, so that
was not done.

commit ef02ef5f4536dba090b12360a6c862ef0e57e3bc
Author: Eduardo Habkost 
Date:   Wed Feb 19 11:58:12 2014 -0300

target-i386: Enable x2apic by default on KVM

When on KVM mode, enable x2apic by default on all CPU models.

Normally we try to keep the CPU model definitions as close as the real
CPUs as possible, but x2apic can be emulated by KVM without host CPU
support for x2apic, and it improves performance by reducing APIC access
overhead. x2apic emulation is available on KVM since 2009 (Linux
2.6.32-rc1), there's no reason for not enabling x2apic by default when
running KVM.

Signed-off-by: Eduardo Habkost 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Andreas Färber 

:040000 040000 ebdc1ecd08cb507db62cc465696925a4cde6174f e83d9c32f821714600c4859415911910d4b37c0d M  hw
:040000 040000 9064bc796128ba1380b67a86af9718dcc1022f0d 5cb337c72259b547808568068f56f4abfa628579 M  target-i386

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1395217

Title:
  Networking in qemu 2.0.0 and beyond is not compatible with Open
  Solaris (Illumos) 5.11

Status in QEMU:
  New

Bug description:
  The networking code in qemu in versions 2.0.0 and beyond is non-
  functional with Solaris/Illumos 5.11 images.

  Building 1.7.1, 2.0.0, 2.0.2, 2.1.2, and 2.2.0rc1 with the following
  standard Slackware config:

  # From Slackware build tree . . . 
  ./configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--sysconfdir=/etc \
--localstatedir=/var \
--enable-gtk \
--enable-system \
--enable-kvm \
--disable-debug-info \
--enable-virtfs \
--enable-sdl \
--audio-drv-list=alsa,oss,sdl,esd \
--enable-libusb \
--disable-vnc \
--target-list=x86_64-linux-user,i386-linux-user,x86_64-softmmu,i386-softmmu \
--enable-spice \
--enable-usb-redir 

  
  And attempting to run the same VM image with the following command (or via
  virt-manager):

  macaddress="DE:AD:BE:EF:3F:A4"

  qemu-system-x86_64 nex4x -cdrom /dev/cdrom -name "Nex41" -cpu Westmere
  -machine accel=kvm -smp 2 -m 4000 -net nic,macaddr=$macaddress
  -net bridge,br=br0 -net dump,file=/usr1/tmp/ -drive file=nex4x_d1
  -drive file=nex4x_d2 -enable-kvm

  Gives success on 1.7.1, and a deaf VM on all subsequent versions.

  Notable in validating my config, is that a Windows 7 image runs
  cleanly with networking on *all* builds, so my configuration appears
  to be good - qemu just hates Solaris at this point.

  Watching with wireshark (as well as pulling network traces from qemu
  as noted above) it appears that the notable difference in the two
  configs is that for some reason, Solaris gets stuck arping for its
  own interface on startup, and never really comes on line on the
  network.  If other hosts attempt to ping the Solaris instance, they
  can successfully arp the bad VM, but not the other way around.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1395217/+subscriptions

[Qemu-devel] [Bug 1395217] Re: Networking in qemu 2.0.0 and beyond is not compatible with Open Solaris (Illumos) 5.11

2014-11-26 Thread Tim Dawson

This does not appear to be run-time selectable (or I have not found the
option yet . . . ) so not quite sure how to verify if backing this out
will resolve the issue in later versions.


[Qemu-devel] [PATCH] s390x/kvm: Fix compile error

2014-11-26 Thread Christian Borntraeger

commit a2b257d6212a "memory: expose alignment used for allocating RAM
as MemoryRegion API" triggered a compile error on KVM/s390x.

Fix the prototype and the implementation of legacy_s390_alloc.

Cc: Igor Mammedov 
Cc: Michael S. Tsirkin 
Signed-off-by: Christian Borntraeger 
---
 target-s390x/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 80fb0aa..5349075 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -110,7 +110,7 @@ static int cap_async_pf;
 
 static uint64_t cpu_model_call_cache;
 
-static void *legacy_s390_alloc(size_t size);
+static void *legacy_s390_alloc(size_t size, uint64_t *align);
 
 static int kvm_s390_check_clear_cmma(KVMState *s)
 {
@@ -545,7 +545,7 @@ int kvm_s390_set_clock(uint8_t *tod_clock_high, uint64_t *tod_clock)
  * to grow. We also have to use MAP parameters that avoid
  * read-only mapping of guest pages.
  */
-static void *legacy_s390_alloc(size_t size, , uint64_t *align)
+static void *legacy_s390_alloc(size_t size, uint64_t *align)
 {
 void *mem;
 
-- 
1.9.3

[Qemu-devel] [PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2014-11-26 Thread Denis V. Lunev

From: Raushaniya Maksudova 

The idea is that all other virtio devices call this helper to merge the
properties of the proxy device. This is the only difference between the
balloon code and the code inside virtio_instance_init_common.
The patch should not cause any harm, as the property list in the generic
balloon code is empty.

This also helps avoid trivial errors like the one fixed by this
commit 91ba21208839643603e7f7fa5864723c3f371ebe
Author: Gonglei 
Date:   Tue Sep 30 14:10:35 2014 +0800
virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Acked-by: Raushaniya Maksudova 
CC: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
 hw/s390x/virtio-ccw.c  | 5 ++---
 hw/virtio/virtio-pci.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..82da894 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -899,9 +899,8 @@ static void balloon_ccw_stats_set_poll_interval(Object *obj, struct Visitor *v,
 static void virtio_ccw_balloon_instance_init(Object *obj)
 {
 VirtIOBalloonCcw *dev = VIRTIO_BALLOON_CCW(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_ccw_stats_get_all, NULL, NULL, dev, NULL);
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index dde1d73..745324b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1316,9 +1316,8 @@ static void virtio_balloon_pci_class_init(ObjectClass *klass, void *data)
 static void virtio_balloon_pci_instance_init(Object *obj)
 {
 VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(obj);
-object_initialize(&dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_BALLOON);
-object_property_add_child(obj, "virtio-backend", OBJECT(&dev->vdev), NULL);
-object_unref(OBJECT(&dev->vdev));
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_BALLOON);
 object_property_add(obj, "guest-stats", "guest statistics",
 balloon_pci_stats_get_all, NULL, NULL, dev,
 NULL);
-- 
1.9.1

[Qemu-devel] [PATCH 2/2] balloon: add a feature bit to let Guest OS deflate balloon on oom

2014-11-26 Thread Denis V. Lunev

From: Raushaniya Maksudova 

Excessive virtio_balloon inflation can cause invocation of the OOM-killer
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless, it
is often the case that these control tools do not have enough time to
react to a fast-changing memory load. As a result, the OS runs out of
memory and invokes the OOM-killer. The balancing of memory by use of the
virtio balloon should not cause the termination of processes while there
are pages in the balloon. Currently there is no way for the virtio balloon
driver to free memory at the last moment before some process gets killed
by the OOM-killer.

This does not introduce a security breach, as the balloon itself runs
inside the guest OS and works in cooperation with the host. Thus
some improvements from the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in the
out_of_memory() function. If the virtio balloon can release some memory,
the system returns and retries the allocation that forced the
out-of-memory killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux (actually in
linux-next at the moment)

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova 
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds the respective control bits to QEMU. It introduces a
deflate-on-oom option for the balloon device, which does the trick.

Signed-off-by: Raushaniya Maksudova 
Signed-off-by: Denis V. Lunev 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
 hw/virtio/virtio-balloon.c | 7 +++
 include/hw/virtio/virtio-balloon.h | 2 ++
 qemu-options.hx| 6 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 7bfbb75..9d145fa 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -305,7 +305,12 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
 {
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
+if (dev->deflate_on_oom) {
+f |= (1 << VIRTIO_BALLOON_F_DEFLATE_ON_OOM);
+}
+
 return f;
 }
 
@@ -409,6 +414,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
 }
 
 static Property virtio_balloon_properties[] = {
+DEFINE_PROP_BOOL("deflate-on-oom", VirtIOBalloon, deflate_on_oom, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index f863bfe..45cc55a 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -30,6 +30,7 @@
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ 1   /* Memory stats virtqueue */
+#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -67,6 +68,7 @@ typedef struct VirtIOBalloon {
 QEMUTimer *stats_timer;
 int64_t stats_last_update;
 int64_t stats_poll_interval;
+bool deflate_on_oom;
 } VirtIOBalloon;
 
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index da9851d..14ede0b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -324,7 +324,8 @@ ETEXI
 DEF("balloon", HAS_ARG, QEMU_OPTION_balloon,
 "-balloon none   disable balloon device\n"
 "-balloon virtio[,addr=str]\n"
-"enable virtio balloon device (default)\n", QEMU_ARCH_ALL)
+"enable virtio balloon device (default)\n"
+"   [,deflate-on-oom=on|off]\n", QEMU_ARCH_ALL)
 STEXI
 @item -balloon none
 @findex -balloon
@@ -332,6 +333,9 @@ Disable balloon device.
 @item -balloon virtio[,addr=@var{addr}]
 Enable virtio balloon device (default), optionally with PCI address
 @var{addr}.
+@item -balloon virtio[,deflate-on-oom=@var{deflate-on-oom}]
+@var{deflate-on-oom} is "on" or "off" and enables whether to let Guest OS
+to deflate virtio balloon on OOM. Default is off.
 ETEXI
 
 DEF("device", HAS_ARG, QEMU_OPTION_device,
-- 
1.9.1

[Qemu-devel] [PATCH 0/2] balloon: add a feature bit to let Guest OS deflate virtio_balloon on OOM

2014-11-26 Thread Denis V. Lunev

Excessive virtio_balloon inflation can cause invocation of the OOM-killer
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless, it
is often the case that these control tools do not have enough time to
react to a fast-changing memory load. As a result, the OS runs out of
memory and invokes the OOM-killer. The balancing of memory by use of the
virtio balloon should not cause the termination of processes while there
are pages in the balloon. Currently there is no way for the virtio balloon
driver to free memory at the last moment before some process gets killed
by the OOM-killer.

This does not introduce a security breach, as the balloon itself runs
inside the guest OS and works in cooperation with the host. Thus
some improvements from the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in the
out_of_memory() function. If the virtio balloon can release some memory,
the system returns and retries the allocation that forced the
out-of-memory killer to run.
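
The callback described above can be pictured with a small stand-alone sketch.
This is not the kernel driver's code: struct demo_balloon,
demo_balloon_oom_notify() and the max_release parameter are hypothetical
names, and the sketch only assumes that the callback releases a bounded
number of pages and reports how many it freed.

```c
#include <stddef.h>

/* Hypothetical guest-side balloon state: pages currently held by the balloon. */
struct demo_balloon {
    size_t num_pages;
};

/*
 * Sketch of an OOM callback: give back up to max_release pages and add
 * the number released to *freed, so the caller can retry the allocation
 * instead of killing a process.
 */
void demo_balloon_oom_notify(struct demo_balloon *b, size_t max_release,
                             size_t *freed)
{
    size_t n = b->num_pages < max_release ? b->num_pages : max_release;

    b->num_pages -= n;
    *freed += n;
}
```

A caller would treat a non-zero *freed as a reason to retry the failed
allocation, and fall through to the OOM-killer only when the balloon is empty.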

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.
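
As a rough illustration of the feature-bit gating, here is a minimal sketch.
The bit numbers are taken from the virtio-balloon.h hunk in patch 2/2;
struct balloon_config and both function names are hypothetical stand-ins,
not QEMU or Linux APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in the virtio-balloon.h hunk of patch 2/2. */
#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0
#define VIRTIO_BALLOON_F_STATS_VQ       1
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2

/* Hypothetical stand-in for the device state; only the new field matters. */
struct balloon_config {
    bool deflate_on_oom;
};

/* Host side: build the offered feature mask, mirroring the patched
 * virtio_balloon_get_features(). */
uint32_t balloon_offered_features(const struct balloon_config *cfg, uint32_t f)
{
    f |= UINT32_C(1) << VIRTIO_BALLOON_F_STATS_VQ;
    if (cfg->deflate_on_oom) {
        f |= UINT32_C(1) << VIRTIO_BALLOON_F_DEFLATE_ON_OOM;
    }
    return f;
}

/* Guest side (illustrative): deflate on OOM only if the bit was offered. */
bool balloon_may_deflate_on_oom(uint32_t negotiated)
{
    return ((negotiated >> VIRTIO_BALLOON_F_DEFLATE_ON_OOM) & 1u) != 0;
}
```

With the property left at its default (off), the mask contains only the
stats bit, so a guest checking the negotiated features never deflates on OOM.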

This functionality was recently merged into vanilla Linux (actually in
linux-next at the moment)

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova 
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds the respective control bits to QEMU. It introduces a
deflate-on-oom option for the balloon device, which does the trick.

Signed-off-by: Raushaniya Maksudova 
Signed-off-by: Denis V. Lunev 
CC: Anthony Liguori 
CC: Michael S. Tsirkin

Re: [Qemu-devel] [PATCH v2 0/6] Geometry and blocksize support for backing devices

2014-11-26 Thread Ekaterina Tumanova


On 11/25/2014 04:01 PM, Stefan Hajnoczi wrote:

On Wed, Nov 19, 2014 at 11:17:50AM +0100, Ekaterina Tumanova wrote:

Hi folks,

I'm sorry for the recent spam. I messed up during code submission last time.
So please ignore any previous notes you received from me and answer only to
this thread.

This is the rework of the geometry+blocksize patch, which was
recently discussed here:
http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01148.html

Markus suggested that we only detect blocksize and geometry for DASDs.

According to this agreement new version contains DASD special casing.
The driver methods are implemented only for "host_device" and inner hdev_xxx
functions check if the backing storage is a DASD by means of
BIODASDINFO2 ioctl.

Original patchset can be found here:
http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg03791.html


This description is mainly a changelog.  Links to previous email
threads are useful for additional info but please include a
self-contained description of the series and the rationale behind it.


will include into the next version


Comments:

1. This series overrides the logical_block_size and
physical_block_size options for raw images on DASD devices.  Users
expect their command-line options to be honored, so the options
should not be overridden if they have been given on the command-line.


will fix that
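
One way to honor explicit command-line values could look like the sketch
below. It assumes that a value of 0 means "not set by the user";
struct demo_blockconf and demo_apply_probed_sizes() are hypothetical names
and not the series' actual code.

```c
#include <stdint.h>

/* Hypothetical subset of a block configuration; 0 means "not set by the user". */
struct demo_blockconf {
    uint16_t logical_block_size;
    uint16_t physical_block_size;
};

/* Fill in probed sizes only where the user gave no explicit value. */
void demo_apply_probed_sizes(struct demo_blockconf *conf,
                             uint16_t probed_logical,
                             uint16_t probed_physical)
{
    if (conf->logical_block_size == 0) {
        conf->logical_block_size = probed_logical;
    }
    if (conf->physical_block_size == 0) {
        conf->physical_block_size = probed_physical;
    }
}
```

Under that assumption, a user who passes logical_block_size=512 keeps 512
even when the backing DASD reports 4096, while an unset physical_block_size
picks up the probed value.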


2. Only virtio_blk is modified, this is inconsistent.  All emulated
storage controllers using BlockConf have the same block size
probing behavior.


I will add blkconf_blocksizes call to other BlockConf users.



3. Why does s390 need to customize hd_geometry_guess()?

Since hd_geometry_guess contains semantics of x86-specific LBA
translation, we have to modify it so that it does not get in the way of
the z architecture.


4. Please use scripts/checkpatch.pl to check coding style.


I did :)

Thanks a lot,
Kate.

Re: [Qemu-devel] [PATCH] s390x/kvm: Fix compile error

2014-11-26 Thread Cornelia Huck

On Wed, 26 Nov 2014 11:07:24 +0100
Christian Borntraeger  wrote:

> commit a2b257d6212a "memory: expose alignment used for allocating RAM
> as MemoryRegion API" triggered a compile error on KVM/s390x.
> 
> Fix the prototype and the implementation of legacy_s390_alloc.
> 
> Cc: Igor Mammedov 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Christian Borntraeger 
> ---
>  target-s390x/kvm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Cornelia Huck 

Peter, will you pick this up directly?

Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support

2014-11-26 Thread Alexander Graf



On 26.11.14 10:45, Eric Auger wrote:
> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>
>>
>> On 31.10.14 15:05, Eric Auger wrote:
>>> Minimal VFIO platform implementation supporting
>>> - register space user mapping,
>>> - IRQ assignment based on eventfds handled on qemu side.
>>>
>>> irqfd kernel acceleration comes in a subsequent patch.
>>>
>>> Signed-off-by: Kim Phillips 
>>> Signed-off-by: Eric Auger 

[...]

>>> +/*
>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>> + * this is needed since at finalize time, the device IRQ are not yet
>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>> + * init done notifier registered by the machine file. After its execution
>>> + * we execute a new notifier that actually starts the injection. When using
>>> + * irqfd, programming the injection consists in associating eventfds to
>>> + * GSI number,ie. virtual IRQ number
>>> + */
>>> +
>>> +typedef struct VfioIrqStarterNotifierParams {
>>> +unsigned int platform_bus_first_irq;
>>> +Notifier notifier;
>>> +} VfioIrqStarterNotifierParams;
>>> +
>>> +typedef struct VfioIrqStartParams {
>>> +PlatformBusDevice *pbus;
>>> +int platform_bus_first_irq;
>>> +} VfioIrqStartParams;
>>> +
>>> +/* Start injection of IRQ for a specific VFIO device */
>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +int i;
>>> +VfioIrqStartParams *p = opaque;
>>> +VFIOPlatformDevice *vdev;
>>> +VFIODevice *vbasedev;
>>> +uint64_t irq_number;
>>> +PlatformBusDevice *pbus = p->pbus;
>>> +int platform_bus_first_irq = p->platform_bus_first_irq;
>>> +
>>> +if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>> +vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +vbasedev = &vdev->vbasedev;
>>> +for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>> + + platform_bus_first_irq;
>>> +vfio_start_irq_injection(sbdev, i, irq_number);
>>> +}
>>> +}
>>> +return 0;
>>> +}
>>> +
>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>> +{
>>> +VfioIrqStarterNotifierParams *p =
>>> +container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>> +DeviceState *dev =
>>> +qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>> +PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>> +
>>> +if (pbus->done_gathering) {
>>> +VfioIrqStartParams data = {
>>> +.pbus = pbus,
>>> +.platform_bus_first_irq = p->platform_bus_first_irq,
>>> +};
>>> +
>>> +foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>> +}
>>> +}
>>> +
>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>> +{
>>> +VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>> +
>>> +p->platform_bus_first_irq = platform_bus_first_irq;
>>> +p->notifier.notify = vfio_irq_starter_notify;
>>> +qemu_add_machine_init_done_notifier(&p->notifier);
>>
>> Could you add a notifier for each device instead? Then the notifier
>> would be part of the vfio device struct and not some dangling random
>> pointer :).
>>
>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>> know the device you're dealing with and only handle a single device per
>> notifier.
> 
> Hi Alex,
> 
> I don't see how to practically follow your request:
> 
> - at machine init time, VFIO devices are not yet instantiated so I
> cannot call foreach_dynamic_sysbus_device() there - I was definitively
> wrong in my first reply :-().
> 
> - I can't register a per VFIO device notifier in the VFIO device
> finalize function because this latter is called after the platform bus
> instantiation. So the IRQ binding notifier (registered in platform bus
> finalize fn) would be called after the IRQ starter notifier.
> 
> - then to simplify things a bit I could use a qemu_register_reset in
> place of a machine init done notifier (would relax the call order
> constraint) but the problem consists in passing the platform bus first
> irq (all the more so you requested it became part of a const struct)
> 
> Do I miss something?

So the basic idea is that the device itself calls
qemu_add_machine_init_done_notifier() in its realize function. The
Notifier struct would be part of the device state which means you can
cast yourself into the VFIO device state.
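
The embedding idea can be sketched in a self-contained way as follows. The
Notifier type here is a minimal stand-in for QEMU's, and DemoVfioDevice,
demo_realize() and demo_machine_done() are hypothetical names; only
qemu_add_machine_init_done_notifier() and container_of() come from the
quoted patch.

```c
#include <stddef.h>

/* Minimal stand-in for QEMU's Notifier type (an assumption for this sketch). */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

/* Same idea as the container_of() macro used in the quoted patch. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device state: the notifier is a member, not a separate allocation. */
typedef struct DemoVfioDevice {
    int num_irqs;
    int irqs_started;
    Notifier machine_done;
} DemoVfioDevice;

/* Runs once machine init is done; recovers the owning device from the notifier. */
void demo_machine_done(Notifier *notifier, void *data)
{
    DemoVfioDevice *dev = container_of(notifier, DemoVfioDevice, machine_done);

    (void)data;
    dev->irqs_started = dev->num_irqs;  /* placeholder for starting injection */
}

/* What a realize function would do before registering the notifier. */
void demo_realize(DemoVfioDevice *dev, int num_irqs)
{
    dev->num_irqs = num_irqs;
    dev->irqs_started = 0;
    dev->machine_done.notify = demo_machine_done;
    /* In QEMU: qemu_add_machine_init_done_notifier(&dev->machine_done); */
}
```

Because the notifier lives inside the device struct, each callback handles
exactly one device and no separately allocated parameter block is needed.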

At that point the IRQ allocation should have already happened, so your
IRQ objects are populated. You can then ask the KVM GIC to convert that
qemu_irq object to a GIC IRQ ID that you can then use in your ioctl I
su

[Qemu-devel] [Bug 1395217] Re: Networking in qemu 2.0.0 and beyond is not compatible with Open Solaris (Illumos) 5.11

2014-11-26 Thread Tim Dawson

Additional test (I just don't know when to go to bed . . . *sigh* . . .
).

In a checkout of the 2.1.2 code base, and based on the above failing
commit as per bisect, I removed the change in the commit for
target-i386/cpu.c of the line:

[FEAT_1_ECX] = CPUID_EXT_X2APIC,

as added by the errant commit, recompiled, and networking is now working
with Illumos in 2.1.2, so this commit is definitely not as innocent as
it may appear.
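
For readers unfamiliar with the bit involved, here is a small sketch.
x2APIC support is advertised in CPUID leaf 1, ECX bit 21, which QEMU names
CPUID_EXT_X2APIC. The two helper functions are hypothetical, and the OR-ing
of a KVM default mask into the CPU model's feature word is an assumption
about how the default gets applied, not a copy of QEMU's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 21 advertises x2APIC. */
#define CPUID_EXT_X2APIC (UINT32_C(1) << 21)

/* True if the guest-visible CPUID.1:ECX word advertises x2APIC. */
bool has_x2apic(uint32_t cpuid_1_ecx)
{
    return (cpuid_1_ecx & CPUID_EXT_X2APIC) != 0;
}

/* Assumed behavior: KVM default feature bits are OR-ed into the model's word. */
uint32_t apply_kvm_defaults(uint32_t model_ecx, uint32_t kvm_default_ecx)
{
    return model_ecx | kvm_default_ecx;
}
```

Under that assumption, dropping the default entry leaves the guest with only
the bits of the CPU model itself, which would match the behavior observed
after removing the line.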

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1395217

Title:
  Networking in qemu 2.0.0 and beyond is not compatible with Open
  Solaris (Illumos) 5.11

Status in QEMU:
  New

Bug description:
  The networking code in qemu in versions 2.0.0 and beyond is non-
  functional with Solaris/Illumos 5.11 images.

  Building 1.7.1, 2.0.0, 2.0.2, 2.1.2,and 2.2.0rc1with the following
  standard Slackware config:

  # From Slackware build tree . . . 
  ./configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--sysconfdir=/etc \
--localstatedir=/var \
--enable-gtk \
--enable-system \
--enable-kvm \
--disable-debug-info \
--enable-virtfs \
--enable-sdl \
--audio-drv-list=alsa,oss,sdl,esd \
--enable-libusb \
--disable-vnc \
--target-list=x86_64-linux-user,i386-linux-user,x86_64-softmmu,i386-softmmu 
\
--enable-spice \
--enable-usb-redir 

  
  And attempting to run the same VM image with the following command (or via
  virt-manager):

  macaddress="DE:AD:BE:EF:3F:A4"

  qemu-system-x86_64 nex4x -cdrom /dev/cdrom -name "Nex41" -cpu Westmere \
    -machine accel=kvm -smp 2 -m 4000 -net nic,macaddr=$macaddress \
    -net bridge,br=br0 -net dump,file=/usr1/tmp/ \
    -drive file=nex4x_d1 -drive file=nex4x_d2 -enable-kvm

  Gives success on 1.7.1, and a deaf VM on all subsequent versions.

  Notable in validating my config, is that a Windows 7 image runs
  cleanly with networking on *all* builds, so my configuration appears
  to be good - qemu just hates Solaris at this point.

  Watching with wireshark (as well as pulling network traces from qemu
  as noted above) it appears that the notable difference in the two
  configs is that for some reason, Solaris gets stuck arping for its
  own interface on startup, and never really comes on line on the
  network.  If other hosts attempt to ping the Solaris instance, they
  can successfully arp the bad VM, but not the other way around.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1395217/+subscriptions

Re: [Qemu-devel] [PATCH] s390x/kvm: Fix compile error

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:19, Cornelia Huck wrote:
> On Wed, 26 Nov 2014 11:07:24 +0100
> Christian Borntraeger  wrote:
> 
>> commit a2b257d6212a "memory: expose alignment used for allocating RAM
>> as MemoryRegion API" triggered a compile error on KVM/s390x.
>>
>> Fix the prototype and the implementation of legacy_s390_alloc.
>>
>> Cc: Igor Mammedov 
>> Cc: Michael S. Tsirkin 
>> Signed-off-by: Christian Borntraeger 
>> ---
>>  target-s390x/kvm.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Acked-by: Cornelia Huck 
> 
> Peter, will you pick this up directly?

I am sending a pull request shortly and will pick this up.

Paolo

[Qemu-devel] [RFC PATCH v5 01/31] cpu-exec: fix cpu_exec_nocache

2014-11-26 Thread Pavel Dovgalyuk

In icount mode, cpu_exec_nocache is used to execute part of an existing
TB. The TB it creates for this is deleted when cpu_exec_nocache returns.
Sometimes an io_read function needs to recompile the current TB and restart
TB lookup and execution. tb_find_fast then finds the old (bigger) TB again.
That TB cannot be executed because icount is not big enough, so
cpu_exec_nocache is called again, and the loop repeats indefinitely.
This patch deletes the old TB so that it is no longer found in the TB cache.
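
The effect described above can be sketched with a toy block cache. This is
only an illustration under stated assumptions, not QEMU code: ToyTB,
toy_cache, toy_find, toy_insert and toy_exec_nocache are hypothetical names,
and the cache is a trivial direct-mapped array. It shows the two steps the
patch takes: copy the lookup key out of the old block first, then invalidate
the old block so a later lookup cannot return it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SLOTS 8

/* Toy stand-in for a translation block; not QEMU's TranslationBlock. */
typedef struct ToyTB {
    uint64_t pc;   /* guest address the block starts at */
    int insns;     /* number of guest instructions it covers */
    int valid;     /* nonzero while the block can be found by lookup */
} ToyTB;

ToyTB toy_cache[TOY_CACHE_SLOTS];

/* Lookup by pc, loosely modelled on a fast TB lookup. */
ToyTB *toy_find(uint64_t pc)
{
    ToyTB *tb = &toy_cache[pc % TOY_CACHE_SLOTS];
    return (tb->valid && tb->pc == pc) ? tb : NULL;
}

/* Put a block into the cache, replacing whatever shared its slot. */
ToyTB *toy_insert(uint64_t pc, int insns)
{
    ToyTB *tb = &toy_cache[pc % TOY_CACHE_SLOTS];
    tb->pc = pc;
    tb->insns = insns;
    tb->valid = 1;
    return tb;
}

/*
 * Run at most max_insns instructions of orig through a temporary block.
 * The key is copied first and orig is invalidated before the temporary
 * block is built, so a later lookup cannot return the oversized block.
 * Returns the number of instructions the temporary block covers.
 */
int toy_exec_nocache(ToyTB *orig, int max_insns)
{
    uint64_t pc = orig->pc;          /* save before orig is invalidated */
    ToyTB tmp;

    assert(max_insns > 0 && max_insns < orig->insns);
    orig->valid = 0;                 /* old block leaves the cache */

    tmp.pc = pc;                     /* temporary block, never cached */
    tmp.insns = max_insns;
    tmp.valid = 0;
    return tmp.insns;
}
```

After toy_exec_nocache() returns, toy_find() for the same pc misses, which is
the property the patch relies on to break the loop.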

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 3913de0..8830255 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -202,13 +202,18 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
 {
 CPUState *cpu = ENV_GET_CPU(env);
 TranslationBlock *tb;
+target_ulong pc = orig_tb->pc;
+target_ulong cs_base = orig_tb->cs_base;
+uint64_t flags = orig_tb->flags;
 
 /* Should never happen.
We only end up here when an existing TB is too long.  */
 if (max_cycles > CF_COUNT_MASK)
 max_cycles = CF_COUNT_MASK;
 
-tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
+/* tb_gen_code can flush our orig_tb, invalidate it now */
+tb_phys_invalidate(orig_tb, -1);
+tb = tb_gen_code(cpu, pc, cs_base, flags,
  max_cycles);
 cpu->current_tb = tb;
 /* execute the generated code */

[Qemu-devel] [RFC PATCH v5 00/31] Deterministic replay and reverse execution

2014-11-26 Thread Pavel Dovgalyuk

This set of patches is related to the reverse execution and deterministic
replay of qemu execution. Our implementation of deterministic replay can
be used for deterministic and reverse debugging of guest code through the
gdb remote interface.

Execution recording writes a log of non-deterministic events, which can
later be used to replay the execution anywhere, any number of times.
It also supports checkpointing for faster rewinding during reverse debugging.
Execution replaying reads the log and replays all non-deterministic events,
including external input, hardware clocks, and interrupts.
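
As a rough illustration of the record/replay idea, the sketch below wraps one
non-deterministic input (a host clock value). It is a minimal model, not the
code from this series: ToyReplay, toy_mode and toy_read_clock are
hypothetical names, and the log is an in-memory array rather than a file. In
record mode the observed value is appended to the log; in play mode the host
value is ignored and the logged value is returned instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_mode { TOY_NONE, TOY_RECORD, TOY_PLAY };

#define TOY_LOG_CAP 64

/* In-memory stand-in for the replay log. */
typedef struct ToyReplay {
    enum toy_mode mode;
    int64_t log[TOY_LOG_CAP];
    size_t len;   /* entries written while recording */
    size_t pos;   /* next entry to hand out while replaying */
} ToyReplay;

/*
 * Wrap a non-deterministic source. host_value is what the host reports
 * right now; the return value is what the guest is allowed to see.
 */
int64_t toy_read_clock(ToyReplay *r, int64_t host_value)
{
    switch (r->mode) {
    case TOY_RECORD:
        assert(r->len < TOY_LOG_CAP);
        r->log[r->len++] = host_value;   /* remember what the guest saw */
        return host_value;
    case TOY_PLAY:
        assert(r->pos < r->len);
        return r->log[r->pos++];         /* ignore the host, use the log */
    default:
        return host_value;               /* normal execution */
    }
}
```

Because the guest only ever sees logged values in play mode, a replayed run
observes the same clock readings as the recorded one, whatever the host clock
says at replay time.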

Reverse execution has the following features:
 * Deterministically replays whole system execution and all contents of the
   memory, state of the hardware devices, clocks, and screen of the VM.
 * Writes the execution log into a file for later replaying, multiple times,
   on different machines.
 * Supports i386, x86_64, and ARM hardware platforms.
 * Performs deterministic replay of all operations with keyboard, mouse,
   network adapters, audio devices, serial interfaces, and physical USB
   devices connected to the emulator.
 * Provides support for gdb reverse debugging commands like reverse-step and
   reverse-continue.
 * Supports auto-checkpointing for convenient reverse debugging.

Usage of the record/replay:
 * First, record the execution, by adding '-record fname=replay.bin' to the
   command line.
 * Then you can replay it multiple times by using another command
   line option: '-replay fname=replay.bin'
 * Virtual machine should have at least one virtual disk, which is used to
   store checkpoints. If you want to enable automatic checkpointing, simply
   add ',period=XX' to record options, where XX is the checkpointing period
   in seconds.
 * Network adapters can be used in record/replay mode with
   the following command-line options:
   - '-net user' (or another host adapter) in record mode
   - '-net replay' in replay mode. Every host network adapter should be
 replaced by 'replay' when replaying the execution.
 * Reverse debugging can be used through gdb remote interface.
   reverse-stepi and reverse-continue commands are supported. Other reverse
   commands should also work, because they reuse these ones.
 * Monitor is extended by the following commands:
   - replay_info - prints information about replay mode and current step
 (number of instructions executed)
   - replay_break - sets "breakpoint" at the specified instructions count.
   - replay_seek - rewinds (using the checkpoints, if possible) to the
 specified step of replay log.

Paper with short description of deterministic replay implementation:
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html

Modifications of qemu include:
 * adding missing fields of the virtual devices' states to the vmstate
   structures to allow deterministic saving and restoring of the VM state
 * adding virtual clock-based timers to vmstate structures, because the
   virtual clock is part of the virtual machine state
 * modification of the block layer to support automatic creation of overlay
   files to store the changes and snapshots while recording
 * disabling system reset while loading VM state to avoid generation of
   interrupts by reset handlers
 * adding wrappers for clock and time functions to save their return
   values in the log
 * saving different asynchronous events (e.g. system shutdown) into the log
 * synchronization of the bottom halves execution
 * synchronization of the threads from thread pool
 * recording/replaying user input (mouse and keyboard), input from virtual
   serial ports, incoming network packets, input from connected USB devices
 * adding HMP/QMP commands to monitor for controlling replay execution

v4 changes:
 * Updated block drivers to support new bdrv_open interface.
 * Moved migration patches into separate series (as suggested by Paolo Bonzini)
 * Fixed a bug in replay_break operation.
 * Fixed rtl8139 migration for replay.
 * Fixed 'period' parameter processing for record mode.
 * Fixed bug in 'reverse-stepi' implementation.
 * Fixed replay without making any snapshots (even the starting one).
 * Moved core replay patches into the separate series.
 * Fixed reverse step and reverse continue support.

v3 changes:
 * Fixed bug with replay of the aio write operations.
 * Added virtual clock based on replay icount.
 * Removed duplicated saving of interrupt_request CPU field.
 * Fixed some coding style issues.
 * Renamed QMP commands for controlling reverse execution (as suggested by
   Eric Blake)
 * Replay mode and submode implemented as QAPI enumerations (as suggested by
   Eric Blake)
 * Added description and example for replay-info command (as suggested by
   Eric Blake)
 * Added information about the current breakpoint to the output of
   replay-info (as suggested by Eric Blake)
 * Updated version id for HPET vmstate (as suggested by Paolo Bonzini)
 * Removed static

[Qemu-devel] [RFC PATCH v5 03/31] replay: global variables and function stubs

2014-11-26 Thread Pavel Dovgalyuk

This patch adds global variables, defines, functions declarations,
and function stubs for deterministic VM replay used by external modules.

Signed-off-by: Pavel Dovgalyuk 
---
 Makefile.target  |1 +
 qapi-schema.json |   32 
 replay/Makefile.objs |1 +
 replay/replay.c  |   25 +
 replay/replay.h  |   23 +++
 stubs/Makefile.objs  |1 +
 stubs/replay.c   |8 
 7 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100755 replay/Makefile.objs
 create mode 100755 replay/replay.c
 create mode 100755 replay/replay.h
 create mode 100755 stubs/replay.c

diff --git a/Makefile.target b/Makefile.target
index e9ff1ee..a45378f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -83,6 +83,7 @@ all: $(PROGS) stap
 #
 # cpu emulator library
 obj-y = exec.o translate-all.o cpu-exec.o
+obj-y += replay/
 obj-y += tcg/tcg.o tcg/optimize.o
 obj-$(CONFIG_TCG_INTERPRETER) += tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
diff --git a/qapi-schema.json b/qapi-schema.json
index 24379ab..797600e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3508,3 +3508,35 @@
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# ReplayMode:
+#
+# Mode of the replay subsystem.
+#
+# @none: normal execution mode. Replay or record are not enabled.
+#
+# @record: record mode. All non-deterministic data is written into the
+#  replay log.
+#
+# @play: replay mode. Non-deterministic data required for system execution
+#is read from the log.
+#
+# Since: 2.3
+##
+{ 'enum': 'ReplayMode',
+  'data': [ 'none', 'record', 'play' ] }
+
+##
+# ReplaySubmode:
+#
+# Submode of the replay subsystem.
+#
+# @unknown: used for modes different from play.
+#
+# @normal: normal replay mode.
+#
+# Since: 2.3
+##
+{ 'enum': 'ReplaySubmode',
+  'data': [ 'unknown', 'normal' ] }
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
new file mode 100755
index 000..7ea860f
--- /dev/null
+++ b/replay/Makefile.objs
@@ -0,0 +1 @@
+obj-y += replay.o
diff --git a/replay/replay.c b/replay/replay.c
new file mode 100755
index 000..ac976b2
--- /dev/null
+++ b/replay/replay.c
@@ -0,0 +1,25 @@
+/*
+ * replay.c
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "replay.h"
+
+ReplayMode replay_mode = REPLAY_MODE_NONE;
+/*! Stores current submode for PLAY mode */
+ReplaySubmode play_submode = REPLAY_SUBMODE_UNKNOWN;
+
+/* Suffix for the disk images filenames */
+char *replay_image_suffix;
+
+
+ReplaySubmode replay_get_play_submode(void)
+{
+return play_submode;
+}
diff --git a/replay/replay.h b/replay/replay.h
new file mode 100755
index 000..51a18fe
--- /dev/null
+++ b/replay/replay.h
@@ -0,0 +1,23 @@
+#ifndef REPLAY_H
+#define REPLAY_H
+
+/*
+ * replay.h
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qapi-types.h"
+
+extern ReplayMode replay_mode;
+extern char *replay_image_suffix;
+
+/*! Returns replay play submode */
+ReplaySubmode replay_get_play_submode(void);
+
+#endif
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 5e347d0..45a6c71 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -27,6 +27,7 @@ stub-obj-y += notify-event.o
 stub-obj-y += pci-drive-hot-add.o
 stub-obj-$(CONFIG_SPICE) += qemu-chr-open-spice.o
 stub-obj-y += qtest.o
+stub-obj-y += replay.o
 stub-obj-y += reset.o
 stub-obj-y += runstate-check.o
 stub-obj-y += set-fd-handler.o
diff --git a/stubs/replay.c b/stubs/replay.c
new file mode 100755
index 000..b146d55
--- /dev/null
+++ b/stubs/replay.c
@@ -0,0 +1,8 @@
+#include "replay/replay.h"
+
+ReplayMode replay_mode;
+
+ReplaySubmode replay_get_play_submode(void)
+{
+return 0;
+}

[Qemu-devel] [RFC PATCH v5 07/31] icount: implement icount requesting

2014-11-26 Thread Pavel Dovgalyuk

Replay uses the number of executed instructions to determine the correct
moments for injecting events. This patch introduces a new function for
querying the instruction counter.
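
The arithmetic behind the new query can be sketched as follows. This is a
simplified model, assuming (as the subtraction in the diff suggests) that the
global counter already includes the budget handed to the vCPU, so the
instructions still pending have to be subtracted. ToyCPU and
toy_instructions_executed are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy view of the per-vCPU budget fields used by the patch. */
typedef struct ToyCPU {
    uint16_t decr_low;   /* instructions left in the current slice */
    int64_t extra;       /* budget not yet moved into decr_low */
} ToyCPU;

/*
 * granted_total: instructions granted so far (stand-in for the global
 *                icount value).
 * cpu:           currently running vCPU, or NULL outside vCPU context.
 * Returns the number of instructions actually executed.
 */
int64_t toy_instructions_executed(int64_t granted_total, const ToyCPU *cpu)
{
    int64_t icount = granted_total;

    assert(granted_total >= 0);
    if (cpu != NULL) {
        /* Subtract what was granted but has not run yet. */
        icount -= (int64_t)cpu->decr_low + cpu->extra;
    }
    return icount;
}
```

For example, with 5000 instructions granted, 100 left in the current slice
and 900 still held back, the function reports 4000 executed instructions.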

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c   |   26 +++---
 include/qemu/timer.h |1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index a7b6c53..492e19a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -136,8 +136,7 @@ typedef struct TimersState {
 
 static TimersState timers_state;
 
-/* Return the virtual CPU time, based on the instruction counter.  */
-static int64_t cpu_get_icount_locked(void)
+static int64_t cpu_get_instructions_counter_locked(void)
 {
 int64_t icount;
 CPUState *cpu = current_cpu;
@@ -145,10 +144,31 @@ static int64_t cpu_get_icount_locked(void)
 icount = timers_state.qemu_icount;
 if (cpu) {
 if (!cpu_can_do_io(cpu)) {
-fprintf(stderr, "Bad clock read\n");
+fprintf(stderr, "Bad icount read\n");
+exit(1);
 }
 icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
 }
+return icount;
+}
+
+int64_t cpu_get_instructions_counter(void)
+{
+/* Calls to this function are synchronized with timer changes;
+   calling cpu_get_instructions_counter_locked without lock is safe */
+int64_t icount = timers_state.qemu_icount;
+CPUState *cpu = current_cpu;
+
+if (cpu) {
+icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
+}
+return icount;
+}
+
+/* Return the virtual CPU time, based on the instruction counter.  */
+static int64_t cpu_get_icount_locked(void)
+{
+int64_t icount = cpu_get_instructions_counter_locked();
 return timers_state.qemu_icount_bias + cpu_icount_to_ns(icount);
 }
 
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5f5210d..38a02c5 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -743,6 +743,7 @@ static inline int64_t get_clock(void)
 #endif
 
 /* icount */
+int64_t cpu_get_instructions_counter(void);
 int64_t cpu_get_icount(void);
 int64_t cpu_get_clock(void);
 int64_t cpu_get_clock_offset(void);

[Qemu-devel] [RFC PATCH v5 10/31] i386: do not cross the pages boundaries in replay mode

2014-11-26 Thread Pavel Dovgalyuk

This patch forbids crossing page boundaries in replay mode, because a
crossing can cause an exception. The check applies only when the boundary
is crossed by the first instruction in the block.
If the current instruction has already crossed the boundary, that is fine,
because no exception has stopped this code.
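
The condition added to the translator can be tried out in isolation. The
sketch below mirrors the expression from the diff under two assumptions that
are not stated in the patch text: a 4 KiB page size, and a page mask defined
as ~(page size - 1). The names toy_must_end_block and the TOY_* macros are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_BITS     12                          /* assumed 4 KiB pages */
#define TOY_PAGE_SIZE     ((uint64_t)1 << TOY_PAGE_BITS)
#define TOY_PAGE_MASK     (~(TOY_PAGE_SIZE - 1))
#define TOY_MAX_INSN_SIZE 16                          /* as in the patch */

/*
 * pc is the address of the next instruction to translate. Returns true
 * when the block should be closed before translating that instruction.
 */
bool toy_must_end_block(bool use_icount, uint64_t pc)
{
    bool may_cross, at_page_start;

    assert(pc <= UINT64_MAX - TOY_MAX_INSN_SIZE);     /* avoid wrap-around */

    /* A maximum-length instruction at pc would spill into the next page. */
    may_cross = (pc & TOY_PAGE_MASK)
                != ((pc + TOY_MAX_INSN_SIZE - 1) & TOY_PAGE_MASK);
    /* pc is the first byte of a page. */
    at_page_start = (pc & ~TOY_PAGE_MASK) == 0;

    return use_icount && (may_cross || at_page_start);
}
```

With these assumptions, 0x1ff0 still fits (the last possible byte is 0x1fff),
while 0x1ff1 may cross and 0x2000 starts a new page, so both end the block.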

Signed-off-by: Pavel Dovgalyuk 
---
 target-i386/cpu.h   |3 +++
 target-i386/translate.c |   14 ++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 2968749..204aaf1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -28,6 +28,9 @@
 #define TARGET_LONG_BITS 32
 #endif
 
+/* Maximum instruction code size */
+#define TARGET_MAX_INSN_SIZE 16
+
 /* target supports implicit self modifying code */
 #define TARGET_HAS_SMC
 /* support for self modifying code even if the modified instruction is
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 4d5dfb3..a264908 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8035,6 +8035,20 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
 gen_eob(dc);
 break;
 }
+/* Do not cross the boundary of the pages in icount mode,
+   it can cause an exception. Do it only when boundary is
+   crossed by the first instruction in the block.
+   If current instruction already crossed the bound - it's ok,
+   because an exception hasn't stopped this code.
+ */
+if (use_icount
+&& ((pc_ptr & TARGET_PAGE_MASK)
+!= ((pc_ptr + TARGET_MAX_INSN_SIZE - 1) & TARGET_PAGE_MASK)
+|| (pc_ptr & ~TARGET_PAGE_MASK) == 0)) {
+gen_jmp_im(pc_ptr - dc->cs_base);
+gen_eob(dc);
+break;
+}
 /* if too long translation, stop generation too */
 if (tcg_ctx.gen_opc_ptr >= gen_opc_end ||
 (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) ||

[Qemu-devel] [RFC PATCH v5 02/31] acpi: accurate overflow check

2014-11-26 Thread Pavel Dovgalyuk

Compare the clock in ns, because acpi_pm_tmr_update uses a value rounded
to ns instead of ticks.
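
A small experiment shows why the unit of the comparison matters. The sketch
assumes the ACPI PM timer rate of 3579545 Hz and a 1 ns virtual clock; the
toy_* helpers are hypothetical stand-ins (toy_muldiv64 only imitates the
shape of a multiply-then-divide helper, and unsigned types are used for
simplicity where the patch uses int64_t).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PM_TIMER_FREQUENCY 3579545u     /* ACPI PM timer rate in Hz */
#define TOY_NS_PER_SEC         1000000000u

/* Compute a * b / c without overflowing the 64-bit intermediate product,
   assuming the final result fits in 64 bits. */
uint64_t toy_muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t rl, rh, hi, lo;

    assert(c != 0);
    rl = (a & 0xffffffffu) * b;              /* low half times b */
    rh = (a >> 32) * b + (rl >> 32);         /* high half plus carry */
    hi = rh / c;
    lo = (((rh % c) << 32) | (rl & 0xffffffffu)) / c;
    return (hi << 32) | lo;
}

/* PM timer ticks to nanoseconds, rounded down. */
uint64_t toy_pm_ticks_to_ns(uint64_t ticks)
{
    return toy_muldiv64(ticks, TOY_NS_PER_SEC, TOY_PM_TIMER_FREQUENCY);
}

/* Nanoseconds to PM timer ticks, rounded down. */
uint64_t toy_ns_to_pm_ticks(uint64_t ns)
{
    return toy_muldiv64(ns, TOY_PM_TIMER_FREQUENCY, TOY_NS_PER_SEC);
}

/* Overflow test done in ns, the unit in which the timer is armed. */
bool toy_timer_overflowed(uint64_t now_ns, uint64_t overflow_ticks)
{
    return now_ns >= toy_pm_ticks_to_ns(overflow_ticks);
}
```

With an overflow time of 1 tick, the deadline in ns is 279. At 279 ns the
ns comparison reports an overflow, but converting 279 ns back to ticks gives
0, so a comparison done in ticks would still say "not yet".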

Reviewed-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 hw/acpi/core.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index a7368fb..51913d6 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -376,8 +376,11 @@ static void acpi_notify_wakeup(Notifier *notifier, void *data)
 /* ACPI PM1a EVT */
 uint16_t acpi_pm1_evt_get_sts(ACPIREGS *ar)
 {
-int64_t d = acpi_pm_tmr_get_clock();
-if (d >= ar->tmr.overflow_time) {
+/* Compare ns-clock, not PM timer ticks, because
+   acpi_pm_tmr_update function uses ns for setting the timer. */
+int64_t d = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+if (d >= muldiv64(ar->tmr.overflow_time,
+  get_ticks_per_sec(), PM_TIMER_FREQUENCY)) {
 ar->pm1.evt.sts |= ACPI_BITMASK_TIMER_STATUS;
 }
 return ar->pm1.evt.sts;

[Qemu-devel] [RFC PATCH v5 09/31] replay: introduce icount event

2014-11-26 Thread Pavel Dovgalyuk

This patch adds an icount event to the replay subsystem. This event
corresponds to the execution of several instructions and is used to
synchronize input events in the replay phase.
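
The recording side of this event can be modelled in a few lines. The sketch
is a simplification under stated assumptions: ToyLog and
toy_save_instructions are hypothetical names, the log is an array of 32-bit
words rather than a file, and the event id 32 is taken from the
EVENT_INSTRUCTION define in the diff. Each call writes the number of
instructions executed since the previous call, and nothing when there was no
progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EVENT_INSTRUCTION 32   /* value of EVENT_INSTRUCTION in the diff */
#define TOY_LOG_WORDS         32

/* In-memory stand-in for the replay log and its step counter. */
typedef struct ToyLog {
    uint32_t words[TOY_LOG_WORDS];
    size_t len;              /* words written so far */
    uint64_t current_step;   /* instructions already accounted for */
} ToyLog;

/*
 * executed_now: total number of instructions executed so far.
 * Appends an (event id, delta) pair when progress was made.
 */
void toy_save_instructions(ToyLog *log, uint64_t executed_now)
{
    int diff = (int)(executed_now - log->current_step);

    if (diff > 0) {
        assert(log->len + 2 <= TOY_LOG_WORDS);
        log->words[log->len++] = TOY_EVENT_INSTRUCTION;
        log->words[log->len++] = (uint32_t)diff;
        log->current_step += (uint64_t)diff;
    }
}
```

On replay, the reader of such a log knows how many instructions to execute
before the next logged event has to be applied.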

Signed-off-by: Pavel Dovgalyuk 
---
 replay/replay-internal.c |   14 ++
 replay/replay-internal.h |   18 ++
 replay/replay.c  |   45 +
 replay/replay.h  |7 +++
 4 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 429b13c..83a53bd 100755
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu-common.h"
+#include "replay.h"
 #include "replay-internal.h"
 
 volatile unsigned int replay_data_kind = -1;
@@ -139,3 +140,16 @@ void replay_fetch_data_kind(void)
 }
 }
 }
+
+/*! Saves cached instructions. */
+void replay_save_instructions(void)
+{
+if (replay_file && replay_mode == REPLAY_MODE_RECORD) {
+int diff = (int)(replay_get_current_step() - replay_state.current_step);
+if (first_cpu != NULL && diff > 0) {
+replay_put_event(EVENT_INSTRUCTION);
+replay_put_dword(diff);
+replay_state.current_step += diff;
+}
+}
+}
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 927f7c7..582b44c 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -14,6 +14,17 @@
 
 #include 
 
+/* for instruction event */
+#define EVENT_INSTRUCTION   32
+
+typedef struct ReplayState {
+/*! Current step - number of processed instructions and timer events. */
+uint64_t current_step;
+/*! Number of instructions to be executed before other events happen. */
+int instructions_count;
+} ReplayState;
+extern ReplayState replay_state;
+
 extern volatile unsigned int replay_data_kind;
 extern volatile unsigned int replay_has_unread_data;
 
@@ -47,4 +58,11 @@ void replay_save_instructions(void);
 Terminates the program in case of error. */
 void validate_data_kind(int kind);
 
+/*! Skips async events until some sync event will be found. */
+bool skip_async_events(int stop_event);
+/*! Skips async events invocations from the input,
+until required data kind is found. If the requested data is not found
+reports an error and stops the execution. */
+void skip_async_events_until(unsigned int kind);
+
 #endif
diff --git a/replay/replay.c b/replay/replay.c
index ac976b2..c305e0c 100755
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -9,7 +9,10 @@
  *
  */
 
+#include "qemu-common.h"
 #include "replay.h"
+#include "replay-internal.h"
+#include "qemu/timer.h"
 
 ReplayMode replay_mode = REPLAY_MODE_NONE;
 /*! Stores current submode for PLAY mode */
@@ -18,8 +21,50 @@ ReplaySubmode play_submode = REPLAY_SUBMODE_UNKNOWN;
 /* Suffix for the disk images filenames */
 char *replay_image_suffix;
 
+ReplayState replay_state;
 
 ReplaySubmode replay_get_play_submode(void)
 {
 return play_submode;
 }
+
+bool skip_async_events(int stop_event)
+{
+/* nothing to skip - not all instructions used */
+if (replay_state.instructions_count != 0
+&& replay_has_unread_data) {
+return stop_event == EVENT_INSTRUCTION;
+}
+
+bool res = false;
+while (true) {
+replay_fetch_data_kind();
+if (stop_event == replay_data_kind) {
+res = true;
+}
+switch (replay_data_kind) {
+case EVENT_INSTRUCTION:
+replay_state.instructions_count = replay_get_dword();
+return res;
+default:
+/* clock, time_t, checkpoint and other events */
+return res;
+}
+}
+
+return res;
+}
+
+void skip_async_events_until(unsigned int kind)
+{
+if (!skip_async_events(kind)) {
+fprintf(stderr, "%"PRId64": Read data kind %d instead of expected %d\n",
+replay_get_current_step(), replay_data_kind, kind);
+exit(1);
+}
+}
+
+uint64_t replay_get_current_step(void)
+{
+return cpu_get_instructions_counter();
+}
diff --git a/replay/replay.h b/replay/replay.h
index 51a18fe..e40daf5 100755
--- a/replay/replay.h
+++ b/replay/replay.h
@@ -12,6 +12,8 @@
  *
  */
 
+#include 
+#include 
 #include "qapi-types.h"
 
 extern ReplayMode replay_mode;
@@ -20,4 +22,9 @@ extern char *replay_image_suffix;
 /*! Returns replay play submode */
 ReplaySubmode replay_get_play_submode(void);
 
+/* Processing the instructions */
+
+/*! Returns number of executed instructions. */
+uint64_t replay_get_current_step(void);
+
 #endif

[Qemu-devel] [RFC PATCH v5 08/31] icount: improve enable/disable ticks

2014-11-26 Thread Pavel Dovgalyuk

This patch eliminates the calls to cpu_get_real_ticks when enabling
or disabling the virtual timer in icount mode. These calls are used
for cpu_ticks_offset, which is not needed in this mode.
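
The offset bookkeeping that the patch skips in icount mode can be sketched
like this. It is a toy model with hypothetical names (ToyTimers,
toy_enable_ticks, toy_disable_ticks, toy_get_ticks) and it passes the host
tick value in as a parameter instead of reading a real counter. In icount
mode the caller is assumed to derive time from the instruction counter, so
toy_get_ticks is meaningful only when use_icount is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyTimers {
    int64_t ticks_offset;   /* accumulated guest-visible ticks */
    bool enabled;           /* true while the guest is running */
    bool use_icount;        /* true when time comes from icount */
} ToyTimers;

/* Start counting: subtract the host value so elapsed time starts at 0. */
void toy_enable_ticks(ToyTimers *t, int64_t host_ticks)
{
    if (!t->enabled) {
        if (!t->use_icount) {
            t->ticks_offset -= host_ticks;
        }
        t->enabled = true;
    }
}

/* Stop counting: fold the elapsed host ticks into the offset. */
void toy_disable_ticks(ToyTimers *t, int64_t host_ticks)
{
    if (t->enabled) {
        if (!t->use_icount) {
            t->ticks_offset += host_ticks;
        }
        t->enabled = false;
    }
}

/* Guest-visible ticks when not in icount mode. */
int64_t toy_get_ticks(const ToyTimers *t, int64_t host_ticks)
{
    assert(!t->use_icount);
    return t->enabled ? t->ticks_offset + host_ticks : t->ticks_offset;
}
```

With use_icount set, enabling and disabling leave ticks_offset untouched, so
no host counter value leaks into the state.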

Reviewed-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 492e19a..43ae7fc 100644
--- a/cpus.c
+++ b/cpus.c
@@ -267,8 +267,10 @@ void cpu_enable_ticks(void)
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
 seqlock_write_lock(&timers_state.vm_clock_seqlock);
 if (!timers_state.cpu_ticks_enabled) {
-timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
-timers_state.cpu_clock_offset -= get_clock();
+if (!use_icount) {
+timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
+timers_state.cpu_clock_offset -= get_clock();
+}
 timers_state.cpu_ticks_enabled = 1;
 }
 seqlock_write_unlock(&timers_state.vm_clock_seqlock);
@@ -283,8 +285,10 @@ void cpu_disable_ticks(void)
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
 seqlock_write_lock(&timers_state.vm_clock_seqlock);
 if (timers_state.cpu_ticks_enabled) {
-timers_state.cpu_ticks_offset += cpu_get_real_ticks();
-timers_state.cpu_clock_offset = cpu_get_clock_locked();
+if (!use_icount) {
+timers_state.cpu_ticks_offset += cpu_get_real_ticks();
+timers_state.cpu_clock_offset = cpu_get_clock_locked();
+}
 timers_state.cpu_ticks_enabled = 0;
 }
 seqlock_write_unlock(&timers_state.vm_clock_seqlock);

[Qemu-devel] [RFC PATCH v5 04/31] sysemu: system functions for replay

2014-11-26 Thread Pavel Dovgalyuk

This patch removes the "static" specifier from several qemu functions to
make them visible to the replay module. It also introduces several system
functions that will be used by replay.

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c  |4 ++--
 include/exec/exec-all.h |1 +
 include/qom/cpu.h   |   10 ++
 include/sysemu/cpus.h   |1 +
 translate-all.c |8 
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0c33458..e53d605 100644
--- a/cpus.c
+++ b/cpus.c
@@ -88,7 +88,7 @@ static bool cpu_thread_is_idle(CPUState *cpu)
 return true;
 }
 
-static bool all_cpu_threads_idle(void)
+bool all_cpu_threads_idle(void)
 {
 CPUState *cpu;
 
@@ -1112,7 +1112,7 @@ bool qemu_cpu_is_self(CPUState *cpu)
 return qemu_thread_is_self(cpu->thread);
 }
 
-static bool qemu_in_vcpu_thread(void)
+bool qemu_in_vcpu_thread(void)
 {
 return current_cpu && qemu_cpu_is_self(current_cpu);
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ab956f2..1d17f75 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -212,6 +212,7 @@ static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
 
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUArchState *env);
+void tb_flush_all(void);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
 #if defined(USE_DIRECT_JUMP)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 2098f1c..5afb44c 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -480,6 +480,16 @@ static inline bool cpu_has_work(CPUState *cpu)
 bool qemu_cpu_is_self(CPUState *cpu);
 
 /**
+ * qemu_in_vcpu_thread:
+ *
+ * Checks whether the caller is executing on the vCPU thread
+ * of the current vCPU.
+ *
+ * Returns: %true if called from vCPU's thread, %false otherwise.
+ */
+bool qemu_in_vcpu_thread(void);
+
+/**
  * qemu_cpu_kick:
  * @cpu: The vCPU to kick.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3f162a9..86ae556 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -6,6 +6,7 @@ void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
+bool all_cpu_threads_idle(void);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
diff --git a/translate-all.c b/translate-all.c
index ba5c840..7177b71 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -806,6 +806,14 @@ void tb_flush(CPUArchState *env1)
 tcg_ctx.tb_ctx.tb_flush_count++;
 }
 
+void tb_flush_all(void)
+{
+CPUState *cpu;
+for (cpu = first_cpu ; cpu != NULL ; cpu = CPU_NEXT(cpu)) {
+tb_flush(cpu->env_ptr);
+}
+}
+
 #ifdef DEBUG_TB_CHECK
 
 static void tb_invalidate_check(target_ulong address)

[Qemu-devel] [RFC PATCH v5 14/31] From 04bbd21134dd2c6b7309a7f5f2b780aae2757003 Mon Sep 17 00:00:00 2001

2014-11-26 Thread Pavel Dovgalyuk

From: Paolo Bonzini 

Subject: [PATCH] gen-icount: check cflags instead of use_icount global

Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 include/exec/gen-icount.h |6 +++---
 target-alpha/translate.c  |2 +-
 target-arm/translate-a64.c|2 +-
 target-arm/translate.c|2 +-
 target-cris/translate.c   |2 +-
 target-i386/translate.c   |2 +-
 target-lm32/translate.c   |2 +-
 target-m68k/translate.c   |2 +-
 target-microblaze/translate.c |2 +-
 target-mips/translate.c   |2 +-
 target-moxie/translate.c  |2 +-
 target-openrisc/translate.c   |2 +-
 target-ppc/translate.c|2 +-
 target-s390x/translate.c  |2 +-
 target-sh4/translate.c|2 +-
 target-sparc/translate.c  |2 +-
 target-tricore/translate.c|2 +-
 target-unicore32/translate.c  |2 +-
 target-xtensa/translate.c |2 +-
 19 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index da53395..221aad0 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -9,7 +9,7 @@ static TCGArg *icount_arg;
 static int icount_label;
 static int exitreq_label;
 
-static inline void gen_tb_start(void)
+static inline void gen_tb_start(TranslationBlock *tb)
 {
 TCGv_i32 count;
 TCGv_i32 flag;
@@ -21,7 +21,7 @@ static inline void gen_tb_start(void)
 tcg_gen_brcondi_i32(TCG_COND_NE, flag, 0, exitreq_label);
 tcg_temp_free_i32(flag);
 
-if (!use_icount)
+if (!(tb->cflags & CF_USE_ICOUNT))
 return;
 
 icount_label = gen_new_label();
@@ -43,7 +43,7 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns)
 gen_set_label(exitreq_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
 
-if (use_icount) {
+if (tb->cflags & CF_USE_ICOUNT) {
 *icount_arg = num_insns;
 gen_set_label(icount_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_ICOUNT_EXPIRED);
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5387b93..f888367 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2828,7 +2828,7 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 pc_mask = ~TARGET_PAGE_MASK;
 }
 
-gen_tb_start();
+gen_tb_start(tb);
 do {
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ba6a85c..7f17a0c 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -10963,7 +10963,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 max_insns = CF_COUNT_MASK;
 }
 
-gen_tb_start();
+gen_tb_start(tb);
 
 tcg_clear_temp_count();
 
diff --git a/target-arm/translate.c b/target-arm/translate.c
index d7343df..f3bef91 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11049,7 +11049,7 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
 if (max_insns == 0)
 max_insns = CF_COUNT_MASK;
 
-gen_tb_start();
+gen_tb_start(tb);
 
 tcg_clear_temp_count();
 
diff --git a/target-cris/translate.c b/target-cris/translate.c
index e37b04e..19a452d 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -3206,7 +3206,7 @@ gen_intermediate_code_internal(CRISCPU *cpu, TranslationBlock *tb,
 max_insns = CF_COUNT_MASK;
 }
 
-gen_tb_start();
+gen_tb_start(tb);
 do {
 check_breakpoint(env, dc);
 
diff --git a/target-i386/translate.c b/target-i386/translate.c
index b7631b7..ceaaa54 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8003,7 +8003,7 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
 if (max_insns == 0)
 max_insns = CF_COUNT_MASK;
 
-gen_tb_start();
+gen_tb_start(tb);
 for(;;) {
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
diff --git a/target-lm32/translate.c b/target-lm32/translate.c
index f748f96..a7579dc 100644
--- a/target-lm32/translate.c
+++ b/target-lm32/translate.c
@@ -1095,7 +1095,7 @@ void gen_intermediate_code_internal(LM32CPU *cpu,
 max_insns = CF_COUNT_MASK;
 }
 
-gen_tb_start();
+gen_tb_start(tb);
 do {
 check_breakpoint(env, dc);
 
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index efd4cfc..47edc7a 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -3010,7 +3010,7 @@ gen_intermediate_code_internal(M68kCPU *cpu, TranslationBlock *tb,
 if (max_insns == 0)
 max_insns = CF_COUNT_MASK;
 
-gen_tb_start();
+gen_tb_start(tb);
 do {
 pc_offset = dc->pc - pc_start;
 gen_throws_exception = NULL;
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index fd2b771..69ce4df 100

[Qemu-devel] [RFC PATCH v5 13/31] From a0cb9e80ba0de409b5ad556109a1c71ce4d8ce19 Mon Sep 17 00:00:00 2001

2014-11-26 Thread Pavel Dovgalyuk

From: Paolo Bonzini 

Subject: [PATCH] translate: check cflags instead of use_icount global

Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 target-alpha/translate.c|8 ---
 target-arm/translate-a64.c  |4 ++--
 target-arm/translate.c  |4 ++--
 target-i386/translate.c |   46 ++-
 target-lm32/translate.c |8 ---
 target-mips/translate.c |   24 +-
 target-ppc/translate_init.c |   24 +++---
 translate-all.c |2 +-
 8 files changed, 67 insertions(+), 53 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 76658a0..5387b93 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -1285,7 +1285,7 @@ static int cpu_pr_data(int pr)
 return 0;
 }
 
-static ExitStatus gen_mfpr(TCGv va, int regno)
+static ExitStatus gen_mfpr(DisasContext *ctx, TCGv va, int regno)
 {
 int data = cpu_pr_data(regno);
 
@@ -1295,7 +1295,7 @@ static ExitStatus gen_mfpr(TCGv va, int regno)
if (regno == 249) {
helper = gen_helper_get_vmtime;
}
-if (use_icount) {
+if (ctx->tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 helper(va);
 gen_io_end();
@@ -2283,7 +2283,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 case 0xC000:
 /* RPCC */
 va = dest_gpr(ctx, ra);
-if (use_icount) {
+if (ctx->tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 gen_helper_load_pcc(va, cpu_env);
 gen_io_end();
@@ -2317,7 +2317,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 #ifndef CONFIG_USER_ONLY
 REQUIRE_TB_FLAG(TB_FLAGS_PAL_MODE);
 va = dest_gpr(ctx, ra);
-ret = gen_mfpr(va, insn & 0x);
+ret = gen_mfpr(ctx, va, insn & 0x);
 break;
 #else
 goto invalid_opc;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 3a3c48a..ba6a85c 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1373,7 +1373,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
 break;
 }
 
-if (use_icount && (ri->type & ARM_CP_IO)) {
+if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
 gen_io_start();
 }
 
@@ -1404,7 +1404,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
 }
 }
 
-if (use_icount && (ri->type & ARM_CP_IO)) {
+if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
 /* I/O operations must end the TB here (whether read or write) */
 gen_io_end();
 s->is_jmp = DISAS_UPDATE;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9436650..d7343df 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7153,7 +7153,7 @@ static int disas_coproc_insn(CPUARMState * env, DisasContext *s, uint32_t insn)
 break;
 }
 
-if (use_icount && (ri->type & ARM_CP_IO)) {
+if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
 gen_io_start();
 }
 
@@ -7244,7 +7244,7 @@ static int disas_coproc_insn(CPUARMState * env, DisasContext *s, uint32_t insn)
 }
 }
 
-if (use_icount && (ri->type & ARM_CP_IO)) {
+if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
 /* I/O operations must end the TB here (whether read or write) */
 gen_io_end();
 gen_lookup_tb(s);
diff --git a/target-i386/translate.c b/target-i386/translate.c
index a264908..b7631b7 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1168,8 +1168,9 @@ static inline void gen_cmps(DisasContext *s, TCGMemOp ot)
 
 static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 {
-if (use_icount)
+if (s->tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
+}
 gen_string_movl_A0_EDI(s);
 /* Note: we must do this dummy write first to be restartable in
case of page fault. */
@@ -1181,14 +1182,16 @@ static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_EDI);
-if (use_icount)
+if (s->tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
+}
 }
 
 static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 {
-if (use_icount)
+if (s->tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
+}
 gen_string_movl_A0_ESI(s);
 gen_op_ld_v(s, ot, cpu_T[0], cpu_A0);
 
@@ -1199,8 +1202,9 @@ static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_ESI);
-if (use_icount)
+if (s->tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
+}
 }
 
 /* same method as Valgri

[Qemu-devel] [RFC PATCH v5 20/31] replay: recording and replaying clock ticks

2014-11-26 Thread Pavel Dovgalyuk

Clock ticks are a source of non-deterministic data for the virtual
machine. This patch saves the clock values when they are acquired
(virtual clock, host clock, rdtsc, and some other timers).
During replay the corresponding values are read from the log and
transferred to the module that wants to read them.
This design requires clock polling to be synchronized. Sometimes that
is not the case, e.g. when timeouts for timer lists are checked. In that
case we pass a cached value of the clock to the client code.
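
The record/replay behaviour described above can be sketched roughly as
follows. This is a minimal illustration and not the code from the patch:
every identifier (sketch_mode, sketch_clock_log, sketch_clock) is
hypothetical, and the log is assumed to be a plain binary stream of
int64_t values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names: none of these identifiers come from the patch. */
enum sketch_mode { SKETCH_NONE, SKETCH_RECORD, SKETCH_PLAY };

struct sketch_clock_log {
    enum sketch_mode mode;
    FILE *file;      /* log stream opened by the caller */
    int64_t cached;  /* last value successfully written or read */
};

/*
 * Return the clock value the caller should observe.
 * RECORD: log the host value and return it unchanged.
 * PLAY:   ignore the host value and return the next logged value,
 *         falling back to the cached one if nothing can be read.
 * NONE:   pass the host value through.
 */
static int64_t sketch_clock(struct sketch_clock_log *log, int64_t host_value)
{
    assert(log != NULL);
    switch (log->mode) {
    case SKETCH_RECORD:
        if (log->file != NULL &&
            fwrite(&host_value, sizeof(host_value), 1, log->file) == 1) {
            log->cached = host_value;
        }
        return host_value;
    case SKETCH_PLAY: {
        int64_t logged;
        if (log->file != NULL &&
            fread(&logged, sizeof(logged), 1, log->file) == 1) {
            log->cached = logged;
        }
        return log->cached;
    }
    default:
        return host_value;
    }
}
```

A caller would wrap each clock read, e.g. sketch_clock(&log, host_ns),
so that the same sequence of values is seen in both runs.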

Signed-off-by: Pavel Dovgalyuk 
---
 include/qemu/timer.h |   20 ++--
 qemu-timer.c |3 +-
 replay/Makefile.objs |1 +
 replay/replay-internal.h |   11 ++
 replay/replay-time.c |   79 ++
 replay/replay.h  |   21 
 stubs/replay.c   |9 +
 7 files changed, 141 insertions(+), 3 deletions(-)
 create mode 100755 replay/replay-time.c

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 38a02c5..7b43331 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -4,6 +4,7 @@
 #include "qemu/typedefs.h"
 #include "qemu-common.h"
 #include "qemu/notify.h"
+#include "replay/replay.h"
 
 /* timers */
 
@@ -699,8 +700,8 @@ static inline int64_t get_ticks_per_sec(void)
  * Low level clock functions
  */
 
-/* real time host monotonic timer */
-static inline int64_t get_clock_realtime(void)
+/* real time host monotonic timer implementation */
+static inline int64_t get_clock_realtime_impl(void)
 {
 struct timeval tv;
 
@@ -708,6 +709,12 @@ static inline int64_t get_clock_realtime(void)
 return tv.tv_sec * 10LL + (tv.tv_usec * 1000);
 }
 
+/* real time host monotonic timer interface */
+static inline int64_t get_clock_realtime(void)
+{
+return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime_impl());
+}
+
 /* Warning: don't insert tracepoints into these functions, they are
also used by simpletrace backend and tracepoints would cause
an infinite recursion! */
@@ -752,6 +759,8 @@ int64_t cpu_icount_to_ns(int64_t icount);
 /***/
 /* host CPU ticks (if available) */
 
+#define cpu_get_real_ticks cpu_get_real_ticks_impl
+
 #if defined(_ARCH_PPC)
 
 static inline int64_t cpu_get_real_ticks(void)
@@ -905,6 +914,13 @@ static inline int64_t cpu_get_real_ticks (void)
 }
 #endif
 
+#undef cpu_get_real_ticks
+
+static inline int64_t cpu_get_real_ticks(void)
+{
+return REPLAY_CLOCK(REPLAY_CLOCK_REAL_TICKS, cpu_get_real_ticks_impl());
+}
+
 #ifdef CONFIG_PROFILER
 static inline int64_t profile_getclock(void)
 {
diff --git a/qemu-timer.c b/qemu-timer.c
index 00a5d35..8307913 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -25,6 +25,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "ui/console.h"
+#include "replay/replay.h"
 
 #include "hw/hw.h"
 
@@ -562,7 +563,7 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
 now = get_clock_realtime();
 last = clock->last;
 clock->last = now;
-if (now < last) {
+if (now < last && replay_mode == REPLAY_MODE_NONE) {
 notifier_list_notify(&clock->reset_notifiers, &now);
 }
 return now;
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index 56da09c..257c320 100755
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -1,3 +1,4 @@
 obj-y += replay.o
 obj-y += replay-internal.o
 obj-y += replay-events.o
+obj-y += replay-time.o
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index fcba977..c36d7de 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -22,12 +22,17 @@
 #define EVENT_ASYNC_OPT 25
 /* for instruction event */
 #define EVENT_INSTRUCTION   32
+/* for clock read/writes */
+#define EVENT_CLOCK 64
+/* some of the greater codes are reserved for clocks */
 
 /* Asynchronous events IDs */
 
 #define REPLAY_ASYNC_COUNT 0
 
 typedef struct ReplayState {
+/*! Cached clock values. */
+int64_t cached_clock[REPLAY_CLOCK_COUNT];
 /*! Current step - number of processed instructions and timer events. */
 uint64_t current_step;
 /*! Number of instructions to be executed before other events happen. */
@@ -75,6 +80,12 @@ bool skip_async_events(int stop_event);
 reports an error and stops the execution. */
 void skip_async_events_until(unsigned int kind);
 
+/*! Reads next clock value from the file.
+If clock kind read from the file is different from the parameter,
+the value is not used.
+If the parameter is -1, the clock value is read to the cache anyway. */
+void replay_read_next_clock(unsigned int kind);
+
 /* Asynchronous events queue */
 
 /*! Initializes events' processing internals */
diff --git a/replay/replay-time.c b/replay/replay-time.c
new file mode 100755
index 000..3f94f4e
--- /dev/null
+++ b/replay/replay-time.c
@@ -0,0 +1,79

[Qemu-devel] [RFC PATCH v5 05/31] replay: internal functions for replay log

2014-11-26 Thread Pavel Dovgalyuk

This patch adds functions that perform read and write operations
on the replay log.
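
For illustration only, here is a sketch of log helpers that report I/O
failures to the caller instead of ignoring the fread()/fwrite() results.
It is not the patch's implementation: the sketch_* names, the bool return
values and the fixed-width uint32_t length prefix are all assumptions
made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers; the names and signatures are not from the patch. */

/* Write one 64-bit value; returns false if there is no log or on error. */
static bool sketch_put_qword(FILE *f, int64_t value)
{
    return f != NULL && fwrite(&value, sizeof(value), 1, f) == 1;
}

/* Read one 64-bit value; *out is only valid when true is returned. */
static bool sketch_get_qword(FILE *f, int64_t *out)
{
    assert(out != NULL);
    return f != NULL && fread(out, sizeof(*out), 1, f) == 1;
}

/* Write a length-prefixed byte array (fixed-width prefix, an assumption). */
static bool sketch_put_array(FILE *f, const uint8_t *buf, uint32_t size)
{
    if (f == NULL || fwrite(&size, sizeof(size), 1, f) != 1) {
        return false;
    }
    return size == 0 || fwrite(buf, 1, size, f) == size;
}

/* Read a length-prefixed array into buf, refusing sizes above cap. */
static bool sketch_get_array(FILE *f, uint8_t *buf, uint32_t cap,
                             uint32_t *size)
{
    assert(size != NULL);
    if (f == NULL || fread(size, sizeof(*size), 1, f) != 1 || *size > cap) {
        return false;
    }
    return *size == 0 || fread(buf, 1, *size, f) == *size;
}
```

Checking the stored length against the buffer capacity before reading
avoids overrunning the destination when the log is corrupt.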

Signed-off-by: Pavel Dovgalyuk 
---
 replay/Makefile.objs |1 
 replay/replay-internal.c |  141 ++
 replay/replay-internal.h |   50 
 3 files changed, 192 insertions(+), 0 deletions(-)
 create mode 100755 replay/replay-internal.c
 create mode 100755 replay/replay-internal.h

diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index 7ea860f..1148f45 100755
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -1 +1,2 @@
 obj-y += replay.o
+obj-y += replay-internal.o
diff --git a/replay/replay-internal.c b/replay/replay-internal.c
new file mode 100755
index 000..429b13c
--- /dev/null
+++ b/replay/replay-internal.c
@@ -0,0 +1,141 @@
+/*
+ * replay-internal.c
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "replay-internal.h"
+
+volatile unsigned int replay_data_kind = -1;
+volatile unsigned int replay_has_unread_data;
+
+/* File for replay writing */
+FILE *replay_file;
+
+void replay_put_byte(uint8_t byte)
+{
+if (replay_file) {
+fwrite(&byte, sizeof(byte), 1, replay_file);
+}
+}
+
+void replay_put_event(uint8_t event)
+{
+replay_put_byte(event);
+}
+
+
+void replay_put_word(uint16_t word)
+{
+if (replay_file) {
+fwrite(&word, sizeof(word), 1, replay_file);
+}
+}
+
+void replay_put_dword(uint32_t dword)
+{
+if (replay_file) {
+fwrite(&dword, sizeof(dword), 1, replay_file);
+}
+}
+
+void replay_put_qword(int64_t qword)
+{
+if (replay_file) {
+fwrite(&qword, sizeof(qword), 1, replay_file);
+}
+}
+
+void replay_put_array(const uint8_t *buf, size_t size)
+{
+if (replay_file) {
+fwrite(&size, sizeof(size), 1, replay_file);
+fwrite(buf, 1, size, replay_file);
+}
+}
+
+uint8_t replay_get_byte(void)
+{
+uint8_t byte = 0;
+if (replay_file) {
+fread(&byte, sizeof(byte), 1, replay_file);
+}
+return byte;
+}
+
+uint16_t replay_get_word(void)
+{
+uint16_t word = 0;
+if (replay_file) {
+fread(&word, sizeof(word), 1, replay_file);
+}
+
+return word;
+}
+
+uint32_t replay_get_dword(void)
+{
+uint32_t dword;
+if (replay_file) {
+fread(&dword, sizeof(dword), 1, replay_file);
+}
+
+return dword;
+}
+
+int64_t replay_get_qword(void)
+{
+int64_t qword;
+if (replay_file) {
+fread(&qword, sizeof(qword), 1, replay_file);
+}
+
+return qword;
+}
+
+void replay_get_array(uint8_t *buf, size_t *size)
+{
+if (replay_file) {
+fread(size, sizeof(*size), 1, replay_file);
+fread(buf, 1, *size, replay_file);
+}
+}
+
+void replay_get_array_alloc(uint8_t **buf, size_t *size)
+{
+if (replay_file) {
+fread(size, sizeof(*size), 1, replay_file);
+*buf = g_malloc(*size);
+fread(*buf, 1, *size, replay_file);
+}
+}
+
+void replay_check_error(void)
+{
+if (replay_file) {
+if (feof(replay_file)) {
+fprintf(stderr, "replay file is over\n");
+exit(1);
+} else if (ferror(replay_file)) {
+fprintf(stderr, "replay file is over or something goes wrong\n");
+exit(1);
+}
+}
+}
+
+void replay_fetch_data_kind(void)
+{
+if (replay_file) {
+if (!replay_has_unread_data) {
+replay_data_kind = replay_get_byte();
+replay_check_error();
+replay_has_unread_data = 1;
+}
+}
+}
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
new file mode 100755
index 000..927f7c7
--- /dev/null
+++ b/replay/replay-internal.h
@@ -0,0 +1,50 @@
+#ifndef REPLAY_INTERNAL_H
+#define REPLAY_INTERNAL_H
+
+/*
+ * replay-internal.h
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+
+extern volatile unsigned int replay_data_kind;
+extern volatile unsigned int replay_has_unread_data;
+
+/* File for replay writing */
+extern FILE *replay_file;
+
+void replay_put_byte(uint8_t byte);
+void replay_put_event(uint8_t event);
+void replay_put_word(uint16_t word);
+void replay_put_dword(uint32_t dword);
+void replay_put_qword(int64_t qword);
+void replay_put_array(const uint8_t *buf, size_t size);
+
+uint8_t replay_get_byte(void);
+uint16_t replay_get_word(void);
+uint32_t replay_get_dword(void);
+int64_t replay_get_qword(void);
+void replay_get_array(uint8_t *buf, size_t *size);
+void replay_get_array_alloc(uint8_t **buf, size_t *size);
+
+/*! Checks error status of the fil

[Qemu-devel] [RFC PATCH v5 18/31] replay: asynchronous events infrastructure

2014-11-26 Thread Pavel Dovgalyuk

This patch adds a module for saving and replaying asynchronous events.
These events include network packets, keyboard and mouse input,
USB packets, and thread pool and bottom half callbacks.
All events are stored in a queue and processed at synchronization points
such as the beginning of TB execution or a checkpoint in the iothread.
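
The queue-then-flush idea can be sketched as below. This is a simplified
stand-in, not the patch's code: it uses a hand-rolled list instead of
QTAILQ, omits locking, and all sketch_* names (including the example
handler sketch_note) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; the patch itself uses QTAILQ and a mutex. */
typedef void (*sketch_handler)(void *opaque);

struct sketch_event {
    sketch_handler run;
    void *opaque;
    struct sketch_event *next;
};

struct sketch_queue {
    struct sketch_event *head;
    struct sketch_event **tail;  /* address of the link to fill next */
};

static void sketch_queue_init(struct sketch_queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

/* Defer an event: append it so that arrival order is preserved. */
static int sketch_queue_add(struct sketch_queue *q, sketch_handler run,
                            void *opaque)
{
    struct sketch_event *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    e->run = run;
    e->opaque = opaque;
    e->next = NULL;
    *q->tail = e;
    q->tail = &e->next;
    return 0;
}

/* Called at a synchronization point: run and free every queued event. */
static int sketch_queue_flush(struct sketch_queue *q)
{
    int processed = 0;
    while (q->head != NULL) {
        struct sketch_event *e = q->head;
        q->head = e->next;          /* unlink before running the handler */
        if (q->head == NULL) {
            q->tail = &q->head;
        }
        e->run(e->opaque);
        free(e);
        processed++;
    }
    return processed;
}

/* Example handler for illustration: records the execution order. */
static int sketch_trace;
static void sketch_note(void *opaque)
{
    sketch_trace = sketch_trace * 10 + *(int *)opaque;
}
```

Unlinking each event before its handler runs lets a handler queue
further events safely; those are processed in the same flush.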

Signed-off-by: Pavel Dovgalyuk 
---
 replay/Makefile.objs |1 
 replay/replay-events.c   |  217 ++
 replay/replay-internal.h |   27 ++
 replay/replay.h  |4 +
 4 files changed, 249 insertions(+), 0 deletions(-)
 create mode 100755 replay/replay-events.c

diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index 1148f45..56da09c 100755
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -1,2 +1,3 @@
 obj-y += replay.o
 obj-y += replay-internal.o
+obj-y += replay-events.o
diff --git a/replay/replay-events.c b/replay/replay-events.c
new file mode 100755
index 000..f3c9b16
--- /dev/null
+++ b/replay/replay-events.c
@@ -0,0 +1,217 @@
+/*
+ * replay-events.c
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "replay.h"
+#include "replay-internal.h"
+
+typedef struct Event {
+int event_kind;
+void *opaque;
+void *opaque2;
+uint64_t id;
+
+QTAILQ_ENTRY(Event) events;
+} Event;
+
+static QTAILQ_HEAD(, Event) events_list = QTAILQ_HEAD_INITIALIZER(events_list);
+
+static QemuMutex lock;
+static unsigned int read_event_kind = -1;
+static uint64_t read_id = -1;
+static int read_opt = -1;
+
+static bool replay_events_enabled = false;
+
+/* Functions */
+
+static void replay_run_event(Event *event)
+{
+switch (event->event_kind) {
+default:
+fprintf(stderr, "Replay: invalid async event ID (%d) in the queue\n",
+event->event_kind);
+exit(1);
+break;
+}
+}
+
+void replay_enable_events(void)
+{
+replay_events_enabled = true;
+}
+
+bool replay_has_events(void)
+{
+return !QTAILQ_EMPTY(&events_list);
+}
+
+void replay_flush_events(void)
+{
+qemu_mutex_lock(&lock);
+while (!QTAILQ_EMPTY(&events_list)) {
+Event *event = QTAILQ_FIRST(&events_list);
+replay_run_event(event);
+QTAILQ_REMOVE(&events_list, event, events);
+g_free(event);
+}
+qemu_mutex_unlock(&lock);
+}
+
+void replay_disable_events(void)
+{
+replay_events_enabled = false;
+/* Flush events queue before waiting for completion */
+replay_flush_events();
+}
+
+void replay_clear_events(void)
+{
+qemu_mutex_lock(&lock);
+while (!QTAILQ_EMPTY(&events_list)) {
+Event *event = QTAILQ_FIRST(&events_list);
+QTAILQ_REMOVE(&events_list, event, events);
+
+g_free(event);
+}
+qemu_mutex_unlock(&lock);
+}
+
+static void replay_add_event_internal(int event_kind, void *opaque,
+  void *opaque2, uint64_t id)
+{
+if (event_kind >= REPLAY_ASYNC_COUNT) {
+fprintf(stderr, "Replay: invalid async event ID (%d)\n", event_kind);
+exit(1);
+}
+if (!replay_file || replay_mode == REPLAY_MODE_NONE
+|| !replay_events_enabled) {
+Event e;
+e.event_kind = event_kind;
+e.opaque = opaque;
+e.opaque2 = opaque2;
+e.id = id;
+replay_run_event(&e);
+return;
+}
+
+Event *event = g_malloc0(sizeof(Event));
+event->event_kind = event_kind;
+event->opaque = opaque;
+event->opaque2 = opaque2;
+event->id = id;
+
+qemu_mutex_lock(&lock);
+QTAILQ_INSERT_TAIL(&events_list, event, events);
+qemu_mutex_unlock(&lock);
+}
+
+void replay_add_event(int event_kind, void *opaque)
+{
+replay_add_event_internal(event_kind, opaque, NULL, 0);
+}
+
+void replay_save_events(int opt)
+{
+qemu_mutex_lock(&lock);
+while (!QTAILQ_EMPTY(&events_list)) {
+Event *event = QTAILQ_FIRST(&events_list);
+if (replay_mode != REPLAY_MODE_PLAY) {
+/* put the event into the file */
+replay_put_event(EVENT_ASYNC_OPT);
+replay_put_byte(opt);
+replay_put_byte(event->event_kind);
+
+/* save event-specific data */
+switch (event->event_kind) {
+}
+}
+
+replay_run_event(event);
+QTAILQ_REMOVE(&events_list, event, events);
+g_free(event);
+}
+qemu_mutex_unlock(&lock);
+}
+
+void replay_read_events(int opt)
+{
+replay_fetch_data_kind();
+while (replay_data_kind == EVENT_ASYNC_OPT) {
+if (read_event_kind == -1) {
+read_opt = replay_get_byte();
+read_event_kind = replay_get_byte();
+read_id = -1;
+replay_check_error();
+}
+
+if (opt

[Qemu-devel] [RFC PATCH v5 22/31] timer: introduce new QEMU_CLOCK_VIRTUAL_RT clock

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces a new QEMU_CLOCK_VIRTUAL_RT clock, which
should be used for icount warping. A separate timer is needed
for replaying the execution, because warping callbacks should
be deterministic. We cannot make the realtime clock deterministic
because it is used for screen updates and other simulator-specific
actions. That is why we added a new clock, which is recorded and
replayed when needed.

Signed-off-by: Pavel Dovgalyuk 
---
 include/qemu/timer.h |7 +++
 qemu-timer.c |2 ++
 replay/replay.h  |4 +++-
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 7b43331..df27157 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -37,12 +37,19 @@
  * is suspended, and it will reflect system time changes the host may
  * undergo (e.g. due to NTP). The host clock has the same precision as
  * the virtual clock.
+ *
+ * @QEMU_CLOCK_VIRTUAL_RT: realtime clock used for icount warp
+ *
+ * This clock runs as a realtime clock, but is used for icount warp
+ * and thus should be traced with record/replay to make warp function
+ * behave deterministically.
  */
 
 typedef enum {
 QEMU_CLOCK_REALTIME = 0,
 QEMU_CLOCK_VIRTUAL = 1,
 QEMU_CLOCK_HOST = 2,
+QEMU_CLOCK_VIRTUAL_RT = 3,
 QEMU_CLOCK_MAX
 } QEMUClockType;
 
diff --git a/qemu-timer.c b/qemu-timer.c
index 8307913..3f99af5 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -567,6 +567,8 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
 notifier_list_notify(&clock->reset_notifiers, &now);
 }
 return now;
+case QEMU_CLOCK_VIRTUAL_RT:
+return REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT, get_clock());
 }
 }
 
diff --git a/replay/replay.h b/replay/replay.h
index 143fe85..0c02e03 100755
--- a/replay/replay.h
+++ b/replay/replay.h
@@ -22,8 +22,10 @@
 #define REPLAY_CLOCK_REAL_TICKS 0
 /* host_clock */
 #define REPLAY_CLOCK_HOST   1
+/* virtual_rt_clock */
+#define REPLAY_CLOCK_VIRTUAL_RT 2
 
-#define REPLAY_CLOCK_COUNT  2
+#define REPLAY_CLOCK_COUNT  3
 
 extern ReplayMode replay_mode;
 extern char *replay_image_suffix;

[Qemu-devel] [RFC PATCH v5 23/31] cpus: make icount warp deterministic in replay mode

2014-11-26 Thread Pavel Dovgalyuk

This patch saves and replays the warping parameters in record and replay
modes. These parameters affect virtual clock values and therefore should
be deterministic.

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/cpus.c b/cpus.c
index 707bf34..f6a6319 100644
--- a/cpus.c
+++ b/cpus.c
@@ -370,7 +370,7 @@ static void icount_warp_rt(void *opaque)
 
 seqlock_write_lock(&timers_state.vm_clock_seqlock);
 if (runstate_is_running()) {
-int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
 int64_t warp_delta;
 
 warp_delta = clock - vm_clock_warp_start;
@@ -444,7 +444,7 @@ void qemu_clock_warp(QEMUClockType type)
 }
 
 /* We want to use the earliest deadline from ALL vm_clocks */
-clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
 deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 if (deadline < 0) {
 return;
@@ -537,8 +537,8 @@ void configure_icount(QemuOpts *opts, Error **errp)
 return;
 }
 icount_align_option = qemu_opt_get_bool(opts, "align", false);
-icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
-  icount_warp_rt, NULL);
+icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+ icount_warp_rt, NULL);
 if (strcmp(option, "auto") != 0) {
 errno = 0;
 icount_time_shift = strtol(option, &rem_str, 0);
@@ -562,10 +562,10 @@ void configure_icount(QemuOpts *opts, Error **errp)
the virtual time trigger catches emulated time passing too fast.
Realtime triggers occur even when idle, so use them less frequently
than VM triggers.  */
-icount_rt_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
-icount_adjust_rt, NULL);
+icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
+   icount_adjust_rt, NULL);
 timer_mod(icount_rt_timer,
-   qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
+   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
 icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
 icount_adjust_vm, NULL);
 timer_mod(icount_vm_timer,

[Qemu-devel] [RFC PATCH v5 19/31] cpu: replay instructions sequence

2014-11-26 Thread Pavel Dovgalyuk

This patch adds calls to replay functions into the icount setup block.
In record mode the number of executed instructions is written to the log.
In replay mode the number of instructions to execute is taken from the replay log.

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c  |1 +
 cpus.c  |   28 ++--
 replay/replay.c |   18 ++
 replay/replay.h |4 
 4 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 05cca50..8938c89 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -541,6 +541,7 @@ int cpu_exec(CPUArchState *env)
 }
 cpu->exception_index = EXCP_INTERRUPT;
 next_tb = 0;
+qemu_notify_event();
 cpu_loop_exit(cpu);
 }
 break;
diff --git a/cpus.c b/cpus.c
index 43ae7fc..707bf34 100644
--- a/cpus.c
+++ b/cpus.c
@@ -41,6 +41,7 @@
 #include "qemu/seqlock.h"
 #include "qapi-event.h"
 #include "hw/nmi.h"
+#include "replay/replay.h"
 
 #ifndef _WIN32
 #include "qemu/compatfd.h"
@@ -1360,18 +1361,22 @@ static int tcg_cpu_exec(CPUArchState *env)
 + cpu->icount_extra);
 cpu->icount_decr.u16.low = 0;
 cpu->icount_extra = 0;
-deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+if (replay_mode != REPLAY_MODE_PLAY) {
+deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 
-/* Maintain prior (possibly buggy) behaviour where if no deadline
- * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
- * INT32_MAX nanoseconds ahead, we still use INT32_MAX
- * nanoseconds.
- */
-if ((deadline < 0) || (deadline > INT32_MAX)) {
-deadline = INT32_MAX;
-}
+/* Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_MAX;
+}
 
-count = qemu_icount_round(deadline);
+count = qemu_icount_round(deadline);
+} else {
+count = replay_get_instructions();
+}
 timers_state.qemu_icount += count;
 decr = (count > 0x) ? 0x : count;
 count -= decr;
@@ -1389,6 +1394,9 @@ static int tcg_cpu_exec(CPUArchState *env)
 + cpu->icount_extra);
 cpu->icount_decr.u32 = 0;
 cpu->icount_extra = 0;
+if (replay_mode == REPLAY_MODE_PLAY) {
+replay_exec_instructions();
+}
 }
 return ret;
 }
diff --git a/replay/replay.c b/replay/replay.c
index c275794..a6de6a1 100755
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -69,6 +69,24 @@ uint64_t replay_get_current_step(void)
 return cpu_get_instructions_counter();
 }
 
+int replay_get_instructions(void)
+{
+if (skip_async_events(EVENT_INSTRUCTION)) {
+return replay_state.instructions_count;
+}
+return 0;
+}
+
+void replay_exec_instructions(void)
+{
+int count = (int)(replay_get_current_step() - replay_state.current_step);
+replay_state.instructions_count -= count;
+replay_state.current_step += count;
+if (replay_state.instructions_count == 0 && count != 0) {
+replay_has_unread_data = 0;
+}
+}
+
 bool replay_exception(void)
 {
 if (replay_mode == REPLAY_MODE_RECORD) {
diff --git a/replay/replay.h b/replay/replay.h
index 0cfd71a..90a949b 100755
--- a/replay/replay.h
+++ b/replay/replay.h
@@ -26,6 +26,10 @@ ReplaySubmode replay_get_play_submode(void);
 
 /*! Returns number of executed instructions. */
 uint64_t replay_get_current_step(void);
+/*! Returns number of instructions to execute in replay mode. */
+int replay_get_instructions(void);
+/*! Updates instructions counter in replay mode. */
+void replay_exec_instructions(void);
 
 /* Interrupts and exceptions */

[Qemu-devel] [RFC PATCH v5 06/31] cpu-exec: reset exception_index correctly

2014-11-26 Thread Pavel Dovgalyuk

The exception index is reset at every entry into the cpu_exec()
function. This may cause exceptions to be missed while replaying them.
This patch moves the exception_index reset to the locations where the
exceptions are processed.

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c |3 ++-
 cpus.c |3 +++
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 8830255..4df9856 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -358,7 +358,6 @@ int cpu_exec(CPUArchState *env)
 }
 
 cc->cpu_exec_enter(cpu);
-cpu->exception_index = -1;
 
 /* Calculate difference between guest clock and host clock.
  * This delay includes the delay of the last cycle, so
@@ -378,6 +377,7 @@ int cpu_exec(CPUArchState *env)
 if (ret == EXCP_DEBUG) {
 cpu_handle_debug_exception(env);
 }
+cpu->exception_index = -1;
 break;
 } else {
 #if defined(CONFIG_USER_ONLY)
@@ -388,6 +388,7 @@ int cpu_exec(CPUArchState *env)
 cc->do_interrupt(cpu);
 #endif
 ret = cpu->exception_index;
+cpu->exception_index = -1;
 break;
 #else
 cc->do_interrupt(cpu);
diff --git a/cpus.c b/cpus.c
index e53d605..a7b6c53 100644
--- a/cpus.c
+++ b/cpus.c
@@ -934,6 +934,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 qemu_mutex_lock(&qemu_global_mutex);
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
+cpu->exception_index = -1;
 current_cpu = cpu;
 
 r = kvm_init_vcpu(cpu);
@@ -974,6 +975,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
+cpu->exception_index = -1;
 
 sigemptyset(&waitset);
 sigaddset(&waitset, SIG_IPI);
@@ -1016,6 +1018,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 CPU_FOREACH(cpu) {
 cpu->thread_id = qemu_get_thread_id();
 cpu->created = true;
+cpu->exception_index = -1;
 }
 qemu_cond_signal(&qemu_cpu_cond);

[Qemu-devel] [RFC PATCH v5 26/31] replay: bottom halves

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces a bottom half event for the replay queue. It saves the events
into the queue and processes them at checkpoints and during instruction execution.
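
The diff's change to aio_bh_new() appends new bottom halves at the tail of
the list in replay modes so that they are processed in arrival order. The
sketch below contrasts the two insertion strategies on a plain singly
linked list. It is illustrative only: sketch_bh and both functions are
hypothetical, and locking and memory barriers are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type standing in for a bottom half. */
struct sketch_bh {
    int id;
    struct sketch_bh *next;
};

/* Stack-style insert: the newest item becomes the head (LIFO order). */
static void sketch_push_front(struct sketch_bh **first, struct sketch_bh *bh)
{
    assert(first != NULL && bh != NULL);
    bh->next = *first;
    *first = bh;
}

/* Queue-style insert: walk to the final link and append (FIFO order). */
static void sketch_push_back(struct sketch_bh **first, struct sketch_bh *bh)
{
    assert(first != NULL && bh != NULL);
    struct sketch_bh **last = first;
    while (*last != NULL) {
        last = &(*last)->next;
    }
    bh->next = NULL;
    *last = bh;
}
```

The tail insert costs a walk over the list on every call, which is the
"slower way" trade-off accepted for deterministic ordering.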

Signed-off-by: Pavel Dovgalyuk 
---
 async.c  |   46 --
 dma-helpers.c|4 +++-
 hw/ide/ahci.c|4 +++-
 hw/ide/core.c|4 +++-
 hw/timer/arm_timer.c |2 +-
 hw/usb/hcd-uhci.c|2 +-
 include/block/aio.h  |   18 ++
 include/qemu/main-loop.h |1 +
 main-loop.c  |5 +
 replay/replay-events.c   |   16 
 replay/replay-internal.h |3 ++-
 replay/replay.h  |2 ++
 stubs/replay.c   |4 
 13 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/async.c b/async.c
index 6e1b282..97111c0 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,7 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
+#include "replay/replay.h"
 
 /***/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -39,24 +40,53 @@ struct QEMUBH {
 bool scheduled;
 bool idle;
 bool deleted;
+bool replay;
+uint64_t id;
 };
 
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
-QEMUBH *bh;
+QEMUBH *bh, **last;
 bh = g_malloc0(sizeof(QEMUBH));
 bh->ctx = ctx;
 bh->cb = cb;
 bh->opaque = opaque;
 qemu_mutex_lock(&ctx->bh_lock);
-bh->next = ctx->first_bh;
-/* Make sure that the members are ready before putting bh into list */
-smp_wmb();
-ctx->first_bh = bh;
+if (replay_mode != REPLAY_MODE_NONE) {
+/* Slower way, but this is a queue and not a stack.
+   Replay will process the BH in the same order they
+   came into the queue. */
+last = &ctx->first_bh;
+while (*last) {
+last = &(*last)->next;
+}
+smp_wmb();
+*last = bh;
+} else {
+bh->next = ctx->first_bh;
+/* Make sure that the members are ready before putting bh into list */
+smp_wmb();
+ctx->first_bh = bh;
+}
 qemu_mutex_unlock(&ctx->bh_lock);
 return bh;
 }
 
+QEMUBH *aio_bh_new_replay(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+  uint64_t id)
+{
+QEMUBH *bh = aio_bh_new(ctx, cb, opaque);
+bh->replay = true;
+bh->id = id;
+return bh;
+}
+
+void aio_bh_call(void *opaque)
+{
+QEMUBH *bh = (QEMUBH *)opaque;
+bh->cb(bh->opaque);
+}
+
 /* Multiple occurrences of aio_bh_poll cannot be called concurrently */
 int aio_bh_poll(AioContext *ctx)
 {
@@ -79,7 +109,11 @@ int aio_bh_poll(AioContext *ctx)
 if (!bh->idle)
 ret = 1;
 bh->idle = 0;
-bh->cb(bh->opaque);
+if (!bh->replay) {
+aio_bh_call(bh);
+} else {
+replay_add_bh_event(bh, bh->id);
+}
 }
 }
 
diff --git a/dma-helpers.c b/dma-helpers.c
index 6918572..357d7e9 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -13,6 +13,7 @@
 #include "qemu/range.h"
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
+#include "replay/replay.h"
 
 /* #define DEBUG_IOMMU */
 
@@ -96,7 +97,8 @@ static void continue_after_map_failure(void *opaque)
 {
 DMAAIOCB *dbs = (DMAAIOCB *)opaque;
 
-dbs->bh = qemu_bh_new(reschedule_dma, dbs);
+dbs->bh = qemu_bh_new_replay(reschedule_dma, dbs,
+ replay_get_current_step());
 qemu_bh_schedule(dbs->bh);
 }
 
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 70958e3..fbefc52 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -33,6 +33,7 @@
 #include "internal.h"
 #include 
 #include 
+#include "replay/replay.h"
 
 /* #define DEBUG_AHCI */
 
@@ -1192,7 +1193,8 @@ static void ahci_cmd_done(IDEDMA *dma)
 
 if (!ad->check_bh) {
 /* maybe we still have something to process, check later */
-ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
+ad->check_bh = qemu_bh_new_replay(ahci_check_cmd_bh, ad,
+  replay_get_current_step());
 qemu_bh_schedule(ad->check_bh);
 }
 }
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 44e3d50..b622aae 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -32,6 +32,7 @@
 #include "sysemu/dma.h"
 #include "hw/block/block.h"
 #include "sysemu/block-backend.h"
+#include "replay/replay.h"
 
 #include 
 
@@ -448,7 +449,8 @@ BlockAIOCB *ide_issue_trim(BlockBackend *blk,
 
 iocb = blk_aio_get(&trim_aiocb_info, blk, cb, opaque);
 iocb->blk = blk;
-iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+iocb->bh = qemu_bh_new_replay(ide_trim_bh_cb, iocb,
+  replay_get_current_step());
 iocb->ret = 0;
 iocb->qiov = qiov;
 iocb->i = -1;
diff --git a/hw/timer/arm_timer.c b/hw/timer/arm_timer.c
index 14529

[Qemu-devel] [RFC PATCH v5 28/31] replay: thread pool

2014-11-26 Thread Pavel Dovgalyuk

This patch modifies the thread pool to allow replaying asynchronous thread tasks
synchronously in replay mode.

Signed-off-by: Pavel Dovgalyuk 
---
 block/raw-posix.c   |6 -
 block/raw-win32.c   |4 +++-
 include/block/thread-pool.h |4 +++-
 replay/replay-events.c  |   11 ++
 replay/replay-internal.h|3 ++-
 replay/replay.h |2 ++
 stubs/replay.c  |4 
 tests/test-thread-pool.c|7 --
 thread-pool.c   |   49 ++-
 9 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 475cf74..5733f4f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1078,7 +1078,9 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd,
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque,
+  qiov ? qiov->replay : false,
+  qiov ? qiov->replay_step : 0);
 }
 
 static BlockAIOCB *raw_aio_submit(BlockDriverState *bs,
@@ -1970,7 +1972,7 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 acb->aio_ioctl_buf = buf;
 acb->aio_ioctl_cmd = req;
 pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque, false, 0);
 }
 
 #elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 7b58881..39f2fa0 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -158,7 +158,9 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, HANDLE hfile,
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque,
+  qiov ? qiov->replay : false,
+  qiov ? qiov->replay_step : 0);
 }
 
 int qemu_ftruncate64(int fd, int64_t length)
diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
index 42eb5e8..801ac00 100644
--- a/include/block/thread-pool.h
+++ b/include/block/thread-pool.h
@@ -29,9 +29,11 @@ void thread_pool_free(ThreadPool *pool);
 
 BlockAIOCB *thread_pool_submit_aio(ThreadPool *pool,
 ThreadPoolFunc *func, void *arg,
-BlockCompletionFunc *cb, void *opaque);
+BlockCompletionFunc *cb, void *opaque,
+bool replay, uint64_t replay_step);
 int coroutine_fn thread_pool_submit_co(ThreadPool *pool,
 ThreadPoolFunc *func, void *arg);
 void thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func, void *arg);
+void thread_pool_work(ThreadPool *pool, void *r);
 
 #endif
diff --git a/replay/replay-events.c b/replay/replay-events.c
index 1aee0a4..4da5de0 100755
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -12,6 +12,7 @@
 #include "qemu-common.h"
 #include "replay.h"
 #include "replay-internal.h"
+#include "block/thread-pool.h"
 
 typedef struct Event {
 int event_kind;
@@ -39,6 +40,9 @@ static void replay_run_event(Event *event)
 case REPLAY_ASYNC_EVENT_BH:
 aio_bh_call(event->opaque);
 break;
+case REPLAY_ASYNC_EVENT_THREAD:
+thread_pool_work((ThreadPool *)event->opaque, event->opaque2);
+break;
 default:
 fprintf(stderr, "Replay: invalid async event ID (%d) in the queue\n",
 event->event_kind);
@@ -127,6 +131,11 @@ void replay_add_bh_event(void *bh, uint64_t id)
 replay_add_event_internal(REPLAY_ASYNC_EVENT_BH, bh, NULL, id);
 }
 
+void replay_add_thread_event(void *opaque, void *opaque2, uint64_t id)
+{
+replay_add_event_internal(REPLAY_ASYNC_EVENT_THREAD, opaque, opaque2, id);
+}
+
 void replay_save_events(int opt)
 {
 qemu_mutex_lock(&lock);
@@ -141,6 +150,7 @@ void replay_save_events(int opt)
 /* save event-specific data */
 switch (event->event_kind) {
 case REPLAY_ASYNC_EVENT_BH:
+case REPLAY_ASYNC_EVENT_THREAD:
 replay_put_qword(event->id);
 break;
 }
@@ -170,6 +180,7 @@ void replay_read_events(int opt)
 /* Execute some events without searching them in the queue */
 switch (read_event_kind) {
 case REPLAY_ASYNC_EVENT_BH:
+case REPLAY_ASYNC_EVENT_THREAD:
 if (read_id == -1) {
 read_id = replay_get_qword();
 }
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 6e0c2e9..c32bd9c 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -38,7 +38,8 @@
 /* Asynchr

[Qemu-devel] [RFC PATCH v5 21/31] replay: recording and replaying different timers

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces functions for recording and replaying realtime sources
that do not use the qemu-clock interface. These include the return value of the
time() function in time_t and struct tm forms. The patch also adds a warning to
the get_timedate function to prevent its use in recording mode, because that
may lead to non-determinism.
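
A minimal sketch of the save/read pairing for struct tm, assuming each field is stored as a 32-bit value in a fixed order. The in-memory SketchLog and the sketch_* helpers are hypothetical stand-ins for the replay file and its put/get routines, and the big-endian byte order is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical in-memory log; the patch itself writes to the replay file. */
typedef struct {
    uint8_t data[64];
    size_t pos;
} SketchLog;

/* Append a 32-bit value, most significant byte first (assumed order). */
void sketch_put_dword(SketchLog *log, uint32_t v)
{
    assert(log->pos + 4 <= sizeof(log->data));
    for (int shift = 24; shift >= 0; shift -= 8) {
        log->data[log->pos++] = (uint8_t)(v >> shift);
    }
}

/* Read back a 32-bit value written by sketch_put_dword(). */
uint32_t sketch_get_dword(SketchLog *log)
{
    uint32_t v = 0;

    assert(log->pos + 4 <= sizeof(log->data));
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | log->data[log->pos++];
    }
    return v;
}

/* Record mode: store every struct tm field in a fixed order. */
void sketch_save_tm(SketchLog *log, const struct tm *tm)
{
    sketch_put_dword(log, (uint32_t)tm->tm_sec);
    sketch_put_dword(log, (uint32_t)tm->tm_min);
    sketch_put_dword(log, (uint32_t)tm->tm_hour);
    sketch_put_dword(log, (uint32_t)tm->tm_mday);
    sketch_put_dword(log, (uint32_t)tm->tm_mon);
    sketch_put_dword(log, (uint32_t)tm->tm_year);
    sketch_put_dword(log, (uint32_t)tm->tm_wday);
    sketch_put_dword(log, (uint32_t)tm->tm_yday);
    sketch_put_dword(log, (uint32_t)tm->tm_isdst);
}

/* Play mode: restore the fields in the same order instead of asking
   the host for the current time. */
void sketch_read_tm(SketchLog *log, struct tm *tm)
{
    tm->tm_sec = (int32_t)sketch_get_dword(log);
    tm->tm_min = (int32_t)sketch_get_dword(log);
    tm->tm_hour = (int32_t)sketch_get_dword(log);
    tm->tm_mday = (int32_t)sketch_get_dword(log);
    tm->tm_mon = (int32_t)sketch_get_dword(log);
    tm->tm_year = (int32_t)sketch_get_dword(log);
    tm->tm_wday = (int32_t)sketch_get_dword(log);
    tm->tm_yday = (int32_t)sketch_get_dword(log);
    tm->tm_isdst = (int32_t)sketch_get_dword(log);
}
```

Storing the fields one by one rather than dumping the struct keeps the log independent of the host's struct tm layout and padding.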

Signed-off-by: Pavel Dovgalyuk 
---
 hw/timer/mc146818rtc.c   |   10 
 hw/timer/pl031.c |   10 
 include/qemu-common.h|1 
 replay/replay-internal.h |4 ++
 replay/replay-time.c |  112 ++
 replay/replay.h  |8 +++
 vl.c |   17 ++-
 7 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index f18d128..7e27931 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -28,6 +28,7 @@
 #include "qapi/visitor.h"
 #include "qapi-event.h"
 #include "qmp-commands.h"
+#include "replay/replay.h"
 
 #ifdef TARGET_I386
 #include "hw/i386/apic.h"
@@ -703,7 +704,14 @@ static void rtc_set_date_from_host(ISADevice *dev)
 RTCState *s = MC146818_RTC(dev);
 struct tm tm;
 
-qemu_get_timedate(&tm, 0);
+if (replay_mode == REPLAY_MODE_RECORD) {
+qemu_get_timedate_no_warning(&tm, 0);
+replay_save_tm(&tm);
+} else if (replay_mode == REPLAY_MODE_PLAY) {
+replay_read_tm(&tm);
+} else {
+qemu_get_timedate_no_warning(&tm, 0);
+}
 
 s->base_rtc = mktimegm(&tm);
 s->last_update = qemu_clock_get_ns(rtc_clock);
diff --git a/hw/timer/pl031.c b/hw/timer/pl031.c
index 34d9b44..19264f2 100644
--- a/hw/timer/pl031.c
+++ b/hw/timer/pl031.c
@@ -14,6 +14,7 @@
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
+#include "replay/replay.h"
 
 //#define DEBUG_PL031
 
@@ -200,7 +201,14 @@ static int pl031_init(SysBusDevice *dev)
 sysbus_init_mmio(dev, &s->iomem);
 
 sysbus_init_irq(dev, &s->irq);
-qemu_get_timedate(&tm, 0);
+if (replay_mode == REPLAY_MODE_RECORD) {
+qemu_get_timedate_no_warning(&tm, 0);
+replay_save_tm(&tm);
+} else if (replay_mode == REPLAY_MODE_PLAY) {
+replay_read_tm(&tm);
+} else {
+qemu_get_timedate_no_warning(&tm, 0);
+}
 s->tick_offset = mktimegm(&tm) -
 qemu_clock_get_ns(rtc_clock) / get_ticks_per_sec();
 
diff --git a/include/qemu-common.h b/include/qemu-common.h
index b87e9c2..d18f033 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -129,6 +129,7 @@ void dump_drift_info(FILE *f, fprintf_function cpu_fprintf);
 int qemu_main(int argc, char **argv, char **envp);
 #endif
 
+void qemu_get_timedate_no_warning(struct tm *tm, int offset);
 void qemu_get_timedate(struct tm *tm, int offset);
 int qemu_timedate_diff(struct tm *tm);
 
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index c36d7de..009029d 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -14,6 +14,10 @@
 
 #include 
 
+/* for time_t event */
+#define EVENT_TIME_T 1
+/* for tm event */
+#define EVENT_TM 2
 /* for software interrupt */
 #define EVENT_INTERRUPT 15
 /* for emulated exceptions */
diff --git a/replay/replay-time.c b/replay/replay-time.c
index 3f94f4e..5d944d3 100755
--- a/replay/replay-time.c
+++ b/replay/replay-time.c
@@ -77,3 +77,115 @@ int64_t replay_read_clock(unsigned int kind)
 fprintf(stderr, "REPLAY INTERNAL ERROR %d\n", __LINE__);
 exit(1);
 }
+
+/*! Saves time_t value to the log */
+static void replay_save_time_t(time_t tm)
+{
+replay_save_instructions();
+
+if (replay_file) {
+replay_put_event(EVENT_TIME_T);
+if (sizeof(tm) == 4) {
+replay_put_dword(tm);
+} else if (sizeof(tm) == 8) {
+replay_put_qword(tm);
+} else {
+fprintf(stderr, "invalid time_t sizeof: %u\n",
+(unsigned)sizeof(tm));
+exit(1);
+}
+}
+}
+
+/*! Reads time_t value from the log. Stops execution in case of error */
+static time_t replay_read_time_t(void)
+{
+replay_exec_instructions();
+
+if (replay_file) {
+time_t tm;
+
+skip_async_events_until(EVENT_TIME_T);
+
+if (sizeof(tm) == 4) {
+tm = replay_get_dword();
+} else if (sizeof(tm) == 8) {
+tm = replay_get_qword();
+} else {
+fprintf(stderr, "invalid time_t sizeof: %u\n",
+(unsigned)sizeof(tm));
+exit(1);
+}
+
+replay_check_error();
+
+replay_has_unread_data = 0;
+
+return tm;
+}
+
+fprintf(stderr, "REPLAY INTERNAL ERROR %d\n", __LINE__);
+exit(1);
+}
+
+void replay_save_tm(struct tm *tm)
+{
+replay_save_instructions();
+
+if (replay_file) {
+replay_put_event(EVENT_TM);
+
+replay_put_dword(tm->tm_sec);
+replay_put_dword(tm->tm_min);
+replay_put_dword(tm->t

[Qemu-devel] [RFC PATCH v5 11/31] From 7abf2f72777958d395cfd01d97fe707cc06152b5 Mon Sep 17 00:00:00 2001

2014-11-26 Thread Pavel Dovgalyuk

From: Paolo Bonzini 

Subject: [PATCH] target-ppc: pass DisasContext to SPR generator functions

Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 target-ppc/cpu.h|   13 +-
 target-ppc/translate.c  |   10 +-
 target-ppc/translate_init.c |  247 +--
 3 files changed, 133 insertions(+), 137 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 8724561..21deddf 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -320,6 +320,7 @@ typedef struct opc_handler_t opc_handler_t;
 /*/
 /* Types used to describe some PowerPC registers */
 typedef struct CPUPPCState CPUPPCState;
+typedef struct DisasContext DisasContext;
 typedef struct ppc_tb_t ppc_tb_t;
 typedef struct ppc_spr_t ppc_spr_t;
 typedef struct ppc_dcr_t ppc_dcr_t;
@@ -328,13 +329,13 @@ typedef union ppc_tlb_t ppc_tlb_t;
 
 /* SPR access micro-ops generations callbacks */
 struct ppc_spr_t {
-void (*uea_read)(void *opaque, int gpr_num, int spr_num);
-void (*uea_write)(void *opaque, int spr_num, int gpr_num);
+void (*uea_read)(DisasContext *ctx, int gpr_num, int spr_num);
+void (*uea_write)(DisasContext *ctx, int spr_num, int gpr_num);
 #if !defined(CONFIG_USER_ONLY)
-void (*oea_read)(void *opaque, int gpr_num, int spr_num);
-void (*oea_write)(void *opaque, int spr_num, int gpr_num);
-void (*hea_read)(void *opaque, int gpr_num, int spr_num);
-void (*hea_write)(void *opaque, int spr_num, int gpr_num);
+void (*oea_read)(DisasContext *ctx, int gpr_num, int spr_num);
+void (*oea_write)(DisasContext *ctx, int spr_num, int gpr_num);
+void (*hea_read)(DisasContext *ctx, int gpr_num, int spr_num);
+void (*hea_write)(DisasContext *ctx, int spr_num, int gpr_num);
 #endif
 const char *name;
 target_ulong default_value;
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index d03daea..439f7f0 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -183,7 +183,7 @@ void ppc_translate_init(void)
 }
 
 /* internal defines */
-typedef struct DisasContext {
+struct DisasContext {
 struct TranslationBlock *tb;
 target_ulong nip;
 uint32_t opcode;
@@ -206,7 +206,7 @@ typedef struct DisasContext {
 int singlestep_enabled;
 uint64_t insns_flags;
 uint64_t insns_flags2;
-} DisasContext;
+};
 
 /* Return true iff byteswap is needed in a scalar memop */
 static inline bool need_byteswap(const DisasContext *ctx)
@@ -4221,7 +4221,7 @@ static void gen_mfmsr(DisasContext *ctx)
 #endif
 }
 
-static void spr_noaccess(void *opaque, int gprn, int sprn)
+static void spr_noaccess(DisasContext *ctx, int gprn, int sprn)
 {
 #if 0
 sprn = ((sprn >> 5) & 0x1F) | ((sprn & 0x1F) << 5);
@@ -4233,7 +4233,7 @@ static void spr_noaccess(void *opaque, int gprn, int sprn)
 /* mfspr */
 static inline void gen_op_mfspr(DisasContext *ctx)
 {
-void (*read_cb)(void *opaque, int gprn, int sprn);
+void (*read_cb)(DisasContext *ctx, int gprn, int sprn);
 uint32_t sprn = SPR(ctx->opcode);
 
 #if !defined(CONFIG_USER_ONLY)
@@ -4384,7 +4384,7 @@ static void gen_mtmsr(DisasContext *ctx)
 /* mtspr */
 static void gen_mtspr(DisasContext *ctx)
 {
-void (*write_cb)(void *opaque, int sprn, int gprn);
+void (*write_cb)(DisasContext *ctx, int sprn, int gprn);
 uint32_t sprn = SPR(ctx->opcode);
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 33fb4cc..c7ca95e 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -65,7 +65,7 @@ static void spr_load_dump_spr(int sprn)
 #endif
 }
 
-static void spr_read_generic (void *opaque, int gprn, int sprn)
+static void spr_read_generic (DisasContext *ctx, int gprn, int sprn)
 {
 gen_load_spr(cpu_gpr[gprn], sprn);
 spr_load_dump_spr(sprn);
@@ -80,14 +80,14 @@ static void spr_store_dump_spr(int sprn)
 #endif
 }
 
-static void spr_write_generic (void *opaque, int sprn, int gprn)
+static void spr_write_generic (DisasContext *ctx, int sprn, int gprn)
 {
 gen_store_spr(sprn, cpu_gpr[gprn]);
 spr_store_dump_spr(sprn);
 }
 
 #if !defined(CONFIG_USER_ONLY)
-static void spr_write_generic32(void *opaque, int sprn, int gprn)
+static void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
 {
 #ifdef TARGET_PPC64
 TCGv t0 = tcg_temp_new();
@@ -96,11 +96,11 @@ static void spr_write_generic32(void *opaque, int sprn, int gprn)
 tcg_temp_free(t0);
 spr_store_dump_spr(sprn);
 #else
-spr_write_generic(opaque, sprn, gprn);
+spr_write_generic(ctx, sprn, gprn);
 #endif
 }
 
-static void spr_write_clear (void *opaque, int sprn, int gprn)
+static void spr_write_clear (DisasContext *ctx, int sprn, int gprn)
 {
 TCGv t0 = tcg_temp_new();
 TCGv t1 = tcg_temp_new();
@@ -112,7 +112,7 @@ static void spr_write_clear (void *opaque, int sprn, int gprn)
 tcg_temp_free(t1);
 }
 
-static v

[Qemu-devel] [RFC PATCH v5 29/31] replay: initialization and deinitialization

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces functions for enabling record/replay and for
freeing its resources when the simulator closes.

Signed-off-by: Pavel Dovgalyuk 
---
 block.c  |2 -
 exec.c   |1 
 replay/replay-internal.h |2 +
 replay/replay.c  |  134 ++
 replay/replay.h  |   13 
 stubs/replay.c   |1 
 vl.c |5 ++
 7 files changed, 157 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 02c6a78..08b6b3f 100644
--- a/block.c
+++ b/block.c
@@ -1941,7 +1941,7 @@ void bdrv_drain_all(void)
 busy |= bs_busy;
 }
 }
-if (replay_mode == REPLAY_MODE_PLAY) {
+if (replay_mode == REPLAY_MODE_PLAY && replay_checkpoints) {
 /* Skip checkpoints from the log */
 while (replay_checkpoint(8)) {
 /* Nothing */
diff --git a/exec.c b/exec.c
index 759055d..2215984 100644
--- a/exec.c
+++ b/exec.c
@@ -794,6 +794,7 @@ void cpu_abort(CPUState *cpu, const char *fmt, ...)
 }
 va_end(ap2);
 va_end(ap);
+replay_finish();
 #if defined(CONFIG_USER_ONLY)
 {
 struct sigaction act;
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index c32bd9c..142e09a 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -34,6 +34,8 @@
 
 /* for checkpoint event */
 #define EVENT_CHECKPOINT 96
+/* end of log event */
+#define EVENT_END   127
 
 /* Asynchronous events IDs */
 
diff --git a/replay/replay.c b/replay/replay.c
index 07ede73..14e650c 100755
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -14,14 +14,23 @@
 #include "replay-internal.h"
 #include "qemu/timer.h"
 
+/* Current version of the replay mechanism.
+   Increase it when file format changes. */
+#define REPLAY_VERSION  0xe02002
+/* Size of replay log header */
+#define HEADER_SIZE (sizeof(uint32_t) + sizeof(uint64_t))
+
 ReplayMode replay_mode = REPLAY_MODE_NONE;
 /*! Stores current submode for PLAY mode */
 ReplaySubmode play_submode = REPLAY_SUBMODE_UNKNOWN;
 
+/* Name of replay file  */
+static char *replay_filename;
 /* Suffix for the disk images filenames */
 char *replay_image_suffix;
 
 ReplayState replay_state;
+bool replay_checkpoints;
 
 ReplaySubmode replay_get_play_submode(void)
 {
@@ -154,6 +163,10 @@ void replay_shutdown_request(void)
 /* Used checkpoints: 2 3 4 5 6 7 8 9 10 11 */
 int replay_checkpoint(unsigned int checkpoint)
 {
+if (!replay_checkpoints) {
+return 1;
+}
+
 replay_save_instructions();
 
 if (replay_file) {
@@ -178,3 +191,124 @@ int replay_checkpoint(unsigned int checkpoint)
 
 return 1;
 }
+
+static void replay_enable(const char *fname, int mode)
+{
+const char *fmode = NULL;
+if (replay_file) {
+fprintf(stderr,
+"Replay: some record/replay operation is already started\n");
+return;
+}
+
+switch (mode) {
+case REPLAY_MODE_RECORD:
+fmode = "wb";
+break;
+case REPLAY_MODE_PLAY:
+fmode = "rb";
+play_submode = REPLAY_SUBMODE_NORMAL;
+break;
+default:
+fprintf(stderr, "Replay: internal error: invalid replay mode\n");
+exit(1);
+}
+
+atexit(replay_finish);
+
+replay_file = fopen(fname, fmode);
+if (replay_file == NULL) {
+fprintf(stderr, "Replay: open %s: %s\n", fname, strerror(errno));
+exit(1);
+}
+
+replay_filename = g_strdup(fname);
+
+replay_mode = mode;
+replay_has_unread_data = 0;
+replay_data_kind = -1;
+replay_state.instructions_count = 0;
+replay_state.current_step = 0;
+
+/* skip file header for RECORD and check it for PLAY */
+if (replay_mode == REPLAY_MODE_RECORD) {
+fseek(replay_file, HEADER_SIZE, SEEK_SET);
+} else if (replay_mode == REPLAY_MODE_PLAY) {
+unsigned int version = replay_get_dword();
+uint64_t offset = replay_get_qword();
+if (version != REPLAY_VERSION) {
+fprintf(stderr, "Replay: invalid input log file version\n");
+exit(1);
+}
+/* go to the beginning */
+fseek(replay_file, HEADER_SIZE, SEEK_SET);
+}
+
+replay_init_events();
+}
+
+void replay_configure(QemuOpts *opts, int mode)
+{
+const char *fname;
+
+fname = qemu_opt_get(opts, "fname");
+if (!fname) {
+fprintf(stderr, "File name not specified for replay\n");
+exit(1);
+}
+
+const char *suffix = qemu_opt_get(opts, "suffix");
+if (suffix) {
+replay_image_suffix = g_strdup(suffix);
+} else {
+replay_image_suffix = g_strdup("replay_qcow");
+}
+
+replay_enable(fname, mode);
+}
+
+void replay_init_timer(void)
+{
+if (replay_mode == REPLAY_MODE_NONE) {
+return;
+}
+
+replay_checkpoints = true;
+replay_enable_events();
+}
+
+void replay_finish(void)
+{
+if (replay_mode == REPLAY_MODE_NONE)

[Qemu-devel] [RFC PATCH v5 25/31] replay: checkpoints

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces checkpoints that synchronize the CPU thread and the
iothread. When a checkpoint is reached in the code, all asynchronous events in
the queue are executed.
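
The following simplified sketch shows how such a checkpoint could behave. It assumes that record mode logs the checkpoint ID and that play mode proceeds only when the next logged ID matches. The SketchReplay type and the sketch_* functions are hypothetical, and the asynchronous event queue is reduced to a pair of counters.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_mode { SKETCH_NONE, SKETCH_RECORD, SKETCH_PLAY };

#define SKETCH_LOG_SIZE 32

/* Hypothetical replay state; the event queue is reduced to counters. */
typedef struct {
    enum sketch_mode mode;
    unsigned int log[SKETCH_LOG_SIZE]; /* checkpoint IDs in hit order */
    size_t len;                        /* entries written while recording */
    size_t pos;                        /* next entry to consume in play mode */
    int pending_events;                /* queued asynchronous events */
    int executed_events;               /* events already executed */
} SketchReplay;

/* Execute every queued asynchronous event. */
void sketch_flush_events(SketchReplay *r)
{
    r->executed_events += r->pending_events;
    r->pending_events = 0;
}

/* Returns 1 if execution may proceed past the checkpoint, 0 otherwise. */
int sketch_checkpoint(SketchReplay *r, unsigned int id)
{
    switch (r->mode) {
    case SKETCH_RECORD:
        assert(r->len < SKETCH_LOG_SIZE);
        r->log[r->len++] = id;      /* remember where the sync happened */
        sketch_flush_events(r);
        return 1;
    case SKETCH_PLAY:
        if (r->pos >= r->len || r->log[r->pos] != id) {
            return 0;               /* the log expects something else */
        }
        r->pos++;
        sketch_flush_events(r);
        return 1;
    default:
        return 1;                   /* record/replay disabled */
    }
}
```

A caller that receives 0 simply skips the guarded work, which matches the early returns guarded by replay_checkpoint() in the diff.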

Signed-off-by: Pavel Dovgalyuk 
---
 block.c  |   11 +++
 cpus.c   |7 ++-
 include/qemu/timer.h |6 --
 main-loop.c  |5 +
 qemu-timer.c |   46 ++
 replay/replay-internal.h |3 +++
 replay/replay.c  |   28 
 replay/replay.h  |6 ++
 stubs/replay.c   |   11 +++
 vl.c |3 ++-
 10 files changed, 114 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 88f6d9b..cc84050 100644
--- a/block.c
+++ b/block.c
@@ -1919,6 +1919,11 @@ void bdrv_drain_all(void)
 BlockDriverState *bs;
 
 while (busy) {
+if (!replay_checkpoint(8)) {
+/* Do not wait anymore, we stopped at some place in
+   the middle of execution during replay */
+return;
+}
 busy = false;
 
 QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
@@ -1935,6 +1940,12 @@ void bdrv_drain_all(void)
 busy |= bs_busy;
 }
 }
+if (replay_mode == REPLAY_MODE_PLAY) {
+/* Skip checkpoints from the log */
+while (replay_checkpoint(8)) {
+/* Nothing */
+}
+}
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state and
diff --git a/cpus.c b/cpus.c
index f6a6319..8780eee 100644
--- a/cpus.c
+++ b/cpus.c
@@ -405,7 +405,7 @@ void qtest_clock_warp(int64_t dest)
 timers_state.qemu_icount_bias += warp;
 seqlock_write_unlock(&timers_state.vm_clock_seqlock);
 
-qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL, false);
 clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
 qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
@@ -425,6 +425,11 @@ void qemu_clock_warp(QEMUClockType type)
 return;
 }
 
+/* warp clock deterministically in record/replay mode */
+if (!replay_checkpoint(4)) {
+return;
+}
+
 /*
  * If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
  * This ensures that the deadline for the timer is computed correctly below.
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index df27157..853eed5 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -239,13 +239,14 @@ void qemu_clock_unregister_reset_notifier(QEMUClockType type,
 /**
  * qemu_clock_run_timers:
  * @type: clock on which to operate
+ * @run_all: true, when called from qemu_clock_run_all_timers
  *
  * Run all the timers associated with the default timer list
  * of a clock.
  *
  * Returns: true if any timer ran.
  */
-bool qemu_clock_run_timers(QEMUClockType type);
+bool qemu_clock_run_timers(QEMUClockType type, bool run_all);
 
 /**
  * qemu_clock_run_all_timers:
@@ -336,12 +337,13 @@ QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list);
 /**
  * timerlist_run_timers:
  * @timer_list: the timer list to use
+ * @run_all: true, when called from qemu_clock_run_all_timers
  *
  * Call all expired timers associated with the timer list.
  *
  * Returns: true if any timer expired
  */
-bool timerlist_run_timers(QEMUTimerList *timer_list);
+bool timerlist_run_timers(QEMUTimerList *timer_list, bool run_all);
 
 /**
  * timerlist_notify:
diff --git a/main-loop.c b/main-loop.c
index 981bcb5..d6e93c3 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -497,6 +497,11 @@ int main_loop_wait(int nonblocking)
 slirp_pollfds_poll(gpollfds, (ret < 0));
 #endif
 
+/* CPU thread can infinitely wait for event after
+   missing the warp */
+if (replay_mode == REPLAY_MODE_PLAY) {
+qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+}
 qemu_clock_run_all_timers();
 
 return ret;
diff --git a/qemu-timer.c b/qemu-timer.c
index 3f99af5..dbd6c5e 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -458,7 +458,7 @@ bool timer_expired(QEMUTimer *timer_head, int64_t current_time)
 return timer_expired_ns(timer_head, current_time * timer_head->scale);
 }
 
-bool timerlist_run_timers(QEMUTimerList *timer_list)
+bool timerlist_run_timers(QEMUTimerList *timer_list, bool run_all)
 {
 QEMUTimer *ts;
 int64_t current_time;
@@ -466,6 +466,29 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
 QEMUTimerCB *cb;
 void *opaque;
 
+switch (timer_list->clock->type) {
+case QEMU_CLOCK_REALTIME:
+break;
+default:
+case QEMU_CLOCK_VIRTUAL:
+if ((replay_mode != REPLAY_MODE_NONE && !runstate_is_running())
+|| !replay_checkpoint(run_all ? 2 : 3)) {
+return false;
+}
+break;
+case QEMU_CLOCK_HOST:
+if ((replay_mode != REPLAY_MODE_NONE && !runstate_is_running())
+|| !replay_checkpoint(run_all ? 5 : 6)) {
+retur

[Qemu-devel] [RFC PATCH v5 12/31] From 185a3a47d08857a66332ae862b372a153ce92bb9 Mon Sep 17 00:00:00 2001

2014-11-26 Thread Pavel Dovgalyuk

From: Paolo Bonzini 

Subject: [PATCH] cpu-exec: add a new CF_USE_ICOUNT cflag

Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 include/exec/exec-all.h |5 +++--
 translate-all.c |3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 1d17f75..3d19e72 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -142,9 +142,11 @@ struct TranslationBlock {
 uint64_t flags; /* flags defining in which context the code was generated */
 uint16_t size;  /* size of target code for this block (1 <=
size <= TARGET_PAGE_SIZE) */
-uint16_t cflags;/* compile flags */
+uint16_t icount;
+uint32_t cflags;/* compile flags */
 #define CF_COUNT_MASK  0x7fff
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
+#define CF_USE_ICOUNT  0x10000
 
 void *tc_ptr;/* pointer to the translated code */
 /* next matching tb for physical address. */
@@ -168,7 +170,6 @@ struct TranslationBlock {
jmp_first */
 struct TranslationBlock *jmp_next[2];
 struct TranslationBlock *jmp_first;
-uint32_t icount;
 };
 
 #include "exec/spinlock.h"
diff --git a/translate-all.c b/translate-all.c
index 7177b71..e9f5178 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1047,6 +1047,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 int code_gen_size;
 
 phys_pc = get_page_addr_code(env, pc);
+if (use_icount) {
+cflags |= CF_USE_ICOUNT;
+}
 tb = tb_alloc(pc);
 if (!tb) {
 /* flush must be done */

[Qemu-devel] [RFC PATCH v5 30/31] replay: command line options

2014-11-26 Thread Pavel Dovgalyuk

This patch introduces command line options for recording and replaying
virtual machine behavior. The "-record" option starts recording the execution
and saves it into the log file specified with the "fname" parameter. The
"-replay" option replays a previously saved log.

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c  |3 +-
 qemu-options.hx |   27 +++
 vl.c|   79 ++-
 3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8780eee..c8fca6c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -929,9 +929,10 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 static void qemu_tcg_wait_io_event(void)
 {
 CPUState *cpu;
+GMainContext *context = g_main_context_default();
 
 while (all_cpu_threads_idle()) {
-   /* Start accounting real time to the virtual clock if the CPUs
+/* Start accounting real time to the virtual clock if the CPUs
   are idle.  */
 qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
 qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
diff --git a/qemu-options.hx b/qemu-options.hx
index 22cf3b9..44ad8fc 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3380,6 +3380,33 @@ Dump json-encoded vmstate information for current machine type to file
 in @var{file}
 ETEXI
 
+DEF("record", HAS_ARG, QEMU_OPTION_record,
+"-record fname=[,suffix=,snapshot=]\n"
+"writes replay file for later replaying\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -record fname=@var{file}[,suffix=@var{suffix},snapshot=@var{snapshot}]
+Writes compact execution trace into @var{file}.
+Changes for disk images are written
+into separate files with @var{suffix} added. If no @var{suffix} is
+specified, "replay_qcow" is used as suffix.
+If @var{snapshot} parameter is set as off, then original disk image will be
+modified. Default value is on.
+ETEXI
+
+DEF("replay", HAS_ARG, QEMU_OPTION_replay,
+"-replay fname=[,suffix=,snapshot=]\n"
+"plays saved replay file\n", QEMU_ARCH_ALL)
+STEXI
+@item -replay fname=@var{filename}[,suffix=@var{suffix},snapshot=@var{snapshot}]
+Plays compact execution trace from @var{filename}.
+Changes for disk images and VM states are read
+from separate files with @var{suffix} added. If no @var{suffix} is
+specified, "replay_qcow" is used as suffix.
+If @var{snapshot} parameter is set as off, then original disk image will be
+modified. Default value is on.
+ETEXI
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
diff --git a/vl.c b/vl.c
index 877a77c..fb5003e 100644
--- a/vl.c
+++ b/vl.c
@@ -548,6 +548,42 @@ static QemuOptsList qemu_icount_opts = {
 },
 };
 
+static QemuOptsList qemu_record_opts = {
+.name = "record",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_record_opts.head),
+.desc = {
+{
+.name = "fname",
+.type = QEMU_OPT_STRING,
+},{
+.name = "suffix",
+.type = QEMU_OPT_STRING,
+},{
+.name = "snapshot",
+.type = QEMU_OPT_BOOL,
+},
+{ /* end of list */ }
+},
+};
+
+static QemuOptsList qemu_replay_opts = {
+.name = "replay",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_replay_opts.head),
+.desc = {
+{
+.name = "fname",
+.type = QEMU_OPT_STRING,
+},{
+.name = "suffix",
+.type = QEMU_OPT_STRING,
+},{
+.name = "snapshot",
+.type = QEMU_OPT_BOOL,
+},
+{ /* end of list */ }
+},
+};
+
 /**
  * Get machine options
  *
@@ -2711,7 +2747,9 @@ out:
 int main(int argc, char **argv, char **envp)
 {
 int i;
-int snapshot, linux_boot;
+int snapshot, linux_boot, replay_snapshot;
+int not_compatible_replay_param = 0;
+const char *icount_option = NULL;
 const char *initrd_filename;
 const char *kernel_filename, *kernel_cmdline;
 const char *boot_order;
@@ -2784,6 +2822,8 @@ int main(int argc, char **argv, char **envp)
 qemu_add_opts(&qemu_name_opts);
 qemu_add_opts(&qemu_numa_opts);
 qemu_add_opts(&qemu_icount_opts);
+qemu_add_opts(&qemu_replay_opts);
+qemu_add_opts(&qemu_record_opts);
 
 runstate_init();
 
@@ -2797,6 +2837,7 @@ int main(int argc, char **argv, char **envp)
 cpu_model = NULL;
 ram_size = default_ram_size;
 snapshot = 0;
+replay_snapshot = 1;
 cyls = heads = secs = 0;
 translation = BIOS_ATA_TRANSLATION_AUTO;
 
@@ -2914,6 +2955,7 @@ int main(int argc, char **argv, char **envp)
 break;
 case QEMU_OPTION_pflash:
 drive_add(IF_PFLASH, -1, optarg, PFLASH_OPTS);
+not_compatible_replay_param++;
 break;
 case QEMU_OPTION_snapshot:
 snapshot = 1;
@@ -3070,6 +3112,7 @@ int main(int argc, char **argv, char **envp)
 #endif
 case QEMU_OPTION

[Qemu-devel] [RFC PATCH v5 27/31] replay: replay aio requests

2014-11-26 Thread Pavel Dovgalyuk

This patch adds an identifier to aio requests. The ID is used for creating
bottom halves and for identifying them during replay.
The patch also introduces several functions that make it possible to replay
aio requests.

Signed-off-by: Pavel Dovgalyuk 
---
 block.c|   81 
 block/block-backend.c  |   30 ++-
 block/qcow2.c  |4 ++
 dma-helpers.c  |6 ++-
 hw/block/virtio-blk.c  |   10 ++---
 hw/ide/atapi.c |   10 +++--
 hw/ide/core.c  |   14 ---
 include/block/block.h  |   15 +++
 include/qemu-common.h  |2 +
 include/sysemu/block-backend.h |   10 +
 qemu-io-cmds.c |2 -
 stubs/replay.c |5 ++
 trace-events   |2 +
 util/iov.c |4 ++
 14 files changed, 167 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index cc84050..02c6a78 100644
--- a/block.c
+++ b/block.c
@@ -83,7 +83,8 @@ static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
  BdrvRequestFlags flags,
  BlockCompletionFunc *cb,
  void *opaque,
- bool is_write);
+ bool is_write,
+ bool aio_replay);
 static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
@@ -4342,7 +4343,19 @@ BlockAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
 return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
- cb, opaque, false);
+ cb, opaque, false, false);
+}
+
+BlockAIOCB *bdrv_aio_readv_replay(BlockDriverState *bs,
+  int64_t sector_num,
+  QEMUIOVector *qiov, int nb_sectors,
+  BlockCompletionFunc *cb,
+  void *opaque)
+{
+trace_bdrv_aio_readv_replay(bs, sector_num, nb_sectors, opaque);
+
+return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
+ cb, opaque, false, true);
 }
 
 BlockAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
@@ -4352,7 +4365,19 @@ BlockAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
 return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
- cb, opaque, true);
+ cb, opaque, true, false);
+}
+
+BlockAIOCB *bdrv_aio_writev_replay(BlockDriverState *bs,
+   int64_t sector_num,
+   QEMUIOVector *qiov, int nb_sectors,
+   BlockCompletionFunc *cb,
+   void *opaque)
+{
+trace_bdrv_aio_writev_replay(bs, sector_num, nb_sectors, opaque);
+
+return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
+ cb, opaque, true, true);
 }
 
 BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
@@ -4363,7 +4388,7 @@ BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
 
 return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
  BDRV_REQ_ZERO_WRITE | flags,
- cb, opaque, true);
+ cb, opaque, true, true);
 }
 
 
@@ -4505,7 +4530,8 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
  * requests. However, the fields opaque and error are left unmodified as they
  * are used to signal failure for a single request to the caller.
  */
-int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
+int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs,
+bool replay)
 {
 MultiwriteCB *mcb;
 int i;
@@ -4543,7 +4569,7 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
 bdrv_co_aio_rw_vector(bs, reqs[i].sector, reqs[i].qiov,
   reqs[i].nb_sectors, reqs[i].flags,
   multiwrite_cb, mcb,
-  true);
+  true, replay);
 }
 
 return 0;
@@ -4688,7 +4714,12 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
 }
 
-acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
+if (acb->common.replay) {
+acb-

[Qemu-devel] [RFC PATCH v5 15/31] cpu-exec: allow temporary disabling icount

2014-11-26 Thread Pavel Dovgalyuk

This patch is required for deterministic replay, which needs to generate an
exception by trying to execute an instruction without changing icount.
It adds a new TB flag that disables icount while the block is translated.

Signed-off-by: Pavel Dovgalyuk 
Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c  |6 +++---
 include/exec/exec-all.h |1 +
 translate-all.c |2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4df9856..05341cf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -198,7 +198,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, 
uint8_t *tb_ptr)
 /* Execute the code without caching the generated code. An interpreter
could be used if available. */
 static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
- TranslationBlock *orig_tb)
+ TranslationBlock *orig_tb, bool ignore_icount)
 {
 CPUState *cpu = ENV_GET_CPU(env);
 TranslationBlock *tb;
@@ -214,7 +214,7 @@ static void cpu_exec_nocache(CPUArchState *env, int 
max_cycles,
 /* tb_gen_code can flush our orig_tb, invalidate it now */
 tb_phys_invalidate(orig_tb, -1);
 tb = tb_gen_code(cpu, pc, cs_base, flags,
- max_cycles);
+ max_cycles | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
 cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
@@ -517,7 +517,7 @@ int cpu_exec(CPUArchState *env)
 } else {
 if (insns_left > 0) {
 /* Execute remaining instructions.  */
-cpu_exec_nocache(env, insns_left, tb);
+cpu_exec_nocache(env, insns_left, tb, false);
 align_clocks(&sc, cpu);
 }
 cpu->exception_index = EXCP_INTERRUPT;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3d19e72..1e6d7e8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -147,6 +147,7 @@ struct TranslationBlock {
 #define CF_COUNT_MASK  0x7fff
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
 #define CF_USE_ICOUNT  0x10000
+#define CF_IGNORE_ICOUNT 0x20000 /* Do not generate icount code */
 
 void *tc_ptr;/* pointer to the translated code */
 /* next matching tb for physical address. */
diff --git a/translate-all.c b/translate-all.c
index c256d58..21a78a4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1047,7 +1047,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 int code_gen_size;
 
 phys_pc = get_page_addr_code(env, pc);
-if (use_icount) {
+if (use_icount && !(cflags & CF_IGNORE_ICOUNT)) {
 cflags |= CF_USE_ICOUNT;
 }
 tb = tb_alloc(pc);
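
The flag handling in this patch can be sketched in isolation. This is a
minimal illustration, not QEMU code: the constant values are assumptions
chosen so that the flag bits sit above CF_COUNT_MASK, and
make_nocache_cflags() and resolve_cflags() are hypothetical helpers that
mirror what cpu_exec_nocache() and tb_gen_code() do with cflags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; only the layout matters: the low
 * 15 bits carry the instruction budget, the flags live above them. */
#define CF_COUNT_MASK    0x7fff
#define CF_USE_ICOUNT    0x10000
#define CF_IGNORE_ICOUNT 0x20000

/* Hypothetical helper: build the cflags value handed to the translator,
 * as cpu_exec_nocache() does with max_cycles and ignore_icount.
 * Masking the count is a precaution of this sketch, not of the patch. */
static uint32_t make_nocache_cflags(int max_cycles, bool ignore_icount)
{
    uint32_t cflags = (uint32_t)max_cycles & CF_COUNT_MASK;

    if (ignore_icount) {
        cflags |= CF_IGNORE_ICOUNT;
    }
    return cflags;
}

/* Hypothetical helper: the decision tb_gen_code() makes in the patch.
 * icount code is generated only if icount is on and not ignored. */
static uint32_t resolve_cflags(uint32_t cflags, bool use_icount)
{
    if (use_icount && !(cflags & CF_IGNORE_ICOUNT)) {
        cflags |= CF_USE_ICOUNT;
    }
    return cflags;
}
```

With these definitions, a one-instruction translation requested with
ignore_icount set never gains CF_USE_ICOUNT, even when icount is globally
enabled, which is the behaviour the commit message describes.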

Re: [Qemu-devel] [RFC PATCH v5 00/31] Deterministic replay and reverse execution

2014-11-26 Thread Pavel Dovgaluk

That covermail was wrong. Here is the correct one:




This set of patches is related to the reverse execution and deterministic 
replay of qemu execution. This implementation of deterministic replay can 
be used for deterministic debugging of guest code through the gdb remote
interface.

These patches include only the core functions of the replay,
excluding the support for replaying serial, audio, network, and USB devices'
operations. Reverse debugging and monitor commands were also excluded and
will be submitted later as separate patches.

Execution recording writes a log of non-deterministic events, which can later
be used to replay the execution anywhere, any number of times.
It also supports checkpointing for faster rewinding during reverse debugging.
Execution replaying reads the log and replays all non-deterministic events,
including external input, hardware clocks, and interrupts.

Deterministic replay has the following features:
 * Deterministically replays whole system execution and all contents of the
   memory, state of the hardware devices, clocks, and screen of the VM.
 * Writes the execution log into a file for later replaying multiple times
   on different machines.
 * Supports i386, x86_64, and ARM hardware platforms.
 * Performs deterministic replay of all operations with keyboard and mouse
   input devices.
 * Supports auto-checkpointing for convenient reverse debugging.

Usage of the record/replay:
 * First, record the execution by adding the following string to the command
   line: '-record fname=replay.bin -icount 7 -net none'. Block devices' images
   are not actually changed in the recording mode, because all of the changes
   are written to the temporary overlay file.
 * Then you can replay it multiple times by using another command
   line option: '-replay fname=replay.bin -icount 7 -net none'
 * '-net none' option should also be specified if network replay patches
   are not applied.

Paper with short description of deterministic replay implementation:
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html

Modifications of qemu include:
 * wrappers for clock and time functions to save their return values in the log
 * saving different asynchronous events (e.g. system shutdown) into the log
 * synchronization of the bottom halves execution
 * synchronization of the threads from thread pool
 * recording/replaying user input (mouse and keyboard)
 * adding internal events for cpu and io synchronization
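
The first item in the list above, wrapping clock functions so that their
return values go through the log, can be sketched as follows. This is a
simplified illustration under stated assumptions, not the code from the
series: Mode, ClockLog and logged_clock() are hypothetical names, the log
is an in-memory array rather than a file, and an exhausted log falls back
to the live value instead of stopping execution.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the replay mode and the event log. */
typedef enum { MODE_NONE, MODE_RECORD, MODE_PLAY } Mode;

#define LOG_CAPACITY 16

typedef struct {
    int64_t values[LOG_CAPACITY];
    size_t written;   /* number of values recorded so far */
    size_t read_pos;  /* next value to hand out during replay */
} ClockLog;

/* Wrap a non-deterministic clock read.
 * Record mode: store the host value and return it unchanged.
 * Play mode: ignore the host value and return the recorded one. */
static int64_t logged_clock(Mode mode, ClockLog *log, int64_t host_value)
{
    switch (mode) {
    case MODE_RECORD:
        if (log->written < LOG_CAPACITY) {
            log->values[log->written++] = host_value;
        }
        return host_value;
    case MODE_PLAY:
        if (log->read_pos < log->written) {
            return log->values[log->read_pos++];
        }
        return host_value; /* log exhausted: simplification of this sketch */
    default:
        return host_value;
    }
}
```

The point of the wrapper is that callers cannot tell the modes apart: in
play mode they see exactly the sequence of values that was observed while
recording, whatever the host clock reports at that moment.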

v5 changes:
 * Minor changes.
 * Used fixed-width integer types for read/write functions (as suggested by
   Alex Bennee)
 * Moved savevm-related code out of the core.
 * Added new traced clock for deterministic virtual clock warping (as
   suggested by Paolo Bonzini)
 * Fixed exception_index reset for user mode (as suggested by Paolo Bonzini)
 * Adopted Paolo's icount patches
 * Fixed hardware interrupts replaying

v4 changes:
 * Updated block drivers to support new bdrv_open interface.
 * Moved migration patches into separate series (as suggested by Paolo Bonzini)
 * Fixed a bug in replay_break operation.
 * Fixed rtl8139 migration for replay.
 * Fixed 'period' parameter processing for record mode.
 * Fixed bug in 'reverse-stepi' implementation.
 * Fixed replay without making any snapshots (even the starting one).
 * Moved core replay patches into the separate series.
 * Fixed reverse step and reverse continue support.
 * Fixed several bugs in icount subsystem.
 * Reusing native qemu icount for replay instructions counting.
 * Separated core patches into their own series.

v3 changes:
 * Fixed bug with replay of the aio write operations.
 * Added virtual clock based on replay icount.
 * Removed duplicated saving of interrupt_request CPU field.
 * Fixed some coding style issues.
 * Renamed QMP commands for controlling reverse execution (as suggested by
   Eric Blake)
 * Replay mode and submode implemented as QAPI enumerations (as suggested by
   Eric Blake)
 * Added description and example for replay-info command (as suggested by
   Eric Blake)
 * Added information about the current breakpoint to the output of
   replay-info (as suggested by Eric Blake)
 * Updated version id for HPET vmstate (as suggested by Paolo Bonzini)
 * Removed static fields from parallel vmstate (as suggested by Paolo Bonzini)
 * New vmstate fields for mc146818rtc, pckbd, kvmapic, serial, fdc, rtl8139
   moved to subsection (as suggested by Paolo Bonzini)
 * Disabled textmode cursor blinking when the virtual machine is stopped (as
   suggested by Paolo Bonzini)
 * Extracted saving of exception_index to separate patch (as suggested by
   Paolo Bonzini)

v2 changes:
 * Patches are split to be reviewable and bisectable (as suggested by
   Kirill Batuzov)
 * Added QMP versions of replay commands (as suggested by Eric Blake)
 * Removed some optional features of replay to make patches cleaner
 * Minor changes and code cleanup were made

[Qemu-devel] [RFC PATCH v5 16/31] cpu-exec: invalidate nocache translation if they are interrupted

2014-11-26 Thread Pavel Dovgalyuk

In this case, QEMU might longjmp out of cpu-exec.c and miss the final
cleanup in cpu_exec_nocache.  Do this manually through a new compile
flag.  This is important once we add no-icount translations.

Signed-off-by: Paolo Bonzini 

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c  |2 +-
 include/exec/exec-all.h |1 +
 translate-all.c |6 ++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 05341cf..65cbeb7 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -214,7 +214,7 @@ static void cpu_exec_nocache(CPUArchState *env, int 
max_cycles,
 /* tb_gen_code can flush our orig_tb, invalidate it now */
 tb_phys_invalidate(orig_tb, -1);
 tb = tb_gen_code(cpu, pc, cs_base, flags,
- max_cycles | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
+ max_cycles | CF_NOCACHE | (ignore_icount ? 
CF_IGNORE_ICOUNT : 0));
 cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 1e6d7e8..d409565 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -148,6 +148,7 @@ struct TranslationBlock {
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
 #define CF_USE_ICOUNT  0x10000
 #define CF_IGNORE_ICOUNT 0x20000 /* Do not generate icount code */
+#define CF_NOCACHE 0x40000 /* To be freed after execution */
 
 void *tc_ptr;/* pointer to the translated code */
 /* next matching tb for physical address. */
diff --git a/translate-all.c b/translate-all.c
index 21a78a4..269c4ba 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -264,6 +264,12 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 tb = tb_find_pc(retaddr);
 if (tb) {
 cpu_restore_state_from_tb(cpu, tb, retaddr);
+if (tb->cflags & CF_NOCACHE) {
+/* one-shot translation, invalidate it immediately */
+cpu->current_tb = NULL;
+tb_phys_invalidate(tb, -1);
+tb_free(tb);
+}
 return true;
 }
 return false;

Re: [Qemu-devel] [RFC PATCH v5 20/31] replay: recording and replaying clock ticks

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:40, Pavel Dovgalyuk wrote:
> +/* real time host monotonic timer implementation */
> +static inline int64_t get_clock_realtime_impl(void)
>  {
>  struct timeval tv;
>  
> @@ -708,6 +709,12 @@ static inline int64_t get_clock_realtime(void)
>  return tv.tv_sec * 1000000000LL + (tv.tv_usec * 1000);
>  }
>  
> +/* real time host monotonic timer interface */
> +static inline int64_t get_clock_realtime(void)
> +{
> +return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime_impl());
> +}
> +

Any reason to do this instead of using REPLAY_CLOCK in qemu_get_clock,
like you do for QEMU_CLOCK_VIRTUAL_RT?

Paolo

[Qemu-devel] [RFC PATCH v5 17/31] replay: interrupts and exceptions

2014-11-26 Thread Pavel Dovgalyuk

This patch includes modifications of common cpu files. All interrupts and
exceptions that occurred during recording are written into the replay log.
These events allow the execution to be replayed correctly by kicking the cpu
thread when one of these events is found in the log.

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c   |   33 +++--
 replay/replay-internal.h |4 +++
 replay/replay.c  |   53 ++
 replay/replay.h  |   17 +++
 4 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 65cbeb7..05cca50 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -24,6 +24,7 @@
 #include "qemu/atomic.h"
 #include "sysemu/qtest.h"
 #include "qemu/timer.h"
+#include "replay/replay.h"
 
 /* -icount align implementation. */
 
@@ -391,10 +392,21 @@ int cpu_exec(CPUArchState *env)
 cpu->exception_index = -1;
 break;
 #else
-cc->do_interrupt(cpu);
-cpu->exception_index = -1;
+if (replay_exception()) {
+cc->do_interrupt(cpu);
+cpu->exception_index = -1;
+} else if (!replay_has_interrupt()) {
+/* give a chance to iothread in replay mode */
+ret = EXCP_INTERRUPT;
+break;
+}
 #endif
 }
+} else if (replay_has_exception()
+   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* try to cause an exception pending in the log */
+cpu_exec_nocache(env, 1, tb_find_fast(env), true);
+break;
 }
 
 next_tb = 0; /* force lookup of first TB */
@@ -410,21 +422,24 @@ int cpu_exec(CPUArchState *env)
 cpu->exception_index = EXCP_DEBUG;
 cpu_loop_exit(cpu);
 }
-if (interrupt_request & CPU_INTERRUPT_HALT) {
+if ((interrupt_request & CPU_INTERRUPT_HALT)
+&& replay_interrupt()) {
 cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
 cpu->halted = 1;
 cpu->exception_index = EXCP_HLT;
 cpu_loop_exit(cpu);
 }
 #if defined(TARGET_I386)
-if (interrupt_request & CPU_INTERRUPT_INIT) {
+if ((interrupt_request & CPU_INTERRUPT_INIT)
+&& replay_interrupt()) {
 cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
 do_cpu_init(x86_cpu);
 cpu->exception_index = EXCP_HALTED;
 cpu_loop_exit(cpu);
 }
 #else
-if (interrupt_request & CPU_INTERRUPT_RESET) {
+if ((interrupt_request & CPU_INTERRUPT_RESET)
+&& replay_interrupt()) {
 cpu_reset(cpu);
 }
 #endif
@@ -432,7 +447,10 @@ int cpu_exec(CPUArchState *env)
False when the interrupt isn't processed,
True when it is, and we should restart on a new TB,
and via longjmp via cpu_loop_exit.  */
-if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+if ((replay_mode != REPLAY_MODE_PLAY
+|| replay_has_interrupt())
+&& cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+replay_interrupt();
 next_tb = 0;
 }
 /* Don't use the cached interrupt_request value,
@@ -444,7 +462,8 @@ int cpu_exec(CPUArchState *env)
 next_tb = 0;
 }
 }
-if (unlikely(cpu->exit_request)) {
+if (unlikely(cpu->exit_request
+ || replay_has_interrupt())) {
 cpu->exit_request = 0;
 cpu->exception_index = EXCP_INTERRUPT;
 cpu_loop_exit(cpu);
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 582b44c..fd5c230 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -14,6 +14,10 @@
 
 #include 
 
+/* for software interrupt */
+#define EVENT_INTERRUPT 15
+/* for emulated exceptions */
+#define EVENT_EXCEPTION 23
 /* for instruction event */
 #define EVENT_INSTRUCTION   32
 
diff --git a/replay/replay.c b/replay/replay.c
index c305e0c..c275794 100755
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -68,3 +68,56 @@ uint64_t replay_get_current_step(void)
 {
 return cpu_get_instructions_counter();
 }
+
+bool replay_exception(void)
+{
+if (replay_mo

[Qemu-devel] [RFC PATCH v5 24/31] replay: shutdown event

2014-11-26 Thread Pavel Dovgalyuk

This patch records and replays the simulator shutdown event.

Signed-off-by: Pavel Dovgalyuk 
---
 include/sysemu/sysemu.h  |1 +
 replay/replay-internal.h |2 ++
 replay/replay.c  |   11 +++
 replay/replay.h  |5 +
 vl.c |8 +++-
 5 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0037a69..bf4b1bd 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -57,6 +57,7 @@ void qemu_register_suspend_notifier(Notifier *notifier);
 void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
+void qemu_system_shutdown_request_impl(void);
 void qemu_system_shutdown_request(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 009029d..ec6973d 100755
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -20,6 +20,8 @@
 #define EVENT_TM2
 /* for software interrupt */
 #define EVENT_INTERRUPT 15
+/* for shutdown request */
+#define EVENT_SHUTDOWN  20
 /* for emulated exceptions */
 #define EVENT_EXCEPTION 23
 /* for async events */
diff --git a/replay/replay.c b/replay/replay.c
index a6de6a1..c118f62 100755
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -43,6 +43,10 @@ bool skip_async_events(int stop_event)
 res = true;
 }
 switch (replay_data_kind) {
+case EVENT_SHUTDOWN:
+replay_has_unread_data = 0;
+qemu_system_shutdown_request_impl();
+break;
 case EVENT_INSTRUCTION:
 replay_state.instructions_count = replay_get_dword();
 return res;
@@ -139,3 +143,10 @@ bool replay_has_interrupt(void)
 }
 return false;
 }
+
+void replay_shutdown_request(void)
+{
+if (replay_mode == REPLAY_MODE_RECORD) {
+replay_put_event(EVENT_SHUTDOWN);
+}
+}
diff --git a/replay/replay.h b/replay/replay.h
index 0c02e03..00c9906 100755
--- a/replay/replay.h
+++ b/replay/replay.h
@@ -78,6 +78,11 @@ void replay_save_tm(struct tm *tm);
 /*! Reads struct tm value from the log. Stops execution in case of error */
 void replay_read_tm(struct tm *tm);
 
+/* Events */
+
+/*! Called when qemu shutdown is requested. */
+void replay_shutdown_request(void);
+
 /* Asynchronous events queue */
 
 /*! Disables storing events in the queue */
diff --git a/vl.c b/vl.c
index 37c6616..4155342 100644
--- a/vl.c
+++ b/vl.c
@@ -1792,13 +1792,19 @@ void qemu_system_killed(int signal, pid_t pid)
 qemu_system_shutdown_request();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_impl(void)
 {
 trace_qemu_system_shutdown_request();
 shutdown_requested = 1;
 qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+replay_shutdown_request();
+qemu_system_shutdown_request_impl();
+}
+
 static void qemu_system_powerdown(void)
 {
 qapi_event_send_powerdown(&error_abort);
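
The split into qemu_system_shutdown_request() and its _impl counterpart
follows a pattern that can be sketched on its own. The names below (Sim,
shutdown_request(), replay_shutdown_event()) are hypothetical, and a
counter stands in for the event log; the sketch only shows why replay
must call the _impl variant directly.

```c
#include <stdbool.h>

typedef enum { MODE_NONE, MODE_RECORD, MODE_PLAY } Mode;

/* Hypothetical simulator state: a counter stands in for the log. */
typedef struct {
    Mode mode;
    int logged_shutdowns;
    bool shutdown_requested;
} Sim;

/* The actual effect, free of any record/replay bookkeeping. */
static void shutdown_request_impl(Sim *s)
{
    s->shutdown_requested = true;
}

/* Public entry point: log the event when recording, then act. */
static void shutdown_request(Sim *s)
{
    if (s->mode == MODE_RECORD) {
        s->logged_shutdowns++;
    }
    shutdown_request_impl(s);
}

/* Replay side: consume one logged event and act on it. Calling the
 * _impl variant avoids re-entering the logging wrapper. */
static bool replay_shutdown_event(Sim *s)
{
    if (s->mode != MODE_PLAY || s->logged_shutdowns == 0) {
        return false;
    }
    s->logged_shutdowns--;
    shutdown_request_impl(s);
    return true;
}
```

Keeping the effect in one function and the bookkeeping in a thin wrapper
means existing callers need no changes, while the log reader has a way to
trigger the same effect without producing a new log entry.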

[Qemu-devel] [RFC PATCH v5 31/31] replay: recording of the user input

2014-11-26 Thread Pavel Dovgalyuk

This records user input (keyboard and mouse events) in record mode and replays
these input events in replay mode.

Signed-off-by: Pavel Dovgalyuk 
---
 include/ui/input.h   |2 +
 replay/Makefile.objs |1 
 replay/replay-events.c   |   48 
 replay/replay-input.c|  108 ++
 replay/replay-internal.h |   11 -
 replay/replay.h  |5 ++
 ui/input.c   |   80 ++
 7 files changed, 235 insertions(+), 20 deletions(-)
 create mode 100755 replay/replay-input.c

diff --git a/include/ui/input.h b/include/ui/input.h
index 5d5ac00..d06a12d 100644
--- a/include/ui/input.h
+++ b/include/ui/input.h
@@ -33,7 +33,9 @@ void qemu_input_handler_bind(QemuInputHandlerState *s,
  const char *device_id, int head,
  Error **errp);
 void qemu_input_event_send(QemuConsole *src, InputEvent *evt);
+void qemu_input_event_send_impl(QemuConsole *src, InputEvent *evt);
 void qemu_input_event_sync(void);
+void qemu_input_event_sync_impl(void);
 
 InputEvent *qemu_input_event_new_key(KeyValue *key, bool down);
 void qemu_input_event_send_key(QemuConsole *src, KeyValue *key, bool down);
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index 257c320..3936296 100755
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -2,3 +2,4 @@ obj-y += replay.o
 obj-y += replay-internal.o
 obj-y += replay-events.o
 obj-y += replay-time.o
+obj-y += replay-input.o
diff --git a/replay/replay-events.c b/replay/replay-events.c
index 4da5de0..308186b 100755
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -13,6 +13,7 @@
 #include "replay.h"
 #include "replay-internal.h"
 #include "block/thread-pool.h"
+#include "ui/input.h"
 
 typedef struct Event {
 int event_kind;
@@ -43,6 +44,16 @@ static void replay_run_event(Event *event)
 case REPLAY_ASYNC_EVENT_THREAD:
 thread_pool_work((ThreadPool *)event->opaque, event->opaque2);
 break;
+case REPLAY_ASYNC_EVENT_INPUT:
+qemu_input_event_send_impl(NULL, (InputEvent *)event->opaque);
+/* Using local variables, when replaying. Do not free them. */
+if (replay_mode == REPLAY_MODE_RECORD) {
+qapi_free_InputEvent((InputEvent *)event->opaque);
+}
+break;
+case REPLAY_ASYNC_EVENT_INPUT_SYNC:
+qemu_input_event_sync_impl();
+break;
 default:
 fprintf(stderr, "Replay: invalid async event ID (%d) in the queue\n",
 event->event_kind);
@@ -136,6 +147,16 @@ void replay_add_thread_event(void *opaque, void *opaque2, 
uint64_t id)
 replay_add_event_internal(REPLAY_ASYNC_EVENT_THREAD, opaque, opaque2, id);
 }
 
+void replay_add_input_event(struct InputEvent *event)
+{
+replay_add_event_internal(REPLAY_ASYNC_EVENT_INPUT, event, NULL, 0);
+}
+
+void replay_add_input_sync_event(void)
+{
+replay_add_event_internal(REPLAY_ASYNC_EVENT_INPUT_SYNC, NULL, NULL, 0);
+}
+
 void replay_save_events(int opt)
 {
 qemu_mutex_lock(&lock);
@@ -153,6 +174,9 @@ void replay_save_events(int opt)
 case REPLAY_ASYNC_EVENT_THREAD:
 replay_put_qword(event->id);
 break;
+case REPLAY_ASYNC_EVENT_INPUT:
+replay_save_input_event(event->opaque);
+break;
 }
 }
 
@@ -178,6 +202,7 @@ void replay_read_events(int opt)
 break;
 }
 /* Execute some events without searching them in the queue */
+Event e;
 switch (read_event_kind) {
 case REPLAY_ASYNC_EVENT_BH:
 case REPLAY_ASYNC_EVENT_THREAD:
@@ -185,6 +210,29 @@ void replay_read_events(int opt)
 read_id = replay_get_qword();
 }
 break;
+case REPLAY_ASYNC_EVENT_INPUT:
+e.event_kind = read_event_kind;
+e.opaque = replay_read_input_event();
+
+replay_run_event(&e);
+
+replay_has_unread_data = 0;
+read_event_kind = -1;
+read_opt = -1;
+replay_fetch_data_kind();
+/* continue with the next event */
+continue;
+case REPLAY_ASYNC_EVENT_INPUT_SYNC:
+e.event_kind = read_event_kind;
+e.opaque = 0;
+replay_run_event(&e);
+
+replay_has_unread_data = 0;
+read_event_kind = -1;
+read_opt = -1;
+replay_fetch_data_kind();
+/* continue with the next event */
+continue;
 default:
 fprintf(stderr, "Unknown ID %d of replay event\n", 
read_event_kind);
 exit(1);
diff --git a/replay/replay-input.c b/replay/replay-input.c
new file mode 100755
index 0000000..f5d1482
--- /dev/null
+++ b/replay/replay-input.c
@@ -0,0 +1,108 @@
+/*
+ * replay-input.c
+ *
+ * Copyright (c) 2010-2014 Institute for System Programming
+ *

Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support

2014-11-26 Thread Eric Auger

On 11/26/2014 11:24 AM, Alexander Graf wrote:
> 
> 
> On 26.11.14 10:45, Eric Auger wrote:
>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>
>>>
>>> On 31.10.14 15:05, Eric Auger wrote:
>>>> Minimal VFIO platform implementation supporting
>>>> - register space user mapping,
>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>
>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>
>>>> Signed-off-by: Kim Phillips 
>>>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>>>> +/*
>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>> + * init done notifier registered by the machine file. After its execution
>>>> + * we execute a new notifier that actually starts the injection. When 
>>>> using
>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>> + * GSI number,ie. virtual IRQ number
>>>> + */
>>>> +
>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>> +unsigned int platform_bus_first_irq;
>>>> +Notifier notifier;
>>>> +} VfioIrqStarterNotifierParams;
>>>> +
>>>> +typedef struct VfioIrqStartParams {
>>>> +PlatformBusDevice *pbus;
>>>> +int platform_bus_first_irq;
>>>> +} VfioIrqStartParams;
>>>> +
>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>> +{
>>>> +int i;
>>>> +VfioIrqStartParams *p = opaque;
>>>> +VFIOPlatformDevice *vdev;
>>>> +VFIODevice *vbasedev;
>>>> +uint64_t irq_number;
>>>> +PlatformBusDevice *pbus = p->pbus;
>>>> +int platform_bus_first_irq = p->platform_bus_first_irq;
>>>> +
>>>> +if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>> +vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>> +vbasedev = &vdev->vbasedev;
>>>> +for (i = 0; i < vbasedev->num_irqs; i++) {
>>>> +irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>> + + platform_bus_first_irq;
>>>> +vfio_start_irq_injection(sbdev, i, irq_number);
>>>> +}
>>>> +}
>>>> +return 0;
>>>> +}
>>>> +
>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>> +{
>>>> +VfioIrqStarterNotifierParams *p =
>>>> +container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>> +DeviceState *dev =
>>>> +qdev_find_recursive(sysbus_get_default(), 
>>>> TYPE_PLATFORM_BUS_DEVICE);
>>>> +PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>> +
>>>> +if (pbus->done_gathering) {
>>>> +VfioIrqStartParams data = {
>>>> +.pbus = pbus,
>>>> +.platform_bus_first_irq = p->platform_bus_first_irq,
>>>> +};
>>>> +
>>>> +foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>> +}
>>>> +}
>>>> +
>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>> +{
>>>> +VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 
>>>> 1);
>>>> +
>>>> +p->platform_bus_first_irq = platform_bus_first_irq;
>>>> +p->notifier.notify = vfio_irq_starter_notify;
>>>> +qemu_add_machine_init_done_notifier(&p->notifier);
>>>
>>> Could you add a notifier for each device instead? Then the notifier
>>> would be part of the vfio device struct and not some dangling random
>>> pointer :).
>>>
>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>> know the device you're dealing with and only handle a single device per
>>> notifier.
>>
>> Hi Alex,
>>
>> I don't see how to practically follow your request:
>>
>> - at machine init time, VFIO devices are not yet instantiated so I
>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>> wrong in my first reply :-().
>>
>> - I can't register a per VFIO device notifier in the VFIO device
>> finalize function because this latter is called after the platform bus
>> instantiation. So the IRQ binding notifier (registered in platform bus
>> finalize fn) would be called after the IRQ starter notifier.
>>
>> - then to simplify things a bit I could use a qemu_register_reset in
>> place of a machine init done notifier (would relax the call order
>> constraint) but the problem consists in passing the platform bus first
>> irq (all the more so you requested it became part of a const struct)
>>
>> Do I miss something?
> 
> So the basic idea is that the device itself calls
> qemu_add_machine_init_done_notifier() in its realize function. The
> Notifier struct would be part of the device state which means you can
> cast yourself into the VFIO device state.
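
The suggestion above, embedding the Notifier in the device state and
recovering the device from it in the callback, can be sketched like this.
It is an illustration under stated assumptions: the Notifier type here is
a reduced stand-in for QEMU's, and ExampleDevice, example_realize() and
example_init_done() are hypothetical names. A real device would pass
&dev->init_done to qemu_add_machine_init_done_notifier() in realize.

```c
#include <stddef.h>

/* Reduced stand-in for QEMU's Notifier: only the callback pointer. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device state with the notifier embedded in it. */
typedef struct {
    int num_irqs;
    int irqs_started;
    Notifier init_done;
} ExampleDevice;

/* Callback: runs once machine init is done and handles one device only. */
static void example_init_done(Notifier *notifier, void *data)
{
    ExampleDevice *dev = container_of(notifier, ExampleDevice, init_done);

    (void)data;
    dev->irqs_started = dev->num_irqs; /* stand-in for starting injection */
}

/* Realize: set up the embedded notifier. Registration with the machine
 * init done list is left out of this sketch. */
static void example_realize(ExampleDevice *dev, int num_irqs)
{
    dev->num_irqs = num_irqs;
    dev->irqs_started = 0;
    dev->init_done.notify = example_init_done;
}
```

Because the notifier lives inside the device, no separately allocated
parameter block is needed and the callback never has to iterate over
other devices.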

humm, the vfio device is instantia

Re: [Qemu-devel] [Xen-devel] virtio leaks cpu mappings, was: qemu crash with virtio on Xen domUs (backtrace included)

2014-11-26 Thread Stefano Stabellini

On Wed, 26 Nov 2014, Jason Wang wrote:
> On 11/25/2014 09:53 PM, Stefano Stabellini wrote:
> > On Tue, 25 Nov 2014, Jason Wang wrote:
> >> On 11/25/2014 02:44 AM, Stefano Stabellini wrote:
> >>> On Mon, 24 Nov 2014, Stefano Stabellini wrote:
> >>>> On Mon, 24 Nov 2014, Stefano Stabellini wrote:
> >>>>> CC'ing Paolo.
> >>>>>
> >>>>>
> >>>>> Wen,
> >>>>> thanks for the logs.
> >>>>>
> >>>>> I investigated a little bit and it seems to me that the bug occurs when
> >>>>> QEMU tries to unmap only a portion of a memory region previously mapped.
> >>>>> That doesn't work with xen-mapcache.
> >>>>>
> >>>>> See these logs for example:
> >>>>>
> >>>>> DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb68 len=0xa
> >>>>> DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6
> >>>> Sorry the logs don't quite match, it was supposed to be:
> >>>>
> >>>> DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb64 len=0xa
> >>>> DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6
> >>> It looks like the problem is caused by iov_discard_front, called by
> >>> virtio_net_handle_ctrl. By changing iov_base after the sg has already
> >>> been mapped (cpu_physical_memory_map), it causes a leak in the mapping
> >>> because the corresponding cpu_physical_memory_unmap will only unmap a
> >>> portion of the original sg.  On Xen the problem is worse because
> >>> xen-mapcache aborts.
> >>>
> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>> index 2ac6ce5..b2b5c2d 100644
> >>> --- a/hw/net/virtio-net.c
> >>> +++ b/hw/net/virtio-net.c
> >>> @@ -775,7 +775,7 @@ static void virtio_net_handle_ctrl(VirtIODevice 
> >>> *vdev, VirtQueue *vq)
> >>>  struct iovec *iov;
> >>>  unsigned int iov_cnt;
> >>>  
> >>> -while (virtqueue_pop(vq, &elem)) {
> >>> +while (virtqueue_pop_nomap(vq, &elem)) {
> >>>  if (iov_size(elem.in_sg, elem.in_num) < sizeof(status) ||
> >>>  iov_size(elem.out_sg, elem.out_num) < sizeof(ctrl)) {
> >>>  error_report("virtio-net ctrl missing headers");
> >>> @@ -784,8 +784,12 @@ static void virtio_net_handle_ctrl(VirtIODevice 
> >>> *vdev, VirtQueue *vq)
> >>>  
> >>>  iov = elem.out_sg;
> >>>  iov_cnt = elem.out_num;
> >>> -s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
> >>>  iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
> >>> +
> >>> +virtqueue_map_sg(elem.in_sg, elem.in_addr, elem.in_num, 1);
> >>> +virtqueue_map_sg(elem.out_sg, elem.out_addr, elem.out_num, 0);
> >>> +
> >>> +s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
> >> Does this really work?
> > It seems to work here, as in it doesn't crash QEMU and I am able to boot
> > a guest with network. I didn't try any MAC related commands.
> >
> 
> It was because the guest (not a recent kernel?) never issue commands
> through control vq.
> 
> We'd better hide the implementation details such as virtqueue_map_sg()
> in virtio core instead of letting device call it directly.
> >> The code in fact skips the location that contains
> >> virtio_net_ctrl_hdr. And virtio_net_handle_mac() still calls
> >> iov_discard_front().
> >>
> >> How about copy iov to a temp variable and use it in this function?
> > That would only work if I moved the cpu_physical_memory_unmap call
> > outside of virtqueue_fill, so that we can pass different iov to them.
> > We need to unmap the same iov that was previously mapped by
> > virtqueue_pop.
> >
> 
> I mean something like following or just passing the offset of iov to
> virtio_net_handle_*().

Sorry, you are right, your patch works too. I tried something like this
yesterday but I was confused because even if a crash doesn't happen
anymore, virtio-net still doesn't work on Xen (it boots but the network
doesn't work properly within the guest).
But that seems to be a separate issue and it affects my series too.

A possible problem with this approach is that virtqueue_push is now
called passing the original iov, not the shortened one.

Are you sure that is OK?
If so we can drop my series and use this instead.
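
The idea under discussion, trimming a copy of the iovec array so that the
mapped original is unmapped unchanged, can be sketched with plain POSIX
struct iovec. discard_front() and trim_copy() are hypothetical helpers
written for this illustration; they are not QEMU's iov_discard_front()
and make no claim about its exact signature or behaviour.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: drop up to `bytes` from the front of a vector,
 * adjusting it in place. Returns the number of bytes dropped. */
static size_t discard_front(struct iovec **iov, unsigned int *iov_cnt,
                            size_t bytes)
{
    size_t total = 0;
    struct iovec *cur = *iov;

    while (*iov_cnt > 0 && bytes > 0) {
        if (cur->iov_len > bytes) {
            /* partial element: advance its base, shrink its length */
            cur->iov_base = (char *)cur->iov_base + bytes;
            cur->iov_len -= bytes;
            total += bytes;
            bytes = 0;
        } else {
            /* whole element consumed: skip it */
            bytes -= cur->iov_len;
            total += cur->iov_len;
            cur++;
            (*iov_cnt)--;
        }
    }
    *iov = cur;
    return total;
}

/* Copy the mapped vector into caller-provided scratch space and trim
 * the copy, so the original base/length pairs stay intact for unmap. */
static size_t trim_copy(const struct iovec *mapped, unsigned int cnt,
                        struct iovec *scratch, struct iovec **out,
                        unsigned int *out_cnt, size_t header_len)
{
    memcpy(scratch, mapped, sizeof(*mapped) * cnt);
    *out = scratch;
    *out_cnt = cnt;
    return discard_front(out, out_cnt, header_len);
}
```

With a 10-byte element and a 4-byte header, the working copy ends up 4
bytes in with length 6, while the original still describes the full 10
bytes, which matches the map/unmap mismatch shown in the logs quoted
earlier in this thread.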


> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9b88775..fdb4edd 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -798,7 +798,7 @@ static void virtio_net_handle_ctrl(VirtIODevice
> *vdev, VirtQueue *vq)
>  virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
>  VirtQueueElement elem;
>  size_t s;
> -struct iovec *iov;
> +struct iovec *iov, *iov2;
>  unsigned int iov_cnt;
>  
>  while (virtqueue_pop(vq, &elem)) {
> @@ -808,8 +808,12 @@ static void virtio_net_handle_ctrl(VirtIODevice
> *vdev, VirtQueue *vq)
>  exit(1);
>  }
>  
> -iov = elem.out_sg;
>  iov_cnt = elem.out_num;
> +s = sizeof(struct iovec) * elem.out_num;
> +iov = g_malloc(s);
> +memcpy(iov, elem.out_sg, s);
> +iov2 = iov;
> +
>  s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
>  iov_discard_fro

Re: [Qemu-devel] [RFC PATCH v5 22/31] timer: introduce new QEMU_CLOCK_VIRTUAL_RT clock

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:40, Pavel Dovgalyuk wrote:
> This patch introduces new QEMU_CLOCK_VIRTUAL_RT clock, which
> should be used for icount warping. Separate timer is needed
> for replaying the execution, because warping callbacks should
> be deterministic. We cannot make realtime clock deterministic
> because it is used for screen updates and other simulator-specific
> actions. That is why we added new clock which is recorded and
> replayed when needed.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  include/qemu/timer.h |7 +++
>  qemu-timer.c |2 ++
>  replay/replay.h  |4 +++-
>  3 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 7b43331..df27157 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -37,12 +37,19 @@
>   * is suspended, and it will reflect system time changes the host may
>   * undergo (e.g. due to NTP). The host clock has the same precision as
>   * the virtual clock.
> + *
> + * @QEMU_CLOCK_VIRTUAL_RT: realtime clock used for icount warp
> + *
> + * This clock runs as a realtime clock, but is used for icount warp
> + * and thus should be traced with record/replay to make warp function
> + * behave deterministically.
>   */

I think it should also stop/restart across "stop" and "cont" commands,
similar to QEMU_CLOCK_VIRTUAL.  This is as simple as changing
get_clock() to cpu_get_clock().

This way, QEMU_CLOCK_VIRTUAL_RT is "what QEMU_CLOCK_VIRTUAL does without
-icount".  This makes a lot of sense and can be merged in 2.3
independent of the rest of the series.
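
The stop/restart behaviour described above can be illustrated with a toy model. This is only a sketch, not QEMU code: `PausableClock` and its helpers are hypothetical names, and the host time is passed in explicitly so the example stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not QEMU code): a clock that only
 * advances while the "VM" is running. */
typedef struct PausableClock {
    int64_t offset;   /* host time that corresponds to clock value 0 */
    int64_t frozen;   /* value reported while stopped */
    bool running;
} PausableClock;

static void pausable_clock_init(PausableClock *c, int64_t host_now)
{
    c->offset = host_now;
    c->frozen = 0;
    c->running = true;
}

static int64_t pausable_clock_get(const PausableClock *c, int64_t host_now)
{
    return c->running ? host_now - c->offset : c->frozen;
}

/* "stop": remember the current value and freeze it. */
static void pausable_clock_stop(PausableClock *c, int64_t host_now)
{
    if (c->running) {
        c->frozen = host_now - c->offset;
        c->running = false;
    }
}

/* "cont": resume from the frozen value, skipping the paused interval. */
static void pausable_clock_cont(PausableClock *c, int64_t host_now)
{
    if (!c->running) {
        assert(host_now >= c->frozen + c->offset);
        c->offset = host_now - c->frozen;
        c->running = true;
    }
}
```

A plain realtime clock would keep counting during the paused interval; the model above drops it, which is the difference the review comment is about.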

Paolo

>  typedef enum {
>  QEMU_CLOCK_REALTIME = 0,
>  QEMU_CLOCK_VIRTUAL = 1,
>  QEMU_CLOCK_HOST = 2,
> +QEMU_CLOCK_VIRTUAL_RT = 3,
>  QEMU_CLOCK_MAX
>  } QEMUClockType;
>  
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 8307913..3f99af5 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -567,6 +567,8 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
>  notifier_list_notify(&clock->reset_notifiers, &now);
>  }
>  return now;
> +case QEMU_CLOCK_VIRTUAL_RT:
> +return REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT, get_clock());
>  }
>  }
>  
> diff --git a/replay/replay.h b/replay/replay.h
> index 143fe85..0c02e03 100755
> --- a/replay/replay.h
> +++ b/replay/replay.h
> @@ -22,8 +22,10 @@
>  #define REPLAY_CLOCK_REAL_TICKS 0
>  /* host_clock */
>  #define REPLAY_CLOCK_HOST   1
> +/* virtual_rt_clock */
> +#define REPLAY_CLOCK_VIRTUAL_RT 2
>  
> -#define REPLAY_CLOCK_COUNT  2
> +#define REPLAY_CLOCK_COUNT  3
>  
>  extern ReplayMode replay_mode;
>  extern char *replay_image_suffix;
>

Re: [Qemu-devel] [PATCH] s390x/kvm: Fix compile error

2014-11-26 Thread Michael S. Tsirkin

On Wed, Nov 26, 2014 at 11:07:24AM +0100, Christian Borntraeger wrote:
> commit a2b257d6212a "memory: expose alignment used for allocating RAM
> as MemoryRegion API" triggered a compile error on KVM/s390x.
> 
> Fix the prototype and the implementation of legacy_s390_alloc.
> 
> Cc: Igor Mammedov 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Christian Borntraeger 

Reviewed-by: Michael S. Tsirkin 

> ---
>  target-s390x/kvm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index 80fb0aa..5349075 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -110,7 +110,7 @@ static int cap_async_pf;
>  
>  static uint64_t cpu_model_call_cache;
>  
> -static void *legacy_s390_alloc(size_t size);
> +static void *legacy_s390_alloc(size_t size, uint64_t *align);
>  
>  static int kvm_s390_check_clear_cmma(KVMState *s)
>  {
> @@ -545,7 +545,7 @@ int kvm_s390_set_clock(uint8_t *tod_clock_high, uint64_t 
> *tod_clock)
>   * to grow. We also have to use MAP parameters that avoid
>   * read-only mapping of guest pages.
>   */
> -static void *legacy_s390_alloc(size_t size, , uint64_t *align)
> +static void *legacy_s390_alloc(size_t size, uint64_t *align)
>  {
>  void *mem;
>  
> -- 
> 1.9.3

Re: [Qemu-devel] [PATCH 2/2] balloon: add a feature bit to let Guest OS deflate balloon on oom

2014-11-26 Thread Michael S. Tsirkin

On Wed, Nov 26, 2014 at 01:11:25PM +0300, Denis V. Lunev wrote:
> From: Raushaniya Maksudova 
> 
> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> when Linux is under severe memory pressure. Various mechanisms are
> responsible for correct virtio_balloon memory management. Nevertheless it
> is often the case that these control tools do not have enough time to
> react to a fast-changing memory load. As a result the OS runs out of memory
> and invokes OOM-killer. The balancing of memory by use of the virtio balloon
> should not cause the termination of processes while there are pages in the
> balloon. Currently there is no way for the virtio balloon driver to free
> memory at the last moment before some process gets killed by OOM-killer.
> 
> This does not introduce a security breach, as the balloon itself runs
> inside the Guest OS and works in cooperation with the host. Thus
> some improvements from the Guest side should be considered normal.
> 
> To solve the problem, introduce a virtio_balloon callback which is
> expected to be called from the oom notifier call chain in out_of_memory()
> function. If virtio balloon could release some memory, it will make the
> system return and retry the allocation that forced the out of memory
> killer to run.
> 
> This behavior should be enabled if and only if appropriate feature bit
> is set on the device. It is off by default.
> 
> This functionality was recently merged into vanilla Linux (actually in
> linux-next at the moment)
> 
>   commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
>   Author: Raushaniya Maksudova 
>   Date:   Mon Nov 10 09:36:29 2014 +1030
> 
> This patch adds the respective control bits to QEMU. It introduces a
> deflate-on-oom option for the balloon device, which does the trick.
> 
> Signed-off-by: Raushaniya Maksudova 
> Signed-off-by: Denis V. Lunev 
> CC: Anthony Liguori 
> CC: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-balloon.c | 7 +++
>  include/hw/virtio/virtio-balloon.h | 2 ++
>  qemu-options.hx| 6 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 7bfbb75..9d145fa 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -305,7 +305,12 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>  
>  static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
>  {
> +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>  f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
> +if (dev->deflate_on_oom) {
> +f |= (1 << VIRTIO_BALLOON_F_DEFLATE_ON_OOM);
> +}
> +
>  return f;
>  }
>  
> @@ -409,6 +414,7 @@ static void virtio_balloon_device_unrealize(DeviceState 
> *dev, Error **errp)
>  }
>  
>  static Property virtio_balloon_properties[] = {
> +DEFINE_PROP_BOOL("deflate-on-oom", VirtIOBalloon, deflate_on_oom, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h 
> b/include/hw/virtio/virtio-balloon.h
> index f863bfe..45cc55a 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -30,6 +30,7 @@
>  /* The feature bitmap for virtio balloon */
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ 1   /* Memory stats virtqueue */
> +#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
> @@ -67,6 +68,7 @@ typedef struct VirtIOBalloon {
>  QEMUTimer *stats_timer;
>  int64_t stats_last_update;
>  int64_t stats_poll_interval;
> +bool deflate_on_oom;
>  } VirtIOBalloon;
>  
>  #endif

You don't need an extra bool or open-coding.
Please do it the same way we do for other features:
set the bit in the feature mask directly.
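
A minimal sketch of what "set the bit in the feature mask directly" could look like, as a standalone toy model rather than QEMU code. `ToyBalloon`, `toy_set_feature` and `toy_get_features` are hypothetical names; only the two bit numbers are taken from the quoted header. The real device would presumably expose the bit through a qdev bit property instead of a setter like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as in the quoted virtio-balloon.h hunk. */
#define TOY_BALLOON_F_STATS_VQ        1
#define TOY_BALLOON_F_DEFLATE_ON_OOM  2

/* Hypothetical device state: optional features live in one mask,
 * so no per-feature bool is needed. */
typedef struct ToyBalloon {
    uint32_t host_features;
} ToyBalloon;

/* What a bit-valued property setter would do: flip one bit in the mask. */
static void toy_set_feature(ToyBalloon *dev, unsigned bit, bool on)
{
    assert(bit < 32);
    if (on) {
        dev->host_features |= UINT32_C(1) << bit;
    } else {
        dev->host_features &= ~(UINT32_C(1) << bit);
    }
}

/* get_features just ORs the configured mask in; no open-coded 'if'. */
static uint32_t toy_get_features(const ToyBalloon *dev, uint32_t f)
{
    f |= UINT32_C(1) << TOY_BALLOON_F_STATS_VQ;
    f |= dev->host_features;
    return f;
}
```

With this layout, adding another optional feature needs a new bit number and a property, not a new struct field and another branch.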

> diff --git a/qemu-options.hx b/qemu-options.hx
> index da9851d..14ede0b 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -324,7 +324,8 @@ ETEXI
>  DEF("balloon", HAS_ARG, QEMU_OPTION_balloon,
>  "-balloon none   disable balloon device\n"
>  "-balloon virtio[,addr=str]\n"
> -"enable virtio balloon device (default)\n", 
> QEMU_ARCH_ALL)
> +"enable virtio balloon device (default)\n"
> +"   [,deflate-on-oom=on|off]\n", QEMU_ARCH_ALL)
>  STEXI
>  @item -balloon none
>  @findex -balloon
> @@ -332,6 +333,9 @@ Disable balloon device.
>  @item -balloon virtio[,addr=@var{addr}]
>  Enable virtio balloon device (default), optionally with PCI address
>  @var{addr}.
> +@item -balloon virtio[,deflate-on-oom=@var{deflate-on-oom}]
> +@var{deflate-on-oom} is "on" or "off" and enables whether to let Guest OS
> +to deflate virtio balloon on OOM. Default is off.
>  ETEXI
>  
>  DEF("device", HAS_ARG, QEMU_OPTION_device,

Please don't add stuff to legacy -balloon.
New -device is enough, you don't need to touch qemu-options.hx
for it.

Re: [Qemu-devel] [PATCH v6 1/3] linux-aio: fix submit aio as a batch

2014-11-26 Thread Kevin Wolf

Am 25.11.2014 um 08:23 hat Ming Lei geschrieben:
> In the submit path, we can't complete request directly,
> otherwise "Co-routine re-entered recursively" may be caused,
> so this patch fixes the issue with below ideas:
> 
>   - for -EAGAIN or partial completion, retry the submission
>   in following completion cb which is run in BH context
>   - for part of completion, update the io queue too
>   - for case of io queue full, submit queued requests
>   immediately and return failure to caller
>   - for other failure, abort all queued requests in BH
> context, and requests won't be allowed to be submitted until
> aborting is handled
> 
> Reviewed-by: Paolo Bonzini 
> Signed-off-by: Ming Lei 

This looks like a quite complex fix to this problem, and it introduces
new error cases, too (while aborting, requests fail now, but they really
should just be waiting).

I'm wondering if this is the time to convert the linux-aio interface to
coroutines finally. It wouldn't only be a performance optimisation, but
would potentially also simplify this code.

>  block/linux-aio.c |  114 
> -
>  1 file changed, 95 insertions(+), 19 deletions(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index d92513b..11ac828 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -38,11 +38,20 @@ struct qemu_laiocb {
>  QLIST_ENTRY(qemu_laiocb) node;
>  };
>  
> +/*
> + * TODO: support to batch I/O from multiple bs in one same
> + * AIO context, one important use case is multi-lun scsi,
> + * so in future the IO queue should be per AIO context.
> + */
>  typedef struct {
>  struct iocb *iocbs[MAX_QUEUED_IO];
>  int plugged;
>  unsigned int size;
>  unsigned int idx;
> +
> +/* abort queued requests in BH context */
> +QEMUBH *abort_bh;
> +bool  aborting;

Two spaces.

>  } LaioQueue;
>  
>  struct qemu_laio_state {
> @@ -59,6 +68,8 @@ struct qemu_laio_state {
>  int event_max;
>  };
>  
> +static int ioq_submit(struct qemu_laio_state *s);
> +
>  static inline ssize_t io_event_ret(struct io_event *ev)
>  {
>  return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
> @@ -91,6 +102,13 @@ static void qemu_laio_process_completion(struct 
> qemu_laio_state *s,
>  qemu_aio_unref(laiocb);
>  }
>  
> +static void qemu_laio_start_retry(struct qemu_laio_state *s)
> +{
> +if (s->io_q.idx) {
> +ioq_submit(s);
> +}
> +}
> +
>  /* The completion BH fetches completed I/O requests and invokes their
>   * callbacks.
>   *
> @@ -135,6 +153,8 @@ static void qemu_laio_completion_bh(void *opaque)
>  
>  qemu_laio_process_completion(s, laiocb);
>  }
> +
> +qemu_laio_start_retry(s);
>  }

Why is qemu_laio_start_retry() a separate function? This is the only
caller.

>  static void qemu_laio_completion_cb(EventNotifier *e)
> @@ -175,47 +195,99 @@ static void ioq_init(LaioQueue *io_q)
>  io_q->size = MAX_QUEUED_IO;
>  io_q->idx = 0;
>  io_q->plugged = 0;
> +io_q->aborting = false;
>  }
>  
> +/* Always return >= 0 and it means how many requests are submitted */
>  static int ioq_submit(struct qemu_laio_state *s)
>  {
> -int ret, i = 0;
> +int ret;
>  int len = s->io_q.idx;
>  
> -do {
> -ret = io_submit(s->ctx, len, s->io_q.iocbs);
> -} while (i++ < 3 && ret == -EAGAIN);
> -
> -/* empty io queue */
> -s->io_q.idx = 0;
> +if (!len) {
> +return 0;
> +}
>  
> +ret = io_submit(s->ctx, len, s->io_q.iocbs);
>  if (ret < 0) {
> -i = 0;
> -} else {
> -i = ret;
> +/* retry in following completion cb */
> +if (ret == -EAGAIN) {
> +return 0;
> +}
> +
> +/*
> + * Abort in BH context for avoiding Co-routine re-entered,
> + * and update io queue at that time
> + */
> +qemu_bh_schedule(s->io_q.abort_bh);
> +s->io_q.aborting = true;
> +ret = 0;
>  }
>  
> -for (; i < len; i++) {
> -struct qemu_laiocb *laiocb =
> -container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);

> +/*
> + * update io queue, and retry will be started automatically
> + * in following completion cb for the remainder
> + */
> +if (ret > 0) {
> +if (ret < len) {
> +memmove(&s->io_q.iocbs[0], &s->io_q.iocbs[ret],
> +(len - ret) * sizeof(struct iocb *));
> +}
> +s->io_q.idx -= ret;
> +}

Support for partly handled queues is nice, but a logically separate
change. Please move this to its own patch.
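
For reference, the partial-submission handling in the hunk above boils down to the following queue compaction. This is a standalone sketch with hypothetical names (`ToyQueue`, `toy_queue_consume`); it stores plain `int *` entries instead of `struct iocb *` so that it compiles without libaio.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_QUEUED 8

/* Hypothetical stand-in for LaioQueue: an array of pending request
 * pointers plus the number of valid entries. */
typedef struct ToyQueue {
    int *slots[TOY_MAX_QUEUED];
    unsigned int idx;
} ToyQueue;

/* Drop the first 'submitted' entries and slide the remainder to the
 * front, so that a later retry starts at slots[0] again. */
static void toy_queue_consume(ToyQueue *q, unsigned int submitted)
{
    assert(submitted <= q->idx);
    if (submitted > 0 && submitted < q->idx) {
        /* memmove, not memcpy: source and destination overlap. */
        memmove(&q->slots[0], &q->slots[submitted],
                (q->idx - submitted) * sizeof(q->slots[0]));
    }
    q->idx -= submitted;
}
```

Keeping this in a helper of its own is one way to make the change reviewable separately from the abort handling.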

> -laiocb->ret = (ret < 0) ? ret : -EIO;
> +return ret;
> +}
> +
> +static void ioq_abort_bh(void *opaque)
> +{
> +struct qemu_laio_state *s = opaque;
> +int i;
> +
> +for (i = 0; i < s->io_q.idx; i++) {
> +struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
> +  struct qe

Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support

2014-11-26 Thread Alexander Graf



On 26.11.14 11:48, Eric Auger wrote:
> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>
>>
>> On 26.11.14 10:45, Eric Auger wrote:
>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:


 On 31.10.14 15:05, Eric Auger wrote:
> Minimal VFIO platform implementation supporting
> - register space user mapping,
> - IRQ assignment based on eventfds handled on qemu side.
>
> irqfd kernel acceleration comes in a subsequent patch.
>
> Signed-off-by: Kim Phillips 
> Signed-off-by: Eric Auger 
>>
>> [...]
>>
> +/*
> + * Mechanics to program/start irq injection on machine init done 
> notifier:
> + * this is needed since at finalize time, the device IRQ are not yet
> + * bound to the platform bus IRQ. It is assumed here dynamic 
> instantiation
> + * always is used. Binding to the platform bus IRQ happens on a machine
> + * init done notifier registered by the machine file. After its execution
> + * we execute a new notifier that actually starts the injection. When 
> using
> + * irqfd, programming the injection consists in associating eventfds to
> + * GSI number,ie. virtual IRQ number
> + */
> +
> +typedef struct VfioIrqStarterNotifierParams {
> +unsigned int platform_bus_first_irq;
> +Notifier notifier;
> +} VfioIrqStarterNotifierParams;
> +
> +typedef struct VfioIrqStartParams {
> +PlatformBusDevice *pbus;
> +int platform_bus_first_irq;
> +} VfioIrqStartParams;
> +
> +/* Start injection of IRQ for a specific VFIO device */
> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
> +{
> +int i;
> +VfioIrqStartParams *p = opaque;
> +VFIOPlatformDevice *vdev;
> +VFIODevice *vbasedev;
> +uint64_t irq_number;
> +PlatformBusDevice *pbus = p->pbus;
> +int platform_bus_first_irq = p->platform_bus_first_irq;
> +
> +if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
> +vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +vbasedev = &vdev->vbasedev;
> +for (i = 0; i < vbasedev->num_irqs; i++) {
> +irq_number = platform_bus_get_irqn(pbus, sbdev, i)
> + + platform_bus_first_irq;
> +vfio_start_irq_injection(sbdev, i, irq_number);
> +}
> +}
> +return 0;
> +}
> +
> +/* loop on all VFIO platform devices and start their IRQ injection */
> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
> +{
> +VfioIrqStarterNotifierParams *p =
> +container_of(notifier, VfioIrqStarterNotifierParams, notifier);
> +DeviceState *dev =
> +qdev_find_recursive(sysbus_get_default(), 
> TYPE_PLATFORM_BUS_DEVICE);
> +PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
> +
> +if (pbus->done_gathering) {
> +VfioIrqStartParams data = {
> +.pbus = pbus,
> +.platform_bus_first_irq = p->platform_bus_first_irq,
> +};
> +
> +foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
> +}
> +}
> +
> +/* registers the machine init done notifier that will start VFIO IRQ */
> +void vfio_register_irq_starter(int platform_bus_first_irq)
> +{
> +VfioIrqStarterNotifierParams *p = 
> g_new(VfioIrqStarterNotifierParams, 1);
> +
> +p->platform_bus_first_irq = platform_bus_first_irq;
> +p->notifier.notify = vfio_irq_starter_notify;
> +qemu_add_machine_init_done_notifier(&p->notifier);

 Could you add a notifier for each device instead? Then the notifier
 would be part of the vfio device struct and not some dangling random
 pointer :).

 Of course instead of foreach_dynamic_sysbus_device() you would directly
 know the device you're dealing with and only handle a single device per
 notifier.
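
 The per-device notifier idea can be sketched as follows. This is a toy
 model with hypothetical names (`ToyNotifier`, `ToyVfioDevice`, ...), not
 the QEMU Notifier API; it only shows how an embedded notifier lets the
 callback recover its own device without any global iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal container_of, assuming 'member' is a field of 'type'. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    void (*notify)(ToyNotifier *n, void *data);
};

/* Hypothetical device: the notifier is embedded in the device struct,
 * so there is no separately allocated parameter block to keep alive. */
typedef struct ToyVfioDevice {
    int num_irqs;
    int irqs_started;
    ToyNotifier init_done;
} ToyVfioDevice;

/* Callback: recover the owning device from the notifier pointer and
 * handle only that one device. */
static void toy_start_irqs(ToyNotifier *n, void *data)
{
    ToyVfioDevice *dev = toy_container_of(n, ToyVfioDevice, init_done);

    (void)data;
    dev->irqs_started = dev->num_irqs;
}

/* The device wires up its own notifier when it is set up. */
static void toy_device_realize(ToyVfioDevice *dev, int num_irqs)
{
    dev->num_irqs = num_irqs;
    dev->irqs_started = 0;
    dev->init_done.notify = toy_start_irqs;
}
```

 The sketch says nothing about notifier ordering, which is a separate
 question.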
>>>
>>> Hi Alex,
>>>
>>> I don't see how to practically follow your request:
>>>
>>> - at machine init time, VFIO devices are not yet instantiated so I
>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>> wrong in my first reply :-().
>>>
>>> - I can't register a per VFIO device notifier in the VFIO device
>>> finalize function because this latter is called after the platform bus
>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>> finalize fn) would be called after the IRQ starter notifier.
>>>
>>> - then to simplify things a bit I could use a qemu_register_reset in
>>> place of a machine init done notifier (would relax the call order
>>> constraint) but the problem consists in passing the platform bus first
>>> irq (all the more so you requested it became part of a const struct)
>>>
>>> Do I miss something?
>>
>> So the basic idea is that the device itself calls
>> qemu_add_machine_init_done_notifier() in its

Re: [Qemu-devel] [2.3 PATCH v7 01/10] qapi: Add optional field "name" to block dirty bitmap

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

This field will be set for user-created dirty bitmaps. Also pass in an
error pointer to bdrv_create_dirty_bitmap, so that when a name is already
taken on this BDS, it can report an error message. This is not a global
check: two BDSes can have dirty bitmaps with a common name.

Implement bdrv_find_dirty_bitmap to find a dirty bitmap by name; it will
be used later when other QMP commands want to reference a dirty bitmap by
name.

Add bdrv_dirty_bitmap_make_anon. This unsets the name of a dirty bitmap.

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block-migration.c |  2 +-
  block.c   | 32 +++-
  block/mirror.c|  2 +-
  include/block/block.h |  7 ++-
  qapi/block-core.json  |  4 +++-
  5 files changed, 42 insertions(+), 5 deletions(-)


Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH v6 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case

2014-11-26 Thread Kevin Wolf

Am 25.11.2014 um 08:23 hat Ming Lei geschrieben:
> Previously -EAGAIN was simply ignored for the !s->io_q.plugged case,
> which could easily cause -EIO to be reported to the VM, e.g. with an
> NVMe device.
> 
> This patch handles -EAGAIN via the io queue for the !s->io_q.plugged
> case; the request is retried in the following aio completion cb.
> 
> Reviewed-by: Paolo Bonzini 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Ming Lei 
> ---
>  block/linux-aio.c |   24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index 11ac828..ac25722 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -282,8 +282,13 @@ static int ioq_enqueue(struct qemu_laio_state *s, struct 
> iocb *iocb)
>  s->io_q.iocbs[idx++] = iocb;
>  s->io_q.idx = idx;
>  
> -/* submit immediately if queue depth is above 2/3 */
> -if (idx > s->io_q.size * 2 / 3) {
> +/*
> + * This is reached in two cases: queue not plugged but io_submit
> + * returned -EAGAIN, or queue plugged.  In the latter case, start
> + * submitting some I/O if the queue is getting too full.  In the
> + * former case, instead, wait until an I/O operation is completed.
> + */

Are we guaranteed that an I/O operation is in flight when we get
-EAGAIN? The manpage of io_submit isn't very clear on this,
"insufficient resources" could be for any reason.

Because otherwise we might not ever submit this request.

> +if (s->io_q.plugged && unlikely(idx > s->io_q.size * 2 / 3)) {
>  ioq_submit(s);
>  }

Kevin

Re: [Qemu-devel] [RFC PATCH v5 23/31] cpus: make icount warp deterministic in replay mode

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:40, Pavel Dovgalyuk wrote:
> This patch adds saving and replaying of warping parameters in record and
> replay modes. These parameters affect virtual clock values and therefore
> should be deterministic.
> 
> Signed-off-by: Pavel Dovgalyuk 

I think this makes warping behave better when you "stop" and "cont" the
VM.  We should apply this independent of the rest of the series.

Paolo

> ---
>  cpus.c |   14 +++---
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 707bf34..f6a6319 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -370,7 +370,7 @@ static void icount_warp_rt(void *opaque)
>  
>  seqlock_write_lock(&timers_state.vm_clock_seqlock);
>  if (runstate_is_running()) {
> -int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
>  int64_t warp_delta;
>  
>  warp_delta = clock - vm_clock_warp_start;
> @@ -444,7 +444,7 @@ void qemu_clock_warp(QEMUClockType type)
>  }
>  
>  /* We want to use the earliest deadline from ALL vm_clocks */
> -clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
>  deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>  if (deadline < 0) {
>  return;
> @@ -537,8 +537,8 @@ void configure_icount(QemuOpts *opts, Error **errp)
>  return;
>  }
>  icount_align_option = qemu_opt_get_bool(opts, "align", false);
> -icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
> -  icount_warp_rt, NULL);
> +icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
> + icount_warp_rt, NULL);
>  if (strcmp(option, "auto") != 0) {
>  errno = 0;
>  icount_time_shift = strtol(option, &rem_str, 0);
> @@ -562,10 +562,10 @@ void configure_icount(QemuOpts *opts, Error **errp)
> the virtual time trigger catches emulated time passing too fast.
> Realtime triggers occur even when idle, so use them less frequently
> than VM triggers.  */
> -icount_rt_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> -icount_adjust_rt, NULL);
> +icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> +   icount_adjust_rt, NULL);
>  timer_mod(icount_rt_timer,
> -   qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
> +   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
>  icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>  icount_adjust_vm, NULL);
>  timer_mod(icount_vm_timer,
>

Re: [Qemu-devel] [PATCH v6 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb'

2014-11-26 Thread Kevin Wolf

Am 25.11.2014 um 08:23 hat Ming Lei geschrieben:
> No one uses the 'node' field any more, so remove it
> from 'struct qemu_laiocb'; this saves 16 bytes
> for the struct on 64-bit architectures.
> 
> Reviewed-by: Paolo Bonzini 
> Signed-off-by: Ming Lei 

Useful on its own, even without the other patches of the series.

Reviewed-by: Kevin Wolf

[Qemu-devel] [PULL 1/3] -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

2014-11-26 Thread Paolo Bonzini

From: Don Slutz 

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Change vmport from a bool to an enum.  Add the value "auto" to keep
the old behavior.  Move the check of xen_enabled() back to pc_init1().

Acked-by: Eric Blake 
Reviewed-by: Eduardo Habkost 
Signed-off-by: Don Slutz 
Signed-off-by: Paolo Bonzini 
---
 hw/i386/pc.c | 22 +-
 hw/i386/pc_piix.c|  7 ++-
 hw/i386/pc_q35.c |  7 ++-
 include/hw/i386/pc.h |  2 +-
 qapi/common.json | 15 +++
 qemu-options.hx  |  8 +---
 vl.c |  2 +-
 7 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8be50a4..f31d55e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -61,6 +61,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "trace.h"
 #include "qapi/visitor.h"
+#include "qapi-visit.h"
 
 /* debug PC/ISA interrupts */
 //#define DEBUG_IRQ
@@ -1772,18 +1773,21 @@ static void pc_machine_set_max_ram_below_4g(Object 
*obj, Visitor *v,
 pcms->max_ram_below_4g = value;
 }
 
-static bool pc_machine_get_vmport(Object *obj, Error **errp)
+static void pc_machine_get_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+OnOffAuto vmport = pcms->vmport;
 
-return pcms->vmport;
+visit_type_OnOffAuto(v, &vmport, name, errp);
 }
 
-static void pc_machine_set_vmport(Object *obj, bool value, Error **errp)
+static void pc_machine_set_vmport(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
 
-pcms->vmport = value;
+visit_type_OnOffAuto(v, &pcms->vmport, name, errp);
 }
 
 static bool pc_machine_get_aligned_dimm(Object *obj, Error **errp)
@@ -1806,11 +1810,11 @@ static void pc_machine_initfn(Object *obj)
 pc_machine_set_max_ram_below_4g,
 NULL, NULL, NULL);
 
-pcms->vmport = !xen_enabled();
-object_property_add_bool(obj, PC_MACHINE_VMPORT,
- pc_machine_get_vmport,
- pc_machine_set_vmport,
- NULL);
+pcms->vmport = ON_OFF_AUTO_AUTO;
+object_property_add(obj, PC_MACHINE_VMPORT, "OnOffAuto",
+pc_machine_get_vmport,
+pc_machine_set_vmport,
+NULL, NULL, NULL);
 
 pcms->enforce_aligned_dimm = true;
 object_property_add_bool(obj, PC_MACHINE_ENFORCE_ALIGNED_DIMM,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 741dffd..85ed3c8 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -234,9 +234,14 @@ static void pc_init1(MachineState *machine,
 
 pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0x4);
+ (pc_machine->vmport != ON_OFF_AUTO_ON), 0x4);
 
 pc_nic_init(isa_bus, pci_bus);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e9ba1a2..0262b5e 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -242,9 +242,14 @@ static void pc_q35_init(MachineState *machine)
 
 pc_register_ferr_irq(gsi[13]);
 
+assert(pc_machine->vmport != ON_OFF_AUTO_MAX);
+if (pc_machine->vmport == ON_OFF_AUTO_AUTO) {
+pc_machine->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
+}
+
 /* init basic PC hardware */
 pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy,
- !pc_machine->vmport, 0xff0104);
+ (pc_machine->vmport != ON_OFF_AUTO_ON), 0xff0104);
 
 /* connect pm stuff to lpc */
 ich9_lpc_pm_init(lpc);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 9d85b89..69d9cf8 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -39,7 +39,7 @@ struct PCMachineState {
 ISADevice *rtc;
 
 uint64_t max_ram_below_4g;
-bool vmport;
+OnOffAuto vmport;
 bool enforce_aligned_dimm;
 };
 
diff --git a/qapi/common.json b/qapi/common.json
index 4e9a21f..63ef3b4 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -87,3 +87,18 @@
 ##
 { 'command': 'query-commands', 'returns': ['CommandInfo'] }
 
+##
+# @OnOffAuto
+#
+# An enumeration of three options: on, off, and auto
+#
+# @auto: QEMU selects the value between on and off
+#
+# @on: Enabled
+#
+# @off: Disabled
+#
+# Since: 2.2
+##
+{ 'enum': 'OnOffAuto',
+  'data': [ 'auto', 'on', 'off' ] }
diff -

[Qemu-devel] [PULL for-2.2 0/3] Misc fixes for 2014-11-26

2014-11-26 Thread Paolo Bonzini

The following changes since commit 2528043f1f299e0e88cb026f1ca7c40bbb4e1f80:

  Update version for v2.2.0-rc3 release (2014-11-25 18:23:54 +)

are available in the git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to dc622deb2d49aac6afa485f9025be8fed440ef3d:

  s390x/kvm: Fix compile error (2014-11-26 12:11:27 +0100)


The final 2.2 patches from me.


Christian Borntraeger (1):
  s390x/kvm: Fix compile error

Don Slutz (1):
  -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

Gonglei (1):
  fw_cfg: fix boot order bug when dynamically modified via QOM

 hw/i386/pc.c | 22 +-
 hw/i386/pc_piix.c|  7 ++-
 hw/i386/pc_q35.c |  7 ++-
 hw/nvram/fw_cfg.c|  7 +--
 include/hw/i386/pc.h |  2 +-
 qapi/common.json | 15 +++
 qemu-options.hx  |  8 +---
 target-s390x/kvm.c   |  4 ++--
 vl.c |  2 +-
 9 files changed, 54 insertions(+), 20 deletions(-)
-- 
1.8.3.1

[Qemu-devel] [PULL 2/3] fw_cfg: fix boot order bug when dynamically modified via QOM

2014-11-26 Thread Paolo Bonzini

From: Gonglei 

When we dynamically modify the boot order, the length of
the boot order changes, but we don't update
s->files->f[i].size with the new length. This causes
SeaBIOS to read a wrong value from the QEMU fw_cfg file for
bootorder.

Cc: Gerd Hoffmann 
Cc: Paolo Bonzini 
Signed-off-by: Gonglei 
Signed-off-by: Paolo Bonzini 
---
 hw/nvram/fw_cfg.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index e7ed27e..a7122ee 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -523,6 +523,7 @@ void *fw_cfg_modify_file(FWCfgState *s, const char 
*filename,
 void *data, size_t len)
 {
 int i, index;
+void *ptr = NULL;
 
 assert(s->files);
 
@@ -531,8 +532,10 @@ void *fw_cfg_modify_file(FWCfgState *s, const char 
*filename,
 
 for (i = 0; i < index; i++) {
 if (strcmp(filename, s->files->f[i].name) == 0) {
-return fw_cfg_modify_bytes_read(s, FW_CFG_FILE_FIRST + i,
- data, len);
+ptr = fw_cfg_modify_bytes_read(s, FW_CFG_FILE_FIRST + i,
+   data, len);
+s->files->f[i].size   = cpu_to_be32(len);
+return ptr;
 }
 }
 /* add new one */
-- 
1.8.3.1

[Qemu-devel] [PULL 3/3] s390x/kvm: Fix compile error

2014-11-26 Thread Paolo Bonzini

From: Christian Borntraeger 

commit a2b257d6212a "memory: expose alignment used for allocating RAM
as MemoryRegion API" triggered a compile error on KVM/s390x.

Fix the prototype and the implementation of legacy_s390_alloc.

Cc: Igor Mammedov 
Cc: Michael S. Tsirkin 
Signed-off-by: Christian Borntraeger 
Signed-off-by: Paolo Bonzini 
---
 target-s390x/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 50709ba..2c638ab 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -106,7 +106,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 static int cap_sync_regs;
 static int cap_async_pf;
 
-static void *legacy_s390_alloc(size_t size);
+static void *legacy_s390_alloc(size_t size, uint64_t *align);
 
 static int kvm_s390_check_clear_cmma(KVMState *s)
 {
@@ -404,7 +404,7 @@ int kvm_arch_get_registers(CPUState *cs)
  * to grow. We also have to use MAP parameters that avoid
  * read-only mapping of guest pages.
  */
-static void *legacy_s390_alloc(size_t size, , uint64_t *align)
+static void *legacy_s390_alloc(size_t size, uint64_t *align)
 {
 void *mem;
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] target-i386: add feature flags for CPUID[EAX=0xd, ECX=1]

2014-11-26 Thread Eduardo Habkost

On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote:
> 
> 
> On 25/11/2014 21:02, Paolo Bonzini wrote:
> > > > +static const char *cpuid_xsave_feature_name[] = {
> > > > +"xsaveopt", "xsavec", "xgetbv1", "xsaves",
> > > 
> > > None of the above features introduce any new state that might need to be
> > > migrated, or will require other changes in QEMU to work, right?
> > > 
> > > It looks like they don't introduce any extra state, but if they do, they
> > > need to be added to unmigratable_flags until migration support is
> > > implemented.
> > > 
> > > If they require other QEMU changes, it would be nice if KVM reported
> > > them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
> > > wouldn't break "-cpu host".
> > 
> > No, they don't.
> 
> Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
> It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
> MSRs and new XSAVE areas (last example: avx512).

If the changes needed are only to support migration (this is the case if
it's just another MSR handled by KVM, or additional XSAVE areas),
GET_SUPPORTED_CPUID is still reasonable, because features that are
unknown to QEMU are always considered unmigratable until we add the
feature name to the feature_name arrays. (That's why we need to know if
the feature introduces additional state when adding the feature names to
the array.)

If other changes are required to make the feature work even if no
migration is required, then adding them to GET_SUPPORTED_CPUID would
break "-cpu host" on older QEMUs. I don't think that's the case here,
but I wanted to confirm.
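
The rule described above (a feature bit is migratable only if QEMU knows its name and has not flagged it as unmigratable) can be written down as a small standalone sketch. `toy_migratable_flags` is a hypothetical helper for illustration, not the actual QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: names[i] == NULL means "bit i is unknown to QEMU".
 * Unknown bits are never migratable; known bits are migratable unless
 * they are explicitly listed in 'unmigratable'. */
static uint32_t toy_migratable_flags(const char *const names[32],
                                     uint32_t unmigratable)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < 32; i++) {
        if (names[i] != NULL) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask & ~unmigratable;
}
```

In this model, adding "xsaves" to the name array makes bit 3 migratable unless the same change also sets that bit in the unmigratable mask, which is why the question about extra state matters when names are added.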

(Should we add those observations to Documentation/virtual/kvm/api.txt?)

> 
> Since no hardware really exists for it, and KVM does not support it
> anyway, I think it's simplest to leave xsaves out for now.  Is this right?

If unsure, it won't hurt to add the feature to unmigratable_features for
now. Making QEMU aware of the feature name is still useful.

-- 
Eduardo
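
The rule described above (a feature bit that has no name known to QEMU is
treated as unmigratable, and a named bit can still be blocked explicitly)
can be sketched as follows. This is a toy model only: migratable_flags(),
the 32-entry name table and the mask argument are illustrative assumptions,
not QEMU's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of one 32-bit CPUID feature word. A NULL entry means
 * "this bit has no name known to us". The table layout is an
 * assumption made for this sketch.
 */
#define FEATURE_BITS 32

static uint32_t migratable_flags(const char *const names[FEATURE_BITS],
                                 uint32_t unmigratable_mask)
{
    uint32_t result = 0;

    assert(names != NULL);
    for (int bit = 0; bit < FEATURE_BITS; bit++) {
        uint32_t mask = UINT32_C(1) << bit;

        /* Unknown bit: no name, so it is never considered migratable. */
        if (names[bit] == NULL) {
            continue;
        }
        /* Known bit that was explicitly marked as unmigratable. */
        if (unmigratable_mask & mask) {
            continue;
        }
        result |= mask;
    }
    return result;
}
```

With the four names from the patch in bits 0..3, passing a mask with bit 3
set keeps "xsaves" visible by name while excluding it from the migratable
set, which is the option suggested at the end of the mail.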

[Qemu-devel] [PATCH] hmp: fix regression of HMP device_del auto-completion

2014-11-26 Thread Marcel Apfelbaum

The commits:
 - 6a1fa9f5 (monitor: add del completion for peripheral device)
 - 66e56b13 (qdev: add qdev_build_hotpluggable_device_list helper)

cause a QEMU crash when trying to use HMP device_del auto-completion.
It can be easily reproduced by:
 -enable-kvm  ~/images/fedora.qcow2 -monitor stdio -device 
virtio-net-pci,id=vnet

(qemu) device_del

/home/mapfelba/git/upstream/qemu/hw/core/qdev.c:941:qdev_build_hotpluggable_device_list:
 Object 0x7f6ce04e4fe0 is not an instance of type device
Aborted (core dumped)

The root cause is qdev_build_hotpluggable_device_list going recursively over
all peripherals and their children assuming all are devices. It doesn't work
since PCI devices have at least one child which is a memory region (bus master).

Solved by observing that all devices appear as direct children of
/machine/peripheral container. No need of going recursively
over all the children.

Signed-off-by: Marcel Apfelbaum 
---
 hw/core/qdev.c | 12 ++--
 include/hw/qdev-core.h |  2 +-
 monitor.c  | 11 ---
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 413b413..35fd00d 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -935,7 +935,7 @@ void qdev_alias_all_properties(DeviceState *target, Object 
*source)
 } while (class != object_class_by_name(TYPE_DEVICE));
 }
 
-int qdev_build_hotpluggable_device_list(Object *obj, void *opaque)
+static int qdev_add_hotpluggable_device(Object *obj, void *opaque)
 {
 GSList **list = opaque;
 DeviceState *dev = DEVICE(obj);
@@ -944,10 +944,18 @@ int qdev_build_hotpluggable_device_list(Object *obj, void 
*opaque)
 *list = g_slist_append(*list, dev);
 }
 
-object_child_foreach(obj, qdev_build_hotpluggable_device_list, opaque);
 return 0;
 }
 
+GSList *qdev_build_hotpluggable_device_list(Object *peripheral)
+{
+GSList *list = NULL;
+
+object_child_foreach(peripheral, qdev_add_hotpluggable_device, &list);
+
+return list;
+}
+
 static bool device_get_realized(Object *obj, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index d3a2940..589bbe7 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -365,7 +365,7 @@ extern int qdev_hotplug;
 
 char *qdev_get_dev_path(DeviceState *dev);
 
-int qdev_build_hotpluggable_device_list(Object *obj, void *opaque);
+GSList *qdev_build_hotpluggable_device_list(Object *peripheral);
 
 void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler,
   Error **errp);
diff --git a/monitor.c b/monitor.c
index fa00594..f1031a1 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4321,17 +4321,14 @@ void object_add_completion(ReadLineState *rs, int 
nb_args, const char *str)
 static void peripheral_device_del_completion(ReadLineState *rs,
  const char *str, size_t len)
 {
-Object *peripheral;
-GSList *list = NULL, *item;
+Object *peripheral = container_get(qdev_get_machine(), "/peripheral");
+GSList *list, *item;
 
-peripheral = object_resolve_path("/machine/peripheral/", NULL);
-if (peripheral == NULL) {
+list = qdev_build_hotpluggable_device_list(peripheral);
+if (!list) {
 return;
 }
 
-object_child_foreach(peripheral, qdev_build_hotpluggable_device_list,
- &list);
-
 for (item = list; item; item = g_slist_next(item)) {
 DeviceState *dev = item->data;
 
-- 
1.9.3
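
The idea behind the fix (visit only the direct children of the peripheral
container and never descend into grandchildren, which may not be devices)
can be illustrated with a small stand-alone sketch. The ToyObject type and
count_hotpluggable_children() are hypothetical and are not part of QEMU's
QOM API; the explicit is_device check is also an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object tree; this is not QEMU's QOM API. */
typedef struct ToyObject {
    bool is_device;              /* false for e.g. a memory region */
    bool hotpluggable;
    struct ToyObject *children;  /* first child */
    struct ToyObject *next;      /* next sibling */
} ToyObject;

/*
 * Count hotpluggable devices among the direct children of
 * 'container'. Grandchildren are deliberately not visited, because
 * they may be non-device objects.
 */
static size_t count_hotpluggable_children(const ToyObject *container)
{
    size_t count = 0;

    assert(container != NULL);
    for (const ToyObject *c = container->children; c; c = c->next) {
        if (c->is_device && c->hotpluggable) {
            count++;
        }
    }
    return count;
}
```

A recursive walk over the same tree would also reach the memory region
child of the network device, which is the case that triggered the abort
described in the commit message.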

Re: [Qemu-devel] [2.3 PATCH v7 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

The new command pair is added to manage user created dirty bitmap. The
dirty bitmap's name is mandatory and must be unique for the same device,
but different devices can have bitmaps with the same names.

The types added to block-core.json will be re-used in future patches
in this series, see:
'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}'


Thanks, this helps. :-)


Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block.c   | 19 
  block/mirror.c| 10 +---
  blockdev.c| 63 +++
  include/block/block.h |  1 +
  qapi/block-core.json  | 58 +++
  qmp-commands.hx   | 49 +++
  6 files changed, 191 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index f94b753..a940345 100644
--- a/block.c
+++ b/block.c
@@ -5385,6 +5385,25 @@ int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap 
*bitmap, int64_t sector
  }
  }
  
+#define BDB_MIN_DEF_GRANULARITY 4096

+#define BDB_MAX_DEF_GRANULARITY 65536
+#define BDB_DEFAULT_GRANULARITY BDB_MAX_DEF_GRANULARITY


You mean this is the default for the default? ;-)


+
+int64_t bdrv_dbm_calc_def_granularity(BlockDriverState *bs)


You may want to make this a uint64_t so it's clear that this function 
does not return errors.



+{
+BlockDriverInfo bdi;
+int64_t granularity;
+
+if (bdrv_get_info(bs, &bdi) >= 0 && bdi.cluster_size != 0) {
+granularity = MAX(BDB_MIN_DEF_GRANULARITY, bdi.cluster_size);
+granularity = MIN(BDB_MAX_DEF_GRANULARITY, granularity);
+} else {
+granularity = BDB_DEFAULT_GRANULARITY;
+}
+
+return granularity;
+}
+
  void bdrv_dirty_iter_init(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
  {
diff --git a/block/mirror.c b/block/mirror.c
index 858e4ff..3633632 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -664,15 +664,7 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
  MirrorBlockJob *s;
  
  if (granularity == 0) {

-/* Choose the default granularity based on the target file's cluster
- * size, clamped between 4k and 64k.  */
-BlockDriverInfo bdi;
-if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
-granularity = MAX(4096, bdi.cluster_size);
-granularity = MIN(65536, granularity);
-} else {
-granularity = 65536;
-}
+granularity = bdrv_dbm_calc_def_granularity(target);


Maybe you should note this replacement in the commit message.


  }
  
  assert ((granularity & (granularity - 1)) == 0);

diff --git a/blockdev.c b/blockdev.c
index 57910b8..e2fe687 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1810,6 +1810,69 @@ void qmp_block_set_io_throttle(const char *device, 
int64_t bps, int64_t bps_rd,
  aio_context_release(aio_context);
  }
  
+void qmp_block_dirty_bitmap_add(const char *device, const char *name,

+bool has_granularity, int64_t granularity,
+Error **errp)
+{
+BlockDriverState *bs;
+Error *local_err = NULL;
+
+if (!device) {
+error_setg(errp, "Device to add dirty bitmap to must not be null");
+return;
+}


I don't know if checking for that case makes sense, but of course it 
won't hurt. But...[1]



+
+bs = bdrv_lookup_bs(device, NULL, &local_err);


Fair enough, I'd still like blk_by_name() and blk_bs() more 
(bdrv_lookup_bs() uses blk_bs(blk_by_name()) for the device name just as 
bdrv_find() did), but this is at least not a completely trivial wrapper.



+if (!bs) {
+error_propagate(errp, local_err);


Simply calling bdrv_lookup_bs(device, NULL, errp); suffices, no need for 
a local Error object and error_propagate(). But I'm fine with it either way.



+return;
+}
+
+if (!name || name[0] == '\0') {
+error_setg(errp, "Bitmap name cannot be empty");
+return;
+}
+if (has_granularity) {
+if (granularity < 512 || !is_power_of_2(granularity)) {
+error_setg(errp, "Granularity must be power of 2 "
+ "and at least 512");
+return;
+}
+} else {
+/* Default to cluster size, if available: */
+granularity = bdrv_dbm_calc_def_granularity(bs);
+}
+
+bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+}
+
+void qmp_block_dirty_bitmap_remove(const char *device, const char *name,
+   Error **errp)
+{
+BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
+Error *local_err = NULL;


[1] why aren't you minding !device here?


+
+bs = bdrv_lookup_bs(device, NULL, &local_err);
+if (!bs) {
+error_propagate(errp, local_err);
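
The default-granularity logic reviewed above (clamp the cluster size
between 4k and 64k, fall back to 64k when it is unknown) and the check
applied to a user-supplied value can be sketched in isolation. The names
below are hypothetical; the unsigned return type follows the review
suggestion that the function cannot report errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_DEF_GRANULARITY UINT64_C(4096)
#define MAX_DEF_GRANULARITY UINT64_C(65536)

/* True if 'value' has exactly one bit set. */
static bool is_pow2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/*
 * Pick a default granularity from a cluster size in bytes.
 * A cluster size of 0 means "unknown" and selects the maximum.
 * The return type is unsigned because no error can be reported.
 */
static uint64_t default_granularity(uint64_t cluster_size)
{
    uint64_t granularity;

    if (cluster_size == 0) {
        return MAX_DEF_GRANULARITY;
    }
    granularity = cluster_size < MIN_DEF_GRANULARITY
                  ? MIN_DEF_GRANULARITY : cluster_size;
    if (granularity > MAX_DEF_GRANULARITY) {
        granularity = MAX_DEF_GRANULARITY;
    }
    /* Clamping a power of two between power-of-two bounds keeps it one. */
    assert(is_pow2(granularity) || !is_pow2(cluster_size));
    return granularity;
}

/* Validate a user-supplied granularity: a power of 2, at least 512. */
static bool granularity_is_valid(uint64_t granularity)
{
    return granularity >= 512 && is_pow2(granularity);
}
```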

Re: [Qemu-devel] [PATCH] hmp: fix regression of HMP device_del auto-completion

2014-11-26 Thread Igor Mammedov

On Wed, 26 Nov 2014 13:50:01 +0200
Marcel Apfelbaum  wrote:

> The commits:
>  - 6a1fa9f5 (monitor: add del completion for peripheral device)
>  - 66e56b13 (qdev: add qdev_build_hotpluggable_device_list helper)
> 
> cause a QEMU crash when trying to use HMP device_del auto-completion.
> It can be easily reproduced by:
>  -enable-kvm  ~/images/fedora.qcow2 -monitor stdio -device 
> virtio-net-pci,id=vnet
> 
> (qemu) device_del
> 
> /home/mapfelba/git/upstream/qemu/hw/core/qdev.c:941:qdev_build_hotpluggable_device_list:
>  Object 0x7f6ce04e4fe0 is not an instance of type device
> Aborted (core dumped)
> 
> The root cause is qdev_build_hotpluggable_device_list going recursively over
> all peripherals and their children assuming all are devices. It doesn't work
> since PCI devices have at least one child which is a memory region (bus 
> master).
> 
> Solved by observing that all devices appear as direct children of
> /machine/peripheral container. No need of going recursively
> over all the children.
> 
> Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Igor Mammedov 

> ---
>  hw/core/qdev.c | 12 ++--
>  include/hw/qdev-core.h |  2 +-
>  monitor.c  | 11 ---
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 413b413..35fd00d 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -935,7 +935,7 @@ void qdev_alias_all_properties(DeviceState *target, 
> Object *source)
>  } while (class != object_class_by_name(TYPE_DEVICE));
>  }
>  
> -int qdev_build_hotpluggable_device_list(Object *obj, void *opaque)
> +static int qdev_add_hotpluggable_device(Object *obj, void *opaque)
>  {
>  GSList **list = opaque;
>  DeviceState *dev = DEVICE(obj);
> @@ -944,10 +944,18 @@ int qdev_build_hotpluggable_device_list(Object *obj, 
> void *opaque)
>  *list = g_slist_append(*list, dev);
>  }
>  
> -object_child_foreach(obj, qdev_build_hotpluggable_device_list, opaque);
>  return 0;
>  }
>  
> +GSList *qdev_build_hotpluggable_device_list(Object *peripheral)
> +{
> +GSList *list = NULL;
> +
> +object_child_foreach(peripheral, qdev_add_hotpluggable_device, &list);
> +
> +return list;
> +}
> +
>  static bool device_get_realized(Object *obj, Error **errp)
>  {
>  DeviceState *dev = DEVICE(obj);
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index d3a2940..589bbe7 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -365,7 +365,7 @@ extern int qdev_hotplug;
>  
>  char *qdev_get_dev_path(DeviceState *dev);
>  
> -int qdev_build_hotpluggable_device_list(Object *obj, void *opaque);
> +GSList *qdev_build_hotpluggable_device_list(Object *peripheral);
>  
>  void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler,
>Error **errp);
> diff --git a/monitor.c b/monitor.c
> index fa00594..f1031a1 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4321,17 +4321,14 @@ void object_add_completion(ReadLineState *rs, int 
> nb_args, const char *str)
>  static void peripheral_device_del_completion(ReadLineState *rs,
>   const char *str, size_t len)
>  {
> -Object *peripheral;
> -GSList *list = NULL, *item;
> +Object *peripheral = container_get(qdev_get_machine(), "/peripheral");
> +GSList *list, *item;
>  
> -peripheral = object_resolve_path("/machine/peripheral/", NULL);
> -if (peripheral == NULL) {
> +list = qdev_build_hotpluggable_device_list(peripheral);
> +if (!list) {
>  return;
>  }
>  
> -object_child_foreach(peripheral, qdev_build_hotpluggable_device_list,
> - &list);
> -
>  for (item = list; item; item = g_slist_next(item)) {
>  DeviceState *dev = item->data;
>

Re: [Qemu-devel] [RFC PATCH v5 20/31] replay: recording and replaying clock ticks

2014-11-26 Thread Pavel Dovgaluk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 26/11/2014 11:40, Pavel Dovgalyuk wrote:
> > +/* real time host monotonic timer implementation */
> > +static inline int64_t get_clock_realtime_impl(void)
> >  {
> >  struct timeval tv;
> >
> > @@ -708,6 +709,12 @@ static inline int64_t get_clock_realtime(void)
> > >  return tv.tv_sec * 1000000000LL + (tv.tv_usec * 1000);
> >  }
> >
> > +/* real time host monotonic timer interface */
> > +static inline int64_t get_clock_realtime(void)
> > +{
> > +return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime_impl());
> > +}
> > +
> 
> Any reason to do this instead of using REPLAY_CLOCK in qemu_get_clock,
> like you do for QEMU_CLOCK_VIRTUAL_RT?

hw/ppc.c uses this function in its pre_save and post_load functions.
It seems that the results of these calls should also be logged by replay.

Pavel Dovgalyuk

Re: [Qemu-devel] [RFC PATCH v5 22/31] timer: introduce new QEMU_CLOCK_VIRTUAL_RT clock

2014-11-26 Thread Pavel Dovgaluk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 26/11/2014 11:40, Pavel Dovgalyuk wrote:
> > This patch introduces new QEMU_CLOCK_VIRTUAL_RT clock, which
> > should be used for icount warping. Separate timer is needed
> > for replaying the execution, because warping callbacks should
> > be deterministic. We cannot make realtime clock deterministic
> > because it is used for screen updates and other simulator-specific
> > actions. That is why we added new clock which is recorded and
> > replayed when needed.
> >
> > Signed-off-by: Pavel Dovgalyuk 
> > ---
> >  include/qemu/timer.h |7 +++
> >  qemu-timer.c |2 ++
> >  replay/replay.h  |4 +++-
> >  3 files changed, 12 insertions(+), 1 deletions(-)
> >
> > diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> > index 7b43331..df27157 100644
> > --- a/include/qemu/timer.h
> > +++ b/include/qemu/timer.h
> > @@ -37,12 +37,19 @@
> >   * is suspended, and it will reflect system time changes the host may
> >   * undergo (e.g. due to NTP). The host clock has the same precision as
> >   * the virtual clock.
> > + *
> > + * @QEMU_CLOCK_VIRTUAL_RT: realtime clock used for icount warp
> > + *
> > + * This clock runs as a realtime clock, but is used for icount warp
> > + * and thus should be traced with record/replay to make warp function
> > + * behave deterministically.
> >   */
> 
> I think it should also stop/restart across "stop" and "cont" commands,
> similar to QEMU_CLOCK_VIRTUAL.  This is as simple as changing
> get_clock() to cpu_get_clock().

Ok, then I'll have to remove !use_icount check from here and retest the series.

void cpu_enable_ticks(void)
{
/* Here, the really thing protected by seqlock is cpu_clock_offset. */
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (!timers_state.cpu_ticks_enabled) {
if (!use_icount) {
timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
timers_state.cpu_clock_offset -= get_clock();
}
timers_state.cpu_ticks_enabled = 1;
}
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
}

> This way, QEMU_CLOCK_VIRTUAL_RT is "what QEMU_CLOCK_VIRTUAL does without
> -icount".  This makes a lot of sense and can be merged in 2.3
> independent of the rest of the series.

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2014-11-26 Thread Cornelia Huck

On Wed, 26 Nov 2014 13:11:24 +0300
"Denis V. Lunev"  wrote:

> From: Raushaniya Maksudova 
> 
> The idea is that all other virtio devices are calling this helper
> to merge properties of the proxy device. This is the only difference
> in between this helper and code in inside virtio_instance_init_common.
> The patch should not cause any harm as property list in generic balloon
> code is empty.
> 
> This also allows to avoid some dummy errors like fixed by this
> commit 91ba21208839643603e7f7fa5864723c3f371ebe
> Author: Gonglei 
> Date:   Tue Sep 30 14:10:35 2014 +0800
> virtio-balloon: fix virtio-balloon child refcount in transports
> 
> Signed-off-by: Denis V. Lunev 
> Acked-by: Raushaniya Maksudova 
> CC: Cornelia Huck 
> CC: Christian Borntraeger 
> CC: Anthony Liguori 
> CC: Michael S. Tsirkin 
> ---
>  hw/s390x/virtio-ccw.c  | 5 ++---
>  hw/virtio/virtio-pci.c | 5 ++---
>  2 files changed, 4 insertions(+), 6 deletions(-)

Shouldn't this have the sign-off of the author (rather than the ack) as
well?

Otherwise, looks sane.

Re: [Qemu-devel] [2.3 PATCH v7 03/10] block: Introduce bdrv_dirty_bitmap_granularity()

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

This returns the granularity (in bytes) of dirty bitmap,
which matches the QMP interface and the existing query
interface.

Signed-off-by: Fam Zheng 
Reviewed-by: Benoit Canet 


Maybe you should have removed the R-b because of the functional changes, 
but Benoît should complain about that, not me.



Signed-off-by: John Snow 
---
  block.c   | 9 +++--
  include/block/block.h | 2 ++
  2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index a940345..ea4c8d8 100644
--- a/block.c
+++ b/block.c
@@ -5364,8 +5364,7 @@ BlockDirtyInfoList 
*bdrv_query_dirty_bitmaps(BlockDriverState *bs)
  BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
  BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
  info->count = bdrv_get_dirty_count(bs, bm);
-info->granularity =
-((int64_t) BDRV_SECTOR_SIZE << hbitmap_granularity(bm->bitmap));
+info->granularity = bdrv_dirty_bitmap_granularity(bs, bm);
  info->has_name = !!bm->name;
  info->name = g_strdup(bm->name);
  entry->value = info;
@@ -5404,6 +5403,12 @@ int64_t bdrv_dbm_calc_def_granularity(BlockDriverState 
*bs)
  return granularity;
  }
  
+int64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,

+  BdrvDirtyBitmap *bitmap)
+{
+return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
+}
+


Oh, BDRV_SECTOR_SIZE is an unsigned long long. Great, I didn't know. (So 
this can't overflow)


Reviewed-by: Max Reitz

Re: [Qemu-devel] [2.3 PATCH v7 04/10] hbitmap: Add hbitmap_copy

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

This makes a deep copy of an HBitmap.

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  include/qemu/hbitmap.h |  8 
  util/hbitmap.c | 16 
  2 files changed, 24 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 550d7ce..b645cfc 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -65,6 +65,14 @@ struct HBitmapIter {
  HBitmap *hbitmap_alloc(uint64_t size, int granularity);
  
  /**

+ * hbitmap_copy:
+ * @bitmap: The original bitmap to copy.
+ *
+ * Copy a HBitmap.
+ */
+HBitmap *hbitmap_copy(const HBitmap *bitmap);
+
+/**
   * hbitmap_empty:
   * @hb: HBitmap to operate on.
   *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index b3060e6..eddc05b 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -395,3 +395,19 @@ HBitmap *hbitmap_alloc(uint64_t size, int granularity)
  hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
  return hb;
  }
+
+HBitmap *hbitmap_copy(const HBitmap *bitmap)
+{
+int i;
+uint64_t size;
+HBitmap *hb = g_memdup(bitmap, sizeof(HBitmap));
+
+size = bitmap->size;
+for (i = HBITMAP_LEVELS; i >= 0; i--) {


Should be HBITMAP_LEVELS - 1.

Max


+size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+hb->levels[i] = g_memdup(bitmap->levels[i],
+ size * sizeof(unsigned long));
+}
+
+return hb;
+}
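
The loop bound pointed out above matters because an array of N level
pointers has valid indices 0 .. N - 1. A minimal stand-alone sketch of a
deep copy with the corrected bound follows. ToyBitmap, TOY_LEVELS and the
helper functions are assumptions of this sketch and do not reproduce the
real HBitmap layout.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy multi-level bitmap; constants and layout are assumptions. */
#define TOY_LEVELS 3
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef struct ToyBitmap {
    uint64_t size;                      /* bits tracked by the last level */
    unsigned long *levels[TOY_LEVELS];
} ToyBitmap;

/* Number of longs needed for 'level' (0 is the top, smallest level). */
static size_t toy_level_words(uint64_t size, int level)
{
    uint64_t n = size;

    for (int i = TOY_LEVELS - 1; i >= level; i--) {
        n = (n + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;
        if (n == 0) {
            n = 1;
        }
    }
    return (size_t)n;
}

static void toy_bitmap_free(ToyBitmap *bm)
{
    if (!bm) {
        return;
    }
    for (int i = 0; i < TOY_LEVELS; i++) {
        free(bm->levels[i]);
    }
    free(bm);
}

static ToyBitmap *toy_bitmap_alloc(uint64_t size)
{
    ToyBitmap *bm = calloc(1, sizeof(*bm));

    if (!bm) {
        return NULL;
    }
    bm->size = size;
    for (int i = 0; i < TOY_LEVELS; i++) {
        bm->levels[i] = calloc(toy_level_words(size, i),
                               sizeof(unsigned long));
        if (!bm->levels[i]) {
            toy_bitmap_free(bm);
            return NULL;
        }
    }
    return bm;
}

/* Deep copy: every level gets its own buffer in the new bitmap. */
static ToyBitmap *toy_bitmap_copy(const ToyBitmap *src)
{
    ToyBitmap *dst;

    assert(src != NULL);
    dst = toy_bitmap_alloc(src->size);
    if (!dst) {
        return NULL;
    }
    /* Start at TOY_LEVELS - 1; index TOY_LEVELS is past the array. */
    for (int i = TOY_LEVELS - 1; i >= 0; i--) {
        memcpy(dst->levels[i], src->levels[i],
               toy_level_words(src->size, i) * sizeof(unsigned long));
    }
    return dst;
}
```

Allocating the copy with the regular allocator and then copying each level
avoids sharing level pointers between the two bitmaps, so freeing one never
invalidates the other.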

Re: [Qemu-devel] [2.3 PATCH v7 05/10] block: Add bdrv_copy_dirty_bitmap and bdrv_reset_dirty_bitmap

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block.c   | 35 +++
  include/block/block.h |  4 
  2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index ea4c8d8..9582550 100644
--- a/block.c
+++ b/block.c
@@ -53,6 +53,8 @@
  
  struct BdrvDirtyBitmap {

  HBitmap *bitmap;
+int64_t size;
+int64_t granularity;
  char *name;
  QLIST_ENTRY(BdrvDirtyBitmap) list;
  };
@@ -5311,6 +5313,26 @@ void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
  bitmap->name = NULL;
  }
  
+BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs,

+const BdrvDirtyBitmap *bitmap,
+const char *name)
+{
+BdrvDirtyBitmap *new_bitmap;
+
+new_bitmap = g_malloc0(sizeof(BdrvDirtyBitmap));
+new_bitmap->bitmap = hbitmap_copy(bitmap->bitmap);
+new_bitmap->size = bitmap->size;
+new_bitmap->granularity = bitmap->granularity;
+new_bitmap->name = name ? g_strdup(name) : NULL;


You changed this in patch 1 so you may want to change it here, too. I'm 
fine with it either way.



+QLIST_INSERT_HEAD(&bs->dirty_bitmaps, new_bitmap, list);
+return new_bitmap;
+}
+
+void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
+{
+hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
+}
+
  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
int granularity,
const char *name,
@@ -5318,6 +5340,7 @@ BdrvDirtyBitmap 
*bdrv_create_dirty_bitmap(BlockDriverState *bs,
  {
  int64_t bitmap_size;
  BdrvDirtyBitmap *bitmap;
+int sector_granularity;
  
  assert((granularity & (granularity - 1)) == 0);
  
@@ -5325,8 +5348,8 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,

  error_setg(errp, "Bitmap already exists: %s", name);
  return NULL;
  }
-granularity >>= BDRV_SECTOR_BITS;
-assert(granularity);
+sector_granularity = granularity >> BDRV_SECTOR_BITS;
+assert(sector_granularity);
  bitmap_size = bdrv_nb_sectors(bs);
  if (bitmap_size < 0) {
  error_setg_errno(errp, -bitmap_size, "could not get length of 
device");
@@ -5334,7 +5357,9 @@ BdrvDirtyBitmap 
*bdrv_create_dirty_bitmap(BlockDriverState *bs,
  return NULL;
  }
  bitmap = g_new0(BdrvDirtyBitmap, 1);
-bitmap->bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
+bitmap->size = bitmap_size;
+bitmap->granularity = granularity;
+bitmap->bitmap = hbitmap_alloc(bitmap->size, ffs(sector_granularity) - 1);
  bitmap->name = g_strdup(name);
  QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
  return bitmap;
@@ -5406,7 +5431,9 @@ int64_t bdrv_dbm_calc_def_granularity(BlockDriverState 
*bs)
  int64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap)
  {
-return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
+g_assert(BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap) == \
+ bitmap->granularity);


Do you really need that backslash?

If you don't and remove it, or if you do and keep it, and with either 
"name ? g_strdup(name) : NULL" left as-is or replace by "g_strdup(name)":


Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH] target-i386: add feature flags for CPUID[EAX=0xd, ECX=1]

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 12:40, Eduardo Habkost wrote:
> On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote:
>>
>>
>> On 25/11/2014 21:02, Paolo Bonzini wrote:
>>>>> +static const char *cpuid_xsave_feature_name[] = {
>>>>> +"xsaveopt", "xsavec", "xgetbv1", "xsaves",
>>>>
>>>> None of the above features introduce any new state that might need to be
>>>> migrated, or will require other changes in QEMU to work, right?
>>>>
>>>> It looks like they don't introduce any extra state, but if they do, they
>>>> need to be added to unmigratable_flags until migration support is
>>>> implemented.
>>>>
>>>> If they require other QEMU changes, it would be nice if KVM reported
>>>> them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
>>>> wouldn't break "-cpu host".
>>>
>>> No, they don't.
>>
>> Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
>> It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
>> MSRs and new XSAVE areas (last example: avx512).
> 
> If the changes needed are only to support migration (this is the case if
> it's just another MSR handled by KVM, or additional XSAVE areas),
> GET_SUPPORTED_CPUID is still reasonable, because features that are
> unknown to QEMU are always considered unmigratable until we add the
> feature name to the feature_name arrays. (That's why we need to know if
> the feature introduces additional state when adding the feature names to
> the array.)
> 
> If other changes are required to make the feature work even if no
> migration is required, then adding them to GET_SUPPORTED_CPUID would
> break "-cpu host" on older QEMUs. I don't think that's the case here,
> but I wanted to confirm.

KVM may need more changes (I don't know, the details of the feature are
not public yet), but a new userspace API is very unlikely based on Intel
documentation.

> (Should we add those observations to Documentation/virtual/kvm/api.txt?)
> 
>>
>> Since no hardware really exists for it, and KVM does not support it
>> anyway, I think it's simplest to leave xsaves out for now.  Is this right?
> 
> If unsure, it won't hurt to add the feature to unmigratable_features by
> now. Making QEMU aware of the feature name is still useful.

Ok, thanks.

Paolo

Re: [Qemu-devel] [RFC PATCH v5 20/31] replay: recording and replaying clock ticks

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 13:22, Pavel Dovgaluk wrote:
>> > Any reason to do this instead of using REPLAY_CLOCK in qemu_get_clock,
>> > like you do for QEMU_CLOCK_VIRTUAL_RT?
> hw/ppc.c uses this functions in pre_save and post_load function.
> It seems that these calls' results also should be logged by replay.

It should use qemu_get_clock_ns(QEMU_CLOCK_REALTIME) instead; same for
block/raw-posix.c and target-mips/kvm.c.

Paolo

Re: [Qemu-devel] [2.3 PATCH v7 06/10] qmp: Add block-dirty-bitmap-enable and block-dirty-bitmap-disable

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

This allows to put the dirty bitmap into a disabled state where no more
writes will be tracked.

It will be used before backup or writing to persistent file.

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block.c   | 15 +
  blockdev.c| 62 +++
  include/block/block.h |  2 ++
  qapi/block-core.json  | 28 +++
  qmp-commands.hx   | 10 +
  5 files changed, 117 insertions(+)

diff --git a/block.c b/block.c
index 9582550..7217066 100644
--- a/block.c
+++ b/block.c
@@ -56,6 +56,7 @@ struct BdrvDirtyBitmap {
  int64_t size;
  int64_t granularity;
  char *name;
+bool enabled;
  QLIST_ENTRY(BdrvDirtyBitmap) list;
  };
  
@@ -5361,6 +5362,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,

  bitmap->granularity = granularity;
  bitmap->bitmap = hbitmap_alloc(bitmap->size, ffs(sector_granularity) - 1);
  bitmap->name = g_strdup(name);
+bitmap->enabled = true;
  QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
  return bitmap;
  }
@@ -5379,6 +5381,16 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
  }
  }
  
+void bdrv_disable_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)

+{
+bitmap->enabled = false;
+}
+
+void bdrv_enable_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
+{
+bitmap->enabled = true;
+}
+
  BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
  {
  BdrvDirtyBitmap *bm;
@@ -5447,6 +5459,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t 
cur_sector,
  {
  BdrvDirtyBitmap *bitmap;
  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
+if (!bitmap->enabled) {
+continue;
+}
  hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
  }
  }
diff --git a/blockdev.c b/blockdev.c
index e2fe687..baaf902 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1873,6 +1873,68 @@ void qmp_block_dirty_bitmap_remove(const char *device, 
const char *name,
  bdrv_release_dirty_bitmap(bs, bitmap);
  }
  
+static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *device,

+  const char *name,
+  Error **errp)
+{
+BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
+Error *local_err = NULL;
+
+if (!device) {
+error_setg(errp, "Device cannot be NULL");
+return NULL;
+}
+if (!name) {
+error_setg(errp, "Bitmap name cannot be NULL");
+return NULL;
+}
+
+bs = bdrv_lookup_bs(device, NULL, &local_err);
+if (!bs) {
+error_propagate(errp, local_err);


Comments from patch 2 apply here as well: I'm still in favor of 
blk_by_name(), and you don't need local_err.



+return NULL;
+}
+
+bitmap = bdrv_find_dirty_bitmap(bs, name);
+if (!bitmap) {
+error_setg(errp, "Dirty bitmap not found: %s", name);
+return NULL;
+}
+
+return bitmap;
+}
+
+void qmp_block_dirty_bitmap_enable(const char *device, const char *name,
+   Error **errp)
+{
+BdrvDirtyBitmap *bitmap;
+Error *local_err = NULL;
+
+bitmap = block_dirty_bitmap_lookup(device, name, &local_err);
+if (!bitmap) {
+error_propagate(errp, local_err);


Again, no need for error_propagate().


+return;
+}
+
+bdrv_enable_dirty_bitmap(NULL, bitmap);
+}
+
+void qmp_block_dirty_bitmap_disable(const char *device, const char *name,
+Error **errp)
+{
+BdrvDirtyBitmap *bitmap;
+Error *local_err = NULL;
+
+bitmap = block_dirty_bitmap_lookup(device, name, &local_err);
+if (!bitmap) {
+error_propagate(errp, local_err);


Here, too.

With or without any of the "ret = foo(..., &local_err); if (ret 
indicates error) { error_propagate(errp, local_err);" replaced by "ret = 
foo(..., errp); if (ret indicates error) {":


Reviewed-by: Max Reitz
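
The two error-handling styles compared in this review can be shown side by
side. The sketch below uses a toy error type, not QEMU's Error API, and all
names in it are hypothetical. It only illustrates why passing the caller's
errp straight through is enough when the callee's return value already
signals failure.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an error object; not QEMU's Error API. */
typedef struct ToyError {
    const char *msg;
} ToyError;

static ToyError not_found = { "Dirty bitmap not found" };

/* Store 'err' in '*errp' if the caller asked for errors. */
static void toy_set_error(ToyError **errp, ToyError *err)
{
    if (errp) {
        assert(*errp == NULL);
        *errp = err;
    }
}

/* Returns 0 on success, -1 on failure (and sets *errp). */
static int toy_lookup(int key, ToyError **errp)
{
    if (key < 0) {
        toy_set_error(errp, &not_found);
        return -1;
    }
    return 0;
}

/* Variant 1: local error object, then propagate by hand. */
static int enable_with_local_err(int key, ToyError **errp)
{
    ToyError *local_err = NULL;

    if (toy_lookup(key, &local_err) < 0) {
        toy_set_error(errp, local_err);
        return -1;
    }
    return 0;
}

/* Variant 2: hand the caller's errp straight to the callee. */
static int enable_with_errp(int key, ToyError **errp)
{
    return toy_lookup(key, errp);
}
```

Both variants leave the same error in the caller's pointer. A local error
object is only needed when the caller has to inspect the error itself to
find out whether the call failed.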

Re: [Qemu-devel] [RFC PATCH v5 02/31] acpi: accurate overflow check

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:38, Pavel Dovgalyuk wrote:
> Compare clock in ns, because acpi_pm_tmr_update uses rounded
> to ns value instead of ticks.
> 
> Reviewed-by: Paolo Bonzini 
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  hw/acpi/core.c |7 +--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/core.c b/hw/acpi/core.c
> index a7368fb..51913d6 100644
> --- a/hw/acpi/core.c
> +++ b/hw/acpi/core.c
> @@ -376,8 +376,11 @@ static void acpi_notify_wakeup(Notifier *notifier, void 
> *data)
>  /* ACPI PM1a EVT */
>  uint16_t acpi_pm1_evt_get_sts(ACPIREGS *ar)
>  {
> -int64_t d = acpi_pm_tmr_get_clock();
> -if (d >= ar->tmr.overflow_time) {
> +/* Compare ns-clock, not PM timer ticks, because
> +   acpi_pm_tmr_update function uses ns for setting the timer. */
> +int64_t d = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +if (d >= muldiv64(ar->tmr.overflow_time,
> +  get_ticks_per_sec(), PM_TIMER_FREQUENCY)) {
>  ar->pm1.evt.sts |= ACPI_BITMASK_TIMER_STATUS;
>  }
>  return ar->pm1.evt.sts;

This is commit 3ef0eab178e5120a0e1c079d163d5c71689d9b71. :)

With it, Windows can boot in -icount mode (at least I tested XP).

Paolo
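
The unit conversion at the heart of this fix (PM timer ticks to
nanoseconds, computed as ticks * 10^9 / 3579545 without overflowing 64
bits in the intermediate product) can be sketched on its own. The toy_
and pm_ names are hypothetical, and toy_muldiv64() is an independent
re-implementation for illustration, not QEMU's muldiv64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PM_TIMER_HZ UINT32_C(3579545)     /* ACPI PM timer frequency */
#define TOY_NS_PER_SEC  UINT32_C(1000000000)

/*
 * Compute a * b / c with a 96-bit intermediate product.
 * The caller must ensure that the result fits in 64 bits.
 */
static uint64_t toy_muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t lo, hi, q_hi, q_lo;

    assert(c != 0);
    lo = (a & UINT64_C(0xffffffff)) * b;   /* low 32 bits of a, times b */
    hi = (a >> 32) * b;                    /* high 32 bits of a, times b */
    hi += lo >> 32;                        /* carry into the high part */
    q_hi = hi / c;
    q_lo = (((hi % c) << 32) + (lo & UINT64_C(0xffffffff))) / c;
    return (q_hi << 32) | q_lo;
}

/* Convert a PM timer tick count to nanoseconds. */
static uint64_t pm_ticks_to_ns(uint64_t ticks)
{
    return toy_muldiv64(ticks, TOY_NS_PER_SEC, TOY_PM_TIMER_HZ);
}

/* Compare in nanoseconds, as the patch does, not in timer ticks. */
static bool pm_timer_overflowed(uint64_t now_ns, uint64_t overflow_ticks)
{
    return now_ns >= pm_ticks_to_ns(overflow_ticks);
}
```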

Re: [Qemu-devel] [PATCH 1/2] balloon: call qdev_alias_all_properties for proxy dev in balloon class init

2014-11-26 Thread Denis V. Lunev


On 26/11/14 15:27, Cornelia Huck wrote:

On Wed, 26 Nov 2014 13:11:24 +0300
"Denis V. Lunev"  wrote:


From: Raushaniya Maksudova 

The idea is that all other virtio devices are calling this helper
to merge properties of the proxy device. This is the only difference
in between this helper and code in inside virtio_instance_init_common.
The patch should not cause any harm as property list in generic balloon
code is empty.

This also allows to avoid some dummy errors like fixed by this
 commit 91ba21208839643603e7f7fa5864723c3f371ebe
 Author: Gonglei 
 Date:   Tue Sep 30 14:10:35 2014 +0800
 virtio-balloon: fix virtio-balloon child refcount in transports

Signed-off-by: Denis V. Lunev 
Acked-by: Raushaniya Maksudova 
CC: Cornelia Huck 
CC: Christian Borntraeger 
CC: Anthony Liguori 
CC: Michael S. Tsirkin 
---
  hw/s390x/virtio-ccw.c  | 5 ++---
  hw/virtio/virtio-pci.c | 5 ++---
  2 files changed, 4 insertions(+), 6 deletions(-)

Shouldn't this have the sign-off of the author (rather than the ack) as
well?

Otherwise, looks sane.


I am the original author of the patch and prepared it for Rushana
so that she can pass command line options in her next patch. I'll fix
this line in the next submission.

Re: [Qemu-devel] [Bug 1395217] Re: Networking in qemu 2.0.0 and beyond is not compatible with Open Solaris (Illumos) 5.11

2014-11-26 Thread Markus Armbruster

Tim Dawson  writes:

> Additional test (I just don't know when to go to bed . . . *sigh* . . .
> ).
>
> In a checkout of the 2.1.2 code base, and based on the above failing
> commit as per bisect, I removed the change in the commit for
> target-i386/cpu.c of the line:
>
> [FEAT_1_ECX] = CPUID_EXT_X1APIC,
>
> as added by the errant commit, recompiled, and networking is now working
> with Illumos in 2.1.2, so this commit is definitely not as innocent as
> it may appear.

Possible workaround for unpatched code: -cpu qemu64,-x2apic.  From
memory, copying Eduardo to correct me in case I screwed it up.

Re: [Qemu-devel] [RFC PATCH v5 00/31] Deterministic replay and reverse execution

2014-11-26 Thread Paolo Bonzini



On 26/11/2014 11:38, Pavel Dovgalyuk wrote:
> This set of patches is related to the reverse execution and deterministic
> replay of qemu execution. Our implementation of deterministic replay can
> be used for deterministic and reverse debugging of guest code through gdb 
> remote interface.

Lots of progress!

I think these patches are now mergeable:

Paolo Bonzini (4):
  target-ppc: pass DisasContext to SPR generator functions
  cpu-exec: add a new CF_USE_ICOUNT cflag
  translate: check cflags instead of use_icount global
  gen-icount: check cflags instead of use_icount global

Pavel Dovgalyuk (6):
  cpu-exec: fix cpu_exec_nocache
  cpu-exec: reset exception_index correctly
  i386: do not cross the pages boundaries in replay mode
  cpu-exec: invalidate nocache translation if they are interrupted
  timer: introduce new QEMU_CLOCK_VIRTUAL_RT clock
  cpus: make icount warp behave well with respect to stop/cont

(The last is "cpus: make icount warp deterministic in replay mode".)

Alex, can you ACK "target-ppc: pass DisasContext to SPR generator
functions" please?  In general a review of the other three patches by me
would be welcome.

Thanks,

Paolo

[Qemu-devel] [Bug 1395217] Re: Networking in qemu 2.0.0 and beyond is not compatible with Open Solaris (Illumos) 5.11

2014-11-26 Thread Eduardo Habkost

It is runtime selectable using "-cpu ...,-x2apic" (as indicated by
Markus on qemu-devel).

First thing we need to find out is if it fails on the newest CPU model
that can be run in enforce mode.

So, assuming you are running on an Intel host CPU, it would be
interesting to test those CPU models in this order, until you have one
that actually boots:

 -cpu Broadwell,enforce
 -cpu Haswell,enforce
 -cpu SandyBridge,enforce
 -cpu Westmere,enforce
 -cpu Nehalem,enforce
 -cpu Penryn,enforce
 -cpu Conroe,enforce

Testing of:
  -cpu host
would be interesting, too.

If the latest CPU model (or -cpu host) has working networking, that
means Solaris (or QEMU NIC emulation code) doesn't like to see an old
CPU with x2apic enabled. If it doesn't work even using the latest CPU
model (and -cpu host), that means Solaris (or QEMU NIC emulation)
doesn't like the x2apic implementation of KVM at all (and that could
mean a Solaris bug, a QEMU bug, or a KVM x2apic emulation bug).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1395217

Title:
  Networking in qemu 2.0.0 and beyond is not compatible with Open
  Solaris (Illumos) 5.11

Status in QEMU:
  New

Bug description:
  The networking code in qemu in versions 2.0.0 and beyond is non-
  functional with Solaris/Illumos 5.11 images.

  Building 1.7.1, 2.0.0, 2.0.2, 2.1.2, and 2.2.0rc1 with the following
  standard Slackware config:

  # From Slackware build tree . . . 
  ./configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--sysconfdir=/etc \
--localstatedir=/var \
--enable-gtk \
--enable-system \
--enable-kvm \
--disable-debug-info \
--enable-virtfs \
--enable-sdl \
--audio-drv-list=alsa,oss,sdl,esd \
--enable-libusb \
--disable-vnc \
--target-list=x86_64-linux-user,i386-linux-user,x86_64-softmmu,i386-softmmu \
--enable-spice \
--enable-usb-redir 

  
  And attempting to run the same VM image with the following command (or via
  virt-manager):

  macaddress="DE:AD:BE:EF:3F:A4"

  qemu-system-x86_64 nex4x -cdrom /dev/cdrom -name "Nex41" -cpu Westmere
  -machine accel=kvm -smp 2 -m 4000 -net nic,macaddr=$macaddress
  -net bridge,br=br0 -net dump,file=/usr1/tmp/ -drive file=nex4x_d1
  -drive file=nex4x_d2 -enable-kvm

  Gives success on 1.7.1, and a deaf VM on all subsequent versions.

  Notable in validating my config, is that a Windows 7 image runs
  cleanly with networking on *all* builds, so my configuration appears
  to be good - qemu just hates Solaris at this point.

  Watching with wireshark (as well as pulling network traces from qemu
  as noted above) it appears that the notable difference in the two
  configs is that for some reason, Solaris gets stuck arping for its
  own interface on startup, and never really comes on line on the
  network.  If other hosts attempt to ping the Solaris instance, they
  can successfully arp the bad VM, but not the other way around.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1395217/+subscriptions

[Qemu-devel] [PATCH] ppc: do not use get_clock_realtime()

2014-11-26 Thread Paolo Bonzini

Use the external qemu-timer API instead.

Cc: qemu-...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 hw/ppc/ppc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index bec82cd..5ce565d 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -844,7 +844,7 @@ static void timebase_pre_save(void *opaque)
 return;
 }
 
-tb->time_of_the_day_ns = get_clock_realtime();
+tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
 /*
  * tb_offset is only expected to be changed by migration so
  * there is no need to update it from KVM here
@@ -873,7 +873,7 @@ static int timebase_post_load(void *opaque, int version_id)
  * We try to adjust timebase by downtime if host clocks are not
  * too much out of sync (1 second for now).
  */
-host_ns = get_clock_realtime();
+host_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
 ns_diff = MAX(0, host_ns - tb_remote->time_of_the_day_ns);
 migration_duration_ns = MIN(NSEC_PER_SEC, ns_diff);
 migration_duration_tb = muldiv64(migration_duration_ns, freq, NSEC_PER_SEC);
-- 
1.8.3.1
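
For readers following the second hunk, here is a minimal standalone sketch of the downtime adjustment that timebase_post_load() performs around the changed line: clamp the host-clock difference to the range [0, 1 second], then scale it to timebase ticks. The helper names below are hypothetical and are not part of the patch; the real code uses QEMU's MAX/MIN macros and muldiv64().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Illustrative stand-in for QEMU's muldiv64(a, b, c) = a * b / c.
 * Assumption: the caller has already clamped a to at most NSEC_PER_SEC,
 * so a * b stays below 2^64 for any 32-bit b and a plain 64-bit
 * multiplication is enough here.
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    assert(a <= (uint64_t)NSEC_PER_SEC);
    return a * b / c;
}

/*
 * Hypothetical helper: number of timebase ticks to add for the
 * migration downtime, given the host clock on both sides (in ns)
 * and the timebase frequency in Hz.
 */
static int64_t downtime_to_tb_ticks(int64_t host_ns, int64_t remote_ns,
                                    uint32_t freq)
{
    int64_t ns_diff = host_ns - remote_ns;

    if (ns_diff < 0) {
        ns_diff = 0;                /* clocks out of sync: never go back */
    }
    if (ns_diff > NSEC_PER_SEC) {
        ns_diff = NSEC_PER_SEC;     /* cap the adjustment at one second */
    }
    return (int64_t)muldiv64_sketch((uint64_t)ns_diff, freq,
                                    (uint32_t)NSEC_PER_SEC);
}
```

With a 512 MHz timebase, a 500 ms difference maps to 256000000 ticks; a negative difference maps to 0, and anything above one second is treated as exactly one second.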

[Qemu-devel] [PATCH] block: do not use get_clock()

2014-11-26 Thread Paolo Bonzini

Use the external qemu-timer API instead.

Cc: kw...@redhat.com
Cc: stefa...@redhat.com
Signed-off-by: Paolo Bonzini 
---
 block/accounting.c | 6 --
 block/raw-posix.c  | 8 
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index edbb1cc..18102f0 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -24,6 +24,7 @@
 
 #include "block/accounting.h"
 #include "block/block_int.h"
+#include "qemu/timer.h"
 
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type)
@@ -31,7 +32,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
 assert(type < BLOCK_MAX_IOTYPE);
 
 cookie->bytes = bytes;
-cookie->start_time_ns = get_clock();
+cookie->start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 cookie->type = type;
 }
 
@@ -41,7 +42,8 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 
 stats->nr_bytes[cookie->type] += cookie->bytes;
 stats->nr_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
+stats->total_time_ns[cookie->type] +=
+qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cookie->start_time_ns;
 }
 
 
diff --git a/block/raw-posix.c b/block/raw-posix.c
index b1af77e..02e107f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1922,7 +1922,7 @@ static int fd_open(BlockDriverState *bs)
 return 0;
 last_media_present = (s->fd >= 0);
 if (s->fd >= 0 &&
-(get_clock() - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
+(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
 qemu_close(s->fd);
 s->fd = -1;
 #ifdef DEBUG_FLOPPY
@@ -1931,7 +1931,7 @@ static int fd_open(BlockDriverState *bs)
 }
 if (s->fd < 0) {
 if (s->fd_got_error &&
-(get_clock() - s->fd_error_time) < FD_OPEN_TIMEOUT) {
+(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_error_time) < FD_OPEN_TIMEOUT) {
 #ifdef DEBUG_FLOPPY
 printf("No floppy (open delayed)\n");
 #endif
@@ -1939,7 +1939,7 @@ static int fd_open(BlockDriverState *bs)
 }
 s->fd = qemu_open(bs->filename, s->open_flags & ~O_NONBLOCK);
 if (s->fd < 0) {
-s->fd_error_time = get_clock();
+s->fd_error_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 s->fd_got_error = 1;
 if (last_media_present)
 s->fd_media_changed = 1;
@@ -1954,7 +1954,7 @@ static int fd_open(BlockDriverState *bs)
 }
 if (!last_media_present)
 s->fd_media_changed = 1;
-s->fd_open_time = get_clock();
+s->fd_open_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 s->fd_got_error = 0;
 return 0;
 }
-- 
1.8.3.1
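
The accounting hunks above follow a simple pattern: take a nanosecond timestamp when a request starts and add the difference to a running total when it completes. The sketch below shows that pattern outside QEMU using POSIX clock_gettime(). It assumes, as far as I understand the timer API, that QEMU_CLOCK_REALTIME behaves like a monotonic clock while QEMU_CLOCK_HOST follows wall-clock time; the struct and function names here are illustrative only and do not exist in QEMU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock as a single nanosecond count. */
static int64_t now_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Illustrative counterparts of BlockAcctCookie and BlockAcctStats. */
struct acct_cookie {
    int64_t start_time_ns;
};

struct acct_stats {
    uint64_t total_time_ns;
    uint64_t nr_ops;
};

/* Record the start of an operation. */
static void acct_start(struct acct_cookie *cookie)
{
    cookie->start_time_ns = now_ns(CLOCK_MONOTONIC);
}

/* Account the elapsed time of a finished operation. */
static void acct_done(struct acct_stats *stats,
                      const struct acct_cookie *cookie)
{
    stats->total_time_ns +=
        (uint64_t)(now_ns(CLOCK_MONOTONIC) - cookie->start_time_ns);
    stats->nr_ops++;
}
```

A monotonic source is the natural choice for intervals like these because it does not jump when the host's wall clock is adjusted.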

[Qemu-devel] [PATCH] mips: kvm: do not use get_clock()

2014-11-26 Thread Paolo Bonzini

Use the external qemu-timer API instead.

Signed-off-by: Paolo Bonzini 
---
 target-mips/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 97fd51a..a761ea5 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -439,7 +439,7 @@ static void kvm_mips_update_state(void *opaque, int running, RunState state)
 }
 } else {
 /* Set clock restore time to now */
-count_resume = get_clock();
+count_resume = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 ret = kvm_mips_put_one_reg64(cs, KVM_REG_MIPS_COUNT_RESUME,
  &count_resume);
 if (ret < 0) {
-- 
1.8.3.1

Re: [Qemu-devel] TCG Multithreading performance improvement

2014-11-26 Thread Peter Maydell

On 26 November 2014 at 09:31, Mark Burton  wrote:
> We have set up a wiki page to track the project
> http://wiki.qemu.org/Features/tcg-multithread

I see you write "The TCG today is close to being thread safe".
Personally I would phrase this as "TCG today is not at all
thread safe" :-)

-- PMM

Re: [Qemu-devel] [2.3 PATCH v7 07/10] qmp: Add support of "dirty-bitmap" sync mode for drive-backup

2014-11-26 Thread Max Reitz


On 2014-11-25 at 20:46, John Snow wrote:

From: Fam Zheng 

For "dirty-bitmap" sync mode, the block job will iterate through the
given dirty bitmap to decide if a sector needs backup (backup all the
dirty clusters and skip clean ones), just as allocation conditions of
"top" sync mode.

There are two bitmap use modes for sync=dirty-bitmap:

  - reset: backup job makes a copy of bitmap and resets the original
one.
  - consume: backup job makes the original anonymous (invisible to user)
and releases it after use.

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block.c   |   5 ++
  block/backup.c| 128 ++
  block/mirror.c|   4 ++
  blockdev.c|  18 ++-
  hmp.c |   4 +-
  include/block/block.h |   1 +
  include/block/block_int.h |   6 +++
  qapi/block-core.json  |  30 +--
  qmp-commands.hx   |   7 +--
  9 files changed, 174 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index 7217066..cf93148 100644
--- a/block.c
+++ b/block.c
@@ -5454,6 +5454,11 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
  hbitmap_iter_init(hbi, bitmap->bitmap, 0);
  }
  
+void bdrv_dirty_iter_set(HBitmapIter *hbi, int64_t offset)
+{
+hbitmap_iter_init(hbi, hbi->hb, offset);
+}
+
  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
  int nr_sectors)
  {
diff --git a/block/backup.c b/block/backup.c
index 792e655..8e7d135 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -37,6 +37,10 @@ typedef struct CowRequest {
  typedef struct BackupBlockJob {
  BlockJob common;
  BlockDriverState *target;
+/* bitmap for sync=dirty-bitmap */
+BdrvDirtyBitmap *sync_bitmap;
+/* dirty bitmap granularity */
+int64_t sync_bitmap_gran;
  MirrorSyncMode sync_mode;
  RateLimit limit;
  BlockdevOnError on_source_error;
@@ -242,6 +246,31 @@ static void backup_complete(BlockJob *job, void *opaque)
  g_free(data);
  }
  
+static bool yield_and_check(BackupBlockJob *job)
+{
+if (block_job_is_cancelled(&job->common)) {
+return true;
+}
+
+/* we need to yield so that qemu_aio_flush() returns.
+ * (without, VM does not reboot)
+ */
+if (job->common.speed) {
+uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
+  job->sectors_read);
+job->sectors_read = 0;
+block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
+} else {
+block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
+}
+
+if (block_job_is_cancelled(&job->common)) {
+return true;
+}
+
+return false;
+}
+
  static void coroutine_fn backup_run(void *opaque)
  {
  BackupBlockJob *job = opaque;
@@ -254,13 +283,13 @@ static void coroutine_fn backup_run(void *opaque)
  };
  int64_t start, end;
  int ret = 0;
+bool error_is_read;
  
  QLIST_INIT(&job->inflight_reqs);
  qemu_co_rwlock_init(&job->flush_rwlock);
  
  start = 0;
-end = DIV_ROUND_UP(job->common.len / BDRV_SECTOR_SIZE,
-   BACKUP_SECTORS_PER_CLUSTER);
+end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
  
  job->bitmap = hbitmap_alloc(end, 0);
  
@@ -278,28 +307,45 @@ static void coroutine_fn backup_run(void *opaque)

  qemu_coroutine_yield();
  job->common.busy = true;
  }
+} else if (job->sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
+/* Dirty Bitmap sync has a slightly different iteration method */
+HBitmapIter hbi;
+int64_t sector;
+int64_t cluster;
+bool polyrhythmic;
+
+bdrv_dirty_iter_init(bs, job->sync_bitmap, &hbi);
+/* Does the granularity happen to match our backup cluster size? */
+polyrhythmic = (job->sync_bitmap_gran != BACKUP_CLUSTER_SIZE);
+
+/* Find the next dirty /sector/ and copy that /cluster/ */
+while ((sector = hbitmap_iter_next(&hbi)) != -1) {
+if (yield_and_check(job)) {
+goto leave;
+}
+cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
+
+do {
+ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
+BACKUP_SECTORS_PER_CLUSTER, &error_is_read);
+if ((ret < 0) &&
+backup_error_action(job, error_is_read, -ret) ==
+BLOCK_ERROR_ACTION_REPORT) {
+goto leave;
+}
+} while (ret < 0);
+
+/* Advance (or rewind) our iterator if we need to. */
+if (polyrhythmic) {
+bdrv_dirty_iter_set(&hbi,
+(cluster + 1) * BACKUP_SECTORS_PER_CLUSTER);
+}
+}
  } else {
  /* Both FULL and TOP SYNC_MODE's require copying.. */
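
The dirty-bitmap loop in the hunk above (find the next dirty sector, copy the cluster that contains it, then move the iterator to the start of the next cluster) can be sketched in isolation as follows. This is a simplified model under stated assumptions: the bitmap is a plain bool array with one entry per sector, the cluster size of 128 sectors is assumed rather than taken from the patch, and the copy itself is replaced by recording the cluster index. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value standing in for BACKUP_SECTORS_PER_CLUSTER. */
#define SECTORS_PER_CLUSTER 128

/*
 * Hypothetical iterator: return the next dirty sector at or after *pos
 * and advance *pos past it, or return -1 when none is left.
 */
static int64_t next_dirty(const bool *dirty, int64_t nb_sectors,
                          int64_t *pos)
{
    while (*pos < nb_sectors) {
        int64_t sector = (*pos)++;
        if (dirty[sector]) {
            return sector;
        }
    }
    return -1;
}

/*
 * Visit every cluster that contains at least one dirty sector exactly
 * once. Cluster indices are written to out[]; the return value is the
 * number of clusters visited.
 */
static size_t backup_dirty_clusters(const bool *dirty, int64_t nb_sectors,
                                    int64_t *out, size_t max_out)
{
    int64_t pos = 0;
    int64_t sector;
    size_t n = 0;

    while ((sector = next_dirty(dirty, nb_sectors, &pos)) != -1) {
        int64_t cluster = sector / SECTORS_PER_CLUSTER;

        assert(n < max_out);
        out[n++] = cluster;     /* stand-in for the backup_do_cow() call */

        /* Skip the rest of this cluster, like bdrv_dirty_iter_set(). */
        pos = (cluster + 1) * SECTORS_PER_CLUSTER;
    }
    return n;
}
```

Repositioning the iterator after each copy is what keeps a cluster with several dirty sectors from being copied more than once when the bitmap granularity is finer than the backup cluster size.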

Re: [Qemu-devel] TCG Multithreading performance improvement

2014-11-26 Thread Claudio Fontana

On 26.11.2014 15:06, Peter Maydell wrote:
> On 26 November 2014 at 09:31, Mark Burton  wrote:
>> We have set up a wiki page to track the project
>> http://wiki.qemu.org/Features/tcg-multithread
> 
> I see you write "The TCG today is close to being thread safe".
> Personally I would phrase this as "TCG today is not at all
> thread safe" :-)
> 
> -- PMM
> 

Always nitpicking.. :-)

Re: [Qemu-devel] Qemu-KVM: Virtual Machine Power Managment

2014-11-26 Thread Eduardo Habkost

On Thu, Nov 06, 2014 at 02:52:02PM +, Carew, Alan wrote:
> Hi folks,
> 
> I am looking for feedback regarding work-in-progress or planned CPU power
> management features for Qemu-KVM based Virtual Machines.
> 
> Looking back through the mailing list archives I did not find any discussion
> or patches relating to the general problem of virtual machine power
> management.
> 
> Currently the MSRs for power management are not exposed to a guest OS and as
> far as I am aware no abstraction driver exists to facilitate acpi_cpufreq like
> features on a guest.
> 
> The context for my query relates to deterministic power management at the
> application level (non-VM), where in certain domains the workload is
> partitioned on a per-core basis (1:1 thread-core exclusive pinning) and,
> based on current workload, an application thread can request a transition
> to a different P-State, for example using the acpi_cpufreq userspace power
> governor.
> 
> Are there any plans or previous discussion that I missed for exposing such an
> ability or similar for Qemu-KVM based virtual machines?

I am not aware of any previous discussion, either. One problem I see is
that you probably don't just need the VCPUs (or other VM-related tasks)
to be pinned to specific host CPUs, but you also want to make sure that
no other task in the system will unexpectedly run on that host CPU.

How is that normally achieved when no VMs are involved? Does the system
administrator have to ensure that no other processes are running at all,
or is there some software component that implements this and makes it
easier to set up?

-- 
Eduardo
