[Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU

2017-03-22 Thread Igor Mammedov
It will allow switching from cpu_index to property-based
NUMA mapping in follow-up patches.

Signed-off-by: Igor Mammedov 
---
 hw/i386/pc.c  | 17 +
 target/i386/cpu.c |  1 +
 2 files changed, 18 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7031100..873bbfa 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1895,6 +1895,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
 int idx;
+int node_id;
 CPUState *cs;
 CPUArchId *cpu_slot;
 X86CPUTopoInfo topo;
@@ -1984,6 +1985,22 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 cs = CPU(cpu);
 cs->cpu_index = idx;
+
+node_id = numa_get_node_for_cpu(cs->cpu_index);
+if (node_id == nb_numa_nodes) {
+/* by default CPUState::numa_node was 0 if it's not set via CLI
+ * keep it this way for now but in future we probably should
+ * refuse to start up with incomplete numa mapping */
+node_id = 0;
+}
+if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
+cs->numa_node = node_id;
+} else if (cs->numa_node != node_id) {
+error_setg(errp, "node-id %d must match numa node specified "
+"with -numa option for cpu-index %d",
+cs->numa_node, cs->cpu_index);
+return;
+}
 }
 
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7aa7622..d690244 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3974,6 +3974,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
+DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
 DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
 { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
 DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 13/23] spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()

2017-03-22 Thread Igor Mammedov
It's safe to remove the thread node_id != core node_id error
branch, as machine_set_cpu_numa_node() also does the mismatch
check and is called even before any CPU is created.

Signed-off-by: Igor Mammedov 
---
 hw/ppc/spapr.c  |  4 ++--
 hw/ppc/spapr_cpu_core.c | 14 ++
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9c61721..42cef3d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2803,8 +2803,8 @@ static void spapr_core_pre_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 goto out;
 }
 
-node_id = numa_get_node_for_cpu(cc->core_id);
-if (node_id == nb_numa_nodes) {
+node_id = core_slot->props.node_id;
+if (!core_slot->props.has_node_id) {
 /* by default CPUState::numa_node was 0 if it's not set via CLI
  * keep it this way for now but in future we probably should
  * refuse to start up with incomplete numa mapping */
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 25988f8..8d48468 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -168,7 +168,6 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error 
**errp)
 
 sc->threads = g_malloc0(size * cc->nr_threads);
 for (i = 0; i < cc->nr_threads; i++) {
-int node_id;
 char id[32];
 CPUState *cs;
 
@@ -178,17 +177,8 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error 
**errp)
 cs = CPU(obj);
 cs->cpu_index = cc->core_id + i;
 
-/* Set NUMA node for the added CPUs  */
-node_id = numa_get_node_for_cpu(cs->cpu_index);
-if (node_id != sc->node_id) {
-error_setg(&local_err, "Invalid node-id=%d of thread[cpu-index: %d]"
-" on CPU[core-id: %d, node-id: %d], node-id must be the same",
- node_id, cs->cpu_index, cc->core_id, sc->node_id);
-goto err;
-}
-if (node_id < nb_numa_nodes) {
-cs->numa_node = node_id;
-}
+/* Set NUMA node for the threads belonged to core  */
+cs->numa_node = sc->node_id;
 
 snprintf(id, sizeof(id), "thread[%d]", i);
 object_property_add_child(OBJECT(sc), id, obj, _err);
-- 
2.7.4




[Qemu-devel] [Bug 1674925] Re: Qemu PPC64 kvm no display if --device virtio-gpu-pci is selected

2017-03-22 Thread luigiburdo
Hi Thomas, with 2.9 rc1 I have this with --enable-kvm:

qemu-system-ppc64 --enable-kvm
qemu-system-ppc64: KVM and IRQ_XICS capability must be present for in-kernel XICS

and QEMU doesn't run.

Ciao 
Luigi

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1674925

Title:
  Qemu PPC64 kvm no display if  --device virtio-gpu-pci is selected

Status in QEMU:
  New

Bug description:
  Hi,
  I did many tests with QEMU 2.8 on my BE machines and found an issue that I
  think needs to be reported.

  Test Machines BE 970MP

  If I set up QEMU with:

  qemu-system-ppc64 -M 1024 --display sdl(or gtk),gl=on --device virtio-
  gpu-pci,virgl --enable-kvm and so and so

  The result is a doubled window: one is VGA, the other is virtio-gpu-pci,
  without any start of the VM. Practically I don't get any output from
  OpenBIOS or on the virtual serial output.

  I found the same issue if I select:
  qemu-system-ppc64 -M 1024 --display gtk(or sdl) --device virtio-gpu-pci 
--enable-kvm and so and so

  
  I have tried changing the -M type to all kinds of pseries without any
  positive result.

  Ciao 
  Luigi

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1674925/+subscriptions



[Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps

2017-03-22 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 numa.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/numa.c b/numa.c
index 44057f1..ab41776 100644
--- a/numa.c
+++ b/numa.c
@@ -309,6 +309,7 @@ static void validate_numa_cpus(void)
 void parse_numa_opts(MachineState *ms)
 {
 int i;
+const CPUArchIdList *possible_cpus;
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 
 for (i = 0; i < MAX_NODES; i++) {
@@ -379,11 +380,6 @@ void parse_numa_opts(MachineState *ms)
 
 numa_set_mem_ranges();
 
-for (i = 0; i < nb_numa_nodes; i++) {
-if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) {
-break;
-}
-}
 /* Historically VCPUs were assigned in round-robin order to NUMA
  * nodes. However it causes issues with guest not handling it nice
  * in case where cores/threads from a multicore CPU appear on
@@ -391,11 +387,20 @@ void parse_numa_opts(MachineState *ms)
  * rule grouping VCPUs by socket so that VCPUs from the same socket
  * would be on the same node.
  */
-if (!mc->cpu_index_to_instance_props) {
+if (!mc->cpu_index_to_instance_props || !mc->possible_cpu_arch_ids) {
 error_report("default CPUs to NUMA node mapping isn't supported");
 exit(1);
 }
-if (i == nb_numa_nodes) {
+
+possible_cpus = mc->possible_cpu_arch_ids(ms);
+for (i = 0; i < possible_cpus->len; i++) {
+if (possible_cpus->cpus[i].props.has_node_id) {
+break;
+}
+}
+
+/* no CPUs are assigned to NUMA nodes */
+if (i == possible_cpus->len) {
 for (i = 0; i < max_cpus; i++) {
 CpuInstanceProperties props;
 /* fetch default mapping from board and enable it */
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping

2017-03-22 Thread Igor Mammedov
Default node mapping initialization already checks that the board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for the explicitly provided
mapping "-numa node,cpus=...".

Signed-off-by: Igor Mammedov 
---
 numa.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/numa.c b/numa.c
index b6e71bc..24c596d 100644
--- a/numa.c
+++ b/numa.c
@@ -140,10 +140,12 @@ uint32_t numa_get_node(ram_addr_t addr, Error **errp)
 return -1;
 }
 
-static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error 
**errp)
+static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
+QemuOpts *opts, Error **errp)
 {
 uint16_t nodenr;
 uint16List *cpus = NULL;
+MachineClass *mc = MACHINE_GET_CLASS(ms);
 
 if (node->has_nodeid) {
 nodenr = node->nodeid;
@@ -162,6 +164,10 @@ static void numa_node_parse(NumaNodeOptions *node, 
QemuOpts *opts, Error **errp)
 return;
 }
 
+if (!mc->cpu_index_to_instance_props) {
+error_report("CPUs to NUMA node mapping isn't supported");
+exit(1);
+}
 for (cpus = node->cpus; cpus; cpus = cpus->next) {
 if (cpus->value >= max_cpus) {
 error_setg(errp,
@@ -215,6 +221,7 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts 
*opts, Error **errp)
 static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
 {
 NumaOptions *object = NULL;
+MachineState *ms = opaque;
 Error *err = NULL;
 
 {
@@ -229,7 +236,7 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 
 switch (object->type) {
 case NUMA_OPTIONS_TYPE_NODE:
-numa_node_parse(&object->u.node, opts, &err);
+numa_node_parse(ms, &object->u.node, opts, &err);
 if (err) {
 goto end;
 }
@@ -303,7 +310,7 @@ void parse_numa_opts(MachineState *ms)
 numa_info[i].node_cpu = bitmap_new(max_cpus);
 }
 
-if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) {
+if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
 exit(1);
 }
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH] only link current target arch traces to qemu-system

2017-03-22 Thread Paolo Bonzini


On 22/03/2017 03:03, Xu, Anthony wrote:
> When building target x86_64-softmmu, all other architectures' trace.o are 
> linked into 
> x86_64-softmmu/qemu-system-x86_64, like hw/arm/trace.o, hw/mips/trace.o etc., 
> that is not necessary.
>  Same thing happens when building other targets.
> 
> Only current target arch traces should be linked into qemu-system.
> 
> Signed-off-by: Anthony Xu 

It's a bit cleaner, but does the benefit outweigh the maintenance cost
of the additional code added to the Makefiles?

Paolo



[Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus

2017-03-22 Thread Igor Mammedov
Introduce a machine_set_cpu_numa_node() helper that stores the
node mapping for a CPU in MachineState::possible_cpus.
The CPU and the node it belongs to are specified by the 'props' argument.

The patch doesn't remove the old way of storing the mapping in
numa_info[X].node_cpu, as removing it at the same time would
make the patch rather big. Instead it just mirrors the mapping
in possible_cpus; follow-up per-target patches will switch to
possible_cpus, and numa_info[X].node_cpu will be removed once
there are no users left.
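
For orientation, here is a minimal usage sketch (not part of the patch; the
wrapper function name is made up) of how a caller could combine the new helper
with the cpu_index_to_instance_props() callback added earlier in the series:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/boards.h"

/* sketch: assign cpu_index 0..nr_cpus-1 to NUMA node 'node', letting the
 * board translate cpu_index into socket/core/thread properties first */
static void example_assign_cpus_to_node(MachineState *ms, int nr_cpus, int node)
{
    MachineClass *mc = MACHINE_GET_CLASS(ms);
    int i;

    for (i = 0; i < nr_cpus; i++) {
        CpuInstanceProperties props = mc->cpu_index_to_instance_props(ms, i);

        props.node_id = node;
        props.has_node_id = true;
        /* records the mapping in ms->possible_cpus, or errors out on mismatch */
        machine_set_cpu_numa_node(ms, &props, &error_fatal);
    }
}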

Signed-off-by: Igor Mammedov 
---
 include/hw/boards.h |  2 ++
 hw/core/machine.c   | 68 +
 numa.c  |  8 +++
 3 files changed, 78 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1dd0fde..40f30f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
 bool machine_mem_merge(MachineState *machine);
 void machine_register_compat_props(MachineState *machine);
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
+void machine_set_cpu_numa_node(MachineState *machine,
+   CpuInstanceProperties *props, Error **errp);
 
 /**
  * CPUArchId:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 0d92672..6ff0b45 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -388,6 +388,74 @@ HotpluggableCPUList 
*machine_query_hotpluggable_cpus(MachineState *machine)
 return head;
 }
 
+void machine_set_cpu_numa_node(MachineState *machine,
+   CpuInstanceProperties *props, Error **errp)
+{
+MachineClass *mc = MACHINE_GET_CLASS(machine);
+bool match = false;
+int i;
+
+if (!mc->possible_cpu_arch_ids) {
+error_setg(errp, "mapping of CPUs to NUMA node is not supported");
+return;
+}
+
+/* force board to initialize possible_cpus if it hasn't been done yet */
+mc->possible_cpu_arch_ids(machine);
+
+for (i = 0; i < machine->possible_cpus->len; i++) {
+CPUArchId *slot = &machine->possible_cpus->cpus[i];
+
+/* reject unsupported by board properties */
+if (props->has_thread_id && !slot->props.has_thread_id) {
+error_setg(errp, "thread-id is not supported");
+return;
+}
+
+if (props->has_core_id && !slot->props.has_core_id) {
+error_setg(errp, "core-id is not supported");
+return;
+}
+
+if (props->has_socket_id && !slot->props.has_socket_id) {
+error_setg(errp, "socket-id is not supported");
+return;
+}
+
+/* skip slots with explicit mismatch */
+if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
+continue;
+}
+
+if (props->has_core_id && props->core_id != slot->props.core_id) {
+continue;
+}
+
+if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
+continue;
+}
+
+/* reject assignment if slot is already assigned, for compatibility
+ * of legacy cpu_index mapping with SPAPR core based mapping do not
+ * error out if cpu thread and matched core have the same node-id */
+if (slot->props.has_node_id &&
+slot->props.node_id != props->node_id) {
+error_setg(errp, "CPU is already assigned to node-id: %" PRId64,
+   slot->props.node_id);
+return;
+}
+
+/* assign slot to node as it's matched '-numa cpu' key */
+match = true;
+slot->props.node_id = props->node_id;
+slot->props.has_node_id = props->has_node_id;
+}
+
+if (!match) {
+error_setg(errp, "no match found");
+}
+}
+
 static void machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
diff --git a/numa.c b/numa.c
index 24c596d..44057f1 100644
--- a/numa.c
+++ b/numa.c
@@ -169,6 +169,7 @@ static void numa_node_parse(MachineState *ms, 
NumaNodeOptions *node,
 exit(1);
 }
 for (cpus = node->cpus; cpus; cpus = cpus->next) {
+CpuInstanceProperties props;
 if (cpus->value >= max_cpus) {
 error_setg(errp,
"CPU index (%" PRIu16 ")"
@@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, 
NumaNodeOptions *node,
 return;
 }
 bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
+props = mc->cpu_index_to_instance_props(ms, cpus->value);
+props.node_id = nodenr;
+props.has_node_id = true;
+machine_set_cpu_numa_node(ms, &props, &error_fatal);
 }
 
 if (node->has_mem && node->has_memdev) {
@@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
 if (i == nb_numa_nodes) {
 for (i = 0; i < max_cpus; i++) {
 CpuInstanceProperties props;
+/* fetch default 

[Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test

2017-03-22 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 tests/Makefile.include |   5 +++
 tests/numa-test.c  | 106 +
 2 files changed, 111 insertions(+)
 create mode 100644 tests/numa-test.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 402e71c..4547b01 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -260,6 +260,7 @@ check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
 check-qtest-i386-y += tests/postcopy-test$(EXESUF)
 check-qtest-i386-y += tests/test-x86-cpuid-compat$(EXESUF)
+check-qtest-i386-y += tests/numa-test$(EXESUF)
 check-qtest-x86_64-y += $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
@@ -300,6 +301,7 @@ check-qtest-ppc64-y += tests/test-netfilter$(EXESUF)
 check-qtest-ppc64-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-ppc64-y += tests/test-filter-redirector$(EXESUF)
 check-qtest-ppc64-y += tests/display-vga-test$(EXESUF)
+check-qtest-ppc64-y += tests/numa-test$(EXESUF)
 check-qtest-ppc64-$(CONFIG_EVENTFD) += tests/ivshmem-test$(EXESUF)
 
 check-qtest-sh4-y = tests/endianness-test$(EXESUF)
@@ -324,6 +326,8 @@ gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c
 check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
 gcov-files-arm-y += hw/timer/arm_mptimer.c
 
+check-qtest-aarch64-y = tests/numa-test$(EXESUF)
+
 check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
 
 check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
@@ -747,6 +751,7 @@ tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o 
contrib/libvhost-use
 tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
 tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
 tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
+tests/numa-test$(EXESUF): tests/numa-test.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< 
,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/numa-test.c b/tests/numa-test.c
new file mode 100644
index 000..f5da0c8
--- /dev/null
+++ b/tests/numa-test.c
@@ -0,0 +1,106 @@
+/*
+ * NUMA configuration test cases
+ *
+ * Copyright (c) 2017 Red Hat Inc.
+ * Authors:
+ *  Igor Mammedov 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+static char *make_cli(const char *generic_cli, const char *test_cli)
+{
+return g_strdup_printf("%s %s", generic_cli ? generic_cli : "", test_cli);
+}
+
+static char *hmp_info_numa(void)
+{
+QDict *resp;
+char *s;
+
+resp = qmp("{ 'execute': 'human-monitor-command', 'arguments': "
+  "{ 'command-line': 'info numa '} }");
+g_assert(resp);
+g_assert(qdict_haskey(resp, "return"));
+s = g_strdup(qdict_get_str(resp, "return"));
+g_assert(s);
+QDECREF(resp);
+return s;
+}
+
+static void test_mon_explicit(const void *data)
+{
+char *s;
+char *cli;
+
+cli = make_cli(data, "-smp 8 "
+   "-numa node,nodeid=0,cpus=0-3 "
+   "-numa node,nodeid=1,cpus=4-7 ");
+qtest_start(cli);
+
+s = hmp_info_numa();
+g_assert(strstr(s, "node 0 cpus: 0 1 2 3"));
+g_assert(strstr(s, "node 1 cpus: 4 5 6 7"));
+g_free(s);
+
+qtest_end();
+g_free(cli);
+}
+
+static void test_mon_default(const void *data)
+{
+char *s;
+char *cli;
+
+cli = make_cli(data, "-smp 8 -numa node -numa node");
+qtest_start(cli);
+
+s = hmp_info_numa();
+g_assert(strstr(s, "node 0 cpus: 0 2 4 6"));
+g_assert(strstr(s, "node 1 cpus: 1 3 5 7"));
+g_free(s);
+
+qtest_end();
+g_free(cli);
+}
+
+static void test_mon_partial(const void *data)
+{
+char *s;
+char *cli;
+
+cli = make_cli(data, "-smp 8 "
+   "-numa node,nodeid=0,cpus=0-1 "
+   "-numa node,nodeid=1,cpus=4-5 ");
+qtest_start(cli);
+
+s = hmp_info_numa();
+g_assert(strstr(s, "node 0 cpus: 0 1 2 3 6 7"));
+g_assert(strstr(s, "node 1 cpus: 4 5"));
+g_free(s);
+
+qtest_end();
+g_free(cli);
+}
+
+int main(int argc, char **argv)
+{
+const char *args = NULL;
+const char *arch = qtest_get_arch();
+
+if (strcmp(arch, "aarch64") == 0) {
+args = "-machine virt";
+}
+
+g_test_init(&argc, &argv, NULL);
+
+qtest_add_data_func("/numa/mon/default", args, test_mon_default);
+qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
+qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
+
+return g_test_run();
+}
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 08/23] virt-arm: add node-id property to CPU

2017-03-22 Thread Igor Mammedov
It will allow switching from cpu_index to property-based
NUMA mapping in follow-up patches.

Signed-off-by: Igor Mammedov 
---
 hw/arm/virt.c| 15 +++
 target/arm/cpu.c |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8748d25..68d44f3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1365,6 +1365,7 @@ static void machvirt_init(MachineState *machine)
 for (n = 0; n < machine->possible_cpus->len; n++) {
 Object *cpuobj;
 CPUState *cs;
+int node_id;
 
 if (n >= smp_cpus) {
 break;
@@ -1377,6 +1378,20 @@ static void machvirt_init(MachineState *machine)
 cs = CPU(cpuobj);
 cs->cpu_index = n;
 
+node_id = numa_get_node_for_cpu(cs->cpu_index);
+if (node_id == nb_numa_nodes) {
+/* by default CPUState::numa_node was 0 if it's not set via CLI
+ * keep it this way for now but in future we probably should
+ * refuse to start up with incomplete numa mapping */
+ node_id = 0;
+}
+if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
+cs->numa_node = node_id;
+} else {
+/* CPU isn't device_add compatible yet, this shouldn't happen */
+error_setg(&error_abort, "user set node-id not implemented");
+}
+
 if (!vms->secure) {
 object_property_set_bool(cpuobj, false, "has_el3", NULL);
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 04b062c..a635048 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1606,6 +1606,7 @@ static Property arm_cpu_properties[] = {
 DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
 DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
 mp_affinity, ARM64_AFFINITY_INVALID),
+DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
 DEFINE_PROP_END_OF_LIST()
 };
 
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function

2017-03-22 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 hw/arm/virt.c | 59 ++-
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5f62a03..484754e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1194,6 +1194,45 @@ void virt_machine_done(Notifier *notifier, void *data)
 virt_build_smbios(vms);
 }
 
+static uint64_t virt_idx2mp_affinity(VirtMachineState *vms, int idx)
+{
+uint64_t mp_affinity;
+uint8_t clustersz;
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+if (!vmc->disallow_affinity_adjustment) {
+uint8_t aff0, aff1;
+
+if (vms->gic_version == 3) {
+clustersz = GICV3_TARGETLIST_BITS;
+} else {
+clustersz = GIC_TARGETLIST_BITS;
+}
+
+/* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
+ * GIC's target-list limitations. 32-bit KVM hosts currently
+ * always create clusters of 4 CPUs, but that is expected to
+ * change when they gain support for gicv3. When KVM is enabled
+ * it will override the changes we make here, therefore our
+ * purposes are to make TCG consistent (with 64-bit KVM hosts)
+ * and to improve SGI efficiency.
+ */
+aff1 = idx / clustersz;
+aff0 = idx % clustersz;
+mp_affinity = (aff1 << ARM_AFF1_SHIFT) | aff0;
+} else {
+/* This cpu-id-to-MPIDR affinity is used only for TCG;
+ * KVM will override it. We don't support setting cluster ID
+ * ([16..23]) (known as Aff2 in later ARM ARM versions), or any of
+ * the higher affinity level fields, so these bits always RAZ.
+ */
+uint32_t Aff1 = idx / ARM_DEFAULT_CPUS_PER_CLUSTER;
+uint32_t Aff0 = idx % ARM_DEFAULT_CPUS_PER_CLUSTER;
+mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
+}
+return mp_affinity;
+}
+
 static void machvirt_init(MachineState *machine)
 {
 VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -1210,7 +1249,6 @@ static void machvirt_init(MachineState *machine)
 CPUClass *cc;
 Error *err = NULL;
 bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
-uint8_t clustersz;
 
 if (!cpu_model) {
 cpu_model = "cortex-a15";
@@ -1263,10 +1301,8 @@ static void machvirt_init(MachineState *machine)
  */
 if (vms->gic_version == 3) {
 virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x2;
-clustersz = GICV3_TARGETLIST_BITS;
 } else {
 virt_max_cpus = GIC_NCPU;
-clustersz = GIC_TARGETLIST_BITS;
 }
 
 if (max_cpus > virt_max_cpus) {
@@ -1326,20 +1362,9 @@ static void machvirt_init(MachineState *machine)
 
 for (n = 0; n < smp_cpus; n++) {
 Object *cpuobj = object_new(typename);
-if (!vmc->disallow_affinity_adjustment) {
-/* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
- * GIC's target-list limitations. 32-bit KVM hosts currently
- * always create clusters of 4 CPUs, but that is expected to
- * change when they gain support for gicv3. When KVM is enabled
- * it will override the changes we make here, therefore our
- * purposes are to make TCG consistent (with 64-bit KVM hosts)
- * and to improve SGI efficiency.
- */
-uint8_t aff1 = n / clustersz;
-uint8_t aff0 = n % clustersz;
-object_property_set_int(cpuobj, (aff1 << ARM_AFF1_SHIFT) | aff0,
-"mp-affinity", NULL);
-}
+
+object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
+"mp-affinity", NULL);
 
 if (!vms->secure) {
 object_property_set_bool(cpuobj, false, "has_el3", NULL);
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards

2017-03-22 Thread Igor Mammedov
Originally CPU threads were by default assigned to NUMA nodes in
round-robin fashion. However, that was causing issues in the
guest, since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node,
introducing the cpu_index_to_socket_id() callback, and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused the callback to fix similar issues for the SPAPR machine,
even though socket doesn't make much sense there.

As a result QEMU ended up having 3 default distribution rules
used by 3 targets (virt-arm, spapr, pc).

In an effort to move the NUMA mapping for CPUs into possible_cpus,
generalize the default mapping in numa.c by making boards decide
on the default mapping and letting them explicitly tell the generic
NUMA code which node a CPU thread belongs to, replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props(),
which provides the default node_id assigned by the board to the
specified cpu_index.
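
To make the direction concrete, here is a rough sketch (an assumption on my
part, modelled on the numa.c hunk of this patch, which is not fully shown in
this digest) of how the generic code can now ask the board for the default
node of each cpu_index instead of hard-coding a distribution rule:

#include "qemu/osdep.h"
#include "qemu/bitops.h"
#include "sysemu/numa.h"
#include "sysemu/sysemu.h"
#include "hw/boards.h"

/* sketch only: default cpu_index -> node assignment driven by the board */
static void example_default_cpu_to_node(MachineState *ms)
{
    MachineClass *mc = MACHINE_GET_CLASS(ms);
    int i;

    for (i = 0; i < max_cpus; i++) {
        /* the board decides which node cpu_index 'i' belongs to by default */
        CpuInstanceProperties props = mc->cpu_index_to_instance_props(ms, i);

        set_bit(i, numa_info[props.node_id].node_cpu);
    }
}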

Signed-off-by: Igor Mammedov 
---
This patch only moves the source of the default mapping to possible_cpus[]
and leaves the rest of NUMA handling to the numa_info[node_id].node_cpu
bitmaps. It's up to follow-up patches to replace the bitmaps
with possible_cpus[] internally.
---
 include/hw/boards.h   |  8 ++--
 include/sysemu/numa.h |  2 +-
 hw/arm/virt.c | 19 +--
 hw/i386/pc.c  | 22 --
 hw/ppc/spapr.c| 27 ---
 numa.c| 15 +--
 vl.c  |  2 +-
 7 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 269d0ba..1dd0fde 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -74,7 +74,10 @@ typedef struct {
  *of HotplugHandler object, which handles hotplug operation
  *for a given @dev. It may return NULL if @dev doesn't require
  *any actions to be performed by hotplug handler.
- * @cpu_index_to_socket_id:
+ * @cpu_index_to_instance_props:
+ *used to provide @cpu_index to socket/core/thread number mapping, allowing
+ *legacy code to perform mapping from cpu_index to topology properties
+ *Returns: tuple of socket/core/thread ids given cpu_index belongs to.
  *used to provide @cpu_index to socket number mapping, allowing
  *a machine to group CPU threads belonging to the same socket/package
  *Returns: socket number given cpu_index belongs to.
@@ -138,7 +141,8 @@ struct MachineClass {
 
 HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
DeviceState *dev);
-unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
+CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
+ unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
 };
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..46ea6c7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -24,7 +24,7 @@ typedef struct node_info {
 } NodeInfo;
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineClass *mc);
+void parse_numa_opts(MachineState *ms);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0cbcbc1..8748d25 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char 
*value, Error **errp)
 }
 }
 
+static CpuInstanceProperties
+virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+assert(cpu_index < possible_cpus->len);
+return possible_cpus->cpus[cpu_index].props;;
+}
+
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
 {
 int n;
@@ -1573,8 +1583,12 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
 ms->possible_cpus->cpus[n].props.has_thread_id = true;
 ms->possible_cpus->cpus[n].props.thread_id = n;
 
-/* TODO: add 'has_node/node' here to describe
-   to which node core belongs */
+/* default distribution of CPUs over NUMA nodes */
+if (nb_numa_nodes) {
+/* preset values but do not enable them i.e. 'has_node_id = false',
+ * board will enable them if manual mapping wasn't present on CLI 
*/
+ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
+}
 }
 return ms->possible_cpus;
 }
@@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
 mc->minimum_page_bits = 12;
 mc->possible_cpu_arch_ids = 

[Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()

2017-03-22 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 hw/acpi/cpu.c|  7 +++
 hw/i386/acpi-build.c | 11 ---
 hw/i386/pc.c | 18 ++
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 8c719d3..90fe24d 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -503,7 +503,6 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
 
 /* build Processor object for each processor */
 for (i = 0; i < arch_ids->len; i++) {
-int j;
 Aml *dev;
 Aml *uid = aml_int(i);
 GArray *madt_buf = g_array_new(0, 1, 1);
@@ -557,9 +556,9 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
  * as a result _PXM is required for all CPUs which might
  * be hot-plugged. For simplicity, add it for all CPUs.
  */
-j = numa_get_node_for_cpu(i);
-if (j < nb_numa_nodes) {
-aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
+if (arch_ids->cpus[i].props.has_node_id) {
+int node_id = arch_ids->cpus[i].props.node_id;
+aml_append(dev, aml_name_decl("_PXM", aml_int(node_id)));
 }
 
 aml_append(cpus_dev, dev);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2073108..a2be70b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2306,7 +2306,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 srat->reserved1 = cpu_to_le32(1);
 
 for (i = 0; i < apic_ids->len; i++) {
-int j = numa_get_node_for_cpu(i);
+int node_id = apic_ids->cpus[i].props.has_node_id ?
+apic_ids->cpus[i].props.node_id : 0;
 uint32_t apic_id = apic_ids->cpus[i].arch_id;
 
 if (apic_id < 255) {
@@ -2316,9 +2317,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 core->type = ACPI_SRAT_PROCESSOR_APIC;
 core->length = sizeof(*core);
 core->local_apic_id = apic_id;
-if (j < nb_numa_nodes) {
-core->proximity_lo = j;
-}
+core->proximity_lo = node_id;
 memset(core->proximity_hi, 0, 3);
 core->local_sapic_eid = 0;
 core->flags = cpu_to_le32(1);
@@ -2329,9 +2328,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 core->type = ACPI_SRAT_PROCESSOR_x2APIC;
 core->length = sizeof(*core);
 core->x2apic_id = cpu_to_le32(apic_id);
-if (j < nb_numa_nodes) {
-core->proximity_domain = cpu_to_le32(j);
-}
+core->proximity_domain = cpu_to_le32(node_id);
 core->flags = cpu_to_le32(1);
 }
 }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 873bbfa..6fdec59 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -747,7 +747,9 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, 
PCMachineState *pcms)
 {
 FWCfgState *fw_cfg;
 uint64_t *numa_fw_cfg;
-int i, j;
+int i;
+const CPUArchIdList *cpus;
+MachineClass *mc = MACHINE_GET_CLASS(pcms);
 
 fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
 fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
@@ -782,12 +784,12 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, 
PCMachineState *pcms)
  */
 numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
-for (i = 0; i < max_cpus; i++) {
-unsigned int apic_id = x86_cpu_apic_id_from_index(i);
+cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
+for (i = 0; i < cpus->len; i++) {
+unsigned int apic_id = cpus->cpus[i].arch_id;
 assert(apic_id < pcms->apic_id_limit);
-j = numa_get_node_for_cpu(i);
-if (j < nb_numa_nodes) {
-numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
+if (cpus->cpus[i].props.has_node_id) {
+numa_fw_cfg[apic_id + 1] = 
cpu_to_le64(cpus->cpus[i].props.node_id);
 }
 }
 for (i = 0; i < nb_numa_nodes; i++) {
@@ -1986,8 +1988,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 cs = CPU(cpu);
 cs->cpu_index = idx;
 
-node_id = numa_get_node_for_cpu(cs->cpu_index);
-if (node_id == nb_numa_nodes) {
+node_id = cpu_slot->props.node_id;
+if (!cpu_slot->props.has_node_id) {
 /* by default CPUState::numa_node was 0 if it's not set via CLI
  * keep it this way for now but in future we probably should
  * refuse to start up with incomplete numa mapping */
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core

2017-03-22 Thread Igor Mammedov
It will allow switching from cpu_index to core-based NUMA
mapping in follow-up patches.

Signed-off-by: Igor Mammedov 
---
 include/hw/ppc/spapr_cpu_core.h |  1 +
 include/qom/cpu.h   |  2 ++
 hw/ppc/spapr.c  | 17 +
 hw/ppc/spapr_cpu_core.c | 11 ---
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 3c35665..93051e9 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -27,6 +27,7 @@ typedef struct sPAPRCPUCore {
 
 /*< public >*/
 void *threads;
+int node_id;
 } sPAPRCPUCore;
 
 typedef struct sPAPRCPUCoreClass {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index c3292ef..7f27d56 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -258,6 +258,8 @@ typedef void (*run_on_cpu_func)(CPUState *cpu, 
run_on_cpu_data data);
 
 struct qemu_work_item;
 
+#define CPU_UNSET_NUMA_NODE_ID -1
+
 /**
  * CPUState:
  * @cpu_index: CPU index (informative).
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9dcbbcc..9c61721 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2770,9 +2770,11 @@ static void spapr_core_pre_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
 Error *local_err = NULL;
 CPUCore *cc = CPU_CORE(dev);
+sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
 char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
 const char *type = object_get_typename(OBJECT(dev));
 CPUArchId *core_slot;
+int node_id;
 int index;
 
 if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
@@ -2801,6 +2803,21 @@ static void spapr_core_pre_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 goto out;
 }
 
+node_id = numa_get_node_for_cpu(cc->core_id);
+if (node_id == nb_numa_nodes) {
+/* by default CPUState::numa_node was 0 if it's not set via CLI
+ * keep it this way for now but in future we probably should
+ * refuse to start up with incomplete numa mapping */
+node_id = 0;
+}
+if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
+sc->node_id = node_id;
+} else if (sc->node_id != node_id) {
+error_setg(&local_err, "node-id %d must match numa node specified "
+"with -numa option for cpu-index %d", sc->node_id, cc->core_id);
+goto out;
+}
+
 out:
 g_free(base_core_type);
 error_propagate(errp, local_err);
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 6883f09..25988f8 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -163,7 +163,6 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error 
**errp)
 const char *typename = object_class_get_name(scc->cpu_class);
 size_t size = object_type_get_instance_size(typename);
 Error *local_err = NULL;
-int core_node_id = numa_get_node_for_cpu(cc->core_id);;
 void *obj;
 int i, j;
 
@@ -181,10 +180,10 @@ static void spapr_cpu_core_realize(DeviceState *dev, 
Error **errp)
 
 /* Set NUMA node for the added CPUs  */
 node_id = numa_get_node_for_cpu(cs->cpu_index);
-if (node_id != core_node_id) {
+if (node_id != sc->node_id) {
 error_setg(&local_err, "Invalid node-id=%d of thread[cpu-index: %d]"
 " on CPU[core-id: %d, node-id: %d], node-id must be the same",
- node_id, cs->cpu_index, cc->core_id, core_node_id);
+ node_id, cs->cpu_index, cc->core_id, sc->node_id);
 goto err;
 }
 if (node_id < nb_numa_nodes) {
@@ -250,6 +249,11 @@ static const char *spapr_core_models[] = {
 "POWER9_v1.0",
 };
 
+static Property spapr_cpu_core_properties[] = {
+DEFINE_PROP_INT32("node-id", sPAPRCPUCore, node_id, 
CPU_UNSET_NUMA_NODE_ID),
+DEFINE_PROP_END_OF_LIST()
+};
+
 void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -257,6 +261,7 @@ void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
 
 dc->realize = spapr_cpu_core_realize;
 dc->unrealize = spapr_cpu_core_unrealizefn;
+dc->props = spapr_cpu_core_properties;
 scc->cpu_class = cpu_class_by_name(TYPE_POWERPC_CPU, data);
 g_assert(scc->cpu_class);
 }
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus

2017-03-22 Thread Igor Mammedov
Currently cpu_index is implicitly auto-assigned at
cpu.realize() time, in cpu_exec_realizefn()->cpu_list_add().

It happens to match the index in possible_cpus, so take
control over it and make the board initialize cpu_index
to the possible_cpus index explicitly. That will at least
document that the board is in control of it, and when
'-device cpu' support comes it will keep cpu_index
stable regardless of the order CPUs are created in, so it won't
break migration.
Within this series it will be used for the internal
conversion from storing cpu_index-based NUMA node
bitmaps to property-based mapping with possible_cpus,
and will allow mapping cpu_index to a CPU entry in the
possible_cpus array.

Signed-off-by: Igor Mammedov 
---
 hw/arm/virt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4de46b1..0cbcbc1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1364,6 +1364,7 @@ static void machvirt_init(MachineState *machine)
 mc->possible_cpu_arch_ids(machine);
 for (n = 0; n < machine->possible_cpus->len; n++) {
 Object *cpuobj;
+CPUState *cs;
 
 if (n >= smp_cpus) {
 break;
@@ -1373,6 +1374,9 @@ static void machvirt_init(MachineState *machine)
 object_property_set_int(cpuobj, 
machine->possible_cpus->cpus[n].arch_id,
 "mp-affinity", NULL);
 
+cs = CPU(cpuobj);
+cs->cpu_index = n;
+
 if (!vms->secure) {
 object_property_set_bool(cpuobj, false, "has_el3", NULL);
 }
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info

2017-03-22 Thread Igor Mammedov
For now, precalculate and store mp_affinity in possible_cpus,
as ARM CPUs don't have socket/core/thread-id properties yet.
In follow-up patches possible_cpus will be used for storing
and setting the NUMA node mapping, replacing the legacy bitmap-based
numa_info[node_id].node_cpu/numa_get_node_for_cpu().

For lack of a better idea, this patch cannibalizes
possible_cpus.cpus[x].props.thread_id so that the
*_cpu_index_to_props() callback can return a CPU addressable
by props, which will be used by machine_set_cpu_numa_node()
in follow-up patches to assign a CPU to a node. But
cannibalizing is fine for now, as that thread_id isn't exposed
to users (no hotpluggable_cpus callback support for ARM yet)
and it will be used only internally until 'device_add cpu'
is supported, where we can decide on which properties to use.

Signed-off-by: Igor Mammedov 
---
 hw/arm/virt.c | 39 ---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 484754e..4de46b1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1237,6 +1237,7 @@ static void machvirt_init(MachineState *machine)
 {
 VirtMachineState *vms = VIRT_MACHINE(machine);
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
+MachineClass *mc = MACHINE_GET_CLASS(machine);
 qemu_irq pic[NUM_IRQS];
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *secure_sysmem = NULL;
@@ -1360,10 +1361,16 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
-for (n = 0; n < smp_cpus; n++) {
-Object *cpuobj = object_new(typename);
+mc->possible_cpu_arch_ids(machine);
+for (n = 0; n < machine->possible_cpus->len; n++) {
+Object *cpuobj;
 
-object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
+if (n >= smp_cpus) {
+break;
+}
+
+cpuobj = object_new(typename);
+object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
 "mp-affinity", NULL);
 
 if (!vms->secure) {
@@ -1543,6 +1550,31 @@ static void virt_set_gic_version(Object *obj, const char 
*value, Error **errp)
 }
 }
 
+static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
+{
+int n;
+VirtMachineState *vms = VIRT_MACHINE(ms);
+
+if (ms->possible_cpus) {
+assert(ms->possible_cpus->len == max_cpus);
+return ms->possible_cpus;
+}
+
+ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+  sizeof(CPUArchId) * max_cpus);
+ms->possible_cpus->len = max_cpus;
+for (n = 0; n < ms->possible_cpus->len; n++) {
+ms->possible_cpus->cpus[n].arch_id =
+virt_idx2mp_affinity(vms, n);
+ms->possible_cpus->cpus[n].props.has_thread_id = true;
+ms->possible_cpus->cpus[n].props.thread_id = n;
+
+/* TODO: add 'has_node/node' here to describe
+   to which node core belongs */
+}
+return ms->possible_cpus;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -1559,6 +1591,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->pci_allow_0_address = true;
 /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
 mc->minimum_page_bits = 12;
+mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
 }
 
 static const TypeInfo virt_machine_info = {
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option

2017-03-22 Thread Igor Mammedov
Changes since RFC:
* convert all targets that support numa (Eduardo)
* add numa CLI tests
* support wildcard matching with "-numa cpu,..." (Paolo)

The series introduces a new CLI option to allow mapping CPUs to NUMA
nodes using the public [socket|core|thread]-id properties instead of
the internal cpu_index, and moves the internal handling of the cpu<->node
mapping from cpu_index-based global bitmaps to MachineState.
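
For illustration, a usage sketch in the style of tests/numa-test.c from
patch 1 (the test function below is made up, and the exact '-numa cpu'
property names follow the later patches in this series, so treat the syntax
as tentative):

/* sketch: map whole sockets to nodes via wildcard matching on socket-id,
 * reusing the make_cli()/hmp_info_numa() helpers from tests/numa-test.c */
static void test_mon_numa_cpu_option(const void *data)
{
    char *s;
    char *cli;

    cli = make_cli(data, "-smp 8,sockets=2,cores=2,threads=2 "
                   "-numa node,nodeid=0 -numa node,nodeid=1 "
                   "-numa cpu,node-id=0,socket-id=0 "
                   "-numa cpu,node-id=1,socket-id=1 ");
    qtest_start(cli);

    s = hmp_info_numa();
    /* with this topology socket 0 holds cpu_index 0-3, socket 1 holds 4-7 */
    g_assert(strstr(s, "node 0 cpus: 0 1 2 3"));
    g_assert(strstr(s, "node 1 cpus: 4 5 6 7"));
    g_free(s);

    qtest_end();
    g_free(cli);
}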

The new '-numa cpu' option is supported only on PC and SPAPR
machines, which implement the hotpluggable-cpus query.
The ARM machine's user-facing interface stays cpu_index based due to
the lack of hotpluggable-cpus support, but internally the cpu<->node
mapping will use the approach common to PC/SPAPR/ARM
(i.e. store the mapping info in MachineState::possible_cpus).

It only provides a CLI interface to do the mapping; there is no QMP
one, as I haven't found a suitable place/way to update/set the mapping
after machine_done for QEMU started with -S (stopped mode), so that
management could query hotpluggable-cpus first and then map them to NUMA
nodes at runtime before actually allowing the guest to run.

Another alternative I've been considering is to add a CLI option
similar to -S, but one that would pause initialization before the machine_init()
callback is run, so that the user could get the CPU layout with hotpluggable-cpus,
then map CPUs to NUMA nodes and unpause to let machine_init() initialize the
machine using the previously predefined NUMA mapping.
Such an option might also be useful for other use cases.


git repo for testing:
   https://github.com/imammedo/qemu.git cphp_numa_cfg_v1
reference to RFC:
   https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03693.html

CC: Eduardo Habkost 
CC: Peter Maydell 
CC: Andrew Jones 
CC: David Gibson 
CC: Eric Blake 
CC: Paolo Bonzini 
CC: Shannon Zhao 
CC: qemu-...@nongnu.org
CC: qemu-...@nongnu.org

Igor Mammedov (23):
  tests: add CPUs to numa node mapping test
  hw/arm/virt: extract mp-affinity calculation in separate function
  hw/arm/virt: use machine->possible_cpus for storing possible topology
info
  hw/arm/virt: explicitly allocate cpu_index for cpus
  numa: move source of default CPUs to NUMA node mapping into boards
  spapr: add node-id property to sPAPR core
  pc: add node-id property to CPU
  virt-arm: add node-id property to CPU
  numa: add check that board supports cpu_index to node mapping
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: do default mapping based on possible_cpus instead of node_cpu
bitmaps
  pc: get numa node mapping from possible_cpus instead of
numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of
numa_get_node_for_cpu()
  virt-arm: get numa node mapping from possible_cpus instead of
numa_get_node_for_cpu()
  QMP: include CpuInstanceProperties into query_cpus output
  tests: numa: add case for QMP command query-cpus
  numa: remove no longer used numa_get_node_for_cpu()
  numa: remove no longer need numa_post_machine_init()
  machine: call machine init from wrapper
  numa: use possible_cpus for not mapped CPUs check
  numa: remove node_cpu bitmaps as they are no longer used
  numa: add '-numa cpu,...' option for property based node mapping
  tests: check -numa node,cpu=props_list usecase

 include/hw/boards.h |  11 +-
 include/hw/ppc/spapr_cpu_core.h |   1 +
 include/qom/cpu.h   |   2 +
 include/sysemu/numa.h   |   8 +-
 cpus.c  |   9 ++
 hw/acpi/cpu.c   |   7 +-
 hw/arm/virt-acpi-build.c|  19 +--
 hw/arm/virt.c   | 137 +++---
 hw/core/machine.c   | 132 ++
 hw/i386/acpi-build.c|  11 +-
 hw/i386/pc.c|  53 +--
 hw/ppc/spapr.c  |  44 +-
 hw/ppc/spapr_cpu_core.c |  21 ++-
 numa.c  | 145 +++
 qapi-schema.json|  13 +-
 qemu-options.hx |  23 ++-
 target/arm/cpu.c|   1 +
 target/i386/cpu.c   |   1 +
 tests/Makefile.include  |   5 +
 tests/numa-test.c   | 301 
 vl.c|   6 +-
 21 files changed, 761 insertions(+), 189 deletions(-)
 create mode 100644 tests/numa-test.c

--
2.7.4




Re: [Qemu-devel] [PATCH qemu-ga] qga: Make qemu-ga compile statically for Windows

2017-03-22 Thread Marc-André Lureau
Hi

On Wed, Mar 22, 2017 at 5:11 PM Sameeh Jubran  wrote:

> Attempting to compile qemu-ga statically as follows for Windows causes
> the following error:
>
> Compilation:
> ./configure --disable-docs --target-list=x86_64-softmmu \
> --cross-prefix=x86_64-w64-mingw32- --static \
> --enable-guest-agent-msi --with-vss-sdk=/path/to/VSSSDK72
>
> make -j8 qemu-ga
>
> Error:
> path/to/qemu/stubs/error-printf.c:7: undefined reference to
> `__imp_g_test_config_vars'
> collect2: error: ld returned 1 exit status
> Makefile:444: recipe for target 'qemu-ga.exe' failed
> make: *** [qemu-ga.exe] Error 1
>

weird, I don't get this error on fedora 25 (but I have no vss-sdk, is that
related?).


>
> This is caused by a bug in the pkg-config file for glib as it doesn't
> define
> GLIB_STATIC_COMPILATION for pkg-config --static.
>

If that's a bug in glib, it would be nice to have a link to a bug report.


>
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Sameeh Jubran 
>
---
>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index b9a30cf..6f7b460 100755
> --- a/configure
> +++ b/configure
> @@ -4138,7 +4138,7 @@ int main(void) { return VSS_CTX_BACKUP; }
>  EOF
>if compile_prog "$vss_win32_include" "" ; then
>  guest_agent_with_vss="yes"
> -QEMU_CFLAGS="$QEMU_CFLAGS $vss_win32_include"
> +QEMU_CFLAGS="-DGLIB_STATIC_COMPILATION $QEMU_CFLAGS
> $vss_win32_include"
>  libs_qga="-lole32 -loleaut32 -lshlwapi -lstdc++
> -Wl,--enable-stdcall-fixup $libs_qga"
>  qga_vss_provider="qga/vss-win32/qga-vss.dll qga/vss-win32/qga-vss.tlb"
>else
> --
> 2.9.3
>
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH v2 3/3] vfio-pci: process non fatal error of AER

2017-03-22 Thread Michael S. Tsirkin
On Wed, Mar 22, 2017 at 06:36:52PM +0800, Cao jin wrote:
> Make use of the non fatal error eventfd that the kernel module provides
> to process the AER non fatal error. Fatal error still goes into the
> legacy way which results in VM stop.
> 
> Register the handler, wait for notification. Construct aer message and
> pass it to root port on notification. Root port will trigger an interrupt
> to signal guest, then guest driver will do the recovery.
> 
> Signed-off-by: Dou Liyang 
> Signed-off-by: Cao jin 
> ---
>  hw/vfio/pci.c  | 247 
> +
>  hw/vfio/pci.h  |   4 +
>  linux-headers/linux/vfio.h |   2 +
>  3 files changed, 253 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3d0d005..4912bc6 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2422,6 +2422,34 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
> Error **errp)
>   "Could not enable error recovery for the device",
>   vbasedev->name);
>  }
> +
> +irq_info.index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
> +irq_info.count = 0; /* clear */
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +if (ret) {
> +/* This can fail for an old kernel or legacy PCI dev */
> +trace_vfio_populate_device_get_irq_info_failure();
> +} else if (irq_info.count == 1) {
> +vdev->pci_aer_non_fatal = true;
> +} else {
> +error_report(WARN_PREFIX
> + "Couldn't enable non fatal error recovery for the 
> device",
> + vbasedev->name);

when does this trigger?

> +}
> +
> +irq_info.index = VFIO_PCI_PASSIVE_RESET_IRQ_INDEX;
> +irq_info.count = 0; /* clear */
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +if (ret) {
> +/* This can fail for an old kernel or legacy PCI dev */
> +trace_vfio_populate_device_get_irq_info_failure();
> +} else if (irq_info.count == 1) {
> +vdev->passive_reset = true;
> +} else {
> +error_report(WARN_PREFIX
> + "Don't support passive reset notification",
> + vbasedev->name);

when does this happen?
what does this message mean?

> +}
>  }
>  
>  static void vfio_put_device(VFIOPCIDevice *vdev)
> @@ -2432,6 +2460,221 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>  vfio_put_base_device(&vdev->vbasedev);
>  }
>  
> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +PCIDevice *dev = >pdev;
> +PCIEAERMsg msg = {
> +.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
> +.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> +};
> +

Should this just use pci_requester_id?


At least Peter thought so.

> +if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
> +return;
> +}
> +
> +/* Populate the aer msg and send it to root port */
> +if (dev->exp.aer_cap) {
> +uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> +uint32_t uncor_status;
> +bool isfatal;
> +
> +uncor_status = vfio_pci_read_config(dev,
> +dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
> +if (!uncor_status) {
> +return;
> +}
> +
> +isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +if (isfatal) {
> +goto stop;
> +}
> +
> +error_report("%s sending non fatal event to root port. uncor status 
> = "
> + "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
> +pcie_aer_msg(dev, &msg);
> +return;
> +}
> +
> +stop:
> +/* Terminate the guest in case of fatal error */
> +error_report("%s(%s) fatal error detected. Please collect any data"
> +" possible and then kill the guest", __func__, 
> vdev->vbasedev.name);

"Device detected a fatal error. VM stopped".

would be better IMO.


> +vm_stop(RUN_STATE_INTERNAL_ERROR);
> +}
> +
> +/*
> + * Register non fatal error notifier for devices supporting error recovery.
> + * If we encounter a failure in this function, we report an error
> + * and continue after disabling error recovery support for the device.
> + */
> +static void vfio_register_non_fatal_err_notifier(VFIOPCIDevice *vdev)
> +{
> +int ret;
> +int argsz;
> +struct vfio_irq_set *irq_set;
> +int32_t *pfd;
> +
> +if (!vdev->pci_aer_non_fatal) {
> +return;
> +}
> +
> +if (event_notifier_init(&vdev->non_fatal_err_notifier, 0)) {
> +error_report("vfio: Unable to init event notifier for non-fatal 
> error detection");
> +vdev->pci_aer_non_fatal = false;
> +return;
> +}
> +
> +argsz = sizeof(*irq_set) + sizeof(*pfd);
> +
> +irq_set = g_malloc0(argsz);
> +irq_set->argsz = argsz;
> +irq_set->flags 

Re: [Qemu-devel] Proposal for deprecating unsupported host OSes & architecutures

2017-03-22 Thread Thomas Huth
On 22.03.2017 14:09, Peter Maydell wrote:
> On 22 March 2017 at 12:51, Alex Bennée  wrote:
>> Peter Maydell  writes:
>>> ...unfortunately the gcc compile farm mips board (1) is very slow
>>> and (2) has very little disk space free in /tmp, which means that
>>> it can't pass "make check" because for instance tests/test-replication
>>> assumes it can write comparatively large test files to /tmp/...
>>
>> That makes it sound like a mips cross build or mips linux-user powered
>> image would be useful then?
> 
> Cross build can't actually run 'make check' and I wouldn't
> trust linux-user to run our test suite. Also, if there's
> no hardware that we can sensibly do make/make check
> on then how much can people really care about QEMU on MIPS?
> (In fact since MIPS supports KVM these days, there really
> ought to be sufficiently capable hardware to work as
> a build system.)

Maybe one of our MIPS maintainers has a clue whether there is a public
MIPS build machine available somewhere? (I've put them on CC:)

 Thomas




Re: [Qemu-devel] [PATCH qemu-ga] qga: Make qemu-ga compile statically for Windows

2017-03-22 Thread Peter Maydell
On 22 March 2017 at 13:09, Sameeh Jubran  wrote:
> Attempting to compile qemu-ga statically as follows for Windows causes
> the following error:
>
> Compilation:
> ./configure --disable-docs --target-list=x86_64-softmmu \
> --cross-prefix=x86_64-w64-mingw32- --static \
> --enable-guest-agent-msi --with-vss-sdk=/path/to/VSSSDK72
>
> make -j8 qemu-ga
>
> Error:
> path/to/qemu/stubs/error-printf.c:7: undefined reference to 
> `__imp_g_test_config_vars'
> collect2: error: ld returned 1 exit status
> Makefile:444: recipe for target 'qemu-ga.exe' failed
> make: *** [qemu-ga.exe] Error 1
>
> This is caused by a bug in the pkg-config file for glib as it doesn't define
> GLIB_STATIC_COMPILATION for pkg-config --static.
>
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Sameeh Jubran 
> @@ -4138,7 +4138,7 @@ int main(void) { return VSS_CTX_BACKUP; }
>  EOF
>if compile_prog "$vss_win32_include" "" ; then
>  guest_agent_with_vss="yes"
> -QEMU_CFLAGS="$QEMU_CFLAGS $vss_win32_include"
> +QEMU_CFLAGS="-DGLIB_STATIC_COMPILATION $QEMU_CFLAGS $vss_win32_include"
>  libs_qga="-lole32 -loleaut32 -lshlwapi -lstdc++ 
> -Wl,--enable-stdcall-fixup $libs_qga"
>  qga_vss_provider="qga/vss-win32/qga-vss.dll qga/vss-win32/qga-vss.tlb"
>else

If we need this for static glib compilation we should
be adding it where we do the "test for glib and
add glib related flags to the CFLAGS", not in the
part of configure where we decide that we're
compiling the guest agent.

Also, please can we have a comment that clearly states
why we need this and that we're working around a
packaging bug in glib for Windows, please?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 3/3] qapi: Fix QemuOpts visitor regression on unvisited input

2017-03-22 Thread Eric Blake
On 03/22/2017 01:47 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> An off-by-one in commit 15c2f669e meant that we were failing to
>> check for unparsed input in all QemuOpts visitors.  Recent testsuite
>> additions show that fixing the obvious bug with bogus fields will
>> also fix the case of an incomplete list visit; update the tests to
>> match the new behavior.
>>

>> @@ -276,8 +276,8 @@ static void
>>  opts_check_list(Visitor *v, Error **errp)
>>  {
>>  /*
>> - * FIXME should set error when unvisited elements remain.  Mostly
>> - * harmless, as the generated visits always visit all elements.
>> + * Unvisited list elements will be reported later when checking if
>> + * unvisited struct members remain.
> 
> Non-native speaker question: if or whether?
> 

Both work to my ear, but whether sounds a bit more formal. I can switch,
since...


>> -visit_check_list(v, _abort); /* BUG: unvisited tail not reported 
>> */
>> +visit_check_list(v, _abort); /* unvisited tail ignored until... */
>>  visit_end_list(v, (void **));
>>
>> -visit_check_struct(v, _abort);
>> +visit_check_struct(v, ); /* ...here */
>> +error_free_or_abort();
>>  visit_end_struct(v, NULL);
>>
>>  qapi_free_intList(list);
> 
> How come unvisited tails are diagnosed late?

Because opts_check_list() is still a no-op, and I didn't want to muck
with how to make it work to catch things earlier.  The late catch is by
virtue of the fact that we track complete coverage by whether the clone
of the QemuOpts still has the key, and the key is not removed until the
list is fully parsed.

> 
>> @@ -239,7 +241,7 @@ test_opts_range_beyond(void)
>>  error_free_or_abort();
>>  visit_end_list(v, (void **));
>>
>> -visit_check_struct(v, _abort);
>> +visit_check_struct(v, );
> 
> This looks wrong.  Either you expect an error or not.  If you do,
> error_free_or_abort() seems missing.  If you don't, the hunk needs to be
> dropped.
> 

...you are correct that this is a spurious hunk, and removing it does
not change the testsuite. v3 coming up.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH 58/81] ahci: advertise HOST_CAP_64

2017-03-22 Thread John Snow


On 03/20/2017 07:08 PM, Michael Roth wrote:
> From: Ladi Prosek 
> 
> The AHCI emulation code supports 64-bit addressing and should advertise this
> fact in the Host Capabilities register. Both Linux and Windows drivers test
> this bit to decide if the upper 32 bits of various registers may be written
> to, and at least some versions of Windows have a bug where DMA is attempted
> with an address above 4GB but, in the absence of HOST_CAP_64, the upper 32
> bits are left uninitialized, which leads to memory corruption.
> 
> [Maintainer edit:
> 
> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1411105,
> which affects Windows Server 2008 SP2 in some cases.]
> 
> Signed-off-by: Ladi Prosek 
> Message-id: 1484305370-6220-1-git-send-email-lpro...@redhat.com
> [Amended commit message --js]
> Signed-off-by: John Snow 
> 
> (cherry picked from commit 98cb5dccb192b0082626080890dac413473573c6)
> Signed-off-by: Michael Roth 
> ---
>  hw/ide/ahci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 3c19bda..6a17acf 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -488,7 +488,7 @@ static void ahci_reg_init(AHCIState *s)
>  s->control_regs.cap = (s->ports - 1) |
>(AHCI_NUM_COMMAND_SLOTS << 8) |
>(AHCI_SUPPORTED_SPEED_GEN1 << 
> AHCI_SUPPORTED_SPEED) |
> -  HOST_CAP_NCQ | HOST_CAP_AHCI;
> +  HOST_CAP_NCQ | HOST_CAP_AHCI | HOST_CAP_64;
>  
>  s->control_regs.impl = (1 << s->ports) - 1;
>  
> 

A reminder that if this is backported to 2.8.1, you will need to
include the relevant SeaBIOS fixes as well. Otherwise, rebooting under
that firmware breaks!

--js
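(For context, this is roughly what a guest driver does with the bit. The
sketch below is not taken from any real driver; the register offsets follow
the AHCI spec, where CAP sits at offset 0x00 of the HBA MMIO space with S64A
in bit 31, and the mapped hba/port pointers are assumed to come from the
surrounding driver code.)

    #include <stdint.h>

    #define AHCI_CAP        0x00         /* HBA capabilities register */
    #define AHCI_CAP_S64A   (1u << 31)   /* supports 64-bit addressing */
    #define PORT_CLB        0x00         /* command list base, low 32 bits */
    #define PORT_CLBU       0x04         /* command list base, high 32 bits */

    static void program_cmd_list_base(volatile uint32_t *hba,
                                      volatile uint32_t *port,
                                      uint64_t cmd_list_pa)
    {
        port[PORT_CLB / 4] = (uint32_t)cmd_list_pa;

        if (hba[AHCI_CAP / 4] & AHCI_CAP_S64A) {
            /* only touch the upper half when the HBA advertises 64-bit DMA */
            port[PORT_CLBU / 4] = (uint32_t)(cmd_list_pa >> 32);
        }
        /* without S64A the driver must allocate below 4 GiB instead; the
         * Windows bug described above used an address above 4 GiB while
         * leaving this upper half uninitialized */
    }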



Re: [Qemu-devel] [PATCH v5] vfio error recovery: kernel support

2017-03-22 Thread Michael S. Tsirkin
Minor comments on commit log below.

On Wed, Mar 22, 2017 at 06:34:23PM +0800, Cao jin wrote:
> From: "Michael S. Tsirkin" 
> 
> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset. Non-fatal errors don't.
>    All errors stop the QEMU guest eventually, but not immediately,
>    because they are detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to the user at all.
> 
>Note:
>    PPC EEH is different, but this approach won't affect EEH, because
>    EEH treats all errors as fatal ones in AER and will still signal the
>    user via the legacy eventfd. And all devices/functions in a PE belong
>    to the same IOMMU group, so the slot_reset handler in this approach
>    won't affect EEH either.
> 
> 1. Correctable errors
>    Hardware can correct these errors without software intervention;
>    clearing the error status is enough, and this is what is already done
>    now. There is no need to recover them; nothing changes here, leave it
>    as it is.
> 
> 2. Fatal errors
>    They will induce a link reset. This is troublesome when the user is
>    a QEMU guest. This approach doesn't touch the existing mechanism.
> 
> 3. Non-fatal errors
>Before

... this patch

>, they are signalled to user the same

... way

> as fatal ones. In this approach,

-> With this patch

>a new eventfd is introduced only for non-fatal error notification.

> By
>splitting non-fatal ones out, it will benefit AER recovery of a QEMU guest
>user by reporting them to guest saparately.

This last sentence does not add any value, pls drop it.

>To maintain backwards compatibility with userspace, non-fatal errors
>will continue to trigger via the existing error interrupt index if a
>non-fatal signaling mechanism has not been registered.
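(As an illustration of how userspace would opt in, a registration could look
roughly like the sketch below. This is not part of the patch:
VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX is the index this series introduces and is
not in released headers yet; the rest is the existing VFIO_DEVICE_SET_IRQS
interface.)

    #include <linux/vfio.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>

    /* register an eventfd for non-fatal AER notifications on an already
     * opened VFIO device fd; returns the eventfd, or -1 on failure */
    static int register_non_fatal_eventfd(int device_fd)
    {
        char buf[sizeof(struct vfio_irq_set) + sizeof(int32_t)];
        struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;
        int efd = eventfd(0, 0);

        if (efd < 0) {
            return -1;
        }

        memset(buf, 0, sizeof(buf));
        irq_set->argsz = sizeof(buf);
        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX; /* new in this series */
        irq_set->start = 0;
        irq_set->count = 1;
        memcpy(irq_set->data, &efd, sizeof(int32_t));

        if (ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set) < 0) {
            return -1;
        }
        return efd;  /* poll()/read() this fd to learn about non-fatal errors */
    }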
> 
>Note:


Below is imho confusing. Pls copy comment text from below.

>    In case of a multi-function device which has a different device driver
>    for each function, and one of the functions is bound to vfio while the
>    others are not (i.e., the functions belong to different IOMMU groups),
>    a new slot_reset handler & another new eventfd are introduced. This is
>    useful when a device driver wants a slot reset while vfio-pci doesn't,
>    which means the vfio-pci device will get a passive reset. The user is
>    signalled via another new eventfd named passive_reset_trigger, which
>    helps to avoid signalling the user twice via the same legacy error
>    trigger.
>    For the original design and discussion, refer to:
>    https://www.spinics.net/lists/linux-virtualization/msg29843.html
> 


I don't think we need to keep this history in commit log.

> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Cao jin 
> ---
> 
> v5 changelog:
> 1. Add another new eventfd passive_reset_trigger & the boilerplate code,
>used in slot_reset. Add comment for slot_reset().
> 2. Rewrite the commit log.
> 
>  drivers/vfio/pci/vfio_pci.c | 49 
> +++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 38 
>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>  include/uapi/linux/vfio.h   |  2 ++
>  4 files changed, 89 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..375ba20 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
> *vdev, int irq_type)
>  
>   return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>   }
> - } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX ||
> +irq_type == VFIO_PCI_PASSIVE_RESET_IRQ_INDEX) {
>   if (pci_is_pcie(vdev->pdev))
>   return 1;
>   } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +798,8 @@ static long vfio_pci_ioctl(void *device_data,
>   case VFIO_PCI_REQ_IRQ_INDEX:
>   break;
>   case VFIO_PCI_ERR_IRQ_INDEX:
> + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> + case VFIO_PCI_PASSIVE_RESET_IRQ_INDEX:
>   if (pci_is_pcie(vdev->pdev))
>   break;
>   /* pass thru to return error */
> @@ -1282,7 +1286,9 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>   mutex_lock(&vdev->igate);
>  
> - if (vdev->err_trigger)
> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> + eventfd_signal(vdev->non_fatal_err_trigger, 1);
> + else if (vdev->err_trigger)
>   eventfd_signal(vdev->err_trigger, 1);
>  
>   mutex_unlock(&vdev->igate);
> @@ -1292,8 +1298,47 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   

[Qemu-devel] [PATCH qemu-ga] qga: Make qemu-ga compile statically for Windows

2017-03-22 Thread Sameeh Jubran
Attempting to compile qemu-ga statically as follows for Windows causes
the following error:

Compilation:
./configure --disable-docs --target-list=x86_64-softmmu \
--cross-prefix=x86_64-w64-mingw32- --static \
--enable-guest-agent-msi --with-vss-sdk=/path/to/VSSSDK72

make -j8 qemu-ga

Error:
path/to/qemu/stubs/error-printf.c:7: undefined reference to 
`__imp_g_test_config_vars'
collect2: error: ld returned 1 exit status
Makefile:444: recipe for target 'qemu-ga.exe' failed
make: *** [qemu-ga.exe] Error 1

This is caused by a bug in the pkg-config file for glib as it doesn't define
GLIB_STATIC_COMPILATION for pkg-config --static.

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Sameeh Jubran 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index b9a30cf..6f7b460 100755
--- a/configure
+++ b/configure
@@ -4138,7 +4138,7 @@ int main(void) { return VSS_CTX_BACKUP; }
 EOF
   if compile_prog "$vss_win32_include" "" ; then
 guest_agent_with_vss="yes"
-QEMU_CFLAGS="$QEMU_CFLAGS $vss_win32_include"
+QEMU_CFLAGS="-DGLIB_STATIC_COMPILATION $QEMU_CFLAGS $vss_win32_include"
 libs_qga="-lole32 -loleaut32 -lshlwapi -lstdc++ -Wl,--enable-stdcall-fixup 
$libs_qga"
 qga_vss_provider="qga/vss-win32/qga-vss.dll qga/vss-win32/qga-vss.tlb"
   else
-- 
2.9.3




Re: [Qemu-devel] Proposal for deprecating unsupported host OSes & architecutures

2017-03-22 Thread Peter Maydell
On 22 March 2017 at 12:51, Alex Bennée  wrote:
> Peter Maydell  writes:
>> ...unfortunately the gcc compile farm mips board (1) is very slow
>> and (2) has very little disk space free in /tmp, which means that
>> it can't pass "make check" because for instance tests/test-replication
>> assumes it can write comparatively large test files to /tmp/...
>
> That makes it sound like a mips cross build or mips linux-user powered
> image would be useful then?

Cross build can't actually run 'make check' and I wouldn't
trust linux-user to run our test suite. Also, if there's
no hardware that we can sensibly do make/make check
on then how much can people really care about QEMU on MIPS?
(In fact since MIPS supports KVM these days, there really
ought to be sufficiently capable hardware to work as
a build system.)

The "test uses /tmp/" stuff is a test suite bug which
we should fix, which might make the gcc cfarm box at
least kind of usable.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 1/3] blockjob: add block_job_start_shim

2017-03-22 Thread Jeff Cody
On Thu, Mar 16, 2017 at 05:23:49PM -0400, John Snow wrote:
> The purpose of this shim is to allow us to pause pre-started jobs.
> The purpose of *that* is to allow us to buffer a pause request that
> will be able to take effect before the job ever does any work, allowing
> us to create jobs during a quiescent state (under which they will be
> automatically paused), then resuming the jobs after the critical section
> in any order, either:
> 
> (1) -block_job_start
> -block_job_resume (via e.g. drained_end)
> 
> (2) -block_job_resume (via e.g. drained_end)
> -block_job_start
> 
> The problem that requires a startup wrapper is the idea that a job must
> start in the busy=true state only the first time it is entered; all
> subsequent entries require busy to be false, and the toggling of this
> state is otherwise handled during existing pause and yield points.
> 
> The wrapper simply allows us to mandate that a job can "start," set busy
> to true, then immediately pause only if necessary. We could avoid
> requiring a wrapper, but all jobs would need to do it, so it's been
> factored out here.

I think this makes sense.  So when this happens:

* block_job_create
* block_job_pause
* block_job_resume  <-- only affects pause_count, rest is a no-op
* block_job_start

The block_job_resume is mostly a no-op, only affecting the pause_count but
since there is no job coroutine created yet, the block_job_enter does
nothing.
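Spelled out as calls, the pattern being enabled is something like this (a
sketch only, with the driver-specific creation arguments elided):

    BlockJob *job = block_job_create(...);  /* via a concrete driver, e.g. stream */
    block_job_pause(job);                   /* e.g. from a drained section        */
    block_job_resume(job);                  /* drops pause_count; no coroutine    */
                                            /* exists yet, so enter is a no-op    */
    block_job_start(job);                   /* coroutine runs block_job_co_entry, */
                                            /* which pauses first if still needed */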

> 
> Signed-off-by: John Snow 
> ---
>  blockjob.c | 26 +++---
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 69126af..69b4ec6 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -250,16 +250,28 @@ static bool block_job_started(BlockJob *job)
>  return job->co;
>  }
>  
> +/**
> + * All jobs must allow a pause point before entering their job proper. This
> + * ensures that jobs can be paused prior to being started, then resumed 
> later.
> + */
> +static void coroutine_fn block_job_co_entry(void *opaque)
> +{
> +BlockJob *job = opaque;
> +
> +assert(job && job->driver && job->driver->start);
> +block_job_pause_point(job);
> +job->driver->start(job);
> +}
> +
>  void block_job_start(BlockJob *job)
>  {
>  assert(job && !block_job_started(job) && job->paused &&
> -   !job->busy && job->driver->start);
> -job->co = qemu_coroutine_create(job->driver->start, job);
> -if (--job->pause_count == 0) {
> -job->paused = false;
> -job->busy = true;
> -qemu_coroutine_enter(job->co);
> -}
> +   job->driver && job->driver->start);
> +job->co = qemu_coroutine_create(block_job_co_entry, job);
> +job->pause_count--;
> +job->busy = true;
> +job->paused = false;
> +qemu_coroutine_enter(job->co);
>  }
>  
>  void block_job_ref(BlockJob *job)
> -- 
> 2.9.3
> 



[Qemu-devel] [PULL for-2.9 1/1] parallels: fix default options parsing

2017-03-22 Thread Stefan Hajnoczi
From: Edgar Kaziahmedov 

parallels block driver is completely broken since commit

commit 75cdcd1553e74b5edc58aed23e3b2da8dabb1876
Author: Markus Armbruster 
Date:   Tue Feb 21 21:14:08 2017 +0100

    option: Fix checking of sizes for overflow and trailing crap

Right now even a simple

    qemu-io -c "read 512 64k" 1.hds

ends up with

    Unexpected error in parse_option_size() at util/qemu-option.c:188:
    Parameter 'prealloc-size' expects a non-negative number below 2^64
    Aborted (core dumped)

The cure is simple - we should use 'M' as a suffix in the default option
value instead of 'MiB'.

Signed-off-by: Edgar Kaziahmedov 
Signed-off-by: Denis V. Lunev 
Message-id: 1490002022-22653-1-git-send-email-...@openvz.org
CC: Markus Armbruster 
CC: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
---
 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index 19935e2..6bf9375 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -114,7 +114,7 @@ static QemuOptsList parallels_runtime_opts = {
 .name = PARALLELS_OPT_PREALLOC_SIZE,
 .type = QEMU_OPT_SIZE,
 .help = "Preallocation size on image expansion",
-.def_value_str = "128MiB",
+.def_value_str = "128M",
 },
 {
 .name = PARALLELS_OPT_PREALLOC_MODE,
-- 
2.9.3




[Qemu-devel] [PULL for-2.9 0/1] Block patches

2017-03-22 Thread Stefan Hajnoczi
The following changes since commit 940a8ce075e3408742a4edcabfd6c2a15e2539eb:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2017-03-20 16:34:26 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to ff5bbe56c6f9a74c2d77389a21d5d2368458c939:

  parallels: fix default options parsing (2017-03-21 10:02:36 +)





Edgar Kaziahmedov (1):
  parallels: fix default options parsing

 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.9.3




Re: [Qemu-devel] Proposal for deprecating unsupported host OSes & architecutures

2017-03-22 Thread Alex Bennée

Peter Maydell  writes:

> On 16 March 2017 at 15:23, Peter Maydell  wrote:
>> (Technically right this instant 'mips' and 's390' would be in the
>> 'dump' list, since I don't personally have access yet. But we have
>> a plan for s390, and it turns out there is a mips machine in the
>> gcc compile farm which I'm just checking out.)
>
> ...unfortunately the gcc compile farm mips board (1) is very slow
> and (2) has very little disk space free in /tmp, which means that
> it can't pass "make check" because for instance tests/test-replication
> assumes it can write comparatively large test files to /tmp/...

That makes it sound like a mips cross build or mips linux-user powered
image would be useful then?

--
Alex Bennée



[Qemu-devel] [PATCH] cryptodev: fix asserting single queue

2017-03-22 Thread Halil Pasic
We already check for queues == 1 in cryptodev_builtin_init and raise an
error when that is not true. But before that error is reported, the
assertion in cryptodev_builtin_cleanup kicks in (because the object is
being finalized and freed).

Let's remove assert(queues == 1) from cryptodev_builtin_cleanup as it
does only harm and no good.

Signed-off-by: Halil Pasic 
---
 backends/cryptodev-builtin.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
index 82a068e..137c7a6 100644
--- a/backends/cryptodev-builtin.c
+++ b/backends/cryptodev-builtin.c
@@ -359,8 +359,6 @@ static void cryptodev_builtin_cleanup(
 }
 }
 
-assert(queues == 1);
-
 for (i = 0; i < queues; i++) {
 cc = backend->conf.peers.ccs[i];
 if (cc) {
-- 
2.8.4




[Qemu-devel] [PATCH v3 8/8] iotests: add test 178 for qemu-img measure

2017-03-22 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/178   | 144 +++
 tests/qemu-iotests/178.out.qcow2 | 242 +++
 tests/qemu-iotests/178.out.raw   | 130 +
 tests/qemu-iotests/group |   1 +
 4 files changed, 517 insertions(+)
 create mode 100755 tests/qemu-iotests/178
 create mode 100644 tests/qemu-iotests/178.out.qcow2
 create mode 100644 tests/qemu-iotests/178.out.raw

diff --git a/tests/qemu-iotests/178 b/tests/qemu-iotests/178
new file mode 100755
index 000..b777870
--- /dev/null
+++ b/tests/qemu-iotests/178
@@ -0,0 +1,144 @@
+#!/bin/bash
+#
+# qemu-img measure sub-command tests
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=stefa...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+_supported_fmt raw qcow2
+_supported_proto file
+_supported_os Linux
+
+echo "== Input validation =="
+echo
+
+_make_test_img 1G
+
+$QEMU_IMG measure # missing arguments
+$QEMU_IMG measure --size 2G "$TEST_IMG" # only one allowed
+$QEMU_IMG measure "$TEST_IMG" a # only one filename allowed
+$QEMU_IMG measure --object secret,id=sec0,data=MTIzNDU2,format=base64 # 
missing filename
+$QEMU_IMG measure --image-opts # missing filename
+$QEMU_IMG measure -f qcow2 # missing filename
+$QEMU_IMG measure -l snap1 # missing filename
+$QEMU_IMG measure -o , # invalid option list
+$QEMU_IMG measure -l snapshot.foo # invalid snapshot option
+$QEMU_IMG measure --output foo # invalid output format
+$QEMU_IMG measure --size -1 # invalid image size
+$QEMU_IMG measure -O foo "$TEST_IMG" # unknown image file format
+
+make_test_img_with_fmt() {
+# Shadow global variables within this function
+local IMGFMT="$1" IMGOPTS=""
+_make_test_img "$2"
+}
+
+qemu_io_with_fmt() {
+# Shadow global variables within this function
+local QEMU_IO_OPTIONS=$(echo "$QEMU_IO_OPTIONS" | sed "s/-f $IMGFMT/-f 
$1/")
+shift
+$QEMU_IO "$@"
+}
+
+for ofmt in human json; do
+echo
+echo "== Size calculation for a new file ($ofmt) =="
+echo
+
+# Try a few interesting sizes
+$QEMU_IMG measure --output=$ofmt -O "$IMGFMT" --size 0
+$QEMU_IMG measure --output=$ofmt -O "$IMGFMT" --size 2G
+$QEMU_IMG measure --output=$ofmt -O "$IMGFMT" --size 64G
+$QEMU_IMG measure --output=$ofmt -O "$IMGFMT" --size 256G
+$QEMU_IMG measure --output=$ofmt -O "$IMGFMT" --size 1T
+
+# Always test the raw input files but also IMGFMT
+for fmt in $(echo -e "raw\n$IMGFMT\n" | sort -u); do
+echo
+echo "== Empty $fmt input image ($ofmt) =="
+echo
+make_test_img_with_fmt "$fmt" 0
+$QEMU_IMG measure --output=$ofmt -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
+
+echo
+echo "== $fmt input image with data ($ofmt) =="
+echo
+make_test_img_with_fmt "$fmt" 1G
+$QEMU_IMG measure --output=$ofmt -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
+qemu_io_with_fmt "$fmt" -c "write 512 512" "$TEST_IMG" | 
_filter_qemu_io
+qemu_io_with_fmt "$fmt" -c "write 64K 64K" "$TEST_IMG" | 
_filter_qemu_io
+if [ "$fmt" = "qcow2" ]; then
+$QEMU_IMG snapshot -c snapshot1 "$TEST_IMG"
+fi
+qemu_io_with_fmt "$fmt" -c "write 128M 63K" "$TEST_IMG" | 
_filter_qemu_io
+$QEMU_IMG measure --output=$ofmt -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
+
+if [ "$fmt" = "qcow2" ]; then
+echo
+echo "== $fmt input image with internal snapshot ($ofmt) =="
+echo
+$QEMU_IMG measure --output=$ofmt -f "$fmt" -l snapshot1 \
+  -O "$IMGFMT" "$TEST_IMG"
+fi
+
+if [ "$IMGFMT" = "qcow2" ]; then
+echo
+echo "== $fmt input image and a backing file ($ofmt) =="
+echo
+# The backing file doesn't need to exist :)
+$QEMU_IMG measure --output=$ofmt -o backing_file=x \
+  -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
+fi
+
+echo
+echo "== $fmt input image and 

[Qemu-devel] [PATCH v3 1/8] block: add bdrv_measure() API

2017-03-22 Thread Stefan Hajnoczi
bdrv_measure() provides a conservative maximum for the size of a new
image.  This information is handy if storage needs to be allocated (e.g.
a SAN or an LVM volume) ahead of time.
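As a caller-side sketch of the new API (a fragment for illustration, assuming
the usual QEMU includes, that create_opts holds the creation options for the
target format, and that in_bs may be NULL):

    BlockMeasureInfo info = { 0 };
    Error *local_err = NULL;
    BlockDriver *drv = bdrv_find_format("qcow2");

    bdrv_measure(drv, create_opts, in_bs, &info, &local_err);
    if (local_err) {
        error_report_err(local_err);
    } else {
        printf("required size: %" PRId64 "\n", info.required);
        printf("fully allocated size: %" PRId64 "\n", info.fully_allocated);
    }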

Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-core.json  | 25 +
 include/block/block.h |  4 
 include/block/block_int.h |  2 ++
 block.c   | 35 +++
 4 files changed, 66 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0f132fc..42fb90e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -463,6 +463,31 @@
'*dirty-bitmaps': ['BlockDirtyInfo'] } }
 
 ##
+# @BlockMeasureInfo:
+#
+# Image size calculation information.  This structure describes the size
+# requirements for creating a new image file.
+#
+# The size requirements depend on the new image file format.  File size always
+# equals virtual disk size for the 'raw' format.  Compact formats such as
+# 'qcow2' represent unallocated and zero regions efficiently so file size may
+# be smaller than virtual disk size.
+#
+# The values are upper bounds that are guaranteed to fit the new image file.
+# Subsequent modification, such as internal snapshot or bitmap creation, may
+# require additional space and is not covered here.
+#
+# @required: Size required for a new image file, in bytes.
+#
+# @fully-allocated: Image file size, in bytes, once data has been written
+#   to all sectors.
+#
+# Since: 2.10
+##
+{ 'struct': 'BlockMeasureInfo',
+  'data': {'required': 'int', 'fully-allocated': 'int'} }
+
+##
 # @query-block:
 #
 # Get a list of BlockInfo for all virtual block devices.
diff --git a/include/block/block.h b/include/block/block.h
index 5149260..43c789f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -298,6 +298,10 @@ int bdrv_truncate(BdrvChild *child, int64_t offset);
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
+void bdrv_measure(BlockDriver *drv, QemuOpts *opts,
+  BlockDriverState *in_bs,
+  BlockMeasureInfo *info,
+  Error **errp);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 59400bd..5099a58 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -201,6 +201,8 @@ struct BlockDriver {
 int64_t (*bdrv_getlength)(BlockDriverState *bs);
 bool has_variable_length;
 int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
+void (*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
+ BlockMeasureInfo *info, Error **errp);
 
 int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
diff --git a/block.c b/block.c
index 6e906ec..a191b82 100644
--- a/block.c
+++ b/block.c
@@ -3266,6 +3266,41 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
*bs)
 return -ENOTSUP;
 }
 
+/*
+ * bdrv_measure:
+ * @drv: Format driver
+ * @opts: Creation options for new image
+ * @in_bs: Existing image containing data for new image (may be NULL)
+ * @info: Result object
+ * @errp: Error object
+ *
+ * Calculate file size required to create a new image.
+ *
+ * If @in_bs is given then space for allocated clusters and zero clusters
+ * from that image are included in the calculation.  If @opts contains a
+ * backing file that is shared by @in_bs then backing clusters are omitted
+ * from the calculation.
+ *
+ * If @in_bs is NULL then the calculation includes no allocated clusters
+ * unless a preallocation option is given in @opts.
+ *
+ * Note that @in_bs may use a different BlockDriver from @drv.
+ *
+ * If an error occurs the @errp pointer is set.
+ */
+void bdrv_measure(BlockDriver *drv, QemuOpts *opts,
+  BlockDriverState *in_bs, BlockMeasureInfo *info,
+  Error **errp)
+{
+if (!drv->bdrv_measure) {
+error_setg(errp, "Block driver '%s' does not support size measurement",
+   drv->format_name);
+return;
+}
+
+drv->bdrv_measure(opts, in_bs, info, errp);
+}
+
 /**
  * Return number of sectors on success, -errno on error.
  */
-- 
2.9.3




[Qemu-devel] [PATCH v3 7/8] qemu-iotests: support per-format golden output files

2017-03-22 Thread Stefan Hajnoczi
Some tests produce format-dependent output.  Either the difference is
filtered out and ignored, or the test case is format-specific so we
don't need to worry about per-format output differences.

There is a third case: the test script is the same for all image formats
and the format-dependent output is relevant.  An ugly workaround is to
copy-paste the test into multiple per-format test cases.  This
duplicates code and is not maintainable.

This patch allows test cases to add per-format golden output files so a
single test case can work correctly when format-dependent output must be
checked:

  123.out.qcow2
  123.out.raw
  123.out.vmdk
  ...

This naming scheme is not composable with 123.out.nocache or 123.pc.out,
two other scenarios where output files are split.  I don't think it
matters since few test cases need these features.

Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/check | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 4b1c674..29553cf 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -338,6 +338,11 @@ do
 reference="$reference_machine"
 fi
 
+reference_format="$source_iotests/$seq.out.$IMGFMT"
+if [ -f "$reference_format" ]; then
+reference="$reference_format"
+fi
+
 if [ "$CACHEMODE" = "none" ]; then
 [ -f "$source_iotests/$seq.out.nocache" ] && 
reference="$source_iotests/$seq.out.nocache"
 fi
-- 
2.9.3




[Qemu-devel] [PATCH v3 0/8] qemu-img: add measure sub-command

2017-03-22 Thread Stefan Hajnoczi
v3:
 * Drop RFC, this is ready to go for QEMU 2.10
 * Use "required size" instead of "required bytes" in qemu-img output for
   consistency [Nir]
 * Clarify BlockMeasureInfo semantics [Max]
 * Clarify bdrv_measure() opts argument and error handling [Nir]
 * Handle -o backing_file= for qcow2 [Max]
 * Handle snapshot options in qemu-img measure
 * Probe input image for allocated data clusters for qcow2.  Didn't centralize
   this because there are format-specific aspects such as the cluster_size.  It
   may make sense to centralize it later (with a bit more complexity) if
   support is added to more formats.
 * Add qemu-img(1) man page section for 'measure' sub-command [Max]
 * Extend test case to cover additional scenarios [Nir]

RFCv2:
 * Publishing RFC again to discuss the new user-visible interfaces.  Code has
   changed quite a bit, I have not kept any Reviewed-by tags.
 * Rename qemu-img sub-command "measure" and API bdrv_measure() [Nir]
 * Report both "required bytes" and "fully allocated bytes" to handle the empty
   image file and prealloc use cases [Nir and Dan]
 * Use bdrv_getlength() instead of bdrv_nb_sectors() [Berto]
 * Rename "err" label "out" in qemu-img-cmds.c [Nir]
 * Add basic qcow2 support, doesn't support qemu-img convert from existing 
files yet

RFCv1:
 * Publishing patch series with just raw support, no qcow2 yet.  Please review
   the command-line interface and let me know if you are happy with this
   approach.

Users and management tools sometimes need to know the size required for a new
disk image so that an LVM volume, SAN LUN, etc can be allocated ahead of time.
Image formats like qcow2 have non-trivial metadata that makes it hard to
estimate the exact size without knowledge of file format internals.

This patch series introduces a new qemu-img sub-command that calculates the
required size for both image creation and conversion scenarios.

The conversion scenario is:

  $ qemu-img measure -f raw -O qcow2 input.img
  required size: 1327680
  fully allocated size: 1074069504

Here an existing image file is taken and the output includes the space required
for data from the input image file.

The creation scenario is:

  $ qemu-img measure -O qcow2 --size 5G
  required size: 327680
  fully allocated size: 1074069504

Stefan Hajnoczi (8):
  block: add bdrv_measure() API
  raw-format: add bdrv_measure() support
  qcow2: extract preallocation calculation function
  qcow2: extract image creation option parsing
  qcow2: add bdrv_measure() support
  qemu-img: add measure subcommand
  qemu-iotests: support per-format golden output files
  iotests: add test 178 for qemu-img measure

 qapi/block-core.json |  25 +++
 include/block/block.h|   4 +
 include/block/block_int.h|   2 +
 block.c  |  35 
 block/qcow2.c| 362 +--
 block/raw-format.c   |  22 +++
 qemu-img.c   | 227 
 qemu-img-cmds.hx |   6 +
 qemu-img.texi|  25 +++
 tests/qemu-iotests/178   | 144 
 tests/qemu-iotests/178.out.qcow2 | 242 ++
 tests/qemu-iotests/178.out.raw   | 130 ++
 tests/qemu-iotests/check |   5 +
 tests/qemu-iotests/group |   1 +
 14 files changed, 1136 insertions(+), 94 deletions(-)
 create mode 100755 tests/qemu-iotests/178
 create mode 100644 tests/qemu-iotests/178.out.qcow2
 create mode 100644 tests/qemu-iotests/178.out.raw

-- 
2.9.3




[Qemu-devel] [PATCH v3 6/8] qemu-img: add measure subcommand

2017-03-22 Thread Stefan Hajnoczi
The measure subcommand calculates the size required by a new image file.
This can be used by users or management tools that need to allocate
space on an LVM volume, SAN LUN, etc before creating or converting an
image file.

Suggested-by: Maor Lipchuk 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-img.c   | 227 +++
 qemu-img-cmds.hx |   6 ++
 qemu-img.texi|  25 ++
 3 files changed, 258 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 98b836b..eb1fb62 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -59,6 +59,7 @@ enum {
 OPTION_PATTERN = 260,
 OPTION_FLUSH_INTERVAL = 261,
 OPTION_NO_DRAIN = 262,
+OPTION_SIZE = 263,
 };
 
 typedef enum OutputFormat {
@@ -4287,6 +4288,232 @@ out:
 return 0;
 }
 
+static void dump_json_block_measure_info(BlockMeasureInfo *info)
+{
+QString *str;
+QObject *obj;
+Visitor *v = qobject_output_visitor_new();
+
+visit_type_BlockMeasureInfo(v, NULL, &info, &error_abort);
+visit_complete(v, &obj);
+str = qobject_to_json_pretty(obj);
+assert(str != NULL);
+printf("%s\n", qstring_get_str(str));
+qobject_decref(obj);
+visit_free(v);
+QDECREF(str);
+}
+
+static int img_measure(int argc, char **argv)
+{
+static const struct option long_options[] = {
+{"help", no_argument, 0, 'h'},
+{"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+{"object", required_argument, 0, OPTION_OBJECT},
+{"output", required_argument, 0, OPTION_OUTPUT},
+{"size", required_argument, 0, OPTION_SIZE},
+{0, 0, 0, 0}
+};
+OutputFormat output_format = OFORMAT_HUMAN;
+BlockBackend *in_blk = NULL;
+BlockDriver *drv;
+const char *filename = NULL;
+const char *fmt = NULL;
+const char *out_fmt = "raw";
+char *options = NULL;
+char *snapshot_name = NULL;
+QemuOpts *opts = NULL;
+QemuOpts *object_opts = NULL;
+QemuOpts *sn_opts = NULL;
+QemuOptsList *create_opts = NULL;
+bool image_opts = false;
+uint64_t img_size = ~0ULL;
+BlockMeasureInfo info;
+Error *local_err = NULL;
+int ret = 1;
+int c;
+
+while ((c = getopt_long(argc, argv, "hf:O:o:l:",
+long_options, NULL)) != -1) {
+switch (c) {
+case '?':
+case 'h':
+help();
+break;
+case 'f':
+fmt = optarg;
+break;
+case 'O':
+out_fmt = optarg;
+break;
+case 'o':
+if (!is_valid_option_list(optarg)) {
+error_report("Invalid option list: %s", optarg);
+goto out;
+}
+if (!options) {
+options = g_strdup(optarg);
+} else {
+char *old_options = options;
+options = g_strdup_printf("%s,%s", options, optarg);
+g_free(old_options);
+}
+break;
+case 'l':
+if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
+sn_opts = qemu_opts_parse_noisily(&internal_snapshot_opts,
+  optarg, false);
+if (!sn_opts) {
+error_report("Failed in parsing snapshot param '%s'",
+ optarg);
+goto out;
+}
+} else {
+snapshot_name = optarg;
+}
+break;
+case OPTION_OBJECT:
+object_opts = qemu_opts_parse_noisily(&qemu_object_opts,
+  optarg, true);
+if (!object_opts) {
+goto out;
+}
+break;
+case OPTION_IMAGE_OPTS:
+image_opts = true;
+break;
+case OPTION_OUTPUT:
+if (!strcmp(optarg, "json")) {
+output_format = OFORMAT_JSON;
+} else if (!strcmp(optarg, "human")) {
+output_format = OFORMAT_HUMAN;
+} else {
+error_report("--output must be used with human or json "
+ "as argument.");
+goto out;
+}
+break;
+case OPTION_SIZE:
+{
+int64_t sval;
+
+sval = cvtnum(optarg);
+if (sval < 0) {
+if (sval == -ERANGE) {
+error_report("Image size must be less than 8 EiB!");
+} else {
+error_report("Invalid image size specified! You may use "
+ "k, M, G, T, P or E suffixes for ");
+error_report("kilobytes, megabytes, gigabytes, terabytes, "
+ "petabytes and exabytes.");
+}
+goto out;
+}
+img_size = (uint64_t)sval;
+}
+break;
+}
+}
+
+  

[Qemu-devel] [PATCH v3 3/8] qcow2: extract preallocation calculation function

2017-03-22 Thread Stefan Hajnoczi
Calculating the preallocated image size will be needed to implement
.bdrv_measure().  Extract the code out into a separate function.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c | 134 +-
 1 file changed, 76 insertions(+), 58 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6a92d2e..7c702f4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2095,6 +2095,79 @@ static int preallocate(BlockDriverState *bs)
 return 0;
 }
 
+/**
+ * qcow2_calc_prealloc_size:
+ * @total_size: virtual disk size in bytes
+ * @cluster_size: cluster size in bytes
+ * @refcount_order: refcount bits power-of-2 exponent
+ *
+ * Returns: Total number of bytes required for the fully allocated image
+ * (including metadata).
+ */
+static int64_t qcow2_calc_prealloc_size(int64_t total_size,
+size_t cluster_size,
+int refcount_order)
+{
+/* Note: The following calculation does not need to be exact; if it is a
+ * bit off, either some bytes will be "leaked" (which is fine) or we
+ * will need to increase the file size by some bytes (which is fine,
+ * too, as long as the bulk is allocated here). Therefore, using
+ * floating point arithmetic is fine. */
+int64_t meta_size = 0;
+uint64_t nreftablee, nrefblocke, nl1e, nl2e;
+int64_t aligned_total_size = align_offset(total_size, cluster_size);
+int cluster_bits = ctz32(cluster_size);
+int refblock_bits, refblock_size;
+/* refcount entry size in bytes */
+double rces = (1 << refcount_order) / 8.;
+
+/* see qcow2_open() */
+refblock_bits = cluster_bits - (refcount_order - 3);
+refblock_size = 1 << refblock_bits;
+
+/* header: 1 cluster */
+meta_size += cluster_size;
+
+/* total size of L2 tables */
+nl2e = aligned_total_size / cluster_size;
+nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
+meta_size += nl2e * sizeof(uint64_t);
+
+/* total size of L1 tables */
+nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = align_offset(nl1e, cluster_size / sizeof(uint64_t));
+meta_size += nl1e * sizeof(uint64_t);
+
+/* total size of refcount blocks
+ *
+ * note: every host cluster is reference-counted, including metadata
+ * (even refcount blocks are recursively included).
+ * Let:
+ *   a = total_size (this is the guest disk size)
+ *   m = meta size not including refcount blocks and refcount tables
+ *   c = cluster size
+ *   y1 = number of refcount blocks entries
+ *   y2 = meta size including everything
+ *   rces = refcount entry size in bytes
+ * then,
+ *   y1 = (y2 + a)/c
+ *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
+ * we can get y1:
+ *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
+ */
+nrefblocke = (aligned_total_size + meta_size + cluster_size)
+/ (cluster_size - rces - rces * sizeof(uint64_t)
+/ cluster_size);
+meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
+
+/* total size of refcount tables */
+nreftablee = nrefblocke / refblock_size;
+nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
+meta_size += nreftablee * sizeof(uint64_t);
+
+return meta_size + aligned_total_size;
+}
+
 static int qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
@@ -2133,64 +2206,9 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 int ret;
 
 if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-/* Note: The following calculation does not need to be exact; if it is 
a
- * bit off, either some bytes will be "leaked" (which is fine) or we
- * will need to increase the file size by some bytes (which is fine,
- * too, as long as the bulk is allocated here). Therefore, using
- * floating point arithmetic is fine. */
-int64_t meta_size = 0;
-uint64_t nreftablee, nrefblocke, nl1e, nl2e;
-int64_t aligned_total_size = align_offset(total_size, cluster_size);
-int refblock_bits, refblock_size;
-/* refcount entry size in bytes */
-double rces = (1 << refcount_order) / 8.;
-
-/* see qcow2_open() */
-refblock_bits = cluster_bits - (refcount_order - 3);
-refblock_size = 1 << refblock_bits;
-
-/* header: 1 cluster */
-meta_size += cluster_size;
-
-/* total size of L2 tables */
-nl2e = aligned_total_size / cluster_size;
-nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
-
-/* total size of L1 tables */
-nl1e = nl2e * 

[Qemu-devel] [PATCH v3 4/8] qcow2: extract image creation option parsing

2017-03-22 Thread Stefan Hajnoczi
The image creation options parsed by qcow2_create() are also needed to
implement .bdrv_measure().  Extract the parsing code, including input
validation.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c | 109 +++---
 1 file changed, 73 insertions(+), 36 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7c702f4..19be468 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2168,24 +2168,73 @@ static int64_t qcow2_calc_prealloc_size(int64_t 
total_size,
 return meta_size + aligned_total_size;
 }
 
-static int qcow2_create2(const char *filename, int64_t total_size,
- const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version, int refcount_order,
- Error **errp)
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
 {
+size_t cluster_size;
 int cluster_bits;
-QDict *options;
 
-/* Calculate cluster_bits */
+cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
+ DEFAULT_CLUSTER_SIZE);
 cluster_bits = ctz32(cluster_size);
 if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
 (1 << cluster_bits) != cluster_size)
 {
 error_setg(errp, "Cluster size must be a power of two between %d and "
"%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
-return -EINVAL;
+return 0;
 }
+return cluster_size;
+}
+
+static int qcow2_opt_get_version_del(QemuOpts *opts, Error **errp)
+{
+char *buf;
+int ret;
+
+buf = qemu_opt_get_del(opts, BLOCK_OPT_COMPAT_LEVEL);
+if (!buf) {
+ret = 3; /* default */
+} else if (!strcmp(buf, "0.10")) {
+ret = 2;
+} else if (!strcmp(buf, "1.1")) {
+ret = 3;
+} else {
+error_setg(errp, "Invalid compatibility level: '%s'", buf);
+ret = -EINVAL;
+}
+g_free(buf);
+return ret;
+}
+
+static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
+Error **errp)
+{
+uint64_t refcount_bits;
+
+refcount_bits = qemu_opt_get_number_del(opts, BLOCK_OPT_REFCOUNT_BITS, 16);
+if (refcount_bits > 64 || !is_power_of_2(refcount_bits)) {
+error_setg(errp, "Refcount width must be a power of two and may not "
+   "exceed 64 bits");
+return 0;
+}
+
+if (version < 3 && refcount_bits != 16) {
+error_setg(errp, "Different refcount widths than 16 bits require "
+   "compatibility level 1.1 or above (use compat=1.1 or "
+   "greater)");
+return 0;
+}
+
+return refcount_bits;
+}
+
+static int qcow2_create2(const char *filename, int64_t total_size,
+ const char *backing_file, const char *backing_format,
+ int flags, size_t cluster_size, PreallocMode prealloc,
+ QemuOpts *opts, int version, int refcount_order,
+ Error **errp)
+{
+QDict *options;
 
 /*
  * Open the image file and write a minimal qcow2 header.
@@ -2235,7 +2284,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 *header = (QCowHeader) {
 .magic  = cpu_to_be32(QCOW_MAGIC),
 .version= cpu_to_be32(version),
-.cluster_bits   = cpu_to_be32(cluster_bits),
+.cluster_bits   = cpu_to_be32(ctz32(cluster_size)),
 .size   = cpu_to_be64(0),
 .l1_table_offset= cpu_to_be64(0),
 .l1_size= cpu_to_be32(0),
@@ -2371,8 +2420,8 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int flags = 0;
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
 PreallocMode prealloc;
-int version = 3;
-uint64_t refcount_bits = 16;
+int version;
+uint64_t refcount_bits;
 int refcount_order;
 Error *local_err = NULL;
 int ret;
@@ -2385,8 +2434,12 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
 flags |= BLOCK_FLAG_ENCRYPT;
 }
-cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
- DEFAULT_CLUSTER_SIZE);
+cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto finish;
+}
 buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 prealloc = qapi_enum_parse(PreallocMode_lookup, buf,
PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
@@ -2396,16 

[Qemu-devel] [PATCH v3 5/8] qcow2: add bdrv_measure() support

2017-03-22 Thread Stefan Hajnoczi
Use qcow2_calc_prealloc_size() to get the required file size.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c | 119 ++
 1 file changed, 119 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 19be468..ed898d3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2940,6 +2940,124 @@ static coroutine_fn int 
qcow2_co_flush_to_os(BlockDriverState *bs)
 return 0;
 }
 
+static void qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
+  BlockMeasureInfo *info, Error **errp)
+{
+Error *local_err = NULL;
+uint64_t required = 0; /* bytes that contribute to required size */
+uint64_t virtual_size; /* disk size as seen by guest */
+uint64_t refcount_bits;
+size_t cluster_size;
+int version;
+char *optstr;
+PreallocMode prealloc;
+bool has_backing_file;
+
+/* Parse image creation options */
+cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
+if (local_err) {
+goto err;
+}
+
+version = qcow2_opt_get_version_del(opts, &local_err);
+if (local_err) {
+goto err;
+}
+
+refcount_bits = qcow2_opt_get_refcount_bits_del(opts, version, &local_err);
+if (local_err) {
+goto err;
+}
+
+optstr = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(PreallocMode_lookup, optstr,
+   PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
+   &local_err);
+g_free(optstr);
+if (local_err) {
+goto err;
+}
+
+optstr = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
+has_backing_file = !!optstr;
+g_free(optstr);
+
+virtual_size = align_offset(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+cluster_size);
+
+/* Account for input image */
+if (in_bs) {
+int64_t ssize = bdrv_getlength(in_bs);
+if (ssize < 0) {
+error_setg_errno(errp, -ssize, "Unable to get image virtual_size");
+return;
+}
+
+virtual_size = align_offset(ssize, cluster_size);
+
+if (has_backing_file) {
+/* We don't know how much of the backing chain is shared by the input
+ * image and the new image file.  In the worst case the new image's
+ * backing file has nothing in common with the input image.  Be
+ * conservative and assume all clusters need to be written.
+ */
+required = virtual_size;
+} else {
+int cluster_sectors = cluster_size / BDRV_SECTOR_SIZE;
+int64_t sector_num;
+int pnum = 0;
+
+for (sector_num = 0;
+ sector_num < ssize / BDRV_SECTOR_SIZE;
+ sector_num += pnum) {
+int nb_sectors = MAX(ssize / BDRV_SECTOR_SIZE - sector_num,
+ INT_MAX);
+BlockDriverState *file;
+int64_t ret;
+
+ret = bdrv_get_block_status_above(in_bs, NULL,
+  sector_num, nb_sectors,
+  &pnum, &file);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Unable to get block status");
+return;
+}
+
+if (ret & BDRV_BLOCK_ZERO) {
+/* Skip zero regions (safe with no backing file) */
+} else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
+   (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
+/* Extend pnum to end of cluster for next iteration */
+pnum = ROUND_UP(sector_num + pnum, cluster_sectors) -
+   sector_num;
+
+/* Count clusters we've seen */
+required += (sector_num % cluster_sectors + pnum) *
+BDRV_SECTOR_SIZE;
+}
+}
+}
+}
+
+/* Take into account preallocation.  Nothing special is needed for
+ * PREALLOC_MODE_METADATA since metadata is always counted.
+ */
+if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+required = virtual_size;
+}
+
+info->fully_allocated =
+qcow2_calc_prealloc_size(virtual_size, cluster_size,
+ ctz32(refcount_bits));
+
+/* Remove regions that are not required */
+info->required = info->fully_allocated - virtual_size + required;
+return;
+
+err:
+error_propagate(errp, local_err);
+}
+
 static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVQcow2State *s = bs->opaque;
@@ -3487,6 +3605,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_snapshot_delete   = qcow2_snapshot_delete,
 .bdrv_snapshot_list = qcow2_snapshot_list,
 .bdrv_snapshot_load_tmp = qcow2_snapshot_load_tmp,

[Qemu-devel] [PATCH v3 2/8] raw-format: add bdrv_measure() support

2017-03-22 Thread Stefan Hajnoczi
Maximum size calculation is trivial for the raw format: it's just the
requested image size (because there is no metadata).

Signed-off-by: Stefan Hajnoczi 
---
 block/raw-format.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 86fbc65..e66ba00 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -312,6 +312,27 @@ static int64_t raw_getlength(BlockDriverState *bs)
 return s->size;
 }
 
+static void raw_measure(QemuOpts *opts, BlockDriverState *in_bs,
+BlockMeasureInfo *info,
+Error **errp)
+{
+if (in_bs) {
+int64_t ssize = bdrv_getlength(in_bs);
+if (ssize < 0) {
+error_setg_errno(errp, -ssize, "Unable to get image size");
+return;
+}
+info->required = ssize;
+} else {
+info->required =
+ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+ BDRV_SECTOR_SIZE);
+}
+
+/* Unallocated sectors count towards the file size in raw images */
+info->fully_allocated = info->required;
+}
+
 static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 return bdrv_get_info(bs->file->bs, bdi);
@@ -477,6 +498,7 @@ BlockDriver bdrv_raw = {
 .bdrv_truncate= _truncate,
 .bdrv_getlength   = _getlength,
 .has_variable_length  = true,
+.bdrv_measure = _measure,
 .bdrv_get_info= _get_info,
 .bdrv_refresh_limits  = _refresh_limits,
 .bdrv_probe_blocksizes = _probe_blocksizes,
-- 
2.9.3




Re: [Qemu-devel] Minimum RAM size for PC machines?

2017-03-22 Thread Igor Mammedov
On Wed, 22 Mar 2017 11:03:44 +0100
Thomas Huth  wrote:

> On 22.03.2017 10:08, Markus Armbruster wrote:
> [...]
> > Are we now ready to accept a simple & stupid patch that actually helps
> > users, say letting boards that care declare minimum and maximum RAM
> > size?  And make PC reject RAM size less than 1MiB, even though "someone"
> > might conceivably have firmware that works with less?  
> 
> I'd say enforce a minimum RAM size on the normal "pc" and "q35" machine,
> but still allow smaller sizes on the "isapc" machine. So if "someone"
> comes around and claims to have a legacy firmware that wants less memory
> than 1MiB, just point them to the isapc machine.
> Just my 0.02 €.
We can print a warning that the minimum size will be enforced in 1-2 releases
after 2.9/2.10, and once it is enforced, point users that need less at an older
QEMU version.

>  Thomas
> 
> 
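As a concrete sketch of the kind of check being discussed (hypothetical code,
not an existing patch; the constant, placement and message wording are
invented here):

    /* hypothetical check in the PC machine init path */
    #define PC_MIN_RAM_SIZE (1 * 1024 * 1024)   /* 1 MiB */

    if (machine->ram_size < PC_MIN_RAM_SIZE) {
        /* grace period: warn for a release or two... */
        error_report("warning: RAM sizes below 1 MiB are deprecated for "
                     "this machine type and will be rejected in a future "
                     "release");
        /* ...and later, once enforced:
         *     error_report("this machine type requires at least 1 MiB of RAM");
         *     exit(EXIT_FAILURE);
         */
    }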




Re: [Qemu-devel] [PATCH v3] qemu/thread: Add support for error reporting in qemu_thread_create

2017-03-22 Thread Eric Blake
On 03/21/2017 04:00 PM, Achilles Benetopoulos wrote:
> Failure during thread creation in qemu_thread_create does not force
> the program to exit anymore, since that isn't always the desired
> behaviour. The caller of qemu_thread_create is responsible for the
> error handling.
> 
> Signed-off-by: Achilles Benetopoulos 
> ---
>  cpus.c  | 43 +++

When sending a new version, it's nice to summarize between the ---
separator and the diffstat what is different from the earlier version.


> +++ b/cpus.c
> @@ -1599,6 +1599,7 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>  char thread_name[VCPU_THREAD_NAME_SIZE];
>  static QemuCond *single_tcg_halt_cond;
>  static QemuThread *single_tcg_cpu_thread;
> +Error *local_err = NULL;

We have a mix of 'err' and 'local_err'; if the shorter name makes it
easier to avoid 80-column lines, then that name might be worth using.

>  
>  if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
>  cpu->thread = g_malloc0(sizeof(QemuThread));
> @@ -1612,14 +1613,25 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>   cpu->cpu_index);
>  
>  qemu_thread_create(cpu->thread, thread_name, 
> qemu_tcg_cpu_thread_fn,
> -   cpu, QEMU_THREAD_JOINABLE);
> +cpu, QEMU_THREAD_JOINABLE, _err);

The old indentation was correct, your change made it look wrong because
it is missing enough space.

> +
> +if (local_err) {
> +error_report_err(local_err);
> +exit(1);
> +}
>  
>  } else {
>  /* share a single thread for all cpus with TCG */
>  snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
> +
>  qemu_thread_create(cpu->thread, thread_name,

Why the blank line addition?

> -   qemu_tcg_rr_cpu_thread_fn,
> -   cpu, QEMU_THREAD_JOINABLE);
> +qemu_tcg_rr_cpu_thread_fn, cpu, QEMU_THREAD_JOINABLE,
> +_err);

Again, why are you changing correct indentation?  The
qemu_tc_rr_cpu_thread_fn line should not be touched.

> +
> +if (local_err) {
> +error_report_err(local_err);
> +exit(1);
> +}
>  
>  single_tcg_halt_cond = cpu->halt_cond;
>  single_tcg_cpu_thread = cpu->thread;
> @@ -1640,6 +1652,7 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>  static void qemu_hax_start_vcpu(CPUState *cpu)
>  {
>  char thread_name[VCPU_THREAD_NAME_SIZE];
> +Error *local_err = NULL;
>  
>  cpu->thread = g_malloc0(sizeof(QemuThread));
>  cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> @@ -1648,7 +1661,11 @@ static void qemu_hax_start_vcpu(CPUState *cpu)
>  snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
>   cpu->cpu_index);
>  qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
> -   cpu, QEMU_THREAD_JOINABLE);
> +   cpu, QEMU_THREAD_JOINABLE, _err);

Indentation is wrong, now in the opposite direction (too much space).


> @@ -342,13 +343,19 @@ static void pci_edu_realize(PCIDevice *pdev, Error 
> **errp)
>  {
>  EduState *edu = DO_UPCAST(EduState, pdev, pdev);
>  uint8_t *pci_conf = pdev->config;
> +Error *local_err = NULL;
>  
>  timer_init_ms(>dma_timer, QEMU_CLOCK_VIRTUAL, edu_dma_timer, edu);
>  
>  qemu_mutex_init(>thr_mutex);
>  qemu_cond_init(>thr_cond);
>  qemu_thread_create(>thread, "edu", edu_fact_thread,
> -   edu, QEMU_THREAD_JOINABLE);
> +   edu, QEMU_THREAD_JOINABLE, _err);
> +
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}

Looking at code like this, I wonder if it would be easier to make
qemu_thread_create() return a value, rather than being void.  Then you
could write:

if (qemu_thread_create(..., errp) < 0) {
return;
}

instead of having to futz around with local_err and error_propagate().
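i.e. something along these lines (a sketch of the suggestion, not the actual
patch):

    /* return 0 on success, negative on failure (with *errp set), so callers
     * can detect failure without a local Error object */
    int qemu_thread_create(QemuThread *thread, const char *name,
                           void *(*start_routine)(void *), void *arg,
                           int mode, Error **errp);

    /* caller side, reusing the edu example from above */
    if (qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
                           edu, QEMU_THREAD_JOINABLE, errp) < 0) {
        return;
    }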


> +++ b/hw/usb/ccid-card-emulated.c
> @@ -34,6 +34,7 @@
>  
>  #include "qemu/thread.h"
>  #include "sysemu/char.h"
> +#include "qapi/error.h"
>  #include "ccid.h"
>  
>  #define DPRINTF(card, lvl, fmt, ...) \
> @@ -485,6 +486,7 @@ static int emulated_initfn(CCIDCardState *base)
>  EmulatedState *card = EMULATED_CCID_CARD(base);
>  VCardEmulError ret;
>  const EnumTable *ptable;
> +Error *err = NULL, *local_err = NULL;

Huh? Why do you need two local error objects? One is generally sufficient.

>  
>  QSIMPLEQ_INIT(>event_list);
>  QSIMPLEQ_INIT(>guest_apdu_list);
> @@ -541,9 +543,17 @@ static int emulated_initfn(CCIDCardState *base)
>  return -1;
>  }
>  qemu_thread_create(>event_thread_id, "ccid/event", event_thread,
> -   card, QEMU_THREAD_JOINABLE);
> +  

Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9] blockjob: avoid recursive AioContext locking

2017-03-22 Thread Jeff Cody
On Tue, Mar 21, 2017 at 06:48:10PM +0100, Paolo Bonzini wrote:
> Streaming or any other block job hangs when performed on a block device
> that has a non-default iothread.  This happens because the AioContext
> is acquired twice by block_job_defer_to_main_loop_bh and then released
> only once by BDRV_POLL_WHILE.  (Insert rants on recursive mutexes, which
> 
> unfortunately are a temporary but necessary evil for iothreads at the
> moment).
> 
> Luckily, the reason for the double acquisition is simple; the function
> acquires the AioContext for both the job iothread and the BDS iothread,
> in case the BDS iothread was changed while the job was running.  It
> is therefore enough to skip the second acquisition when the two
> AioContexts are one and the same.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  blockjob.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 69126af..2159df7 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -755,12 +755,16 @@ static void block_job_defer_to_main_loop_bh(void 
> *opaque)
>  
>  /* Fetch BDS AioContext again, in case it has changed */
>  aio_context = blk_get_aio_context(data->job->blk);
> -aio_context_acquire(aio_context);
> +if (aio_context != data->aio_context) {
> +aio_context_acquire(aio_context);
> +}
>  
>  data->job->deferred_to_main_loop = false;
>  data->fn(data->job, data->opaque);
>  
> -aio_context_release(aio_context);
> +if (aio_context != data->aio_context) {
> +aio_context_release(aio_context);
> +}
>  
>  aio_context_release(data->aio_context);
>  
> -- 
> 1.8.3.1
> 
>

Deleted the blank line in the commit message, and:


Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc.git block

-Jeff



[Qemu-devel] [PATCH] Revert "apic: save apic_delivered flag"

2017-03-22 Thread Paolo Bonzini
This reverts commit 07bfa354772f2de67008dc66c201b627acff0106.
The global variable is only read as part of a

apic_reset_irq_delivered();
qemu_irq_raise(s->irq);
if (!apic_get_irq_delivered()) {

sequence, so the value never matters at migration time.

Reported-by: Dr. David Alan Gilbert 
Cc: Pavel Dovgalyuk 
Signed-off-by: Paolo Bonzini 
---
 hw/intc/apic_common.c   | 33 -
 include/hw/i386/apic_internal.h |  2 --
 2 files changed, 35 deletions(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 7a6e771..c3829e3 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -387,25 +387,6 @@ static bool apic_common_sipi_needed(void *opaque)
 return s->wait_for_sipi != 0;
 }
 
-static bool apic_irq_delivered_needed(void *opaque)
-{
-APICCommonState *s = APIC_COMMON(opaque);
-return s->cpu == X86_CPU(first_cpu) && apic_irq_delivered != 0;
-}
-
-static void apic_irq_delivered_pre_save(void *opaque)
-{
-APICCommonState *s = APIC_COMMON(opaque);
-s->apic_irq_delivered = apic_irq_delivered;
-}
-
-static int apic_irq_delivered_post_load(void *opaque, int version_id)
-{
-APICCommonState *s = APIC_COMMON(opaque);
-apic_irq_delivered = s->apic_irq_delivered;
-return 0;
-}
-
 static const VMStateDescription vmstate_apic_common_sipi = {
 .name = "apic_sipi",
 .version_id = 1,
@@ -418,19 +399,6 @@ static const VMStateDescription vmstate_apic_common_sipi = 
{
 }
 };
 
-static const VMStateDescription vmstate_apic_irq_delivered = {
-.name = "apic_irq_delivered",
-.version_id = 1,
-.minimum_version_id = 1,
-.needed = apic_irq_delivered_needed,
-.pre_save = apic_irq_delivered_pre_save,
-.post_load = apic_irq_delivered_post_load,
-.fields = (VMStateField[]) {
-VMSTATE_INT32(apic_irq_delivered, APICCommonState),
-VMSTATE_END_OF_LIST()
-}
-};
-
 static const VMStateDescription vmstate_apic_common = {
 .name = "apic",
 .version_id = 3,
@@ -465,7 +433,6 @@ static const VMStateDescription vmstate_apic_common = {
 },
 .subsections = (const VMStateDescription*[]) {
 _apic_common_sipi,
-_apic_irq_delivered,
 NULL
 }
 };
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 20ad28c..1209eb4 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -189,8 +189,6 @@ struct APICCommonState {
 DeviceState *vapic;
 hwaddr vapic_paddr; /* note: persistence via kvmvapic */
 bool legacy_instance_id;
-
-int apic_irq_delivered; /* for saving static variable */
 };
 
 typedef struct VAPICState {
-- 
2.9.3




Re: [Qemu-devel] [PATCH] qemu-binfmt-conf.sh: Fix m68k_mask

2017-03-22 Thread Laurent Vivier
Le 22/03/2017 à 13:04, Andreas Schwab a écrit :
> On Mär 22 2017, Thomas Huth  wrote:
> 
>> On 21.03.2017 10:38, Andreas Schwab wrote:
>>> The m68k mask should not remove the low bit of the ELF version field and
>>> should ignore the OS/ABI field.
>>
>> Did you encounter a problem with a real binary here?
> 
> Yes, some binaries are using ELFOSABI_GNU instead of ELFOSABI_SYSV.  All
> other patterns already ignore the OS/ABI field.

Yes, I've also seen this.

Reviewed-by: Laurent Vivier 




Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9] blockjob: avoid recursive AioContext locking

2017-03-22 Thread Jeff Cody
On Tue, Mar 21, 2017 at 06:48:10PM +0100, Paolo Bonzini wrote:
> Streaming or any other block job hangs when performed on a block device
> that has a non-default iothread.  This happens because the AioContext
> is acquired twice by block_job_defer_to_main_loop_bh and then released
> only once by BDRV_POLL_WHILE.  (Insert rants on recursive mutexes, which
> 
> unfortunately are a temporary but necessary evil for iothreads at the
> moment).
> 
> Luckily, the reason for the double acquisition is simple; the function
> acquires the AioContext for both the job iothread and the BDS iothread,
> in case the BDS iothread was changed while the job was running.  It
> is therefore enough to skip the second acquisition when the two
> AioContexts are one and the same.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  blockjob.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 69126af..2159df7 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -755,12 +755,16 @@ static void block_job_defer_to_main_loop_bh(void 
> *opaque)
>  
>  /* Fetch BDS AioContext again, in case it has changed */
>  aio_context = blk_get_aio_context(data->job->blk);
> -aio_context_acquire(aio_context);
> +if (aio_context != data->aio_context) {
> +aio_context_acquire(aio_context);
> +}
>  
>  data->job->deferred_to_main_loop = false;
>  data->fn(data->job, data->opaque);
>  
> -aio_context_release(aio_context);
> +if (aio_context != data->aio_context) {
> +aio_context_release(aio_context);
> +}
>  
>  aio_context_release(data->aio_context);
>  
> -- 
> 1.8.3.1
> 
>

Reviewed-by: Jeff Cody 



Re: [Qemu-devel] [PATCH] qemu-binfmt-conf.sh: Fix m68k_mask

2017-03-22 Thread Andreas Schwab
On Mär 22 2017, Thomas Huth  wrote:

> On 21.03.2017 10:38, Andreas Schwab wrote:
>> The m68k mask should not remove the low bit of the ELF version field and
>> should ignore the OS/ABI field.
>
> Did you encounter a problem with a real binary here?

Yes, some binaries are using ELFOSABI_GNU instead of ELFOSABI_SYSV.  All
other patterns already ignore the OS/ABI field.
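
As an illustration of how binfmt_misc applies these strings (a stand-alone sketch, not part of the patch): a file is accepted when (byte & mask) == magic for every byte of the 20-byte pattern, so masking byte 7 (EI_OSABI) to 0x00 is what lets an ELFOSABI_GNU binary through, while byte 6 (EI_VERSION) is still matched exactly:

/*
 * Stand-alone illustration (not part of the patch): binfmt_misc accepts a
 * file when (file_byte[i] & mask[i]) == magic[i] for every byte of the
 * registered magic.  In the 20-byte m68k pattern, byte 6 is EI_VERSION,
 * byte 7 is EI_OSABI, byte 17 is the low byte of the big-endian e_type
 * (0xfe accepts ET_EXEC and ET_DYN), bytes 18-19 are e_machine (EM_68K).
 */
#include <stdio.h>
#include <stddef.h>

static const unsigned char m68k_magic[20] = {
    0x7f, 'E', 'L', 'F', 0x01, 0x02, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x02, 0x00, 0x04
};
static const unsigned char m68k_mask[20] = {      /* the fixed mask */
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xfe, 0xff, 0xff
};

static int m68k_matches(const unsigned char *hdr)
{
    size_t i;

    for (i = 0; i < sizeof(m68k_magic); i++) {
        if ((hdr[i] & m68k_mask[i]) != m68k_magic[i]) {
            return 0;
        }
    }
    return 1;
}

int main(void)
{
    /* hypothetical header of an ELFOSABI_GNU (0x03) m68k executable */
    unsigned char hdr[20] = {
        0x7f, 'E', 'L', 'F', 0x01, 0x02, 0x01, 0x03,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x02, 0x00, 0x04
    };

    printf("matches: %d\n", m68k_matches(hdr));   /* prints 1 with the new mask */
    return 0;
}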

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



[Qemu-devel] [PATCH] doc: fix function spelling

2017-03-22 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 include/io/channel.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 5d48906998..db9bb022a1 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -315,7 +315,7 @@ ssize_t qio_channel_read(QIOChannel *ioc,
  Error **errp);
 
 /**
- * qio_channel_writev:
+ * qio_channel_write:
  * @ioc: the channel object
  * @buf: the memory regions to send data from
  * @buflen: the length of @buf
-- 
2.12.0.191.gc5d8de91d




[Qemu-devel] [Bug 1674925] Re: Qemu PPC64 kvm no display if --device virtio-gpu-pci is selected

2017-03-22 Thread luigiburdo
Hi Thomas, thanks for your reply. I will test and report my experience
ASAP.

Ciao
Luigi

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1674925

Title:
  Qemu PPC64 kvm no display if  --device virtio-gpu-pci is selected

Status in QEMU:
  New

Bug description:
  Hi,
  i did many tests on qemu 2.8 on my BE machines and i found an issue that i 
think was need to be reported

  Test Machines BE 970MP

  if i setup qemu with

  qemu-system-ppc64 -M 1024 --display sdl(or gtk),gl=on --device virtio-
  gpu-pci,virgl --enable-kvm and so and so

  result is doubled window one is vga other is virtio-gpu-pci without
  any start of the VM . pratically i dont have any output of openbios
  and on the virtual serial output

  the same issue i found is if i select:
  qemu-system-ppc64 -M 1024 --display gtk(or sdl) --device virtio-gpu-pci 
--enable-kvm and so and so

  
  i had been try to change all the -M types of all kind of pseries without any 
positive result.

  Ciao 
  Luigi

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1674925/+subscriptions



Re: [Qemu-devel] [PATCH] qemu-binfmt-conf.sh: Fix m68k_mask

2017-03-22 Thread Thomas Huth
On 21.03.2017 10:38, Andreas Schwab wrote:
> The m68k mask should not remove the low bit of the ELF version field and
> should ignore the OS/ABI field.

Did you encounter a problem with a real binary here? ... then it might
be worth mentioning it in the patch description and the patch should
likely be included in QEMU 2.9. Or is this just cosmetics? ... then it
should rather be included in 2.10 later, I think.

 Thomas

> Signed-off-by: Andreas Schwab 
> ---
>  scripts/qemu-binfmt-conf.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
> index 0f1aa63872..484bcf166e 100755
> --- a/scripts/qemu-binfmt-conf.sh
> +++ b/scripts/qemu-binfmt-conf.sh
> @@ -47,7 +47,7 @@ 
> ppc64le_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\x
>  ppc64le_family=ppcle
>  
>  
> m68k_magic='\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x04'
> -m68k_mask='\xff\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff'
> +m68k_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff'
>  m68k_family=m68k
>  
>  # FIXME: We could use the other endianness on a MIPS host.
> 




Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-22 Thread Paolo Bonzini


On 16/03/2017 21:02, Xu, Anthony wrote:
>>> memory_region_finalize.
>>> Let me know if you think otherwise.
>>
>> Yes, you can replace memory_region_del_subregion in
>> memory_region_finalize
>> with special code that does
>>
>> assert(!mr->enabled);
>> assert(subregion->container == mr);
>> subregion->container = NULL;
>> QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>> memory_region_unref(subregion);
>>
>> The last four lines are shared with memory_region_del_subregion, so please
>> factor them in a new function, for example
>> memory_region_del_subregion_internal.
> 
> After adding synchronize_rcu, I saw an infinite recursive call,
>   mem_commit-> memory_region_finalize-> mem_commit->
> memory_region_finalize-> ..
> it caused a segmentation fault, because the 8M stack space was used up, and I found
> that when memory_region_finalize is called, memory_region_transaction_depth is 0 and
> memory_region_update_pending is true. That's not normal!

Ok, this is a bug.  This would fix it:

> Please review below patch
> 
> diff --git a/memory.c b/memory.c
> index 64b0a60..4c95aaf 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void)
>  ++memory_region_transaction_depth;
>  }
> 
> -static void memory_region_clear_pending(void)
> -{
> -memory_region_update_pending = false;
> -ioeventfd_update_pending = false;
> -}
> -
>  void memory_region_transaction_commit(void)
>  {
>  AddressSpace *as;
> @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void)
>  QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>  address_space_update_topology(as);
>  }
> -
> +memory_region_update_pending = false;
>  MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>  } else if (ioeventfd_update_pending) {
> QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>  address_space_update_ioeventfds(as);
>  }
> +ioeventfd_update_pending = false;
>  }
> -memory_region_clear_pending();
> }
>  }

So please send it to the list with Signed-off-by line.

> The thing is, it seems both address_space_translate and
> address_space_dispatch_free
> are called under the global lock. When synchronize_rcu is called, no other
> threads are in an RCU critical section.

No, not necessarily.  address_space_write for example is called outside
the global lock by KVM and it calls address_space_translate.

> Seems RCU is not that useful for address space.

I suggest that you study the code more closely...  there is this in
kvm-all.c:

DPRINTF("handle_mmio\n");
/* Called outside BQL */
> address_space_rw(&address_space_memory,
 run->mmio.phys_addr, attrs,
 run->mmio.data,
 run->mmio.len,
 run->mmio.is_write);

and adding a simple assert() would have quickly disproved your theory.
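
For instance, a one-line check of this kind (purely illustrative, not something
in any posted patch), placed at the top of address_space_translate(), would have
fired immediately on the BQL-free KVM MMIO path quoted above:

    /* illustrative only: aborts if the caller does not hold the global lock */
    assert(qemu_mutex_iothread_locked());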

> After adding synchronize_rcu, we noticed an RCU dead loop.  synchronize_rcu is
> called inside an RCU critical section. It happened when the guest OS
> programmed the PCI BAR. The call trace is like,
> address_space_write-> pci_host_config_write_common ->
> memory_region_transaction_commit ->mem_commit-> synchronize_rcu
> pci_host_config_write_common is called inside an RCU critical section.
> 
> The address_space_write change fixed this issue.

It's not a fix if the code is not thread-safe anymore!  But I think you
have the answer now as to why you cannot use synchronize_rcu.

Paolo



[Qemu-devel] [PATCH qemu-ga] qemu-ga: Make QGA VSS provider service run only when needed

2017-03-22 Thread Sameeh Jubran
Currently the service runs in the background on boot even though it is not
needed, and once it is running it never stops. The service needs to be
running only during the freeze operation and should be stopped after
executing thaw.

Signed-off-by: Sameeh Jubran 
---
 qga/vss-win32/install.cpp   | 28 ++--
 qga/vss-win32/requester.cpp |  2 ++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index f4160a3..f41fcdf 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -14,7 +14,7 @@
 
 #include "vss-common.h"
 #include 
-#include 
+#include "install.h"
 #include 
 #include 
 #include 
@@ -276,7 +276,7 @@ STDAPI COMRegister(void)
 
 chk(pCatalog->CreateServiceForApplication(
 _bstr_t(QGA_PROVIDER_LNAME), _bstr_t(QGA_PROVIDER_LNAME),
-_bstr_t(L"SERVICE_AUTO_START"), _bstr_t(L"SERVICE_ERROR_NORMAL"),
+_bstr_t(L"SERVICE_DEMAND_START"), _bstr_t(L"SERVICE_ERROR_NORMAL"),
 _bstr_t(L""), _bstr_t(L".\\localsystem"), _bstr_t(L""), FALSE));
 chk(pCatalog->InstallComponent(_bstr_t(QGA_PROVIDER_LNAME),
_bstr_t(dllPath), _bstr_t(tlbPath),
@@ -461,3 +461,27 @@ namespace _com_util
 return bstr;
 }
 }
+
+/* Stop QGA VSS provider service from COM+ Application Admin Catalog */
+
+STDAPI StopService(void)
+{
+HRESULT hr;
+COMInitializer initializer;
+COMPointer<IUnknown> pUnknown;
+COMPointer<ICOMAdminCatalog2> pCatalog;
+
+int count = 0;
+
+chk(QGAProviderFind(QGAProviderCount, (void *)&count));
+if (count) {
+chk(CoCreateInstance(CLSID_COMAdminCatalog, NULL, CLSCTX_INPROC_SERVER,
+IID_IUnknown, (void **)pUnknown.replace()));
+chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog2,
+(void **)pCatalog.replace()));
+chk(pCatalog->ShutdownApplication(_bstr_t(QGA_PROVIDER_LNAME)));
+}
+
+out:
+return hr;
+}
diff --git a/qga/vss-win32/requester.cpp b/qga/vss-win32/requester.cpp
index 272e71b..27308ad 100644
--- a/qga/vss-win32/requester.cpp
+++ b/qga/vss-win32/requester.cpp
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "vss-common.h"
 #include "requester.h"
+#include "install.h"
 #include 
 #include 
 
@@ -501,4 +502,5 @@ void requester_thaw(int *num_vols, ErrorSet *errset)
 requester_cleanup();
 
 CoUninitialize();
+StopService();
 }
-- 
2.9.3




Re: [Qemu-devel] [PATCH] cirrus: fix PUTPIXEL macro

2017-03-22 Thread Dr. David Alan Gilbert
* Gerd Hoffmann (kra...@redhat.com) wrote:
> Should be "c" not "col".  The macro is used with "col" as third parameter
> everywhere, so this typo doesn't break anything.
> 
> Fixes: 026aeffcb4752054830ba203020ed6eb05bcaba8
> Reported-by: Dr. David Alan Gilbert 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hw/display/cirrus_vga_rop2.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/display/cirrus_vga_rop2.h b/hw/display/cirrus_vga_rop2.h
> index b86bcd6..b208b73 100644
> --- a/hw/display/cirrus_vga_rop2.h
> +++ b/hw/display/cirrus_vga_rop2.h
> @@ -29,8 +29,8 @@
>  #elif DEPTH == 24
>  #define PUTPIXEL(s, a, c)do {  \
>  ROP_OP(s, a, c);   \
> -ROP_OP(s, a + 1, (col >> 8));  \
> -ROP_OP(s, a + 2, (col >> 16)); \
> +ROP_OP(s, a + 1, (c >> 8));\
> +ROP_OP(s, a + 2, (c >> 16));   \
>  } while (0)
>  #elif DEPTH == 32
>  #define PUTPIXEL(s, a, c)ROP_OP_32(s, a, c)
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Paolo Bonzini


On 22/03/2017 12:07, Peter Maydell wrote:
> That's not the end that's a problem. Here we know we have
> data available from the Windows end to read, we just
> can't feed it to the QEMU UART model yet because the
> UART model is saying "my FIFO is full, try later".
> We should be able to handle that. (I couldn't figure
> out how it works for the socket code either.)

On every iteration of the glib main loop, tcp_chr_read_poll returns zero
and io_watch_poll_prepare removes any previously created QIOChannel
event source for the file descriptor

When hw/char/pl011.c has free space in the FIFO tcp_chr_read_poll
returns nonzero, io_watch_poll_prepare adds again the QIOChannel event
source, the event source calls tcp_chr_read which reads from the socket
and passes the data to the PL011.

qemu_chr_fe_accept_input is only needed to force another call to
tcp_chr_read_poll.
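
To make the contract concrete, here is a small self-contained toy model of that
flow (plain C, illustrative only, not QEMU code): the backend only pushes data
while the front end's can-read callback reports FIFO space, and draining the
FIFO re-arms delivery, which is the role qemu_chr_fe_accept_input() plays for a
real front end such as the pl011:

#include <stdio.h>
#include <string.h>

#define FIFO_SIZE 16

static char fifo[FIFO_SIZE];
static int fifo_len;

/* data waiting on the host side of the "socket"/"pipe" */
static const char *pending = "hello from the host side, more than 16 bytes";
static size_t pending_pos;

static int frontend_can_read(void)                 /* cf. pl011_can_receive() */
{
    return FIFO_SIZE - fifo_len;
}

static void frontend_read(const char *buf, int len)    /* cf. pl011_receive() */
{
    memcpy(fifo + fifo_len, buf, len);
    fifo_len += len;
}

static void backend_poll(void)      /* cf. tcp_chr_read_poll() + tcp_chr_read() */
{
    int space = frontend_can_read();
    size_t left = strlen(pending) - pending_pos;

    if (space <= 0 || left == 0) {
        return;                     /* FIFO full: event source stays removed */
    }
    if ((size_t)space > left) {
        space = (int)left;
    }
    frontend_read(pending + pending_pos, space);
    pending_pos += space;
}

static void guest_drains_fifo(void)         /* guest reads the data register */
{
    printf("guest read %d byte(s): %.*s\n", fifo_len, fifo_len, fifo);
    fifo_len = 0;
    backend_poll();                 /* cf. qemu_chr_fe_accept_input() */
}

int main(void)
{
    backend_poll();                 /* first chunk fills the FIFO */
    guest_drains_fifo();            /* each drain re-arms delivery of the rest */
    guest_drains_fifo();
    return 0;
}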

Paolo



Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Paolo Bonzini


On 22/03/2017 11:28, Jiahuan Zhang wrote:
> 
> A function that lets a process sleep until data is available on the
> socket.  The solution is to rewrite Windows chardev handling in QEMU to
> use threads or overlapped I/O.
> 
> Yes, socket is working well. Will you add the "select" to pipe
> implementation?

It's Windows that doesn't support it (the Windows function name is
WaitForSingleObject).

Paolo

> Or shall I look into it and fix? Since in my case, I prefer windows pipe
> to socket.
> But if I do, definitely, I will spend much more effort than you, the experts.



[Qemu-devel] Bug: qemu-system-ppc in Fedora 25 and Windows 10 hangs and/or shows video driver related errors

2017-03-22 Thread Howard Spoelstra
Hi,

2.9 RC1 builds of qemu-system-ppc for Linux and Windows 10 running Mac
OS 9 or OSX 10.3 hang/throw errors when using the SDL2 gui. This does
not happen when using the GTK gui.

Linux build compiled with:
./configure --target-list=ppc-softmmu --enable-sdl --with-sdlabi=2.0
--enable-gtk --with-gtk-abi=3.0

Windows build (cross-compile from fedora 25) compiled with:
./configure --cross-prefix=x86_64-w64-mingw32-
--target-list="ppc-softmmu" --enable-gtk --with-gtkabi=3.0
--enable-sdl --with-sdlabi=2.0

The windows build simply hangs at the SDL window showing the guest has
not initialised the display (yet).

Here is an example of the output in the command window (Fedora 25,
using the nouveau video driver)

nouveau: kernel rejected pushbuf: No such file or directory
nouveau: ch10: krec 0 pushes 0 bufs 1 relocs 0
nouveau: ch10: buf  0002 0004 0004 
nouveau: kernel rejected pushbuf: No such file or directory
nouveau: ch10: krec 0 pushes 2 bufs 15 relocs 0
nouveau: ch10: buf  0002 0004 0004 
nouveau: ch10: buf 0001 0006 0004  0004
nouveau: ch10: buf 0002 0012 0002  0002
nouveau: ch10: buf 0003 000d 0004 0004 
nouveau: ch10: buf 0004 001a 0002 0002 0002
nouveau: ch10: buf 0005 000e 0004 0004 
nouveau: ch10: buf 0006 0008 0002 0002 0002
nouveau: ch10: buf 0007 0002 0004 0004 
nouveau: ch10: buf 0008 0007 0002 0002 
nouveau: ch10: buf 0009 000b 0002 0002 
nouveau: ch10: buf 000a 000a 0002 0002 0002
nouveau: ch10: buf 000b 0006 0004  0004
nouveau: ch10: buf 000c 001b 0002  0002
nouveau: ch10: buf 000d 001c 0002  0002
nouveau: ch10: buf 000e 000c 0004 0004 
nouveau: ch10: psh  072f48 072fd8
nouveau: 0x20056080
nouveau: 0x00d5
nouveau: 0x
nouveau: 0x0040
nouveau: 0x0001
nouveau: 0x
nouveau: 0x20046086
nouveau: 0x0280
nouveau: 0x0180
nouveau: 0x
nouveau: 0x01f8
nouveau: 0x2002608c
nouveau: 0x00cf
nouveau: 0x0001
nouveau: 0x20056091
nouveau: 0x0a00
nouveau: 0x0280
nouveau: 0x0180
nouveau: 0x
nouveau: 0x019c
nouveau: 0x80006223
nouveau: 0x2004622c
nouveau: 0x
nouveau: 0x
nouveau: 0x0280
nouveau: 0x0180
nouveau: 0x20046230
nouveau: 0x
nouveau: 0x0001
nouveau: 0x
nouveau: 0x0001
nouveau: 0x20046234
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: ch10: psh 0007 072fd8 073168
nouveau: 0x20056080
nouveau: 0x00d5
nouveau: 0x
nouveau: 0x0040
nouveau: 0x0001
nouveau: 0x
nouveau: 0x20046086
nouveau: 0x0320
nouveau: 0x0258
nouveau: 0x
nouveau: 0x0298
nouveau: 0x2002608c
nouveau: 0x00cf
nouveau: 0x0001
nouveau: 0x20056091
nouveau: 0x0c80
nouveau: 0x0320
nouveau: 0x0258
nouveau: 0x
nouveau: 0x02b8
nouveau: 0x80006223
nouveau: 0x2004622c
nouveau: 0x
nouveau: 0x
nouveau: 0x0320
nouveau: 0x0258
nouveau: 0x20046230
nouveau: 0x
nouveau: 0x0001
nouveau: 0x
nouveau: 0x0001
nouveau: 0x20046234
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x200308e0
nouveau: 0x0100
nouveau: 0x
nouveau: 0x0036
nouveau: 0xa01108e3
nouveau: 0x
nouveau: 0x3b23d70a
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0xbb5a740e
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0xc000
nouveau: 0x
nouveau: 0xbf80
nouveau: 0x3f80
nouveau: 0xbf80
nouveau: 0x3f80
nouveau: 0x200308e0
nouveau: 0x0100
nouveau: 0x
nouveau: 0x003a
nouveau: 0xa00508e3
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x
nouveau: 0x20030700
nouveau: 0x1010
nouveau: 0x
nouveau: 0x01943bc8
nouveau: 0x200207c0
nouveau: 0x
nouveau: 0x0194
nouveau: 0x20030708
nouveau: 0x1010
nouveau: 0x
nouveau: 0x01943bc0
nouveau: 0x200207c4
nouveau: 0x
nouveau: 0x0194
nouveau: 0x20050453
nouveau: 0x00074401
nouveau: 0x3f80
nouveau: 0x3f80
nouveau: 

Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Peter Maydell
On 22 March 2017 at 08:40, Paolo Bonzini  wrote:
>
>> > I am using a windows named pipe to get the data from a window
>> > host program, which uses ReadFile () in char_win.c
>>
>> OK, bugs in the windows-specific char backend would be
>> unsurprising.
>>
>> I'm not entirely sure how the chardev layer works, but
>> at the pl011 end if we return 0 from our can_receive
>> function then the chardev layer should decide it has
>> nothing to do until the pl011 later calls
>> qemu_chr_fe_accept_input(), I think.
>>
>> I've cc'd Paolo and Marc-André Lureau as the chardev
>> maintainers.
>
> Windows named pipes do not support the equivalent of "select",
> so it's possible that they cause a busy wait.

That's not the end that's a problem. Here we know we have
data available from the Windows end to read, we just
can't feed it to the QEMU UART model yet because the
UART model is saying "my FIFO is full, try later".
We should be able to handle that. (I couldn't figure
out how it works for the socket code either.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 1/8] ppc/xics: introduce an ICPState backlink under PowerPCCPU

2017-03-22 Thread David Gibson
On Thu, Mar 16, 2017 at 03:35:05PM +0100, Cédric Le Goater wrote:
> Today, the ICPState array of the sPAPR machine is indexed with
> 'cpu_index' of the CPUState. This numbering of CPUs is internal to
> QEMU and the guest only knows about what is exposed in the device
> tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
> xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.
> 
> To provide a more generic XICS layer, we need to abstract the IRQ
> 'server' number and remove any assumption made on its nature. It
> should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
> and xics_cpu_destroy() do.
> 
> To reach that goal, we choose to introduce an ICPState backlink under
> PowerPCCPU, and let the machine core init routine do the ICPState
> lookup. The resulting object is stored under PowerPCCPU which is
> passed on to xics_cpu_setup(). The IRQ 'server' number in XICS is now
> generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR' number.
> 
> This also has the benefit of simplifying the sPAPR hcall routines
> which do not need to do any ICPState lookups anymore.
> 
> Signed-off-by: Cédric Le Goater 

Having a direct link from the cpu to the interrupt state is a good
idea.  However, I'm not so fond of having a field that's specific to a
particular platforms intc in the CPU.  I'd suggest making it instead
an Object *.  We can use it for ICP now, but other platforms can use
it for pointers to per-cpu interrupt state if they need to.
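
A rough sketch of that direction (the generic field name "intc" and the ICP()
cast below are illustrative assumptions layered on the existing xics QOM type,
not code from this series) could look like:

/* target/ppc/cpu.h: keep the backlink generic, not XICS-specific */
struct PowerPCCPU {
    /* ... existing fields ... */
    Object *intc;        /* per-CPU interrupt controller state, if any */
};

/* XICS users then cast at the point of use, e.g. in an sPAPR hcall */
static target_ulong h_cppr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                           target_ulong opcode, target_ulong *args)
{
    ICPState *icp = ICP(cpu->intc);    /* QOM cast to the xics ICP type */

    icp_set_cppr(icp, args[0]);
    return H_SUCCESS;
}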

> ---
>  hw/intc/xics.c  |  4 ++--
>  hw/intc/xics_spapr.c| 20 +---
>  hw/ppc/spapr_cpu_core.c |  4 +++-
>  target/ppc/cpu.h|  2 ++
>  4 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index e740989a1162..5cde86ceb3bc 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -52,7 +52,7 @@ int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
>  void xics_cpu_destroy(XICSFabric *xi, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(xi, cs->cpu_index);
> +ICPState *icp = cpu->icp;
>  
>  assert(icp);
>  assert(cs == icp->cs);
> @@ -65,7 +65,7 @@ void xics_cpu_setup(XICSFabric *xi, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
> -CPUPPCState *env = &cpu->env;
> -ICPState *icp = xics_icp_get(xi, cs->cpu_index);
> +ICPState *icp = cpu->icp;
>  ICPStateClass *icpc;
>  
>  assert(icp);
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index 84d24b2837a7..178b3adc8af7 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -43,11 +43,9 @@
>  static target_ulong h_cppr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  target_ulong cppr = args[0];
>  
> -icp_set_cppr(icp, cppr);
> +icp_set_cppr(cpu->icp, cppr);
>  return H_SUCCESS;
>  }
>  
> @@ -69,9 +67,7 @@ static target_ulong h_ipi(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_xirr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
> -uint32_t xirr = icp_accept(icp);
> +uint32_t xirr = icp_accept(cpu->icp);
>  
>  args[0] = xirr;
>  return H_SUCCESS;
> @@ -80,9 +76,7 @@ static target_ulong h_xirr(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_xirr_x(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>   target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
> -uint32_t xirr = icp_accept(icp);
> +uint32_t xirr = icp_accept(cpu->icp);
>  
>  args[0] = xirr;
>  args[1] = cpu_get_host_ticks();
> @@ -92,21 +86,17 @@ static target_ulong h_xirr_x(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  static target_ulong h_eoi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  target_ulong xirr = args[0];
>  
> -icp_eoi(icp, xirr);
> +icp_eoi(cpu->icp, xirr);
>  return H_SUCCESS;
>  }
>  
>  static target_ulong h_ipoll(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  target_ulong opcode, target_ulong *args)
>  {
> -CPUState *cs = CPU(cpu);
> -ICPState *icp = xics_icp_get(XICS_FABRIC(spapr), cs->cpu_index);
>  uint32_t mfrr;
> -uint32_t xirr = icp_ipoll(icp, &mfrr);
> +uint32_t xirr = icp_ipoll(cpu->icp, &mfrr);
>  
>  args[0] = xirr;
>  args[1] = mfrr;
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 

Re: [Qemu-devel] [PATCH qemu-ga] Start VSS Provider after install

2017-03-22 Thread Sameeh Jubran
On Tue, Mar 21, 2017 at 6:15 PM, Michael Roth 
wrote:

> Quoting Sameeh Jubran (2017-03-21 07:03:26)
> > Signed-off-by: Sameeh Jubran 
>
> What happens without this patch? Fresh installs don't report the
> fsfreeze interface as available?
>
 It solves a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1218937) that
was reported in Bugzilla; however,
thinking about it again, this patch is not needed, as the service should
only be running when it is needed.

>
> > ---
> >  qga/vss-win32/install.cpp | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
> > index f4160a3..7e38332 100644
> > --- a/qga/vss-win32/install.cpp
> > +++ b/qga/vss-win32/install.cpp
> > @@ -307,6 +307,7 @@ STDAPI COMRegister(void)
> >  chk(put_Value(pObj, L"User", L"SYSTEM"));
> >  chk(pUsersInRole->SaveChanges());
> >
> > +chk(pCatalog->StartApplication(_bstr_t(QGA_PROVIDER_LNAME)));
> >  out:
> >  if (unregisterOnFailure && FAILED(hr)) {
> >  COMUnregister();
> > --
> > 2.9.3
> >
>
>


-- 
Respectfully,
*Sameeh Jubran*
*Linkedin *
*Software Engineer @ Daynix .*


Re: [Qemu-devel] [PATCH kernel v8 3/4] mm: add inerface to offer info about unused pages

2017-03-22 Thread Wang, Wei W
Hi Andrew, 

Do you have any comments on my thoughts? Thanks.

> On 03/17/2017 05:28 AM, Andrew Morton wrote:
> > On Thu, 16 Mar 2017 15:08:46 +0800 Wei Wang 
> wrote:
> >
> >> From: Liang Li 
> >>
> >> This patch adds a function that provides a snapshot of the system's
> >> currently unused pages. An important usage of this function is to
> >> provide the unused pages to the live migration thread, which skips
> >> the transfer of those unused pages. Newly used pages can be
> >> re-tracked by the dirty page logging mechanisms.
> > I don't think this will be useful for anything other than
> > virtio-balloon.  I guess it would be better to keep this code in the
> > virtio-balloon driver if possible, even though that's rather a
> > layering violation :( What would have to be done to make that
> > possible?  Perhaps we can put some *small* helpers into page_alloc.c
> > to prevent things from becoming too ugly.
> 
> The patch description was too narrow and may have caused some confusion,
> sorry about that. This function is aimed to be generic. I agree with the
> description suggested by Michael.
> 
> Since the main body of the function is related to operating on the free_list,
> I think it is better to have it located here.
> Small helpers may be less efficient and thereby cause some performance loss
> as well.
> I think one improvement we can make is to remove the "chunk format"
> related things from this function. The function can generally offer the base
> pfn to the caller's recording buffer. Then it will be the caller's
> responsibility to format the pfn if they need to.
> 
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4498,6 +4498,120 @@ void show_free_areas(unsigned int filter)
> >>show_swap_cache_info();
> >>   }
> >>
> >> +static int __record_unused_pages(struct zone *zone, int order,
> >> +   __le64 *buf, unsigned int size,
> >> +   unsigned int *offset, bool part_fill) {
> >> +  unsigned long pfn, flags;
> >> +  int t, ret = 0;
> >> +  struct list_head *curr;
> >> +  __le64 *chunk;
> >> +
> >> +  if (zone_is_empty(zone))
> >> +  return 0;
> >> +
> >> +  spin_lock_irqsave(&zone->lock, flags);
> >> +
> >> +  if (*offset + zone->free_area[order].nr_free > size && !part_fill) {
> >> +  ret = -ENOSPC;
> >> +  goto out;
> >> +  }
> >> +  for (t = 0; t < MIGRATE_TYPES; t++) {
> >> +  list_for_each(curr, &zone->free_area[order].free_list[t]) {
> >> +  pfn = page_to_pfn(list_entry(curr, struct page, lru));
> >> +  chunk = buf + *offset;
> >> +  if (*offset + 2 > size) {
> >> +  ret = -ENOSPC;
> >> +  goto out;
> >> +  }
> >> +  /* Align to the chunk format used in virtio-balloon */
> >> +  *chunk = cpu_to_le64(pfn << 12);
> >> +  *(chunk + 1) = cpu_to_le64((1 << order) << 12);
> >> +  *offset += 2;
> >> +  }
> >> +  }
> >> +
> >> +out:
> >> +  spin_unlock_irqrestore(&zone->lock, flags);
> >> +
> >> +  return ret;
> >> +}
> > This looks like it could disable interrupts for a long time.  Too long?
> 
> What do you think if we give "budgets" to the above function?
> For example, budget=1000, and there are 2000 nodes on the list.
> record() returns with "incomplete" status in the first round, along with the
> status info, "*continue_node".
> 
> *continue_node: pointer to the starting node of the leftover. If
> *continue_node has been used at the time of the second call
> (i.e. continue_node->next == NULL), which implies that the previous 1000
> nodes have been used, then the record() function can simply start from the
> head of the list.
> 
> It is up to the caller whether it needs to continue the second round when
> getting "incomplete".
> 
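A stand-alone toy sketch of this budget + *continue_node idea (plain C over an
ordinary linked list, ignoring locking and the case where the resume node has
meanwhile been reallocated; names here are illustrative, not the kernel's):

#include <stdio.h>

struct node {
    unsigned long pfn;
    struct node *next;
};

/* returns 1 ("incomplete") when the budget ran out, 0 when the list is done */
static int record_unused(struct node *head, struct node **continue_node,
                         unsigned int budget,
                         unsigned long *buf, unsigned int size,
                         unsigned int *offset)
{
    struct node *curr = *continue_node ? *continue_node : head;

    while (curr && budget-- && *offset < size) {
        buf[(*offset)++] = curr->pfn;      /* record one "free page" */
        curr = curr->next;
    }
    *continue_node = curr;                 /* NULL means the end was reached */
    return curr ? 1 : 0;
}

int main(void)
{
    struct node nodes[5];
    unsigned long buf[8];
    unsigned int offset = 0;
    struct node *resume = NULL;
    int i, incomplete;

    for (i = 0; i < 5; i++) {
        nodes[i].pfn = 0x1000 + i;
        nodes[i].next = (i < 4) ? &nodes[i + 1] : NULL;
    }

    /* budget of 3: the first call is "incomplete", the second finishes */
    do {
        incomplete = record_unused(&nodes[0], &resume, 3, buf, 8, &offset);
    } while (incomplete);

    for (i = 0; i < (int)offset; i++) {
        printf("pfn 0x%lx\n", buf[i]);
    }
    return 0;
}
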
> >
> >> +/*
> >> + * The record_unused_pages() function is used to record the system
> >> +unused
> >> + * pages. The unused pages can be skipped to transfer during live 
> >> migration.
> >> + * Though the unused pages are dynamically changing, dirty page
> >> +logging
> >> + * mechanisms are able to capture the newly used pages though they
> >> +were
> >> + * recorded as unused pages via this function.
> >> + *
> >> + * This function scans the free page list of the specified order to
> >> +record
> >> + * the unused pages, and chunks those continuous pages following the
> >> +chunk
> >> + * format below:
> >> + * --
> >> + * |  Base (52-bit)   | Rsvd (12-bit) |
> >> + * --
> >> + * --
> >> + * |  Size (52-bit)   | Rsvd (12-bit) |
> >> + * --
> >> + *
> >> + * @start_zone: zone to start the record operation.
> >> + * @order: order of the free page list to record.
> >> + * @buf: buffer to record the 

Re: [Qemu-devel] [RFC][PATCH 0/6] "bootonceindex" property

2017-03-22 Thread Laszlo Ersek
On 03/22/17 10:00, Huttunen, Janne (Nokia - FI/Espoo) wrote:
> On Wed, 2017-03-22 at 04:43 -0400, Paolo Bonzini wrote:
>>
>> Understood---my question is how you would set up the alternate
>> boot order: is it something like "keep a button pressed while
>> turning on", or something written in NVRAM, or something else
>> that is completely different?
> 
> In my case the real hardware has a management processor
> on the board and the temporary boot source (and also the
> permanent one for that matter) for the main processor can
> be set from there. Since neither the BIOS nor the management
> firmware are open source, I don't know how it technically
> works, but I assume there either is some shared memory
> between the main BIOS and the management processor or
> alternatively the BIOS talks with the management processor
> with some protocol during boot to get the order.
> 

I'm generally opposed to the proposed implementation for this feature /
use case; that is, the new "bootonceindex" device property.

(1) My somewhat hand-waving counter-argument is simply the complexity /
confusion that it introduces. See for example recent QEMU commit
c0d9f7d0bced ("docs: Add a note about mixing bootindex with "-boot
order"", 2017-02-28).

Even if the proposed solution keeps the "bootorder" fw_cfg file intact,
and firmware wouldn't have to look at other fw_cfg files -- I can
already guarantee that OVMF will not look at other fw_cfg files --, the
command line changes look undesirable to me.

(2) My more technical counter-arguments are:

(2a) Exposing this in the libvirt domain XML would be a huge pain.
AFAICS, libvirt already doesn't expose "-boot once" in the domain XML,
which is a *good* thing.

(2b) With the proposed change, "having rebooted once" becomes explicit
runtime state that is guest-controlled. As such, it would have to be
migrated. Assume that you start the guest on the source host, using both
bootindex and bootonceindex properties. Then, for migration, libvirt (or
the user, manually) starts QEMU on the target host using the same
command line. After migration, if the guest reboots on the target host,
its behavior should depend on whether said reboot is its first reboot
since launching the domain, so the fact whether it rebooted on the
source host should reach the target host.


I think you must already have a means to massage the management
processor to change the boot order, for the next boot. Are you doing
that massaging in code that runs on the main processor? If so, that
means the "guest code" is highly privileged, as it can control outside
components in order to influence the boot order.

For that, I can offer the following analogy:

- use a guest with libvirt

- whenever you want to modify the boot order from within the guest,
  ssh back out to the host, and use virsh-dumpxml (--inactive),
  the xmlstarlet utility, and virsh-define, to dump, edit, and save the
  domain XML non-interactively. Xmlstarlet is extremely versatile for
  modifying domain XMLs (or any other kinds of XMLs), and virsh-define
  explicitly supports the case when the domain is already running.

  In a normal virtualization environment, this would be a huge security
  hole, of course, but you are already manipulating the management
  processor from code that runs on the main processor. Exact same
  privilege escalation.

- whenever you want to relaunch the domain fully (i.e., restart QEMU
  with a new command line), again ssh out to the host, and start a
  process (a shell script) in the background. The script should first
  initiate a domain shutdown (virsh-shutdown), then wait for domain
  termination (virsh-qemu-monitor-event, and see the SHUTDOWN event in
  "qapi/event.json"), then start the domain (virsh start). Which is
  when the modified boot order will take effect.


Alternatively, if you are fine using OVMF (as UEFI firmware) within the
guest, to run your payload, you can try the following commands, to set
the BootNext UEFI variable & to reboot:

  efibootmgr --bootnext 
  reboot

While OVMF heavily massages the UEFI boot order (based on the
"bootorder" fw_cfg file), *if* you stick with a constant set of
bootindex properties (== constant boot order setting in the libvirt
domain XML), then most of the UEFI Boot variables that you get to
see in the guest *should* be stable, and the above commands should
hopefully work (no guarantees though).

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v3] Allow setting NUMA distance for different NUMA nodes

2017-03-22 Thread Andrew Jones

You should have CC'ed me, considering this version is addressing my
review comments, but whatever, I usually skim the list so I found it...

On Wed, Mar 22, 2017 at 05:32:46PM +0800, He Chen wrote:
> Currently, QEMU does not provide a clear command to set vNUMA distance for
> a guest although we already have the `-numa` command to set vNUMA nodes.
> 
> vNUMA distance makes sense in certain scenarios.
> But now, if we create a guest that has 4 vNUMA nodes, when we check NUMA
> info via `numactl -H`, we will see:
> 
> node distance:
> node    0    1    2    3
>   0:   10   20   20   20
>   1:   20   10   20   20
>   2:   20   20   10   20
>   3:   20   20   20   10
> 
> The guest kernel regards every local node as distance 10, and every remote
> node as distance 20, when there is no SLIT table, since QEMU doesn't build one.
> It looks a little strange when you have seen the distances on an
> actual physical machine that contains 4 NUMA nodes. My machine shows:
> 
> node distance:
> node    0    1    2    3
>   0:   10   21   31   41
>   1:   21   10   21   31
>   2:   31   21   10   21
>   3:   41   31   21   10
> 
> To set vNUMA distance, the guest should see a complete SLIT table.
> I found QEMU provides an `-acpitable` command that allows users to add
> an ACPI table into the guest, but it requires users to build the ACPI table by
> themselves first. Using `-acpitable` to add a SLIT table may not be so
> straightforward or flexible; imagine that the vNUMA configuration
> changes and we need to generate another SLIT table manually. It may
> not be friendly to users or upper software like libvirt.
> 
> This patch is going to add SLIT table support in QEMU, and provides an
> additional option, `dist`, for the `-numa` command to allow users to set the
> vNUMA distance on the QEMU command line.
> 
> With this patch, when a user wants to create a guest that contains
> several vNUMA nodes and also wants to set distances among those nodes,
> the QEMU command would look like:
> 
> ```
> -object 
> memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0 \
> -numa node,nodeid=0,cpus=0,memdev=node0 \
> -object 
> memory-backend-ram,size=1G,prealloc=yes,host-nodes=1,policy=bind,id=node1 \
> -numa node,nodeid=1,cpus=1,memdev=node1 \
> -object 
> memory-backend-ram,size=1G,prealloc=yes,host-nodes=2,policy=bind,id=node2 \
> -numa node,nodeid=2,cpus=2,memdev=node2 \
> -object 
> memory-backend-ram,size=1G,prealloc=yes,host-nodes=3,policy=bind,id=node3 \
> -numa node,nodeid=3,cpus=3,memdev=node3 \
> -numa dist,src=0,dst=1,val=21 \
> -numa dist,src=0,dst=2,val=31 \
> -numa dist,src=0,dst=3,val=41 \
> -numa dist,src=1,dst=0,val=21 \
> ...
> ```
> 
> Signed-off-by: He Chen 
> 
> fix

stray 'fix' above the ---

> ---
>  hw/acpi/aml-build.c | 26 +
>  hw/arm/virt-acpi-build.c|  2 ++
>  hw/i386/acpi-build.c|  2 ++
>  include/hw/acpi/aml-build.h |  1 +
>  include/sysemu/numa.h   |  1 +
>  include/sysemu/sysemu.h |  4 
>  numa.c  | 47 
> +
>  qapi-schema.json| 30 ++---
>  qemu-options.hx | 12 +++-
>  9 files changed, 121 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index c6f2032..410b30e 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -24,6 +24,7 @@
>  #include "hw/acpi/aml-build.h"
>  #include "qemu/bswap.h"
>  #include "qemu/bitops.h"
> +#include "sysemu/numa.h"
>  
>  static GArray *build_alloc_array(void)
>  {
> @@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity 
> *numamem, uint64_t base,
>  numamem->base_addr = cpu_to_le64(base);
>  numamem->range_length = cpu_to_le64(len);
>  }
> +
> +/*
> + * ACPI spec 5.2.17 System Locality Distance Information Table
> + * (Revision 2.0 or later)
> + */
> +void build_slit(GArray *table_data, BIOSLinker *linker)
> +{
> +int slit_start, i, j;
> +slit_start = table_data->len;
> +
> +acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +build_append_int_noprefix(table_data, nb_numa_nodes, 8);
> +for (i = 0; i < nb_numa_nodes; i++) {
> +for (j = 0; j < nb_numa_nodes; j++) {
> +build_append_int_noprefix(table_data, numa_info[i].distance[j], 
> 1);
> +}
> +}
> +
> +build_header(linker, table_data,
> + (void *)(table_data->data + slit_start),
> + "SLIT",
> + table_data->len - slit_start, 1, NULL, NULL);
> +}
> +
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 0835e59..d9e6828 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -781,6 +781,8 @@ void virt_acpi_build(VirtMachineState *vms, 
> AcpiBuildTables *tables)
>  if (nb_numa_nodes > 0) {
>  acpi_add_table(table_offsets, tables_blob);
>  build_srat(tables_blob, tables->linker, vms);
> +

Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Jiahuan Zhang
On 22 March 2017 at 10:37, Paolo Bonzini  wrote:

>
>
> On 22/03/2017 09:48, Jiahuan Zhang wrote:
> >
> >
> > On 22 March 2017 at 09:40, Paolo Bonzini  > > wrote:
> >
> >
> > > > I am using a windows named pipe to get the data from a window
> > > > host program, which uses ReadFile () in char_win.c
> > >
> > > OK, bugs in the windows-specific char backend would be
> > > unsurprising.
> > >
> > > I'm not entirely sure how the chardev layer works, but
> > > at the pl011 end if we return 0 from our can_receive
> > > function then the chardev layer should decide it has
> > > nothing to do until the pl011 later calls
> > > qemu_chr_fe_accept_input(), I think.
> > >
> > > I've cc'd Paolo and Marc-André Lureau as the chardev
> > > maintainers.
> >
> > Windows named pipes do not support the equivalent of "select",
> > so it's possible that they cause a busy wait.  Try using a
> > TCP socket instead and see if the bug goes away.
> >
> >
> > Hi, I am trying to use a Windows socket for serial redirection instead
> > of Windows named pipe.
> > What do you mean "the equivalent of 'select'"?
>
> A function that lets a process sleep until data is available on the
> socket.  The solution is to rewrite Windows chardev handling in QEMU to
> use threads or overlapped I/O.
>
> Yes, socket is working well. Will you add the "select" to pipe
implementation?
Or shall I look into it and fix? Since in my case, I prefer windows pipe to
socket.
But if I do, definitly, I will spend much more effort than you, the experts.


> Paolo
>


[Qemu-devel] [PATCH v2 2/3] vfio pci: new function to init AER capability

2017-03-22 Thread Cao jin
Signed-off-by: Dou Liyang 
Signed-off-by: Cao jin 
---
 hw/vfio/pci.c | 41 -
 hw/vfio/pci.h |  1 +
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..3d0d005 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1855,18 +1855,42 @@ out:
 return 0;
 }
 
-static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+uint32_t errcap;
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers depends on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, cap_ver, pos, size, errp);
+}
+
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev, Error **errp)
 {
 PCIDevice *pdev = &vdev->pdev;
 uint32_t header;
 uint16_t cap_id, next, size;
 uint8_t cap_ver;
 uint8_t *config;
+int ret = 0;
 
 /* Only add extended caps if we have them and the guest can see them */
 if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
 !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
-return;
+return 0;
 }
 
 /*
@@ -1915,6 +1939,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
PCI_EXT_CAP_NEXT_MASK);
 
 switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size, errp);
+break;
 case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
 case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
 trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
@@ -1923,6 +1950,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
 }
 
+if (ret) {
+goto out;
+}
 }
 
 /* Cleanup chain head ID if necessary */
@@ -1930,8 +1960,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
 }
 
+out:
 g_free(config);
-return;
+return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
@@ -1949,8 +1980,8 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, 
Error **errp)
 return ret;
 }
 
-vfio_add_ext_cap(vdev);
-return 0;
+ret = vfio_add_ext_cap(vdev, errp);
+return ret;
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8366bb..34e8b04 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
-- 
1.8.3.1






[Qemu-devel] [PATCH v2 1/3] pcie aer: verify if AER functionality is available

2017-03-22 Thread Cao jin
For devices which support the AER function, verify whether it can work in the
system:
1. An AER-capable device is a PCIe device; it can't be plugged into a PCI bus
2. If the root port doesn't support AER, then there is no need to expose the
   AER capability

Signed-off-by: Dou Liyang 
Signed-off-by: Cao jin 
---
 hw/pci/pcie_aer.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index daf1f65..a2e9818 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -100,6 +100,34 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
   uint16_t size, Error **errp)
 {
+PCIDevice *parent_dev;
+uint8_t type;
+uint8_t parent_type;
+
+/* Topology test: see if there is need to expose AER cap */
+type = pcie_cap_get_type(dev);
+parent_dev = pci_bridge_get_device(dev->bus);
+while (parent_dev) {
+parent_type = pcie_cap_get_type(parent_dev);
+
+if (type == PCI_EXP_TYPE_ENDPOINT &&
+(parent_type != PCI_EXP_TYPE_ROOT_PORT &&
+ parent_type != PCI_EXP_TYPE_DOWNSTREAM)) {
+error_setg(errp, "Parent device is not a PCIe component");
+return -ENOTSUP;
+}
+
+if (parent_type == PCI_EXP_TYPE_ROOT_PORT) {
+if (!parent_dev->exp.aer_cap)
+{
+error_setg(errp, "Root port does not support AER");
+return -ENOTSUP;
+}
+}
+
+parent_dev = pci_bridge_get_device(parent_dev->bus);
+}
+
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
 offset, size);
 dev->exp.aer_cap = offset;
-- 
1.8.3.1






[Qemu-devel] [PATCH v2 3/3] vfio-pci: process non fatal error of AER

2017-03-22 Thread Cao jin
Make use of the non-fatal error eventfd that the kernel module provides
to process AER non-fatal errors. Fatal errors still go the legacy way,
which results in a VM stop.

Register the handler and wait for notification. On notification, construct
an AER message and pass it to the root port. The root port will trigger an
interrupt to signal the guest, and the guest driver will then do the recovery.

Signed-off-by: Dou Liyang 
Signed-off-by: Cao jin 
---
 hw/vfio/pci.c  | 247 +
 hw/vfio/pci.h  |   4 +
 linux-headers/linux/vfio.h |   2 +
 3 files changed, 253 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3d0d005..4912bc6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2422,6 +2422,34 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
  "Could not enable error recovery for the device",
  vbasedev->name);
 }
+
+irq_info.index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_info.count = 0; /* clear */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+trace_vfio_populate_device_get_irq_info_failure();
+} else if (irq_info.count == 1) {
+vdev->pci_aer_non_fatal = true;
+} else {
+error_report(WARN_PREFIX
+ "Couldn't enable non fatal error recovery for the device",
+ vbasedev->name);
+}
+
+irq_info.index = VFIO_PCI_PASSIVE_RESET_IRQ_INDEX;
+irq_info.count = 0; /* clear */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+trace_vfio_populate_device_get_irq_info_failure();
+} else if (irq_info.count == 1) {
+vdev->passive_reset = true;
+} else {
+error_report(WARN_PREFIX
+ "Don't support passive reset notification",
+ vbasedev->name);
+}
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
@@ -2432,6 +2460,221 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void vfio_non_fatal_err_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = >pdev;
+PCIEAERMsg msg = {
+.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
+
+if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
+return;
+}
+
+/* Populate the aer msg and send it to root port */
+if (dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+if (!uncor_status) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+if (isfatal) {
+goto stop;
+}
+
+error_report("%s sending non fatal event to root port. uncor status = "
+ "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
+pcie_aer_msg(dev, &msg);
+return;
+}
+
+stop:
+/* Terminate the guest in case of fatal error */
+error_report("%s(%s) fatal error detected. Please collect any data"
+" possible and then kill the guest", __func__, 
vdev->vbasedev.name);
+vm_stop(RUN_STATE_INTERNAL_ERROR);
+}
+
+/*
+ * Register non fatal error notifier for devices supporting error recovery.
+ * If we encounter a failure in this function, we report an error
+ * and continue after disabling error recovery support for the device.
+ */
+static void vfio_register_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (!vdev->pci_aer_non_fatal) {
+return;
+}
+
+if (event_notifier_init(&vdev->non_fatal_err_notifier, 0)) {
+error_report("vfio: Unable to init event notifier for non-fatal error 
detection");
+vdev->pci_aer_non_fatal = false;
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+
+*pfd = event_notifier_get_fd(&vdev->non_fatal_err_notifier);
+qemu_set_fd_handler(*pfd, vfio_non_fatal_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Failed to set up non-fatal error notification");
+

[Qemu-devel] [PATCH v2 0/3] vfio-pci: support recovery of AER non fatal error

2017-03-22 Thread Cao jin
v2 changelog:
Add the boilerplate code for new eventfd in patch 3. The corresponding
kernel patch is v5.

Test:
Tested with func1 passed through while func0 doesn't have a user.

Cao jin (3):
  pcie aer: verify if AER functionality is available
  vfio pci: new function to init AER capability
  vfio-pci: process non fatal error of AER

 hw/pci/pcie_aer.c  |  28 +
 hw/vfio/pci.c  | 288 -
 hw/vfio/pci.h  |   5 +
 linux-headers/linux/vfio.h |   2 +
 4 files changed, 318 insertions(+), 5 deletions(-)

-- 
1.8.3.1






[Qemu-devel] [PATCH v5] vfio error recovery: kernel support

2017-03-22 Thread Cao jin
From: "Michael S. Tsirkin" 

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset. Non fatal errors don't.
   All errors stop the QEMU guest eventually, but not immediately,
   because they are detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to user at all.

   Note:
   PPC EEH is different, but this approach won't affect EEH, because
   EEH treats all errors as fatal AER errors and will still signal the user
   via the legacy eventfd. And all devices/functions in a PE belong to
   the same IOMMU group, so the slot_reset handler in this approach
   won't affect EEH either.

1. Correctable errors
   Hardware can correct these errors without software intervention;
   clearing the error status is enough, and this is what is already done now.
   No need to recover them; nothing changed, leave it as it is.

2. Fatal errors
   They will induce a link reset. This is troublesome when the user is
   a QEMU guest. This approach doesn't touch the existing mechanism.

3. Non-fatal errors
   Before, they were signalled to the user the same way as fatal ones. In this
   approach, a new eventfd is introduced only for non-fatal error notification.
   By splitting the non-fatal ones out, it will benefit AER recovery of a QEMU
   guest user by reporting them to the guest separately.

   To maintain backwards compatibility with userspace, non-fatal errors
   will continue to trigger via the existing error interrupt index if a
   non-fatal signaling mechanism has not been registered.

   Note:
   In case of a multi-function device which has a different device driver
   for each function, and one of the functions is bound to vfio while the
   others aren't (i.e., the functions belong to different IOMMU groups), a new
   slot_reset handler & another new eventfd are introduced. This is
   useful when a device driver wants a slot reset while vfio-pci doesn't,
   which means the vfio-pci device will get a passive reset. Signal the user
   via another new eventfd named passive_reset_trigger; this helps to
   avoid signalling the user twice via the same legacy error trigger.

For the original design and discussion, refer:
https://www.spinics.net/lists/linux-virtualization/msg29843.html


Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Cao jin 
---

v5 changelog:
1. Add another new eventfd passive_reset_trigger & the boilerplate code,
   used in slot_reset. Add comment for slot_reset().
2. Rewrite the commit log.

 drivers/vfio/pci/vfio_pci.c | 49 +++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 38 
 drivers/vfio/pci/vfio_pci_private.h |  2 ++
 include/uapi/linux/vfio.h   |  2 ++
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..375ba20 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
*vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_PASSIVE_RESET_IRQ_INDEX) {
if (pci_is_pcie(vdev->pdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +798,8 @@ static long vfio_pci_ioctl(void *device_data,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
+   case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+   case VFIO_PCI_PASSIVE_RESET_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
/* pass thru to return error */
@@ -1282,7 +1286,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct 
pci_dev *pdev,
 
	mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
+   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+   eventfd_signal(vdev->non_fatal_err_trigger, 1);
+   else if (vdev->err_trigger)
eventfd_signal(vdev->err_trigger, 1);
 
	mutex_unlock(&vdev->igate);
@@ -1292,8 +1298,47 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct 
pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+/*
+ * In case of a function/device is bound to vfio, while other collateral ones
+ * are still controlled by device driver(i.e., they belongs to different iommu
+ * group), and device driver want a slot reset when seeing AER errors while
+ * vfio pci doesn't, signal user via with proprietary eventfd in precedence to
+ * the legacy one.
+ */
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+   struct vfio_pci_device *vdev;
+  

[Qemu-devel] [Bug 1674925] Re: Qemu PPC64 kvm no display if --device virtio-gpu-pci is selected

2017-03-22 Thread Thomas Huth
Hi! I think unless you use "-vga none" or "-nodefaults", QEMU will always start 
your guest with a VGA card by default, so if you add an additional "--device 
virtio-gpu-pci", you'll end up with a guest that has two video cards, one VGA 
and one virtio-gpu.
Also there is a known bug in the SLOF version that has been shipped with QEMU 
2.8, which causes trouble with virtio-gpu:
http://git.qemu-project.org/?p=SLOF.git;a=commitdiff;h=38bf852e73ce6f0ac801dfe8ef1545c4cd0b5ddb
Please try again with the latest release candidate of QEMU 2.9, it should be 
fixed there.
(But please note that SLOF does not contain a driver for virtio-gpu, so you 
won't see any output from the firmware when starting your guest ... i.e. you'll 
just see some output once Linux has been started)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1674925

Title:
  Qemu PPC64 kvm no display if  --device virtio-gpu-pci is selected

Status in QEMU:
  New

Bug description:
  Hi,
  i did many tests on qemu 2.8 on my BE machines and i found an issue that i 
think was need to be reported

  Test Machines BE 970MP

  if i setup qemu with

  qemu-system-ppc64 -M 1024 --display sdl(or gtk),gl=on --device virtio-
  gpu-pci,virgl --enable-kvm and so and so

  result is doubled window one is vga other is virtio-gpu-pci without
  any start of the VM . pratically i dont have any output of openbios
  and on the virtual serial output

  the same issue i found is if i select:
  qemu-system-ppc64 -M 1024 --display gtk(or sdl) --device virtio-gpu-pci 
--enable-kvm and so and so

  
  i had been try to change all the -M types of all kind of pseries without any 
positive result.

  Ciao 
  Luigi

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1674925/+subscriptions



Re: [Qemu-devel] Minimum RAM size for PC machines?

2017-03-22 Thread David Hildenbrand
On 22.03.2017 11:03, Thomas Huth wrote:
> On 22.03.2017 10:08, Markus Armbruster wrote:
> [...]
>> Are we now ready to accept a simple & stupid patch that actually helps
>> users, say letting boards that care declare minimum and maximum RAM
>> size?  And make PC reject RAM size less than 1MiB, even though "someone"
>> might conceivably have firmware that works with less?
> 
> I'd say enforce a minimum RAM size on the normal "pc" and "q35" machine,
> but still allow smaller sizes on the "isapc" machine. So if "someone"
> comes around and claims to have a legacy firmware that wants less memory
> than 1MiB, just point them to the isapc machine.
> Just my 0.02 €.
> 
>  Thomas

Or maybe simply warn the user that things may go wrong instead of
enforcing it.

-- 

Thanks,

David



Re: [Qemu-devel] Minimum RAM size for PC machines?

2017-03-22 Thread Thomas Huth
On 22.03.2017 10:08, Markus Armbruster wrote:
[...]
> Are we now ready to accept a simple & stupid patch that actually helps
> users, say letting boards that care declare minimum and maximum RAM
> size?  And make PC reject RAM size less than 1MiB, even though "someone"
> might conceivably have firmware that works with less?

I'd say enforce a minimum RAM size on the normal "pc" and "q35" machine,
but still allow smaller sizes on the "isapc" machine. So if "someone"
comes around and claims to have a legacy firmware that wants less memory
than 1MiB, just point them to the isapc machine.
Just my 0.02 €.

 Thomas




[Qemu-devel] [PATCH v3] Allow setting NUMA distance for different NUMA nodes

2017-03-22 Thread He Chen
Currently, QEMU does not provide a command to set the vNUMA distance for a
guest, although we already have the `-numa` option to define vNUMA nodes.

vNUMA distance makes sense in certain scenarios.
Today, if we create a guest that has 4 vNUMA nodes and check the NUMA
info via `numactl -H`, we see:

node distance:
node    0    1    2    3
  0:   10   20   20   20
  1:   20   10   20   20
  2:   20   20   10   20
  3:   20   20   20   10

The guest kernel regards every local node as distance 10 and every remote node
as distance 20 when there is no SLIT table, since QEMU doesn't build one.
This looks a little strange once you have seen the distances on an actual
physical machine that contains 4 NUMA nodes. My machine shows:

node distance:
node    0    1    2    3
  0:   10   21   31   41
  1:   21   10   21   31
  2:   31   21   10   21
  3:   41   31   21   10

To set the vNUMA distance, the guest should see a complete SLIT table.
QEMU already provides the `-acpitable` option that allows users to add an
ACPI table to the guest, but it requires users to build the ACPI table
themselves first. Using `-acpitable` to add a SLIT table is neither
straightforward nor flexible: whenever the vNUMA configuration changes,
another SLIT table has to be generated manually. That is not friendly to
users or to upper-layer software like libvirt.

This patch adds SLIT table support to QEMU and provides an additional
`dist` option for the `-numa` command to let users set the vNUMA distance
on the QEMU command line.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set the distances among those nodes,
the QEMU command line would look like:

```
-object 
memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0 \
-numa node,nodeid=0,cpus=0,memdev=node0 \
-object 
memory-backend-ram,size=1G,prealloc=yes,host-nodes=1,policy=bind,id=node1 \
-numa node,nodeid=1,cpus=1,memdev=node1 \
-object 
memory-backend-ram,size=1G,prealloc=yes,host-nodes=2,policy=bind,id=node2 \
-numa node,nodeid=2,cpus=2,memdev=node2 \
-object 
memory-backend-ram,size=1G,prealloc=yes,host-nodes=3,policy=bind,id=node3 \
-numa node,nodeid=3,cpus=3,memdev=node3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=0,val=21 \
...
```
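
The elided dist options follow a simple pattern; as a sketch (assuming the
local distance is left at its default of 10, so src==dst pairs are skipped),
the full set can be generated from a distance matrix:

```c
#include <stdio.h>

/* Example distances matching the 4-node machine described above. */
static const int dist[4][4] = {
    { 10, 21, 31, 41 },
    { 21, 10, 21, 31 },
    { 31, 21, 10, 21 },
    { 41, 31, 21, 10 },
};

int main(void)
{
    for (int src = 0; src < 4; src++) {
        for (int dst = 0; dst < 4; dst++) {
            if (src == dst) {
                continue; /* local distance assumed to stay at 10 */
            }
            printf("-numa dist,src=%d,dst=%d,val=%d \\\n",
                   src, dst, dist[src][dst]);
        }
    }
    return 0;
}
```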

Signed-off-by: He Chen 

---
 hw/acpi/aml-build.c | 26 +
 hw/arm/virt-acpi-build.c|  2 ++
 hw/i386/acpi-build.c|  2 ++
 include/hw/acpi/aml-build.h |  1 +
 include/sysemu/numa.h   |  1 +
 include/sysemu/sysemu.h |  4 
 numa.c  | 47 +
 qapi-schema.json| 30 ++---
 qemu-options.hx | 12 +++-
 9 files changed, 121 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index c6f2032..410b30e 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -24,6 +24,7 @@
 #include "hw/acpi/aml-build.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
+#include "sysemu/numa.h"
 
 static GArray *build_alloc_array(void)
 {
@@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 numamem->base_addr = cpu_to_le64(base);
 numamem->range_length = cpu_to_le64(len);
 }
+
+/*
+ * ACPI spec 5.2.17 System Locality Distance Information Table
+ * (Revision 2.0 or later)
+ */
+void build_slit(GArray *table_data, BIOSLinker *linker)
+{
+int slit_start, i, j;
+slit_start = table_data->len;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+build_append_int_noprefix(table_data, nb_numa_nodes, 8);
+for (i = 0; i < nb_numa_nodes; i++) {
+for (j = 0; j < nb_numa_nodes; j++) {
+build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
+}
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + slit_start),
+ "SLIT",
+ table_data->len - slit_start, 1, NULL, NULL);
+}
+
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0835e59..d9e6828 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -781,6 +781,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 if (nb_numa_nodes > 0) {
 acpi_add_table(table_offsets, tables_blob);
 build_srat(tables_blob, tables->linker, vms);
+acpi_add_table(table_offsets, tables_blob);
+build_slit(tables_blob, tables->linker);
 }
 
 if (its_class_name() && !vmc->no_its) {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2073108..12730ea 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2678,6 +2678,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 if (pcms->numa_nodes) {
 acpi_add_table(table_offsets, tables_blob);
 build_srat(tables_blob, tables->linker, 

[Qemu-devel] [Bug 1674925] [NEW] Qemu PPC64 kvm no display if --device virtio-gpu-pci is selected

2017-03-22 Thread luigiburdo
Public bug reported:

Hi,
I did many tests on QEMU 2.8 on my BE machines and found an issue that I
think needs to be reported.

Test machine: BE 970MP

If I set up QEMU with

qemu-system-ppc64 -M 1024 --display sdl(or gtk),gl=on --device virtio-
gpu-pci,virgl --enable-kvm and so on

the result is a doubled window: one is VGA, the other is virtio-gpu-pci,
without any start of the VM. Practically I don't get any output from
OpenBIOS or on the virtual serial output.

I found the same issue if I select:
qemu-system-ppc64 -M 1024 --display gtk(or sdl) --device virtio-gpu-pci 
--enable-kvm and so on


I have tried changing the -M type to all kinds of pseries without any
positive result.

Ciao 
Luigi

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1674925

Title:
  Qemu PPC64 kvm no display if  --device virtio-gpu-pci is selected

Status in QEMU:
  New

Bug description:
  Hi,
  i did many tests on qemu 2.8 on my BE machines and i found an issue that i 
think was need to be reported

  Test Machines BE 970MP

  if i setup qemu with

  qemu-system-ppc64 -M 1024 --display sdl(or gtk),gl=on --device virtio-
  gpu-pci,virgl --enable-kvm and so and so

  result is doubled window one is vga other is virtio-gpu-pci without
  any start of the VM . pratically i dont have any output of openbios
  and on the virtual serial output

  the same issue i found is if i select:
  qemu-system-ppc64 -M 1024 --display gtk(or sdl) --device virtio-gpu-pci 
--enable-kvm and so and so

  
  i had been try to change all the -M types of all kind of pseries without any 
positive result.

  Ciao 
  Luigi

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1674925/+subscriptions



[Qemu-devel] [PATCH v4] xen: use libxendevice model to restrict operations

2017-03-22 Thread Paul Durrant
This patch adds a command-line option (-xen-domid-restrict) which will
use the new libxendevicemodel API to restrict devicemodel [1] operations
to the specified domid. (Such operations are not applicable to the xenpv
machine type).

This patch also adds a tracepoint to allow successful enabling of the
restriction to be monitored.

[1] I.e. operations issued by libxendevicemodel. Operation issued by other
xen libraries (e.g. libxenforeignmemory) are currently still unrestricted
but this will be rectified by subsequent patches.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paolo Bonzini 

NOTE: This is already re-based on Juergen Gross's patch "xen: use 5 digit
  xen versions" and so should not be applied until after that patch
  has been applied.

v4:
 - Added missing quote

v3:
 - Updated usage comment

v2:
 - Log errno in tracepoint
---
 hw/xen/trace-events |  1 +
 include/hw/xen/xen.h|  1 +
 include/hw/xen/xen_common.h | 20 
 qemu-options.hx |  7 +++
 vl.c|  8 
 xen-hvm.c   |  8 
 6 files changed, 45 insertions(+)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index c4fb6f1..5615dce 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -11,3 +11,4 @@ xen_map_portio_range(uint32_t id, uint64_t start_addr, 
uint64_t end_addr) "id: %
 xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) 
"id: %u start: %#"PRIx64" end: %#"PRIx64
 xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u 
bdf: %02x.%02x.%02x"
 xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u 
bdf: %02x.%02x.%02x"
+xen_domid_restrict(int err) "err: %u"
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 2b1733b..7efcdaa 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -21,6 +21,7 @@ enum xen_mode {
 
 extern uint32_t xen_domid;
 extern enum xen_mode xen_mode;
+extern bool xen_domid_restrict;
 
 extern bool xen_allowed;
 
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index df098c7..4f3bd35 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -152,6 +152,13 @@ static inline int xendevicemodel_set_mem_type(
 return xc_hvm_set_mem_type(dmod, domid, mem_type, first_pfn, nr);
 }
 
+static inline int xendevicemodel_restrict(
+xendevicemodel_handle *dmod, domid_t domid)
+{
+errno = ENOTTY;
+return -1;
+}
+
 #else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40900 */
 
 #include 
@@ -206,6 +213,19 @@ static inline int xen_modified_memory(domid_t domid, 
uint64_t first_pfn,
 return xendevicemodel_modified_memory(xen_dmod, domid, first_pfn, nr);
 }
 
+static inline int xen_restrict(domid_t domid)
+{
+int rc = xendevicemodel_restrict(xen_dmod, domid);
+
+trace_xen_domid_restrict(errno);
+
+if (errno == ENOTTY) {
+return 0;
+}
+
+return rc;
+}
+
 /* Xen 4.2 through 4.6 */
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40701
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 99af8ed..2043371 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3354,6 +3354,11 @@ DEF("xen-attach", 0, QEMU_OPTION_xen_attach,
 "-xen-attach attach to existing xen domain\n"
 "xend will use this when starting QEMU\n",
 QEMU_ARCH_ALL)
+DEF("xen-domid-restrict", 0, QEMU_OPTION_xen_domid_restrict,
+"-xen-domid-restrict restrict set of available xen operations\n"
+"to specified domain id. (Does not affect\n"
+"xenpv machine type).\n",
+QEMU_ARCH_ALL)
 STEXI
 @item -xen-domid @var{id}
 @findex -xen-domid
@@ -3366,6 +3371,8 @@ Warning: should not be used when xend is in use (XEN 
only).
 @findex -xen-attach
 Attach to existing xen domain.
 xend will use this when starting QEMU (XEN only).
+@findex -xen-domid-restrict
+Restrict set of available xen operations to specified domain id (XEN only).
 ETEXI
 
 DEF("no-reboot", 0, QEMU_OPTION_no_reboot, \
diff --git a/vl.c b/vl.c
index 0b4ed52..f46e070 100644
--- a/vl.c
+++ b/vl.c
@@ -205,6 +205,7 @@ static NotifierList machine_init_done_notifiers =
 bool xen_allowed;
 uint32_t xen_domid;
 enum xen_mode xen_mode = XEN_EMULATE;
+bool xen_domid_restrict;
 
 static int has_defaults = 1;
 static int default_serial = 1;
@@ -3933,6 +3934,13 @@ int main(int argc, char **argv, char **envp)
 }
 xen_mode = XEN_ATTACH;
 break;
+case QEMU_OPTION_xen_domid_restrict:
+if (!(xen_available())) {
+error_report("Option not supported for this target");
+exit(1);
+}
+xen_domid_restrict = true;
+break;
 case QEMU_OPTION_trace:
 

Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Paolo Bonzini


On 22/03/2017 09:48, Jiahuan Zhang wrote:
> 
> 
> On 22 March 2017 at 09:40, Paolo Bonzini  > wrote:
> 
> 
> > > I am using a windows named pipe to get the data from a window
> > > host program, which uses ReadFile () in char_win.c
> >
> > OK, bugs in the windows-specific char backend would be
> > unsurprising.
> >
> > I'm not entirely sure how the chardev layer works, but
> > at the pl011 end if we return 0 from our can_receive
> > function then the chardev layer should decide it has
> > nothing to do until the pl011 later calls
> > qemu_chr_fe_accept_input(), I think.
> >
> > I've cc'd Paolo and Marc-André Lureau as the chardev
> > maintainers.
> 
> Windows named pipes do not support the equivalent of "select",
> so it's possible that they cause a busy wait.  Try using a
> TCP socket instead and see if the bug goes away.
> 
> 
> Hi, I am trying to use a Windows socket for serial redirection instead
> of Windows named pipe.
> What do you mean "the equivalent of 'select'"?

A function that lets a process sleep until data is available on the
socket.  The solution is to rewrite Windows chardev handling in QEMU to
use threads or overlapped I/O.

Paolo
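
To make that concrete, a minimal Win32 sketch (assumed pipe name; not QEMU
code) of an overlapped read that sleeps on an event instead of busy-waiting:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    char buf[256];
    DWORD nread = 0;
    OVERLAPPED ov = { 0 };
    HANDLE h = CreateFileA("\\\\.\\pipe\\qemu-serial0", GENERIC_READ, 0, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);

    if (h == INVALID_HANDLE_VALUE) {
        return 1;
    }
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL); /* manual-reset event */

    if (!ReadFile(h, buf, sizeof(buf), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING) {
        return 1; /* real error */
    }
    /* The thread sleeps here until data is available; no busy wait. */
    WaitForSingleObject(ov.hEvent, INFINITE);
    GetOverlappedResult(h, &ov, &nread, FALSE);
    printf("read %lu bytes\n", (unsigned long)nread);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}
```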



Re: [Qemu-devel] [PATCH v0] fsdev: QMP interface for throttling

2017-03-22 Thread Pradeep Jagadeesh

On 3/21/2017 2:38 PM, Greg Kurz wrote:

On Mon, 20 Mar 2017 09:07:20 -0400
Pradeep Jagadeesh  wrote:


This patchset enables QMP interfaces for the 9pfs
devices (fsdev). It provides two interfaces: one
for querying the info of all the 9pfs devices, and
a second one to set the IO limits for the required 9pfs device.

Signed-off-by: Pradeep Jagadeesh 
---
 Makefile|   2 +-
 fsdev/qemu-fsdev-throttle.c | 103 +++
 fsdev/qemu-fsdev-throttle.h |  14 
 fsdev/qemu-fsdev.c  |   8 ++-
 fsdev/qemu-fsdev.h  |   3 +
 hmp-commands-info.hx|  14 
 hmp-commands.hx |  28 
 hmp.c   |  70 ++
 hmp.h   |   3 +
 hw/9pfs/9p.c|  39 ++
 qapi-schema.json|   3 +
 qapi/9pfs.json  | 169 
 12 files changed, 454 insertions(+), 2 deletions(-)
 create mode 100644 qapi/9pfs.json

diff --git a/Makefile b/Makefile
index 73e0c12..4f387a1 100644
--- a/Makefile
+++ b/Makefile
@@ -413,7 +413,7 @@ qapi-modules = $(SRC_PATH)/qapi-schema.json 
$(SRC_PATH)/qapi/common.json \
$(SRC_PATH)/qapi/block.json $(SRC_PATH)/qapi/block-core.json \
$(SRC_PATH)/qapi/event.json $(SRC_PATH)/qapi/introspect.json \
$(SRC_PATH)/qapi/crypto.json $(SRC_PATH)/qapi/rocker.json \
-   $(SRC_PATH)/qapi/trace.json
+   $(SRC_PATH)/qapi/trace.json $(SRC_PATH)/qapi/9pfs.json

 qapi-types.c qapi-types.h :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 7ae4e86..b18d98a 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -29,6 +29,109 @@ static void fsdev_throttle_write_timer_cb(void *opaque)
 qemu_co_enter_next(&fst->throttled_reqs[true]);
 }

+void fsdev_set_io_throttle(FS9PIOThrottle *arg, FsThrottle *fst, Error **errp)
+{
+ThrottleConfig cfg;
+
+throttle_config_init(&cfg);
+cfg.buckets[THROTTLE_BPS_TOTAL].avg = arg->bps;
+cfg.buckets[THROTTLE_BPS_READ].avg  = arg->bps_rd;
+cfg.buckets[THROTTLE_BPS_WRITE].avg = arg->bps_wr;
+
+cfg.buckets[THROTTLE_OPS_TOTAL].avg = arg->iops;
+cfg.buckets[THROTTLE_OPS_READ].avg  = arg->iops_rd;
+cfg.buckets[THROTTLE_OPS_WRITE].avg = arg->iops_wr;
+
+if (arg->has_bps_max) {
+cfg.buckets[THROTTLE_BPS_TOTAL].max = arg->bps_max;
+}
+if (arg->has_bps_rd_max) {
+cfg.buckets[THROTTLE_BPS_READ].max = arg->bps_rd_max;
+}
+if (arg->has_bps_wr_max) {
+cfg.buckets[THROTTLE_BPS_WRITE].max = arg->bps_wr_max;
+}
+if (arg->has_iops_max) {
+cfg.buckets[THROTTLE_OPS_TOTAL].max = arg->iops_max;
+}
+if (arg->has_iops_rd_max) {
+cfg.buckets[THROTTLE_OPS_READ].max = arg->iops_rd_max;
+}
+if (arg->has_iops_wr_max) {
+cfg.buckets[THROTTLE_OPS_WRITE].max = arg->iops_wr_max;
+}
+
+if (arg->has_bps_max_length) {
+cfg.buckets[THROTTLE_BPS_TOTAL].burst_length = arg->bps_max_length;
+}
+if (arg->has_bps_rd_max_length) {
+cfg.buckets[THROTTLE_BPS_READ].burst_length = arg->bps_rd_max_length;
+}
+if (arg->has_bps_wr_max_length) {
+cfg.buckets[THROTTLE_BPS_WRITE].burst_length = arg->bps_wr_max_length;
+}
+if (arg->has_iops_max_length) {
+cfg.buckets[THROTTLE_OPS_TOTAL].burst_length = arg->iops_max_length;
+}
+if (arg->has_iops_rd_max_length) {
+cfg.buckets[THROTTLE_OPS_READ].burst_length = arg->iops_rd_max_length;
+}
+if (arg->has_iops_wr_max_length) {
+cfg.buckets[THROTTLE_OPS_WRITE].burst_length = arg->iops_wr_max_length;
+}
+
+if (arg->has_iops_size) {
+cfg.op_size = arg->iops_size;
+}
+
+if (!throttle_is_valid(&cfg, errp)) {
+goto out;
+}
+
+fst->cfg = cfg;
+fsdev_throttle_init(fst);
+
+out:
+return;


It looks like this could be:

if (throttle_is_valid(&cfg, errp)) {
fst->cfg = cfg;
fsdev_throttle_init(fst);
}


+


extra empty line

Done



+}
+
+void fsdev_get_io_throttle(FsThrottle *fst, FS9PIOThrottle **fs9pcfg,
+   char *fsdevice, Error **errp)
+{
+
+ThrottleConfig cfg = fst->cfg;
+FS9PIOThrottle *fscfg = g_malloc0(sizeof(*fscfg));
+
+fscfg->has_device = true;
+fscfg->device = g_strdup(fsdevice);
+fscfg->bps = cfg.buckets[THROTTLE_BPS_TOTAL].avg;
+fscfg->bps_rd = cfg.buckets[THROTTLE_BPS_READ].avg;
+fscfg->bps_wr = cfg.buckets[THROTTLE_BPS_WRITE].avg;
+
+fscfg->iops = cfg.buckets[THROTTLE_OPS_TOTAL].avg;
+fscfg->iops_rd = cfg.buckets[THROTTLE_OPS_READ].avg;
+fscfg->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].avg;
+
+fscfg->bps_max = cfg.buckets[THROTTLE_BPS_TOTAL].max;
+fscfg->bps_rd_max = cfg.buckets[THROTTLE_BPS_READ].max;
+fscfg->bps_wr_max = 

Re: [Qemu-devel] [PATCH v3] xen: use libxendevice model to restrict operations

2017-03-22 Thread Paul Durrant
> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 21 March 2017 18:59
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Stefano
> Stabellini ; Anthony Perard
> ; Paolo Bonzini 
> Subject: Re: [PATCH v3] xen: use libxendevice model to restrict operations
> 
> On Tue, 21 Mar 2017, Paul Durrant wrote:
> > This patch adds a command-line option (-xen-domid-restrict) which will
> > use the new libxendevicemodel API to restrict devicemodel [1] operations
> > to the specified domid. (Such operations are not applicable to the xenpv
> > machine type).
> >
> > This patch also adds a tracepoint to allow successful enabling of the
> > restriction to be monitored.
> >
> > [1] I.e. operations issued by libxendevicemodel. Operation issued by other
> > xen libraries (e.g. libxenforeignmemory) are currently still 
> > unrestricted
> > but this will be rectified by subsequent patches.
> >
> > Signed-off-by: Paul Durrant 
> 
> In file included from qemu-options-wrapper.h:32:0,
>  from qemu-options.h:33,
>  from os-posix.c:36:
> qemu-options.def:698:1: error: missing terminating " character [-Werror]
> cc1: all warnings being treated as errors
> make: *** [os-posix.o] Error 1
> make: *** Waiting for unfinished jobs
> 
> You are missing a \"

So I am. I'll send v4.

  Paul

> 
> 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Paolo Bonzini 
> >
> > NOTE: This is already re-based on Juergen Gross's patch "xen: use 5 digit
> >   xen versions" and so should not be applied until after that patch
> >   has been applied.
> >
> > v2:
> >  - Log errno in tracepoint
> > ---
> >  hw/xen/trace-events |  1 +
> >  include/hw/xen/xen.h|  1 +
> >  include/hw/xen/xen_common.h | 20 
> >  qemu-options.hx |  7 +++
> >  vl.c|  8 
> >  xen-hvm.c   |  8 
> >  6 files changed, 45 insertions(+)
> >
> > diff --git a/hw/xen/trace-events b/hw/xen/trace-events
> > index c4fb6f1..5615dce 100644
> > --- a/hw/xen/trace-events
> > +++ b/hw/xen/trace-events
> > @@ -11,3 +11,4 @@ xen_map_portio_range(uint32_t id, uint64_t
> start_addr, uint64_t end_addr) "id: %
> >  xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t
> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >  xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id:
> %u bdf: %02x.%02x.%02x"
> >  xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id:
> %u bdf: %02x.%02x.%02x"
> > +xen_domid_restrict(int err) "err: %u"
> > diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> > index 2b1733b..7efcdaa 100644
> > --- a/include/hw/xen/xen.h
> > +++ b/include/hw/xen/xen.h
> > @@ -21,6 +21,7 @@ enum xen_mode {
> >
> >  extern uint32_t xen_domid;
> >  extern enum xen_mode xen_mode;
> > +extern bool xen_domid_restrict;
> >
> >  extern bool xen_allowed;
> >
> > diff --git a/include/hw/xen/xen_common.h
> b/include/hw/xen/xen_common.h
> > index df098c7..4f3bd35 100644
> > --- a/include/hw/xen/xen_common.h
> > +++ b/include/hw/xen/xen_common.h
> > @@ -152,6 +152,13 @@ static inline int xendevicemodel_set_mem_type(
> >  return xc_hvm_set_mem_type(dmod, domid, mem_type, first_pfn, nr);
> >  }
> >
> > +static inline int xendevicemodel_restrict(
> > +xendevicemodel_handle *dmod, domid_t domid)
> > +{
> > +errno = ENOTTY;
> > +return -1;
> > +}
> > +
> >  #else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40900 */
> >
> >  #include 
> > @@ -206,6 +213,19 @@ static inline int xen_modified_memory(domid_t
> domid, uint64_t first_pfn,
> >  return xendevicemodel_modified_memory(xen_dmod, domid,
> first_pfn, nr);
> >  }
> >
> > +static inline int xen_restrict(domid_t domid)
> > +{
> > +int rc = xendevicemodel_restrict(xen_dmod, domid);
> > +
> > +trace_xen_domid_restrict(errno);
> > +
> > +if (errno == ENOTTY) {
> > +return 0;
> > +}
> > +
> > +return rc;
> > +}
> > +
> >  /* Xen 4.2 through 4.6 */
> >  #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40701
> >
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 99af8ed..d380f7d 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3354,6 +3354,11 @@ DEF("xen-attach", 0,
> QEMU_OPTION_xen_attach,
> >  "-xen-attach attach to existing xen domain\n"
> >  "xend will use this when starting QEMU\n",
> >  QEMU_ARCH_ALL)
> > +DEF("xen-domid-restrict", 0, QEMU_OPTION_xen_domid_restrict,
> > +"-xen-domid-restrict restrict set of available xen operations\n"
> > +"to specified domain id. (Does not affect\n
> > +"xenpv machine 

Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread

2017-03-22 Thread Fam Zheng
On Tue, 03/21 06:05, Ed Swierk wrote:
> On Tue, Mar 21, 2017 at 5:50 AM, Fam Zheng  wrote:
> > On Tue, 03/21 05:20, Ed Swierk wrote:
> >> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng  wrote:
> >> > On Fri, 03/17 09:55, Ed Swierk wrote:
> >> >> I'm running into the same problem taking an external snapshot with a
> >> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> >> >> Run a Linux guest on qemu master
> >> >>
> >> >>   qemu-system-x86_64 -nographic -enable-kvm -monitor
> >> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> >> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> >> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
> >> >>
> >> >> Then in the monitor
> >> >>
> >> >>   snapshot_blkdev drive0 /x/snap1.qcow2
> >> >>
> >> >> qemu bombs with
> >> >>
> >> >>   qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> >> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
> >> >>
> >> >> whereas without the iothread the assertion failure does not occur.
> >> >
> >> >
> >> > Can you test this one?
> >> >
> >> > ---
> >> >
> >> >
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index c5b2c2c..4c217d5 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -1772,6 +1772,8 @@ static void 
> >> > external_snapshot_prepare(BlkActionState *common,
> >> >  return;
> >> >  }
> >> >
> >> > +bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > +
> >> >  /* This removes our old bs and adds the new bs. This is an 
> >> > operation that
> >> >   * can fail, so we need to do it in .prepare; undoing it for abort 
> >> > is
> >> >   * always possible. */
> >> > @@ -1789,8 +1791,6 @@ static void 
> >> > external_snapshot_commit(BlkActionState *common)
> >> >  ExternalSnapshotState *state =
> >> >   DO_UPCAST(ExternalSnapshotState, common, 
> >> > common);
> >> >
> >> > -bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > -
> >> >  /* We don't need (or want) to use the transactional
> >> >   * bdrv_reopen_multiple() across all the entries at once, because we
> >> >   * don't want to abort all of them if one of them fails the reopen 
> >> > */
> >>
> >> With this change, a different assertion fails on running snapshot_blkdev:
> >>
> >>   qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
> >> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
> >> failed.
> 
> Actually running snapshot_blkdev command in the text monitor doesn't
> trigger this assertion (I mixed up my notes). Instead it's triggered
> by the following sequence in qmp-shell:
> 
> (QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
> snapshot-file=/x/snap1.qcow2
> {"return": {}}
> (QEMU) block-commit device=drive0
> {"return": {}}
> (QEMU) block-job-complete device=drive0
> {"return": {}}
> 
> > Is there a backtrace?
> 
> #0  0x73757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x73758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x73750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x73750312 in __assert_fail () from 
> /lib/x86_64-linux-gnu/libc.so.6
> #4  0x55b4b0bb in bdrv_drain_recurse
> (bs=bs@entry=0x57bd6010)  at /x/qemu/block/io.c:164
> #5  0x55b4b7ad in bdrv_drained_begin (bs=0x57bd6010)  at
> /x/qemu/block/io.c:231
> #6  0x55b4b802 in bdrv_parent_drained_begin
> (bs=0x568c1a00)  at /x/qemu/block/io.c:53
> #7  bdrv_drained_begin (bs=bs@entry=0x568c1a00)  at /x/qemu/block/io.c:228
> #8  0x55b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
> at /x/qemu/block/io.c:190
> #9  0x55bb431e in aio_bh_call (bh=0x5750e5f0)  at
> /x/qemu/util/async.c:90
> #10 aio_bh_poll (ctx=ctx@entry=0x56718090)  at /x/qemu/util/async.c:118
> #11 0x55bb72eb in aio_poll (ctx=0x56718090,
> blocking=blocking@entry=true)  at /x/qemu/util/aio-posix.c:682
> #12 0x559443ce in iothread_run (opaque=0x56717b80)  at
> /x/qemu/iothread.c:59
> #13 0x73ad50a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #14 0x7380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Hmm, looks like a separate bug to me. In addition please apply this (the
assertion here is correct I think, but all callers are not audited yet):

diff --git a/block.c b/block.c
index 6e906ec..447d908 100644
--- a/block.c
+++ b/block.c
@@ -1737,6 +1737,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 {
 BlockDriverState *old_bs = child->bs;
 
+if (old_bs && new_bs) {
+assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
+}
 if (old_bs) {
 if (old_bs->quiesce_counter && child->role->drained_end) {
 child->role->drained_end(child);
diff --git a/block/mirror.c b/block/mirror.c
index ca4baa5..a23ca9e 100644

Re: [Qemu-devel] [PATCH v2] target/s390x: Fix broken user mode

2017-03-22 Thread Christian Borntraeger
On 01/30/2017 02:15 PM, Stefan Weil wrote:
> Returning NULL from get_max_cpu_model results in a SIGSEGV runtime error.
> 
> Signed-off-by: Stefan Weil 
> ---
> 
> v2: Re-sent as v1 was damaged by my mailer.
> 
> This is also broken in Debian.
> 
> In addition, there is no default CPU ("any"), so binfmt and related
> actions currently don't work. I hacked my local installation by
> duplicating the "qemu" cpu definition for "any", but maybe there is
> a better solution.
> 
> Regards
> Stefan
> 
>  target/s390x/cpu_models.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 2a894ee..6e34763 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -660,7 +660,6 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
> 
>  static S390CPUModel *get_max_cpu_model(Error **errp)
>  {
> -#ifndef CONFIG_USER_ONLY
>  static S390CPUModel max_model;
>  static bool cached;
> 
> @@ -680,7 +679,6 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
>  cached = true;
>  return &max_model;
>  }
> -#endif
>  return NULL;
>  }
> 

applied to our tree. Do we need cc stable?




Re: [Qemu-devel] [PATCH v2] target/s390x: Fix broken user mode

2017-03-22 Thread David Hildenbrand
On 22.03.2017 10:07, Christian Borntraeger wrote:
> On 01/30/2017 02:15 PM, Stefan Weil wrote:
>> Returning NULL from get_max_cpu_model results in a SIGSEGV runtime error.
>>
>> Signed-off-by: Stefan Weil 
>> ---
>>
>> v2: Re-sent as v1 was damaged by my mailer.
>>
>> This is also broken in Debian.
>>
>> In addition, there is no default CPU ("any"), so binfmt and related
>> actions currently don't work. I hacked my local installation by
>> duplicating the "qemu" cpu definition for "any", but maybe there is
>> a better solution.
> 
> applied to our tree. Do we need cc stable as well?

Yes, I think so.

Thanks!

-- 

Thanks,

David



[Qemu-devel] Minimum RAM size for PC machines?

2017-03-22 Thread Markus Armbruster
Last time I checked[1], SeaBIOS required 1MiB of RAM, and the failure
modes were mean.

Back then, I asked whether we should enforce a suitable minimum RAM
size[2].  Peter Maydell replied that modelling RAM constraints involves
an expedition into the Generality Swamps, and wished me better luck than
he had.

Four and a half years later, the failure modes are as mean as ever.  For
instance,

$ qemu-system-x86_64 --nodefaults -device VGA -m 640k

simply hangs for me, and

$ qemu-system-x86_64 --nodefaults -device VGA -m 16k

crashes with "qemu: fatal: Trying to execute code outside RAM or ROM at
0x4000" and a register dump with TCG, or the even less
helpful "KVM internal error. Suberror: 1" with KVM.

Waiting for "someone" to design and implement the completely general
solution has had the predictable result: nothing.

Are we now ready to accept a simple & stupid patch that actually helps
users, say letting boards that care declare minimum and maximum RAM
size?  And make PC reject RAM size less than 1MiB, even though "someone"
might conceivably have firmware that works with less?



[1] Message-ID: <87fw7xwqkq@blackfin.pond.sub.org>
https://www.seabios.org/pipermail/seabios/2012-August/004343.html
[2] Message-ID: <87wr1921rd@blackfin.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01319.html
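
For the record, the simple & stupid check could be as small as this sketch
(real QEMU field and helper names, but the 1 MiB limit and the place to hook
it are assumptions, not an agreed design):

```c
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "hw/boards.h"

#define PC_MIN_RAM_SIZE (1024 * 1024ULL)   /* what SeaBIOS is known to need */

/* Called early from the PC machine init path (sketch only). */
static void pc_check_min_ram(MachineState *machine)
{
    if (machine->ram_size < PC_MIN_RAM_SIZE) {
        error_report("RAM size %llu is below the 1 MiB minimum required "
                     "by SeaBIOS", (unsigned long long)machine->ram_size);
        exit(1);
    }
}
```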



Re: [Qemu-devel] [PATCH v2] target/s390x: Fix broken user mode

2017-03-22 Thread Christian Borntraeger
On 01/30/2017 02:15 PM, Stefan Weil wrote:
> Returning NULL from get_max_cpu_model results in a SIGSEGV runtime error.
> 
> Signed-off-by: Stefan Weil 
> ---
> 
> v2: Re-sent as v1 was damaged by my mailer.
> 
> This is also broken in Debian.
> 
> In addition, there is no default CPU ("any"), so binfmt and related
> actions currently don't work. I hacked my local installation by
> duplicating the "qemu" cpu definition for "any", but maybe there is
> a better solution.

applied to our tree. Do we need cc stable as well?


> 
> Regards
> Stefan
> 
>  target/s390x/cpu_models.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 2a894ee..6e34763 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -660,7 +660,6 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
> 
>  static S390CPUModel *get_max_cpu_model(Error **errp)
>  {
> -#ifndef CONFIG_USER_ONLY
>  static S390CPUModel max_model;
>  static bool cached;
> 
> @@ -680,7 +679,6 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
>  cached = true;
>  return &max_model;
>  }
> -#endif
>  return NULL;
>  }
> 




Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang

2017-03-22 Thread Dr. David Alan Gilbert
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> > > Hi,
> > > 
> > > Thanks for reporting this, and i confirmed it in my test, and it is a bug.
> > > 
> > > Though we tried to call qemu_file_shutdown() to shutdown the related fd, 
> > > in
> > > case COLO thread/incoming thread is stuck in read/write() while do 
> > > failover,
> > > but it didn't take effect, because all the fd used by COLO (also 
> > > migration)
> > > has been wrapped by qio channel, and it will not call the shutdown API if
> > > we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
> > > QIO_CHANNEL_FEATURE_SHUTDOWN).
> > > 
> > > Cc: Dr. David Alan Gilbert 
> > > 
> > > I doubted migration cancel has the same problem, it may be stuck in 
> > > write()
> > > if we tried to cancel migration.
> > > 
> > > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
> > > Error **errp)
> > > {
> > >  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
> > >  migration_channel_connect(s, ioc, NULL);
> > >  ... ...
> > > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
> > > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
> > > and the
> > > migrate_fd_cancel()
> > > {
> > >   ... ...
> > >  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
> > >  qemu_file_shutdown(f);  --> This will not take effect. No ?
> > >  }
> > > }
> > 
> > (cc'd in Daniel Berrange).
> > I see that we call qio_channel_set_feature(ioc, 
> > QIO_CHANNEL_FEATURE_SHUTDOWN); at the
> > top of qio_channel_socket_new;  so I think that's safe isn't it?
> > 
> 
> Hmm, you are right, this problem is only exist for the migration incoming fd, 
> thanks.


Yes, and I don't think we normally do a cancel on the incoming side of a 
migration.

Dave

> > Dave
> > 
> > > Thanks,
> > > Hailiang
> > > 
> > > On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
> > > > Thank you。
> > > > 
> > > > I have test aready。
> > > > 
> > > > When the Primary Node panic,the Secondary Node qemu hang at the same 
> > > > place。
> > > > 
> > > > Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary 
> > > > Node qemu will not produce the problem,but Primary Node panic can。
> > > > 
> > > > I think due to the feature of channel does not support 
> > > > QIO_CHANNEL_FEATURE_SHUTDOWN.
> > > > 
> > > > 
> > > > when failover,channel_shutdown could not shut down the channel.
> > > > 
> > > > 
> > > > so the colo_process_incoming_thread will hang at recvmsg.
> > > > 
> > > > 
> > > > I test a patch:
> > > > 
> > > > 
> > > > diff --git a/migration/socket.c b/migration/socket.c
> > > > 
> > > > 
> > > > index 13966f1..d65a0ea 100644
> > > > 
> > > > 
> > > > --- a/migration/socket.c
> > > > 
> > > > 
> > > > +++ b/migration/socket.c
> > > > 
> > > > 
> > > > @@ -147,8 +147,9 @@ static gboolean 
> > > > socket_accept_incoming_migration(QIOChannel *ioc,
> > > > 
> > > > 
> > > >}
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > >trace_migration_socket_incoming_accepted()
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > >qio_channel_set_name(QIO_CHANNEL(sioc), 
> > > > "migration-socket-incoming")
> > > > 
> > > > 
> > > > +qio_channel_set_feature(QIO_CHANNEL(sioc), 
> > > > QIO_CHANNEL_FEATURE_SHUTDOWN)
> > > > 
> > > > 
> > > >migration_channel_process_incoming(migrate_get_current(),
> > > > 
> > > > 
> > > >   QIO_CHANNEL(sioc))
> > > > 
> > > > 
> > > >object_unref(OBJECT(sioc))
> > > > 
> > > > 
> > > > 
> > > > 
> > > > My test will not hang any more.
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Original Mail
> > > > 
> > > > 
> > > > 
> > > > From: <zhangchen.f...@cn.fujitsu.com>
> > > > To: Wang Guang 10165992 <zhang.zhanghaili...@huawei.com>
> > > > Cc: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
> > > > Date: 2017-03-21 15:58
> > > > Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Hi,Wang.
> > > > 
> > > > You can test this branch:
> > > > 
> > > > https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
> > > > 
> > > > and please follow wiki ensure your own configuration correctly.
> > > > 
> > > > http://wiki.qemu-project.org/Features/COLO
> > > > 
> > > > 
> > > > Thanks
> > > > 
> > > > Zhang Chen
> > > > 
> > > > 
> > > > On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote:
> > > > >
> > > > > hi.
> > > > >
> > > > > I test the git qemu master have the same problem.
> > > > >
> > > > > (gdb) bt
> > > > >
> > > > > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
> > > > > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
> > > > >
> > > > > #1  0x7f658e4aa0c2 

Re: [Qemu-devel] [RFC][PATCH 0/6] "bootonceindex" property

2017-03-22 Thread Huttunen, Janne (Nokia - FI/Espoo)
On Wed, 2017-03-22 at 04:43 -0400, Paolo Bonzini wrote:
> 
> Understood---my question is how you would set up the alternate
> boot order: is it something like "keep a button pressed while
> turning on", or something written in NVRAM, or something else
> that is completely different?

In my case the real hardware has a management processor
on the board and the temporary boot source (and also the
permanent one for that matter) for the main processor can
be set from there. Since neither the BIOS nor the management
firmware are open source, I don't know how it technically
works, but I assume there either is some shared memory
between the main BIOS and the management processor or
alternatively the BIOS talks with the management processor
with some protocol during boot to get the order.



[Qemu-devel] [PATCH v2] qemu-ga: add guest-get-osinfo command

2017-03-22 Thread Vinzenz 'evilissimo' Feenstra
From: Vinzenz Feenstra 

Add a new 'guest-get-osinfo' command for reporting basic information of
the guest operating system (hereafter just 'OS'). This information
includes the type of the OS, the version, and the architecture.
Additionally reported would be a name, distribution type and kernel
version where applicable.

Here is an example for a Fedora 25 VM:

$ virsh -c qemu:system qemu-agent-command F25 \
'{ "execute": "guest-get-osinfo" }'
  {"return":{"arch":"x86_64","codename":"Server Edition","version":"25",
   "kernel":"4.8.6-300.fc25.x86_64","type":"linux","distribution":"Fedora"}}

And an example for a Windows 2012 R2 VM:

$ virsh -c qemu:system qemu-agent-command Win2k12R2 \
'{ "execute": "guest-get-osinfo" }'
  {"return":{"arch":"x86_64","codename":"Win 2012 R2",
   "version":"6.3","kernel":"","type":"windows","distribution":""}}

Signed-off-by: Vinzenz Feenstra 
---
 qga/commands-posix.c | 189 +++
 qga/commands-win32.c | 104 
 qga/qapi-schema.json |  40 +++
 3 files changed, 333 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 73d93eb..381c01a 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include 
+#include 
 #include 
 #include 
 #include "qga/guest-agent-core.h"
@@ -2356,6 +2357,188 @@ GuestMemoryBlockInfo 
*qmp_guest_get_memory_block_info(Error **errp)
 return info;
 }
 
+static void ga_strip_end(char *value)
+{
+size_t value_length = strlen(value);
+while (value_length > 0) {
+switch (value[value_length - 1]) {
+default:
+value_length = 0;
+break;
+case ' ': case '\n': case '\t': case '\'': case '"':
+value[value_length - 1] = 0;
+--value_length;
+break;
+}
+}
+}
+
+static void ga_parse_version_id(char const *value, GuestOSInfo *info)
+{
+if (strlen(value) < 128) {
+char codename[128];
+char version[128];
+
+if (*value == '"') {
+++value;
+}
+
+if (sscanf(value, "%[^(] (%[^)])", version, codename) == 2) {
+/* eg. VERSION="16.04.1 LTS (Xenial Xerus)" */
+info->codename = g_strdup(codename);
+info->version = g_strdup(version);
+} else if (sscanf(value, "%[^,] %[^\"]\"", version, codename) == 2) {
+/* eg. VERSION="12.04.5 LTS, Precise Pangolin" */
+info->codename = g_strdup(codename);
+info->version = g_strdup(version);
+} else {
+/* Just use the rest */
+info->version = g_strdup(value);
+}
+}
+}
+
+static void ga_parse_debian_version(FILE *fp, GuestOSInfo *info)
+{
+char *line = NULL;
+size_t n = 0;
+
+if (getline(&line, &n, fp) != -1) {
+ga_strip_end(line);
+info->version = g_strdup(line);
+info->distribution = g_strdup("Debian GNU/Linux");
+}
+free(line);
+}
+
+static void ga_parse_redhat_release(FILE *fp, GuestOSInfo *info)
+{
+char *line = NULL;
+size_t n = 0;
+
+if (getline(&line, &n, fp) != -1) {
+char *value = strstr(line, " release ");
+if (value != NULL) {
+*value = 0;
+info->distribution = g_strdup(line);
+value += 9;
+ga_strip_end(value);
+ga_parse_version_id(value, info);
+}
+}
+free(line);
+}
+
+static void ga_parse_os_release(FILE *fp, GuestOSInfo *info)
+{
+char *line = NULL;
+size_t n = 0;
+
+while (getline(&line, &n, fp) != -1) {
+char *value = strstr(line, "=");
+if (value != NULL) {
+*value = 0;
+++value;
+ga_strip_end(value);
+
+size_t len = strlen(line);
+if (len == 10 && strcmp(line, "VERSION_ID") == 0) {
+info->version = g_strdup(value);
+} else if (len == 7 && strcmp(line, "VERSION") == 0) {
+ga_parse_version_id(value, info);
+} else if (len == 4 && strcmp(line, "NAME") == 0) {
+info->distribution = g_strdup(value);
+}
+}
+}
+free(line);
+}
+
+static char *ga_stripped_strdup(char const *value)
+{
+char *result = NULL;
+while (value && *value == '"') {
+++value;
+}
+result = g_strdup(value);
+ga_strip_end(result);
+return result;
+}
+
+static void ga_parse_lsb_release(FILE *fp, GuestOSInfo *info)
+{
+char *line = NULL;
+size_t n = 0;
+
+while (getline(&line, &n, fp) != -1) {
+char *value = strstr(line, "=");
+if (value != NULL) {
+*value = 0;
+++value;
+ga_strip_end(value);
+
+size_t len = strlen(line);
+if (len == 15 && strcmp(line, "DISTRIB_RELEASE") == 0) {
+info->version = ga_stripped_strdup(value);
+} else 

[Qemu-devel] (no subject)

2017-03-22 Thread Vinzenz 'evilissimo' Feenstra
In this version:

- Changed the use of strdup to g_strdup and the use of sprintf with a local
  buffer to use g_strdup_printf instead.
- Made the majority of fields in the GuestOSInfo optional to allow 0 values
- Used the right target version in the schema (2.10 vs 2.8 before)
- Refactored the code that decides which release/version file to use so that
  it uses a configuration struct and a while loop to iterate over the options.

I was looking into the usage of uname, as suggested by Eric; however, after
looking into this I realized that there is no additional information to be
gained from it. Therefore I decided that this is still a feasible approach.
In most cases the code will break out of the loop after accessing the second
file. For older systems there are some supported fallbacks available, but
/etc/os-release and /usr/lib/os-release are already quite established.




Re: [Qemu-devel] [PATCH] add xen-9p-backend to MAINTAINERS under Xen

2017-03-22 Thread Greg Kurz
On Tue, 21 Mar 2017 14:01:26 -0700
Stefano Stabellini  wrote:

> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Stefano Stabellini 
> CC: gr...@kaod.org
> CC: anthony.per...@citrix.com
> ---

Reviewed-by: Greg Kurz 

> This patch is meant to be on top of
> http://marc.info/?l=xen-devel=149003409510264=2
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index be79f68..bd5218e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -327,6 +327,7 @@ L: xen-de...@lists.xenproject.org
>  S: Supported
>  F: xen-*
>  F: */xen*
> +F: hw/9pfs/xen-9p-backend.c
>  F: hw/char/xen_console.c
>  F: hw/display/xenfb.c
>  F: hw/net/xen_nic.c



pgpLXz4wEygKv.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Jiahuan Zhang
On 22 March 2017 at 09:40, Paolo Bonzini  wrote:

>
> > > I am using a windows named pipe to get the data from a window
> > > host program, which uses ReadFile () in char_win.c
> >
> > OK, bugs in the windows-specific char backend would be
> > unsurprising.
> >
> > I'm not entirely sure how the chardev layer works, but
> > at the pl011 end if we return 0 from our can_receive
> > function then the chardev layer should decide it has
> > nothing to do until the pl011 later calls
> > qemu_chr_fe_accept_input(), I think.
> >
> > I've cc'd Paolo and Marc-André Lureau as the chardev
> > maintainers.
>
> Windows named pipes do not support the equivalent of "select",
> so it's possible that they cause a busy wait.  Try using a
> TCP socket instead and see if the bug goes away.
>

Hi, I am trying to use a Windows socket for serial redirection instead of
Windows named pipe.
What do you mean "the equivalent of 'select'"?


>
> Paolo
>


Re: [Qemu-devel] [PATCH v4 0/8] xen/9pfs: introduce the Xen 9pfs backend

2017-03-22 Thread Greg Kurz
On Tue, 21 Mar 2017 13:14:02 -0700 (PDT)
Stefano Stabellini  wrote:

> On Tue, 21 Mar 2017, Greg Kurz wrote:
> > On Mon, 20 Mar 2017 11:18:46 -0700 (PDT)
> > Stefano Stabellini  wrote:
> >   
> > > Hi all,
> > > 
> > > This patch series implements a new transport for 9pfs, aimed at Xen
> > > systems.
> > > 
> > > The transport is based on a traditional Xen frontend and backend drivers
> > > pair. This patch series implements the backend, which typically runs in
> > > Dom0. I sent another series to implement the frontend in Linux
> > > (http://marc.info/?l=linux-kernel=148883047125960=2).
> > > 
> > > The backend complies to the Xen transport for 9pfs specification
> > > version 1, available here:
> > > 
> > > https://xenbits.xen.org/docs/unstable/misc/9pfs.html
> > > 
> > > 
> > > Changes in v4:
> > > - add reviewed-bys
> > > - remove useless if(NULL) checks around g_free
> > > - g_free g_malloc'ed sgs
> > > - remove XEN_9PFS_RING_ORDER, make the ring order dynamic per ring,
> > >   reading the ring_order field in xen_9pfs_data_intf
> > > - remove patch not to build Xen backends on non-Xen capable targets
> > >   because it is already upstream
> > >   
> > 
> > Hi Stefano,
> > 
> > This looks good to me. Do you want these patches to go through my 9p
> > tree or through your xen tree ?  
> 
> Thanks Greg! It can work both ways. If you have any changes in your queue
> that could conflict with this, it's best to go via your tree.
> 
> Otherwise, I'll merge it in mine, so that I can keep an eye on the
> correspondent Xen changes to the header files and make sure they are in
> sync (specifically http://marc.info/?l=qemu-devel=149003412910278).
> 

I don't have any conflicting patches on my side. Please merge this in your
tree (as well as the MAINTAINERS patch).

Cheers,

--
Greg

> 
> >  Also, I guess you may want to add
> > F: hw/9pfs/xen-9p-backend.c to the Xen section in MAINTAINERS.  
> 
> I'll send a patch to be applied on top of the series
> 
> 
> > --
> > Greg
> >   
> > > Changes in v3:
> > > - do not build backends for targets that do not support xen
> > > - remove xen_9pfs.h, merge its content into xen-9p-backend.c
> > > - remove xen_9pfs_header, introduce P9MsgHeader
> > > - use le32_to_cpu to access P9MsgHeader fields
> > > - many coding style fixes
> > > - run checkpatch on all patches
> > > - add check if num_rings < 1
> > > - use g_strdup_printf
> > > - free fsdev_id in xen_9pfs_free
> > > - add comments
> > > 
> > > Changes in v2:
> > > - fix coding style
> > > - compile xen-9p-backend.c if CONFIG_XEN_BACKEND
> > > - add patch to set CONFIG_XEN_BACKEND only for the right targets
> > > - add review-bys
> > > 
> > > 
> > > Stefano Stabellini (8):
> > >   xen: import ring.h from xen
> > >   9p: introduce a type for the 9p header
> > >   xen/9pfs: introduce Xen 9pfs backend
> > >   xen/9pfs: connect to the frontend
> > >   xen/9pfs: receive requests from the frontend
> > >   xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal
> > >   xen/9pfs: send responses back to the frontend
> > >   xen/9pfs: build and register Xen 9pfs backend
> > > 
> > >  hw/9pfs/9p.h |   6 +
> > >  hw/9pfs/Makefile.objs|   1 +
> > >  hw/9pfs/virtio-9p-device.c   |   6 +-
> > >  hw/9pfs/xen-9p-backend.c | 444 
> > > +
> > >  hw/block/xen_blkif.h |   2 +-
> > >  hw/usb/xen-usb.c |   2 +-
> > >  hw/xen/xen_backend.c |   3 +
> > >  include/hw/xen/io/ring.h | 455 
> > > +++
> > >  include/hw/xen/xen_backend.h |   3 +
> > >  9 files changed, 915 insertions(+), 7 deletions(-)
> > >  create mode 100644 hw/9pfs/xen-9p-backend.c
> > >  create mode 100644 include/hw/xen/io/ring.h  
> > 
> >   



pgp2ATlF2SRjh.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC][PATCH 0/6] "bootonceindex" property

2017-03-22 Thread Paolo Bonzini


- Original Message -
> From: "Janne Huttunen" 
> To: "Paolo Bonzini" , "Gerd Hoffmann" 
> Cc: qemu-devel@nongnu.org
> Sent: Wednesday, March 22, 2017 7:36:54 AM
> Subject: Re: [Qemu-devel] [RFC][PATCH 0/6] "bootonceindex" property
> 
> On Tue, 2017-03-21 at 18:55 +0100, Paolo Bonzini wrote:
> > 
> > > Since real HW has this capability, there exist certain
> > > auxiliary systems that are built on it. Having similar
> > > semantics available in QEMU allows me to build a virtual
> > > machine that works with these systems without modifying
> > > them in any way.
> >
> > How does real hardware do it?  I suppose you'd do it with a firmware
> > setup menu or something like that; would it be enough to add a way to
> > modify bootindex during runtime?
> 
> On the real hardware the "boot once" really means *once*
> i.e. it only affects the next reboot regardless of how
> the next boot is triggered (reset button, power button,
> software, etc.). After the next boot the normal boot
> order is automatically restored.

Understood---my question is how you would set up the alternate
boot order: is it something like "keep a button pressed while
turning on", or something written in NVRAM, or something else
that is completely different?

Paolo

> Theoretically it should be possible to get a close
> approximation of this by changing the main boot order,
> waiting for the boot to happen and then restoring the
> original order back. This would require having some
> process that constantly monitors what QEMU is doing so
> that it can notice when the boot happens and then
> restore the order. I'm trying to avoid having such
> a process if possible, which in this case means that
> QEMU would need to restore the order by itself.
> 
> 
> 



Re: [Qemu-devel] Guest application reading from pl011 without device driver

2017-03-22 Thread Paolo Bonzini

> > I am using a windows named pipe to get the data from a window
> > host program, which uses ReadFile () in char_win.c
> 
> OK, bugs in the windows-specific char backend would be
> unsurprising.
> 
> I'm not entirely sure how the chardev layer works, but
> at the pl011 end if we return 0 from our can_receive
> function then the chardev layer should decide it has
> nothing to do until the pl011 later calls
> qemu_chr_fe_accept_input(), I think.
> 
> I've cc'd Paolo and Marc-André Lureau as the chardev
> maintainers.

Windows named pipes do not support the equivalent of "select",
so it's possible that they cause a busy wait.  Try using a
TCP socket instead and see if the bug goes away.

Paolo



Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang

2017-03-22 Thread Hailiang Zhang

On 2017/3/22 16:09, wang.guan...@zte.com.cn wrote:

Hi:

Yes, it is better.

And should we delete



Yes, you are right.





#ifdef WIN32

 QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);

#endif




in qio_channel_socket_accept?

qio_channel_socket_new() already has it.












Original Mail



From: <zhang.zhanghaili...@huawei.com>
To: Wang Guang 10165992
Cc: <xuqu...@huawei.com> <dgilb...@redhat.com> <zhangchen.f...@cn.fujitsu.com> 
<qemu-devel@nongnu.org>
Date: 2017-03-22 15:03
Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang





Hi,

On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote:
> diff --git a/migration/socket.c b/migration/socket.c
>
>
> index 13966f1..d65a0ea 100644
>
>
> --- a/migration/socket.c
>
>
> +++ b/migration/socket.c
>
>
> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel *ioc,
>
>
>   }
>
>
>
>
>
>   trace_migration_socket_incoming_accepted()
>
>
>
>
>
>   qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
>
>
> +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
>
>
>   migration_channel_process_incoming(migrate_get_current(),
>
>
>  QIO_CHANNEL(sioc))
>
>
>   object_unref(OBJECT(sioc))
>
>
>
>
> Is this patch ok?
>

Yes, i think this works, but a better way maybe to call 
qio_channel_set_feature()
in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the 
socket accept fd,
Or fix it by this:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index f546c68..ce6894c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
 Error **errp)
   {
   QIOChannelSocket *cioc
-
-cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET))
-cioc->fd = -1
+
+cioc = qio_channel_socket_new()
   cioc->remoteAddrLen = sizeof(ioc->remoteAddr)
   cioc->localAddrLen = sizeof(ioc->localAddr)


Thanks,
Hailiang

> I have test it . The test could not hang any more.
>
>
>
>
>
>
>
>
>
>
>
>
Original Mail
>
>
>
From: <zhang.zhanghaili...@huawei.com>
To: <dgilb...@redhat.com> <berra...@redhat.com>
Cc: <xuqu...@huawei.com> <qemu-devel@nongnu.org> 
<zhangchen.f...@cn.fujitsu..com> Wang Guang 10165992
Date: 2017-03-22 09:11
Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
>
>
>
>
>
> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >> Hi,
> >>
> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
> >>
> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
> >> case COLO thread/incoming thread is stuck in read/write() while do 
failover,
> >> but it didn't take effect, because all the fd used by COLO (also migration)
> >> has been wrapped by qio channel, and it will not call the shutdown API if
> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN).
> >>
> >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >>
> >> I doubted migration cancel has the same problem, it may be stuck in write()
> >> if we tried to cancel migration.
> >>
> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
Error **errp)
> >> {
> >>  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
> >>  migration_channel_connect(s, ioc, NULL)
> >>  ... ...
> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN) above,
> >> and the
> >> migrate_fd_cancel()
> >> {
> >>   ... ...
> >>  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
> >>  qemu_file_shutdown(f)  --> This will not take effect. No ?
> >>  }
> >> }
> >
> > (cc'd in Daniel Berrange).
> > I see that we call qio_channel_set_feature(ioc, 
QIO_CHANNEL_FEATURE_SHUTDOWN) at the
> > top of qio_channel_socket_new  so I think that's safe isn't it?
> >
>
> Hmm, you are right, this problem is only exist for the migration incoming fd, 
thanks.
>
> > Dave
> >
> >> Thanks,
> >> Hailiang
> >>
> >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
> >>> Thank you。
> >>>
> >>> I have test aready。
> >>>
> >>> When the Primary Node panic,the Secondary Node qemu hang at the same 
place。
> >>>
> >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node 
qemu will not produce the problem,but Primary Node panic can。
> >>>
> >>> I think due to the feature of channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
> >>>
> >>>
> >>> when failover,channel_shutdown could not shut down the channel.
> >>>
> >>>
> >>> so the colo_process_incoming_thread will hang at recvmsg.
> >>>
> >>>
> >>> I test a patch:
> >>>
> >>>
> >>> diff --git a/migration/socket.c b/migration/socket.c
> >>>
> >>>
> >>> index 13966f1..d65a0ea 100644
> >>>
> >>>
> >>> --- a/migration/socket.c
> >>>
> >>>
> >>> +++ b/migration/socket.c
> >>>
> >>>
> >>> @@ 

Re: [Qemu-devel] [PATCH] Fix Event Viewer errors caused by qemu-ga

2017-03-22 Thread Sameeh Jubran
On Tue, Mar 21, 2017 at 6:09 PM, Michael Roth 
wrote:

> Quoting Sameeh Jubran (2017-03-21 05:49:52)
> > When the command "guest-fsfreeze-freeze" is executed it causes
> > the VSS service to log the errors below in the Event Viewer.
> >
> > These errors are caused by two issues in the function "CommitSnapshots"
> in
> > provider.cpp:
> >
> > 1. When VSS_TIMEOUT_MSEC expires the function returns E_ABORT. This causes
> > the error #12293.
> >
> > 2. The VSS_TIMEOUT_MSEC value is too big. According to MSDN the
> > "Flush & Hold" operation has a 10-second timeout that is not configurable. The
> > "CommitSnapshots" call is part of the "Flush & Hold" process and thus any
> > timeout bigger than 10 seconds would cause error #12298 and anything
> > bigger than 40 seconds causes error #12340. All this info can be
> found here:
> > https://msdn.microsoft.com/en-us/library/windows/desktop/
> aa384589(v=vs.85).aspx
>
> Not sure how best to deal with this. Technically our CommitSnapshots
> interface is driven by the backup job being run by QGA/QEMU management
> side. If that amount of time exceeds the VSS limits then I think it's
> appropriate for VSS to log the error accordingly. VSS_TIMEOUT_MSEC here
> doesn't actually have too much correlation with the VSS-set timeout,
> IIRC it's specifically picked to exceed both the 10 and 40 second
> timeouts and acts more as a fail-safe timeout.

The timeout was added in commit b39297aedfabe9b2c426cd540413be991500da25.
There is no point in setting the timeout this long, as the actual freeze
- Flush and Hold Writes -
is limited to 10 seconds (not configurable) according to MSDN:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa384589%28v=vs.85%29.aspx

>
> Are the event logs causing issues? FWIW, on the posix side we also opt
> for gratuitous logging to syslog and such, the idea there being that
> cooperative guests would prefer transparency on how the agent is being
> used.
>
Apparently, these error logs are annoying to some (
https://bugzilla.redhat.com/show_bug.cgi?id=1387125);
moreover, I don't think that our implementation of the freeze operation -
which is a workaround in a way -
should log errors even though we know they are false alarms.

>
> That said, I do think error 12293 is unnecessary, since IIUC it would
> always be paired with the actual VSS-reported error. So avoiding the
> E_ABORT seems reasonable either way.
>
> >
> > |event id|   error
>  |
> > * 12293  : Volume Shadow Copy Service error: Error calling a routine on a
> >Shadow Copy Provider {----}.
> >Routine details CommitSnapshots [hr = 0x80004004, Operation
> >aborted.
> >
> > * 12340  : Volume Shadow Copy Error: VSS waited more than 40 seconds for
> >all volumes to be flushed.  This caused volume
> >\\?\Volume{62a171da-32ec-11e4-80b1-806e6f6e6963}\ to timeout
> >while waiting for the release-writes phase of shadow copy
> >creation. Trying again when disk activity is lower may solve
> >this problem.
> >
> > * 12298  : Volume Shadow Copy Service error: The I/O writes cannot be
> held
> >during the shadow copy creation period on volume
> >\\?\Volume{62a171d9-32ec-11e4-80b1-806e6f6e6963}\. The volume
> >index in the shadow copy set is 0. Error details:
> >Open[0x, The operation completed successfully. ],
> >Flush[0x, The operation completed successfully.],
> >Release[0x, The operation completed successfully.],
> >OnRun[0x80042314, The shadow copy provider timed out while
> >holding writes to the volume being shadow copied. This is
> >probably due to excessive activity on the volume by an
> >application or a system service. Try again later when activity
> >on the volume is reduced.
> >
> > Signed-off-by: Sameeh Jubran 
> > ---
> >  qga/vss-win32/provider.cpp | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/qga/vss-win32/provider.cpp b/qga/vss-win32/provider.cpp
> > index ef94669..d72f4d4 100644
> > --- a/qga/vss-win32/provider.cpp
> > +++ b/qga/vss-win32/provider.cpp
> > @@ -15,7 +15,7 @@
> >  #include 
> >  #include 
> >
> > -#define VSS_TIMEOUT_MSEC (60*1000)
> > +#define VSS_TIMEOUT_MSEC (9 * 1000)
> >
> >  static long g_nComObjsInUse;
> >  HINSTANCE g_hinstDll;
> > @@ -377,7 +377,6 @@ STDMETHODIMP CQGAVssProvider::CommitSnapshots(VSS_ID
> SnapshotSetId)
> >  if (WaitForSingleObject(hEventThaw, VSS_TIMEOUT_MSEC) !=
> WAIT_OBJECT_0) {
> >  /* Send event to qemu-ga to notify the provider is timed out */
> >  SetEvent(hEventTimeout);
> > -hr = E_ABORT;
> >  }
> >
> >  CloseHandle(hEventThaw);
> > --
> > 2.9.3
> >
>
>


-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix

[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang

2017-03-22 Thread wang.guang55
Hi,

Yes, it is better.

And should we delete 




#ifdef WIN32

QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);

#endif




in qio_channel_socket_accept?

qio_channel_socket_new() already has it.
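
If so, the cleanup might look roughly like this (a sketch only, assuming the
switch to qio_channel_socket_new() above; the exact position of the #ifdef
block in the current tree is taken from the snippet quoted here, not verified):

diff --git a/io/channel-socket.c b/io/channel-socket.c
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ ... @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     cioc = qio_channel_socket_new();
     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
     cioc->localAddrLen = sizeof(ioc->localAddr);
-#ifdef WIN32
-    QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
-#endif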












Original Mail



From: <zhang.zhanghaili...@huawei.com>
To: 王广10165992
Cc: <xuqu...@huawei.com> <dgilb...@redhat.com> <zhangchen.f...@cn.fujitsu.com> 
<qemu-devel@nongnu.org>
Date: 2017-03-22 15:03
Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang





Hi,

On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote:
> diff --git a/migration/socket.c b/migration/socket.c
>
>
> index 13966f1..d65a0ea 100644
>
>
> --- a/migration/socket.c
>
>
> +++ b/migration/socket.c
>
>
> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel *ioc,
>
>
>   }
>
>
>
>
>
>   trace_migration_socket_incoming_accepted()
>
>
>
>
>
>   qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
>
>
> +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
>
>
>   migration_channel_process_incoming(migrate_get_current(),
>
>
>  QIO_CHANNEL(sioc))
>
>
>   object_unref(OBJECT(sioc))
>
>
>
>
> Is this patch ok?
>

Yes, I think this works, but a better way may be to call
qio_channel_set_feature() in qio_channel_socket_accept(); we didn't set the
SHUTDOWN feature for the accepted socket fd.
Or fix it like this:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index f546c68..ce6894c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
Error **errp)
  {
  QIOChannelSocket *cioc;
-
-cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
-cioc->fd = -1;
+
+cioc = qio_channel_socket_new();
  cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
  cioc->localAddrLen = sizeof(ioc->localAddr);


Thanks,
Hailiang

> I have tested it. The test does not hang any more.
>
>
>
>
>
>
>
>
>
>
>
>
> Original Mail
>
>
>
> From: <zhang.zhanghaili...@huawei.com>
> To: <dgilb...@redhat.com> <berra...@redhat.com>
> Cc: <xuqu...@huawei.com> <qemu-devel@nongnu.org> 
<zhangchen.f...@cn.fujitsu.com> 王广10165992
> Date: 2017-03-22 09:11
> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
>
>
>
>
>
> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >> Hi,
> >>
> >> Thanks for reporting this; I confirmed it in my test, and it is a bug.
> >>
> >> Though we tried to call qemu_file_shutdown() to shut down the related fd in
> >> case the COLO thread/incoming thread is stuck in read/write() while doing 
failover,
> >> it didn't take effect, because all the fds used by COLO (and migration)
> >> have been wrapped by QIO channels, which will not call the shutdown API if
> >> we didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN).
> >>
> >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >>
> >> I suspect migration cancel has the same problem; it may be stuck in write()
> >> if we try to cancel migration.
> >>
> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
Error **errp)
> >> {
> >>  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
> >>  migration_channel_connect(s, ioc, NULL)
> >>  ... ...
> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN) above,
> >> and the
> >> migrate_fd_cancel()
> >> {
> >>   ... ...
> >>  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
> >>  qemu_file_shutdown(f)  --> This will not take effect. No ?
> >>  }
> >> }
> >
> > (cc'd in Daniel Berrange).
> > I see that we call qio_channel_set_feature(ioc, 
QIO_CHANNEL_FEATURE_SHUTDOWN) at the
> > top of qio_channel_socket_new  so I think that's safe isn't it?
> >
>
> Hmm, you are right, this problem only exists for the migration incoming fd, 
thanks.
>
> > Dave
> >
> >> Thanks,
> >> Hailiang
> >>
> >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
> >>> Thank you.
> >>>
> >>> I have tested it already.
> >>>
> >>> When the Primary Node panics, the Secondary Node qemu hangs at the same 
place.
> >>>
> >>> According to http://wiki.qemu-project.org/Features/COLO , killing the Primary Node 
qemu will not produce the problem, but a Primary Node panic can.
> >>>
> >>> I think this is because the channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
> >>>
> >>>
> >>> When doing failover, channel_shutdown() could not shut down the channel,
> >>>
> >>>
> >>> so the colo_process_incoming_thread will hang at recvmsg().
> >>>
> >>>
> >>> I test a patch:
> >>>
> >>>
> >>> diff --git a/migration/socket.c b/migration/socket.c
> >>>
> >>>
> >>> index 13966f1..d65a0ea 100644
> >>>
> >>>
> >>> --- a/migration/socket.c
> >>>
> >>>
> >>> +++ b/migration/socket.c
> >>>
> >>>
> >>> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel 

Re: [Qemu-devel] [PATCH for-2.9] block: Declare blockdev-add and blockdev-del supported

2017-03-22 Thread Markus Armbruster
Alexandre DERUMIER  writes:

> Pretty awesome news ! Congrat !

Thanks!

> So, can we update the wiki changelog ?

We need to get the patch merged first.

> http://wiki.qemu-project.org/ChangeLog/2.9
>
> "QMP command blockdev-add is still a work in progress. It doesn't support all 
> block drivers, it lacks a matching blockdev-del, and more. It might change 
> incompatibly."



[Qemu-devel] [PATCH] cirrus: fix PUTPIXEL macro

2017-03-22 Thread Gerd Hoffmann
Should be "c" not "col".  The macro is used with "col" as third parameter
everywhere, so this tyops doesn't break something.

Fixes: 026aeffcb4752054830ba203020ed6eb05bcaba8
Reported-by: Dr. David Alan Gilbert 
Signed-off-by: Gerd Hoffmann 
---
 hw/display/cirrus_vga_rop2.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/display/cirrus_vga_rop2.h b/hw/display/cirrus_vga_rop2.h
index b86bcd6..b208b73 100644
--- a/hw/display/cirrus_vga_rop2.h
+++ b/hw/display/cirrus_vga_rop2.h
@@ -29,8 +29,8 @@
 #elif DEPTH == 24
 #define PUTPIXEL(s, a, c)do {  \
 ROP_OP(s, a, c);   \
-ROP_OP(s, a + 1, (col >> 8));  \
-ROP_OP(s, a + 2, (col >> 16)); \
+ROP_OP(s, a + 1, (c >> 8));\
+ROP_OP(s, a + 2, (c >> 16));   \
 } while (0)
 #elif DEPTH == 32
 #define PUTPIXEL(s, a, c)ROP_OP_32(s, a, c)
-- 
1.8.3.1
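
For readers wondering why the typo compiled at all: a minimal, self-contained
illustration (not part of the patch; the names putpixel24 and rop_or are made
up for this sketch) of how an unhygienic macro that refers to a call-site
variable keeps working as long as every caller happens to use that exact name:

/* build with: gcc -o demo demo.c */
#include <stdint.h>
#include <stdio.h>

#define rop_or(dst, val)   ((dst) |= (val))

/* Buggy form: the parameter is "c", but part of the body refers to "col". */
#define putpixel24(dst, c)                                                \
    do {                                                                  \
        rop_or((dst)[0], (c));                                            \
        rop_or((dst)[1], (col >> 8));  /* silently uses caller's "col" */ \
        rop_or((dst)[2], (col >> 16));                                    \
    } while (0)

int main(void)
{
    uint8_t pix[3] = { 0, 0, 0 };
    uint32_t col = 0x123456;   /* every call site uses the name "col"... */

    putpixel24(pix, col);      /* ...so the expansion still finds it */
    printf("%02x %02x %02x\n", pix[2], pix[1], pix[0]);  /* prints 12 34 56 */
    return 0;
}

A call site that passed anything not literally named "col" would fail to
compile (unless some other "col" happened to be in scope), which is why the
fix above is harmless cleanup rather than a behaviour change.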




Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang

2017-03-22 Thread Hailiang Zhang

Hi,

On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote:

diff --git a/migration/socket.c b/migration/socket.c


index 13966f1..d65a0ea 100644


--- a/migration/socket.c


+++ b/migration/socket.c


@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
*ioc,


  }





  trace_migration_socket_incoming_accepted()





  qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")


+qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)


  migration_channel_process_incoming(migrate_get_current(),


 QIO_CHANNEL(sioc))


  object_unref(OBJECT(sioc))




Is this patch ok?



Yes, I think this works, but a better way may be to call
qio_channel_set_feature() in qio_channel_socket_accept(); we didn't set the
SHUTDOWN feature for the accepted socket fd.
Or fix it like this:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index f546c68..ce6894c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
   Error **errp)
 {
 QIOChannelSocket *cioc;
-
-cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
-cioc->fd = -1;
+
+cioc = qio_channel_socket_new();
 cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
 cioc->localAddrLen = sizeof(ioc->localAddr);
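
For reference, the first alternative above would look roughly like this
(a sketch, untested; the hunk context is abbreviated):

diff --git a/io/channel-socket.c b/io/channel-socket.c
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ ... @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     QIOChannelSocket *cioc;
 
     cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     cioc->fd = -1;
+    qio_channel_set_feature(QIO_CHANNEL(cioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
     cioc->localAddrLen = sizeof(ioc->localAddr);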


Thanks,
Hailiang


I have tested it. The test does not hang any more.












Original Mail



From: <zhang.zhanghaili...@huawei.com>
To: <dgilb...@redhat.com> <berra...@redhat.com>
Cc: <xuqu...@huawei.com> <qemu-devel@nongnu.org> 
<zhangchen.f...@cn.fujitsu.com> 王广10165992
Date: 2017-03-22 09:11
Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang





On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
>> Hi,
>>
>> Thanks for reporting this; I confirmed it in my test, and it is a bug.
>>
>> Though we tried to call qemu_file_shutdown() to shut down the related fd in
>> case the COLO thread/incoming thread is stuck in read/write() while doing failover,
>> it didn't take effect, because all the fds used by COLO (and migration)
>> have been wrapped by QIO channels, which will not call the shutdown API if
>> we didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN).
>>
>> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
>>
>> I suspect migration cancel has the same problem; it may be stuck in write()
>> if we try to cancel migration.
>>
>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
Error **errp)
>> {
>>  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
>>  migration_channel_connect(s, ioc, NULL)
>>  ... ...
>> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN) above,
>> and the
>> migrate_fd_cancel()
>> {
>>   ... ...
>>  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
>>  qemu_file_shutdown(f)  --> This will not take effect. No ?
>>  }
>> }
>
> (cc'd in Daniel Berrange).
> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
at the
> top of qio_channel_socket_new  so I think that's safe isn't it?
>

Hmm, you are right, this problem only exists for the migration incoming fd, 
thanks.

> Dave
>
>> Thanks,
>> Hailiang
>>
>> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
>>> Thank you.
>>>
>>> I have tested it already.
>>>
>>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
>>>
>>> According to http://wiki.qemu-project.org/Features/COLO , killing the Primary Node 
qemu will not produce the problem, but a Primary Node panic can.
>>>
>>> I think this is because the channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
>>>
>>>
>>> When doing failover, channel_shutdown() could not shut down the channel,
>>>
>>>
>>> so the colo_process_incoming_thread will hang at recvmsg().
>>>
>>>
>>> I test a patch:
>>>
>>>
>>> diff --git a/migration/socket.c b/migration/socket.c
>>>
>>>
>>> index 13966f1..d65a0ea 100644
>>>
>>>
>>> --- a/migration/socket.c
>>>
>>>
>>> +++ b/migration/socket.c
>>>
>>>
>>> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel *ioc,
>>>
>>>
>>>}
>>>
>>>
>>>
>>>
>>>
>>>trace_migration_socket_incoming_accepted()
>>>
>>>
>>>
>>>
>>>
>>>qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
>>>
>>>
>>> +qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN)
>>>
>>>
>>>migration_channel_process_incoming(migrate_get_current(),
>>>
>>>
>>>   QIO_CHANNEL(sioc))
>>>
>>>
>>>object_unref(OBJECT(sioc))
>>>
>>>
>>>
>>>
>>> My test will not hang any more.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Original Mail
>>>
>>>
>>>
>>> From: <zhangchen.f...@cn.fujitsu.com>
>>> To: 王广10165992 <zhang.zhanghaili...@huawei.com>
>>> Cc: 

Re: [Qemu-devel] [RFC][PATCH 0/6] "bootonceindex" property

2017-03-22 Thread Janne Huttunen
On Tue, 2017-03-21 at 18:55 +0100, Paolo Bonzini wrote:
> 
> > Since real HW has this capability, there exist certain
> > auxiliary systems that are built on it. Having similar
> > semantics available in QEMU allows me to build a virtual
> > machine that works with these systems without modifying
> > them in any way.
>
> How does real hardware do it?  I suppose you'd do it with a firmware
> setup menu or something like that; would it be enough to add a way to
> modify bootindex during runtime?

On the real hardware the "boot once" really means *once*
i.e. it only affects the next reboot regardless of how
the next boot is triggered (reset button, power button,
software, etc.). After the next boot the normal boot
order is automatically restored.

Theoretically it should be possible to get a close
approximation of this by changing the main boot order,
waiting for the boot to happen and then restoring the
original order back. This would require having some
process that constantly monitors what QEMU is doing so
that it can notice when the boot happens and then
restore the order. I'm trying to avoid having such
a process if possible, which in this case means that
QEMU would need to restore the order by itself.





Re: [Qemu-devel] [PATCH v2 3/3] qapi: Fix QemuOpts visitor regression on unvisited input

2017-03-22 Thread Markus Armbruster
Eric Blake  writes:

> An off-by-one in commit 15c2f669e meant that we were failing to
> check for unparsed input in all QemuOpts visitors.  Recent testsuite
> additions show that fixing the obvious bug with bogus fields will
> also fix the case of an incomplete list visit; update the tests to
> match the new behavior.
>
> Simple testcase:
>
> ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio -numa 
> node,size=1g
>
> failed to diagnose that 'size' is not a valid argument to -numa, and
> now once again reports:
>
> qemu-system-x86_64: -numa node,size=1g: Invalid parameter 'size'
>
> CC: qemu-sta...@nongnu.org
> Signed-off-by: Eric Blake 
> Reviewed-by: Michael Roth 
> Tested-by: Laurent Vivier 
>
> ---
> v2: trivial rebase to comment tweak in patch 1
> ---
>  qapi/opts-visitor.c   |  6 +++---
>  tests/test-opts-visitor.c | 15 +--
>  2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
> index 026d25b..b54da81 100644
> --- a/qapi/opts-visitor.c
> +++ b/qapi/opts-visitor.c
> @@ -164,7 +164,7 @@ opts_check_struct(Visitor *v, Error **errp)
>  GHashTableIter iter;
>  GQueue *any;
>
> -if (ov->depth > 0) {
> +if (ov->depth > 1) {
>  return;
>  }
>
> @@ -276,8 +276,8 @@ static void
>  opts_check_list(Visitor *v, Error **errp)
>  {
>  /*
> - * FIXME should set error when unvisited elements remain.  Mostly
> - * harmless, as the generated visits always visit all elements.
> + * Unvisited list elements will be reported later when checking if
> + * unvisited struct members remain.

Non-native speaker question: if or whether?

>   */
>  }
>
> diff --git a/tests/test-opts-visitor.c b/tests/test-opts-visitor.c
> index 8e0dda5..1766919 100644
> --- a/tests/test-opts-visitor.c
> +++ b/tests/test-opts-visitor.c
> @@ -175,6 +175,7 @@ expect_u64_max(OptsVisitorFixture *f, gconstpointer 
> test_data)
>  static void
>  test_opts_range_unvisited(void)
>  {
> +Error *err = NULL;
>  intList *list = NULL;
>  intList *tail;
>  QemuOpts *opts;
> @@ -199,10 +200,11 @@ test_opts_range_unvisited(void)
>  g_assert_cmpint(tail->value, ==, 1);
>  tail = (intList *)visit_next_list(v, (GenericList *)tail, sizeof(*list));
>  g_assert(tail);
> -visit_check_list(v, _abort); /* BUG: unvisited tail not reported */
> +visit_check_list(v, _abort); /* unvisited tail ignored until... */
>  visit_end_list(v, (void **));
>
> -visit_check_struct(v, _abort);
> +visit_check_struct(v, ); /* ...here */
> +error_free_or_abort();
>  visit_end_struct(v, NULL);
>
>  qapi_free_intList(list);

How come unvisited tails are diagnosed late?

> @@ -239,7 +241,7 @@ test_opts_range_beyond(void)
>  error_free_or_abort();
>  visit_end_list(v, (void **));
>
> -visit_check_struct(v, _abort);
> +visit_check_struct(v, );

This looks wrong.  Either you expect an error or not.  If you do,
error_free_or_abort() seems missing.  If you don't, the hunk needs to be
dropped.

>  visit_end_struct(v, NULL);
>
>  qapi_free_intList(list);
> @@ -250,6 +252,7 @@ test_opts_range_beyond(void)
>  static void
>  test_opts_dict_unvisited(void)
>  {
> +Error *err = NULL;
>  QemuOpts *opts;
>  Visitor *v;
>  UserDefOptions *userdef;
> @@ -258,11 +261,11 @@ test_opts_dict_unvisited(void)
> _abort);
>
>  v = opts_visitor_new(opts);
> -/* BUG: bogus should be diagnosed */
> -visit_type_UserDefOptions(v, NULL, , _abort);
> +visit_type_UserDefOptions(v, NULL, , );
> +error_free_or_abort();
>  visit_free(v);
>  qemu_opts_del(opts);
> -qapi_free_UserDefOptions(userdef);
> +g_assert(!userdef);
>  }
>
>  int



Re: [Qemu-devel] [PATCH v2 2/3] qom: Avoid unvisited 'id'/'qom-type' in user_creatable_add_opts

2017-03-22 Thread Markus Armbruster
Eric Blake  writes:

> A regression in commit 15c2f669e caused us to silently ignore
> excess input to the QemuOpts visitor.  Later, commit ea4641
> accidentally abused that situation, by removing "qom-type" and
> "id" from the corresponding QDict but leaving them defined in
> the QemuOpts, when using the pair of containers to create a
> user-defined object. Note that since we are already traversing
> two separate items (a QDict and a QemuOpts), we are already
> able to flag bogus arguments, as in:
>
> $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio 
> -object memory-backend-ram,id=mem1,size=4k,bogus=huh
> qemu-system-x86_64: -object memory-backend-ram,id=mem1,size=4k,bogus=huh: 
> Property '.bogus' not found
>
> So the only real concern is that when we re-enable strict checking
> in the QemuOpts visitor, we do not want to start flagging the two
> leftover keys as unvisited.  Rearrange the code to clean out the
> QemuOpts listing in advance, rather than removing items from the
> QDict.  Since "qom-type" is usually an automatic implicit default,
> we don't have to restore it; but "id" has to be put back (requiring
> us to cast away a const).

This is yet another example of how actual configuration can easily
diverge from the one in QemuOpts.  Discussed recently in:

Subject: Re: [PATCH 0/2] add writeconfig command on monitor
Message-ID: <87k28qlca9@dusky.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03476.html

Not putting "qom-type" back here is okay.  Putting it back would also be
okay.  I guess what you prefer depends on your level of OCD.

> CC: qemu-sta...@nongnu.org
> Signed-off-by: Eric Blake 
>
> ---
> v2: new patch
> ---
>  qom/object_interfaces.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
> index 03a95c3..cc9a694 100644
> --- a/qom/object_interfaces.c
> +++ b/qom/object_interfaces.c
> @@ -114,7 +114,7 @@ Object *user_creatable_add_opts(QemuOpts *opts, Error 
> **errp)
>  QDict *pdict;
>  Object *obj;
>  const char *id = qemu_opts_id(opts);
> -const char *type = qemu_opt_get(opts, "qom-type");
> +char *type = qemu_opt_get_del(opts, "qom-type");
>
>  if (!type) {
>  error_setg(errp, QERR_MISSING_PARAMETER, "qom-type");
> @@ -125,14 +125,15 @@ Object *user_creatable_add_opts(QemuOpts *opts, Error 
> **errp)
>  return NULL;
>  }
>
> +qemu_opts_set_id(opts, NULL);
>  pdict = qemu_opts_to_qdict(opts, NULL);
> -qdict_del(pdict, "qom-type");
> -qdict_del(pdict, "id");
>
>  v = opts_visitor_new(opts);
>  obj = user_creatable_add_type(type, id, pdict, v, errp);
>  visit_free(v);
>
> +qemu_opts_set_id(opts, (char *) id);
> +g_free(type);
>  QDECREF(pdict);
>  return obj;
>  }

Aside: I dislike how this converts QemuOpts to QDict so
user_creatable_add_type() can use the QDict to guide the visit.
Awkward, as is so much code that uses QemuOpts in not entirely
straightforward ways.



Re: [Qemu-devel] [PATCH] virtio: fix vring_align() on 64-bit win32 platforms

2017-03-22 Thread Stefan Weil

On 22.03.2017 at 00:06, Andrew Baumann wrote:

From: Eric Blake [mailto:ebl...@redhat.com]
Sent: Tuesday, 21 March 2017 15:52

On 03/21/2017 05:31 PM, Andrew Baumann wrote:

"long" is 32-bits on win32, but we need to promote it to a 64-bit hwaddr
before negating, or else the top half of the address is truncated

Signed-off-by: Andrew Baumann 
---
 include/hw/virtio/virtio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 15efcf2..a0a8543 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -34,7 +34,7 @@ struct VirtQueue;
 static inline hwaddr vring_align(hwaddr addr,
  unsigned long align)
 {
-return (addr + align - 1) & ~(align - 1);
+return (addr + align - 1) & ~(hwaddr)(align - 1);


Why not just use the QEMU_ALIGN_DOWN macro, instead of open-coding it?


Well, this code is aligning up, but yes the ALIGN_UP macro looks like it should 
also avoid the type promotion problem. This patch is just the 
minimally-invasive change after discovering the bug.

Let me know if you want me to spin another patch with the macro.

Andrew


Yes, please use QEMU_ALIGN_UP in an updated patch.
This is a bug fix needed for v2.9.0.

Fixing all other code locations which round up or down
with Coccinelle is a separate task, nothing which is
needed for the next QEMU version.

Thanks,
Stefan
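
For reference, the macro-based variant suggested above might look like this
(a sketch, not the final patch; it assumes QEMU_ALIGN_UP from
include/qemu/osdep.h, which rounds up via division, so nothing is narrowed
to 32 bits on LLP64 hosts where "unsigned long" is only 32 bits wide):

static inline hwaddr vring_align(hwaddr addr,
                                 unsigned long align)
{
    return QEMU_ALIGN_UP(addr, align);
}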


