date:20130620

[Qemu-devel] [PATCH 2/2] monitor: support sub commands in auto completion

2013-06-20 Thread Wenchao Xia

This patch makes auto completion work correctly for sub commands:
"info block [DEVICE]" can now be completed, by re-entering the completion
function. "info" was previously handled as a special case; it is now
treated as a sub command group, and the global variable info_cmds is not
used any more.

Signed-off-by: Wenchao Xia 
---
 monitor.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/monitor.c b/monitor.c
index bc60171..c706644 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4180,6 +4180,11 @@ static void monitor_find_completion(Monitor *mon,
 goto cleanup;
 }
 
+if (cmd->sub_table) {
+return monitor_find_completion(mon, cmd->sub_table,
+   cmdline + strlen(cmd->name));
+}
+
 ptype = next_arg_type(cmd->args_type);
 for(i = 0; i < nb_args - 2; i++) {
 if (*ptype != '\0') {
@@ -4207,12 +4212,7 @@ static void monitor_find_completion(Monitor *mon,
 break;
 case 's':
 /* XXX: more generic ? */
-if (!strcmp(cmd->name, "info")) {
-readline_set_completion_index(mon->rs, strlen(str));
-for(cmd = info_cmds; cmd->name != NULL; cmd++) {
-cmd_completion(mon, str, cmd->name);
-}
-} else if (!strcmp(cmd->name, "sendkey")) {
+if (!strcmp(cmd->name, "sendkey")) {
 char *sep = strrchr(str, '-');
 if (sep)
 str = sep + 1;
-- 
1.7.1

[Qemu-devel] [PATCH V2 8/9] NUMA: add hmp command set-mpol

2013-06-20 Thread Wanlong Gao

Add the HMP command set-mpol to set the host memory policy for a guest
NUMA node. A node's memory policy can then also be set from the monitor
with a command like:
(qemu) set-mpol 0 mem-policy=membind,mem-hostnode=0-1

Signed-off-by: Wanlong Gao 
---
 hmp-commands.hx | 16 
 hmp.c   | 35 +++
 hmp.h   |  1 +
 3 files changed, 52 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 915b0d1..417b69f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1567,6 +1567,22 @@ Executes a qemu-io command on the given block device.
 ETEXI
 
 {
+.name   = "set-mpol",
+.args_type  = "nodeid:i,args:s?",
+.params = "nodeid [args]",
+.help   = "set host memory policy for a guest NUMA node",
+.mhandler.cmd = hmp_set_mpol,
+},
+
+STEXI
+@item set-mpol @var{nodeid} @var{args}
+@findex set-mpol
+
+Set host memory policy for a guest NUMA node
+
+ETEXI
+
+{
 .name   = "info",
 .args_type  = "item:s?",
 .params = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index 494a9aa..81bddb1 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1464,3 +1464,38 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
 hmp_handle_error(mon, &err);
 }
+
+void hmp_set_mpol(Monitor *mon, const QDict *qdict)
+{
+Error *local_err = NULL;
+bool has_mpol = true;
+bool has_hostnode = true;
+const char *mpol = NULL;
+const char *hostnode = NULL;
+QemuOpts *opts;
+
+uint64_t nodeid = qdict_get_int(qdict, "nodeid");
+const char *args = qdict_get_try_str(qdict, "args");
+
+if (args == NULL) {
+has_mpol = false;
+has_hostnode = false;
+} else {
+opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
+if (opts == NULL) {
+error_setg(&local_err, "Parsing memory policy args failed");
+} else {
+mpol = qemu_opt_get(opts, "mem-policy");
+if (mpol == NULL) {
+has_mpol = false;
+}
+hostnode = qemu_opt_get(opts, "mem-hostnode");
+if (hostnode == NULL) {
+has_hostnode = false;
+}
+}
+}
+
+qmp_set_mpol(nodeid, has_mpol, mpol, has_hostnode, hostnode, &local_err);
+hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 56d2e92..81f631b 100644
--- a/hmp.h
+++ b/hmp.h
@@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
 void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
 void hmp_qemu_io(Monitor *mon, const QDict *qdict);
+void hmp_set_mpol(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH 0/2] support sub command group for auto completion in monitor

2013-06-20 Thread Wenchao Xia

This series modifies auto completion slightly so that it works when there
is a nested sub command group; for example, "info" is a sub command group
of the root command group.

Note that in patch 1 the parameters *mon and *cmd_table are only passed
along up to the monitor_init() level. If *cmd_table were also added to
monitor_init(), a monitor could be created with a different command table,
but no such requirement exists yet, so it is left unchanged to save trouble.
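
To illustrate the re-entry idea, here is a minimal, self-contained sketch
in C. It is not the monitor.c code: cmd_t, find_completions() and the demo
tables are hypothetical stand-ins, and argument completion (devices, file
names) is deliberately left out. When the first word on the line names a
command that has a sub table, the function calls itself on the rest of the
line with that table.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, simplified command descriptor (not QEMU's mon_cmd_t). */
typedef struct cmd {
    const char *name;
    const struct cmd *sub_table;   /* NULL when there are no sub commands */
} cmd_t;

/* Demo tables, made up for this sketch. */
static const cmd_t demo_info_cmds[] = {
    { "block", NULL }, { "blockstats", NULL }, { "numa", NULL },
    { NULL, NULL }
};
static const cmd_t demo_root_cmds[] = {
    { "info", demo_info_cmds }, { "quit", NULL }, { "sendkey", NULL },
    { NULL, NULL }
};

/*
 * Store in out[] the command names from `table` that complete the word
 * being typed at the end of `line`; return how many were stored.
 */
static int find_completions(const cmd_t *table, const char *line,
                            const char **out, int max_out)
{
    assert(table != NULL && line != NULL && out != NULL);

    while (isspace((unsigned char)*line)) {
        line++;
    }
    size_t len = strcspn(line, " \t");

    if (line[len] != '\0') {
        /* The first word is finished: look it up exactly. */
        for (const cmd_t *c = table; c->name != NULL; c++) {
            if (strlen(c->name) == len && !strncmp(c->name, line, len)) {
                if (c->sub_table != NULL) {
                    /* Re-enter with the sub table and the remaining text. */
                    return find_completions(c->sub_table, line + len,
                                            out, max_out);
                }
                return 0;   /* argument completion is not modelled here */
            }
        }
        return 0;           /* unknown command */
    }

    /* Still typing the first word: collect every name with that prefix. */
    int n = 0;
    for (const cmd_t *c = table; c->name != NULL && n < max_out; c++) {
        if (!strncmp(c->name, line, len)) {
            out[n++] = c->name;
        }
    }
    return n;
}
```

With these demo tables, completing "info bl" yields "block" and
"blockstats", while completing "in" yields only "info".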

Wenchao Xia (2):
  1) monitor: discard global variable in auto completion functions
  2) monitor: support sub commands in auto completion

 include/monitor/readline.h |8 -
 monitor.c  |   78 ++--
 readline.c |4 ++-
 3 files changed, 56 insertions(+), 34 deletions(-)

[Qemu-devel] [PATCH V2 9/9] NUMA: show host memory policy info in info numa command

2013-06-20 Thread Wanlong Gao

Show the host memory policy of each node in the "info numa" monitor command.
After this patch, "info numa" shows output like the following when host
NUMA support is enabled:

(qemu) info numa
2 nodes
node 0 cpus: 0
node 0 size: 1024 MB
node 0 mempolicy: membind=0,1
node 1 cpus: 1
node 1 size: 1024 MB
node 1 mempolicy: interleave=1

Signed-off-by: Wanlong Gao 
---
 monitor.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/monitor.c b/monitor.c
index 61dbebb..b6e93e5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -74,6 +74,11 @@
 #endif
 #include "hw/lm32/lm32_pic.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#endif
+
 //#define DEBUG
 //#define DEBUG_COMPLETION
 
@@ -1807,6 +1812,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
 int i;
 CPUArchState *env;
 CPUState *cpu;
+unsigned long first, next;
 
 monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
 for (i = 0; i < nb_numa_nodes; i++) {
@@ -1820,6 +1826,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "\n");
 monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
 numa_info[i].node_mem >> 20);
+
+#ifdef CONFIG_NUMA
+monitor_printf(mon, "node %d mempolicy: ", i);
+switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
+case NODE_HOST_BIND:
+monitor_printf(mon, "membind=");
+break;
+case NODE_HOST_INTERLEAVE:
+monitor_printf(mon, "interleave=");
+break;
+case NODE_HOST_PREFERRED:
+monitor_printf(mon, "preferred=");
+break;
+default:
+monitor_printf(mon, "default\n");
+continue;
+}
+
+if (numa_info[i].flags & NODE_HOST_RELATIVE)
+monitor_printf(mon, "+");
+
+next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+monitor_printf(mon, "%lu", first);
+do {
+if (next == numa_max_node())
+break;
+next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
+ next + 1);
+if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
+break;
+
+monitor_printf(mon, ",%lu", next);
+} while (true);
+
+monitor_printf(mon, "\n");
+#endif
 }
 }
 
-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH V2 3/9] NUMA: Add Linux libnuma detection

2013-06-20 Thread Wanlong Gao

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. Can be enabled or disabled on the command line,
default is use if available.

Signed-off-by: Andre Przywara 
Signed-off-by: Wanlong Gao 
---
 configure | 32 
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index ad32f87..2d2b177 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -944,6 +945,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1158,6 +1163,8 @@ echo "  --gcov=GCOV  use specified gcov [$gcov_tool]"
 echo "  --enable-tpm enable TPM support"
 echo "  --disable-libssh2disable ssh block device support"
 echo "  --enable-libssh2 enable ssh block device support"
+echo "  --disable-numa   disable libnuma support"
+echo "  --enable-numaenable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2389,6 +2396,27 @@ EOF
 fi
 
 ##
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+numa=yes
+libs_softmmu="-lnuma $libs_softmmu"
+  else
+if test "$numa" = "yes" ; then
+  feature_not_found "linux NUMA (install numactl?)"
+fi
+numa=no
+  fi
+fi
+
+##
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3556,6 +3584,7 @@ echo "TPM support   $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3589,6 +3618,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak
 
-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH V2 7/9] NUMA: add qmp command set-mpol to set memory policy for NUMA node

2013-06-20 Thread Wanlong Gao

This QMP command makes it possible to set a node's memory policy
through the QMP protocol. With qmp-shell, the command looks like:
set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1

Signed-off-by: Wanlong Gao 
---
 cpus.c   | 54 ++
 qapi-schema.json | 15 +++
 qmp-commands.hx  | 35 +++
 3 files changed, 104 insertions(+)

diff --git a/cpus.c b/cpus.c
index 677ee15..9c2706c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1432,3 +1432,57 @@ void qmp_inject_nmi(Error **errp)
 error_set(errp, QERR_UNSUPPORTED);
 #endif
 }
+
+void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
+  bool has_hostnode, const char *hostnode, Error **errp)
+{
+unsigned int flags;
+DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+
+if (nodeid >= nb_numa_nodes) {
+error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+return;
+}
+
+bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+flags = numa_info[nodeid].flags;
+
+numa_info[nodeid].flags = NODE_HOST_NONE;
+bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+if (!has_mpol) {
+if (set_node_mpol(nodeid) == -1) {
+error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+goto error;
+}
+return;
+}
+
+numa_node_parse_mpol(nodeid, mpol, errp);
+if (error_is_set(errp)) {
+goto error;
+}
+
+if (!has_hostnode) {
+bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+}
+
+if (hostnode) {
+numa_node_parse_hostnode(nodeid, hostnode, errp);
+if (error_is_set(errp)) {
+goto error;
+}
+}
+
+if (set_node_mpol(nodeid) == -1) {
+error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+goto error;
+}
+
+return;
+
+error:
+bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+numa_info[nodeid].flags = flags;
+return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index a80ee40..cedcbe1 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3608,3 +3608,18 @@
 '*cpuid-input-ecx': 'int',
 'cpuid-register': 'X86CPURegister32',
 'features': 'int' } }
+
+# @set-mpol:
+#
+# Set the host memory binding policy for guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @mem-policy: The memory policy string to set.
+#
+# @mem-hostnode: The host node or node range for memory policy.
+#
+# Since: 1.6.0
+##
+{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
+  '*mem-hostnode': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8cea5e5..7bb5038 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2997,3 +2997,38 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+{
+.name  = "set-mpol",
+.args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
+.help  = "Set the host memory binding policy for guest NUMA node",
+.mhandler.cmd_new = qmp_marshal_input_set_mpol,
+},
+
+SQMP
+set-mpol
+--
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+(json-int)
+- "mem-policy": The memory policy string to set.
+(json-string, optional)
+- "mem-hostnode": The host nodes contained to mpol.
+  (json-string, optional)
+
+Example:
+
+-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
+   "mem-hostnode": "0-1" }}
+<- { "return": {} }
+
+Notes:
+1. If "mem-policy" is not set, the memory policy of this "nodeid" will be set
+   to "default".
+2. If "mem-hostnode" is not set, the node mask of this "mpol" will be set
+   to "all".
+EQMP
-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH V2 4/9] NUMA: parse guest numa nodes memory policy

2013-06-20 Thread Wanlong Gao

The memory policy setting has the following format:
mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}
This setting is added as a suboption of "-numa", so the memory policy
can be set as follows:
 -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=!1

Signed-off-by: Andre Przywara 
Signed-off-by: Wanlong Gao 
---
 include/sysemu/sysemu.h |   8 
 vl.c| 110 
 2 files changed, 118 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 70fd2ed..993b8e0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock;
 
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
+#define NODE_HOST_NONE0x00
+#define NODE_HOST_BIND0x01
+#define NODE_HOST_INTERLEAVE  0x02
+#define NODE_HOST_PREFERRED   0x03
+#define NODE_HOST_POLICY_MASK 0x03
+#define NODE_HOST_RELATIVE0x04
 extern int nb_numa_nodes;
 struct node_info {
 uint64_t node_mem;
 DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
 
diff --git a/vl.c b/vl.c
index 357137b..4dbf5cc 100644
--- a/vl.c
+++ b/vl.c
@@ -536,6 +536,14 @@ static QemuOptsList qemu_numa_opts = {
 .name = "cpus",
 .type = QEMU_OPT_STRING,
 .help = "cpu number or range"
+},{
+.name = "mem-policy",
+.type = QEMU_OPT_STRING,
+.help = "memory policy"
+},{
+.name = "mem-hostnode",
+.type = QEMU_OPT_STRING,
+.help = "host node number or range for memory policy"
 },
 { /* end of list */ }
 },
@@ -1374,6 +1382,79 @@ error:
 exit(1);
 }
 
+static void numa_node_parse_mpol(int nodenr, const char *mpol)
+{
+if (!mpol) {
+return;
+}
+
+if (!strcmp(mpol, "interleave")) {
+numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE;
+} else if (!strcmp(mpol, "preferred")) {
+numa_info[nodenr].flags |= NODE_HOST_PREFERRED;
+} else if (!strcmp(mpol, "membind")) {
+numa_info[nodenr].flags |= NODE_HOST_BIND;
+} else {
+fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+}
+}
+
+static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+{
+unsigned long long value, endvalue;
+char *endptr;
+bool clear = false;
+unsigned long *bm = numa_info[nodenr].host_mem;
+
+if (hostnode[0] == '!') {
+clear = true;
+bitmap_fill(bm, MAX_CPUMASK_BITS);
+hostnode++;
+}
+if (hostnode[0] == '+') {
+numa_info[nodenr].flags |= NODE_HOST_RELATIVE;
+hostnode++;
+}
+
+if (!strcmp(hostnode, "all")) {
+bitmap_fill(bm, MAX_CPUMASK_BITS);
+return;
+}
+
+if (parse_uint(hostnode, &value, &endptr, 10) < 0)
+goto error;
+if (*endptr == '-') {
+if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+goto error;
+}
+} else if (*endptr == '\0') {
+endvalue = value;
+} else {
+goto error;
+}
+
+if (endvalue >= MAX_CPUMASK_BITS) {
+endvalue = MAX_CPUMASK_BITS - 1;
+fprintf(stderr,
+"qemu: NUMA: A max of %d host nodes are supported\n",
+ MAX_CPUMASK_BITS);
+}
+
+if (endvalue < value) {
+goto error;
+}
+
+if (clear)
+bitmap_clear(bm, value, endvalue - value + 1);
+else
+bitmap_set(bm, value, endvalue - value + 1);
+
+return;
+
+error:
+fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+return;
+}
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
@@ -1385,6 +1466,25 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
 return 0;
 }
 
+static int numa_add_mpol(const char *name, const char *value, void *opaque)
+{
+int *nodenr = opaque;
+
+if (!strcmp(name, "mem-policy")) {
+numa_node_parse_mpol(*nodenr, value);
+}
+return 0;
+}
+
+static int numa_add_hostnode(const char *name, const char *value, void *opaque)
+{
+int *nodenr = opaque;
+if (!strcmp(name, "mem-hostnode")) {
+numa_node_parse_hostnode(*nodenr, value);
+}
+return 0;
+}
+
 static int numa_init_func(QemuOpts *opts, void *opaque)
 {
 uint64_t nodenr, mem_size;
@@ -1404,6 +1504,14 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
 return -1;
 }
 
+if (qemu_opt_foreach(opts, numa_add_mpol, &nodenr, 1) < 0) {
+return -1;
+}
+
+if (qemu_opt_foreach(opts, numa_add_hostnode, &nodenr, 1) < 0) {
+return -1;
+}
+
 return 0;
 }
 
@@ -2930,6 +3038,8 @@ int main(int argc, char **argv, char **envp)
 for (i = 0; i < MAX

[Qemu-devel] [PATCH V2 5/9] NUMA: set guest numa nodes memory policy

2013-06-20 Thread Wanlong Gao

Set the guest NUMA nodes' memory policies node by node using the mbind(2)
system call.
After this patch, guest node memory policies can be set through the QEMU
options; this aims to solve the performance problem of guest memory
accesses that cross host nodes.
Also, if PCI passthrough is used, the direct-attached device uses DMA
transfers between the device and the qemu process, so all pages of the
guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
=>kvm_assign_device()
  => kvm_iommu_map_memslots()
=> kvm_iommu_map_pages()
   => kvm_pin_pages()

So, with a direct-attached device, the page count of every guest page is
incremented by one and page migration will not work. AutoNUMA will not work
either.

So, we should set the guest nodes' memory allocation policies before
the pages are actually mapped.

Signed-off-by: Andre Przywara 
Signed-off-by: Wanlong Gao 
---
 cpus.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/cpus.c b/cpus.c
index e123d3f..677ee15 100644
--- a/cpus.c
+++ b/cpus.c
@@ -60,6 +60,15 @@
 
 #endif /* CONFIG_LINUX */
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
+
 static CPUArchState *next_cpu;
 
 static bool cpu_thread_is_idle(CPUArchState *env)
@@ -1186,6 +1195,75 @@ static void tcg_exec_all(void)
 exit_request = 0;
 }
 
+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+int bind_mode;
+
+switch (numa_info[nodeid].flags & NODE_HOST_POLICY_MASK) {
+case NODE_HOST_BIND:
+bind_mode = MPOL_BIND;
+break;
+case NODE_HOST_INTERLEAVE:
+bind_mode = MPOL_INTERLEAVE;
+break;
+case NODE_HOST_PREFERRED:
+bind_mode = MPOL_PREFERRED;
+break;
+default:
+bind_mode = MPOL_DEFAULT;
+return bind_mode;
+}
+
+bind_mode |= (numa_info[nodeid].flags & NODE_HOST_RELATIVE) ?
+MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+return bind_mode;
+}
+#endif
+
+static int set_node_mpol(unsigned int nodeid)
+{
+#ifdef CONFIG_NUMA
+void *ram_ptr;
+RAMBlock *block;
+ram_addr_t len, ram_offset = 0;
+int bind_mode;
+int i;
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+if (!strcmp(block->mr->name, "pc.ram")) {
+break;
+}
+}
+
+if (block->host == NULL)
+return -1;
+
+ram_ptr = block->host;
+for (i = 0; i < nodeid; i++) {
+len = numa_info[i].node_mem;
+ram_offset += len;
+}
+
+len = numa_info[i].node_mem;
+bind_mode = node_parse_bind_mode(i);
+
+/* This is a workaround for a long standing bug in Linux'
+ * mbind implementation, which cuts off the last specified
+ * node. To stay compatible should this bug be fixed, we
+ * specify one more node and zero this one out.
+ */
+clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+if (mbind(ram_ptr + ram_offset, len, bind_mode,
+numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+perror("mbind");
+return -1;
+}
+#endif
+return 0;
+}
+
 void set_numa_modes(void)
 {
 CPUArchState *env;
@@ -1200,6 +1278,15 @@ void set_numa_modes(void)
 }
 }
 }
+
+#ifdef CONFIG_NUMA
+for (i = 0; i < nb_numa_nodes; i++) {
+if (set_node_mpol(i) == -1) {
+fprintf(stderr,
+"qemu: can't set host memory policy for node%d\n", i);
+}
+}
+#endif
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH V2 0/9] Add support for binding guest numa nodes to host numa nodes

2013-06-20 Thread Wanlong Gao



As you know, QEMU currently cannot direct its memory allocation, which may
cause a performance regression for guest accesses that cross host nodes.
Worse, if PCI passthrough is used, the direct-attached device uses DMA
transfers between the device and the qemu process, so all pages of the
guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
=>kvm_assign_device()
  => kvm_iommu_map_memslots()
=> kvm_iommu_map_pages()
   => kvm_pin_pages()

So, with a direct-attached device, the page count of every guest page is
incremented by one and page migration will not work. AutoNUMA will not work
either.

So, we should set the guest nodes' memory allocation policy before
the pages are actually mapped.
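
Patch 5/9 applies the policy node by node over the guest RAM block, so
each guest node needs its own byte range. The following is a rough sketch
of that offset arithmetic only, assuming guest RAM is one contiguous block
with the nodes laid out in order. node_range and guest_node_range() are
hypothetical names, and the mbind(2) call itself is left out so that the
example does not depend on libnuma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: byte range of one guest node in guest RAM. */
typedef struct {
    uint64_t offset;   /* start of the node, relative to the RAM block */
    uint64_t len;      /* size of the node in bytes */
} node_range;

/*
 * Compute the range of guest node `nodeid`, given the per-node memory
 * sizes in node_mem[]. Nodes are assumed to follow each other in order.
 */
static node_range guest_node_range(const uint64_t *node_mem,
                                   size_t nb_nodes, size_t nodeid)
{
    assert(node_mem != NULL && nodeid < nb_nodes);

    node_range r = { 0, node_mem[nodeid] };
    for (size_t i = 0; i < nodeid; i++) {
        r.offset += node_mem[i];   /* skip over all preceding nodes */
    }
    return r;
}
```

The resulting offset and length are what would be handed, together with
the host node mask, to the policy call for that node.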

With this patch set, guest node memory policies can be set as follows:

 -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1

This supports the format
"mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}".
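
As an illustration of the mem-hostnode syntax, here is a small parser
sketch in C. It is not the code from patch 4/9: hostnode_spec and
parse_hostnode() are hypothetical names, the mask is limited to 64 host
nodes so that it fits in one word, and treating "!all" as an empty mask is
an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_HOST_NODES 64   /* assumption: one 64-bit word is enough */

/* Hypothetical result type for this sketch. */
typedef struct {
    uint64_t mask;      /* bit n set means host node n is selected */
    bool relative;      /* true when the spec started with '+' */
} hostnode_spec;

/* Parse "[+|!]{all|N|N-N}". Returns 0 on success, -1 on a malformed spec. */
static int parse_hostnode(const char *s, hostnode_spec *out)
{
    assert(s != NULL && out != NULL);

    bool invert = false;
    out->mask = 0;
    out->relative = false;

    if (*s == '!') {
        invert = true;
        s++;
    } else if (*s == '+') {
        out->relative = true;
        s++;
    }

    if (!strcmp(s, "all")) {
        out->mask = invert ? 0 : UINT64_MAX;   /* "!all": sketch assumption */
        return 0;
    }

    /* strtoul() would accept signs and blanks, so insist on a digit. */
    if (*s < '0' || *s > '9') {
        return -1;
    }
    char *end;
    errno = 0;
    unsigned long first = strtoul(s, &end, 10);
    unsigned long last = first;
    if (*end == '-') {
        const char *p = end + 1;
        if (*p < '0' || *p > '9') {
            return -1;
        }
        last = strtoul(p, &end, 10);
    }
    if (errno != 0 || *end != '\0' || last < first ||
        last >= DEMO_MAX_HOST_NODES) {
        return -1;
    }

    uint64_t range = 0;
    for (unsigned long n = first; n <= last; n++) {
        range |= UINT64_C(1) << n;
    }
    out->mask = invert ? ~range : range;
    return 0;
}
```

Under these assumptions, "0-1" selects host nodes 0 and 1, "!1" selects
every node except 1, and "+2-3" selects nodes 2 and 3 with the relative
flag set.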

Patch 7/9 adds a QMP command "set-mpol" to set the memory policy of each
guest node:
set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1

Patch 8/9 adds a monitor command "set-mpol" with the format:
set-mpol 0 mem-policy=membind,mem-hostnode=0-1

With patch 9/9, the current memory policy of each guest node can be queried
using the monitor command "info numa", for example:

(qemu) info numa
2 nodes
node 0 cpus: 0
node 0 size: 1024 MB
node 0 mempolicy: membind=0,1
node 1 cpus: 1
node 1 size: 1024 MB
node 1 mempolicy: interleave=1
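
The node list after "membind=" or "interleave=" is a comma-separated
rendering of a host node bitmap. A minimal sketch of that formatting step
follows, assuming the bitmap fits in a single 64-bit word;
format_node_list() is a hypothetical helper and not the loop used in
patch 9/9.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the indices of the set bits in `mask` to buf as "a,b,c".
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_node_list(uint64_t mask, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    size_t used = 0;
    buf[0] = '\0';

    for (unsigned bit = 0; bit < 64; bit++) {
        if (!(mask & (UINT64_C(1) << bit))) {
            continue;
        }
        /* Every entry except the first is preceded by a comma. */
        int n = snprintf(buf + used, size - used,
                         used ? ",%u" : "%u", bit);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;   /* output would have been truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

For a mask with bits 0 and 1 set this produces "0,1", matching the
"membind=0,1" line in the example output.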


V1->V2:
change to use QemuOpts in numa options (Paolo)
handle Error in mpol parser (Paolo)
change qmp command format to be like mem-policy=membind,mem-hostnode=0-1 (Paolo)


Bandan Das (1):
  NUMA: Support multiple CPU ranges on -numa option

Wanlong Gao (8):
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: set guest numa nodes memory policy
  NUMA: handle Error in mpol and hostnode parser
  NUMA: add qmp command set-mpol to set memory policy for NUMA node
  NUMA: add hmp command set-mpol
  NUMA: show host memory policy info in info numa command

 configure   |  32 +++
 cpus.c  | 143 ++-
 hmp-commands.hx |  16 
 hmp.c   |  35 +++
 hmp.h   |   1 +
 hw/i386/pc.c|   4 +-
 hw/net/eepro100.c   |   1 -
 include/sysemu/sysemu.h |  20 +++-
 monitor.c   |  44 -
 qapi-schema.json|  15 +++
 qemu-options.hx |   3 +-
 qmp-commands.hx |  35 +++
 vl.c| 250 ++--
 13 files changed, 540 insertions(+), 59 deletions(-)

-- 
1.8.3.1.448.gfb7dfaa

[Qemu-devel] [PATCH V2 1/9] NUMA: Support multiple CPU ranges on -numa option

2013-06-20 Thread Wanlong Gao

From: Bandan Das 

This allows us to use the "cpus" property multiple times
to specify multiple CPUs or CPU ranges in the -numa option:

-numa node,cpus=1,cpus=2,cpus=4
or
-numa node,cpus=1-3,cpus=5

Signed-off-by: Bandan Das 
Signed-off-by: Wanlong Gao 
---
 qemu-options.hx |   3 +-
 vl.c| 108 ++--
 2 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 8355f9b..767e601 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -100,7 +100,8 @@ STEXI
 @item -numa @var{opts}
 @findex -numa
 Simulate a multi node NUMA system. If mem and cpus are omitted, resources
-are split equally.
+are split equally. The "cpus" property may be specified multiple times
+to denote multiple cpus or cpu ranges.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
diff --git a/vl.c b/vl.c
index 767e020..a1e5ce9 100644
--- a/vl.c
+++ b/vl.c
@@ -516,6 +516,32 @@ static QemuOptsList qemu_realtime_opts = {
 },
 };
 
+static QemuOptsList qemu_numa_opts = {
+.name = "numa",
+.implied_opt_name = "type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+.desc = {
+{
+.name = "type",
+.type = QEMU_OPT_STRING,
+.help = "node type"
+},{
+.name = "nodeid",
+.type = QEMU_OPT_NUMBER,
+.help = "node ID"
+},{
+.name = "mem",
+.type = QEMU_OPT_SIZE,
+.help = "memory size"
+},{
+.name = "cpus",
+.type = QEMU_OPT_STRING,
+.help = "cpu number or range"
+},
+{ /* end of list */ }
+},
+};
+
 const char *qemu_get_vm_name(void)
 {
 return qemu_name;
@@ -1349,56 +1375,37 @@ error:
 exit(1);
 }
 
-static void numa_add(const char *optarg)
+
+static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
-char option[128];
-char *endptr;
-unsigned long long nodenr;
+int *nodenr = opaque;
 
-optarg = get_opt_name(option, 128, optarg, ',');
-if (*optarg == ',') {
-optarg++;
+if (!strcmp(name, "cpus")) {
+numa_node_parse_cpus(*nodenr, value);
 }
-if (!strcmp(option, "node")) {
-
-if (nb_numa_nodes >= MAX_NODES) {
-fprintf(stderr, "qemu: too many NUMA nodes\n");
-exit(1);
-}
+return 0;
+}
 
-if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-nodenr = nb_numa_nodes;
-} else {
-if (parse_uint_full(option, &nodenr, 10) < 0) {
-fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-exit(1);
-}
-}
+static int numa_init_func(QemuOpts *opts, void *opaque)
+{
+uint64_t nodenr, mem_size;
 
-if (nodenr >= MAX_NODES) {
-fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-exit(1);
-}
+nodenr = qemu_opt_get_number(opts, "nodeid", nb_numa_nodes++);
 
-if (get_param_value(option, 128, "mem", optarg) == 0) {
-node_mem[nodenr] = 0;
-} else {
-int64_t sval;
-sval = strtosz(option, &endptr);
-if (sval < 0 || *endptr) {
-fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-exit(1);
-}
-node_mem[nodenr] = sval;
-}
-if (get_param_value(option, 128, "cpus", optarg) != 0) {
-numa_node_parse_cpus(nodenr, option);
-}
-nb_numa_nodes++;
-} else {
-fprintf(stderr, "Invalid -numa option: %s\n", option);
+if (nodenr >= MAX_NODES) {
+fprintf(stderr, "qemu: Max number of NUMA nodes reached : %d\n",
+(int)nodenr);
 exit(1);
 }
+
+mem_size = qemu_opt_get_size(opts, "mem", 0);
+node_mem[nodenr] = mem_size;
+
+if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
+return -1;
+}
+
+return 0;
 }
 
 static void smp_parse(const char *optarg)
@@ -2901,6 +2908,7 @@ int main(int argc, char **argv, char **envp)
 qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_tpmdev_opts);
 qemu_add_opts(&qemu_realtime_opts);
+qemu_add_opts(&qemu_numa_opts);
 
 runstate_init();
 
@@ -3087,7 +3095,16 @@ int main(int argc, char **argv, char **envp)
 }
 break;
 case QEMU_OPTION_numa:
-numa_add(optarg);
+olist = qemu_find_opts("numa");
+opts = qemu_opts_parse(olist, optarg, 1);
+if (!opts) {
+exit(1);
+}
+optarg = qemu_opt_get(opts, "type");
+if (!optarg || strcmp(optarg, "node")) {
+fprintf(stderr, "qemu: Incorrect format for numa option\n");
+exit(1);
+}
 break;
 case QEMU_OPTION_d

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Benjamin Herrenschmidt

On Fri, 2013-06-21 at 00:03 -0600, Alex Williamson wrote:
> On Fri, 2013-06-21 at 15:12 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2013-06-20 at 22:46 -0600, Alex Williamson wrote:
> > > Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
> > > that it can be implemented on POWER without this pci_bus_map_msi
> > > interface that seems very unique to POWER.  Thanks,
> > 
> > You mean unique to all non-x86 ? :-)
> > 
> > I believe almost everybody eventually turns MSIs into "normal"
> > interrupts...
> > 
> > More often than not, the logic to do so is in the PCI Host Bridge.
> > 
> > The whole concept of passing the message address/data across the
> > user/kernel interface is an x86 crackpotery but as is the entire
> > remapping/routing layer so ... :-)
> 
> Regardless, this is exactly what kvm_irqchip_add_msi_route does.  It
> says, here's an MSIMessage, give me an IRQ that sends that. 

Yes, and in our case, what happens is that the guest said "I want
an MSI", we picked an IRQ, and made up a message for it :-) The
actual message address/data we use is a complete invention that only
exists within qemu. So here we basically need to turn it back into an
IRQ, which we might be able to do by ... just making the message (or
part of the address) be the IRQ number or something like that.
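
A rough sketch of that last idea, purely as an illustration: if the made-up
message simply carries the IRQ number in its data field, turning it back
into an IRQ is a check plus a read. Everything here is hypothetical
(demo_msi_msg, the window address, the function names); only the figure of
8192 IRQs is taken from this thread, and this is not how the sPAPR PHB code
is actually written.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MSI_WINDOW UINT64_C(0x40000000000) /* made-up address */
#define DEMO_NR_IRQS    8192u                   /* figure from this thread */

/* Hypothetical address/data pair standing in for an MSI message. */
typedef struct {
    uint64_t address;
    uint32_t data;
} demo_msi_msg;

/* Invent a message for an already allocated IRQ: the data is the IRQ. */
static demo_msi_msg demo_irq_to_msg(uint32_t irq)
{
    assert(irq < DEMO_NR_IRQS);
    demo_msi_msg m = { DEMO_MSI_WINDOW, irq };
    return m;
}

/* Turn a message back into its IRQ; -1 if it is not one of ours. */
static int demo_msg_to_irq(demo_msi_msg m)
{
    if (m.address != DEMO_MSI_WINDOW || m.data >= DEMO_NR_IRQS) {
        return -1;
    }
    return (int)m.data;
}
```

With such an encoding the lookup needs no table at all, which is the
appeal of a fixed mapping over allocating and programming a route.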

>  In the x86
> case, that means pick a free IRQ and program it to send that MSIMessage
> when we hit the irqfd.  In the case of POWER it means lookup which IRQ
> gets fired by that MSIMessage and return it.  In a non-accelerated QEMU
> case I'd think msi_notify() would write the MSIMessage to this IRQ
> remapper device and let it toggle the next qemu_irq down the line.  If
> we ever add an IOMMU based IRQ remapper to the x86 model, we'd need
> something similar.  Thanks,

Ben.

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alex Williamson

On Fri, 2013-06-21 at 15:12 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-06-20 at 22:46 -0600, Alex Williamson wrote:
> > Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
> > that it can be implemented on POWER without this pci_bus_map_msi
> > interface that seems very unique to POWER.  Thanks,
> 
> You mean unique to all non-x86 ? :-)
> 
> I believe almost everybody eventually turns MSIs into "normal"
> interrupts...
> 
> More often than not, the logic to do so is in the PCI Host Bridge.
> 
> The whole concept of passing the message address/data across the
> user/kernel interface is an x86 crackpotery but as is the entire
> remapping/routing layer so ... :-)

Regardless, this is exactly what kvm_irqchip_add_msi_route does.  It
says, here's an MSIMessage, give me an IRQ that sends that.  In the x86
case, that means pick a free IRQ and program it to send that MSIMessage
when we hit the irqfd.  In the case of POWER it means lookup which IRQ
gets fired by that MSIMessage and return it.  In a non-accelerated QEMU
case I'd think msi_notify() would write the MSIMessage to this IRQ
remapper device and let it toggle the next qemu_irq down the line.  If
we ever add an IOMMU based IRQ remapper to the x86 model, we'd need
something similar.  Thanks,

Alex

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Benjamin Herrenschmidt

On Thu, 2013-06-20 at 22:46 -0600, Alex Williamson wrote:
> Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
> that it can be implemented on POWER without this pci_bus_map_msi
> interface that seems very unique to POWER.  Thanks,

You mean unique to all non-x86 ? :-)

I believe almost everybody eventually turns MSIs into "normal"
interrupts...

More often than not, the logic to do so is in the PCI Host Bridge.

The whole concept of passing the message address/data across the
user/kernel interface is an x86 crackpotery but as is the entire
remapping/routing layer so ... :-)

Ben.

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alex Williamson

On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
> On 06/21/2013 12:34 PM, Alex Williamson wrote:
> > On Fri, 2013-06-21 at 11:56 +1000, Alexey Kardashevskiy wrote:
> >> On 06/21/2013 02:51 AM, Alex Williamson wrote:
> >>> On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
> >>>> At the moment QEMU creates a route for every MSI IRQ.
> >>>>
> >>>> Now we are about to add IRQFD support on PPC64-pseries platform.
> >>>> pSeries already has in-kernel emulated interrupt controller with
> >>>> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
> >>>> mapping as a part of PAPR requirements for MSI/MSIX guests.
> >>>> Specifically, the pSeries guest does not touch MSIMessage's at
> >>>> all, instead it uses rtas_ibm_change_msi and
> >>>> rtas_ibm_query_interrupt_source
> >>>> rtas calls to do the mapping.
> >>>>
> >>>> Therefore we do not really need more routing than we got already.
> >>>> The patch introduces the infrastructure to enable direct IRQ mapping.
> >>>>
> >>>> Signed-off-by: Alexey Kardashevskiy
> >>>>
> >>>> ---
> >>>>
> >>>> The patch is raw and ugly indeed, I made it only to demonstrate
> >>>> the idea and see if it has right to live or not.
> >>>>
> >>>> For some reason which I do not really understand (limited GSI numbers?)
> >>>> the existing code always adds routing and I do not see why we would need
> >>>> it.
> >>>
> >>> It's an IOAPIC, a pin gets toggled from the device and an MSI message
> >>> gets written to the CPU.  So the route allocates and programs the
> >>> pin->MSI, then we tell it what notifier triggers that pin.
> >>
> >>> On x86 the MSI vector doesn't encode any information about the device
> >>> sending the MSI, here you seem to be able to figure out the device and
> >>> vector space number from the address.  Then your pin to MSI is
> >>> effectively fixed.  So why isn't this just your
> >>> kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
> >>> it's an allocate and program.
> >>>  What does kvm_irqchip_add_msi_route do on
> >>> pSeries today?  Thanks,
> >>
> >>
> >> As we just started implementing this thing, I commented it out for the
> >> starter. Once called, it destroys direct mapping in the host kernel and
> >> everything stops working as routing is not implemented (yet? ever?).
> > 
> > Yay, it's broken, you can rewrite it ;)
> 
> 
> There is nothing to rewrite, my understanding is that it is just not
> written yet and Paul would like not to do that :)
> 
> 
> >> My point here is that MSIMessage to irq translation is made on a PCI domain
> >> as PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
> >> all in QEMU, the guest dynamically allocates MSI IRQs and it is up to a
> >> hypervisor (QEMU) to take care of actual MSIMessage for the device.
> > 
> > MSIMessage is what the guest has programmed for the address/data fields,
> > it's not just a QEMU invention.  From the guest perspective, the device
> > writes msg.data to msg.address to signal the CPU for the interrupt.
> 
> 
> Our guests never program MSIMessage. Hypercalls are used instead.

Of course POWER has a hypercall for that, but that's just abstracting
the physical device, which does actually write msg.data to msg.address
on the bus.

> >> And the only reason to use MSIMessage in QEMU for us is to support
> >> msi_notify()/msix_notify() in places like vfio_msi_interrupt(), I have
> >> added an MSI window for that a long time ago, which we do not need so much
> >> as we already have an irq number in vfio_msi_interrupt(), etc.
> > 
> > It seems like you just have another layer of indirection via your
> > msi_table.  For x86 there's a layer of indirection via the virq virtual
> > IOAPIC pin.  Seems similar.  Thanks,
> 
> 
> I do not follow you, sorry. For x86, is it that MSI routing table which is
> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
> code responds to msi_notify() in qemu-x86 and does qemu_irq_pulse()?

vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)

This writes directly to the interrupt block on the vCPU.  With KVM, the
in-kernel APIC does the same write, where the pin to MSIMessage is set up
by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.

Do I understand that on POWER the MSI from the device is intercepted at
the PHB and converted to an IRQ that's triggered by some means other
than an MSI write?  So to correctly model the hardware, vfio should do a
msi_notify() that does a stl_le_phys that terminates at this IRQ
remapper thing and in turn toggles a qemu_irq.  MSIMessage is only
extraneous data if you want to skip over hardware blocks.
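
The modelling idea in the paragraph above (an MSI store that terminates at a remapper, which in turn toggles an interrupt line) can be sketched as follows. This is a simplified, self-contained illustration and not QEMU code: IrqRemapper, phys_write32, device_msi_notify and the window address are hypothetical names, and a pulse counter stands in for a real qemu_irq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REMAP_BASE  0x40000000ULL   /* assumed MSI window address */
#define REMAP_LINES 16

/* Hypothetical remapper: counts pulses per output line. */
typedef struct {
    unsigned pulses[REMAP_LINES];
} IrqRemapper;

/* Write handler for the MSI window: the written data selects the line. */
void remapper_write(IrqRemapper *r, uint32_t data)
{
    if (data < REMAP_LINES) {
        r->pulses[data]++;          /* stands in for pulsing the next irq */
    }
}

/* Stand-in for a guest-physical store: dispatch the write by address.
 * Returns 1 if the remapper claimed the write, 0 otherwise. */
int phys_write32(IrqRemapper *r, uint64_t addr, uint32_t data)
{
    if (addr == REMAP_BASE) {
        remapper_write(r, data);
        return 1;
    }
    return 0;                       /* some other region; ignored here */
}

/* What a device model's MSI notify reduces to in this sketch:
 * write msg.data to msg.address and let the address decide who reacts. */
int device_msi_notify(IrqRemapper *r, uint64_t msg_address, uint32_t msg_data)
{
    assert(r != NULL);
    return phys_write32(r, msg_address, msg_data);
}
```

In this shape the device model never needs to know the IRQ number; only the block that owns the window does, which mirrors the hardware layering being argued for.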

Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
that it can be implemented on POWER without this pci_bus_map_msi
interface that seems very unique to POWER.  Thanks,

Alex

> >>>> ---
> >>>>  hw/misc/vfio.c           |   11 +--
> >>>>  hw/pci/pci.c             |   13 +
> >>>>  hw/ppc/spapr_p

Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread liu ping fan

[...]
>>>
>>> qemu_bh_delete is safe as long as you wait for the bottom half to stop
>>> before deleting the containing object.  Once we have RCU, deletion of
>>> QOM objects will be RCU-protected.  Hence, a simple way could be to put
>>> the first part of aio_bh_poll() within rcu_read_lock/unlock.
>>>
>> In fact, I have an idea about this: introduce another member, an
>> Object, for QEMUBH which will be referenced in cb, and then leave
>> everything to the refcnt mechanism.
>> For qemu_bh_cancel(), I have not figured out whether it is important
>> or not to sync with the caller.
>
> This is a separate patch anyway... and a long discussion to have before
> too. :)
>
> Let's concentrate on one thing at a time.
>
Yes, will do like this.

Regards,
Pingfan

> Paolo
>
>> diff --git a/async.c b/async.c
>> index 4b17eb7..60c35a1 100644
>> --- a/async.c
>> +++ b/async.c
>> @@ -61,6 +61,7 @@ int aio_bh_poll(AioContext *ctx)
>>  {
>>  QEMUBH *bh, **bhp, *next;
>>  int ret;
>> +int sched;
>>
>>  ctx->walking_bh++;
>>
>> @@ -69,8 +70,10 @@ int aio_bh_poll(AioContext *ctx)
>>  /* Make sure fetching bh before accessing its members */
>>  smp_read_barrier_depends();
>>  next = bh->next;
>> -if (!bh->deleted && bh->scheduled) {
>> -bh->scheduled = 0;
>> +sched = 0;
>> +atomic_xchg(&bh->scheduled, sched);
>
> This is expensive.
>
>> +if (!bh->deleted && sched) {
>> +//bh->scheduled = 0;
>>  if (!bh->idle)
>>  ret = 1;
>>  bh->idle = 0;
>> @@ -79,6 +82,9 @@ int aio_bh_poll(AioContext *ctx)
>>   */
>>  smp_rmb();
>>  bh->cb(bh->opaque);
>> +if (bh->obj) {
>> +object_unref(bh->obj);
>> +}
>>  }
>>  }
>>
>> @@ -105,8 +111,12 @@ int aio_bh_poll(AioContext *ctx)
>>
>>  void qemu_bh_schedule_idle(QEMUBH *bh)
>>  {
>> -if (bh->scheduled)
>> +int sched = 1;
>> +
>> +atomic_xchg( &bh->scheduled, sched);
>> +if (sched) {
>>  return;
>> +}
>>  /* Make sure any writes that are needed by the callback are done
>>   * before the locations are read in the aio_bh_poll.
>>   */
>> @@ -117,25 +127,46 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
>>
>>  void qemu_bh_schedule(QEMUBH *bh)
>>  {
>> -if (bh->scheduled)
>> +int sched = 1;
>> +
>> +atomic_xchg( &bh->scheduled, sched);
>> +if (sched) {
>>  return;
>> +}
>>  /* Make sure any writes that are needed by the callback are done
>>   * before the locations are read in the aio_bh_poll.
>>   */
>>  smp_wmb();
>>  bh->scheduled = 1;
>> +if (bh->obj) {
>> +object_ref(bh->obj);
>> +}
>>  bh->idle = 0;
>>  aio_notify(bh->ctx);
>>  }
>>
>>  void qemu_bh_cancel(QEMUBH *bh)
>>  {
>> -bh->scheduled = 0;
>> +int sched = 0;
>> +
>> +atomic_xchg( &bh->scheduled, sched);
>> +if (sched) {
>> +if (bh->obj) {
>> +object_ref(bh->obj);
>> +}
>> +}
>>  }
>>
>>  void qemu_bh_delete(QEMUBH *bh)
>>  {
>> -bh->scheduled = 0;
>> +int sched = 0;
>> +
>> +atomic_xchg( &bh->scheduled, sched);
>> +if (sched) {
>> +if (bh->obj) {
>> +object_ref(bh->obj);
>> +}
>> +}
>>  bh->deleted = 1;
>>  }
>>
>> Regards,
>> Pingfan
>>>> The other thing I'm unclear on is the ->idle assignment followed
>>>> immediately by a ->scheduled assignment.  Without memory barriers
>>>> aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
>>>> it may see an idle BH as a regular scheduled BH because ->idle is still
>>>> 0.
>>>
>>> Right.  You need to order ->idle writes before ->scheduled writes, and
>>> add memory barriers, or alternatively use two bits in ->scheduled so
>>> that you can assign both atomically.
>>>
>>> Paolo
>
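
The "two bits in ->scheduled" alternative mentioned above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the QEMU implementation: it uses <stdatomic.h> rather than QEMU's own atomic helpers, and DemoBH, BH_SCHEDULED, BH_IDLE and both function names are hypothetical. The sketch relies on the fact that C11 atomic_exchange and atomic_compare_exchange_weak report the previous value, which is what tells the caller whether the BH was already scheduled.

```c
#include <assert.h>
#include <stdatomic.h>

#define BH_SCHEDULED 1u
#define BH_IDLE      2u

typedef struct {
    atomic_uint flags;      /* BH_SCHEDULED | BH_IDLE, updated as one unit */
} DemoBH;

/* Schedule the BH, publishing the idle bit in the same atomic update as
 * the scheduled bit. Returns 1 if this call scheduled it, 0 if it was
 * already scheduled. */
int demo_bh_schedule(DemoBH *bh, int idle)
{
    unsigned old = atomic_load(&bh->flags);
    unsigned new_flags = BH_SCHEDULED | (idle ? BH_IDLE : 0u);

    do {
        if (old & BH_SCHEDULED) {
            return 0;       /* someone else got there first */
        }
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(&bh->flags, &old, new_flags));
    return 1;
}

/* Poll side: take both bits in one step, so an idle BH can never be
 * observed as scheduled-but-not-idle. Returns the flags that were set. */
unsigned demo_bh_take(DemoBH *bh)
{
    unsigned f = atomic_exchange(&bh->flags, 0u);

    assert(!(f & BH_IDLE) || (f & BH_SCHEDULED));   /* idle implies scheduled */
    return f;
}
```

Because both bits live in one word, the poller sees either the old pair or the new pair and never a mix, so no separate barrier between an ->idle write and a ->scheduled write is needed in this variant.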

[Qemu-devel] Object cast macro change-pattern automation.

2013-06-20 Thread Peter Crosthwaite

Hi Andreas, Hu,

I thought I'd share with you a little script I made (not very polished)
that I used to help with some of my patches creating the QOM cast
macros (mainly the PCI ones). May be useful in speeding up the
QOMification effort. Andreas, I'm guessing you may have something
similar going, if you're able to comment? I know Hu mentioned he wanted
to work on QOMification of sysbus - which is a big job so stuff like
this may make life easier.

example usage:

$ source ./object_macro_maker hw/timer/xilinx_timer.c XILINX_TIMER

1st arg is the target file, 2nd arg is the name of the type, i.e. FOO in TYPE_FOO

It will automatically find and replace usages of the string literal type
in place and give you a fragment to copy-paste into the source defining
the type string and object cast macro.

It has the limitation that it only works with files that define a
single QOM type. I didn't bother trying to generalise as such files are
the exception and not the rule.

Example output below:

diff --git a/hw/timer/xilinx_timer.c b/hw/timer/xilinx_timer.c
index 0c39cff..ae09170 100644
--- a/hw/timer/xilinx_timer.c
+++ b/hw/timer/xilinx_timer.c
@@ -218,7 +218,7 @@ static int xilinx_timer_init(SysBusDevice *dev)
 ptimer_set_freq(xt->ptimer, t->freq_hz);
 }

-memory_region_init_io(&t->mmio, &timer_ops, t, "xlnx.xps-timer",
+memory_region_init_io(&t->mmio, &timer_ops, t, TYPE_XILINX_TIMER,
   R_MAX * 4 * num_timers(t));
 sysbus_init_mmio(dev, &t->mmio);
 return 0;
@@ -241,7 +241,7 @@ static void xilinx_timer_class_init(ObjectClass *klass, void *data)
 }

 static const TypeInfo xilinx_timer_info = {
-.name  = "xlnx.xps-timer",
+.name  = TYPE_XILINX_TIMER,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(struct timerblock),
 .class_init= xilinx_timer_class_init,
State Struct is struct timerblock
-- cut here 
#define TYPE_XILINX_TIMER "xlnx.xps-timer"

#define XILINX_TIMER(obj) \
OBJECT_CHECK(struct timerblock, (obj), TYPE_XILINX_TIMER)
-
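
For readers unfamiliar with the pattern, here is a stand-alone sketch of what a type-string define plus a checked cast macro buys you. It deliberately does not use QEMU headers: DemoObject, demo_object_check and DEMO_OBJECT_CHECK are simplified stand-ins invented for this example (the real OBJECT_CHECK also accepts subtypes, which this sketch does not), and "demo.timer" is a made-up type name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an object header: just a type-name pointer. */
typedef struct DemoObject {
    const char *type_name;
} DemoObject;

/* Return obj if its type name matches exactly, otherwise abort.
 * Unlike real QOM, this sketch does not walk parent types. */
void *demo_object_check(void *obj, const char *type_name)
{
    DemoObject *o = obj;

    assert(o != NULL && strcmp(o->type_name, type_name) == 0);
    return obj;
}

#define DEMO_OBJECT_CHECK(type, obj, name) \
    ((type *)demo_object_check((obj), (name)))

/* The generated fragment has this shape: one string, one cast macro. */
#define TYPE_DEMO_TIMER "demo.timer"

struct demo_timerblock {
    DemoObject parent;      /* must be first so the pointer cast is valid */
    unsigned freq_hz;
};

#define DEMO_TIMER(obj) \
    DEMO_OBJECT_CHECK(struct demo_timerblock, (obj), TYPE_DEMO_TIMER)

/* Typical use: recover the state struct from an opaque pointer. */
unsigned demo_timer_freq(void *opaque)
{
    struct demo_timerblock *t = DEMO_TIMER(opaque);

    return t->freq_hz;
}
```

Having the string in one define is what lets the script replace every literal occurrence mechanically, and the macro turns an unchecked cast into one that fails loudly on the wrong type.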


And the script itself:

#!/bin/bash

sed -n '/^static const TypeInfo.*$/,/^};.*$/p' $1 | \
grep "\(\.instance_size\|\.name\)"\
> typeinfo.tmp
cat typeinfo.tmp
STRING=$(grep -o "\".*\"" typeinfo.tmp | sed 's/\"//g')

echo "String is ${STRING}"
sed "s/\"${STRING}\"/TYPE_${2}/g" -i ${1}
git diff ${1} | cat

STATE_STRUCT=$(grep -o "(.*)" typeinfo.tmp | sed "s/(//" | sed "s/)//")
echo "State Struct is ${STATE_STRUCT}"
echo "-- cut here "

echo "#define TYPE_${2} \"${STRING}\""
echo ""
echo "#define ${2}(obj) \\"
echo "OBJECT_CHECK(${STATE_STRUCT}, (obj), TYPE_${2})"


Regards,
Peter

Re: [Qemu-devel] [PATCH] hmp: Make "info block" output more readable

2013-06-20 Thread Luiz Capitulino

On Wed, 19 Jun 2013 16:10:55 +0200
Kevin Wolf  wrote:

> HMP is meant for humans and you should notice it.
> 
> This changes the output format to use a bit more space to display the
> information more readably and leaves out irrelevant information (e.g.
> mention only that an image is encrypted, but not when it's not; display
> I/O limits only if throttling is in effect; ...)

I've applied this one. I can make the small suggestions that have been
made if you're OK with them.

> 
> Before:
> 
> (qemu) info block
> ide0-hd0: removable=0 io-status=ok file=/tmp/overlay.qcow2
> backing_file=/tmp/backing.img backing_file_depth=1 ro=0 drv=qcow2
> encrypted=1 bps=0 bps_rd=0 bps_wr=0 iops=1024 iops_rd=0 iops_wr=0
> ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok
> file=/home/kwolf/images/iso/Fedora-18-x86_64-Live-Desktop.iso ro=1
> drv=raw encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
> floppy0: removable=1 locked=0 tray-open=0 [not inserted]
> sd0: removable=1 locked=0 tray-open=0 [not inserted]
> 
> After:
> 
> (qemu) info block
> ide0-hd0: /tmp/overlay.qcow2 (qcow2, encrypted)
> Backing file: /tmp/backing.img (chain depth: 1)
> I/O limits:   bps=0 bps_rd=0 bps_wr=0 iops=1024 iops_rd=0 iops_wr=0
> 
> ide1-cd0: /home/kwolf/images/iso/Fedora-18-x86_64-Live-Desktop.iso (raw, read-only)
> Removable device: not locked, tray closed
> 
> floppy0: [not inserted]
> Removable device: not locked, tray closed
> 
> sd0: [not inserted]
> Removable device: not locked, tray closed
> 
> Signed-off-by: Kevin Wolf 
> ---
>  hmp.c | 94 +++
>  1 file changed, 55 insertions(+), 39 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index 494a9aa..dddfaf4 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -289,62 +289,78 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
>  if (device && strcmp(device, info->value->device)) {
>  continue;
>  }
> -monitor_printf(mon, "%s: removable=%d",
> -   info->value->device, info->value->removable);
>  
> -if (info->value->removable) {
> -monitor_printf(mon, " locked=%d", info->value->locked);
> -monitor_printf(mon, " tray-open=%d", info->value->tray_open);
> +if (info != block_list) {
> +monitor_printf(mon, "\n");
> +}
> +
> +monitor_printf(mon, "%s", info->value->device);
> +if (info->value->has_inserted) {
> +monitor_printf(mon, ": %s (%s%s%s)\n",
> +   info->value->inserted->file,
> +   info->value->inserted->drv,
> +   info->value->inserted->ro ? ", read-only" : "",
> +   info->value->inserted->encrypted ? ", encrypted" : "");
> +} else {
> +monitor_printf(mon, ": [not inserted]\n");
>  }
>  
> -if (info->value->has_io_status) {
> -monitor_printf(mon, " io-status=%s",
> +if (info->value->has_io_status && info->value->io_status != BLOCK_DEVICE_IO_STATUS_OK) {
> +monitor_printf(mon, "I/O Status:   %s\n",
>  BlockDeviceIoStatus_lookup[info->value->io_status]);
>  }
>  
> -if (info->value->has_inserted) {
> -monitor_printf(mon, " file=");
> -monitor_print_filename(mon, info->value->inserted->file);
> -
> -if (info->value->inserted->has_backing_file) {
> -monitor_printf(mon, " backing_file=");
> -monitor_print_filename(mon, info->value->inserted->backing_file);
> -monitor_printf(mon, " backing_file_depth=%" PRId64,
> -info->value->inserted->backing_file_depth);
> -}
> -monitor_printf(mon, " ro=%d drv=%s encrypted=%d",
> -   info->value->inserted->ro,
> -   info->value->inserted->drv,
> -   info->value->inserted->encrypted);
> +if (info->value->removable) {
> +monitor_printf(mon, "Removable device: %slocked, tray %s\n",
> +   info->value->locked ? "" : "not ",
> +   info->value->tray_open ? "open" : "closed");
> +}
>  
> -monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
> -" bps_wr=%" PRId64 " iops=%" PRId64
> -" iops_rd=%" PRId64 " iops_wr=%" PRId64,
> +
> +if (!info->value->has_inserted) {
> +continue;
> +}
> +
> +if (info->value->inserted->has_backing_file) {
> +monitor_printf(mon,
> +   "Backing file: %s "
> +   "(chain depth: %" PRId64 ")\n",
> +   info->value->inserted

Re: [Qemu-devel] [PATCH] full introspection support for QMP

2013-06-20 Thread Luiz Capitulino

On Wed, 19 Jun 2013 20:24:37 +0800
Amos Kong  wrote:

> Introduces new monitor command to query QMP schema information,
> the return data is a nested dict/list, it contains the useful
> metadata.

Thanks for the good work, Amos!

When testing this though I actually get qemu-ga's schema, not
qmp's. Did you test this with qemu-ga build enabled?

This bug shows that we need to handle qemu-ga properly, which
means having query-guest-agent-schema for qemu-ga.

It's also a good idea to start the commit log with some json
examples btw.

More comments below.

> we can add events definitions to qapi-schema.json, then it can
> also be queried.
> 
> Signed-off-by: Amos Kong 
> ---
>  Makefile                 |   4 +-
>  qapi-schema.json         |  68 +++
>  qmp-commands.hx          |  39 +++
>  qmp.c                    | 170 +++
>  scripts/qapi-commands.py |   2 +-
>  scripts/qapi-types.py    |  34 +-
>  scripts/qapi-visit.py    |   2 +-
>  scripts/qapi.py          |   7 +-
>  8 files changed, 320 insertions(+), 6 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 3cfa7d0..42713ef 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -38,7 +38,7 @@ endif
>  endif
>  
>  GENERATED_HEADERS = config-host.h qemu-options.def
> -GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
> +GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h qmp-schema.h
>  GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c
>  
>  GENERATED_HEADERS += trace/generated-events.h
> @@ -213,7 +213,7 @@ qga/qapi-generated/qga-qmp-commands.h qga/qapi-generated/qga-qmp-marshal.c :\
>  $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
>   $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py $(gen-out-type) -o qga/qapi-generated -p "qga-" < $<, "  GEN   $@")
>  
> -qapi-types.c qapi-types.h :\
> +qapi-types.c qapi-types.h qmp-schema.h:\
>  $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
>   $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py $(gen-out-type) -o "." -b < $<, "  GEN   $@")
>  qapi-visit.c qapi-visit.h :\
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 6cc07c2..43abe57 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3608,3 +3608,71 @@
>  '*cpuid-input-ecx': 'int',
>  'cpuid-register': 'X86CPURegister32',
>  'features': 'int' } }
> +
> +##
> +# @DataObject
> +#
> +# Details of a data object, it can be nested dictionary/list
> +#
> +# @name: #optional the string key of dictionary

Data object name if it has one?

> +#
> +# @type: the string value of dictionary or list

data type name?

> +#
> +# @data: #optional a list of @DataObject, dictionary's value is nested
> +#dictionary/list

#optional DataObject list, can be a dictionary or list type?

> +#
> +# Since: 1.6
> +##
> +{ 'type': 'DataObject',
> +  'data': { '*name': 'str', '*type': 'str', '*data': ['DataObject'] } }
> +
> +##
> +# @SchemaMetatype

As we're doing CamelCase, this should be SchemaMetaType. Or maybe just
SchemaType?

> +#
> +# Possible meta types of a schema entry
> +#
> +# @Command: QMP monitor command to control guest

"QMP command" is good enough.

> +#
> +# @Type: defined new data type
> +#
> +# @Enumeration: enumeration data type
> +#
> +# @Union: union data type
> +#
> +# @Event: QMP event to notify QMP clients

I'm not sure we should have events listed here as they are not
supported yet.

> +#
> +# Since: 1.6
> +##
> +{ 'enum': 'SchemaMetatype',
> +  'data': ['Command', 'Type', 'Enumeration', 'Union', 'Event'] }
> +
> +##
> +# @SchemaData

Sorry for the bikeshed, but SchemaEntry maybe?

> +#
> +# Details of schema items
> +#
> +# @type: dict's value, list's value

Entry's type in string format.

> +#
> +# @name: dict's key

Entry name.

> +#
> +# @data: #optional list of @DataObject, arguments data of executing
> +#QMP command

"#optional list of DataObject. This can have different meaning depending
on the 'type' value. For example, for a QMP command, this member contains
an argument listing. For an enumeration, it contains the enum's values
and so on"

> +#
> +# @returns: #optional list of DataObject, return data after executing
> +#   QMP command

I don't parse what's after the comma.

> +#
> +# Since: 1.6
> +##
> +{ 'type': 'SchemaData', 'data': { 'type': 'SchemaMetatype',
> +  'name': 'str', '*data': ['DataObject'], '*returns': ['DataObject'] } }
> +
> +##
> +# @query-qmp-schema
> +#
> +# Query QMP schema information
> +#
> +# Returns: list of @SchemaData. Returns an error if json string is invalid.

I don't think you should return errors, see below.

> +#
> +# Since: 1.6
> +##
> +{ 'command': 'query-qmp-schema', 'returns': ['SchemaData'] }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 8cea5e5..667d9ab 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -2

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alexey Kardashevskiy

On 06/21/2013 12:34 PM, Alex Williamson wrote:
> On Fri, 2013-06-21 at 11:56 +1000, Alexey Kardashevskiy wrote:
>> On 06/21/2013 02:51 AM, Alex Williamson wrote:
>>> On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
>>>> At the moment QEMU creates a route for every MSI IRQ.
>>>>
>>>> Now we are about to add IRQFD support on PPC64-pseries platform.
>>>> pSeries already has in-kernel emulated interrupt controller with
>>>> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
>>>> mapping as a part of PAPR requirements for MSI/MSIX guests.
>>>> Specifically, the pSeries guest does not touch MSIMessage's at
>>>> all, instead it uses rtas_ibm_change_msi and
>>>> rtas_ibm_query_interrupt_source
>>>> rtas calls to do the mapping.
>>>>
>>>> Therefore we do not really need more routing than we got already.
>>>> The patch introduces the infrastructure to enable direct IRQ mapping.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>>
>>>> ---
>>>>
>>>> The patch is raw and ugly indeed, I made it only to demonstrate
>>>> the idea and see if it has right to live or not.
>>>>
>>>> For some reason which I do not really understand (limited GSI numbers?)
>>>> the existing code always adds routing and I do not see why we would need
>>>> it.
>>>
>>> It's an IOAPIC, a pin gets toggled from the device and an MSI message
>>> gets written to the CPU.  So the route allocates and programs the
>>> pin->MSI, then we tell it what notifier triggers that pin.
>>
>>> On x86 the MSI vector doesn't encode any information about the device
>>> sending the MSI, here you seem to be able to figure out the device and
>>> vector space number from the address.  Then your pin to MSI is
>>> effectively fixed.  So why isn't this just your
>>> kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
>>> it's an allocate and program.
>>>  What does kvm_irqchip_add_msi_route do on
>>> pSeries today?  Thanks,
>>
>>
>> As we just started implementing this thing, I commented it out for the
>> starter. Once called, it destroys direct mapping in the host kernel and
>> everything stops working as routing is not implemented (yet? ever?).
> 
> Yay, it's broken, you can rewrite it ;)


There is nothing to rewrite, my understanding is that it is just not
written yet and Paul would like not to do that :)


>> My point here is that MSIMessage to irq translation is made on a PCI domain
>> as PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
>> all in QEMU, the guest dynamically allocates MSI IRQs and it is up to a
>> hypervisor (QEMU) to take care of actual MSIMessage for the device.
> 
> MSIMessage is what the guest has programmed for the address/data fields,
> it's not just a QEMU invention.  From the guest perspective, the device
> writes msg.data to msg.address to signal the CPU for the interrupt.


Our guests never program MSIMessage. Hypercalls are used instead.


>> And the only reason to use MSIMessage in QEMU for us is to support
>> msi_notify()/msix_notify() in places like vfio_msi_interrupt(), I have
>> added an MSI window for that a long time ago, which we do not need so much
>> as we already have an irq number in vfio_msi_interrupt(), etc.
> 
> It seems like you just have another layer of indirection via your
> msi_table.  For x86 there's a layer of indirection via the virq virtual
> IOAPIC pin.  Seems similar.  Thanks,


I do not follow you, sorry. For x86, is it that MSI routing table which is
updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
code responds to msi_notify() in qemu-x86 and does qemu_irq_pulse()?



> 
> Alex
> 
>>>> ---
>>>>  hw/misc/vfio.c           |   11 +--
>>>>  hw/pci/pci.c             |   13 +
>>>>  hw/ppc/spapr_pci.c       |   13 +
>>>>  hw/virtio/virtio-pci.c   |   26 --
>>>>  include/hw/pci/pci.h     |    4
>>>>  include/hw/pci/pci_bus.h |    1 +
>>>>  6 files changed, 60 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>>>> index 14aac04..2d9eef7 100644
>>>> --- a/hw/misc/vfio.c
>>>> +++ b/hw/misc/vfio.c
>>>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>>>   * Attempt to enable route through KVM irqchip,
>>>>   * default to userspace handling if unavailable.
>>>>   */
>>>> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>>>> +
>>>> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
>>>> +if (vector->virq < 0) {
>>>> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>>>> +}
>>>>  if (vector->virq < 0 ||
>>>>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>>>> vector->virq) < 0) {
>>>> @@ -807,7 +811,10 @@ retry:
>>>>   * Attempt to enable route through KVM irqchip,
>>>>   * default to userspace handling if unavailable.
>>>

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alex Williamson

On Fri, 2013-06-21 at 11:56 +1000, Alexey Kardashevskiy wrote:
> On 06/21/2013 02:51 AM, Alex Williamson wrote:
> > On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
> >> At the moment QEMU creates a route for every MSI IRQ.
> >>
> >> Now we are about to add IRQFD support on PPC64-pseries platform.
> >> pSeries already has in-kernel emulated interrupt controller with
> >> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
> >> mapping as a part of PAPR requirements for MSI/MSIX guests.
> >> Specifically, the pSeries guest does not touch MSIMessage's at
> >> all, instead it uses rtas_ibm_change_msi and 
> >> rtas_ibm_query_interrupt_source
> >> rtas calls to do the mapping.
> >>
> >> Therefore we do not really need more routing than we got already.
> >> The patch introduces the infrastructure to enable direct IRQ mapping.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >>
> >> ---
> >>
> >> The patch is raw and ugly indeed, I made it only to demonstrate
> >> the idea and see if it has right to live or not.
> >>
> >> For some reason which I do not really understand (limited GSI numbers?)
> >> the existing code always adds routing and I do not see why we would need 
> >> it.
> > 
> > It's an IOAPIC, a pin gets toggled from the device and an MSI message
> > gets written to the CPU.  So the route allocates and programs the
> > pin->MSI, then we tell it what notifier triggers that pin.
> 
> > On x86 the MSI vector doesn't encode any information about the device
> > sending the MSI, here you seem to be able to figure out the device and
> > vector space number from the address.  Then your pin to MSI is
> > effectively fixed.  So why isn't this just your
> > kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
> > it's an allocate and program.
> >  What does kvm_irqchip_add_msi_route do on
> > pSeries today?  Thanks,
> 
> 
> As we just started implementing this thing, I commented it out for the
> starter. Once called, it destroys direct mapping in the host kernel and
> everything stops working as routing is not implemented (yet? ever?).

Yay, it's broken, you can rewrite it ;)

> My point here is that MSIMessage to irq translation is made on a PCI domain
> as PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
> all in QEMU, the guest dynamically allocates MSI IRQs and it is up to a
> hypervisor (QEMU) to take care of actual MSIMessage for the device.

MSIMessage is what the guest has programmed for the address/data fields,
it's not just a QEMU invention.  From the guest perspective, the device
writes msg.data to msg.address to signal the CPU for the interrupt.

> And the only reason to use MSIMessage in QEMU for us is to support
> msi_notify()/msix_notify() in places like vfio_msi_interrupt(), I have
> added an MSI window for that a long time ago, which we do not need so much
> as we already have an irq number in vfio_msi_interrupt(), etc.

It seems like you just have another layer of indirection via your
msi_table.  For x86 there's a layer of indirection via the virq virtual
IOAPIC pin.  Seems similar.  Thanks,

Alex

> >> ---
> >>  hw/misc/vfio.c           |   11 +--
> >>  hw/pci/pci.c             |   13 +
> >>  hw/ppc/spapr_pci.c       |   13 +
> >>  hw/virtio/virtio-pci.c   |   26 --
> >>  include/hw/pci/pci.h     |    4
> >>  include/hw/pci/pci_bus.h |    1 +
> >>  6 files changed, 60 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> >> index 14aac04..2d9eef7 100644
> >> --- a/hw/misc/vfio.c
> >> +++ b/hw/misc/vfio.c
> >> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> >>   * Attempt to enable route through KVM irqchip,
> >>   * default to userspace handling if unavailable.
> >>   */
> >> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> >> +
> >> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
> >> +if (vector->virq < 0) {
> >> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> >> +}
> >>  if (vector->virq < 0 ||
> >>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> >> vector->virq) < 0) {
> >> @@ -807,7 +811,10 @@ retry:
> >>   * Attempt to enable route through KVM irqchip,
> >>   * default to userspace handling if unavailable.
> >>   */
> >> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> >> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
> >> +if (vector->virq < 0) {
> >> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> >> +}
> >>  if (vector->virq < 0 ||
> >>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> >> vector->virq) < 0) {
> >> diff --git a/hw/pci/pci

[Qemu-devel] [PATCH] Fix iSCSI crash on SG_IO with an iovector

2013-06-20 Thread Ronnie Sahlberg

Don't assume that SG_IO is always invoked with a simple buffer,
check the iovec_count and if it is > 1 then we need to pass an array
of iovectors to libiscsi instead of just a plain buffer.

Signed-off-by: Ronnie Sahlberg 
---
 block/iscsi.c |   31 ---
 1 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 0bbf0b1..2d1cb4e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -727,25 +727,42 @@ static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 memcpy(&acb->task->cdb[0], acb->ioh->cmdp, acb->ioh->cmd_len);
 acb->task->expxferlen = acb->ioh->dxfer_len;
 
+data.size = 0;
 if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
-data.data = acb->ioh->dxferp;
-data.size = acb->ioh->dxfer_len;
+if (acb->ioh->iovec_count == 0) {
+data.data = acb->ioh->dxferp;
+data.size = acb->ioh->dxfer_len;
+}
 }
 if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task,
  iscsi_aio_ioctl_cb,
- (acb->task->xfer_dir == SCSI_XFER_WRITE) ?
- &data : NULL,
+ (data.size > 0) ? &data : NULL,
  acb) != 0) {
 scsi_free_scsi_task(acb->task);
 qemu_aio_release(acb);
 return NULL;
 }
 
+/* We got an iovector for writing to the target */
+if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
+if (acb->ioh->iovec_count > 0) {
+scsi_task_set_iov_out(acb->task,
+  (struct scsi_iovec *) acb->ioh->dxferp,
+  acb->ioh->iovec_count);
+}
+}
+
 /* tell libiscsi to read straight into the buffer we got from ioctl */
 if (acb->task->xfer_dir == SCSI_XFER_READ) {
-scsi_task_add_data_in_buffer(acb->task,
- acb->ioh->dxfer_len,
- acb->ioh->dxferp);
+if (acb->ioh->iovec_count == 0) {
+scsi_task_add_data_in_buffer(acb->task,
+ acb->ioh->dxfer_len,
+ acb->ioh->dxferp);
+} else {
+scsi_task_set_iov_in(acb->task,
+ (struct scsi_iovec *) acb->ioh->dxferp,
+ acb->ioh->iovec_count);
+}
 }
 
 iscsi_set_events(iscsilun);
-- 
1.7.3.1
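
The distinction the patch handles (dxferp is a plain buffer when iovec_count is 0, and an array of segments otherwise) can be illustrated without libiscsi. The following is a minimal sketch under stated assumptions: DemoIovec and DemoSgHdr are hypothetical stand-ins that mirror only the three SG_IO fields the patch inspects, and demo_store_data_in is an invented helper, not a libiscsi or QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter/gather segment. */
typedef struct {
    void  *iov_base;
    size_t iov_len;
} DemoIovec;

/* Hypothetical stand-in for the SG_IO header fields used by the patch. */
typedef struct {
    void          *dxferp;       /* plain buffer, or array of DemoIovec */
    unsigned int   dxfer_len;    /* total transfer length in bytes */
    unsigned short iovec_count;  /* 0 means dxferp is a plain buffer */
} DemoSgHdr;

/* Store up to len bytes of incoming data where the request points.
 * Returns the number of bytes stored. */
size_t demo_store_data_in(const DemoSgHdr *hdr, const void *src, size_t len)
{
    const unsigned char *p = src;
    size_t done = 0;

    assert(hdr != NULL && hdr->dxferp != NULL);
    if (len > hdr->dxfer_len) {
        len = hdr->dxfer_len;
    }
    if (hdr->iovec_count == 0) {
        memcpy(hdr->dxferp, p, len);    /* simple buffer case */
        return len;
    }

    /* Scatter case: dxferp holds segment descriptors, not data, so
     * copying into it directly would overwrite the pointers. */
    const DemoIovec *iov = hdr->dxferp;
    for (unsigned i = 0; i < hdr->iovec_count && done < len; i++) {
        size_t n = iov[i].iov_len;

        if (n > len - done) {
            n = len - done;
        }
        memcpy(iov[i].iov_base, p + done, n);
        done += n;
    }
    return done;
}
```

Treating the scatter case as a plain buffer is exactly the failure mode described in the cover letter: the data lands on top of the segment descriptors and the pointers turn into garbage.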

[Qemu-devel] [PATCH] iSCSI fix crash when using virtio and libiscsi

2013-06-20 Thread Ronnie Sahlberg

Stefan, List

Please find a patch that fixes the crashes for using virtio with libiscsi.
The problem was that block/iscsi.c always assumed we got a plain buffer to read 
data into, and when we got an iovector array instead we would overwrite 
pointers with garbage and crash.

Since we can get iovectors for the write case as well I have added a fix for 
when the guest is writing data to the target to handle the iovector case as 
well.


The new calls added are not protected with (LIBISCSI_FEATURE_IOVECTOR) checks
since anyone building a new/current version of qemu should probably also build
against a current libiscsi.
I will send patches later to remove the current (LIBISCSI_FEATURE_IOVECTOR) 
checks in the rest of the file.


regards
ronnie sahlberg

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alexey Kardashevskiy

On 06/21/2013 02:51 AM, Alex Williamson wrote:
> On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
>> At the moment QEMU creates a route for every MSI IRQ.
>>
>> Now we are about to add IRQFD support on PPC64-pseries platform.
>> pSeries already has in-kernel emulated interrupt controller with
>> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
>> mapping as a part of PAPR requirements for MSI/MSIX guests.
>> Specifically, the pSeries guest does not touch MSIMessage's at
>> all, instead it uses rtas_ibm_change_msi and rtas_ibm_query_interrupt_source
>> rtas calls to do the mapping.
>>
>> Therefore we do not really need more routing than we got already.
>> The patch introduces the infrastructure to enable direct IRQ mapping.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>>
>> ---
>>
>> The patch is raw and ugly indeed, I made it only to demonstrate
>> the idea and see if it has right to live or not.
>>
>> For some reason which I do not really understand (limited GSI numbers?)
>> the existing code always adds routing and I do not see why we would need it.
> 
> It's an IOAPIC, a pin gets toggled from the device and an MSI message
> gets written to the CPU.  So the route allocates and programs the
> pin->MSI, then we tell it what notifier triggers that pin.

> On x86 the MSI vector doesn't encode any information about the device
> sending the MSI, here you seem to be able to figure out the device and
> vector space number from the address.  Then your pin to MSI is
> effectively fixed.  So why isn't this just your
> kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
> it's a allocate and program.  What does kvm_irqchip_add_msi_route do on
> pSeries today?  Thanks,


As we have only just started implementing this, I commented it out for
starters. Once called, it destroys the direct mapping in the host kernel and
everything stops working, as routing is not implemented (yet? ever?).

My point here is that MSIMessage to irq translation is done per PCI domain,
as the PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
all in QEMU: the guest dynamically allocates MSI IRQs and it is up to the
hypervisor (QEMU) to take care of the actual MSIMessage for the device.

And the only reason for us to use MSIMessage in QEMU is to support
msi_notify()/msix_notify() in places like vfio_msi_interrupt(). I added
an MSI window for that a long time ago, which we do not really need, as
we already have an irq number in vfio_msi_interrupt(), etc.



> 
> Alex
> 
>> ---
>>  hw/misc/vfio.c   |   11 +--
>>  hw/pci/pci.c |   13 +
>>  hw/ppc/spapr_pci.c   |   13 +
>>  hw/virtio/virtio-pci.c   |   26 --
>>  include/hw/pci/pci.h |4 
>>  include/hw/pci/pci_bus.h |1 +
>>  6 files changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>> index 14aac04..2d9eef7 100644
>> --- a/hw/misc/vfio.c
>> +++ b/hw/misc/vfio.c
>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>> unsigned int nr,
>>   * Attempt to enable route through KVM irqchip,
>>   * default to userspace handling if unavailable.
>>   */
>> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>> +
>> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
>> +if (vector->virq < 0) {
>> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : 
>> -1;
>> +}
>>  if (vector->virq < 0 ||
>>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>> vector->virq) < 0) {
>> @@ -807,7 +811,10 @@ retry:
>>   * Attempt to enable route through KVM irqchip,
>>   * default to userspace handling if unavailable.
>>   */
>> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
>> +if (vector->virq < 0) {
>> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> +}
>>  if (vector->virq < 0 ||
>>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>> vector->virq) < 0) {
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index a976e46..a9875e9 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
>> *dev,
>>  dev->intx_routing_notifier = notifier;
>>  }
>>  
>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
>> +{
>> +bus->map_msi = map_msi_fn;
>> +}
>> +
>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
>> +{
>> +if (bus->map_msi) {
>> +return bus->map_msi(bus, msg);
>> +}
>> +return -1;
>> +}
>> +
>>  /*
>>   * PCI-to-PCI bridge specification
>>   * 9.1: Interrupt routing. Table 9-1
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>

[Qemu-devel] QEMU Memory subsystem

2013-06-20 Thread Basim Baig

Hello,

I am currently working on a project where I aim to log every memory access
made by a virtual machine running inside QEMU (for analyzing kernel
behavior). My initial approach is to hook into the QEMU MMU
implementation and find the place where the guest->host page translation or
lookup is done. That way I can see every page accessed by the guest.
(This is only the first level; eventually I would want logging at
pointer granularity.) I have been reading through the source code and
online documentation for a week now to get a general sense of the QEMU
internals and codebase.

I would like some advice on which direction I should head in (or who I can
talk to) if I really want to understand in depth how to make significant
changes to the QEMU memory management and MMU subsystem.

Thanks,
Mirza Basim Baig
Stony Brook University

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alexey Kardashevskiy

On 06/21/2013 02:37 AM, Anthony Liguori wrote:
> Alexey Kardashevskiy  writes:
> 
>> At the moment QEMU creates a route for every MSI IRQ.
>>
>> Now we are about to add IRQFD support on PPC64-pseries platform.
>> pSeries already has in-kernel emulated interrupt controller with
>> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
>> mapping as a part of PAPR requirements for MSI/MSIX guests.
>> Specifically, the pSeries guest does not touch MSIMessage's at
>> all, instead it uses rtas_ibm_change_msi and rtas_ibm_query_interrupt_source
>> rtas calls to do the mapping.
>>
>> Therefore we do not really need more routing than we got already.
>> The patch introduces the infrastructure to enable direct IRQ mapping.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>>
>> ---
>>
>> The patch is raw and ugly indeed, I made it only to demonstrate
>> the idea and see if it has right to live or not.
>>
>> For some reason which I do not really understand (limited GSI numbers?)
>> the existing code always adds routing and I do not see why we would need it.
>>
>> Thanks!
>> ---
>>  hw/misc/vfio.c   |   11 +--
>>  hw/pci/pci.c |   13 +
>>  hw/ppc/spapr_pci.c   |   13 +
>>  hw/virtio/virtio-pci.c   |   26 --
>>  include/hw/pci/pci.h |4 
>>  include/hw/pci/pci_bus.h |1 +
>>  6 files changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>> index 14aac04..2d9eef7 100644
>> --- a/hw/misc/vfio.c
>> +++ b/hw/misc/vfio.c
>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>> unsigned int nr,
>>   * Attempt to enable route through KVM irqchip,
>>   * default to userspace handling if unavailable.
>>   */
>> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>> +
>> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
>> +if (vector->virq < 0) {
>> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : 
>> -1;
>> +}
>>  if (vector->virq < 0 ||
>>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>> vector->virq) < 0) {
>> @@ -807,7 +811,10 @@ retry:
>>   * Attempt to enable route through KVM irqchip,
>>   * default to userspace handling if unavailable.
>>   */
>> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
>> +if (vector->virq < 0) {
>> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> +}
> 
> I don't understand why you're adding a pci level hook versus just having
> a kvmppc specific hook in the kvm_irqchip_add_msi_route function.

Me neither :) I am just asking. The existing mapping code already exists in
sPAPR PCI host bridge and it is not going anywhere else.

And kvm_irqchip_add_msi_route does not have any link to a device or a bus,
so I'll have to walk through all PHBs in the system, check whether a PHB's MSI
window is the one from the MSIMessage, and convert the MSIMessage to a virq.
Pretty easy and quick but still a dirty hack; would it be better?
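
To make the "walk through all PHBs" idea concrete, here is a minimal sketch under loudly stated assumptions. struct phb_sketch, its fields and msi_to_virq_sketch() are all hypothetical and are not the sPAPR PCI host bridge structures; in particular, deriving the IRQ as irq_base plus the message data is only a placeholder for whatever lookup the PHB really does. The -1 return matches the "no direct mapping" convention that pci_bus_map_msi() uses in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a PCI host bridge. */
struct phb_sketch {
    uint64_t msi_win_addr; /* start of this PHB's MSI window */
    uint64_t msi_win_size; /* size of the window in bytes */
    int irq_base;          /* assumed: first IRQ number behind this PHB */
};

/*
 * Walk all PHBs, find the one whose MSI window contains the message
 * address, and turn the message into an IRQ number. Returns -1 when no
 * PHB claims the address, so a caller could fall back to adding a route.
 */
static int msi_to_virq_sketch(const struct phb_sketch *phbs, size_t nr_phbs,
                              uint64_t msg_address, uint32_t msg_data)
{
    assert(phbs != NULL || nr_phbs == 0);

    for (size_t i = 0; i < nr_phbs; i++) {
        const struct phb_sketch *phb = &phbs[i];

        if (msg_address >= phb->msi_win_addr &&
            msg_address - phb->msi_win_addr < phb->msi_win_size) {
            /* Placeholder mapping, an assumption of this sketch. */
            return phb->irq_base + (int)msg_data;
        }
    }
    return -1;
}
```

Whether this walk lives behind a PCI bus hook or inside a kvmppc-specific hook is exactly the open question in the thread; the sketch takes no side on that.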


-- 
Alexey

Re: [Qemu-devel] [PATCH] target-arm: implement ARMv8 VSEL instruction

2013-06-20 Thread Måns Rullgård

Peter Maydell  writes:

> On 18 June 2013 15:30, Mans Rullgard  wrote:
>> This adds support for the VSEL instruction introduced in ARMv8.
>> It resides along with other new VFP instructions under the CDP2
>> encoding which was previously unused.
>>
>> Signed-off-by: Mans Rullgard 
>
> So I found this pretty confusing,

That makes two of us.

> which I think is an indication that we need to start by cleaning up
> the existing v7 VFP/Neon decode.
>
> Specifically, currently we handle all Neon decode by just calling
> the neon decode functions directly from the disas_arm_insn
> and disas_thumb2_insn functions. We should move VFP to work
> in the same way (ie take it out of disas_coproc_insn()).
> Basically, the architecture manual treats them as part of the
> core instruction set, and we should make our decoder do the same.
>
> The (existing) coproc decode is also confusing, and would benefit
> a lot from a comment at the top of disas_coproc_insn specifying
> the opcode patterns that can reach it.
>
>> +if (((insn >> 23) & 1) == 0) {
>> +/* vsel */
>> +int cc = (insn >> 20) & 3;
>> +int cond = (cc << 2) | (((cc << 1) ^ cc) & 2);
>> +int pass_label = gen_new_label();
>> +
>> +gen_mov_F0_vreg(dp, rn);
>> +gen_mov_vreg_F0(dp, rd);
>> +gen_test_cc(cond, pass_label);
>> +gen_mov_F0_vreg(dp, rm);
>> +gen_mov_vreg_F0(dp, rd);
>> +gen_set_label(pass_label);
>
> You can generate better code with the TCG movcond op.
> Luckily you don't actually have to duplicate the whole of
> gen_test_cc only doing movconds, because there are only actually
> 4 encodable conditions here (3 of which turn into a single
> movcond; the fourth requires two consecutive movcond ops).

Thanks, that sounds better.
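
For readers following along, the cc-to-cond expression in the quoted hunk can be checked in isolation. This is a small standalone sketch (vsel_cond() is a name made up here) that evaluates the same expression for the four possible values of the two-bit cc field. The results are the standard ARM condition encodings EQ (0x0), VS (0x6), GE (0xA) and GT (0xC), which is why only four conditions are encodable.

```c
#include <assert.h>

/*
 * Same expression as in the patch: expand the 2-bit cc field of VSEL
 * into a 4-bit ARM condition code.
 *   cc = 0 -> 0x0 (EQ)
 *   cc = 1 -> 0x6 (VS)
 *   cc = 2 -> 0xA (GE)
 *   cc = 3 -> 0xC (GT)
 */
static int vsel_cond(int cc)
{
    assert(cc >= 0 && cc <= 3);
    return (cc << 2) | (((cc << 1) ^ cc) & 2);
}
```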

> Also I don't think we should introduce any new uses of F0/F1.
> You can just load a VFP register into a TCG temp like this:

Great, more obsolete stuff.

> ftmp = tcg_temp_new_i32();
> tcg_gen_ld_f32(ftmp, cpu_env, vfp_reg_offset(0, rd));
>
> operate on it as usual, and store:
> tcg_gen_st_f32(ftmp, cpu_env, vfp_reg_offset(0, rd));
> tcg_temp_free_i32(ftmp);
>
> (similarly for double).
>
>> @@ -6699,6 +6742,12 @@ static void disas_arm_insn(CPUARMState * env, 
>> DisasContext *s)
>>  }
>>  return; /* v7MP: Unallocated memory hint: must NOP */
>>  }
>> +if ((insn & 0x0f10) == 0x0e00) {
>> +/* cdp2 */
>> +if (disas_coproc_insn(env, s, insn))
>> +goto illegal_op;
>> +return;
>> +}
>
> This hunk is oddly placed, because it's neither next to the neon
> decode (which is further up) nor the mrc2/mcr2 decode (which is
> further down).

That's because it is neither.  It is CDP2, previously not decoded at all.
This seemed as logical a place as any to me.  If you disagree, please
say where you'd prefer that it go.

-- 
Måns Rullgård
m...@mansr.com

Re: [Qemu-devel] [PATCH v2] e600 core for MPC86xx processors

2013-06-20 Thread Alexander Graf


On 26.05.2013, at 19:41, Julio Guerra wrote:

> MPC86xx processors are based on the e600 core, which is not the case
> in qemu where it is based on the 7400 processor.
> 
> This patch creates the e600 core and instantiates the MPC86xx
> processors based on it. Therefore, adding the high BATs and the SPRG
> 4..7 registers, which are e600-specific [1].
> 
> This allows to define the MPC8610 processor too and my program running
> on a real MPC8610 target is now able to run on qemu :)
> 
> [1] http://cache.freescale.com/files/32bit/doc/ref_manual/E600CORERM.pdf
> 
> Signed-off-by: Julio Guerra 

Thanks, applied to ppc-next.


Alex

Re: [Qemu-devel] [PATCH] pseries: Support for in-kernel XICS interrupt controller

2013-06-20 Thread Alexander Graf


On 05.06.2013, at 09:39, Alexey Kardashevskiy wrote:

> From: David Gibson 
> 
> Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
> controller system within KVM.  This patch allows qemu to initialize and
> configure the in-kernel XICS, and keep its state in sync with qemu's XICS
> state as necessary.
> 
> This should give considerable performance improvements.  e.g. on a simple
> IPI ping-pong test between hardware threads, using qemu XICS gives us
> around 5,000 irqs/second, whereas the in-kernel XICS gives us around
> 70,000 irqs/s on the same hardware configuration.
> 
> [Mike Qiu : fixed mistype which caused 
> ics_set_kvm_state() to fail]
> Signed-off-by: David Gibson 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This depends on the "pseries: savevm support for XICS interrupt controller"
> patch posted earlier.
> 
> ---
> hw/ppc/spapr.c|4 +-
> hw/ppc/xics.c |  333 -
> include/hw/ppc/xics.h |8 +-
> 3 files changed, 336 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 71da11b..04e0eae 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1136,8 +1136,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
> }
> env = &cpu->env;
> 
> -xics_cpu_setup(spapr->icp, cpu);
> -
> /* Set time-base frequency to 512 MHz */
> cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> 
> @@ -1151,6 +1149,8 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
> kvmppc_set_papr(cpu);
> }
> 
> +xics_cpu_setup(spapr->icp, cpu);
> +
> qemu_register_reset(spapr_cpu_reset, cpu);
> }
> 
> diff --git a/hw/ppc/xics.c b/hw/ppc/xics.c
> index 02e44a0..b83f19f 100644
> --- a/hw/ppc/xics.c
> +++ b/hw/ppc/xics.c
> @@ -29,12 +29,19 @@
> #include "trace.h"
> #include "hw/ppc/spapr.h"
> #include "hw/ppc/xics.h"
> +#include "kvm_ppc.h"
> +#include "sysemu/kvm.h"
> +#include "config.h"
> +#include "qemu/config-file.h"
> +
> +#include 

Huh? This breaks compilation on non-Linux.

> 
> /*
>  * ICP: Presentation layer
>  */
> 
> struct icp_server_state {
> +CPUState *cs;

Why did you get around this earlier without the CPUState pointer?

> uint32_t xirr;
> uint8_t pending_priority;
> uint8_t mfrr;
> @@ -53,6 +60,9 @@ struct icp_state {
> uint32_t nr_servers;
> struct icp_server_state *ss;
> struct ics_state *ics;
> +uint32_t set_xive_token, get_xive_token,
> +int_off_token, int_on_token;

Separate declaration lines please.

> +int kernel_xics_fd;
> };
> 
> static void ics_reject(struct ics_state *ics, int nr);
> @@ -168,6 +178,66 @@ static void icp_irq(struct icp_state *icp, int server, 
> int nr, uint8_t priority)
> }
> }
> 
> +static void icp_get_kvm_state(struct icp_server_state *ss)
> +{
> +#ifdef CONFIG_KVM
> +uint64_t state;
> +struct kvm_one_reg reg = {
> +.id = KVM_REG_PPC_ICP_STATE,
> +.addr = (uintptr_t)&state,
> +};
> +int ret;
> +
> +if (!ss->cs) {
> +return; /* kernel irqchip not in use */
> +}
> +
> +ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, &reg);
> +if (ret != 0) {
> +fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
> +" for CPU %d: %s\n", ss->cs->cpu_index, strerror(errno));
> +exit(1);
> +}
> +
> +ss->xirr = state >> KVM_REG_PPC_ICP_XISR_SHIFT;
> +ss->mfrr = (state >> KVM_REG_PPC_ICP_MFRR_SHIFT)
> +& KVM_REG_PPC_ICP_MFRR_MASK;
> +ss->pending_priority = (state >> KVM_REG_PPC_ICP_PPRI_SHIFT)
> +& KVM_REG_PPC_ICP_PPRI_MASK;
> +#endif /* CONFIG_KVM */

This needs to get encapsulated into a kvm helper function that gets a dummy 
definition for non-KVM. We've been through this multiple times now in other 
areas.
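
A minimal sketch of the pattern being asked for, with hypothetical names (kvmppc_icp_get_state_sketch() and icp_sync_from_kernel() are not existing QEMU functions). The KVM-specific call is declared once; when CONFIG_KVM is not defined, a static inline dummy takes its place, so the caller needs no #ifdef at all. The -ENOSYS return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifdef CONFIG_KVM
/* Real implementation would live in the KVM-specific source file. */
int kvmppc_icp_get_state_sketch(int cpu_index, uint64_t *state);
#else
/* Dummy definition for builds without KVM: report "not supported". */
static inline int kvmppc_icp_get_state_sketch(int cpu_index, uint64_t *state)
{
    (void)cpu_index;
    (void)state;
    return -ENOSYS;
}
#endif

/* Device code calls the helper unconditionally, with no #ifdef here. */
static int icp_sync_from_kernel(int cpu_index, uint64_t *state)
{
    assert(state != NULL);
    return kvmppc_icp_get_state_sketch(cpu_index, state);
}
```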

> +}
> +
> +static int icp_set_kvm_state(struct icp_server_state *ss)
> +{
> +#ifdef CONFIG_KVM
> +uint64_t state;
> +struct kvm_one_reg reg = {
> +.id = KVM_REG_PPC_ICP_STATE,
> +.addr = (uintptr_t)&state,
> +};
> +int ret;
> +
> +if (!ss->cs) {
> +return 0; /* kernel irqchip not in use */
> +}
> +
> +state = ((uint64_t)ss->xirr << KVM_REG_PPC_ICP_XISR_SHIFT)
> +| ((uint64_t)ss->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT)
> +| ((uint64_t)ss->pending_priority << KVM_REG_PPC_ICP_PPRI_SHIFT);
> +
> +ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, &reg);
> +if (ret != 0) {
> +fprintf(stderr, "Unable to restore KVM interrupt controller state 
> (0x%"
> +PRIx64 ") for CPU %d: %s\n", state, ss->cs->cpu_index,
> +strerror(errno));
> +exit(1);
> +return ret;
> +}
> +#endif /* CONFIG_KVM */
> +
> +return 0;
> +}
> +
> /*
>  * ICS: Source layer
>  */
> @@ -336,6 +406,107 @@ static void ics_eoi(struct ics_state *ics, int nr)
> }
> }
> 
> +static void ics_get_kvm_state(struct ics_state *ics)
> +{
> +#ifdef

Re: [Qemu-devel] [PATCH v4] target-ppc: Introduce unrealizefn for PowerPCCPU

2013-06-20 Thread Alexander Graf


On 09.06.2013, at 22:11, Andreas Färber wrote:

> Use it to clean up the opcode table, resolving a former TODO from Jocelyn.
> Also switch from malloc() to g_malloc().
> 
> Signed-off-by: Andreas Färber 

Thanks, applied to ppc-next.


Alex

Re: [Qemu-devel] [PATCH] booke_ppc: limit booke timer to max when timeout overflow

2013-06-20 Thread Alexander Graf


On 12.06.2013, at 14:30, Bharat Bhushan wrote:

> Limit watchdog and fit timer to maximum timeout value which
> qemu timer can support (INT64_MAX). This maximum timeout will be
> hundreds of years, so limiting to max timeout is pretty safe.
> 
> Signed-off-by: Bharat Bhushan 

Thanks, applied to ppc-next.
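
The clamping described in the commit message can be sketched as a saturating addition. This is an illustration only, not the booke_ppc code: clamp_deadline() is a hypothetical helper, and it assumes the current time is a non-negative int64_t and the timeout an unsigned 64-bit count in the same unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return now + delta, limited to INT64_MAX when the sum would not fit
 * into the signed 64-bit value a timer deadline is stored in.
 */
static int64_t clamp_deadline(int64_t now, uint64_t delta)
{
    assert(now >= 0);

    /* Compare against the remaining headroom to avoid overflowing. */
    if (delta > (uint64_t)(INT64_MAX - now)) {
        return INT64_MAX;
    }
    return now + (int64_t)delta;
}
```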


Alex

Re: [Qemu-devel] [PATCH 07/12] block: save the associated child in BlockDriverState

2013-06-20 Thread Paolo Bonzini

On 20/06/2013 19:46, Marc-André Lureau wrote:
> This allows the Spice block driver to eject the associated device.

The child can change when you have for example a streaming operation.
What exactly are you trying to do here (I guess I'll understand more
when I get to the later patches)?

Can you draw the relationships between all the BlockDriverStates in a
spicebd: drive?

Paolo

> Signed-off-by: Marc-André Lureau 
> ---
>  block.c   | 46 +-
>  include/block/block_int.h |  1 +
>  2 files changed, 30 insertions(+), 17 deletions(-)
> 
> diff --git a/block.c b/block.c
> index b88ad2f..f502eed 100644
> --- a/block.c
> +++ b/block.c
> @@ -294,7 +294,8 @@ void bdrv_register(BlockDriver *bdrv)
>  }
>  
>  /* create a new block device (by default it is empty) */
> -BlockDriverState *bdrv_new(const char *device_name)
> +static BlockDriverState *bdrv_new_int(const char *device_name,
> +BlockDriverState *child)
>  {
>  BlockDriverState *bs;
>  
> @@ -305,10 +306,16 @@ BlockDriverState *bdrv_new(const char *device_name)
>  }
>  bdrv_iostatus_disable(bs);
>  notifier_list_init(&bs->close_notifiers);
> +bs->child = child;
>  
>  return bs;
>  }
>  
> +BlockDriverState *bdrv_new(const char *device_name)
> +{
> +return bdrv_new_int(device_name, NULL);
> +}
> +
>  void bdrv_add_close_notifier(BlockDriverState *bs, Notifier *notify)
>  {
>  notifier_list_add(&bs->close_notifiers, notify);
> @@ -769,16 +776,8 @@ free_and_fail:
>  return ret;
>  }
>  
> -/*
> - * Opens a file using a protocol (file, host_device, nbd, ...)
> - *
> - * options is a QDict of options to pass to the block drivers, or NULL for an
> - * empty set of options. The reference to the QDict belongs to the block 
> layer
> - * after the call (even on failure), so if the caller intends to reuse the
> - * dictionary, it needs to use QINCREF() before calling bdrv_file_open.
> - */
> -int bdrv_file_open(BlockDriverState **pbs, const char *filename,
> -   QDict *options, int flags)
> +static int bdrv_file_open_int(BlockDriverState **pbs, const char *filename,
> +QDict *options, int flags, BlockDriverState *child)
>  {
>  BlockDriverState *bs;
>  BlockDriver *drv;
> @@ -790,7 +789,7 @@ int bdrv_file_open(BlockDriverState **pbs, const char 
> *filename,
>  options = qdict_new();
>  }
>  
> -bs = bdrv_new("");
> +bs = bdrv_new_int("", child);
>  bs->options = options;
>  options = qdict_clone_shallow(options);
>  
> @@ -873,6 +872,20 @@ fail:
>  }
>  
>  /*
> + * Opens a file using a protocol (file, host_device, nbd, ...)
> + *
> + * options is a QDict of options to pass to the block drivers, or NULL for an
> + * empty set of options. The reference to the QDict belongs to the block 
> layer
> + * after the call (even on failure), so if the caller intends to reuse the
> + * dictionary, it needs to use QINCREF() before calling bdrv_file_open.
> + */
> +int bdrv_file_open(BlockDriverState **pbs, const char *filename,
> +   QDict *options, int flags)
> +{
> +return bdrv_file_open_int(pbs, filename, options, flags, NULL);
> +}
> +
> +/*
>   * Opens the backing file for a BlockDriverState if not yet open
>   *
>   * options is a QDict of options to pass to the block drivers, or NULL for an
> @@ -904,7 +917,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
> *options)
>  return 0;
>  }
>  
> -bs->backing_hd = bdrv_new("");
> +bs->backing_hd = bdrv_new_int("", bs);
>  bdrv_get_full_backing_filename(bs, backing_filename,
> sizeof(backing_filename));
>  
> @@ -990,7 +1003,7 @@ int bdrv_open(BlockDriverState *bs, const char 
> *filename, QDict *options,
> instead of opening 'filename' directly */
>  
>  /* if there is a backing file, use it */
> -bs1 = bdrv_new("");
> +bs1 = bdrv_new_int("", bs);
>  ret = bdrv_open(bs1, filename, NULL, 0, drv);
>  if (ret < 0) {
>  bdrv_delete(bs1);
> @@ -1043,9 +1056,8 @@ int bdrv_open(BlockDriverState *bs, const char 
> *filename, QDict *options,
>  }
>  
>  extract_subqdict(options, &file_options, "file.");
> -
> -ret = bdrv_file_open(&file, filename, file_options,
> - bdrv_open_flags(bs, flags));
> +ret = bdrv_file_open_int(&file, filename, file_options,
> + bdrv_open_flags(bs, flags), bs);
>  if (ret < 0) {
>  goto fail;
>  }
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index ba52247..9c72b32 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -245,6 +245,7 @@ struct BlockDriverState {
>  
>  BlockDriverState *backing_hd;
>  BlockDriverState *file;
> +BlockDriverState *child;
>  
>  NotifierList close_notifiers;
>  
>

Re: [Qemu-devel] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Alexander Graf


On 20.06.2013, at 20:58, Anthony Liguori wrote:

> Alexander Graf  writes:
> 
>> On 20.06.2013 at 17:42, Anthony Liguori wrote:
>> 
>>> Andreas Färber  writes:
>>> 
>>>> On 19.06.2013 at 22:40, Anthony Liguori wrote:
>>>>> Signed-off-by: Anthony Liguori 
>>>>> ---
>>>>> qtest.c  | 29 +
>>>>> tests/libqtest.c | 18 ++
>>>>> tests/libqtest.h | 46 ++
>>>>> 3 files changed, 93 insertions(+)
>>>>> 
>>>>> diff --git a/qtest.c b/qtest.c
>>>>> index 07a9612..f8c8f44 100644
>>>>> --- a/qtest.c
>>>>> +++ b/qtest.c
>>>>> @@ -19,6 +19,9 @@
>>>>> #include "hw/irq.h"
>>>>> #include "sysemu/sysemu.h"
>>>>> #include "sysemu/cpus.h"
>>>>> +#ifdef TARGET_PPC64
>>>>> +#include "hw/ppc/spapr.h"
>>>>> +#endif
>>>>> 
>>>>> #define MAX_IRQ 256
>>>>> 
>>>>> @@ -141,6 +144,13 @@ static bool qtest_opened;
>>>>> * where NUM is an IRQ number.  For the PC, interrupts can be intercepted
>>>>> * simply with "irq_intercept_in ioapic" (note that IRQ0 comes out with
>>>>> * NUM=0 even though it is remapped to GSI 2).
>>>>> + *
>>>>> + * Platform specific (sPAPR):
>>>>> + *
>>>>> + *  > papr_hypercall NR ARG0 ARG1 ... ARG8
>>>> 
>>>> The functions are called spapr_hcall*() but the protocol uses
>>>> papr_hypercall?
>>> 
>>> The discrepancy is inherited in the KVM vs. QEMU interfaces.  It's
>>> called papr_hypercall in the KVM interface vs. spapr in QEMU.
>>> 
>>> I honestly don't know what the distinction between spapr and papr is.
>> 
>> PAPR is what PAPR calls itself. However, there is also an ePAPR for
>> BookE, so in order to distinguish the 2 more easily, we named the
>> server version spapr wherever we remembered to.
> 
> So does it make sense to have papr_hypercall()?  Do hypercalls exist
> with the virtualization extensions on BookE?

papr_hypercall() really means spapr_hypercall() :). I don't think we should 
mangle ePAPR and sPAPR together.


Alex

Re: [Qemu-devel] git tag for 1.4.2

2013-06-20 Thread Peter Feiner

I had this same question. I'm not sure why it wasn't added to the mainline
qemu.git. In any case, you can get the commits and tag from
git://git.qemu.org/qemu-stable-1.4.git
(git remote add stable-1.4 git://git.qemu.org/qemu-stable-1.4.git &&
git fetch stable-1.4).

Peter

On Fri, May 31, 2013 at 6:07 AM, Dietmar Maurer  wrote:

>  Is there a git tag for 1.4.2?
>

Re: [Qemu-devel] [PATCH] int128: optimize

2013-06-20 Thread Paolo Bonzini

On 20/06/2013 18:46, Richard Henderson wrote:
> On 06/20/2013 08:00 AM, Paolo Bonzini wrote:
>>  static inline Int128 int128_sub(Int128 a, Int128 b)
>>  {
>> -return int128_add(a, int128_neg(b));
>> +uint64_t lo = a.lo - b.lo;
>> +return (Int128) { lo, (lo < a.lo) + a.hi - b.hi };
> 
> This one isn't right.  Consider { 2, 0 } - { 2, 0 }
> 
>   lo = 2 - 2 = 0;
>   = { 0, (0 < 2) + 0 - 0 }
>   = { 0, 1 }
> 
> I'd be happier with a more traditional
> 
>   (Int128){ a.lo - b.lo, a.hi - b.hi - (a.lo < b.lo) };

Yeah, I wasn't quite sure of this and I was waiting for testcases to
prove me wrong...  To fix it in the style I used you need

   (Int128){ lo, a.hi - b.hi - (lo > a.lo) }

(We have to sum a + ~b + 1.  We have lo = a.lo + ~b.lo + 1, from which
the carry-out is either lo <= a.lo or lo <= ~b.lo, using <= because of
the carry-in.  Then the high part is

   a.hi + ~b.hi   + (lo <= a.lo)
 = a.hi + (-1 - b.hi) + 1 - (lo > a.lo)
 = a.hi - b.hi - (lo > a.lo)

).  But I'll go with your version, it probably generates better code
too.
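
To check that the two borrow formulations above agree, here is a small standalone sketch. Int128Sketch is a stand-in type defined only for this example (an unsigned low half and a signed high half are an assumption, not a quote of int128.h). One helper computes the borrow from the inputs, as in Richard's version; the other computes it from the already subtracted low half, as in the corrected variant; the wrapper asserts that both give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in type for this example only. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} Int128Sketch;

/* Borrow decided from the inputs: borrow iff a.lo < b.lo. */
static Int128Sketch sub_borrow_from_inputs(Int128Sketch a, Int128Sketch b)
{
    return (Int128Sketch){ a.lo - b.lo, a.hi - b.hi - (a.lo < b.lo) };
}

/* Borrow decided from the result: the low half wrapped iff lo > a.lo. */
static Int128Sketch sub_borrow_from_result(Int128Sketch a, Int128Sketch b)
{
    uint64_t lo = a.lo - b.lo;
    return (Int128Sketch){ lo, a.hi - b.hi - (lo > a.lo) };
}

/* Subtract and cross-check both variants against each other. */
static Int128Sketch int128_sub_sketch(Int128Sketch a, Int128Sketch b)
{
    Int128Sketch r1 = sub_borrow_from_inputs(a, b);
    Int128Sketch r2 = sub_borrow_from_result(a, b);

    assert(r1.lo == r2.lo && r1.hi == r2.hi);
    return r1;
}
```

With the counterexample from the review, { 2, 0 } - { 2, 0 }, both variants give { 0, 0 } instead of the wrong { 0, 1 }.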

Paolo

Re: [Qemu-devel] [PULL 00/21] pci,net,misc enhancements

2013-06-20 Thread Anthony Liguori

Gleb Natapov  writes:

> On Thu, Jun 20, 2013 at 02:02:59PM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > From: Michael S. Tsirkin 
>> >
>> > The following changes since commit 
>> > 90a2541b763b31d2b551b07e24aae3de5266d31b:
>> >
>> >   target-i386: fix over 80 chars warnings (2013-06-15 17:50:38 +)
>> >
>> > are available in the git repository at:
>> >
>> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony
>> >
>> > for you to fetch changes up to f96c30047009f8a9c3cecf68104d8d99f989f54d:
>> >
>> >   pci: Fold host_buses list into PCIHostState functionality (2013-06-19 
>> > 18:35:05 +0300)
>> >
>> > 
>> > pci,net,misc enhancements
>> 
>> I don't like the amount of "misc" in this pull request but I'll take it
>> with appropriate acks.
>> 
>> >
>> > This includes some pci and net-related enhancements:
>> >
>> > Better support for systems with multiple PCI root buses
>> > A new management interface for access to rx filter in NICs
>> > KVM Speedup for MSI updates on kvm
>> > FW cfg interface for more robust pci programming in BIOS
>> > Minor fixes/cleanups for fw cfg and cross-version migration -
>> > because of dependencies with other patches
>> >
>> > Signed-off-by: Michael S. Tsirkin 
>> >
>> > 
>> > Amos Kong (1):
>> >   net: add support of mac-programming over macvtap in QEMU side
>> >
>> > Andrew Jones (1):
>> >   e1000: cleanup process_tx_desc
>> >
>> > David Gibson (10):
>> >   pci: Cleanup configuration for pci-hotplug.c
>> >   pci: Move pci_read_devaddr to pci-hotplug-old.c
>> >   pci: Abolish pci_find_root_bus()
>> >   pci: Use helper to find device's root bus in pci_find_domain()
>> >   pci: Replace pci_find_domain() with more general pci_root_bus_path()
>> >   pci: Add root bus argument to pci_get_bus_devfn()
>> >   pci: Add root bus parameter to pci_nic_init()
>> >   pci: Simpler implementation of primary PCI bus
>> >   pci: Remove domain from PCIHostBus
>> >   pci: Fold host_buses list into PCIHostState functionality
>> >
>> > Michael S. Tsirkin (9):
>> >   range: add Range structure
>> >   pci: store PCI hole ranges in guestinfo structure
>> >   pc: pass PCI hole ranges to Guests
>> >   pc_piix: cleanup init compat handling
>> >   kvm: zero-initialize KVM_SET_GSI_ROUTING input
>> >   kvm: skip system call when msi route is unchanged
>> >   MAINTAINERS: s/Marcelo/Paolo/
>> 
>> Shouldn't these be coming through the uq/master tree?  I haven't seen a
>> pull for uq/master in a long time.  Does that tree still exist?
>> 
>> Would that be coming from Paolo or Gleb?  Can one of ya'll ack these
>> changes please.
>> 
> ACK. I should have taken it through KVM tree really, but since MST's
> patch sending scripts depend on accurate MAINTAINERS information, having
> it in his tree saved Marcelo's inbox from a couple of dozen unwanted
> emails.

Thanks.

Regards,

Anthony Liguori

>
> --
>   Gleb.

Re: [Qemu-devel] [PULL 00/21] pci,net,misc enhancements

2013-06-20 Thread Gleb Natapov

On Thu, Jun 20, 2013 at 02:02:59PM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > From: Michael S. Tsirkin 
> >
> > The following changes since commit 90a2541b763b31d2b551b07e24aae3de5266d31b:
> >
> >   target-i386: fix over 80 chars warnings (2013-06-15 17:50:38 +)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony
> >
> > for you to fetch changes up to f96c30047009f8a9c3cecf68104d8d99f989f54d:
> >
> >   pci: Fold host_buses list into PCIHostState functionality (2013-06-19 
> > 18:35:05 +0300)
> >
> > 
> > pci,net,misc enhancements
> 
> I don't like the amount of "misc" in this pull request but I'll take it
> with appropriate acks.
> 
> >
> > This includes some pci and net-related enhancements:
> >
> > Better support for systems with multiple PCI root buses
> > A new management interface for access to rx filter in NICs
> > KVM Speedup for MSI updates on kvm
> > FW cfg interface for more robust pci programming in BIOS
> > Minor fixes/cleanups for fw cfg and cross-version migration -
> > because of dependencies with other patches
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> > 
> > Amos Kong (1):
> >   net: add support of mac-programming over macvtap in QEMU side
> >
> > Andrew Jones (1):
> >   e1000: cleanup process_tx_desc
> >
> > David Gibson (10):
> >   pci: Cleanup configuration for pci-hotplug.c
> >   pci: Move pci_read_devaddr to pci-hotplug-old.c
> >   pci: Abolish pci_find_root_bus()
> >   pci: Use helper to find device's root bus in pci_find_domain()
> >   pci: Replace pci_find_domain() with more general pci_root_bus_path()
> >   pci: Add root bus argument to pci_get_bus_devfn()
> >   pci: Add root bus parameter to pci_nic_init()
> >   pci: Simpler implementation of primary PCI bus
> >   pci: Remove domain from PCIHostBus
> >   pci: Fold host_buses list into PCIHostState functionality
> >
> > Michael S. Tsirkin (9):
> >   range: add Range structure
> >   pci: store PCI hole ranges in guestinfo structure
> >   pc: pass PCI hole ranges to Guests
> >   pc_piix: cleanup init compat handling
> >   kvm: zero-initialize KVM_SET_GSI_ROUTING input
> >   kvm: skip system call when msi route is unchanged
> >   MAINTAINERS: s/Marcelo/Paolo/
> 
> Shouldn't these be coming through the uq/master tree?  I haven't seen a
> pull for uq/master in a long time.  Does that tree still exist?
> 
> Would that be coming from Paolo or Gleb?  Can one of ya'll ack these
> changes please.
> 
ACK. I should have taken it through KVM tree really, but since MST's
patch sending scripts depend on accurate MAINTAINERS information, having
it in his tree saved Marcelo's inbox from a couple of dozen unwanted
emails.

--
Gleb.

Re: [Qemu-devel] [PATCH 01/12] chardev: ringbuf: add optional save parameter to save state

2013-06-20 Thread Eric Blake

On 06/19/2013 09:40 PM, Anthony Liguori wrote:
> It is very useful to use the ringbuf chardev for writing test
> cases and even more useful if the state of the ringbuf is migrated
> with the guest.  Otherwise it's hard to detect data loss in a test
> case.
> 
> Signed-off-by: Anthony Liguori 
> ---
>  qapi-schema.json |  3 ++-
>  qemu-char.c  | 45 +++--
>  2 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index a80ee40..90602d1 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3280,10 +3280,11 @@
>  # Configuration info for memory chardevs
>  #
>  # @size: #optional Ringbuffer size, must be power of two, default is 65536
> +# @save: #optional Register a savevm handler, default false

Useful to have a '(since 1.6)' notation on the added field.

>  #
>  # Since: 1.5
>  ##
> -{ 'type': 'ChardevMemory', 'data': { '*size'  : 'int' } }
> +{ 'type': 'ChardevMemory', 'data': { '*size'  : 'int', '*save': 'bool' } }

Yet another case for introspection discovering an added feature to an
existing command.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Scott Wood

On 06/20/2013 01:58:55 PM, Anthony Liguori wrote:

> Alexander Graf writes:
>
> > On 20.06.2013 at 17:42, Anthony Liguori wrote:
> >
> >> Andreas Färber writes:
> >>
> >>> The functions are called spapr_hcall*() but the protocol uses
> >>> papr_hypercall?
> >>
> >> The discrepancy is inherited in the KVM vs. QEMU interfaces.  It's
> >> called papr_hypercall in the KVM interface vs. spapr in QEMU.
> >>
> >> I honestly don't know what the distinction between spapr and papr is.
> >
> > PAPR is what PAPR calls itself. However, there is also an ePAPR for
> > BookE, so in order to distinguish the 2 more easily, we named the
> > server version spapr wherever we remembered to.
>
> So does it make sense to have papr_hypercall()?  Do hypercalls exist
> with the virtualization extensions on BookE?

Yes, there are hypercalls on booke.  Currently the few that KVM
supports are all handled in the kernel, but that may change, especially
since Alex is thinking about new hypercalls for guest reset/stop.

-Scott

Re: [Qemu-devel] [PATCH] target-arm: implement ARMv8 VSEL instruction

2013-06-20 Thread Peter Maydell

On 18 June 2013 15:30, Mans Rullgard  wrote:
> This adds support for the VSEL instruction introduced in ARMv8.
> It resides along with other new VFP instructions under the CDP2
> encoding which was previously unused.
>
> Signed-off-by: Mans Rullgard 

So I found this pretty confusing, which I think is an indication
that we need to start by cleaning up the existing v7 VFP/Neon
decode.

Specifically, currently we handle all Neon decode by just calling
the neon decode functions directly from the disas_arm_insn
and disas_thumb2_insn functions. We should move VFP to work
in the same way (ie take it out of disas_coproc_insn()).
Basically, the architecture manual treats them as part of the
core instruction set, and we should make our decoder do the same.

The (existing) coproc decode is also confusing, and would benefit
a lot from a comment at the top of disas_coproc_insn specifying
the opcode patterns that can reach it.

> +if (((insn >> 23) & 1) == 0) {
> +/* vsel */
> +int cc = (insn >> 20) & 3;
> +int cond = (cc << 2) | (((cc << 1) ^ cc) & 2);
> +int pass_label = gen_new_label();
> +
> +gen_mov_F0_vreg(dp, rn);
> +gen_mov_vreg_F0(dp, rd);
> +gen_test_cc(cond, pass_label);
> +gen_mov_F0_vreg(dp, rm);
> +gen_mov_vreg_F0(dp, rd);
> +gen_set_label(pass_label);

You can generate better code with the TCG movcond op.
Luckily you don't actually have to duplicate the whole of
gen_test_cc only doing movconds, because there are only actually
4 encodable conditions here (3 of which turn into a single
movcond; the fourth requires two consecutive movcond ops).
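
As a rough illustration of why only four conditions matter here, the
standalone C sketch below applies the patch's cc-to-cond expression and
evaluates the result against the N, Z and V flags. It is not QEMU or TCG
code; the helper names vsel_cond() and cond_holds() are made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Map the 2-bit VSEL cc field to an ARM condition code, using the same
 * expression as the patch under review. */
static int vsel_cond(int cc)
{
    assert(cc >= 0 && cc < 4);
    return (cc << 2) | (((cc << 1) ^ cc) & 2);
}

/* Evaluate the four conditions that VSEL can encode (hypothetical helper).
 * Any other condition code is simply reported as false here. */
static bool cond_holds(int cond, bool n, bool z, bool v)
{
    switch (cond) {
    case 0x0: return z;             /* EQ: one test */
    case 0x6: return v;             /* VS: one test */
    case 0xa: return n == v;        /* GE: one test */
    case 0xc: return !z && n == v;  /* GT: two tests combined */
    default:  return false;
    }
}
```

EQ, VS and GE each reduce to a single comparison, while GT combines two,
which matches the "three single movconds plus one pair" split described
above.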

Also I don't think we should introduce any new uses of F0/F1.
You can just load a VFP register into a TCG temp like this:

ftmp = tcg_temp_new_i32();
tcg_gen_ld_f32(ftmp, cpu_env, vfp_reg_offset(0, rd));

operate on it as usual, and store:
tcg_gen_st_f32(ftmp, cpu_env, vfp_reg_offset(0, rd));
tcg_temp_free_i32(ftmp);

(similarly for double).

> @@ -6699,6 +6742,12 @@ static void disas_arm_insn(CPUARMState * env, DisasContext *s)
>  }
>  return; /* v7MP: Unallocated memory hint: must NOP */
>  }
> +if ((insn & 0x0f10) == 0x0e00) {
> +/* cdp2 */
> +if (disas_coproc_insn(env, s, insn))
> +goto illegal_op;
> +return;
> +}

This hunk is oddly placed, because it's neither next to the neon
decode (which is further up) nor the mrc2/mcr2 decode (which is
further down).

thanks
-- PMM

Re: [Qemu-devel] qemu-ga behavior on virtio-serial unplug

2013-06-20 Thread Laszlo Ersek

On 06/20/13 15:31, Amit Shah wrote:
> On (Wed) 19 Jun 2013 [13:17:57], Laszlo Ersek wrote:

>> In any case we'd need a way to tell "host side close" from "port unplug".
> 
> Will POLLHUP|POLLERR help, along with error returns on read() and
> write()?

I think so:
- read() == 0  --> host side disconnected,
- read() == -1, equivalently POLLERR --> unplug (or other error)
- write() == -1 / errno == EPIPE: host side disconnected,
- write() == -1 / errno == EIO (or ENXIO or EINVAL): unplug

I think the current code could be adapted to such a scheme gracefully.
(Regarding the error codes on write(), I just made them up, but you get
the idea.)
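
A minimal C sketch of this classification, for illustration only. The
enum and function names are hypothetical, the EAGAIN/EINTR branch is an
added assumption for non-blocking descriptors, and the write() error
codes are the made-up ones from the list above, not what the
virtio-serial driver is known to return:

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of one read() or write() on the channel. */
typedef enum {
    CHAN_OK,          /* data was transferred */
    CHAN_RETRY,       /* transient condition, try again later (assumption) */
    CHAN_HOST_CLOSED, /* host side disconnected */
    CHAN_UNPLUGGED    /* port unplugged, or some other error */
} ChanStatus;

/* ret is the return value of read(), err the errno observed after it. */
static ChanStatus classify_read(ssize_t ret, int err)
{
    if (ret > 0) {
        return CHAN_OK;
    }
    if (ret == 0) {
        return CHAN_HOST_CLOSED;
    }
    if (err == EAGAIN || err == EINTR) {
        return CHAN_RETRY;
    }
    return CHAN_UNPLUGGED;
}

/* ret is the return value of write(), err the errno observed after it. */
static ChanStatus classify_write(ssize_t ret, int err)
{
    if (ret >= 0) {
        return CHAN_OK;
    }
    if (err == EAGAIN || err == EINTR) {
        return CHAN_RETRY;
    }
    if (err == EPIPE) {
        return CHAN_HOST_CLOSED;
    }
    return CHAN_UNPLUGGED; /* EIO, ENXIO, EINVAL, ... */
}
```

The caller would keep its normal loop on CHAN_OK and CHAN_RETRY, wait for
the host on CHAN_HOST_CLOSED, and bail out to the reopen-or-exit path on
CHAN_UNPLUGGED.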

On hot-unplug we could bail out to a more external loop that tries to
reopen the same device, or just exit and leave the restart to udev/systemd.

If possible I would like to avoid SIGIO. SIGIO is basically a non-queued
(= can be pending or not pending; any realtime queueing variant is
limited in depth hence useless) and edge triggered readiness
notification. It isn't portable and requires extra hoops to jump through
just to get the file descriptor and to tell a read event from a write event.

Although it's possible to base level triggered readiness on top of it
(by setting read & write readiness booleans on SIGIO and clearing them
on the respective -1/EAGAIN, while blocking SIGIO carefully), I think
that's quite distant from our current event loop. Hence we should remove
O_ASYNC (and the Solaris equivalent too) in the same fell swoop.

Thanks
Laszlo

[Qemu-devel] [PATCH 1/2] iscsi: add support for bdrv_co_is_allocated()

2013-06-20 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 block/iscsi.c |   57 +
 1 file changed, 57 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 0bbf0b1..e6b966d 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -49,6 +49,7 @@ typedef struct IscsiLun {
 uint64_t num_blocks;
 int events;
 QEMUTimer *nop_timer;
+uint8_t lbpme;
 } IscsiLun;
 
 typedef struct IscsiAIOCB {
@@ -800,6 +801,60 @@ iscsi_getlength(BlockDriverState *bs)
 return len;
 }
 
+static int coroutine_fn iscsi_co_is_allocated(BlockDriverState *bs,
+  int64_t sector_num,
+  int nb_sectors, int *pnum)
+{
+IscsiLun *iscsilun = bs->opaque;
+struct scsi_task *task = NULL;
+struct scsi_get_lba_status *lbas = NULL;
+struct scsi_lba_status_descriptor *lbasd = NULL;
+int ret;
+
+*pnum = nb_sectors;
+
+if (iscsilun->lbpme == 0) {
+return 1;
+}
+
+/* in-flight requests could invalidate the lba status result */
+while (iscsi_process_flush(iscsilun)) {
+qemu_aio_wait();
+}
+
+task = iscsi_get_lba_status_sync(iscsilun->iscsi, iscsilun->lun,
+ sector_qemu2lun(sector_num, iscsilun),
+ 8+16);
+
+if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+scsi_free_scsi_task(task);
+return 1;
+}
+
+lbas = scsi_datain_unmarshall(task);
+if (lbas == NULL) {
+scsi_free_scsi_task(task);
+return 1;
+}
+
+lbasd = &lbas->descriptors[0];
+
+if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+return 1;
+}
+
+*pnum = lbasd->num_blocks * (iscsilun->block_size / BDRV_SECTOR_SIZE);
+if (*pnum > nb_sectors) {
+*pnum = nb_sectors;
+}
+
+ret = (lbasd->provisioning == SCSI_PROVISIONING_TYPE_MAPPED) ? 1 : 0;
+
+scsi_free_scsi_task(task);
+
+return ret;
+}
+
 static int parse_chap(struct iscsi_context *iscsi, const char *target)
 {
 QemuOptsList *list;
@@ -948,6 +1003,7 @@ static int iscsi_readcapacity_sync(IscsiLun *iscsilun)
 } else {
 iscsilun->block_size = rc16->block_length;
 iscsilun->num_blocks = rc16->returned_lba + 1;
+iscsilun->lbpme = rc16->lbpme;
 }
 }
 break;
@@ -1274,6 +1330,7 @@ static BlockDriver bdrv_iscsi = {
 
 .bdrv_aio_discard = iscsi_aio_discard,
 .bdrv_has_zero_init = iscsi_has_zero_init,
+.bdrv_co_is_allocated = iscsi_co_is_allocated,
 
 #ifdef __linux__
 .bdrv_ioctl   = iscsi_ioctl,
-- 
1.7.9.5

Re: [Qemu-devel] [PULL 00/21] pci,net,misc enhancements

2013-06-20 Thread Anthony Liguori

"Michael S. Tsirkin"  writes:

> From: Michael S. Tsirkin 
>
> The following changes since commit 90a2541b763b31d2b551b07e24aae3de5266d31b:
>
>   target-i386: fix over 80 chars warnings (2013-06-15 17:50:38 +)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony
>
> for you to fetch changes up to f96c30047009f8a9c3cecf68104d8d99f989f54d:
>
>   pci: Fold host_buses list into PCIHostState functionality (2013-06-19 18:35:05 +0300)
>
> 
> pci,net,misc enhancements

I don't like the amount of "misc" in this pull request but I'll take it
with appropriate acks.

>
> This includes some pci and net-related enhancements:
>
> Better support for systems with multiple PCI root buses
> A new management interface for access to rx filter in NICs
> KVM Speedup for MSI updates on kvm
> FW cfg interface for more robust pci programming in BIOS
> Minor fixes/cleanups for fw cfg and cross-version migration -
> because of dependencies with other patches
>
> Signed-off-by: Michael S. Tsirkin 
>
> 
> Amos Kong (1):
>   net: add support of mac-programming over macvtap in QEMU side
>
> Andrew Jones (1):
>   e1000: cleanup process_tx_desc
>
> David Gibson (10):
>   pci: Cleanup configuration for pci-hotplug.c
>   pci: Move pci_read_devaddr to pci-hotplug-old.c
>   pci: Abolish pci_find_root_bus()
>   pci: Use helper to find device's root bus in pci_find_domain()
>   pci: Replace pci_find_domain() with more general pci_root_bus_path()
>   pci: Add root bus argument to pci_get_bus_devfn()
>   pci: Add root bus parameter to pci_nic_init()
>   pci: Simpler implementation of primary PCI bus
>   pci: Remove domain from PCIHostBus
>   pci: Fold host_buses list into PCIHostState functionality
>
> Michael S. Tsirkin (9):
>   range: add Range structure
>   pci: store PCI hole ranges in guestinfo structure
>   pc: pass PCI hole ranges to Guests
>   pc_piix: cleanup init compat handling
>   kvm: zero-initialize KVM_SET_GSI_ROUTING input
>   kvm: skip system call when msi route is unchanged
>   MAINTAINERS: s/Marcelo/Paolo/

Shouldn't these be coming through the uq/master tree?  I haven't seen a
pull for uq/master in a long time.  Does that tree still exist?

Would that be coming from Paolo or Gleb?  Can one of ya'll ack these
changes please.

Regards,

Anthony Liguori

>   pvpanic: initialization cleanup
>   pvpanic: fix fwcfg for big endian hosts
>
>  MAINTAINERS |   2 +-
>  QMP/qmp-events.txt  |  17 
>  default-configs/i386-softmmu.mak|   3 +-
>  default-configs/ppc64-softmmu.mak   |   2 -
>  default-configs/x86_64-softmmu.mak  |   3 +-
>  hmp-commands.hx |   4 +-
>  hw/alpha/dp264.c|   2 +-
>  hw/arm/realview.c   |   6 +-
>  hw/arm/versatilepb.c|   2 +-
>  hw/i386/pc.c|  74 ++-
>  hw/i386/pc_piix.c   |  40 +---
>  hw/i386/pc_q35.c|  18 +++-
>  hw/mips/mips_fulong2e.c |   6 +-
>  hw/mips/mips_malta.c|   6 +-
>  hw/misc/pvpanic.c   |  31 ---
>  hw/net/e1000.c  |  18 ++--
>  hw/net/virtio-net.c | 111 ++
>  hw/pci-host/piix.c  |   9 ++
>  hw/pci-host/q35.c   |  17 
>  hw/pci/Makefile.objs|   2 +-
>  hw/pci/{pci-hotplug.c => pci-hotplug-old.c} |  75 ---
>  hw/pci/pci.c| 137 
> ++--
>  hw/pci/pci_host.c   |   1 +
>  hw/pci/pcie_aer.c   |   9 +-
>  hw/ppc/e500.c   |   2 +-
>  hw/ppc/mac_newworld.c   |   2 +-
>  hw/ppc/mac_oldworld.c   |   2 +-
>  hw/ppc/ppc440_bamboo.c  |   2 +-
>  hw/ppc/prep.c   |   2 +-
>  hw/ppc/spapr.c  |   2 +-
>  hw/ppc/spapr_pci.c  |  10 ++
>  hw/sh4/r2d.c|   5 +-
>  hw/sparc64/sun4u.c  |   2 +-
>  include/hw/i386/pc.h|  22 -
>  include/hw/pci-host/q35.h   |   2 +
>  include/hw/pci/pci.h|  17 ++--
>  include/hw/pci/pci_host.h   |  12 +++
>  include/monitor/monitor.h   |   1 +
>  include/net/net.h   |   3 +
>  include/qemu/range.h|  16 
>  include/qemu/typedef

Re: [Qemu-devel] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Anthony Liguori

Alexander Graf  writes:

> On 20.06.2013 at 17:42, Anthony Liguori wrote:
>
>> Andreas Färber writes:
>> 
>>> On 19.06.2013 at 22:40, Anthony Liguori wrote:
>>>> Signed-off-by: Anthony Liguori 
>>>> ---
>>>> qtest.c  | 29 +
>>>> tests/libqtest.c | 18 ++
>>>> tests/libqtest.h | 46 ++
>>>> 3 files changed, 93 insertions(+)
>>>> 
>>>> diff --git a/qtest.c b/qtest.c
>>>> index 07a9612..f8c8f44 100644
>>>> --- a/qtest.c
>>>> +++ b/qtest.c
>>>> @@ -19,6 +19,9 @@
>>>> #include "hw/irq.h"
>>>> #include "sysemu/sysemu.h"
>>>> #include "sysemu/cpus.h"
>>>> +#ifdef TARGET_PPC64
>>>> +#include "hw/ppc/spapr.h"
>>>> +#endif
>>>> 
>>>> #define MAX_IRQ 256
>>>> 
>>>> @@ -141,6 +144,13 @@ static bool qtest_opened;
>>>>  * where NUM is an IRQ number.  For the PC, interrupts can be intercepted
>>>>  * simply with "irq_intercept_in ioapic" (note that IRQ0 comes out with
>>>>  * NUM=0 even though it is remapped to GSI 2).
>>>> + *
>>>> + * Platform specific (sPAPR):
>>>> + *
>>>> + *  > papr_hypercall NR ARG0 ARG1 ... ARG8
>>> 
>>> The functions are called spapr_hcall*() but the protocol uses
>>> papr_hypercall?
>> 
>> The discrepancy is inherited in the KVM vs. QEMU interfaces.  It's
>> called papr_hypercall in the KVM interface vs. spapr in QEMU.
>> 
>> I honestly don't know what the distinction between spapr and papr is.
>
> PAPR is what PAPR calls itself. However, there is also an ePAPR for
> BookE, so in order to distinguish the 2 more easily, we named the
> server version spapr wherever we remembered to.

So does it make sense to have papr_hypercall()?  Do hypercalls exist
with the virtualization extensions on BookE?

Regards,

Anthony Liguori

>
>
> Alex

[Qemu-devel] [PATCH 0/2] iscsi: support for is_allocated and improved has_zero_init

2013-06-20 Thread Peter Lieven

These two patches allow qemu-img convert to reliably skip
zero blocks when writing to an iSCSI target.

Peter Lieven (2):
  iscsi: add support for bdrv_co_is_allocated()
  iscsi: add intelligent has_zero_init check

 block/iscsi.c |   82 -
 1 file changed, 81 insertions(+), 1 deletion(-)

-- 
1.7.9.5

[Qemu-devel] [PATCH 2/2] iscsi: add intelligent has_zero_init check

2013-06-20 Thread Peter Lieven

iSCSI targets are not created by bdrv_create, so we cannot
blindly assume that a target is empty. To avoid writing and allocating
blocks of zeroes, we now check whether all blocks of an existing target
are unallocated, and return 1 for bdrv_has_zero_init only if the
target is completely unallocated and unallocated blocks read
as zeroes.

Signed-off-by: Peter Lieven 
---
 block/iscsi.c |   25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index e6b966d..fe41d9a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -50,6 +50,7 @@ typedef struct IscsiLun {
 int events;
 QEMUTimer *nop_timer;
 uint8_t lbpme;
+uint8_t lbprz;
 } IscsiLun;
 
 typedef struct IscsiAIOCB {
@@ -1004,6 +1005,7 @@ static int iscsi_readcapacity_sync(IscsiLun *iscsilun)
 iscsilun->block_size = rc16->block_length;
 iscsilun->num_blocks = rc16->returned_lba + 1;
 iscsilun->lbpme = rc16->lbpme;
+iscsilun->lbprz = rc16->lbprz;
 }
 }
 break;
@@ -1249,7 +1251,28 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
 
 static int iscsi_has_zero_init(BlockDriverState *bs)
 {
-return 0;
+IscsiLun *iscsilun = bs->opaque;
+uint64_t lba;
+int n, ret, nb_sectors;
+
+if (iscsilun->lbprz == 0) {
+return 0;
+}
+
+for (lba = 0; lba < iscsilun->num_blocks; lba += 1 << 26) {
+nb_sectors = 1 << 26;
+if (lba + nb_sectors > iscsilun->num_blocks) {
+nb_sectors = iscsilun->num_blocks - lba;
+}
+nb_sectors *= (iscsilun->block_size / BDRV_SECTOR_SIZE);
+n = 0;
+ret = iscsi_co_is_allocated(bs, lba, nb_sectors, &n);
+if (ret || n != nb_sectors) {
+return 0;
+}
+}
+
+return 1;
 }
 
 static int iscsi_create(const char *filename, QEMUOptionParameter *options)
-- 
1.7.9.5
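
The decision rule from the commit message can be modelled in isolation.
The sketch below is not the driver code: ModelLun and
model_has_zero_init() are hypothetical names, and the allocation state
is modelled as a flat per-block array instead of GET LBA STATUS replies.
It only shows the two conditions that must both hold before the target
is reported as zero-initialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a LUN. */
typedef struct {
    const bool *mapped;   /* mapped[i] is true if block i is allocated */
    uint64_t num_blocks;  /* number of entries in mapped[] */
    bool lbprz;           /* unallocated blocks read as zeroes */
} ModelLun;

/* Returns true only if the target may be treated as zero-initialised. */
static bool model_has_zero_init(const ModelLun *lun)
{
    assert(lun != NULL);

    if (!lun->lbprz) {
        return false;     /* unmapped blocks need not read as zeroes */
    }
    for (uint64_t lba = 0; lba < lun->num_blocks; lba++) {
        if (lun->mapped[lba]) {
            return false; /* at least one block already holds data */
        }
    }
    return true;
}
```

A single allocated block, or a target without LBPRZ, is enough to fall
back to the conservative answer of 0.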

Re: [Qemu-devel] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Alexander Graf



On 20.06.2013 at 17:42, Anthony Liguori wrote:

> Andreas Färber writes:
> 
>> On 19.06.2013 at 22:40, Anthony Liguori wrote:
>>> Signed-off-by: Anthony Liguori 
>>> ---
>>> qtest.c  | 29 +
>>> tests/libqtest.c | 18 ++
>>> tests/libqtest.h | 46 ++
>>> 3 files changed, 93 insertions(+)
>>> 
>>> diff --git a/qtest.c b/qtest.c
>>> index 07a9612..f8c8f44 100644
>>> --- a/qtest.c
>>> +++ b/qtest.c
>>> @@ -19,6 +19,9 @@
>>> #include "hw/irq.h"
>>> #include "sysemu/sysemu.h"
>>> #include "sysemu/cpus.h"
>>> +#ifdef TARGET_PPC64
>>> +#include "hw/ppc/spapr.h"
>>> +#endif
>>> 
>>> #define MAX_IRQ 256
>>> 
>>> @@ -141,6 +144,13 @@ static bool qtest_opened;
>>>  * where NUM is an IRQ number.  For the PC, interrupts can be intercepted
>>>  * simply with "irq_intercept_in ioapic" (note that IRQ0 comes out with
>>>  * NUM=0 even though it is remapped to GSI 2).
>>> + *
>>> + * Platform specific (sPAPR):
>>> + *
>>> + *  > papr_hypercall NR ARG0 ARG1 ... ARG8
>> 
>> The functions are called spapr_hcall*() but the protocol uses
>> papr_hypercall?
> 
> The discrepancy is inherited in the KVM vs. QEMU interfaces.  It's
> called papr_hypercall in the KVM interface vs. spapr in QEMU.
> 
> I honestly don't know what the distinction between spapr and papr is.

PAPR is what PAPR calls itself. However, there is also an ePAPR for BookE, so 
in order to distinguish the 2 more easily, we named the server version spapr 
wherever we remembered to.


Alex

[Qemu-devel] [PATCH 05/12] nbd: pass export name as init argument

2013-06-20 Thread Marc-André Lureau

There is no need to keep the export name around, and it seems a better
fit as an argument in the init() call.

Signed-off-by: Marc-André Lureau 
---
 block/nbd-client.c |  8 
 block/nbd-client.h |  5 ++---
 block/nbd.c| 13 -
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 6d5f39c..b7eea21 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -353,15 +353,15 @@ static void nbd_teardown_connection(NbdClientSession *client)
 void nbd_client_session_close(NbdClientSession *client)
 {
 nbd_teardown_connection(client);
-g_free(client->export_name);
 }
 
-int nbd_client_session_init(NbdClientSession *client,
-BlockDriverState *bs, int sock)
+int nbd_client_session_init(NbdClientSession *client, BlockDriverState *bs,
+int sock, const char *export)
 {
 int ret;
 
-ret = nbd_receive_negotiate(sock, client->export_name,
+logout("session init %s\n", export);
+ret = nbd_receive_negotiate(sock, export,
 &client->nbdflags, &client->size,
 &client->blocksize);
 if (ret < 0) {
diff --git a/block/nbd-client.h b/block/nbd-client.h
index c1a7871..9c5246b 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -31,14 +31,13 @@ typedef struct NbdClientSession {
 Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
 struct nbd_reply reply;
 
-char *export_name; /* An NBD server may export several devices */
 bool is_unix;
 
 BlockDriverState *bs;
 } NbdClientSession;
 
-int nbd_client_session_init(NbdClientSession *client,
-BlockDriverState *bs, int sock);
+int nbd_client_session_init(NbdClientSession *client, BlockDriverState *bs,
+int sock, const char *export_name);
 void nbd_client_session_close(NbdClientSession *client);
 
 int nbd_client_session_co_discard(NbdClientSession *client, int64_t sector_num,
diff --git a/block/nbd.c b/block/nbd.c
index 79ba0a6..18c3b78 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -179,7 +179,7 @@ out:
 g_free(file);
 }
 
-static int nbd_config(BDRVNBDState *s, QDict *options)
+static int nbd_config(BDRVNBDState *s, QDict *options, char **export)
 {
 Error *local_err = NULL;
 
@@ -209,8 +209,8 @@ static int nbd_config(BDRVNBDState *s, QDict *options)
 qemu_opt_set_number(s->socket_opts, "port", NBD_DEFAULT_PORT);
 }
 
-s->client.export_name = g_strdup(qdict_get_try_str(options, "export"));
-if (s->client.export_name) {
+*export = g_strdup(qdict_get_try_str(options, "export"));
+if (*export) {
 qdict_del(options, "export");
 }
 
@@ -243,10 +243,11 @@ static int nbd_establish_connection(BlockDriverState *bs)
 static int nbd_open(BlockDriverState *bs, QDict *options, int flags)
 {
 BDRVNBDState *s = bs->opaque;
+char *export = NULL;
 int result, sock;
 
 /* Pop the config into our state object. Exit if invalid. */
-result = nbd_config(s, options);
+result = nbd_config(s, options, &export);
 if (result != 0) {
 return result;
 }
@@ -260,7 +261,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags)
 }
 
 /* NBD handshake */
-return nbd_client_session_init(&s->client, bs, sock);
+result = nbd_client_session_init(&s->client, bs, sock, export);
+g_free(export);
+return result;
 }
 
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 09/12] block: add "snapshot.size" option to avoid extra bdrv_open()

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 block.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 5db8fa1..b421083 100644
--- a/block.c
+++ b/block.c
@@ -1046,20 +1046,25 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 BlockDriverState *bs1;
 int64_t total_size;
 
+total_size = qdict_get_try_int(options, "snapshot.size", -1);
+qdict_del(options, "snapshot.size");
+
 if (qdict_size(options) != 0) {
 error_report("Can't use snapshot=on with driver-specific options");
 ret = -EINVAL;
 goto fail;
 }
 
-bs1 = bdrv_new_int("", NULL);
-ret = bdrv_open(bs1, filename, NULL, 0, drv);
-if (ret < 0) {
+if (total_size == -1) {
+bs1 = bdrv_new_int("", NULL);
+ret = bdrv_open(bs1, filename, NULL, 0, drv);
+if (ret < 0) {
+bdrv_delete(bs1);
+goto fail;
+}
+total_size = bdrv_getlength(bs1);
 bdrv_delete(bs1);
-goto fail;
 }
-total_size = bdrv_getlength(bs1);
-bdrv_delete(bs1);
 
 ret = make_snapshot(bs, total_size, &filename, &drv);
 if (ret < 0) {
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 01/12] include: add missing config-host.h include

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/ui/qemu-spice.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
index eba6d77..a92b2cf 100644
--- a/include/ui/qemu-spice.h
+++ b/include/ui/qemu-spice.h
@@ -18,6 +18,8 @@
 #ifndef QEMU_SPICE_H
 #define QEMU_SPICE_H
 
+#include "config-host.h"
+
 #ifdef CONFIG_SPICE
 
 #include 
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 02/12] char: add qemu_chr_fe_event()

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/sysemu/char.h | 10 ++
 qemu-char.c   |  7 +++
 spice-qemu-char.c | 10 ++
 3 files changed, 27 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 066c216..eee70fe 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -69,6 +69,7 @@ struct CharDriverState {
 void (*chr_accept_input)(struct CharDriverState *chr);
 void (*chr_set_echo)(struct CharDriverState *chr, bool echo);
 void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open);
+void (*chr_fe_event)(struct CharDriverState *chr, int event);
 void *opaque;
 char *label;
 char *filename;
@@ -136,6 +137,15 @@ void qemu_chr_fe_set_echo(struct CharDriverState *chr, bool echo);
 void qemu_chr_fe_set_open(struct CharDriverState *chr, int fe_open);
 
 /**
+ * @qemu_chr_fe_event:
+ *
+ * Send an event from the back end to the front end.
+ *
+ * @event the event to send
+ */
+void qemu_chr_fe_event(CharDriverState *s, int event);
+
+/**
  * @qemu_chr_fe_printf:
  *
  * Write to a character backend using a printf style interface.
diff --git a/qemu-char.c b/qemu-char.c
index 2c3cfe6..14e268e 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3317,6 +3317,13 @@ void qemu_chr_fe_set_open(struct CharDriverState *chr, int fe_open)
 }
 }
 
+void qemu_chr_fe_event(struct CharDriverState *chr, int event)
+{
+if (chr->chr_fe_event) {
+chr->chr_fe_event(chr, event);
+}
+}
+
 int qemu_chr_fe_add_watch(CharDriverState *s, GIOCondition cond,
   GIOFunc func, void *user_data)
 {
diff --git a/spice-qemu-char.c b/spice-qemu-char.c
index 6d147a7..0d77f77 100644
--- a/spice-qemu-char.c
+++ b/spice-qemu-char.c
@@ -223,6 +223,15 @@ static void spice_chr_set_fe_open(struct CharDriverState *chr, int fe_open)
 }
 }
 
+static void spice_chr_fe_event(struct CharDriverState *chr, int event)
+{
+#if SPICE_SERVER_VERSION >= 0x000c02
+SpiceCharDriver *s = chr->opaque;
+
+spice_server_port_event(&s->sin, event);
+#endif
+}
+
 static void print_allowed_subtypes(void)
 {
 const char** psubtype;
@@ -256,6 +265,7 @@ static CharDriverState *chr_open(const char *subtype)
 chr->chr_close = spice_chr_close;
 chr->chr_set_fe_open = spice_chr_set_fe_open;
 chr->explicit_be_open = true;
+chr->chr_fe_event = spice_chr_fe_event;
 
 QLIST_INSERT_HEAD(&spice_chars, s, next);
 
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 07/12] block: save the associated child in BlockDriverState

2013-06-20 Thread Marc-André Lureau

This allows the Spice block driver to eject the associated device.

Signed-off-by: Marc-André Lureau 
---
 block.c   | 46 +-
 include/block/block_int.h |  1 +
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index b88ad2f..f502eed 100644
--- a/block.c
+++ b/block.c
@@ -294,7 +294,8 @@ void bdrv_register(BlockDriver *bdrv)
 }
 
 /* create a new block device (by default it is empty) */
-BlockDriverState *bdrv_new(const char *device_name)
+static BlockDriverState *bdrv_new_int(const char *device_name,
+BlockDriverState *child)
 {
 BlockDriverState *bs;
 
@@ -305,10 +306,16 @@ BlockDriverState *bdrv_new(const char *device_name)
 }
 bdrv_iostatus_disable(bs);
 notifier_list_init(&bs->close_notifiers);
+bs->child = child;
 
 return bs;
 }
 
+BlockDriverState *bdrv_new(const char *device_name)
+{
+return bdrv_new_int(device_name, NULL);
+}
+
 void bdrv_add_close_notifier(BlockDriverState *bs, Notifier *notify)
 {
 notifier_list_add(&bs->close_notifiers, notify);
@@ -769,16 +776,8 @@ free_and_fail:
 return ret;
 }
 
-/*
- * Opens a file using a protocol (file, host_device, nbd, ...)
- *
- * options is a QDict of options to pass to the block drivers, or NULL for an
- * empty set of options. The reference to the QDict belongs to the block layer
- * after the call (even on failure), so if the caller intends to reuse the
- * dictionary, it needs to use QINCREF() before calling bdrv_file_open.
- */
-int bdrv_file_open(BlockDriverState **pbs, const char *filename,
-   QDict *options, int flags)
+static int bdrv_file_open_int(BlockDriverState **pbs, const char *filename,
+QDict *options, int flags, BlockDriverState *child)
 {
 BlockDriverState *bs;
 BlockDriver *drv;
@@ -790,7 +789,7 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename,
 options = qdict_new();
 }
 
-bs = bdrv_new("");
+bs = bdrv_new_int("", child);
 bs->options = options;
 options = qdict_clone_shallow(options);
 
@@ -873,6 +872,20 @@ fail:
 }
 
 /*
+ * Opens a file using a protocol (file, host_device, nbd, ...)
+ *
+ * options is a QDict of options to pass to the block drivers, or NULL for an
+ * empty set of options. The reference to the QDict belongs to the block layer
+ * after the call (even on failure), so if the caller intends to reuse the
+ * dictionary, it needs to use QINCREF() before calling bdrv_file_open.
+ */
+int bdrv_file_open(BlockDriverState **pbs, const char *filename,
+   QDict *options, int flags)
+{
+return bdrv_file_open_int(pbs, filename, options, flags, NULL);
+}
+
+/*
  * Opens the backing file for a BlockDriverState if not yet open
  *
  * options is a QDict of options to pass to the block drivers, or NULL for an
@@ -904,7 +917,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
 return 0;
 }
 
-bs->backing_hd = bdrv_new("");
+bs->backing_hd = bdrv_new_int("", bs);
 bdrv_get_full_backing_filename(bs, backing_filename,
sizeof(backing_filename));
 
@@ -990,7 +1003,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
instead of opening 'filename' directly */
 
 /* if there is a backing file, use it */
-bs1 = bdrv_new("");
+bs1 = bdrv_new_int("", bs);
 ret = bdrv_open(bs1, filename, NULL, 0, drv);
 if (ret < 0) {
 bdrv_delete(bs1);
@@ -1043,9 +1056,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 }
 
 extract_subqdict(options, &file_options, "file.");
-
-ret = bdrv_file_open(&file, filename, file_options,
- bdrv_open_flags(bs, flags));
+ret = bdrv_file_open_int(&file, filename, file_options,
+ bdrv_open_flags(bs, flags), bs);
 if (ret < 0) {
 goto fail;
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ba52247..9c72b32 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -245,6 +245,7 @@ struct BlockDriverState {
 
 BlockDriverState *backing_hd;
 BlockDriverState *file;
+BlockDriverState *child;
 
 NotifierList close_notifiers;
 
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 11/12] block: allow to call bdrv_open() with an opaque

2013-06-20 Thread Marc-André Lureau

If the block driver already has a bs->opaque when calling bdrv_open(),
pass it down to the file driver.

Signed-off-by: Marc-André Lureau 
---
 block.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index bdffb42..ff9cb0b 100644
--- a/block.c
+++ b/block.c
@@ -1041,6 +1041,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 int ret;
 BlockDriverState *file = NULL;
 QDict *file_options = NULL;
+void *backing_opaque = NULL;
 
 /* NULL means an empty set of options */
 if (options == NULL) {
@@ -1064,6 +1065,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 goto fail;
 }
 
+backing_opaque = bs->opaque;
+bs->opaque = NULL;
 if (total_size == -1) {
 bs1 = bdrv_new_int("", NULL, NULL);
 ret = bdrv_open(bs1, filename, NULL, 0, drv);
@@ -1088,7 +1091,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 
 extract_subqdict(options, &file_options, "file.");
 ret = bdrv_file_open_int(&file, filename, file_options,
- bdrv_open_flags(bs, flags), bs, NULL);
+ bdrv_open_flags(bs, flags), bs, bs->opaque);
+bs->opaque = NULL;
 if (ret < 0) {
 goto fail;
 }
@@ -1118,7 +1122,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 QDict *backing_options;
 
 extract_subqdict(options, &backing_options, "backing.");
-ret = bdrv_open_backing_file_int(bs, backing_options, NULL);
+ret = bdrv_open_backing_file_int(bs, backing_options, backing_opaque);
 if (ret < 0) {
 goto close_and_fail;
 }
-- 
1.8.3.rc1.49.g8d97506

[Qemu-devel] [PATCH 00/12] RFC: add Spice block device

2013-06-20 Thread Marc-André Lureau

Hi,

The following patch series implements a Spice block device, which
allows the client to redirect a block device using the NBD protocol.
Reusing an existing protocol greatly simplifies the Spice code and
allows sharing the existing qemu NBD implementation.

This block device driver is a bit special, since it is successfully
initialized with size 0, and once the client is connected (or wants to
change block device) it re-opens itself. For this to work, we allow a
block driver to be opened with existing opaque data.
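
The re-open idea can be sketched roughly as below. This is a standalone
illustration and not QEMU code: MockState, its instance_size field and
mock_open() are hypothetical stand-ins. It only shows the rule that
private data is allocated when none was handed in, and kept otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block driver state; not the QEMU struct. */
typedef struct MockState {
    void *opaque;          /* driver-private data, may be preset by caller */
    size_t instance_size;  /* size of the private data to allocate */
} MockState;

/*
 * Open the mock device. If the caller already attached opaque data,
 * keep it; otherwise allocate zeroed private data.
 * Returns 0 on success, -1 on allocation failure.
 */
static int mock_open(MockState *st)
{
    assert(st != NULL);
    if (st->opaque == NULL) {
        st->opaque = calloc(1, st->instance_size);
        if (st->opaque == NULL) {
            return -1;
        }
    }
    return 0;
}
```

Calling mock_open() a second time on the same state leaves the pointer
untouched, which is the property a self re-opening driver relies on.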

The backend only supports read-only devices at the moment (although it
shouldn't be hard to add write support if necessary). Migration hasn't
been tested yet.

Usage with a CDROM drive:
 -device ide-cd,drive=cd -drive if=none,id=cd,readonly,file=spicebd:

The associated server and client bits are:
http://lists.freedesktop.org/archives/spice-devel/2013-June/013608.html
http://lists.freedesktop.org/archives/spice-devel/2013-June/013609.html
http://lists.freedesktop.org/archives/spice-devel/2013-June/013610.html

Marc-André Lureau (12):
  include: add missing config-host.h include
  char: add qemu_chr_fe_event()
  nbd: don't change socket block during negotiate
  Split nbd block client code
  nbd: pass export name as init argument
  nbd: make session_close() idempotent
  block: save the associated child in BlockDriverState
  block: extract make_snapshot() from bdrv_open()
  block: add "snapshot.size" option to avoid extra bdrv_open()
  block: learn to open a driver with a given opaque
  block: allow to call bdrv_open() with an opaque
  block: add spice block device backend

 block.c   | 191 ++---
 block/Makefile.objs   |   3 +-
 block/nbd-client.c| 391 ++
 block/nbd-client.h|  51 +
 block/nbd.c   | 394 --
 block/spice.c | 523 ++
 include/block/block_int.h |   1 +
 include/sysemu/char.h |  10 +
 include/ui/qemu-spice.h   |   2 +
 nbd.c |   1 -
 qemu-char.c   |   7 +
 spice-qemu-char.c |  10 +
 12 files changed, 1151 insertions(+), 433 deletions(-)
 create mode 100644 block/nbd-client.c
 create mode 100644 block/nbd-client.h
 create mode 100644 block/spice.c

-- 
1.8.3.rc1.49.g8d97506

Re: [Qemu-devel] [PATCH v2] target-arm: implement LDA/STL instructions

2013-06-20 Thread Peter Maydell

On 17 June 2013 17:50, Mans Rullgard  wrote:
> This adds support for the ARMv8 load acquire/store release instructions.
> Since qemu does nothing special for memory barriers, these can be
> emulated like their non-acquire/release counterparts.

Couple more minor issues, otherwise looks good.

>  addr = tcg_temp_local_new_i32();
>  load_reg_var(s, addr, rn);
> -if (insn & (1 << 20)) {
> +
> +/* Since the emulation does not have barriers,
> +   the acquire/release semantics need no special
> +   handling */
> +if (op2 == 0) {
> +tmp = tcg_temp_new_i32();

This line needs to go inside the next if(), otherwise we leak a temp
in the stl/stlb/stlh case. (load_reg() does a temp_new for you, so
you need to pair temp_new/store_reg and load_reg/temp_free.)
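
To make the pairing rule concrete, here is a small standalone model. It
is not TCG code: the mock_* helpers and the live_temps counter are
hypothetical, and they only mimic the ownership rule described above
(load_reg allocates a temp, store_reg consumes one).

```c
#include <assert.h>
#include <stdbool.h>

/* Number of temporaries currently allocated in this mock. */
static int live_temps;

static void mock_temp_new(void)  { live_temps++; }
static void mock_temp_free(void) { assert(live_temps > 0); live_temps--; }
static void mock_load_reg(void)  { mock_temp_new(); }   /* allocates a temp */
static void mock_store_reg(void) { mock_temp_free(); }  /* consumes a temp */

/* Shape of the patch as posted: temp allocated before the branch. */
static int emit_leaky(bool is_load)
{
    live_temps = 0;
    mock_temp_new();
    if (is_load) {
        mock_store_reg();   /* load path: temp handed to the register */
    } else {
        mock_load_reg();    /* store path: second temp, first is orphaned */
        mock_temp_free();
    }
    return live_temps;      /* temps still live after emission */
}

/* Shape after the review comment: temp allocated inside the load branch. */
static int emit_fixed(bool is_load)
{
    live_temps = 0;
    if (is_load) {
        mock_temp_new();
        mock_store_reg();
    } else {
        mock_load_reg();
        mock_temp_free();
    }
    return live_temps;
}
```

In this model only the store path of emit_leaky() ends with a temp
still allocated; both paths of emit_fixed() end balanced.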

> +if (insn & (1 << 20)) {
> +switch (op1) {
> +case 0: /* lda */
> +tcg_gen_qemu_ld32u(tmp, addr, IS_USER(s));
> +break;
> +case 2: /* ldab */
> +tcg_gen_qemu_ld8u(tmp, addr, IS_USER(s));
> +break;
> +case 3: /* ldah */
> +tcg_gen_qemu_ld16u(tmp, addr, IS_USER(s));
> +break;
> +default:
> +abort();
> +}
> +store_reg(s, rd, tmp);
> +} else {
> +rm = insn & 0xf;
> +tmp = load_reg(s, rm);
> +switch (op1) {
> +case 0: /* stl */
> +tcg_gen_qemu_st32(tmp, addr, IS_USER(s));
> +break;
> +case 2: /* stlb */
> +tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
> +break;
> +case 3: /* stlh */
> +tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
> +break;
> +default:
> +abort();
> +}
> +tcg_temp_free_i32(tmp);
> +}
> +} else if (insn & (1 << 20)) {
>  switch (op1) {
>  case 0: /* ldrex */
>  gen_load_exclusive(s, rd, 15, addr, 2);

> @@ -8152,15 +8210,63 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
>  tcg_gen_addi_i32(tmp, tmp, s->pc);
>  store_reg(s, 15, tmp);
>  } else {
> -/* Load/store exclusive byte/halfword/doubleword.  */
> -ARCH(7);
> +int op2 = (insn >> 6) & 0x3;
>  op = (insn >> 4) & 0x3;
> -if (op == 2) {
> +switch (op2) {
> +case 0:
>  goto illegal_op;
> +case 1:
> +/* Load/store exclusive byte/halfword/doubleword */
> +ARCH(7);
> +break;
> +case 2:
> +/* Load-acquire/store-release */
> +if (op == 3) {
> +goto illegal_op;
> +}
> +/* Fall through */
> +case 3:
> +/* Load-acquire/store-release exclusive */
> +ARCH(8);
> +break;
>  }

This change has lost the check for op==2 being illegal in the
load/store exclusive case (ie case op2==1).
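
One possible shape for the decode with that check restored, written as
a standalone function so it can be exercised in isolation. This is only
a sketch: min_arch_for() and ILLEGAL_OP are hypothetical names, and the
return value stands in for the ARCH(7)/ARCH(8) checks in the quoted
hunk.

```c
#include <assert.h>

#define ILLEGAL_OP (-1)

/*
 * Return the minimum architecture version required by the encoding,
 * or ILLEGAL_OP if the op2/op combination is not valid.
 */
static int min_arch_for(int op2, int op)
{
    assert(op2 >= 0 && op2 <= 3 && op >= 0 && op <= 3);
    switch (op2) {
    case 0:
        return ILLEGAL_OP;
    case 1:
        /* Load/store exclusive byte/halfword/doubleword */
        if (op == 2) {
            return ILLEGAL_OP;  /* the check the old code had */
        }
        return 7;
    case 2:
        /* Load-acquire/store-release */
        if (op == 3) {
            return ILLEGAL_OP;
        }
        /* Fall through */
    case 3:
        /* Load-acquire/store-release exclusive */
        return 8;
    }
    return ILLEGAL_OP;
}
```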

>  addr = tcg_temp_local_new_i32();
>  load_reg_var(s, addr, rn);
> -if (insn & (1 << 20)) {
> +if (!(op2 & 1)) {
> +tmp = tcg_temp_new_i32();

This needs to be inside the following if(), otherwise we leak
a temp in the stlb/stlh/stl case.

> +if (insn & (1 << 20)) {
> +switch (op) {
> +case 0: /* ldab */
> +tcg_gen_qemu_ld8u(tmp, addr, IS_USER(s));
> +break;
> +case 1: /* ldah */
> +tcg_gen_qemu_ld16u(tmp, addr, IS_USER(s));
> +break;
> +case 2: /* lda */

[Qemu-devel] [PATCH 06/12] nbd: make session_close() idempotent

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 block/nbd-client.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index b7eea21..c49be30 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -352,7 +352,12 @@ static void nbd_teardown_connection(NbdClientSession *client)
 
 void nbd_client_session_close(NbdClientSession *client)
 {
+if (!client->bs) {
+return;
+}
+
 nbd_teardown_connection(client);
+client->bs = NULL;
 }
 
 int nbd_client_session_init(NbdClientSession *client, BlockDriverState *bs,
-- 
1.8.3.rc1.49.g8d97506
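
The guard added by the patch above follows a common idempotent-close
pattern. A minimal standalone model of it, with hypothetical
MockSession and mock_* names in place of the real NBD client types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the client session; not the QEMU struct. */
typedef struct MockSession {
    void *bs;            /* non-NULL while the session is open */
    int teardown_calls;  /* counts how often teardown actually ran */
} MockSession;

static void mock_teardown(MockSession *c)
{
    c->teardown_calls++;
}

/* Safe to call any number of times; teardown runs at most once. */
static void mock_session_close(MockSession *c)
{
    assert(c != NULL);
    if (!c->bs) {
        return;          /* already closed */
    }
    mock_teardown(c);
    c->bs = NULL;        /* mark as closed for later calls */
}
```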

[Qemu-devel] [PATCH 03/12] nbd: don't change socket block during negotiate

2013-06-20 Thread Marc-André Lureau

The caller might handle non-blocking I/O using a coroutine. Leave it
to the caller to choose a blocking or non-blocking negotiate.

Signed-off-by: Marc-André Lureau 
---
 nbd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/nbd.c b/nbd.c
index 2606403..2f8c946 100644
--- a/nbd.c
+++ b/nbd.c
@@ -442,7 +442,6 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
 
 TRACE("Receiving negotiation.");
 
-qemu_set_block(csock);
 rc = -EINVAL;
 
 if (read_sync(csock, buf, 8) != 8) {
-- 
1.8.3.rc1.49.g8d97506
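
With the patch above, a caller that wants blocking negotiation has to
set the socket mode itself. A minimal POSIX sketch of such a helper
follows; set_fd_blocking() is a hypothetical name, not the QEMU
qemu_set_block() implementation, and it relies only on fcntl() with
O_NONBLOCK.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: switch fd between blocking and non-blocking mode.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
static int set_fd_blocking(int fd, bool blocking)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    if (blocking) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

A caller would pick the mode it needs before starting the handshake and
switch again afterwards if required.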

Re: [Qemu-devel] Virtio-Balloon : config_set_size

2013-06-20 Thread Luiz Capitulino

On Thu, 20 Jun 2013 12:49:17 +0800
Saptarshi Sen  wrote:

> Hi all,
> 
> I am experimenting with the Virtio- balloon driver in qemu.
> 
> When I set the balloon size to a arbitrary low value. I see
>  the actual value of the balloon set is not what I intended
> but to a level probably decided by the  system.

A few things might be happening there. Maybe the guest is just slow
and is still inflating the balloon when you type 'info balloon'. Or
the guest may be running out of memory and is temporarily unable to
keep inflating the balloon. Finally, if the guest runs out of memory
(because you inflated too much) it may OOPs and then you won't see
any balloon activity anymore.

> I am not able to explain this part who decides on the final
> size of the balloon.
> 
> Another observation each time I do a qmp request to deflate the balloon
> the in the virtio-balloon.c config_set_size function is called. I do not
> understand who calls it and the method of activation

There's no such function in virtio-balloon.c, at least not in latest
git HEAD. Are you referring to virtio_balloon_set_config()? This function
is called when the virtio balloon driver in the guest wants to update
the balloon size.

[Qemu-devel] [PATCH 04/12] Split nbd block client code

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 block/Makefile.objs |   2 +-
 block/nbd-client.c  | 386 +++
 block/nbd-client.h  |  52 +++
 block/nbd.c | 387 
 4 files changed, 469 insertions(+), 358 deletions(-)
 create mode 100644 block/nbd-client.c
 create mode 100644 block/nbd-client.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 2981654..5890b5c 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -10,7 +10,7 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 ifeq ($(CONFIG_POSIX),y)
-block-obj-y += nbd.o sheepdog.o
+block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
diff --git a/block/nbd-client.c b/block/nbd-client.c
new file mode 100644
index 000..6d5f39c
--- /dev/null
+++ b/block/nbd-client.c
@@ -0,0 +1,386 @@
+/*
+ * QEMU Block driver for  NBD
+ *
+ * Copyright (C) 2008 Bull S.A.S.
+ * Author: Laurent Vivier 
+ *
+ * Some parts:
+ *Copyright (C) 2007 Anthony Liguori 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "nbd-client.h"
+#include "qemu/sockets.h"
+
+#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
+#define INDEX_TO_HANDLE(bs, index)  ((index)  ^ ((uint64_t)(intptr_t)bs))
+
+static void nbd_reply_ready(void *opaque)
+{
+NbdClientSession *s = opaque;
+uint64_t i;
+int ret;
+
+if (s->reply.handle == 0) {
+/* No reply already in flight.  Fetch a header.  It is possible
+ * that another thread has done the same thing in parallel, so
+ * the socket is not readable anymore.
+ */
+ret = nbd_receive_reply(s->sock, &s->reply);
+if (ret == -EAGAIN) {
+return;
+}
+if (ret < 0) {
+s->reply.handle = 0;
+goto fail;
+}
+}
+
+/* There's no need for a mutex on the receive side, because the
+ * handler acts as a synchronization point and ensures that only
+ * one coroutine is called until the reply finishes.  */
+i = HANDLE_TO_INDEX(s, s->reply.handle);
+if (i >= MAX_NBD_REQUESTS) {
+goto fail;
+}
+
+if (s->recv_coroutine[i]) {
+qemu_coroutine_enter(s->recv_coroutine[i], NULL);
+return;
+}
+
+fail:
+for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+if (s->recv_coroutine[i]) {
+qemu_coroutine_enter(s->recv_coroutine[i], NULL);
+}
+}
+}
+
+static void nbd_restart_write(void *opaque)
+{
+NbdClientSession *s = opaque;
+
+qemu_coroutine_enter(s->send_coroutine, NULL);
+}
+
+static int nbd_have_request(void *opaque)
+{
+NbdClientSession *s = opaque;
+
+return s->in_flight > 0;
+}
+
+static int nbd_co_send_request(NbdClientSession *s,
+struct nbd_request *request,
+QEMUIOVector *qiov, int offset)
+{
+int rc, ret;
+
+qemu_co_mutex_lock(&s->send_mutex);
+s->send_coroutine = qemu_coroutine_self();
+qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write,
+nbd_have_request, s);
+if (qiov) {
+if (!s->is_unix) {
+socket_set_cork(s->sock, 1);
+}
+rc = nbd_send_request(s->sock, request);
+if (rc >= 0) {
+ret = qemu_co_sendv(s->sock, qiov->iov, qiov->niov,
+offset, request->len);
+if (ret != request->len) {
+rc = -EIO;
+}
+}
+if (!s->is_unix) {
+socket_set_cork(s->sock, 0);
+}
+} else {
+rc = nbd_send_request(s->sock, request);
+}
+qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL,
+nbd_have_request, s);
+s->send_coroutine = NULL;
+qemu_co_mutex_unlock(&s->send_m

[Qemu-devel] [PATCH 12/12] block: add spice block device backend

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 block/Makefile.objs |   1 +
 block/spice.c   | 523 
 2 files changed, 524 insertions(+)
 create mode 100644 block/spice.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5890b5c..0170011 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -16,6 +16,7 @@ block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
+common-obj-$(CONFIG_SPICE) += spice.o
 endif
 
 common-obj-y += stream.o
diff --git a/block/spice.c b/block/spice.c
new file mode 100644
index 000..b2c669d
--- /dev/null
+++ b/block/spice.c
@@ -0,0 +1,523 @@
+/*
+ * Spice block backend for QEMU.
+ *
+ * Copyright (C) 2013 Red Hat, Inc.
+ * Author: Marc-André Lureau 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "nbd-client.h"
+#include "ui/qemu-spice.h"
+#include "block/block_int.h"
+#include "qemu/sockets.h"
+#include "qemu/uri.h"
+#include "qapi/qmp/qint.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/char.h"
+
+#ifndef DEBUG_SPICE
+#define DEBUG_SPICE   0
+#endif
+
+#define SOCKET_CHR 0
+#define SOCKET_NBD 1
+
+#define DPRINTF(fmt, ...)   \
+do {\
+if (DEBUG_SPICE) {  \
+fprintf(stderr, "spice: %-15s " fmt "\n",   \
+__func__, ##__VA_ARGS__);   \
+}   \
+} while (0)
+
+typedef struct Buffer {
+uint8_t data[4096];
+uint8_t *p;
+char left;
+} Buffer;
+
+typedef struct BDRVSpiceState {
+BlockDriverState *bs;
+NbdClientSession client;
+char *export;
+
+/* our spicechr-fd pipe */
+int sv[2];
+Buffer readb;
+Buffer writeb;
+
+int aio_count;
+CharDriverState *chr;
+guint chr_watch;
+
+Coroutine *coroutine;
+bool need_read;
+bool need_write;
+bool opened;
+} BDRVSpiceState;
+
+static void nbd_read_handler(void *opaque);
+static void update_chr_handlers(BDRVSpiceState *s);
+
+static int parse_uri(const char *filename, QDict *options, Error **errp)
+{
+URI *uri = NULL;
+
+uri = uri_parse(filename);
+if (!uri) {
+return -EINVAL;
+}
+
+if (strcmp(uri->scheme, "spicebd") != 0) {
+error_setg(errp, "URI scheme must be 'spicebd'");
+goto err;
+}
+
+if (uri->path && *uri->path) {
+qdict_put(options, "export", qstring_from_str(uri->path));
+}
+
+uri_free(uri);
+return 0;
+
+ err:
+if (uri) {
+uri_free(uri);
+}
+return -EINVAL;
+}
+
+static void spice_parse_filename(const char *filename, QDict *options,
+ Error **errp)
+{
+if (qdict_haskey(options, "export")) {
+error_setg(errp, "export cannot be used at the same time "
+   "as a file option");
+return;
+}
+
+parse_uri(filename, options, errp);
+}
+
+static void co_restart(void *opaque)
+{
+BDRVSpiceState *s = opaque;
+
+qemu_coroutine_enter(s->coroutine, NULL);
+}
+
+static void close_socketpair(BDRVSpiceState *s)
+{
+/* this is catching various error paths, and deals with it by
+   closing socketpair, so that both ends can cleanup. It may need
+   more specific error handling. */
+DPRINTF("closing socketpair");
+if (!s->opened) {
+return;
+}
+
+if (s->sv[SOCKET_NBD] >= 0) {
+qemu_aio_set_fd_handler(s->sv[SOCKET_NBD], NULL, NULL, NULL, NULL);
+closesocket(s->sv[SOCKET_NBD]);
+s->sv[SOCKET_NBD] = -1;
+}
+
+if (s->sv[SOCKET_CHR] >= 0) {
+qemu_aio_set_fd_handler(s->sv[SOCKET_CHR], NULL, NULL, NULL, NULL);
+closesocket(s->sv[SOCKET_CHR]);
+s->sv[SOCKET_CHR] = -1;
+}
+
+

[Qemu-devel] [PATCH 08/12] block: extract make_snapshot() from bdrv_open()

2013-06-20 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 block.c | 107 +---
 1 file changed, 62 insertions(+), 45 deletions(-)

diff --git a/block.c b/block.c
index f502eed..5db8fa1 100644
--- a/block.c
+++ b/block.c
@@ -959,6 +959,65 @@ static void extract_subqdict(QDict *src, QDict **dst, const char *start)
 }
 }
 
+static int make_snapshot(BlockDriverState *bs, int64_t total_size,
+ const char **pfilename, BlockDriver **pdrv)
+{
+const char *filename = *pfilename;
+BlockDriver *drv = *pdrv;
+int ret;
+BlockDriver *bdrv_qcow2;
+QEMUOptionParameter *create_options;
+char backing_filename[PATH_MAX];
+/* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
+char tmp_filename[PATH_MAX + 1];
+
+assert(filename != NULL);
+total_size &= BDRV_SECTOR_MASK;
+
+/* if snapshot, we create a temporary backing file and open it
+   instead of opening 'filename' directly */
+
+ret = get_tmp_filename(tmp_filename, sizeof(tmp_filename));
+if (ret < 0) {
+goto fail;
+}
+
+/* Real path is meaningless for protocols */
+if (path_has_protocol(filename)) {
+snprintf(backing_filename, sizeof(backing_filename),
+ "%s", filename);
+} else if (!realpath(filename, backing_filename)) {
+ret = -errno;
+goto fail;
+}
+
+bdrv_qcow2 = bdrv_find_format("qcow2");
+create_options = parse_option_parameters("", bdrv_qcow2->create_options,
+ NULL);
+
+set_option_parameter_int(create_options, BLOCK_OPT_SIZE, total_size);
+set_option_parameter(create_options, BLOCK_OPT_BACKING_FILE,
+ backing_filename);
+if (drv) {
+set_option_parameter(create_options, BLOCK_OPT_BACKING_FMT,
+ drv->format_name);
+}
+
+ret = bdrv_create(bdrv_qcow2, tmp_filename, create_options);
+free_option_parameters(create_options);
+if (ret < 0) {
+goto fail;
+}
+
+*pfilename = tmp_filename;
+*pdrv = bdrv_qcow2;
+bs->is_temporary = 1;
+return 0;
+
+fail:
+return ret;
+}
+
 /*
  * Opens a disk image (raw, qcow2, vmdk, ...)
  *
@@ -971,8 +1030,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
   int flags, BlockDriver *drv)
 {
 int ret;
-/* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
-char tmp_filename[PATH_MAX + 1];
 BlockDriverState *file = NULL;
 QDict *file_options = NULL;
 
@@ -988,66 +1045,26 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 if (flags & BDRV_O_SNAPSHOT) {
 BlockDriverState *bs1;
 int64_t total_size;
-BlockDriver *bdrv_qcow2;
-QEMUOptionParameter *create_options;
-char backing_filename[PATH_MAX];
 
 if (qdict_size(options) != 0) {
 error_report("Can't use snapshot=on with driver-specific options");
 ret = -EINVAL;
 goto fail;
 }
-assert(filename != NULL);
 
-/* if snapshot, we create a temporary backing file and open it
-   instead of opening 'filename' directly */
-
-/* if there is a backing file, use it */
-bs1 = bdrv_new_int("", bs);
+bs1 = bdrv_new_int("", NULL);
 ret = bdrv_open(bs1, filename, NULL, 0, drv);
 if (ret < 0) {
 bdrv_delete(bs1);
 goto fail;
 }
-total_size = bdrv_getlength(bs1) & BDRV_SECTOR_MASK;
-
+total_size = bdrv_getlength(bs1);
 bdrv_delete(bs1);
 
-ret = get_tmp_filename(tmp_filename, sizeof(tmp_filename));
+ret = make_snapshot(bs, total_size, &filename, &drv);
 if (ret < 0) {
 goto fail;
 }
-
-/* Real path is meaningless for protocols */
-if (path_has_protocol(filename)) {
-snprintf(backing_filename, sizeof(backing_filename),
- "%s", filename);
-} else if (!realpath(filename, backing_filename)) {
-ret = -errno;
-goto fail;
-}
-
-bdrv_qcow2 = bdrv_find_format("qcow2");
-create_options = parse_option_parameters("", bdrv_qcow2->create_options,
- NULL);
-
-set_option_parameter_int(create_options, BLOCK_OPT_SIZE, total_size);
-set_option_parameter(create_options, BLOCK_OPT_BACKING_FILE,
- backing_filename);
-if (drv) {
-set_option_parameter(create_options, BLOCK_OPT_BACKING_FMT,
-drv->format_name);
-}
-
-ret = bdrv_create(bdrv_qcow2, tmp_filename, create_options);
-free_option_parameters(create_options);
-if (ret < 0) {
-goto fail;
-}
-
-filename = tmp_filename;
-drv = bdrv_qcow2;
-bs->is_temp

[Qemu-devel] [PATCH 10/12] block: learn to open a driver with a given opaque

2013-06-20 Thread Marc-André Lureau

If the block driver is given opaque data, there is no need to
allocate new data. This allows passing an existing driver state to the
new driver.

Signed-off-by: Marc-André Lureau 
---
 block.c | 47 ---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/block.c b/block.c
index b421083..bdffb42 100644
--- a/block.c
+++ b/block.c
@@ -295,7 +295,7 @@ void bdrv_register(BlockDriver *bdrv)
 
 /* create a new block device (by default it is empty) */
 static BlockDriverState *bdrv_new_int(const char *device_name,
-BlockDriverState *child)
+BlockDriverState *child, void *opaque)
 {
 BlockDriverState *bs;
 
@@ -307,13 +307,14 @@ static BlockDriverState *bdrv_new_int(const char *device_name,
 bdrv_iostatus_disable(bs);
 notifier_list_init(&bs->close_notifiers);
 bs->child = child;
+bs->opaque = opaque;
 
 return bs;
 }
 
 BlockDriverState *bdrv_new(const char *device_name)
 {
-return bdrv_new_int(device_name, NULL);
+return bdrv_new_int(device_name, NULL, NULL);
 }
 
 void bdrv_add_close_notifier(BlockDriverState *bs, Notifier *notify)
@@ -729,7 +730,9 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 }
 
 bs->drv = drv;
-bs->opaque = g_malloc0(drv->instance_size);
+if (bs->opaque == NULL) {
+bs->opaque = g_malloc0(drv->instance_size);
+}
 
 bs->enable_write_cache = !!(flags & BDRV_O_CACHE_WB);
 
@@ -777,7 +780,7 @@ free_and_fail:
 }
 
 static int bdrv_file_open_int(BlockDriverState **pbs, const char *filename,
-QDict *options, int flags, BlockDriverState *child)
+QDict *options, int flags, BlockDriverState *child, void *opaque)
 {
 BlockDriverState *bs;
 BlockDriver *drv;
@@ -789,7 +792,7 @@ static int bdrv_file_open_int(BlockDriverState **pbs, const char *filename,
 options = qdict_new();
 }
 
-bs = bdrv_new_int("", child);
+bs = bdrv_new_int("", child, opaque);
 bs->options = options;
 options = qdict_clone_shallow(options);
 
@@ -882,18 +885,11 @@ fail:
 int bdrv_file_open(BlockDriverState **pbs, const char *filename,
QDict *options, int flags)
 {
-return bdrv_file_open_int(pbs, filename, options, flags, NULL);
+return bdrv_file_open_int(pbs, filename, options, flags, NULL, NULL);
 }
 
-/*
- * Opens the backing file for a BlockDriverState if not yet open
- *
- * options is a QDict of options to pass to the block drivers, or NULL for an
- * empty set of options. The reference to the QDict is transferred to this
- * function (even on failure), so if the caller intends to reuse the dictionary,
- * it needs to use QINCREF() before calling bdrv_file_open.
- */
-int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
+static int bdrv_open_backing_file_int(BlockDriverState *bs,
+QDict *options, void *opaque)
 {
 char backing_filename[PATH_MAX];
 int back_flags, ret;
@@ -917,7 +913,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
 return 0;
 }
 
-bs->backing_hd = bdrv_new_int("", bs);
+bs->backing_hd = bdrv_new_int("", bs, opaque);
 bdrv_get_full_backing_filename(bs, backing_filename,
sizeof(backing_filename));
 
@@ -940,6 +936,19 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
 return 0;
 }
 
+/*
+ * Opens the backing file for a BlockDriverState if not yet open
+ *
+ * options is a QDict of options to pass to the block drivers, or NULL for an
+ * empty set of options. The reference to the QDict is transferred to this
+ * function (even on failure), so if the caller intends to reuse the dictionary,
+ * it needs to use QINCREF() before calling bdrv_file_open.
+ */
+int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
+{
+return bdrv_open_backing_file_int(bs, options, NULL);
+}
+
 static void extract_subqdict(QDict *src, QDict **dst, const char *start)
 {
 const QDictEntry *entry, *next;
@@ -1056,7 +1065,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 }
 
 if (total_size == -1) {
-bs1 = bdrv_new_int("", NULL);
+bs1 = bdrv_new_int("", NULL, NULL);
 ret = bdrv_open(bs1, filename, NULL, 0, drv);
 if (ret < 0) {
 bdrv_delete(bs1);
@@ -1079,7 +1088,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 
 extract_subqdict(options, &file_options, "file.");
 ret = bdrv_file_open_int(&file, filename, file_options,
- bdrv_open_flags(bs, flags), bs);
+ bdrv_open_flags(bs, flags), bs, NULL);
 if (ret < 0) {
 goto fail;
 }
@@ -1109,7 +1118,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 QDict *backing_options;
 
 extract_subqdict(options, &backing_options, "backing.");
-ret = bdrv_open

Re: [Qemu-devel] [RFC PATCH 0/4] per-object libraries

2013-06-20 Thread Richard Henderson

On 06/20/2013 02:49 AM, Paolo Bonzini wrote:
> This only leaves Darwin.  I have no idea about that, and I don't have
> anymore a machine to test it.  Andreas or Peter, can you shed light?

I have an idea that -shared works there as well.

r~

Re: [Qemu-devel] [PULL 18/21] pci: Add root bus parameter to pci_nic_init()

2013-06-20 Thread Richard Henderson

On 06/20/2013 06:11 AM, Michael S. Tsirkin wrote:
> From: David Gibson 
> 
> At present, pci_nic_init() and pci_nic_init_nofail() assume that they will
> only create a NIC under the primary PCI root.  As we add support for
> multiple PCI roots, that may no longer be the case.  This patch adds a root
> bus parameter to pci_nic_init() (and updates callers accordingly) to allow
> the machine init code using it to specify the right PCI root for NICs
> created by old-style -net nic parameters.  NICs created new-style, with
> -device can of course be put anywhere.
> 
> Signed-off-by: David Gibson 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/alpha/dp264.c |  2 +-
>  hw/arm/realview.c|  6 --
>  hw/arm/versatilepb.c |  2 +-
>  hw/i386/pc.c |  2 +-
>  hw/mips/mips_fulong2e.c  |  6 +++---
>  hw/mips/mips_malta.c |  6 +++---
>  hw/pci/pci-hotplug-old.c |  3 ++-
>  hw/pci/pci.c | 10 ++
>  hw/ppc/e500.c|  2 +-
>  hw/ppc/mac_newworld.c|  2 +-
>  hw/ppc/mac_oldworld.c|  2 +-
>  hw/ppc/ppc440_bamboo.c   |  2 +-
>  hw/ppc/prep.c|  2 +-
>  hw/ppc/spapr.c   |  2 +-
>  hw/sh4/r2d.c |  5 -
>  hw/sparc64/sun4u.c   |  2 +-
>  include/hw/pci/pci.h |  6 --
>  17 files changed, 36 insertions(+), 26 deletions(-)

Acked-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH 1/1] tcg/aarch64: Implement tlb lookup fast path

2013-06-20 Thread Richard Henderson

On 06/20/2013 03:53 AM, Jani Kokkonen wrote:
>  #ifndef _EXEC_ALL_H_
>  #define _EXEC_ALL_H_
> -
>  #include "qemu-common.h"
> -

Whitespace change?

> +/* Load and compare a TLB entry, emitting the conditional jump to the
> +slow path for the failure case, which will be patched later when finalizing
> +the slow pathClobbers X0,X1,X2,X3 and TMP.  */

Indentation.

> +tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
> +  LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
> +tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
> +(tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend) -
> + (is_read ? offsetof(CPUTLBEntry, addr_read) :
> +   offsetof(CPUTLBEntry, addr_write;

I wonder if it wouldn't be clearer to not include the addr_read/write offset in
the passed tlb_offset value.  So more like

  int tlb_offset = offsetof(CPUArchState, tlb_table[mem_index]);

  tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
               LDST_LD, TCG_REG_X0, TCG_REG_X2,
               (tlb_offset & 0xfff) +
               (is_read ? offsetof(CPUTLBEntry, addr_read)
                        : offsetof(CPUTLBEntry, addr_write)));
  tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
               (tlb_offset & 0xfff) + offsetof(CPUTLBEntry, addend));

and then in the two callers pass down mem_index instead of tlb_offset.
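
As a sanity check that the two formulations address the same field,
here is a standalone sketch of the offset arithmetic. MockTLBEntry is a
hypothetical layout (the real CPUTLBEntry may differ), the helper names
are invented, and the "& 0xfff" masking from the patch is left out for
brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout, for illustration only. */
typedef struct MockTLBEntry {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
    uintptr_t addend;
} MockTLBEntry;

/* Offset of the comparator field within one entry. */
static size_t cmp_field_offset(bool is_read)
{
    return is_read ? offsetof(MockTLBEntry, addr_read)
                   : offsetof(MockTLBEntry, addr_write);
}

/* Form used in the patch: the passed offset already includes the field. */
static size_t addend_offset_old(size_t field_offset_in_table, bool is_read)
{
    assert(field_offset_in_table >= cmp_field_offset(is_read));
    return field_offset_in_table
           + (offsetof(MockTLBEntry, addend) - cmp_field_offset(is_read));
}

/* Suggested form: start from the entry base and add the field offset. */
static size_t addend_offset_new(size_t entry_base)
{
    return entry_base + offsetof(MockTLBEntry, addend);
}
```

Both helpers yield the same addend offset for a given entry base; the
second form simply avoids the subtraction.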

In addition, the function could use some commentary.


r~

Re: [Qemu-devel] [PATCH 1/1] tcg/aarch64: Implement tlb lookup fast path

2013-06-20 Thread Richard Henderson

On 06/20/2013 07:58 AM, Claudio Fontana wrote:
>> > +tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
>> > +  LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
>> > +tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
>> > +(tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend) -
>> > + (is_read ? offsetof(CPUTLBEntry, addr_read) :
>> > +   offsetof(CPUTLBEntry, addr_write))));
>> > +
>> > +tcg_out_cmp(s, 1, TCG_REG_X0, TCG_REG_X3, 0);
>> > +*label_ptr = s->code_ptr;
>> > +tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
>> > +}
> hmm should not the compare and branch actually be before the loading of the 
> addend?
> If we jump to the slow path we don't need to load the addend do we?
> 

No, but it's the slow path, and we don't care if we do extra work.
What's more important is minimizing the memory load delay for the
fast path.


r~

Re: [Qemu-devel] [PATCH 03/12] qtest: return string from QMP commands

2013-06-20 Thread Anthony Liguori

Andreas Färber  writes:

> Am 19.06.2013 22:40, schrieb Anthony Liguori:
>> Signed-off-by: Anthony Liguori 
>> ---
>>  tests/libqtest.c | 16 +---
>>  tests/libqtest.h | 14 +++---
>>  2 files changed, 24 insertions(+), 6 deletions(-)
>> 
>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>> index 81107cf..235ec62 100644
>> --- a/tests/libqtest.c
>> +++ b/tests/libqtest.c
>> @@ -287,10 +287,13 @@ redo:
>>  return words;
>>  }
>>  
>> -void qtest_qmpv(QTestState *s, const char *fmt, va_list ap)
>> +char *qtest_qmpv(QTestState *s, const char *fmt, va_list ap)
>>  {
>>  bool has_reply = false;
>>  int nesting = 0;
>> +GString *ret;
>> +
>> +ret = g_string_new("");
>>  
>>  /* Send QMP request */
>>  socket_sendf(s->qmp_fd, fmt, ap);
>> @@ -319,16 +322,23 @@ void qtest_qmpv(QTestState *s, const char *fmt, 
>> va_list ap)
>>  nesting--;
>>  break;
>>  }
>> +
>> +g_string_append_c(ret, c);
>>  }
>> +
>> +return g_string_free(ret, FALSE);
>>  }
>>  
>> -void qtest_qmp(QTestState *s, const char *fmt, ...)
>> +char *qtest_qmp(QTestState *s, const char *fmt, ...)
>>  {
>>  va_list ap;
>> +char *ret;
>>  
>>  va_start(ap, fmt);
>> -qtest_qmpv(s, fmt, ap);
>> +ret = qtest_qmpv(s, fmt, ap);
>>  va_end(ap);
>> +
>> +return ret;
>>  }
>>  
>>  const char *qtest_get_arch(void)
>> diff --git a/tests/libqtest.h b/tests/libqtest.h
>> index 592f035..5cdcae7 100644
>> --- a/tests/libqtest.h
>> +++ b/tests/libqtest.h
>> @@ -21,6 +21,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  typedef struct QTestState QTestState;
>>  
>> @@ -48,8 +49,10 @@ void qtest_quit(QTestState *s);
>>   * @fmt...: QMP message to send to qemu
>>   *
>>   * Sends a QMP message to QEMU
>> + *
>> + * Returns: the result of the QMP command
>>   */
>> -void qtest_qmp(QTestState *s, const char *fmt, ...);
>> +char *qtest_qmp(QTestState *s, const char *fmt, ...);
>>  
>>  /**
>>   * qtest_qmpv:
>> @@ -58,8 +61,10 @@ void qtest_qmp(QTestState *s, const char *fmt, ...);
>>   * @ap: QMP message arguments
>>   *
>>   * Sends a QMP message to QEMU.
>> + *
>> + * Returns: the result of the QMP command
>>   */
>> -void qtest_qmpv(QTestState *s, const char *fmt, va_list ap);
>> +char *qtest_qmpv(QTestState *s, const char *fmt, va_list ap);
>>  
>>  /**
>>   * qtest_get_irq:
>> @@ -340,10 +345,13 @@ static inline QTestState *qtest_start(const char *args)
>>  static inline void qmp(const char *fmt, ...)
>>  {
>>  va_list ap;
>> +char *ret;
>>  
>>  va_start(ap, fmt);
>> -qtest_qmpv(global_qtest, fmt, ap);
>> +ret = qtest_qmpv(global_qtest, fmt, ap);
>>  va_end(ap);
>> +
>> +g_free(ret);
>>  }
>>  
>>  /**
>
> In http://patchwork.ozlabs.org/patch/207689/ you had suggested to return
> QObject?

It's the Right Thing to do but it's hard.  It's on my to-do list, and I
don't think gating this series on doing proper QMP integration is the
right thing to do.

Regards,

Anthony Liguori

>
> Regards,
> Andreas
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Alex Williamson

On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
> At the moment QEMU creates a route for every MSI IRQ.
> 
> Now we are about to add IRQFD support on PPC64-pseries platform.
> pSeries already has in-kernel emulated interrupt controller with
> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
> mapping as a part of PAPR requirements for MSI/MSIX guests.
> Specifically, the pSeries guest does not touch MSIMessage's at
> all, instead it uses rtas_ibm_change_msi and rtas_ibm_query_interrupt_source
> rtas calls to do the mapping.
> 
> Therefore we do not really need more routing than we got already.
> The patch introduces the infrastructure to enable direct IRQ mapping.
> 
> Signed-off-by: Alexey Kardashevskiy 
> 
> ---
> 
> The patch is raw and ugly indeed, I made it only to demonstrate
> the idea and see if it has right to live or not.
> 
> For some reason which I do not really understand (limited GSI numbers?)
> the existing code always adds routing and I do not see why we would need it.

It's an IOAPIC, a pin gets toggled from the device and an MSI message
gets written to the CPU.  So the route allocates and programs the
pin->MSI, then we tell it what notifier triggers that pin.

On x86 the MSI vector doesn't encode any information about the device
sending the MSI, here you seem to be able to figure out the device and
vector space number from the address.  Then your pin to MSI is
effectively fixed.  So why isn't this just your
kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
it's an allocate and program.  What does kvm_irqchip_add_msi_route do on
pSeries today?  Thanks,
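
To make the "lookup" case concrete, here is a rough sketch of decoding a
fixed IRQ number from an MSI address, modelled on spapr_msi_get_irq in the
quoted patch.  The Demo* types, the window base and the 0xFFFF mask are
assumptions made for this example (the mask is inferred from the
"addr >> 16" device shift); they are not the real sPAPR definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for MSIMessage and the PHB's per-device table. */
typedef struct {
    uint64_t address;
    uint32_t data;
} DemoMSIMessage;

typedef struct {
    uint32_t irq;   /* first IRQ number assigned to this device */
} DemoMSIEntry;

/*
 * Map an MSI message to an IRQ number without allocating a route.
 * Bits 16 and up of the offset into the MSI window select the device;
 * the low 16 bits (divided by 4) OR-ed with the message data select
 * the vector within that device.
 */
static int demo_msi_get_irq(const DemoMSIEntry *table, uint64_t win_addr,
                            DemoMSIMessage msg)
{
    assert(table);

    uint64_t addr = msg.address - win_addr;
    int ndev = (int)(addr >> 16);
    int vec = (int)(((addr & 0xFFFF) >> 2) | msg.data);

    return (int)(table[ndev].irq + (uint32_t)vec);
}
```

Nothing is allocated or programmed here; the result depends only on the
message and the per-device base IRQ, which is why the mapping can be
treated as fixed.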

Alex

> ---
>  hw/misc/vfio.c   |   11 +--
>  hw/pci/pci.c |   13 +
>  hw/ppc/spapr_pci.c   |   13 +
>  hw/virtio/virtio-pci.c   |   26 --
>  include/hw/pci/pci.h |4 
>  include/hw/pci/pci_bus.h |1 +
>  6 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index 14aac04..2d9eef7 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
> unsigned int nr,
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +
> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
> +if (vector->virq < 0) {
> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +}
>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> @@ -807,7 +811,10 @@ retry:
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
> +if (vector->virq < 0) {
> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +}
>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a976e46..a9875e9 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
> *dev,
>  dev->intx_routing_notifier = notifier;
>  }
>  
> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
> +{
> +bus->map_msi = map_msi_fn;
> +}
> +
> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
> +{
> +if (bus->map_msi) {
> +return bus->map_msi(bus, msg);
> +}
> +return -1;
> +}
> +
>  /*
>   * PCI-to-PCI bridge specification
>   * 9.1: Interrupt routing. Table 9-1
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 80408c9..9ef9a29 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>  qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
>  }
>  
> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
> +{
> +DeviceState *par = bus->qbus.parent;
> +sPAPRPHBState *sphb = (sPAPRPHBState *) par;
> +unsigned long addr = msg.address - sphb->msi_win_addr;
> +int ndev = addr >> 16;
> +int vec = ((addr & 0xFFFF) >> 2) | msg.data;
> +uint32_t irq = sphb->msi_table[ndev].irq + vec;
> +
> +return (int)irq;
> +}
> +
>  static const MemoryRegionOps spapr_msi_ops = {
>  /* There is no .read as the read result is undefined by PCI spec */
>  .read = NULL,
> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
>  
>  sphb->lsi_table[i].irq = irq;
>  }
> +pci_bus_set_map_msi_fn(bus, spapr_msi_get_ir

Re: [Qemu-devel] [PATCH] int128: optimize

2013-06-20 Thread Richard Henderson

On 06/20/2013 08:00 AM, Paolo Bonzini wrote:
>  static inline Int128 int128_sub(Int128 a, Int128 b)
>  {
> -return int128_add(a, int128_neg(b));
> +uint64_t lo = a.lo - b.lo;
> +return (Int128) { lo, (lo < a.lo) + a.hi - b.hi };

This one isn't right.  Consider { 2, 0 } - { 2, 0 }

  lo = 2 - 2 = 0;
  = { 0, (0 < 2) + 0 - 0 }
  = { 0, 1 }

I'd be happier with a more traditional

  (Int128){ a.lo - b.lo, a.hi - b.hi - (a.lo < b.lo) };
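
For anyone who wants to check the two variants side by side, the sketch
below reproduces the counter-example.  DemoInt128 is a simplified stand-in
for Int128 (an assumption for this example), and the high words are
computed in unsigned arithmetic only to keep the demo free of signed
overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for Int128: low word first, then high word. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} DemoInt128;

/* Variant from the patch: adds 1 to the high word when lo < a.lo. */
static DemoInt128 demo_sub_patch(DemoInt128 a, DemoInt128 b)
{
    uint64_t lo = a.lo - b.lo;
    return (DemoInt128){ lo,
        (int64_t)((lo < a.lo) + (uint64_t)a.hi - (uint64_t)b.hi) };
}

/* Traditional form: borrow from the high word when a.lo < b.lo. */
static DemoInt128 demo_sub_borrow(DemoInt128 a, DemoInt128 b)
{
    return (DemoInt128){ a.lo - b.lo,
        (int64_t)((uint64_t)a.hi - (uint64_t)b.hi - (a.lo < b.lo)) };
}

/* Worked check of { 2, 0 } - { 2, 0 } with both variants. */
static void demo_sub_check(void)
{
    DemoInt128 two = { 2, 0 };
    DemoInt128 bad = demo_sub_patch(two, two);
    DemoInt128 good = demo_sub_borrow(two, two);

    assert(bad.lo == 0 && bad.hi == 1);    /* spurious carry */
    assert(good.lo == 0 && good.hi == 0);  /* expected zero */
}
```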


r~

Re: [Qemu-devel] Java volatile vs. C11 seq_cst (was Re: [PATCH v2 1/2] add a header file for atomic operations)

2013-06-20 Thread Paul E. McKenney

On Wed, Jun 19, 2013 at 09:11:36AM +0200, Torvald Riegel wrote:
> On Tue, 2013-06-18 at 18:53 -0700, Paul E. McKenney wrote:
> > On Tue, Jun 18, 2013 at 05:37:42PM +0200, Torvald Riegel wrote:
> > > On Tue, 2013-06-18 at 07:50 -0700, Paul E. McKenney wrote:
> > > > First, I am not a fan of SC, mostly because there don't seem to be many
> > > > (any?) production-quality algorithms that need SC.  But if you really
> > > > want to take a parallel-programming trip back to the 1980s, let's go!  
> > > > ;-)
> > > 
> > > Dekker-style mutual exclusion is useful for things like read-mostly
> > > multiple-reader single-writer locks, or similar "asymmetric" cases of
> > > synchronization.  SC fences are needed for this.
> > 
> > They definitely need Power hwsync rather than lwsync, but they need
> > fewer fences than would be emitted by slavishly following either of the
> > SC recipes for Power.  (Another example needing store-to-load ordering
> > is hazard pointers.)
> 
> The C++11 seq-cst fence expands to hwsync; combined with a relaxed
> store / load, that should be minimal.  Or are you saying that on Power,
> there is a weaker HW barrier available that still constrains store-load
> reordering sufficiently?

Your example use of seq-cst fence is a very good one for this example.
But most people I have talked to think of C++11 SC as being SC atomic
accesses, and SC atomics would get you a bunch of redundant fences
in this example -- some but not all of which could be easily optimized
away.
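
As a minimal sketch of the fence-based pattern being discussed (not taken
from any of the code under review), a Dekker-style "try to enter" for two
threads can be written in C11 with a relaxed store, one seq_cst fence, and
a load.  The names demo_flag, demo_try_enter and demo_leave are made up
for this example, and the load is acquire rather than relaxed only so that
it pairs with the release store in demo_leave.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread "I want in" flags for two threads, 0 and 1. */
static atomic_int demo_flag[2];

/*
 * Announce interest, then check the other side.  The single seq_cst
 * fence between the store and the load provides the store-to-load
 * ordering: two concurrent callers cannot both miss each other's store.
 */
static bool demo_try_enter(int me)
{
    assert(me == 0 || me == 1);

    atomic_store_explicit(&demo_flag[me], 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&demo_flag[1 - me], memory_order_acquire)) {
        /* Contended: withdraw so the other side can make progress. */
        atomic_store_explicit(&demo_flag[me], 0, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Leave the critical section; release publishes the writes made in it. */
static void demo_leave(int me)
{
    assert(me == 0 || me == 1);
    atomic_store_explicit(&demo_flag[me], 0, memory_order_release);
}
```

Writing the same thing with seq_cst atomic accesses instead of an explicit
fence would attach ordering to every access, which is where the redundant
fences mentioned above would come from.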

Thanx, Paul

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Anthony Liguori

Alexey Kardashevskiy  writes:

> At the moment QEMU creates a route for every MSI IRQ.
>
> Now we are about to add IRQFD support on PPC64-pseries platform.
> pSeries already has in-kernel emulated interrupt controller with
> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
> mapping as a part of PAPR requirements for MSI/MSIX guests.
> Specifically, the pSeries guest does not touch MSIMessage's at
> all, instead it uses rtas_ibm_change_msi and rtas_ibm_query_interrupt_source
> rtas calls to do the mapping.
>
> Therefore we do not really need more routing than we got already.
> The patch introduces the infrastructure to enable direct IRQ mapping.
>
> Signed-off-by: Alexey Kardashevskiy 
>
> ---
>
> The patch is raw and ugly indeed, I made it only to demonstrate
> the idea and see if it has right to live or not.
>
> For some reason which I do not really understand (limited GSI numbers?)
> the existing code always adds routing and I do not see why we would need it.
>
> Thanks!
> ---
>  hw/misc/vfio.c   |   11 +--
>  hw/pci/pci.c |   13 +
>  hw/ppc/spapr_pci.c   |   13 +
>  hw/virtio/virtio-pci.c   |   26 --
>  include/hw/pci/pci.h |4 
>  include/hw/pci/pci_bus.h |1 +
>  6 files changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index 14aac04..2d9eef7 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
> unsigned int nr,
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +
> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
> +if (vector->virq < 0) {
> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +}
>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> @@ -807,7 +811,10 @@ retry:
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
> +if (vector->virq < 0) {
> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +}

I don't understand why you're adding a pci level hook versus just having
a kvmppc specific hook in the kvm_irqchip_add_msi_route function.

Regards,

Anthony Liguori

>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a976e46..a9875e9 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
> *dev,
>  dev->intx_routing_notifier = notifier;
>  }
>  
> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
> +{
> +bus->map_msi = map_msi_fn;
> +}
> +
> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
> +{
> +if (bus->map_msi) {
> +return bus->map_msi(bus, msg);
> +}
> +return -1;
> +}
> +
>  /*
>   * PCI-to-PCI bridge specification
>   * 9.1: Interrupt routing. Table 9-1
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 80408c9..9ef9a29 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>  qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
>  }
>  
> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
> +{
> +DeviceState *par = bus->qbus.parent;
> +sPAPRPHBState *sphb = (sPAPRPHBState *) par;
> +unsigned long addr = msg.address - sphb->msi_win_addr;
> +int ndev = addr >> 16;
> +int vec = ((addr & 0xFFFF) >> 2) | msg.data;
> +uint32_t irq = sphb->msi_table[ndev].irq + vec;
> +
> +return (int)irq;
> +}
> +
>  static const MemoryRegionOps spapr_msi_ops = {
>  /* There is no .read as the read result is undefined by PCI spec */
>  .read = NULL,
> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
>  
>  sphb->lsi_table[i].irq = irq;
>  }
> +pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);
>  
>  return 0;
>  }
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d309416..587f53e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -472,6 +472,8 @@ static unsigned virtio_pci_get_features(DeviceState *d)
>  return proxy->host_features;
>  }
>  
> +extern int spapr_msi_get_irq(PCIBus *bus, MSIMessage *msg);
> +
>  static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>

Re: [Qemu-devel] [PATCH 07/12] spapr-rtas: add CPU argument to RTAS calls

2013-06-20 Thread Anthony Liguori

Andreas Färber  writes:

> Am 19.06.2013 22:40, schrieb Anthony Liguori:
>> RTAS is a hypervisor provided binary blob that a guest loads and
>> calls into to execute certain functions.  It's similar to the
>> vsyscall page in Linux or the short lived VMCI paravirt interface
>> from VMware.
>> 
>> The QEMU implementation of the RTAS blob is simply a passthrough
>> that proxies all RTAS calls to the hypervisor via an hypercall.
>> 
>> While we pass a CPU argument for hypercall handling in QEMU, we
>> don't pass it for RTAS calls.  Since some RTAs calls require
>
> "RTAS"
>
>> making hypercalls (normally RTAS is implemented as guest code) we
>> have nasty hacks to allow that.
>
> Where are such nasty hacks being removed? I just see the cpu argument
> propagated mostly unused throughout code.

[PATCH 08/12] spapr-rtas: use hypercall interface and remove special vty
  interfaces

Regards,

Anthony Liguori

>
>> 
>> Add a CPU argument to RTAS call handling so we can more easily
>> invoke hypercalls just as guest code would.
>> 
>> Signed-off-by: Anthony Liguori 
>> ---
>>  hw/nvram/spapr_nvram.c |  4 ++--
>>  hw/ppc/spapr_events.c  |  2 +-
>>  hw/ppc/spapr_hcall.c   |  2 +-
>>  hw/ppc/spapr_pci.c | 13 +++--
>>  hw/ppc/spapr_rtas.c| 21 +++--
>>  hw/ppc/spapr_vio.c |  6 --
>>  hw/ppc/xics.c  | 12 
>>  include/hw/ppc/spapr.h |  5 +++--
>>  8 files changed, 37 insertions(+), 28 deletions(-)
>
> Otherwise,
>
> Reviewed-by: Andreas Färber 
>
> Andreas
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v3] vl.c: Support multiple CPU ranges on -numa option

2013-06-20 Thread Bandan Das

Paolo Bonzini  writes:

> Il 20/06/2013 15:26, Eduardo Habkost ha scritto:
>> On Thu, Jun 20, 2013 at 11:52:42AM +0200, Paolo Bonzini wrote:
>>> Il 20/06/2013 11:30, Igor Mammedov ha scritto:
>> So, basically the format seemed easier to work with if we are 
>> thinking 
>> of using QemuOpts for -numa. Using -cpu rather than cpus probably
>> makes it less ambiguous as well IMO. However, it's probably not a 
>> good idea
>> if the current syntax is well established ?
>>
>> libvirt uses the "cpus" option already, so we have to keep it working.
 Sure, we can leave it as it's now for some time while a new interface is
 introduced/adopted. And then later deprecate "cpus".
>>>
>>> So, you used a new name because the new behavior of "-numa
>>> node,cpus=1-2,cpus=3-4" would be incompatible with the old.
>> 
>> I don't think anybody uses "cpus=1-2,cpus=3-4" today, so I believe we
>> can change its behavior. The problem was to get agreement on the syntax
>> to represent multiple CPU ranges.
>
> Ok.  I think almost everyone agreed on "cpus=1-2,cpus=3-4", which is
> basically what Bandan's patch does minus s/cpu/cpus/.  It matches what
> already happens with other options (SLIRP), so it's hardly surprising.

Good, so should I spin a new version with "cpus" ?

Also note that this patch actually doesn't add any extra code to support 
multiple cpus arguments. It all happens automatically as part of conversion to
QemuOpts. So, if we need to revisit the syntax later, we can always do that.
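
To illustrate what accumulating repeated "cpus=" values amounts to, here
is a small standalone sketch.  It is not the QemuOpts code path:
demo_add_cpu_range is a hypothetical helper, and the 64-CPU limit is an
assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one "cpus=" value, either "N" or "N-M", and set the matching
 * bits in *mask.  Returns 0 on success, -1 on malformed or out-of-range
 * input (in which case *mask is left unchanged).
 */
static int demo_add_cpu_range(const char *value, uint64_t *mask)
{
    char *end;
    unsigned long first, last;

    assert(value && mask);

    errno = 0;
    first = strtoul(value, &end, 10);
    if (errno || end == value) {
        return -1;
    }
    last = first;
    if (*end == '-') {
        const char *p = end + 1;

        last = strtoul(p, &end, 10);
        if (errno || end == p) {
            return -1;
        }
    }
    if (*end != '\0' || first > last || last > 63) {
        return -1;
    }
    for (unsigned long cpu = first; cpu <= last; cpu++) {
        *mask |= UINT64_C(1) << cpu;
    }
    return 0;
}
```

Calling it once per "cpus=" occurrence, for example with "1-2" and then
"3-4", leaves CPUs 1 through 4 set in the mask.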

Bandan
> Let's go on with that.
>
> Paolo
>
>>> Personally I don't think that's a problem, but I remember a long
>>> discussion in the past.  Igor/Eduardo, do you remember the conclusions?
>> 
>> I don't remember seeing the discussion reach any conclusion,
>> unfortunately.
>>

Re: [Qemu-devel] Adding a persistent writeback cache to qemu

2013-06-20 Thread Sage Weil

On Thu, 20 Jun 2013, Stefan Hajnoczi wrote:
> > The concrete problem here is that flashcache/dm-cache/bcache don't
> > work with the rbd (librbd) driver, as flashcache/dm-cache/bcache
> > cache access to block devices (in the host layer), and with rbd
> > (for instance) there is no access to a block device at all. block/rbd.c
> > simply calls librbd which calls librados etc.
> > 
> > So the context switches etc. I am avoiding are the ones that would
> > be introduced by using kernel rbd devices rather than librbd.
> 
> I understand the limitations with kernel block devices - their
> setup/teardown is an extra step outside QEMU and privileges need to be
> managed.  That basically means you need to use a management tool like
> libvirt to make it usable.
> 
> But I don't understand the performance angle here.  Do you have profiles
> that show kernel rbd is a bottleneck due to context switching?
> 
> We use the kernel page cache for -drive file=test.img,cache=writeback
> and no one has suggested reimplementing the page cache inside QEMU for
> better performance.
> 
> Also, how do you want to manage QEMU page cache with multiple guests
> running?  They are independent and know nothing about each other.  Their
> process memory consumption will be bloated and the kernel memory
> management will end up having to sort out who gets to stay in physical
> memory.
> 
> You can see I'm skeptical of this and think it's premature optimization,
> but if there's really a case for it with performance profiles then I
> guess it would be necessary.  But we should definitely get feedback from
> the Ceph folks too.
> 
> I'd like to hear from Ceph folks what their position on kernel rbd vs
> librados is.  Which one do they recommend for QEMU guests and what are the
> pros/cons?

I agree that a flashcache/bcache-like persistent cache would be a big win 
for qemu + rbd users.  

There are a few important issues with librbd vs kernel rbd:

 * librbd tends to get new features more quickly than the kernel rbd 
   (although now that layering has landed in 3.10 this will be less 
   painful than it was).

 * Using kernel rbd means users need bleeding edge kernels, a non-starter 
   for many orgs that are still running things like RHEL.  Bug fixes are 
   difficult to roll out, etc.

 * librbd has an in-memory cache that behaves similarly to an HDD's cache 
   (e.g., it forces writeback on flush).  This improves performance 
   significantly for many workloads.  Of course, having a bcache-like 
   layer mitigates this.

I'm not really sure what the best path forward is.  Putting the 
functionality in qemu would benefit lots of other storage backends, 
putting it in librbd would capture various other librbd users (xen, tgt, 
and future users like hyper-v), and using new kernels works today but 
creates a lot of friction for operations.

sage

[Qemu-devel] [PATCH] linux-user: Fix sys_utimensat (would not compile on old glibc)

2013-06-20 Thread Peter Maydell

Commit c0d472b12e accidentally dropped the definition of
__NR_SYS_utimensat even though its use is guarded by
CONFIG_UTIMENSAT, not CONFIG_ATFILE. Some older glibc don't
have utimensat() (even if they have the other *at() functions).
Fix this by correctly cleaning up the sys_utimensat()
implementation and #defines, so that we always provide the
syscall if needed whether we're doing it via glibc or not.

Signed-off-by: Peter Maydell 
---
 linux-user/syscall.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cdd0c28..f7877c3 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -338,6 +338,7 @@ static int sys_openat(int dirfd, const char *pathname, int 
flags, mode_t mode)
 }
 #endif
 
+#ifdef TARGET_NR_utimensat
 #ifdef CONFIG_UTIMENSAT
 static int sys_utimensat(int dirfd, const char *pathname,
 const struct timespec times[2], int flags)
@@ -347,12 +348,19 @@ static int sys_utimensat(int dirfd, const char *pathname,
 else
 return utimensat(dirfd, pathname, times, flags);
 }
-#else
-#if defined(TARGET_NR_utimensat) && defined(__NR_utimensat)
+#elif defined(__NR_utimensat)
+#define __NR_sys_utimensat __NR_utimensat
 _syscall4(int,sys_utimensat,int,dirfd,const char *,pathname,
   const struct timespec *,tsp,int,flags)
+#else
+static int sys_utimensat(int dirfd, const char *pathname,
+ const struct timespec times[2], int flags)
+{
+errno = ENOSYS;
+return -1;
+}
 #endif
-#endif /* CONFIG_UTIMENSAT  */
+#endif /* TARGET_NR_utimensat */
 
 #ifdef CONFIG_INOTIFY
 #include 
@@ -8536,7 +8544,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 goto unimplemented_nowarn;
 #endif
 
-#if defined(TARGET_NR_utimensat) && defined(__NR_utimensat)
+#if defined(TARGET_NR_utimensat)
 case TARGET_NR_utimensat:
 {
 struct timespec *tsp, ts[2];
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH 07/12] spapr-rtas: add CPU argument to RTAS calls

2013-06-20 Thread Andreas Färber

Am 19.06.2013 22:40, schrieb Anthony Liguori:
> RTAS is a hypervisor provided binary blob that a guest loads and
> calls into to execute certain functions.  It's similar to the
> vsyscall page in Linux or the short lived VMCI paravirt interface
> from VMware.
> 
> The QEMU implementation of the RTAS blob is simply a passthrough
> that proxies all RTAS calls to the hypervisor via an hypercall.
> 
> While we pass a CPU argument for hypercall handling in QEMU, we
> don't pass it for RTAS calls.  Since some RTAs calls require

"RTAS"

> making hypercalls (normally RTAS is implemented as guest code) we
> have nasty hacks to allow that.

Where are such nasty hacks being removed? I just see the cpu argument
propagated mostly unused throughout code.

> 
> Add a CPU argument to RTAS call handling so we can more easily
> invoke hypercalls just as guest code would.
> 
> Signed-off-by: Anthony Liguori 
> ---
>  hw/nvram/spapr_nvram.c |  4 ++--
>  hw/ppc/spapr_events.c  |  2 +-
>  hw/ppc/spapr_hcall.c   |  2 +-
>  hw/ppc/spapr_pci.c | 13 +++--
>  hw/ppc/spapr_rtas.c| 21 +++--
>  hw/ppc/spapr_vio.c |  6 --
>  hw/ppc/xics.c  | 12 
>  include/hw/ppc/spapr.h |  5 +++--
>  8 files changed, 37 insertions(+), 28 deletions(-)

Otherwise,

Reviewed-by: Andreas Färber 

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-platform device initialization (v4)

2013-06-20 Thread Paolo Bonzini

Il 18/06/2013 16:22, Michael S. Tsirkin ha scritto:
> On Tue, Jun 18, 2013 at 01:15:57PM +, Paul Durrant wrote:
>>> -Original Message-
>>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>>> Sent: 18 June 2013 14:14
>>> To: Michael S. Tsirkin
>>> Cc: Paul Durrant; qemu-devel@nongnu.org
>>> Subject: Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-platform
>>> device initialization (v4)
>>>
>>> On 06/18/13 15:01, Michael S. Tsirkin wrote:
 On Tue, Jun 18, 2013 at 12:57:54PM +, Paul Durrant wrote:
>> -Original Message-
>> From: Michael S. Tsirkin [mailto:m...@redhat.com]
>> Sent: 18 June 2013 13:52
>> To: Laszlo Ersek
>> Cc: Paul Durrant; qemu-devel@nongnu.org
>> Subject: Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-
>>> platform
>> device initialization (v4)
>>
>> On Tue, Jun 18, 2013 at 02:37:58PM +0200, Laszlo Ersek wrote:
>>> Hi Paul,
>>>
>>> (xen-devel snipped)
>>>
>>> On 06/18/13 13:16, Paul Durrant wrote:
 Because of concerns over backwards compatibility and a suggestion
>>> that
 xenfv should be retired in favour of using the pc machine type I have
>>> re-
 worked my original patch into 2 patches:

 [PATCH 1/2] Allow use of pc machine type (accel=xen) for Xen HVM
 [PATCH 2/2] Move hardcoded initialization of xen-platform device.

 Application of both these patches allows alternative pc machine types
>>> to
>> be
 used with the accel=xen option, but preserves the hardcoded
>>> creation of
 the xen-platform device only for machine type xenfv.

 v3:
 - Add test for xen_enabled() that went missing in v2

 v4:
 - Remove erroneous whitespace hunk
 - Replace hw_error() with fprintf()+exit(1)
 - Add braces to single-line if
>>>
>>> can you please offer an opinion in the
>>>
>>>   [PATCH 1/2] pvpanic: initialization cleanup
>>>   http://thread.gmane.org/gmane.comp.emulators.qemu/216940
>>>
>>> thread?
>>>
>>> From where I stand (which is "quite afar" :)) this series of yours seems
>>> somewhat related to my doubt there.
>>>
>>> Thanks!
>>> Laszlo
>>
>> OK will make it skip fwcfg as we did earlier.
>> Thanks for the review.
>>
>
> Yes, I think the assert(fw_cfg) would be problematic in the xen case
>>> where, up until my patch, machine types was necessarily xenfv.
>
>   Paul

 Do you guys actually need the pvpanic device?
 How do you know which port to use without fwcfg?
>>>
>>> Xen domains don't know the port and don't use the pvpanic device, but
>>> qemu starts at least. In other words, the pvpanic device is created, but
>>> unreachable. Maybe the has_pvpanic logic should depend on (or extended
>>> with) !xen_enabled().
>>>
>>
>> That seems entirely reasonable to me.
> 
> We can just skip creating the device if there's no fw cfg.

No, in principle Xen domains could use another scheme to find the port
(xenstore for example).  If Xen domains do not want it, they can just
add an "if".  Or we could just skip the fw_cfg step.  The device will be
there but not ACPI-discoverable.

Paolo

Re: [Qemu-devel] [PATCH 06/12] spapr-vty: add copyright and license

2013-06-20 Thread Andreas Färber

Am 19.06.2013 22:40, schrieb Anthony Liguori:
> If you are on CC, then please Ack this patch as you touched this
> file at some point in time.
> 
> Cc: Alexey Kardashevskiy 
> Cc: Andreas Färber 
> Cc: David Gibson 
> Cc: Michael Ellerman 
> Cc: Paolo Bonzini 
> Signed-off-by: Anthony Liguori 
> ---
>  hw/char/spapr_vty.c | 13 +
>  1 file changed, 13 insertions(+)

Acked-by: Andreas Färber 

Andreas

> 
> diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
> index 2993848..ecc2bb5 100644
> --- a/hw/char/spapr_vty.c
> +++ b/hw/char/spapr_vty.c
> @@ -1,3 +1,16 @@
> +/*
> + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System 
> Emulator
> + *
> + * PAPR Inter-VM Logical Lan, aka ibmveth
> + *
> + * Copyright IBM, Corp. 2010-2013
> + *
> + * Authors:
> + *   David Gibson 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
>  #include "hw/qdev.h"
>  #include "sysemu/char.h"
>  #include "hw/ppc/spapr.h"
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated.

2013-06-20 Thread Benoît Canet

This fixes the sub cluster sized writes race conditions while waiting
for a faster solution.

Signed-off-by: Benoit Canet 
---
 block/qcow2.c |   14 ++
 block/qcow2.h |1 +
 2 files changed, 15 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8eb63f1..11c115f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -534,6 +534,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags)
 
 /* Initialise locks */
 qemu_co_mutex_init(&s->lock);
+qemu_co_mutex_init(&s->dedup_lock);
 
 /* Repair image if dirty */
 if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -841,6 +842,15 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState 
*bs,
 
 s->cluster_cache_offset = -1; /* disable compressed cache */
 
+if (s->has_dedup) {
+/* This mutex is used to serialize the write requests in the dedup 
case.
+ * The goal is to avoid that the dedup process concurrents requests to
+ * the same clusters and corrupt data.
+ * With qcow2_dedup_read_missing_and_concatenate that would not work.
+ */
+qemu_co_mutex_lock(&s->dedup_lock);
+}
+
 qemu_co_mutex_lock(&s->lock);
 
 if (s->has_dedup) {
@@ -1018,6 +1028,10 @@ fail:
 l2meta = next;
 }
 
+if (s->has_dedup) {
+qemu_co_mutex_unlock(&s->dedup_lock);
+}
+
 qemu_iovec_destroy(&hd_qiov);
 qemu_vfree(cluster_data);
 qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index 6f85e03..3c6e685 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -364,6 +364,7 @@ typedef struct BDRVQcowState {
 Coroutine *load_filter_co;  /* used to load incarnations filters */
 
 CoMutex lock;
+CoMutex dedup_lock;
 
 uint32_t crypt_method; /* current crypt method, 0 if no key yet */
 uint32_t crypt_method_header;
-- 
1.7.10.4

Re: [Qemu-devel] [Bug 1191606] Re: qemu crashes with iscsi initiator (libiscsi) when using virtio

2013-06-20 Thread Laszlo Ersek

On 06/20/13 17:31, ronnie sahlberg wrote:
> On Thu, Jun 20, 2013 at 7:47 AM, Laszlo Ersek  wrote:

>> First I don't understand how access_len can only be "1". But, in any
>> case, if the "req->elem.in_sg[0].iov_base" pointer is stored in
>> little-endian order, and the kernel (or iscsi_scsi_command_async()?) for
>> whatever reason misinterprets "hdr.dxferp" to point at an actual receive
>> buffer (instead of an iovec array), that would be consistent with the
>> symptoms:
> 
> Ah, that makes sense.
> 
> block/iscsi.c   (https://github.com/qemu/qemu/blob/master/block/iscsi.c)
> does assume that ioh->dxferp is a pointer to the buffer and that there
> is no scatter gather.
> See lines  745-749.

How could I miss that? :) I stopped looking at the
iscsi_scsi_command_async() call on line 734. Sheesh.

> I did not know that ioctl() could take a scatter/gather list.
> 
> 
> I can't test now, but if I understand correctly,
> lines 745-749 should be replaced with something that does the following:
> 
> * Check ioh->iovec_count. If it is zero, there is no scatter gather
> and ioh->dxferp points to a buffer, so just do what we do today.
> * If iovec_count is > 0, dxferp is NOT a pointer to a buffer but
> a pointer to an array of iovecs. In that case,
> traverse the iovec array and add these as buffers to the task, just
> like we do for readv, for example with a loop similar to the one that
> adds the iovecs in lines 449-453.

Seems correct to me.

> 
> 
> I will try this tonight.

Thanks!
Laszlo

[Qemu-devel] qemu bug

2013-06-20 Thread jacek burghardt

I compiled the latest QEMU master from git with Xen support, and now I am
getting this error:
 $r/include/qemu/int128.h:18: int128_get64: Assertion `!a.hi' failed.
I wonder what could be causing this.
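
For readers unfamiliar with the assertion text: it suggests that a 128-bit
value whose high half is non-zero was narrowed to 64 bits. Below is a minimal
sketch of that check, assuming a simplified two-field layout. The type and
function names (Int128Sketch, int128_sketch_*) are made up for illustration
and are not QEMU's; which caller trips the real assertion is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a 128-bit integer split into two 64-bit halves.
 * Hypothetical type, for illustration only. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} Int128Sketch;

/* True when the value fits in 64 bits, i.e. the condition `!a.hi` holds. */
static bool int128_sketch_fits64(Int128Sketch a)
{
    return a.hi == 0;
}

/* Narrow to 64 bits. Like the failing assertion, this aborts when the
 * high half is set, because the value cannot be represented. */
static uint64_t int128_sketch_get64(Int128Sketch a)
{
    assert(!a.hi);
    return a.lo;
}

/* 2^64 needs a 65th bit, so it lives entirely in the high half. */
static Int128Sketch int128_sketch_2_64(void)
{
    Int128Sketch r = { .lo = 0, .hi = 1 };
    return r;
}
```

Passing the result of int128_sketch_2_64() to int128_sketch_get64() would
abort in the same way, so a size of exactly 2^64 is one value that cannot go
through such a helper.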

Re: [Qemu-devel] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Anthony Liguori

Andreas Färber  writes:

> Am 19.06.2013 22:40, schrieb Anthony Liguori:
>> Signed-off-by: Anthony Liguori 
>> ---
>>  qtest.c  | 29 +
>>  tests/libqtest.c | 18 ++
>>  tests/libqtest.h | 46 ++
>>  3 files changed, 93 insertions(+)
>> 
>> diff --git a/qtest.c b/qtest.c
>> index 07a9612..f8c8f44 100644
>> --- a/qtest.c
>> +++ b/qtest.c
>> @@ -19,6 +19,9 @@
>>  #include "hw/irq.h"
>>  #include "sysemu/sysemu.h"
>>  #include "sysemu/cpus.h"
>> +#ifdef TARGET_PPC64
>> +#include "hw/ppc/spapr.h"
>> +#endif
>>  
>>  #define MAX_IRQ 256
>>  
>> @@ -141,6 +144,13 @@ static bool qtest_opened;
>>   * where NUM is an IRQ number.  For the PC, interrupts can be intercepted
>>   * simply with "irq_intercept_in ioapic" (note that IRQ0 comes out with
>>   * NUM=0 even though it is remapped to GSI 2).
>> + *
>> + * Platform specific (sPAPR):
>> + *
>> + *  > papr_hypercall NR ARG0 ARG1 ... ARG8
>
> The functions are called spapr_hcall*() but the protocol uses
> papr_hypercall?

The discrepancy is inherited from the KVM vs. QEMU interfaces.  It's
called papr_hypercall in the KVM interface vs. spapr in QEMU.

I honestly don't know what the distinction between spapr and papr is.

>> +static inline uint64_t spapr_hcall5(uint64_t nr, uint64_t a0, uint64_t a1,
>> +uint64_t a2, uint64_t a3, uint64_t a4)
>> +{
>> +return qtest_spapr_hcall9(global_qtest, nr, a0, a1, a2, a3, a4, 0, 0, 0, 0);
>> +}
>
> While for a large number of almost identical helpers this certainly
> sucks, I made an effort to document all functions in that file, so
> please keep it that way. :)

Seems a bit redundant to document every one of these but I don't mind
doing it.

Regards,

Anthony Liguori

>
> Looks very similar to what I had proposed for s390x, so fine with me.
>
> Regards,
> Andreas
>
>> +
>>  #endif
>> 
>
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted.

2013-06-20 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 block/qcow2-dedup.c|   45 +
 block/qcow2-refcount.c |3 +++
 block/qcow2.h  |2 ++
 3 files changed, 50 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index da4ad5c..599cb2e 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -656,3 +656,48 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
 
 return ret;
 }
+
+/* Clean the last reference to a given cluster when its refcount is zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_destroy_hash(BlockDriverState *bs,
+  uint64_t cluster_index)
+{
+BDRVQcowState *s = bs->opaque;
+uint64_t offset = cluster_index * s->cluster_size;
+QCowHashInfo hash_info;
+uint8_t *buf;
+int ret = 0;
+
+/* allocate buffer */
+buf = qemu_blockalign(bs, s->cluster_size);
+
+/* read cluster from disk */
+ret = bdrv_pread(bs->file, offset, buf, s->cluster_size);
+
+/* error */
+if (ret < 0) {
+goto free_exit;
+}
+
+/* clear hash info */
+memset(&hash_info, 0, sizeof(QCowHashInfo));
+
+/* compute hash for the cluster */
+ret = qcow2_compute_cluster_hash(bs,
+ &hash_info.hash,
+ buf);
+
+
+/* error */
+if (ret < 0) {
+goto free_exit;
+}
+
+/* delete hash from key value store. It will not be deduplicated anymore */
+qcow2_store_delete(bs, &s->key_value_store, &hash_info);
+
+free_exit:
+   qemu_vfree(buf);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3bd8f37..2734cd9 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -482,6 +482,9 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 ret = -EINVAL;
 goto fail;
 }
+if (s->has_dedup && refcount == 0) {
+qcow2_dedup_destroy_hash(bs, cluster_index);
+}
 if (refcount == 0 && cluster_index < s->free_cluster_index) {
 s->free_cluster_index = cluster_index;
 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 720131d..6f85e03 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -748,5 +748,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
  int count,
  uint64_t logical_sect,
  uint64_t physical_sect);
+void qcow2_dedup_destroy_hash(BlockDriverState *bs,
+  uint64_t cluster_index);
 
 #endif
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support

2013-06-20 Thread Michael S. Tsirkin

On Fri, Jun 21, 2013 at 12:08:58AM +1000, Alexey Kardashevskiy wrote:
> At the moment QEMU creates a route for every MSI IRQ.
> 
> Now we are about to add IRQFD support on PPC64-pseries platform.
> pSeries already has in-kernel emulated interrupt controller with
> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
> mapping as a part of PAPR requirements for MSI/MSIX guests.
> Specifically, the pSeries guest does not touch MSIMessage's at
> all, instead it uses rtas_ibm_change_msi and rtas_ibm_query_interrupt_source
> rtas calls to do the mapping.
> 
> Therefore we do not really need more routing than we got already.
> The patch introduces the infrastructure to enable direct IRQ mapping.
> 
> Signed-off-by: Alexey Kardashevskiy 
> 
> ---
> 
> The patch is raw and ugly indeed, I made it only to demonstrate
> the idea and see if it has right to live or not.
> 
> For some reason which I do not really understand (limited GSI numbers?)
> the existing code always adds routing and I do not see why we would need it.
> 
> Thanks!
> ---
>  hw/misc/vfio.c   |   11 +--
>  hw/pci/pci.c |   13 +
>  hw/ppc/spapr_pci.c   |   13 +
>  hw/virtio/virtio-pci.c   |   26 --
>  include/hw/pci/pci.h |4 
>  include/hw/pci/pci_bus.h |1 +
>  6 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index 14aac04..2d9eef7 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +
> +vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
> +if (vector->virq < 0) {
> +vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> +}
>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> @@ -807,7 +811,10 @@ retry:
>   * Attempt to enable route through KVM irqchip,
>   * default to userspace handling if unavailable.
>   */
> -vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
> +if (vector->virq < 0) {
> +vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> +}
>  if (vector->virq < 0 ||
>  kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> vector->virq) < 0) {
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a976e46..a9875e9 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice *dev,
>  dev->intx_routing_notifier = notifier;
>  }
>  
> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
> +{
> +bus->map_msi = map_msi_fn;
> +}
> +
> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
> +{
> +if (bus->map_msi) {
> +return bus->map_msi(bus, msg);
> +}
> +return -1;
> +}
> +
>  /*
>   * PCI-to-PCI bridge specification
>   * 9.1: Interrupt routing. Table 9-1
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 80408c9..9ef9a29 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>  qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
>  }
>  
> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
> +{
> +DeviceState *par = bus->qbus.parent;
> +sPAPRPHBState *sphb = (sPAPRPHBState *) par;
> +unsigned long addr = msg.address - sphb->msi_win_addr;
> +int ndev = addr >> 16;
> +int vec = ((addr & 0x) >> 2) | msg.data;
> +uint32_t irq = sphb->msi_table[ndev].irq + vec;

This array seems to be SPAPR_MSIX_MAX_DEVS in size, no?
Won't this overflow if ndev is large?
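
A minimal sketch of the kind of range check this question asks for, assuming
the device index is derived from the window offset with a 16-bit shift as in
the quoted hunk. The table size SKETCH_MSIX_MAX_DEVS, the entry type and the
function name are hypothetical placeholders, not the names used in
spapr_pci.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MSIX_MAX_DEVS 32 /* hypothetical table size */

/* Hypothetical per-device entry: first IRQ and number of vectors. */
typedef struct {
    uint32_t irq;
    uint32_t nvec;
} SketchMsiEntry;

/* Return the base IRQ for the device addressed by window_offset, or -1
 * when the derived index would fall outside the table. */
static int sketch_msi_base_irq(const SketchMsiEntry *table,
                               uint64_t window_offset)
{
    uint64_t ndev = window_offset >> 16;

    assert(table != NULL);
    if (ndev >= SKETCH_MSIX_MAX_DEVS) {
        return -1; /* refuse instead of reading past the array */
    }
    return (int)table[ndev].irq;
}
```

Returning -1 matches the convention in the quoted patch, where a negative
value from the mapping hook makes the caller fall back to
kvm_irqchip_add_msi_route().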

> +
> +return (int)irq;
> +}
> +
>  static const MemoryRegionOps spapr_msi_ops = {
>  /* There is no .read as the read result is undefined by PCI spec */
>  .read = NULL,
> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
>  
>  sphb->lsi_table[i].irq = irq;
>  }
> +pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);
>  
>  return 0;
>  }
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d309416..587f53e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -472,6 +472,8 @@ static unsigned virtio_pci_get_features(DeviceState *d)
>  return proxy->host_features;
>  }
>  
> +extern int spapr_msi_get_irq(PCIBus *bus, MSIMessage *msg);
> +
>  static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>  unsigned int queue_n

Re: [Qemu-devel] [PATCH 04/12] qtest: add interface to save/restore

2013-06-20 Thread Andreas Färber

Am 19.06.2013 22:40, schrieb Anthony Liguori:
> The idea here is pretty simple.  We have a synchronous interface
> that when called, does a migration to a file, kills the QEMU
> instance, and spawns a new one using the saved file state.
> 
> We can then sprinkle calls to qtest_save_restore() throughout test
> cases to validate that we are properly saving and restoring state.
> 
> Signed-off-by: Anthony Liguori 
> ---
>  tests/libqtest.c | 65 
> 
>  tests/libqtest.h | 46 +++
>  2 files changed, 111 insertions(+)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 235ec62..bc2e84e 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -44,6 +44,7 @@ struct QTestState
>  gchar *pid_file; /* QEMU PID file */
>  int child_pid;   /* Child process created to execute QEMU */
>  char *socket_path, *qmp_socket_path;
> +char *extra_args;
>  };
>  
>  #define g_assert_no_errno(ret) do { \
> @@ -104,6 +105,14 @@ static pid_t qtest_qemu_pid(QTestState *s)
>  return pid;
>  }
>  
> +void qtest_qmp_wait_event(QTestState *s, const char *event)
> +{
> +char *d;
> +/* This is cheating */
> +d = qtest_qmp(s, "");

This reminds me that I was unable to use GCC_FMT_ATTR(2, 3) on
qtest_qmp() because of the "" argument that gcc would warn about.
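
To illustrate the warning mentioned here: GCC's printf-style format attribute
lets the compiler check the format string against the arguments, and as part
of -Wformat it also warns about a zero-length format literal. The sketch
below uses made-up names (SKETCH_FMT_ATTR, sketch_sendf), not the libqtest
ones, and shows a call shape that avoids an empty literal.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__GNUC__)
#define SKETCH_FMT_ATTR(n, m) __attribute__((format(printf, n, m)))
#else
#define SKETCH_FMT_ATTR(n, m)
#endif

/* Hypothetical printf-like helper. The attribute says that argument 3 is
 * the format string and that the variadic arguments start at position 4. */
static int sketch_sendf(char *buf, size_t len, const char *fmt, ...)
    SKETCH_FMT_ATTR(3, 4);

/* Format into buf and return the length vsnprintf reports. */
static int sketch_sendf(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, sketch_sendf(buf, sizeof(buf), "") is the call
shape that draws the zero-length format warning, while
sketch_sendf(buf, sizeof(buf), "%s", "") produces the same empty output
without it.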

Otherwise code looks okay, although I'm not too familiar with the events.

Regards,
Andreas

> +g_free(d);
> +}
> +
>  QTestState *qtest_init(const char *extra_args)
>  {
>  QTestState *s;
> @@ -118,6 +127,7 @@ QTestState *qtest_init(const char *extra_args)
>  
>  s = g_malloc(sizeof(*s));
>  
> +s->extra_args = g_strdup(extra_args);
>  s->socket_path = g_strdup_printf("/tmp/qtest-%d.sock", getpid());
>  s->qmp_socket_path = g_strdup_printf("/tmp/qtest-%d.qmp", getpid());
>  pid_file = g_strdup_printf("/tmp/qtest-%d.pid", getpid());
> @@ -177,6 +187,61 @@ void qtest_quit(QTestState *s)
>  g_free(s->pid_file);
>  g_free(s->socket_path);
>  g_free(s->qmp_socket_path);
> +g_free(s->extra_args);
> +}
> +
> +QTestState *qtest_save_restore(QTestState *s)
> +{
> +char *filename;
> +char *d, *p, *extra_args;
> +char *n;
> +
> +filename = g_strdup_printf("/tmp/qtest-%d.savevm", getpid());
> +
> +/* Start migration to a temporary file */
> +d = qtest_qmp(s,
> +  "{ 'execute': 'migrate', "
> +  "  'arguments': { 'uri': 'exec:dd of=%s 2>/dev/null' } }",
> +  filename);
> +g_free(d);
> +
> +/* Wait for critical section to be entered */
> +qtest_qmp_wait_event(s, "STOP");
> +
> +/* Not strictly needed as we can't possibly respond to this command until
> + * we've completed migration by virtue of the fact that STOP has been 
> sent
> + * but it's good to be rigorious. */
> +do {
> +d = qtest_qmp(s, "{ 'execute': 'query-migrate' }");
> +p = strstr(d, "\"status\": \"completed\",");
> +g_free(d);
> +if (!p) {
> +g_usleep(100);
> +}
> +} while (p == NULL);
> +
> +/* Save arguments to this qtest instance */
> +extra_args = s->extra_args;
> +s->extra_args = NULL;
> +
> +/* Quit src instance */
> +qtest_quit(s);
> +
> +/* Spawn destination */
> +n = g_strdup_printf("%s -incoming exec:\"dd if=%s 2>/dev/null\"",
> +extra_args, filename);
> +s = qtest_init(n);
> +
> +/* Wait for incoming migration to complete */
> +qtest_qmp_wait_event(s, "RESUME");
> +
> +/* Fixup extra arg so we can call repeatedly */
> +g_free(s->extra_args);
> +s->extra_args = extra_args;
> +
> +g_free(filename);
> +
> +return s;
>  }
>  
>  static void socket_sendf(int fd, const char *fmt, va_list ap)
> diff --git a/tests/libqtest.h b/tests/libqtest.h
> index 5cdcae7..f2c6e52 100644
> --- a/tests/libqtest.h
> +++ b/tests/libqtest.h
> @@ -67,6 +67,15 @@ char *qtest_qmp(QTestState *s, const char *fmt, ...);
>  char *qtest_qmpv(QTestState *s, const char *fmt, va_list ap);
>  
>  /**
> + * qtest_qmp_wait_event:
> + * @s: #QTestState instance to operate on.
> + * @event: the event to wait for.
> + *
> + * Waits for a specific QMP event to occur.
> + */
> +void qtest_qmp_wait_event(QTestState *s, const char *event);
> +
> +/**
>   * qtest_get_irq:
>   * @s: #QTestState instance to operate on.
>   * @num: Interrupt to observe.
> @@ -291,6 +300,19 @@ int64_t qtest_clock_step(QTestState *s, int64_t step);
>  int64_t qtest_clock_set(QTestState *s, int64_t val);
>  
>  /**
> + * qtest_save_restore:
> + * @s: QTest instance to operate on.
> + *
> + * This function will save and restore the state of the running QEMU
> + * instance.  If the savevm code is implemented correctly for a device,
> + * this function should behave like a nop.  If a test case fails because
> + * this function is called, the savevm code for the device is broken.
> +

[Qemu-devel] [PATCH 07/25] exec: return MemoryRegion from address_space_translate

2013-06-20 Thread Paolo Bonzini

Only address_space_translate_for_iotlb needs to return the section.
Every caller of address_space_translate now uses only section->mr,
so return it directly.

Signed-off-by: Paolo Bonzini 
---
 exec.c| 150 +-
 include/exec/memory.h |   8 +--
 translate-all.c   |  10 ++--
 3 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/exec.c b/exec.c
index ffd2dc8..3a8ef42 100644
--- a/exec.c
+++ b/exec.c
@@ -262,11 +262,11 @@ address_space_translate_internal(AddressSpace *as, hwaddr addr, hwaddr *xlat,
 return section;
 }
 
-MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
- hwaddr *xlat, hwaddr *plen,
- bool is_write)
+MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
+  hwaddr *xlat, hwaddr *plen,
+  bool is_write)
 {
-return address_space_translate_internal(as, addr, xlat, plen, true);
+return address_space_translate_internal(as, addr, xlat, plen, true)->mr;
 }
 
 MemoryRegionSection *
@@ -1923,58 +1923,58 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
 uint8_t *ptr;
 uint64_t val;
 hwaddr addr1;
-MemoryRegionSection *section;
+MemoryRegion *mr;
 bool error = false;
 
 while (len > 0) {
 l = len;
-section = address_space_translate(as, addr, &addr1, &l, is_write);
+mr = address_space_translate(as, addr, &addr1, &l, is_write);
 
 if (is_write) {
-if (!memory_access_is_direct(section->mr, is_write)) {
-l = memory_access_size(section->mr, l, addr1);
+if (!memory_access_is_direct(mr, is_write)) {
+l = memory_access_size(mr, l, addr1);
 /* XXX: could force cpu_single_env to NULL to avoid
potential bugs */
 if (l == 4) {
 /* 32 bit write access */
 val = ldl_p(buf);
-error |= io_mem_write(section->mr, addr1, val, 4);
+error |= io_mem_write(mr, addr1, val, 4);
 } else if (l == 2) {
 /* 16 bit write access */
 val = lduw_p(buf);
-error |= io_mem_write(section->mr, addr1, val, 2);
+error |= io_mem_write(mr, addr1, val, 2);
 } else {
 /* 8 bit write access */
 val = ldub_p(buf);
-error |= io_mem_write(section->mr, addr1, val, 1);
+error |= io_mem_write(mr, addr1, val, 1);
 }
 } else {
-addr1 += memory_region_get_ram_addr(section->mr);
+addr1 += memory_region_get_ram_addr(mr);
 /* RAM case */
 ptr = qemu_get_ram_ptr(addr1);
 memcpy(ptr, buf, l);
 invalidate_and_set_dirty(addr1, l);
 }
 } else {
-if (!memory_access_is_direct(section->mr, is_write)) {
+if (!memory_access_is_direct(mr, is_write)) {
 /* I/O case */
-l = memory_access_size(section->mr, l, addr1);
+l = memory_access_size(mr, l, addr1);
 if (l == 4) {
 /* 32 bit read access */
-error |= io_mem_read(section->mr, addr1, &val, 4);
+error |= io_mem_read(mr, addr1, &val, 4);
 stl_p(buf, val);
 } else if (l == 2) {
 /* 16 bit read access */
-error |= io_mem_read(section->mr, addr1, &val, 2);
+error |= io_mem_read(mr, addr1, &val, 2);
 stw_p(buf, val);
 } else {
 /* 8 bit read access */
-error |= io_mem_read(section->mr, addr1, &val, 1);
+error |= io_mem_read(mr, addr1, &val, 1);
 stb_p(buf, val);
 }
 } else {
 /* RAM case */
-ptr = qemu_get_ram_ptr(section->mr->ram_addr + addr1);
+ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
 memcpy(buf, ptr, l);
 }
 }
@@ -2011,18 +2011,18 @@ void cpu_physical_memory_write_rom(hwaddr addr,
 hwaddr l;
 uint8_t *ptr;
 hwaddr addr1;
-MemoryRegionSection *section;
+MemoryRegion *mr;
 
 while (len > 0) {
 l = len;
-section = address_space_translate(&address_space_memory,
-  addr, &addr1, &l, true);
+mr = address_space_translate(&address_space_memory,
+ addr, &addr1, &l, true);
 
-if (!(memory_region_is_ram(section->mr) ||
-  memory_region_is_romd(sect

[Qemu-devel] [PATCH 17/25] spapr: use memory core for iommu support

2013-06-20 Thread Paolo Bonzini

Now we can stop using a "translating" DMAContext, but we do not yet modify
the sPAPRTCETable users to get an AddressSpace; they keep using the table
via a DMAContext.

Acked-by: David Gibson 
Signed-off-by: Paolo Bonzini 
---
 hw/ppc/spapr_iommu.c   | 48 +++-
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index cf5ccb1..6e33929 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -37,12 +37,16 @@ enum sPAPRTCEAccess {
 };
 
 struct sPAPRTCETable {
+/* temporary until everyone has its own AddressSpace */
 DMAContext dma;
+AddressSpace as;
+
 uint32_t liobn;
 uint32_t window_size;
 sPAPRTCE *table;
 bool bypass;
 int fd;
+MemoryRegion iommu;
 QLIST_ENTRY(sPAPRTCETable) list;
 };
 
@@ -68,8 +72,9 @@ static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
 return NULL;
 }
 
-static IOMMUTLBEntry spapr_tce_translate_iommu(sPAPRTCETable *tcet, hwaddr addr)
+static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
 {
+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
 uint64_t tce;
 
 #ifdef DEBUG_TCE
@@ -111,24 +116,9 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(sPAPRTCETable *tcet, hwaddr addr)
 };
 }
 
-static int spapr_tce_translate(DMAContext *dma,
-   dma_addr_t addr,
-   hwaddr *paddr,
-   hwaddr *len,
-   DMADirection dir)
- {
-sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
-bool is_write = (dir == DMA_DIRECTION_FROM_DEVICE);
-IOMMUTLBEntry entry = spapr_tce_translate_iommu(tcet, addr);
-if (!(entry.perm & (1 << is_write))) {
-return -EPERM;
-}
-
-/* Translate */
-*paddr = entry.translated_addr | (addr & entry.addr_mask);
-*len = (addr | entry.addr_mask) - addr + 1;
-return 0;
-}
+static MemoryRegionIOMMUOps spapr_iommu_ops = {
+.translate = spapr_tce_translate_iommu,
+};
 
 sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
 {
@@ -145,8 +135,6 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
 }
 
 tcet = g_malloc0(sizeof(*tcet));
-dma_context_init(&tcet->dma, &address_space_memory, spapr_tce_translate, NULL, NULL);
-
 tcet->liobn = liobn;
 tcet->window_size = window_size;
 
@@ -167,6 +155,11 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
 "table @ %p, fd=%d\n", tcet, liobn, tcet->table, tcet->fd);
 #endif
 
+memory_region_init_iommu(&tcet->iommu, &spapr_iommu_ops,
+ "iommu-spapr", UINT64_MAX);
+address_space_init(&tcet->as, &tcet->iommu);
+dma_context_init(&tcet->dma, &tcet->as, NULL, NULL, NULL);
+
 QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
 return tcet;
@@ -190,6 +183,11 @@ DMAContext *spapr_tce_get_dma(sPAPRTCETable *tcet)
 return &tcet->dma;
 }
 
+MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet)
+{
+return &tcet->iommu;
+}
+
 void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass)
 {
 tcet->bypass = bypass;
@@ -208,6 +206,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 target_ulong tce)
 {
 sPAPRTCE *tcep;
+IOMMUTLBEntry entry;
 
 if (ioba >= tcet->window_size) {
 hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
@@ -218,6 +217,13 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
 tcep->tce = tce;
 
+entry.target_as = &address_space_memory;
+entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
+entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
+entry.addr_mask = SPAPR_TCE_PAGE_MASK;
+entry.perm = tce;
+memory_region_notify_iommu(&tcet->iommu, entry);
+
 return H_SUCCESS;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index e8d617b..142abb7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -349,6 +349,7 @@ void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size);
 DMAContext *spapr_tce_get_dma(sPAPRTCETable *tcet);
+MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 void spapr_tce_free(sPAPRTCETable *tcet);
 void spapr_tce_reset(sPAPRTCETable *tcet);
 void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass);
-- 
1.8.1.4

[Qemu-devel] [PATCH 23/25] memory: Fix comment typo

2013-06-20 Thread Paolo Bonzini

From: Peter Crosthwaite 

s/ajacent/adjacent

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Paolo Bonzini 
---
 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index 8e99b8a..221b725 100644
--- a/memory.c
+++ b/memory.c
@@ -282,7 +282,7 @@ static bool can_merge(FlatRange *r1, FlatRange *r2)
 && r1->readonly == r2->readonly;
 }
 
-/* Attempt to simplify a view by merging ajacent ranges */
+/* Attempt to simplify a view by merging adjacent ranges */
 static void flatview_simplify(FlatView *view)
 {
 unsigned i, j;
-- 
1.8.1.4

Re: [Qemu-devel] [Bug 1191606] Re: qemu crashes with iscsi initiator (libiscsi) when using virtio

2013-06-20 Thread ronnie sahlberg

On Thu, Jun 20, 2013 at 7:47 AM, Laszlo Ersek  wrote:
> On 06/20/13 15:33, ronnie sahlberg wrote:
>> http://pastebin.com/EuwZPna1
>>
>> Last few thousand lines from the log with your patch.
>>
>>
>> The crash happens immediately after qemu has called out to iscsi_ioctl
>> with SG_IO to read the serial numbers vpd page.
>> We get the reply back from the target but as soon as ioctl_cb returns
>> we crash.
>> If you comment out SG_IO in iscsi_ioctl then the crash does not happen
>> (but then qemu does not get the serial number either)
>>
>>
>> I will look more into it tonight.
>
>   virtqueue_map_sg: mapped gpa=790a9000 at hva=0x7f0cb10a9000 for length=4, is_write=1  (out: data)
>   virtqueue_map_sg: mapped gpa=7726fc70 at hva=0x7f0caf26fc70 for length=96, is_write=1 (out: sense)
>   virtqueue_map_sg: mapped gpa=764e5aa0 at hva=0x7f0cae4e5aa0 for length=16, is_write=1 (out: errors, data_len, sense_len, residual)
>   virtqueue_map_sg: mapped gpa=764e5adc at hva=0x7f0cae4e5adc for length=1, is_write=1  (out: status)
>   virtqueue_map_sg: mapped gpa=764e5a90 at hva=0x7f0cae4e5a90 for length=16, is_write=0 (in: type, ioprio, sector)
>   virtqueue_map_sg: mapped gpa=7ab80578 at hva=0x7f0cb2b80578 for length=6, is_write=0  (in: cmd)
>   virtio_blk_handle_request: type=0x0002
>   virtqueue_fill: unmapping hva=0x7f0c24008000 for length=4, access_len=1, is_write=1
>   Bad ram pointer 0x7f0c24008000
>
> This looks related, in virtio_blk_handle_scsi():
>
> } else if (req->elem.in_num > 3) {
> /*
>  * If we have more than 3 input segments the guest wants to actually
>  * read data.
>  */
> hdr.dxfer_direction = SG_DXFER_FROM_DEV;
> hdr.iovec_count = req->elem.in_num - 3;
> for (i = 0; i < hdr.iovec_count; i++)
> hdr.dxfer_len += req->elem.in_sg[i].iov_len;
>
> hdr.dxferp = req->elem.in_sg;
> } else {
>
> This sets
> - "hdr.iovec_count" to 1,
> - "hdr.dxfer_len" to 4,
> - "hdr.dxferp" as shown above,
>
> For "struct sg_io_hdr" (which is the type of "hdr"), the typedef &
> documentation are in :
>
> unsigned short iovec_count; /* [i] 0 implies no scatter gather */
>
> void __user *dxferp;/* [i], [*io] points to data transfer memory
>   or scatter gather list */
>
> Now what we're seeing is a corruption of "req->elem.in_sg[0].iov_base",
> whose address equals that of "req->elem.in_sg" (it's at offset 0 in the
> struct at subscript #0 in the array).
>
>   virtqueue_map_sg: mapped gpa=790a9000 at hva=0x7f0cb10a9000 for length=4, is_write=1
>   [...]
>   virtio_blk_handle_request: type=0x0002
>   virtqueue_fill: unmapping hva=0x7f0c24008000 for length=4, access_len=1, is_write=1
>   Bad ram pointer 0x7f0c24008000
>
> First I don't understand how access_len can only be "1". But, in any
> case, if the "req->elem.in_sg[0].iov_base" pointer is stored in
> little-endian order, and the kernel (or iscsi_scsi_command_async()?) for
> whatever reason misinterprets "hdr.dxferp" to point at an actual receive
> buffer (instead of an iovec array), that would be consistent with the
> symptoms:

Ah, that makes sense.

block/iscsi.c   (https://github.com/qemu/qemu/blob/master/block/iscsi.c)
does assume that ioh->dxferp is a pointer to the buffer and that there
is no scatter gather.
See lines  745-749.

I did not know that ioctl() could take a scatter/gather list.


I can't test now, but if I understand correctly,
lines 745-749 should be replaced with something that does the following:

* Check ioh->iovec_count. If it is zero, there is no scatter gather
and ioh->dxferp points to a buffer, so just do what we do today.
* If iovec_count is > 0, dxferp is NOT a pointer to a buffer but
a pointer to an array of iovecs. In that case,
traverse the iovec array and add these as buffers to the task, just
like we do for readv, for example with a loop similar to the one that
adds the iovecs in lines 449-453.


I will try this tonight.
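
The two cases described above can be sketched roughly as follows. This is
only an illustration of the branching on iovec_count: the header struct is a
hypothetical subset of the SG_IO fields, and the sketch copies bytes with
memcpy instead of attaching buffers to a libiscsi task, so it is not the
actual fix for block/iscsi.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h> /* struct iovec */

/* Hypothetical subset of the SG_IO header fields discussed above. */
typedef struct {
    unsigned short iovec_count; /* 0 implies no scatter gather */
    unsigned int dxfer_len;     /* total transfer length in bytes */
    void *dxferp;               /* flat buffer, or array of struct iovec */
} SketchSgHdr;

/* Copy up to len bytes of received data to wherever the header says it
 * should go. Returns the number of bytes copied. */
static size_t sketch_deliver(const SketchSgHdr *hdr, const void *data,
                             size_t len)
{
    const unsigned char *src = data;
    const struct iovec *iov;
    size_t done = 0;
    unsigned short i;

    assert(hdr != NULL && data != NULL);
    if (len > hdr->dxfer_len) {
        len = hdr->dxfer_len;
    }

    if (hdr->iovec_count == 0) {
        /* No scatter gather: dxferp is the receive buffer itself. */
        memcpy(hdr->dxferp, src, len);
        return len;
    }

    /* Scatter gather: dxferp is an array of iovec, fill each in turn. */
    iov = hdr->dxferp;
    for (i = 0; i < hdr->iovec_count && done < len; i++) {
        size_t chunk = len - done;
        if (chunk > iov[i].iov_len) {
            chunk = iov[i].iov_len;
        }
        memcpy(iov[i].iov_base, src + done, chunk);
        done += chunk;
    }
    return done;
}
```

Treating dxferp as a flat buffer when iovec_count is non-zero would write the
payload over the iovec array itself, which is the failure mode suspected in
this thread.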


>
>   0x7f0cb10a9000 <--- original value of req->elem.in_sg[0].iov_base
>   0x7f0c24008000 <--- corrupted value
>  <--- 4 low bytes overwritten by SCSI data
>
> Laszl
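
The pointer arithmetic behind the quoted diagram can be reproduced with a
small sketch. It models the stored pointer as 8 little-endian bytes and
overwrites the first 4 of them. The payload bytes 00 80 00 24 used in the
usage note are simply the ones implied by the two hva values in the log;
the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v as 8 little-endian bytes, the way a little-endian host lays out
 * a 64-bit pointer in memory. */
static void sketch_store_le64(unsigned char out[8], uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        out[i] = (unsigned char)(v >> (8 * i));
    }
}

/* Reassemble a 64-bit value from 8 little-endian bytes. */
static uint64_t sketch_load_le64(const unsigned char in[8])
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}

/* Overwrite the first n (at most 8) bytes of the stored pointer with
 * payload data and return the pointer value that results. */
static uint64_t sketch_clobber_pointer(uint64_t ptr,
                                       const unsigned char *data, size_t n)
{
    unsigned char bytes[8];

    assert(data != NULL && n <= sizeof(bytes));
    sketch_store_le64(bytes, ptr);
    memcpy(bytes, data, n);
    return sketch_load_le64(bytes);
}
```

Clobbering 0x7f0cb10a9000 with the 4 bytes 00 80 00 24 yields 0x7f0c24008000,
the value reported as a bad ram pointer: only the low 32 bits change, because
on a little-endian layout those are the bytes stored first.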

[Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields.

2013-06-20 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 block/qcow2.h |  203 -
 1 file changed, 201 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9421843..953edfe 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -57,7 +57,182 @@
 #define REFCOUNT_CACHE_SIZE 4
 
 #define DEFAULT_CLUSTER_SIZE 65536
-
+#define DEFAULT_DEDUP_CLUSTER_SIZE 4096
+
+#define HASH_LENGTH 32
+
+/* indicate that this cluster hash has been deleted from the key value store */
+#define QCOW_DEDUP_DELETED (1LL << 61)
+/* indicate that the hash structure is empty and miss offset */
+#define QCOW_DEDUP_FLAG_EMPTY   (1LL << 62)
+
+#define SSD_ERASE_BLOCK_SIZE (512 * 1024) /* match SSD erase block size */
+#define JOURNAL_CLUSTER_SIZE 4096   /* used to read entries */
+#define HASH_STORE_CLUSTER_SIZE 4096
+
+#define QCOW_LOG_END_SIZE 2/* size of a end block journal entry */
+#define QCOW_LOG_STORE_ENTRY_USED (1LL << 60) /* mark used entry in table */
+#define QCOW_LOG_STORE_BUCKET_SIZE 4   /* size of a cuckoo hash bucket */
+#define QCOW_LOG_STORE_MAX_KICKS 128   /* max numbers of cuckoo hash kicks */
+#define QCOW_LOG_STORE_JOURNAL_RATIO 2 /* the ratio to compute the extra
+* room the journal will take based
+* on the log store size
+*/
+#define QCOW2_NB_INCARNATION_GOAL  128 /* targeted number of incarnation */
+
+#define QCOW_DEDUP_DIRTY 1 /* dirty flag in the qcow2 header extension */
+
+typedef enum {
+QCOW_LOG_NONE = 0xFF, /* on SSD erased clusters will mark none */
+QCOW_LOG_END = 1, /* end a block and point to the next */
+QCOW_LOG_HASH = 2,/* used to journalize a QCowHashInfo */
+} QCowLogEntryType;
+
+typedef enum {
+QCOW_HASH_SHA256 = 0,
+QCOW_HASH_SHA3   = 1,
+QCOW_HASH_SKEIN  = 2,
+} QCowHashAlgo;
+
+typedef struct {
+uint8_t data[HASH_LENGTH]; /* 32 bytes hash of a given cluster */
+} __attribute__((packed)) QCowHash;
+
+/* deduplication info */
+typedef struct {
+QCowHash hash;
+uint64_t physical_sect;   /* where the cluster is stored on disk */
+uint64_t first_logical_sect;  /* logical sector of the first occurrence of
+   * this cluster
+   */
+} __attribute__((packed)) QCowHashInfo;
+
+/* Used to keep a single precomputed hash between the calls of the dedup
+ * function
+ */
+typedef struct {
+QCowHashInfo hash_info;
+bool reuse; /* The main deduplication function can set this field to
+ * true before exiting to avoid computing the same hash
+ * twice. It's a speed optimization.
+ */
+} QcowPersistentHash;
+
+/* Undedupable hashes that must be written later to disk */
+typedef struct QCowHashElement {
+QCowHashInfo hash_info;
+QTAILQ_ENTRY(QCowHashElement) next;
+} QCowHashElement;
+
+typedef struct {
+QcowPersistentHash phash;  /* contains a hash persisting between calls of
+* qcow2_dedup()
+*/
+QTAILQ_HEAD(, QCowHashElement) undedupables;
+uint64_t nb_clusters_processed;
+uint64_t nb_undedupable_sectors;
+} QCowDedupState;
+
+/* The code must take care that the maximum size field of a QCowJournalEntry
+ * will be no more than 254 bytes.
+ * It's required to save the 2 bytes of room for QCOW_LOG_END entries
+ * in every cases
+ */
+typedef union {
+QCowHashInfo hash_info;
+uint8_t  padding[254]; /* note the extra two bytes of padding to avoid
+* read overflow.
+*/
+} QCowJournalEntryUnion;
+
+typedef struct {
+uint8_t size;/* maximum size of a journal entry is 254 bytes */
+uint8_t type;/* contains a QCowLogEntryType for future usage */
+QCowJournalEntryUnion u;
+} __attribute__((packed)) QCowJournalEntry;
+
+typedef struct {
+uint64_t sector;  /* the journal physical on disk sector */
+uint64_t size;/* the size of the journal in bytes */
+uint64_t index;   /* index of next buf cluster to write */
+uint8_t  *write_buf;  /* used to buffer written data */
+uint64_t offset_in_buf;   /* the offset in the write buffer */
+bool flushed; /* true if the buffer reached disk */
+uint8_t  *read_cache; /* used to cache read data */
+int64_t read_index;   /* index of the cached read cluster */
+bool started; /* has the journal been resumed */
+} QCowJournal;
+
+typedef struct {
+QCowJournal journal;  /* the journal this log store will use */
+uint32_t order;   /* the number of bits used for the sub hashes
+   * as sub h
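
The 254-byte limit described in the comments of the patch above can be checked at compile time. What follows is only a minimal sketch and not part of the patch: it copies the quoted type definitions, assumes HASH_LENGTH is 32 (going by the "32 bytes hash" comment), and assumes a C11 compiler that honours __attribute__((packed)), such as GCC or Clang.

```c
#include <stdint.h>

#define HASH_LENGTH 32 /* assumption: taken from the "32 bytes hash" comment */

typedef struct {
    uint8_t data[HASH_LENGTH];
} __attribute__((packed)) QCowHash;

typedef struct {
    QCowHash hash;
    uint64_t physical_sect;
    uint64_t first_logical_sect;
} __attribute__((packed)) QCowHashInfo;

typedef union {
    QCowHashInfo hash_info;
    uint8_t padding[254];
} QCowJournalEntryUnion;

typedef struct {
    uint8_t size;
    uint8_t type;
    QCowJournalEntryUnion u;
} __attribute__((packed)) QCowJournalEntry;

/* The payload must fit in the 254 bytes reserved by the union. */
_Static_assert(sizeof(QCowHashInfo) <= 254,
               "QCowHashInfo does not fit in a journal entry");

/* Two header bytes (size, type) plus 254 payload bytes. */
_Static_assert(sizeof(QCowJournalEntry) == 256,
               "unexpected QCowJournalEntry layout");
```

If a later change grows QCowHashInfo past the limit, the build fails instead of the journal silently overflowing.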

Re: [Qemu-devel] [PATCH 07/22] memory: add address_space_translate

2013-06-20 Thread Paolo Bonzini

On 20/06/2013 16:43, Peter Maydell wrote:
>>> >> There are other places in memory.c which do an int128_get64()
>>> >> on mr->size, which also look suspicious...
>> >
>> > They are all on I/O regions so they are safe
> Not entirely sure I understand this. There's no particular
> reason I can't create a 2^64 sized I/O memory region
> and put it in an address space, is there?

I think there are problems in the core if you do that (probably part of
it is fixed now).  Still, in cases like this:

memory_region_add_coalescing(mr, 0, int128_get64(mr->size));

the API simply doesn't support it.
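
To illustrate why a 2^64-sized region cannot pass through a 64-bit size parameter, here is a small standalone model. The Size128 type and the size128_* helpers are hypothetical names made up for this sketch, not QEMU's Int128 API; the sketch assumes the narrowing conversion is only valid when the upper 64 bits are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit size: value = hi * 2^64 + lo. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Size128;

/* Build a 128-bit size from a plain 64-bit value. */
static Size128 size128_from_u64(uint64_t v)
{
    Size128 s = { .lo = v, .hi = 0 };
    return s;
}

/* The size of a region covering a whole 64-bit address space: 2^64. */
static Size128 size128_2x64(void)
{
    Size128 s = { .lo = 0, .hi = 1 };
    return s;
}

/* A value fits in 64 bits only if its upper half is zero. */
static bool size128_fits_u64(Size128 s)
{
    return s.hi == 0;
}

/* Narrowing conversion; aborts for values that need more than 64 bits. */
static uint64_t size128_get64(Size128 s)
{
    assert(size128_fits_u64(s));
    return s.lo;
}
```

With this model, size128_get64(size128_from_u64(4096)) yields 4096, while size128_get64(size128_2x64()) trips the assertion, because 2^64 is one more than the largest uint64_t.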

Paolo

Re: [Qemu-devel] [PATCH 03/12] qtest: return string from QMP commands

2013-06-20 Thread Andreas Färber

On 19.06.2013 22:40, Anthony Liguori wrote:
> Signed-off-by: Anthony Liguori 
> ---
>  tests/libqtest.c | 16 +---
>  tests/libqtest.h | 14 +++---
>  2 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 81107cf..235ec62 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -287,10 +287,13 @@ redo:
>  return words;
>  }
>  
> -void qtest_qmpv(QTestState *s, const char *fmt, va_list ap)
> +char *qtest_qmpv(QTestState *s, const char *fmt, va_list ap)
>  {
>  bool has_reply = false;
>  int nesting = 0;
> +GString *ret;
> +
> +ret = g_string_new("");
>  
>  /* Send QMP request */
>  socket_sendf(s->qmp_fd, fmt, ap);
> @@ -319,16 +322,23 @@ void qtest_qmpv(QTestState *s, const char *fmt, va_list ap)
>  nesting--;
>  break;
>  }
> +
> +g_string_append_c(ret, c);
>  }
> +
> +return g_string_free(ret, FALSE);
>  }
>  
> -void qtest_qmp(QTestState *s, const char *fmt, ...)
> +char *qtest_qmp(QTestState *s, const char *fmt, ...)
>  {
>  va_list ap;
> +char *ret;
>  
>  va_start(ap, fmt);
> -qtest_qmpv(s, fmt, ap);
> +ret = qtest_qmpv(s, fmt, ap);
>  va_end(ap);
> +
> +return ret;
>  }
>  
>  const char *qtest_get_arch(void)
> diff --git a/tests/libqtest.h b/tests/libqtest.h
> index 592f035..5cdcae7 100644
> --- a/tests/libqtest.h
> +++ b/tests/libqtest.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  typedef struct QTestState QTestState;
>  
> @@ -48,8 +49,10 @@ void qtest_quit(QTestState *s);
>   * @fmt...: QMP message to send to qemu
>   *
>   * Sends a QMP message to QEMU
> + *
> + * Returns: the result of the QMP command
>   */
> -void qtest_qmp(QTestState *s, const char *fmt, ...);
> +char *qtest_qmp(QTestState *s, const char *fmt, ...);
>  
>  /**
>   * qtest_qmpv:
> @@ -58,8 +61,10 @@ void qtest_qmp(QTestState *s, const char *fmt, ...);
>   * @ap: QMP message arguments
>   *
>   * Sends a QMP message to QEMU.
> + *
> + * Returns: the result of the QMP command
>   */
> -void qtest_qmpv(QTestState *s, const char *fmt, va_list ap);
> +char *qtest_qmpv(QTestState *s, const char *fmt, va_list ap);
>  
>  /**
>   * qtest_get_irq:
> @@ -340,10 +345,13 @@ static inline QTestState *qtest_start(const char *args)
>  static inline void qmp(const char *fmt, ...)
>  {
>  va_list ap;
> +char *ret;
>  
>  va_start(ap, fmt);
> -qtest_qmpv(global_qtest, fmt, ap);
> +ret = qtest_qmpv(global_qtest, fmt, ap);
>  va_end(ap);
> +
> +g_free(ret);
>  }
>  
>  /**

In http://patchwork.ozlabs.org/patch/207689/ you had suggested returning a
QObject instead?
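
For reference, the quoted patch collects the reply one character at a time while tracking brace nesting, and stops once the first object is closed. The following standalone sketch models that idea on an in-memory string; first_balanced_object() is a hypothetical helper for illustration, not a libqtest function, and like the counting in the patch it does not account for braces inside JSON strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the input up to and including the '}' that closes the first
 * '{'. Returns a malloc()ed string the caller must free(), or NULL if
 * the input ends before the braces balance (or allocation fails).
 */
static char *first_balanced_object(const char *in)
{
    size_t len, i, n = 0;
    int nesting = 0;
    int has_reply = 0;
    char *out;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(len + 1);
    if (!out) {
        return NULL;
    }

    for (i = 0; i < len; i++) {
        char c = in[i];

        if (c == '{') {
            nesting++;
            has_reply = 1;
        } else if (c == '}') {
            nesting--;
        }
        out[n++] = c;

        /* Done once an object was opened and all braces are closed. */
        if (has_reply && nesting == 0) {
            out[n] = '\0';
            return out;
        }
    }

    free(out);
    return NULL;
}
```

A caller that gets a plain string back this way still has to parse it and free it, which is the trade-off against returning a parsed object.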

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] libqos: include dependencies

2013-06-20 Thread Markus Armbruster

Anthony Liguori  writes:

> Otherwise rebuilds can fail when libqos is modified.
>
> Reported-by: Markus Armbruster 
> Signed-off-by: Anthony Liguori 

Tested-by: Markus Armbruster 

> ---
>  tests/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tests/Makefile b/tests/Makefile
> index 9a3007d..0042cf4 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -227,3 +227,4 @@ check-block: $(patsubst %,check-%, $(check-block-y))
>  check: check-unit check-qtest
>  
>  -include $(wildcard tests/*.d)
> +-include $(wildcard tests/libqos/*.d)

Top-level Makefile also has the line

-include $(wildcard *.d tests/*.d)

Why is it not necessary to change that file, too?

Re: [Qemu-devel] [PATCH 02/12] qtest: add spapr hypercall support

2013-06-20 Thread Andreas Färber

On 19.06.2013 22:40, Anthony Liguori wrote:
> Signed-off-by: Anthony Liguori 
> ---
>  qtest.c  | 29 +
>  tests/libqtest.c | 18 ++
>  tests/libqtest.h | 46 ++
>  3 files changed, 93 insertions(+)
> 
> diff --git a/qtest.c b/qtest.c
> index 07a9612..f8c8f44 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -19,6 +19,9 @@
>  #include "hw/irq.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/cpus.h"
> +#ifdef TARGET_PPC64
> +#include "hw/ppc/spapr.h"
> +#endif
>  
>  #define MAX_IRQ 256
>  
> @@ -141,6 +144,13 @@ static bool qtest_opened;
>   * where NUM is an IRQ number.  For the PC, interrupts can be intercepted
>   * simply with "irq_intercept_in ioapic" (note that IRQ0 comes out with
>   * NUM=0 even though it is remapped to GSI 2).
> + *
> + * Platform specific (sPAPR):
> + *
> + *  > papr_hypercall NR ARG0 ARG1 ... ARG8

The functions are called spapr_hcall*() but the protocol uses
papr_hypercall?

> + *  < OK RET
> + *
> + * where NR, ARG[0-8] and RET are all integers.
>   */
>  
>  static int hex2nib(char ch)
> @@ -425,6 +435,25 @@ static void qtest_process_command(CharDriverState *chr, gchar **words)
>  qtest_clock_warp(ns);
>  qtest_send_prefix(chr);
>  qtest_send(chr, "OK %"PRIi64"\n", (int64_t)qemu_get_clock_ns(vm_clock));
> +#ifdef TARGET_PPC64
> +} else if (strcmp(words[0], "papr_hypercall") == 0) {
> +uint64_t nr;
> +uint64_t args[9];
> +uint64_t ret;
> +int i;
> +
> +memset(args, 0, sizeof(args));
> +g_assert(words[1]);
> +nr = strtoull(words[1], NULL, 0);
> +for (i = 0; i < 9; i++) {
> +if (words[2 + i] == NULL) {
> +break;
> +}
> +args[i] = strtoull(words[2 + i], NULL, 0);
> +}
> +ret = spapr_hypercall(ppc_env_get_cpu(first_cpu), nr, args);
> +qtest_send(chr, "OK 0x%" PRIx64 "\n", ret);
> +#endif
>  } else {
>  qtest_send_prefix(chr);
>  qtest_send(chr, "FAIL Unknown command `%s'\n", words[0]);
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 879ffe9..81107cf 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -544,3 +544,21 @@ void qtest_memwrite(QTestState *s, uint64_t addr, const void *data, size_t size)
>  qtest_sendf(s, "\n");
>  qtest_rsp(s, 0);
>  }
> +
> +uint64_t qtest_spapr_hcall9(QTestState *s, uint64_t nr, uint64_t a0,
> +uint64_t a1, uint64_t a2, uint64_t a3, uint64_t a4,
> +uint64_t a5, uint64_t a6, uint64_t a7, uint64_t a8)
> +{
> +gchar **args;
> +uint64_t value;
> +
> +qtest_sendf(s, "papr_hypercall 0x%" PRIx64 " 0x%" PRIx64
> +" 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 
> +" 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 
> +"\n", nr, a0, a1, a2, a3, a4, a5, a6, a7, a8);
> +args = qtest_rsp(s, 2);
> +value = strtoull(args[1], NULL, 0);
> +g_strfreev(args);
> +
> +return value;
> +}
> diff --git a/tests/libqtest.h b/tests/libqtest.h
> index 437bda3..592f035 100644
> --- a/tests/libqtest.h
> +++ b/tests/libqtest.h
> @@ -286,6 +286,19 @@ int64_t qtest_clock_step(QTestState *s, int64_t step);
>  int64_t qtest_clock_set(QTestState *s, int64_t val);
>  
>  /**
> + * qtest_spapr_hcall9:
> + * @s: QTestState instance to operate on.
> + * @nr: The hypercall index
> + * @aN: The @Nth hypercall argument
> + *
> + * Issue an sPAPR hypercall
> + *
> + * Returns: The result of the hypercall.
> + */
> +uint64_t qtest_spapr_hcall9(QTestState *s, uint64_t nr, uint64_t a0,
> +uint64_t a1, uint64_t a2, uint64_t a3, uint64_t a4,
> +uint64_t a5, uint64_t a6, uint64_t a7, uint64_t a8);
> +/**
>   * qtest_get_arch:
>   *
>   * Returns: The architecture for the QEMU executable under test.
> @@ -607,4 +620,37 @@ static inline int64_t clock_set(int64_t val)
>  return qtest_clock_set(global_qtest, val);
>  }
>  
> +static inline uint64_t spapr_hcall0(uint64_t nr)
> +{
> +return qtest_spapr_hcall9(global_qtest, nr, 0, 0, 0, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static inline uint64_t spapr_hcall1(uint64_t nr, uint64_t a0)
> +{
> +return qtest_spapr_hcall9(global_qtest, nr, a0, 0, 0, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static inline uint64_t spapr_hcall2(uint64_t nr, uint64_t a0, uint64_t a1)
> +{
> +return qtest_spapr_hcall9(global_qtest, nr, a0, a1, 0, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static inline uint64_t spapr_hcall3(uint64_t nr, uint64_t a0, uint64_t a1,
> +uint64_t a2)
> +{
> +return qtest_spapr_hcall9(global_qtest, nr, a0, a1, a2, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static inline uint64_t spapr_hcall4(uint64_t nr, uint64_t a0, uint64_t a1,
> +uint64_t a2, uint64_t a3)
> +{
> +return qte

Re: [Qemu-devel] qemu-ga behavior on virtio-serial unplug

2013-06-20 Thread mdroth

On Thu, Jun 20, 2013 at 10:12:30AM -0500, mdroth wrote:
> On Wed, Jun 19, 2013 at 01:17:57PM +0200, Laszlo Ersek wrote:
> > Hello Michael,
> > 
> > this is with reference to
> > .
> > 
> > Ever since the initial qemu-ga commit AFAICS an exception for
> > virtio-serial has existed, when reading EOF from the channel.
> > 
> > For isa-serial, EOF results in the client connection being closed. I
> > assume this exits the glib main loop somehow (otherwise qemu-ga would
> > just sit there and do nothing, as no further connections are accepted I
> > think).
> 
> I think it would actually do the latter, unfortunately. It's distinct
> from virtio-serial handling in that we remove the GSource by returning false
> in the event handler (qga/main.c:channel_event_cb), but I think we'd
> need to drop in a g_main_loop_quit() to actually get qemu-ga to exit in
> that scenario.
> 
> This doesn't normally get triggered though, as isa-serial does not send
> EOF when nothing is connected to the chardev backend, but instead just
> blocks. Might still make sense to make qemu-ga exit in this case though
> since it won't be doing anything useful and wrapper scripts would at
> least have some indication that something is up.
> 
> > 
> > For a unix domain socket, we can continue accepting new connections
> > after reading EOF.
> > 
> > For virtio-serial, EOF means "no host-side process is connected". In
> > this case we sleep a bit and go back to reading from the same fd (and
> > this turns into a sleep loop until the next host-side process connects).
> > 
> > 
> > Can we tell "virtio-serial port unplug" from "no host-side process"?
> > Because in the former case qemu-ga should really close the channel (and
> > maybe exit (*)), otherwise the unplug won't complete in the guest kernel.
> > 
> > According to Amit's comments in the RHBZ, at unplug a SIGIO is
> > delivered, and a POLLHUP event is reported. However,
> > 
> > (a) I think the glib iochannel abstraction doesn't allow us to tell
> > POLLHUP apart from reading EOF;
> 
> AFAICT we can actually access the POLLHUP event via the 'condition' param
> that gets passed to the cb, but the issue is we also get POLLHUP when
> the chardev backend isn't open.
> 
> > 
> > (b) delivery of an unhandled SIGIO seems to terminate the victim
> > process. qemu-ga doesn't seem to either catch or block SIGIO, which is
> > why I think a SIGIO signal is not sent in reality (maybe we should ask
> > for it first?)
> > 
> > ... Actually I'm confused about this as well. The virtio-serial port
> > *is* opened with O_ASYNC (and on Solaris, it is replaced with an
> > I_SETSIG ioctl()). What's the reason for this? g_io_channel_unix_new()
> > doesn't seem to list it as a requirement, and qemu-ga doesn't seem to
> > handle SIGIO.
> 
> At some point I played around with trying to use SIGIO to handle channel
> resets and whatnot (since we're also supposed to get one when someone
> opens the chardev backend and causes VIRTIO_CONSOLE_PORT_OPEN to get
> sent). I don't think I ever got it working, SIGIO doesn't seem to get
> sent, so that O_ASYNC might just be a relic from that.
> 
> I tried installing a handler and retested host-connect as well as hot
> unplug, and still don't seem to be getting the signal. Not sure if I'm
> doing something wrong or if there's an issue with the guest driver.
> 
> I did notice something interesting though:
> 
> 1371740628.596505: debug: cb: condition: 17, status: 2
> 1371740628.596541: debug: received EOF
> 1371740628.395726: debug: cb: condition: 17, status: 2
> 1371740628.395760: debug: received EOF
> 1371740628.496035: debug: cb: condition: 17, status: 2
> 1371740628.496072: debug: received EOF
> 1371740628.596505: debug: cb: condition: 17, status: 2
> 1371740628.596541: debug: received EOF
> 
> 
> 
> 1371740634.195524: debug: cb: condition: 1, status: 1
> 1371740634.195556: debug: read data, count: 25, data:
> {'execute':'guest-ping'}
> 
> 1371740634.195634: debug: process_event: called
> 1371740634.195660: debug: processing command
> 1371740634.196007: debug: sending data, count: 15
> 
> 
> 
> 1371740644.113346: debug: cb: condition: 16, status: 2
> 1371740644.113379: debug: received EOF
> 1371740644.213694: debug: cb: condition: 16, status: 2
> 1371740644.213725: debug: received EOF
> 1371740644.314041: debug: cb: condition: 16, status: 2
> 1371740644.314168: debug: received EOF
> 
> i.e. we got the POLLHUP if we read from an
> unconnected-but-present port, and we *don't* get the POLLHUP
> if the port has been unplugged.

Er...silly me, POLLHUP=16 and POLLIN=1 for glib, so I mixed them
up. For the unplugged case we get POLLHUP, for the unconnected case we get
POLLIN | POLLHUP, so that might actually be enough to distinguish
unplug if this is the intended behavior.
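
A rough sketch of how the callback could tell the cases apart from the condition value alone, assuming the behavior seen in the log is intended. The COND_* constants mirror the values quoted above (POLLIN=1, POLLHUP=16); the PortState enum and classify_condition() are hypothetical names for illustration, not existing qemu-ga code.

```c
#include <stdio.h>

/* Values as reported in the debug log above (assumed to match glib). */
enum {
    COND_IN  = 1,  /* data (or EOF) can be read */
    COND_HUP = 16, /* hang-up reported */
};

typedef enum {
    PORT_DATA_READY, /* host side connected and sending */
    PORT_NO_HOST,    /* port present, no host-side process connected */
    PORT_UNPLUGGED,  /* port has been hot-unplugged */
    PORT_OTHER,      /* none of the above */
} PortState;

/* Map the condition bits passed to the channel callback to a state. */
static PortState classify_condition(unsigned int cond)
{
    int in = (cond & COND_IN) != 0;
    int hup = (cond & COND_HUP) != 0;

    if (in && hup) {
        return PORT_NO_HOST;    /* condition 17 in the log */
    }
    if (hup) {
        return PORT_UNPLUGGED;  /* condition 16 in the log */
    }
    if (in) {
        return PORT_DATA_READY; /* condition 1 in the log */
    }
    return PORT_OTHER;
}
```

On PORT_UNPLUGGED the agent could close the channel so the unplug can complete in the guest kernel, while PORT_NO_HOST would keep the existing sleep-and-retry loop.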

Amit, can you confirm?

> 
> And in none of these cases does the SIGIO seem to be sent.
> 
> Here's the debug stuff I added:
> 
> diff --git a/qga/main.c b/qga/main.c
> index 0e04e73..7f9

Re: [Qemu-devel] qemu-ga behavior on virtio-serial unplug

2013-06-20 Thread mdroth

On Wed, Jun 19, 2013 at 01:17:57PM +0200, Laszlo Ersek wrote:
> Hello Michael,
> 
> this is with reference to
> .
> 
> Ever since the initial qemu-ga commit AFAICS an exception for
> virtio-serial has existed, when reading EOF from the channel.
> 
> For isa-serial, EOF results in the client connection being closed. I
> assume this exits the glib main loop somehow (otherwise qemu-ga would
> just sit there and do nothing, as no further connections are accepted I
> think).

I think it would actually do the latter, unfortunately. It's distinct
from virtio-serial handling in that we remove the GSource by returning false
in the event handler (qga/main.c:channel_event_cb), but I think we'd
need to drop in a g_main_loop_quit() to actually get qemu-ga to exit in
that scenario.

This doesn't normally get triggered though, as isa-serial does not send
EOF when nothing is connected to the chardev backend, but instead just
blocks. Might still make sense to make qemu-ga exit in this case though
since it won't be doing anything useful and wrapper scripts would at
least have some indication that something is up.

> 
> For a unix domain socket, we can continue accepting new connections
> after reading EOF.
> 
> For virtio-serial, EOF means "no host-side process is connected". In
> this case we sleep a bit and go back to reading from the same fd (and
> this turns into a sleep loop until the next host-side process connects).
> 
> 
> Can we tell "virtio-serial port unplug" from "no host-side process"?
> Because in the former case qemu-ga should really close the channel (and
> maybe exit (*)), otherwise the unplug won't complete in the guest kernel.
> 
> According to Amit's comments in the RHBZ, at unplug a SIGIO is
> delivered, and a POLLHUP event is reported. However,
> 
> (a) I think the glib iochannel abstraction doesn't allow us to tell
> POLLHUP apart from reading EOF;

AFAICT we can actually access the POLLHUP event via the 'condition' param
that gets passed to the cb, but the issue is we also get POLLHUP when
the chardev backend isn't open.

> 
> (b) delivery of an unhandled SIGIO seems to terminate the victim
> process. qemu-ga doesn't seem to either catch or block SIGIO, which is
> why I think a SIGIO signal is not sent in reality (maybe we should ask
> for it first?)
> 
> ... Actually I'm confused about this as well. The virtio-serial port
> *is* opened with O_ASYNC (and on Solaris, it is replaced with an
> I_SETSIG ioctl()). What's the reason for this? g_io_channel_unix_new()
> doesn't seem to list it as a requirement, and qemu-ga doesn't seem to
> handle SIGIO.

At some point I played around with trying to use SIGIO to handle channel
resets and whatnot (since we're also supposed to get one when someone
opens the chardev backend and causes VIRTIO_CONSOLE_PORT_OPEN to get
sent). I don't think I ever got it working, SIGIO doesn't seem to get
sent, so that O_ASYNC might just be a relic from that.

I tried installing a handler and retested host-connect as well as hot
unplug, and still don't seem to be getting the signal. Not sure if I'm
doing something wrong or if there's an issue with the guest driver.

I did notice something interesting though:

1371740628.596505: debug: cb: condition: 17, status: 2
1371740628.596541: debug: received EOF
1371740628.395726: debug: cb: condition: 17, status: 2
1371740628.395760: debug: received EOF
1371740628.496035: debug: cb: condition: 17, status: 2
1371740628.496072: debug: received EOF
1371740628.596505: debug: cb: condition: 17, status: 2
1371740628.596541: debug: received EOF

1371740634.195524: debug: cb: condition: 1, status: 1
1371740634.195556: debug: read data, count: 25, data:
{'execute':'guest-ping'}

1371740634.195634: debug: process_event: called
1371740634.195660: debug: processing command
1371740634.196007: debug: sending data, count: 15

1371740644.113346: debug: cb: condition: 16, status: 2
1371740644.113379: debug: received EOF
1371740644.213694: debug: cb: condition: 16, status: 2
1371740644.213725: debug: received EOF
1371740644.314041: debug: cb: condition: 16, status: 2
1371740644.314168: debug: received EOF

i.e. we got the POLLHUP if we read from an
unconnected-but-present port, and we *don't* get the POLLHUP
if the port has been unplugged.

And in none of these cases does the SIGIO seem to be sent.

Here's the debug stuff I added:

diff --git a/qga/main.c b/qga/main.c
index 0e04e73..7f9a628 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -140,6 +140,11 @@ static void quit_handler(int sig)
 }
 }

+static void sigio_handler(int sig)
+{
+g_debug("got sigio: %d", sig);
+}
+
 #ifndef _WIN32
 static gboolean register_signal_handlers(void)
 {
@@ -158,6 +163,13 @@ static gboolean register_signal_handlers(void)
 g_error("error configuring signal handler: %s", strerror(errno));
 }

+memset(&sigact, 0, sizeof(struct sigaction));
+sigact.sa_handler = sigio_handler;
+ret = sigaction(SIGIO, &sigact,

[Qemu-devel] [PATCH] libqos: include dependencies

2013-06-20 Thread Anthony Liguori

Otherwise rebuilds can fail when libqos is modified.

Reported-by: Markus Armbruster 
Signed-off-by: Anthony Liguori 
---
 tests/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/Makefile b/tests/Makefile
index 9a3007d..0042cf4 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -227,3 +227,4 @@ check-block: $(patsubst %,check-%, $(check-block-y))
 check: check-unit check-qtest
 
 -include $(wildcard tests/*.d)
+-include $(wildcard tests/libqos/*.d)
-- 
1.8.0

Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-platform device initialization (v4)

2013-06-20 Thread Michael S. Tsirkin

On Thu, Jun 20, 2013 at 05:02:56PM +0200, Paolo Bonzini wrote:
> On 18/06/2013 16:22, Michael S. Tsirkin wrote:
> > On Tue, Jun 18, 2013 at 01:15:57PM +, Paul Durrant wrote:
> >>> -Original Message-
> >>> From: Laszlo Ersek [mailto:ler...@redhat.com]
> >>> Sent: 18 June 2013 14:14
> >>> To: Michael S. Tsirkin
> >>> Cc: Paul Durrant; qemu-devel@nongnu.org
> >>> Subject: Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-platform
> >>> device initialization (v4)
> >>>
> >>> On 06/18/13 15:01, Michael S. Tsirkin wrote:
>  On Tue, Jun 18, 2013 at 12:57:54PM +, Paul Durrant wrote:
> >> -Original Message-
> >> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> >> Sent: 18 June 2013 13:52
> >> To: Laszlo Ersek
> >> Cc: Paul Durrant; qemu-devel@nongnu.org
> >> Subject: Re: [Qemu-devel] [PATCH 0/2] Remove hardcoded xen-
> >>> platform
> >> device initialization (v4)
> >>
> >> On Tue, Jun 18, 2013 at 02:37:58PM +0200, Laszlo Ersek wrote:
> >>> Hi Paul,
> >>>
> >>> (xen-devel snipped)
> >>>
> >>> On 06/18/13 13:16, Paul Durrant wrote:
>  Because of concerns over backwards compatibility and a suggestion
> >>> that
>  xenfv should be retired in favour of using the pc machine type I have
> >>> re-
>  worked my original patch into 2 patches:
> 
>  [PATCH 1/2] Allow use of pc machine type (accel=xen) for Xen HVM
>  [PATCH 2/2] Move hardcoded initialization of xen-platform device.
> 
>  Application of both these patches allows alternative pc machine types
> >>> to
> >> be
>  used with the accel=xen option, but preserves the hardcoded
> >>> creation of
>  the xen-platform device only for machine type xenfv.
> 
>  v3:
>  - Add test for xen_enabled() that went missing in v2
> 
>  v4:
>  - Remove erroneous whitespace hunk
>  - Replace hw_error() with fprintf()+exit(1)
>  - Add braces to single-line if
> >>>
> >>> can you please offer an opinion in the
> >>>
> >>>   [PATCH 1/2] pvpanic: initialization cleanup
> >>>   http://thread.gmane.org/gmane.comp.emulators.qemu/216940
> >>>
> >>> thread?
> >>>
> >>> From where I stand (which is "quite afar" :)) this series of yours seems
> >>> somewhat related to my doubt there.
> >>>
> >>> Thanks!
> >>> Laszlo
> >>
> >> OK will make it skip fwcfg as we did earlier.
> >> Thanks for the review.
> >>
> >
> > Yes, I think the assert(fw_cfg) would be problematic in the xen case
> >>> where, up until my patch, the machine type was necessarily xenfv.
> >
> >   Paul
> 
>  Do you guys actually need the pvpanic device?
>  How do you know which port to use without fwcfg?
> >>>
> >>> Xen domains don't know the port and don't use the pvpanic device, but
> >>> qemu starts at least. In other words, the pvpanic device is created, but
> >>> unreachable. Maybe the has_pvpanic logic should depend on (or be extended
> >>> with) !xen_enabled().
> >>>
> >>
> >> That seems entirely reasonable to me.
> > 
> > We can just skip creating the device if there's no fw cfg.
> 
> No, in principle Xen domains could use another scheme to find the port
> (xenstore for example).  If Xen domains do not want it, they can just
> add an "if".  Or we could just skip the fw_cfg step.  The device will be
> there but not ACPI-discoverable.
> 
> Paolo

That's a good reason to hide it.  We don't want multiple
mechanisms to discover the same feature; it's a
maintenance headache.

For example, if we do this, guests might try to use these ports for something
else by mistake. Or they will start hard-coding the port, and
then it will break if the user tweaks it from the command line.
Simply put, if xen wants to use features exposed through fw cfg,
let's add fw cfg support for xen and let xen use fw cfg to discover
them.

-- 
MST
