[Qemu-block] [PATCH v7 0/7] trace: enable tracing in qemu-io/qemu-nbd/qemu-img

2016-06-16 Thread Denis V. Lunev
Changes from v6:
- changed order of patches 1 & 2

Changes from v5:
- added missed hunk into patch #7

Changes from v4:
- synced help descriprion for --trace with man for qemu.1/qemu-img.1/qemu-nbd.8
- moved @findex from qemu-option-trace.texi

Changes from v3:
- fixed difference in help/man for qemu-img/qemu-nbd
- created separate .texi to contain trace description, proper dependency is
  added to makefile
- added --version/--help description to qemu-img
- fixed crash induced by new option processing scheme in qemu-img which
has happened when invoked as './qemu-img -K'

Changes from v2:
- tweaked man-pages of qemu-nbd/qemu-img
- added support for qemu-img (patches 4-5 as suggested)

Changes from v1:
- fixed nits found by Eric

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 

Denis V. Lunev (7):
  doc: sync help descriprion for --trace with man for qemu.1
  doc: move text describing --trace to specific .texi file
  trace: move qemu_trace_opts to trace/control.c
  trace: enable tracing in qemu-io
  trace: enable tracing in qemu-nbd
  qemu-img: move common options parsing before commands processing
  trace: enable tracing in qemu-img

 Makefile|  7 ---
 qemu-img.c  | 55 ++-
 qemu-img.texi   | 13 -
 qemu-io.c   | 17 +
 qemu-nbd.c  | 18 +-
 qemu-nbd.texi   |  3 +++
 qemu-options.hx | 29 ++---
 trace/control.c | 44 +++-
 trace/control.h | 24 +---
 vl.c| 37 +
 10 files changed, 150 insertions(+), 97 deletions(-)

-- 
2.1.4




[Qemu-block] [PATCH 1/7] doc: sync help descriprion for --trace with man for qemu.1

2016-06-16 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 qemu-options.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 0e42ba5..bee246e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3667,7 +3667,7 @@ DEF("trace", HAS_ARG, QEMU_OPTION_trace,
 STEXI
 HXCOMM This line is not accurate, as some sub-options are backend-specific but
 HXCOMM HX does not support conditional compilation of text.
-@item -trace [events=@var{file}][,file=@var{file}]
+@item -trace [[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
 @findex -trace
 
 Specify tracing options.
-- 
2.1.4




[Qemu-block] [PATCH 4/7] trace: enable tracing in qemu-io

2016-06-16 Thread Denis V. Lunev
Moving trace_init_backends() into trace_opt_parse() is not possible. This
should be called after daemonize() in vl.c.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 qemu-io.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index d977a6e..0ef28b0 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -18,6 +18,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
+#include "qemu/log.h"
 #include "qapi/qmp/qstring.h"
 #include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
@@ -253,7 +254,9 @@ static void usage(const char *name)
 "  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
 "  -t, --cache=MODE use the given cache mode for the image\n"
 "  -d, --discard=MODE   use the given discard mode for the image\n"
-"  -T, --trace FILE enable trace events listed in the given file\n"
+"  -T, --trace [[enable=]][,events=][,file=]\n"
+"   specify tracing options\n"
+"   see qemu-img(1) man page for full description\n"
 "  -h, --help   display this help and exit\n"
 "  -V, --versionoutput version information and exit\n"
 "\n"
@@ -458,6 +461,7 @@ int main(int argc, char **argv)
 Error *local_error = NULL;
 QDict *opts = NULL;
 const char *format = NULL;
+char *trace_file = NULL;
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -470,6 +474,7 @@ int main(int argc, char **argv)
 
 module_call_init(MODULE_INIT_QOM);
 qemu_add_opts(&qemu_object_opts);
+qemu_add_opts(&qemu_trace_opts);
 bdrv_init();
 
 while ((c = getopt_long(argc, argv, sopt, lopt, &opt_index)) != -1) {
@@ -509,9 +514,7 @@ int main(int argc, char **argv)
 }
 break;
 case 'T':
-if (!trace_init_backends()) {
-exit(1); /* error message will have been printed */
-}
+trace_file = trace_opt_parse(optarg, trace_file);
 break;
 case 'V':
 printf("%s version %s\n", progname, QEMU_VERSION);
@@ -557,6 +560,12 @@ int main(int argc, char **argv)
 exit(1);
 }
 
+if (!trace_init_backends()) {
+exit(1);
+}
+trace_init_file(trace_file);
+qemu_set_log(LOG_TRACE);
+
 /* initialize commands */
 qemuio_add_command(&quit_cmd);
 qemuio_add_command(&open_cmd);
-- 
2.1.4




[Qemu-block] [PATCH 2/7] doc: move text describing --trace to specific .texi file

2016-06-16 Thread Denis V. Lunev
This text will be included to qemu-nbd/qemu-img mans in the next patches.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 Makefile|  3 ++-
 qemu-options.hx | 27 +--
 2 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/Makefile b/Makefile
index ed4032a..eb6c573 100644
--- a/Makefile
+++ b/Makefile
@@ -564,6 +564,7 @@ qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi 
qemu-monitor-info.texi
  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu.pod && \
  $(POD2MAN) --section=1 --center=" " --release=" " qemu.pod > $@, \
  "  GEN   $@")
+qemu.1: qemu-option-trace.texi
 
 qemu-img.1: qemu-img.texi qemu-img-cmds.texi
$(call quiet-command, \
@@ -595,7 +596,7 @@ info: qemu-doc.info qemu-tech.info
 pdf: qemu-doc.pdf qemu-tech.pdf
 
 qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \
-   qemu-img.texi qemu-nbd.texi qemu-options.texi \
+   qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \
qemu-monitor.texi qemu-img-cmds.texi qemu-ga.texi \
qemu-monitor-info.texi
 
diff --git a/qemu-options.hx b/qemu-options.hx
index bee246e..f28abd3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options 
are backend-specific but
 HXCOMM HX does not support conditional compilation of text.
 @item -trace [[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
 @findex -trace
-
-Specify tracing options.
-
-@table @option
-@item [enable=]@var{pattern}
-Immediately enable events matching @var{pattern}.
-The file must contain one event name (as listed in the @file{trace-events} 
file)
-per line; globbing patterns are accepted too.  This option is only
-available if QEMU has been compiled with the @var{simple}, @var{stderr}
-or @var{ftrace} tracing backend.  To specify multiple events or patterns,
-specify the @option{-trace} option multiple times.
-
-Use @code{-trace help} to print a list of names of trace points.
-
-@item events=@var{file}
-Immediately enable events listed in @var{file}.
-The file must contain one event name (as listed in the @file{trace-events} 
file)
-per line; globbing patterns are accepted too.  This option is only
-available if QEMU has been compiled with the @var{simple}, @var{stderr} or
-@var{ftrace} tracing backend.
-
-@item file=@var{file}
-Log output traces to @var{file}.
-This option is only available if QEMU has been compiled with
-the @var{simple} tracing backend.
-@end table
+@include qemu-option-trace.texi
 ETEXI
 
 HXCOMM Internal use
-- 
2.1.4




[Qemu-block] [PATCH 6/7] qemu-img: move common options parsing before commands processing

2016-06-16 Thread Denis V. Lunev
This is necessary to enable creation of common qemu-img options which will
be specified before command.

The patch also enables '-V' alias to '--version' (exactly like in other
block utilities) and documents this change.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 qemu-img.c| 39 ++-
 qemu-img.texi | 10 +-
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 251386b..b4217e4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -91,9 +91,12 @@ static void QEMU_NORETURN help(void)
 {
 const char *help_msg =
QEMU_IMG_VERSION
-   "usage: qemu-img command [command options]\n"
+   "usage: qemu-img [standard options] command [command options]\n"
"QEMU disk image utility\n"
"\n"
+   "'-h', '--help'   display this help and exit\n"
+   "'-V', '--version'output version information and exit\n"
+   "\n"
"Command syntax:\n"
 #define DEF(option, callback, arg_string)\
"  " arg_string "\n"
@@ -3806,7 +3809,7 @@ int main(int argc, char **argv)
 int c;
 static const struct option long_options[] = {
 {"help", no_argument, 0, 'h'},
-{"version", no_argument, 0, 'v'},
+{"version", no_argument, 0, 'V'},
 {0, 0, 0, 0}
 };
 
@@ -3829,27 +3832,37 @@ int main(int argc, char **argv)
 if (argc < 2) {
 error_exit("Not enough arguments");
 }
-cmdname = argv[1];
 
 qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_source_opts);
 
-/* find the command */
-for (cmd = img_cmds; cmd->name != NULL; cmd++) {
-if (!strcmp(cmdname, cmd->name)) {
-return cmd->handler(argc - 1, argv + 1);
+while ((c = getopt_long(argc, argv, "+hV", long_options, NULL)) != -1) {
+switch (c) {
+case 'h':
+help();
+return 0;
+case 'V':
+printf(QEMU_IMG_VERSION);
+return 0;
 }
 }
 
-c = getopt_long(argc, argv, "h", long_options, NULL);
+cmdname = argv[optind];
 
-if (c == 'h') {
-help();
-}
-if (c == 'v') {
-printf(QEMU_IMG_VERSION);
+/* reset getopt_long scanning */
+argc -= optind;
+if (argc < 1) {
 return 0;
 }
+argv += optind;
+optind = 1;
+
+/* find the command */
+for (cmd = img_cmds; cmd->name != NULL; cmd++) {
+if (!strcmp(cmdname, cmd->name)) {
+return cmd->handler(argc, argv);
+}
+}
 
 /* not found */
 error_exit("Command not found: %s", cmdname);
diff --git a/qemu-img.texi b/qemu-img.texi
index cbe50e9..f1b874d 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -1,6 +1,6 @@
 @example
 @c man begin SYNOPSIS
-@command{qemu-img} @var{command} [@var{command} @var{options}]
+@command{qemu-img} [@var{standard} @var{options}] @var{command} [@var{command} 
@var{options}]
 @c man end
 @end example
 
@@ -16,6 +16,14 @@ inconsistent state.
 
 @c man begin OPTIONS
 
+Standard options:
+@table @option
+@item -h, --help
+Display this help and exit
+@item -V, --version
+Display version information and exit
+@end table
+
 The following commands are supported:
 
 @include qemu-img-cmds.texi
-- 
2.1.4




[Qemu-block] [PATCH 5/7] trace: enable tracing in qemu-nbd

2016-06-16 Thread Denis V. Lunev
Please note, trace_init_backends() must be called in the final process,
i.e. after daemonization. This is necessary to keep tracing thread in
the proper process.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 Makefile  |  2 +-
 qemu-nbd.c| 18 +-
 qemu-nbd.texi |  3 +++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index eb6c573..a291b7b 100644
--- a/Makefile
+++ b/Makefile
@@ -578,7 +578,7 @@ fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
  $(POD2MAN) --section=1 --center=" " --release=" " 
fsdev/virtfs-proxy-helper.pod > $@, \
  "  GEN   $@")
 
-qemu-nbd.8: qemu-nbd.texi
+qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
$(call quiet-command, \
  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-nbd.pod && \
  $(POD2MAN) --section=8 --center=" " --release=" " qemu-nbd.pod > $@, \
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 6554f0a..f92820f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -27,12 +27,14 @@
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 #include "qemu/bswap.h"
+#include "qemu/log.h"
 #include "block/snapshot.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
 #include "qom/object_interfaces.h"
 #include "io/channel-socket.h"
 #include "crypto/init.h"
+#include "trace/control.h"
 
 #include 
 #include 
@@ -88,6 +90,8 @@ static void usage(const char *name)
 "General purpose options:\n"
 "  --object type,id=ID,...   define an object such as 'secret' for providing\n"
 "passwords and/or encryption keys\n"
+"  -T, --trace [[enable=]][,events=][,file=]\n"
+"specify tracing options\n"
 #ifdef __linux__
 "Kernel NBD client support:\n"
 "  -c, --connect=DEV connect FILE to the local NBD device DEV\n"
@@ -470,7 +474,7 @@ int main(int argc, char **argv)
 off_t fd_size;
 QemuOpts *sn_opts = NULL;
 const char *sn_id_or_name = NULL;
-const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:";
+const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:T:";
 struct option lopt[] = {
 { "help", no_argument, NULL, 'h' },
 { "version", no_argument, NULL, 'V' },
@@ -498,6 +502,7 @@ int main(int argc, char **argv)
 { "export-name", required_argument, NULL, 'x' },
 { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
 { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
+{ "trace", required_argument, NULL, 'T' },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -518,6 +523,7 @@ int main(int argc, char **argv)
 const char *tlscredsid = NULL;
 bool imageOpts = false;
 bool writethrough = true;
+char *trace_file = NULL;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -531,6 +537,7 @@ int main(int argc, char **argv)
 
 module_call_init(MODULE_INIT_QOM);
 qemu_add_opts(&qemu_object_opts);
+qemu_add_opts(&qemu_trace_opts);
 qemu_init_exec_dir(argv[0]);
 
 while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
@@ -703,6 +710,9 @@ int main(int argc, char **argv)
 case QEMU_NBD_OPT_IMAGE_OPTS:
 imageOpts = true;
 break;
+case 'T':
+trace_file = trace_opt_parse(optarg, trace_file);
+break;
 }
 }
 
@@ -718,6 +728,12 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (!trace_init_backends()) {
+exit(1);
+}
+trace_init_file(trace_file);
+qemu_set_log(LOG_TRACE);
+
 if (tlscredsid) {
 if (sockpath) {
 error_report("TLS is only supported with IPv4/IPv6");
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 9f23343..91ebf04 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -92,6 +92,9 @@ Display extra debugging information
 Display this help and exit
 @item -V, --version
 Display version information and exit
+@item -T, --trace 
[[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
+@findex --trace
+@include qemu-option-trace.texi
 @end table
 
 @c man end
-- 
2.1.4




[Qemu-block] [PATCH 7/7] trace: enable tracing in qemu-img

2016-06-16 Thread Denis V. Lunev
The command will work this way:
qemu-img --trace qcow2* create -f qcow2 1.img 64G

Signed-off-by: Denis V. Lunev 
Suggested by: Daniel P. Berrange 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 Makefile  |  2 +-
 qemu-img.c| 18 +-
 qemu-img.texi |  3 +++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index a291b7b..d3cb014 100644
--- a/Makefile
+++ b/Makefile
@@ -566,7 +566,7 @@ qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi 
qemu-monitor-info.texi
  "  GEN   $@")
 qemu.1: qemu-option-trace.texi
 
-qemu-img.1: qemu-img.texi qemu-img-cmds.texi
+qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
$(call quiet-command, \
  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-img.pod && \
  $(POD2MAN) --section=1 --center=" " --release=" " qemu-img.pod > $@, \
diff --git a/qemu-img.c b/qemu-img.c
index b4217e4..0ff13f8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -32,6 +32,7 @@
 #include "qemu/config-file.h"
 #include "qemu/option.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
 #include "qom/object_interfaces.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
@@ -39,6 +40,7 @@
 #include "block/blockjob.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
+#include "trace/control.h"
 #include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
@@ -96,6 +98,8 @@ static void QEMU_NORETURN help(void)
"\n"
"'-h', '--help'   display this help and exit\n"
"'-V', '--version'output version information and exit\n"
+   "'-T', '--trace'  
[[enable=]][,events=][,file=]\n"
+   " specify tracing options\n"
"\n"
"Command syntax:\n"
 #define DEF(option, callback, arg_string)\
@@ -3806,10 +3810,12 @@ int main(int argc, char **argv)
 const img_cmd_t *cmd;
 const char *cmdname;
 Error *local_error = NULL;
+char *trace_file = NULL;
 int c;
 static const struct option long_options[] = {
 {"help", no_argument, 0, 'h'},
 {"version", no_argument, 0, 'V'},
+{"trace", required_argument, NULL, 'T'},
 {0, 0, 0, 0}
 };
 
@@ -3835,8 +3841,9 @@ int main(int argc, char **argv)
 
 qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_source_opts);
+qemu_add_opts(&qemu_trace_opts);
 
-while ((c = getopt_long(argc, argv, "+hV", long_options, NULL)) != -1) {
+while ((c = getopt_long(argc, argv, "+hVT:", long_options, NULL)) != -1) {
 switch (c) {
 case 'h':
 help();
@@ -3844,6 +3851,9 @@ int main(int argc, char **argv)
 case 'V':
 printf(QEMU_IMG_VERSION);
 return 0;
+case 'T':
+trace_file = trace_opt_parse(optarg, trace_file);
+break;
 }
 }
 
@@ -3857,6 +3867,12 @@ int main(int argc, char **argv)
 argv += optind;
 optind = 1;
 
+if (!trace_init_backends()) {
+exit(1);
+}
+trace_init_file(trace_file);
+qemu_set_log(LOG_TRACE);
+
 /* find the command */
 for (cmd = img_cmds; cmd->name != NULL; cmd++) {
 if (!strcmp(cmdname, cmd->name)) {
diff --git a/qemu-img.texi b/qemu-img.texi
index f1b874d..449a19c 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -22,6 +22,9 @@ Standard options:
 Display this help and exit
 @item -V, --version
 Display version information and exit
+@item -T, --trace 
[[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
+@findex --trace
+@include qemu-option-trace.texi
 @end table
 
 The following commands are supported:
-- 
2.1.4




[Qemu-block] [PATCH 3/7] trace: move qemu_trace_opts to trace/control.c

2016-06-16 Thread Denis V. Lunev
The patch also creates trace_opt_parse() helper in trace/control.c to reuse
this code in next patches for qemu-nbd and qemu-io.

The patch also makes trace_init_events() static, as this call is not used
outside the module anymore.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Paolo Bonzini 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 trace/control.c | 44 +++-
 trace/control.h | 24 +---
 vl.c| 37 +
 3 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/trace/control.c b/trace/control.c
index d099f73..75fc731 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -20,11 +20,33 @@
 #include "qemu/log.h"
 #endif
 #include "qemu/error-report.h"
+#include "qemu/config-file.h"
 #include "monitor/monitor.h"
 
 int trace_events_enabled_count;
 bool trace_events_dstate[TRACE_EVENT_COUNT];
 
+QemuOptsList qemu_trace_opts = {
+.name = "trace",
+.implied_opt_name = "enable",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_trace_opts.head),
+.desc = {
+{
+.name = "enable",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "events",
+.type = QEMU_OPT_STRING,
+},{
+.name = "file",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+
 TraceEvent *trace_event_name(const char *name)
 {
 assert(name != NULL);
@@ -141,7 +163,7 @@ void trace_enable_events(const char *line_buf)
 }
 }
 
-void trace_init_events(const char *fname)
+static void trace_init_events(const char *fname)
 {
 Location loc;
 FILE *fp;
@@ -216,3 +238,23 @@ bool trace_init_backends(void)
 
 return true;
 }
+
+char *trace_opt_parse(const char *optarg, char *trace_file)
+{
+QemuOpts *opts = qemu_opts_parse_noisily(qemu_find_opts("trace"),
+ optarg, true);
+if (!opts) {
+exit(1);
+}
+if (qemu_opt_get(opts, "enable")) {
+trace_enable_events(qemu_opt_get(opts, "enable"));
+}
+trace_init_events(qemu_opt_get(opts, "events"));
+if (trace_file) {
+g_free(trace_file);
+}
+trace_file = g_strdup(qemu_opt_get(opts, "file"));
+qemu_opts_del(opts);
+
+return trace_file;
+}
diff --git a/trace/control.h b/trace/control.h
index e2ba6d4..942f9f5 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -160,17 +160,6 @@ static void trace_event_set_state_dynamic(TraceEvent *ev, 
bool state);
 bool trace_init_backends(void);
 
 /**
- * trace_init_events:
- * @events: Name of file with events to be enabled at startup; may be NULL.
- *  Corresponds to commandline option "-trace events=...".
- *
- * Read the list of enabled tracing events.
- *
- * Returns: Whether the backends could be successfully initialized.
- */
-void trace_init_events(const char *file);
-
-/**
  * trace_init_file:
  * @file:   Name of trace output file; may be NULL.
  *  Corresponds to commandline option "-trace file=...".
@@ -197,6 +186,19 @@ void trace_list_events(void);
  */
 void trace_enable_events(const char *line_buf);
 
+/**
+ * Definition of QEMU options describing trace subsystem configuration
+ */
+extern QemuOptsList qemu_trace_opts;
+
+/**
+ * trace_opt_parse:
+ * @optarg: A string argument of --trace command line argument
+ * @trace_file: current filename to save traces to
+ *
+ * Initialize tracing subsystem.
+ */
+char *trace_opt_parse(const char *optarg, char *trace_file);
 
 #include "trace/control-internal.h"
 
diff --git a/vl.c b/vl.c
index 45eff56..927aacf 100644
--- a/vl.c
+++ b/vl.c
@@ -262,26 +262,6 @@ static QemuOptsList qemu_sandbox_opts = {
 },
 };
 
-static QemuOptsList qemu_trace_opts = {
-.name = "trace",
-.implied_opt_name = "enable",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_trace_opts.head),
-.desc = {
-{
-.name = "enable",
-.type = QEMU_OPT_STRING,
-},
-{
-.name = "events",
-.type = QEMU_OPT_STRING,
-},{
-.name = "file",
-.type = QEMU_OPT_STRING,
-},
-{ /* end of list */ }
-},
-};
-
 static QemuOptsList qemu_option_rom_opts = {
 .name = "option-rom",
 .implied_opt_name = "romfile",
@@ -3867,23 +3847,8 @@ int main(int argc, char **argv, char **envp)
 xen_mode = XEN_ATTACH;
 break;
 case QEMU_OPTION_trace:
-{
-opts = qemu_opts_parse_noisily(qemu_find_opts("trace"),
-   optarg, true);
-if (!opts) {
-exit(1);
-}
-if (qemu_opt_get(opts, "enable")) {
-trace_enable_events(qemu_opt_get(opts, "enable"));
-}
-trace_init_events(qemu_opt_get(opts, "events"));
-if (trace_file) {
-  

Re: [Qemu-block] [PATCH 1/3] block: fixed BdrvTrackedRequest filling in bdrv_co_discard

2016-06-16 Thread Fam Zheng
On Thu, 06/16 09:58, Denis V. Lunev wrote:
> The request is area is specified in bytes, not in sectors.

s/request is/request/

> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index afb99c4..b51f681 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2263,8 +2263,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  return 0;
>  }
>  
> -tracked_request_begin(&req, bs, NULL, sector_num, nb_sectors,
> -  BDRV_TRACKED_DISCARD);
> +tracked_request_begin(&req, bs, NULL, sector_num << BDRV_SECTOR_BITS,
> +  nb_sectors << BDRV_SECTOR_BITS, 
> BDRV_TRACKED_DISCARD);
>  bdrv_set_dirty(bs, sector_num, nb_sectors);
>  
>  max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
> -- 
> 2.5.0
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH 2/3] block: fix race in bdrv_co_discard with drive-mirror

2016-06-16 Thread Fam Zheng
On Thu, 06/16 09:58, Denis V. Lunev wrote:
> Actually we must set dirty bitmap dirty after we have written all our
> zeroes for correct processing in drive mirror code. In the other case
> we can face not zeroes in this dirty area there in mirror_iteration.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io.c b/block/io.c
> index b51f681..513bd99 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2265,7 +2265,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  
>  tracked_request_begin(&req, bs, NULL, sector_num << BDRV_SECTOR_BITS,
>nb_sectors << BDRV_SECTOR_BITS, 
> BDRV_TRACKED_DISCARD);
> -bdrv_set_dirty(bs, sector_num, nb_sectors);
>  
>  max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
>  while (nb_sectors > 0) {
> @@ -2314,6 +2313,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  }
>  ret = 0;
>  out:
> +bdrv_set_dirty(bs, req.offset << BDRV_SECTOR_BITS,
> +   req.bytes << BDRV_SECTOR_BITS);
>  tracked_request_end(&req);
>  return ret;
>  }
> -- 
> 2.5.0
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH 3/3] block: process before_write_notifiers in bdrv_co_discard

2016-06-16 Thread Fam Zheng
On Thu, 06/16 09:58, Denis V. Lunev wrote:
> This is mandatory for correct backup creation. In the other case the
> content under this area would be lost.
> 
> Dirty bits are set exactly like in bdrv_aligned_pwritev, i.e. they are set
> even if notifier has returned a error.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/io.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 513bd99..92d9fe3 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2266,6 +2266,11 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  tracked_request_begin(&req, bs, NULL, sector_num << BDRV_SECTOR_BITS,
>nb_sectors << BDRV_SECTOR_BITS, 
> BDRV_TRACKED_DISCARD);
>  
> +ret = notifier_with_return_list_notify(&bs->before_write_notifiers, 
> &req);
> +if (ret < 0) {
> +goto out;
> +}
> +
>  max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
>  while (nb_sectors > 0) {
>  int ret;
> -- 
> 2.5.0
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH] doc: Fix mailing list address in tests/qemu-iotests/README

2016-06-16 Thread Kevin Wolf
Am 16.06.2016 um 09:53 hat Thomas Huth geschrieben:
> The address of the mailing list is qemu-de...@nongnu.org
> instead of qemu-de...@savannah.nongnu.org. And while we're
> at it, also mention the qemu-block mailing list here.
> 
> Signed-off-by: Thomas Huth 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 0/6] block: bdrv_load/save_vmstate() cleanups

2016-06-16 Thread Stefan Hajnoczi
On Fri, Jun 10, 2016 at 06:05:16PM +0200, Kevin Wolf wrote:
> This series contains a few cleanups with respect to the vmstate I/O functions.
> Apart from making the interface more consistent (writes were already vectored,
> but not reads), this makes use of the new byte-based .bdrv_co_preadv/pwritev
> callbacks in qcow2 to get rid of a few hacks, including bs->zero_beyond_eof.
> 
> Kevin Wolf (6):
>   block: Introduce bdrv_preadv()
>   block: Make .bdrv_load_vmstate() vectored
>   block: Allow .bdrv_load/save_vmstate() to return 0/-errno
>   block: Make bdrv_load/save_vmstate coroutine_fns
>   qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
>   block: Remove bs->zero_beyond_eof
> 
>  block.c   |   2 -
>  block/io.c| 173 
> +++---
>  block/qcow2.c |  28 ++--
>  block/sheepdog.c  |  13 +++-
>  include/block/block.h |   2 +
>  include/block/block_int.h |  13 ++--
>  6 files changed, 143 insertions(+), 88 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 4/6] block: Make bdrv_load/save_vmstate coroutine_fns

2016-06-16 Thread Stefan Hajnoczi
On Fri, Jun 10, 2016 at 06:05:20PM +0200, Kevin Wolf wrote:
> +static int coroutine_fn
> +bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
> +   bool is_read)
> +{
> +BlockDriver *drv = bs->drv;
> +
> +if (!drv) {
> +return -ENOMEDIUM;
> +} else if (drv->bdrv_load_vmstate) {
> +return is_read ? drv->bdrv_load_vmstate(bs, qiov, pos)
> +   : drv->bdrv_save_vmstate(bs, qiov, pos);
> +} else if (bs->file) {
> +return bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
> +}
> +
> +return -ENOTSUP;
> +}
> +
> +static void bdrv_co_rw_vmstate_entry(void *opaque)

This should also be coroutine_fn.


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 0/6] block: bdrv_load/save_vmstate() cleanups

2016-06-16 Thread Kevin Wolf
Am 10.06.2016 um 18:05 hat Kevin Wolf geschrieben:
> This series contains a few cleanups with respect to the vmstate I/O functions.
> Apart from making the interface more consistent (writes were already vectored,
> but not reads), this makes use of the new byte-based .bdrv_co_preadv/pwritev
> callbacks in qcow2 to get rid of a few hacks, including bs->zero_beyond_eof.

Applied to the block branch.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v5 02/11] qapi: allow QmpInputVisitor to auto-cast types

2016-06-16 Thread Markus Armbruster
"Daniel P. Berrange"  writes:

> On Thu, Jun 09, 2016 at 04:03:50PM +0200, Markus Armbruster wrote:
>> "Daniel P. Berrange"  writes:
>> 
>> > Currently the QmpInputVisitor assumes that all scalar
>> > values are directly represented as their final types.
>> > ie it assumes an 'int' is using QInt, and a 'bool' is
>> > using QBool.
>> >
>> > This extends it so that QString is optionally permitted
>> > for any of the non-string scalar types. This behaviour
>> > is turned on by requesting the 'autocast' flag in the
>> > constructor.
>> >
>> > This makes it possible to use QmpInputVisitor with a
>> > QDict produced from QemuOpts, where everything is in
>> > string format.
>> 
>> We're still digging.
>> 
>> > Signed-off-by: Daniel P. Berrange 
>> > ---
>> >  docs/qapi-code-gen.txt |   2 +-
>> >  include/qapi/qmp-input-visitor.h   |   6 +-
>> >  qapi/opts-visitor.c|   1 +
>> >  qapi/qmp-input-visitor.c   |  89 ++--
>> >  qmp.c  |   2 +-
>> >  qom/qom-qobject.c  |   2 +-
>> >  replay/replay-input.c  |   2 +-
>> >  scripts/qapi-commands.py   |   2 +-
>> >  tests/check-qnull.c|   2 +-
>> >  tests/test-qmp-commands.c  |   2 +-
>> >  tests/test-qmp-input-strict.c  |   2 +-
>> >  tests/test-qmp-input-visitor.c | 115 
>> > -
>> >  tests/test-visitor-serialization.c |   2 +-
>> >  util/qemu-sockets.c|   2 +-
>> >  14 files changed, 201 insertions(+), 30 deletions(-)
>
>
>> > diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
>> > index 4cf1cf8..00e4b1a 100644
>> > --- a/qapi/opts-visitor.c
>> > +++ b/qapi/opts-visitor.c
>> > @@ -347,6 +347,7 @@ opts_type_bool(Visitor *v, const char *name, bool 
>> > *obj, Error **errp)
>> >  }
>> >  
>> >  if (opt->str) {
>> > +/* Keep these values in sync with same code in 
>> > qmp-input-visitor.c */
>> 
>> Also with parse_option_bool().  That's the canonical place, in fact.
>
> except parse_option_bool() doesn't allow "yes", "no", "y", "n" as valid
> values - it only permits "on" and "off" :-(  I guess I could make the
> parse_option_bool() method non-static, make it accept these missing values,
> and then call it from all places so we have guaranteed consistency.

Or factor out its parsing guts and put them into cutils.c.  Similar to
how qemu_strtol() & friends are (or should be) the common number
parsers.

But that's optional.  All I'm asking for this patch is an updated
comment.

>> 
>> >  if (strcmp(opt->str, "on") == 0 ||
>> >  strcmp(opt->str, "yes") == 0 ||
>> >  strcmp(opt->str, "y") == 0) {
>> > diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
>> > index aea90a1..92023b1 100644
>> > --- a/qapi/qmp-input-visitor.c
>> > +++ b/qapi/qmp-input-visitor.c
>> > @@ -20,6 +20,7 @@
>> >  #include "qemu-common.h"
>> >  #include "qapi/qmp/types.h"
>> >  #include "qapi/qmp/qerror.h"
>> > +#include "qemu/cutils.h"
>> >  
>> >  #define QIV_STACK_SIZE 1024
>> >  
>> > @@ -45,6 +46,7 @@ struct QmpInputVisitor
>> >  
>> >  /* True to reject parse in visit_end_struct() if unvisited keys 
>> > remain. */
>> >  bool strict;
>> > +bool autocast;
>> >  };
>> >  
>> >  static QmpInputVisitor *to_qiv(Visitor *v)
>> > @@ -254,15 +256,25 @@ static void qmp_input_type_int64(Visitor *v, const 
>> > char *name, int64_t *obj,
>> >   Error **errp)
>> >  {
>> >  QmpInputVisitor *qiv = to_qiv(v);
>> > -QInt *qint = qobject_to_qint(qmp_input_get_object(qiv, name, true));
>> > +QObject *qobj = qmp_input_get_object(qiv, name, true);
>> > +QInt *qint;
>> > +QString *qstr;
>> >  
>> > -if (!qint) {
>> > -error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : 
>> > "null",
>> > -   "integer");
>> > +qint = qobject_to_qint(qobj);
>> > +if (qint) {
>> > +*obj = qint_get_int(qint);
>> >  return;
>> >  }
>> >  
>> > -*obj = qint_get_int(qint);
>> > +qstr = qobject_to_qstring(qobj);
>> > +if (qstr && qstr->string && qiv->autocast) {
>> > +if (qemu_strtoll(qstr->string, NULL, 10, obj) == 0) {
>> 
>> In the commit message, you mentioned intended use: "with a QDict
>> produced from QemuOpts, where everything is in string format."  I figure
>> you mean opts with empty opts->list->desc[].  For those, we accept any
>> key and parse all values as strings.
>> 
>> The idea with such QemuOpts is to *delay* parsing until we know how to
>> parse.  The delay should be transparent to the user.  In particular,
>> values should be parsed the same whether the parsing is delayed or not.
>> 
>> The canonical QemuOpts value parser is qemu_opt_parse().  It parses
>> integers with parse_option_number().  That function parses like
>> stroull(qstr->string, ..., 0).  Two differences:
>> 
>> * stroll() vs. strtoull(): they differ

Re: [Qemu-block] [Qemu-devel] [PATCH v5 01/11] qdict: implement a qdict_crumple method for un-flattening a dict

2016-06-16 Thread Markus Armbruster
"Daniel P. Berrange"  writes:

> On Thu, Jun 09, 2016 at 03:20:47PM +0200, Markus Armbruster wrote:
>> I apologize for the lateness of this review.
>> 
>> "Daniel P. Berrange"  writes:
>> 
>> > The qdict_flatten() method will take a dict whose elements are
>> > further nested dicts/lists and flatten them by concatenating
>> > keys.
>> >
>> > The qdict_crumple() method aims to do the reverse, taking a flat
>> > qdict, and turning it into a set of nested dicts/lists. It will
>> > apply nesting based on the key name, with a '.' indicating a
>> > new level in the hierarchy. If the keys in the nested structure
>> > are all numeric, it will create a list, otherwise it will create
>> > a dict.
>> >
>> > If the keys are a mixture of numeric and non-numeric, or the
>> > numeric keys are not in strictly ascending order, an error will
>> > be reported.
>> >
>> > As an example, a flat dict containing
>> >
>> >  {
>> >'foo.0.bar': 'one',
>> >'foo.0.wizz': '1',
>> >'foo.1.bar': 'two',
>> >'foo.1.wizz': '2'
>> >  }
>> >
>> > will get turned into a dict with one element 'foo' whose
>> > value is a list. The list elements will each in turn be
>> > dicts.
>> >
>> >  {
>> >'foo': [
>> >  { 'bar': 'one', 'wizz': '1' },
>> >  { 'bar': 'two', 'wizz': '2' }
>> >],
>> >  }
>> >
>> > If the key is intended to contain a literal '.', then it must
>> > be escaped as '..'. ie a flat dict
>> >
>> >   {
>> >  'foo..bar': 'wizz',
>> >  'bar.foo..bar': 'eek',
>> >  'bar.hello': 'world'
>> >   }
>> >
>> > Will end up as
>> >
>> >   {
>> >  'foo.bar': 'wizz',
>> >  'bar': {
>> > 'foo.bar': 'eek',
>> > 'hello': 'world'
>> >  }
>> >   }
>> >
>> > The intent of this function is that it allows a set of QemuOpts
>> > to be turned into a nested data structure that mirrors the nesting
>> > used when the same object is defined over QMP.
>> >
>> > Signed-off-by: Daniel P. Berrange 
>> 
>
>> > +/**
>> > + * qdict_split_flat_key:
>> > + * @key: the key string to split
>> > + * @prefix: non-NULL pointer to hold extracted prefix
>> > + * @suffix: non-NULL pointer to hold extracted suffix
>> > + *
>> > + * Given a flattened key such as 'foo.0.bar', split it
>> > + * into two parts at the first '.' separator. Allows
>> > + * double dot ('..') to escape the normal separator.
>> > + *
>> > + * eg
>> > + *'foo.0.bar' -> prefix='foo' and suffix='0.bar'
>> > + *'foo..0.bar' -> prefix='foo.0' and suffix='bar'
>> > + *
>> > + * The '..' sequence will be unescaped in the returned
>> > + * 'prefix' string. The 'suffix' string will be left
>> > + * in escaped format, so it can be fed back into the
>> > + * qdict_split_flat_key() key as the input later.
>> 
>> Why is the suffix strdup'ed then?
>
> It doesn't need to be - i'll make it const.
>
>
>
>> > +}
>> > +
>> > +
>> > +/**
>> > + * qdict_list_size:
>> > + * @maybe_list: dict that may be only list elements
>> 
>> Huh?  How can a dictionary "be only list elements"?
>> 
>> Do you mean "the dictionary to test?"
>
> I'll say "dict to search for keys representing list elements."

Yes, that's better.

>> > + *
>> > + * Determine whether all keys in @maybe_list are
>> > + * valid list elements. If they are all valid,
>> > + * then this returns the number of elements. If
>> > + * they all look like non-numeric keys, then returns
>> > + * zero. If there is a mix of numeric and non-numeric
>> > + * keys, then an error is set as it is both a list
>> > + * and a dict at once.
>> 
>> This is well-defined only if empty @maybe_list is considered to have
>> dict nature, not list nature.  Else, return value zero could be the
>> length of the empty list or the special "has dict nature" value.
>> 
>> Please spell out behavior for empty @maybe_list.
>
> Yep, empty list will imply qdict nature and so return zero.
>
>> > + *
>> > + * Returns: number of list elements, 0 if a dict, -1 on error
>> 
>> Awkward function name.  qdict_list_size_if_list() would be clear.
>>
>> But I'd simply turn this into a predicate qdict_is_list(), and have the
>> caller use qdict_size() to get the number of elements.
>
> I thought that qdict_size() would cause another iteration, but
> I see now it just returns a cached size. So I'll switch to
> qidct_is_list().
>
>> > +static ssize_t qdict_list_size(QDict *maybe_list, Error **errp)
>> > +{
>> > +const QDictEntry *entry, *next;
>> > +ssize_t len = 0;
>> > +ssize_t max = -1;
>> > +int is_list = -1;
>> > +int64_t val;
>> > +
>> > +entry = qdict_first(maybe_list);
>> > +while (entry != NULL) {
>> > +next = qdict_next(maybe_list, entry);
>> 
>> Please keep the loop control in one place:
>> 
>>for (entry = qdict_first(maybe_list); entry; entry = 
>> qdict_next(entry)) {
>> 
>> I'd rename some variables for less verbiage:
>> 
>>for (ent = qdict_first(dict); ent; ent = qdict_next(ent)) {
>> 
>> Your loop control also works when the loop body deletes @entry from
>> @maybe_list.  Seei

Re: [Qemu-block] [PATCH 2/3] block: fix race in bdrv_co_discard with drive-mirror

2016-06-16 Thread Vladimir Sementsov-Ogievskiy

On 16.06.2016 09:58, Denis V. Lunev wrote:

Actually we must set dirty bitmap dirty after we have written all our
zeroes for correct processing in drive mirror code. In the other case
we can face not zeroes in this dirty area there in mirror_iteration.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
---
  block/io.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index b51f681..513bd99 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2265,7 +2265,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
  
  tracked_request_begin(&req, bs, NULL, sector_num << BDRV_SECTOR_BITS,

nb_sectors << BDRV_SECTOR_BITS, 
BDRV_TRACKED_DISCARD);
-bdrv_set_dirty(bs, sector_num, nb_sectors);
  
  max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);

  while (nb_sectors > 0) {
@@ -2314,6 +2313,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
  }
  ret = 0;
  out:
+bdrv_set_dirty(bs, req.offset << BDRV_SECTOR_BITS,

 s/<>/

+   req.bytes << BDRV_SECTOR_BITS);

and here

  tracked_request_end(&req);
  return ret;
  }



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 2/3] block: fix race in bdrv_co_discard with drive-mirror

2016-06-16 Thread Denis V. Lunev

On 06/16/2016 12:34 PM, Vladimir Sementsov-Ogievskiy wrote:

On 16.06.2016 09:58, Denis V. Lunev wrote:

Actually we must set dirty bitmap dirty after we have written all our
zeroes for correct processing in drive mirror code. In the other case
we can face not zeroes in this dirty area there in mirror_iteration.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
---
  block/io.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index b51f681..513bd99 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2265,7 +2265,6 @@ int coroutine_fn 
bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
tracked_request_begin(&req, bs, NULL, sector_num << 
BDRV_SECTOR_BITS,
nb_sectors << BDRV_SECTOR_BITS, 
BDRV_TRACKED_DISCARD);

-bdrv_set_dirty(bs, sector_num, nb_sectors);
max_discard = MIN_NON_ZERO(bs->bl.max_discard, 
BDRV_REQUEST_MAX_SECTORS);

  while (nb_sectors > 0) {
@@ -2314,6 +2313,8 @@ int coroutine_fn 
bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,

  }
  ret = 0;
  out:
+bdrv_set_dirty(bs, req.offset << BDRV_SECTOR_BITS,

 s/<>/

+   req.bytes << BDRV_SECTOR_BITS);

and here

  tracked_request_end(&req);
  return ret;
  }




very nice, thanks ;)



Re: [Qemu-block] [PATCH] qemu-img bench: Fix uninitialised writethrough mode

2016-06-16 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 11:33:25AM +0200, Kevin Wolf wrote:
> If no -t option is specified, bool writethrough stayed uninitialised.
> Initialise it as false, which makes cache=writeback the default cache
> mode.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-16 Thread Stefan Hajnoczi
On Wed, Jun 15, 2016 at 01:37:18PM +0300, Denis V. Lunev wrote:
> On 06/15/2016 12:19 PM, Stefan Hajnoczi wrote:
> > On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:
> > > On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> > > > We should not take into account zero blocks for delay calculations.
> > > > They are not read and thus IO throttling is not required. In the
> > > > other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
> > > > days.
> > > > 
> > > > Signed-off-by: Denis V. Lunev 
> > > > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > > > CC: Stefan Hajnoczi 
> > > > CC: Fam Zheng 
> > > > CC: Kevin Wolf 
> > > > CC: Max Reitz 
> > > > CC: Jeff Cody 
> > > > CC: Eric Blake 
> > > > ---
> > > >   block/mirror.c | 7 +--
> > > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > Seems reasonable, but I'll let others more familiar with throttling give
> > > the final say.
> > There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
> > that case we need to account for the bytes transferred.  I don't see
> > where the patch takes this into account.
> Interesting point.
> 
> I think we are charging for IO performed. Thus with the
> absence of the callback we should charge for io_sectors/2
> The write will be full scale, there is no reading.
> 
> Correct?

io_sectors currently only accounts for bytes written, not bytes read.

Therefore, I think we need:

/* Don't charge for efficient zero writes */
if (drv->bdrv_co_pwrite_zeroes) {
io_sectors = 0;
}

Stefan


signature.asc
Description: PGP signature


[Qemu-block] [PATCH] block: Fix snapshot=on with aio=native

2016-06-16 Thread Kevin Wolf
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.

Reproducer without this patch:

$ x86_64-softmmu/qemu-system-x86_64 -drive 
file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive 
file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.

Signed-off-by: Kevin Wolf 
---
 block.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block.c b/block.c
index b350794..8104225 100644
--- a/block.c
+++ b/block.c
@@ -684,6 +684,10 @@ static void bdrv_temp_snapshot_options(int *child_flags, 
QDict *child_options,
 /* For temporary files, unconditional cache=unsafe is fine */
 qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
 qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");
+
+/* aio=native doesn't work for cache.direct=off, so disable it for the
+ * temporary snapshot */
+*child_flags &= ~BDRV_O_NATIVE_AIO;
 }
 
 /*
-- 
1.8.3.1




Re: [Qemu-block] [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()

2016-06-16 Thread Paolo Bonzini


On 16/06/2016 07:25, Fam Zheng wrote:
>> > +assert(max_xfer_len);
>> > +stl_be_p(&r->buf[8], max_xfer_len);
>> > +/* Also take care of the opt xfer len. */
>> > +if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
>> > +stl_be_p(&r->buf[12], max_xfer_len);
> Paolo: should we take s->blocksize into account? I missed it when I wrote this
> piece of code.

Hmm, the maximum and optimal transfer length is in logical blocks, so yes.

There is one thing that isn't great: at least Linux reads the capacity
before the block limits page, but we cannot know for sure that everyone
does it.  I think it's a good idea to send a READ CAPACITY(10) at
startup, like we do in get_stream_blocksize.  We only need it for the
blocksize, not for the capacity, so it's not necessary to use READ
CAPACITY(16).

The snooping is still necessary for the max_lba (which is used by
scsi_disk_dma_command), so we might as well keep the assignment to
s->blocksize in there too.  But let's add a best effort READ
CAPACITY(10) at startup.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] blockdev: Add dynamic generation of module_block.h

2016-06-16 Thread Colin Lord
On 06/16/2016 12:59 AM, Fam Zheng wrote:
> On Wed, 06/15 14:40, Colin Lord wrote:
>> From: Marc Mari 
>>
>> To simplify the addition of new block modules, add a script that generates
>> include/qemu/module_block.h automatically from the modules' source code.
>>
>> This script assumes that the QEMU coding style rules are followed.
>>
>> Signed-off-by: Marc MarĂ­ 
>> Signed-off-by: Colin Lord 
>> ---
>>  .gitignore  |   1 +
>>  Makefile|   8 +++
>>  scripts/modules/module_block.py | 134 
>> 
>>  3 files changed, 143 insertions(+)
>>  create mode 100644 scripts/modules/module_block.py
>>
>> diff --git a/.gitignore b/.gitignore
>> index 38ee1c5..06aa064 100644
>> --- a/.gitignore
>> +++ b/.gitignore
>> @@ -110,3 +110,4 @@ tags
>>  TAGS
>>  docker-src.*
>>  *~
>> +/include/qemu/module_block.h
>> diff --git a/Makefile b/Makefile
>> index ed4032a..8f8b6a2 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -76,6 +76,8 @@ GENERATED_HEADERS += trace/generated-ust-provider.h
>>  GENERATED_SOURCES += trace/generated-ust.c
>>  endif
>>  
>> +GENERATED_HEADERS += include/qemu/module_block.h
>> +
>>  # Don't try to regenerate Makefile or configure
>>  # We don't generate any of them
>>  Makefile: ;
>> @@ -352,6 +354,12 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) 
>> libqemuutil.a libqemustub.a
>>  ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
>>  $(call LINK, $^)
>>  
>> +include/qemu/module_block.h: $(SRC_PATH)/scripts/modules/module_block.py 
>> config-host.mak
>> +$(call quiet-command,$(PYTHON) \
>> +$(SRC_PATH)/scripts/modules/module_block.py \
>> +$(SRC_PATH)/"./include/qemu/" $(addprefix $(SRC_PATH)/,$(patsubst 
>> %.mo,%.c,$(block-obj-m))), \
>> +"  GEN   $@")
>> +
>>  clean:
>>  # avoid old build problems by removing potentially incorrect old files
>>  rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
>> gen-op-arm.h
>> diff --git a/scripts/modules/module_block.py 
>> b/scripts/modules/module_block.py
>> new file mode 100644
>> index 000..005bc49
>> --- /dev/null
>> +++ b/scripts/modules/module_block.py
>> @@ -0,0 +1,134 @@
>> +#!/usr/bin/python
>> +#
>> +# Module information generator
>> +#
>> +# Copyright Red Hat, Inc. 2015
>> +#
>> +# Authors:
>> +#  Marc Mari 
> 
> Address hidden seems like a mistake during copy from web. :)
> 
> One more below..
> 

Yep, I didn't have the original emails (only the web archives) and I
didn't realize it wasn't supposed to look like that until I sent it out.
I'll fix it for the next version.

>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2.
>> +# See the COPYING file in the top-level directory.
>> +
>> +from __future__ import print_function
>> +import sys
>> +import os
>> +
>> +def get_string_struct(line):
>> +data = line.split()
>> +
>> +# data[0] -> struct element name
>> +# data[1] -> =
>> +# data[2] -> value
>> +
>> +return data[2].replace('"', '')[:-1]
>> +
>> +def add_module(fhader, library, format_name, protocol_name,
>> +probe, probe_device):
>> +lines = []
>> +lines.append('.library_name = "' + library + '",')
>> +if format_name != "":
>> +lines.append('.format_name = "' + format_name + '",')
>> +if protocol_name != "":
>> +lines.append('.protocol_name = "' + protocol_name + '",')
>> +if probe:
>> +lines.append('.has_probe = true,')
>> +if probe_device:
>> +lines.append('.has_probe_device = true,')
>> +
>> +text = '\n\t'.join(lines)
>> +fheader.write('\n\t{\n\t' + text + '\n\t},')
>> +
>> +def process_file(fheader, filename):
>> +# This parser assumes the coding style rules are being followed
>> +with open(filename, "r") as cfile:
>> +found_something = False
>> +found_start = False
>> +library, _ = os.path.splitext(os.path.basename(filename))
>> +for line in cfile:
>> +if found_start:
>> +line = line.replace('\n', '')
>> +if line.find(".format_name") != -1:
>> +format_name = get_string_struct(line)
>> +elif line.find(".protocol_name") != -1:
>> +protocol_name = get_string_struct(line)
>> +elif line.find(".bdrv_probe") != -1:
>> +probe = True
>> +elif line.find(".bdrv_probe_device") != -1:
>> +probe_device = True
>> +elif line == "};":
>> +add_module(fheader, library, format_name, protocol_name,
>> +probe, probe_device)
>> +found_start = False
>> +elif line.find("static BlockDriver") != -1:
>> +found_something = True
>> +found_start = True
>> +format_name = ""
>> +protocol_name = ""
>> +probe = Fa

Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-16 Thread Colin Lord
On 06/15/2016 06:50 PM, Paolo Bonzini wrote:
> 
> 
> On 15/06/2016 20:40, Colin Lord wrote:
>>
>> The only block drivers that can be converted into modules are the drivers
>> that don't perform any init operation except for registering themselves. This
>> is why libiscsi has been disabled as a module.
> 
> I don't think it has in this patch :) but you can also move the
> iscsi_opts registration to vl.c.
> 
> Paolo
> 

Yeah I think Stefan mentioned this point in one of the earlier threads
but he said to do it in a separate commit, which I took to mean I
shouldn't include it here. Should I add it as a third part to this patch
series or leave it for a completely separate commit?

Colin



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-16 Thread Paolo Bonzini


On 16/06/2016 16:00, Colin Lord wrote:
>> >> The only block drivers that can be converted into modules are the drivers
>> >> that don't perform any init operation except for registering themselves. 
>> >> This
>> >> is why libiscsi has been disabled as a module.
> > 
> > I don't think it has in this patch :) but you can also move the
> > iscsi_opts registration to vl.c.
> 
> Yeah I think Stefan mentioned this point in one of the earlier threads
> but he said to do it in a separate commit, which I took to mean I
> shouldn't include it here. Should I add it as a third part to this patch
> series or leave it for a completely separate commit?

The patches in the series are left separate when including them in QEMU.
 Therefore, a separate patch *is* (or will become :)) a separate commit.

Therefore, putting the change before this patch, or alternatively as the
first patch in the series, will be fine.

Thanks,

Paolo



[Qemu-block] [PULL 01/39] qcow2: Work with bytes in qcow2_get_cluster_offset()

2016-06-16 Thread Kevin Wolf
This patch changes the units that qcow2_get_cluster_offset() uses
internally, without touching the interface just yet. This will be done
in another patch.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c | 44 +++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b04bfaf..a59fb34 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -482,29 +482,28 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 unsigned int l2_index;
 uint64_t l1_index, l2_offset, *l2_table;
 int l1_bits, c;
-unsigned int index_in_cluster, nb_clusters;
-uint64_t nb_available, nb_needed;
+unsigned int offset_in_cluster, nb_clusters;
+uint64_t bytes_available, bytes_needed;
 int ret;
+unsigned int bytes;
 
-index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
-nb_needed = *num + index_in_cluster;
+assert(*num <= BDRV_REQUEST_MAX_SECTORS);
+bytes = *num * BDRV_SECTOR_SIZE;
 
-l1_bits = s->l2_bits + s->cluster_bits;
-
-/* compute how many bytes there are between the offset and
- * the end of the l1 entry
- */
+offset_in_cluster = offset_into_cluster(s, offset);
+bytes_needed = (uint64_t) bytes + offset_in_cluster;
 
-nb_available = (1ULL << l1_bits) - (offset & ((1ULL << l1_bits) - 1));
-
-/* compute the number of available sectors */
+l1_bits = s->l2_bits + s->cluster_bits;
 
-nb_available = (nb_available >> 9) + index_in_cluster;
+/* compute how many bytes there are between the start of the cluster
+ * containing offset and the end of the l1 entry */
+bytes_available = (1ULL << l1_bits) - (offset & ((1ULL << l1_bits) - 1))
++ offset_in_cluster;
 
-if (nb_needed > nb_available) {
-nb_needed = nb_available;
+if (bytes_needed > bytes_available) {
+bytes_needed = bytes_available;
 }
-assert(nb_needed <= INT_MAX);
+assert(bytes_needed <= INT_MAX);
 
 *cluster_offset = 0;
 
@@ -542,7 +541,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 *cluster_offset = be64_to_cpu(l2_table[l2_index]);
 
 /* nb_needed <= INT_MAX, thus nb_clusters <= INT_MAX, too */
-nb_clusters = size_to_clusters(s, nb_needed << 9);
+nb_clusters = size_to_clusters(s, bytes_needed);
 
 ret = qcow2_get_cluster_type(*cluster_offset);
 switch (ret) {
@@ -589,13 +588,16 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 
 qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
 
-nb_available = (c * s->cluster_sectors);
+bytes_available = (c * s->cluster_size);
 
 out:
-if (nb_available > nb_needed)
-nb_available = nb_needed;
+if (bytes_available > bytes_needed) {
+bytes_available = bytes_needed;
+}
 
-*num = nb_available - index_in_cluster;
+bytes = bytes_available - offset_in_cluster;
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+*num = bytes >> BDRV_SECTOR_BITS;
 
 return ret;
 
-- 
1.8.3.1




[Qemu-block] [PULL 04/39] qcow2: Use bytes instead of sectors for QCowL2Meta

2016-06-16 Thread Kevin Wolf
In preparation for implementing .bdrv_co_pwritev in qcow2.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c | 32 
 block/qcow2.h | 13 +++--
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b7671fc..b24230b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -739,13 +739,12 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m, Qcow2COWRegion *r)
 BDRVQcow2State *s = bs->opaque;
 int ret;
 
-if (r->nb_sectors == 0) {
+if (r->nb_bytes == 0) {
 return 0;
 }
 
 qemu_co_mutex_unlock(&s->lock);
-ret = do_perform_cow(bs, m->offset, m->alloc_offset,
- r->offset, r->nb_sectors * BDRV_SECTOR_SIZE);
+ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, 
r->nb_bytes);
 qemu_co_mutex_lock(&s->lock);
 
 if (ret < 0) {
@@ -1196,25 +1195,20 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 /*
  * Save info needed for meta data update.
  *
- * requested_sectors: Number of sectors from the start of the first
+ * requested_bytes: Number of bytes from the start of the first
  * newly allocated cluster to the end of the (possibly shortened
  * before) write request.
  *
- * avail_sectors: Number of sectors from the start of the first
+ * avail_bytes: Number of bytes from the start of the first
  * newly allocated to the end of the last newly allocated cluster.
  *
- * nb_sectors: The number of sectors from the start of the first
+ * nb_bytes: The number of bytes from the start of the first
  * newly allocated cluster to the end of the area that the write
  * request actually writes to (excluding COW at the end)
  */
-int requested_sectors =
-(*bytes + offset_into_cluster(s, guest_offset))
->> BDRV_SECTOR_BITS;
-int avail_sectors = nb_clusters
-<< (s->cluster_bits - BDRV_SECTOR_BITS);
-int alloc_n_start = offset_into_cluster(s, guest_offset)
->> BDRV_SECTOR_BITS;
-int nb_sectors = MIN(requested_sectors, avail_sectors);
+uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
+int avail_bytes = MIN(INT_MAX, nb_clusters << s->cluster_bits);
+int nb_bytes = MIN(requested_bytes, avail_bytes);
 QCowL2Meta *old_m = *m;
 
 *m = g_malloc0(sizeof(**m));
@@ -1225,23 +1219,21 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 .alloc_offset   = alloc_cluster_offset,
 .offset = start_of_cluster(s, guest_offset),
 .nb_clusters= nb_clusters,
-.nb_available   = nb_sectors,
 
 .cow_start = {
 .offset = 0,
-.nb_sectors = alloc_n_start,
+.nb_bytes   = offset_into_cluster(s, guest_offset),
 },
 .cow_end = {
-.offset = nb_sectors * BDRV_SECTOR_SIZE,
-.nb_sectors = avail_sectors - nb_sectors,
+.offset = nb_bytes,
+.nb_bytes   = avail_bytes - nb_bytes,
 },
 };
 qemu_co_queue_init(&(*m)->dependent_requests);
 QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
-*bytes = MIN(*bytes, (nb_sectors * BDRV_SECTOR_SIZE)
- - offset_into_cluster(s, guest_offset));
+*bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
 return 1;
diff --git a/block/qcow2.h b/block/qcow2.h
index e2c42d5..175b0c1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -302,8 +302,8 @@ typedef struct Qcow2COWRegion {
  */
 uint64_toffset;
 
-/** Number of sectors to copy */
-int nb_sectors;
+/** Number of bytes to copy */
+int nb_bytes;
 } Qcow2COWRegion;
 
 /**
@@ -318,12 +318,6 @@ typedef struct QCowL2Meta
 /** Host offset of the first newly allocated cluster */
 uint64_t alloc_offset;
 
-/**
- * Number of sectors from the start of the first allocated cluster to
- * the end of the (possibly shortened) request
- */
-int nb_available;
-
 /** Number of newly allocated clusters */
 int nb_clusters;
 
@@ -471,8 +465,7 @@ static inline uint64_t l2meta_cow_start(QCowL2Meta *m)
 
 static inline uint64_t l2meta_cow_end(QCowL2Meta *m)
 {
-return m->offset + m->cow_end.offset
-+ (m->cow_end.nb_sectors << BDRV_SECTOR_BITS);
+return m->offset + m->cow_end.offset + m->cow_end.nb_bytes;
 }
 
 static inline uint64_t refcount_diff(uint64_t r1, uint64_t r2)
-- 
1.8.3.1




[Qemu-block] [PULL 00/39] Block layer patches

2016-06-16 Thread Kevin Wolf
The following changes since commit a66370b08d53837eb233cad090b3c2638084cc44:

  Merge remote-tracking branch 
'remotes/amit-migration/tags/migration-for-2.7-4' into staging (2016-06-16 
10:53:33 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 60251f4d3ecfc705c137ff505aaf7c46f31cb91b:

  Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' 
into queue-block (2016-06-16 15:22:18 +0200)



Block layer patches


Alberto Garcia (4):
  block: use the block job list in bdrv_drain_all()
  block: use the block job list in qmp_query_block_jobs()
  block: Prevent sleeping jobs from resuming if they have been paused
  block: Create the commit block job before reopening any image

Colin Lord (1):
  blockdev: clarify error on attempt to open locked tray

CĂ©dric Le Goater (1):
  m25p80: fix test on blk_pread() return value

Daniel P. Berrange (1):
  block: drop support for using qcow[2] encryption with system emulators

Eric Blake (2):
  block: Avoid bogus flags during mirroring
  block: Assert that flags are in range

Fam Zheng (1):
  iotests: 095: Clean up QEMU before showing image info

Kevin Wolf (21):
  qcow2: Work with bytes in qcow2_get_cluster_offset()
  qcow2: Implement .bdrv_co_preadv()
  qcow2: Make copy_sectors() byte based
  qcow2: Use bytes instead of sectors for QCowL2Meta
  qcow2: Implement .bdrv_co_pwritev()
  qemu-img bench: Fix uninitialised writethrough mode
  block: Byte-based bdrv_co_do_copy_on_readv()
  block: Prepare bdrv_aligned_preadv() for byte-aligned requests
  block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
  raw-posix: Switch to bdrv_co_* interfaces
  raw-posix: Implement .bdrv_co_preadv/pwritev
  block: Don't enforce 512 byte minimum alignment
  linux-aio: Cancel BH if not needed
  block: Introduce bdrv_preadv()
  block: Make .bdrv_load_vmstate() vectored
  block: Allow .bdrv_load/save_vmstate() to return 0/-errno
  block: Make bdrv_load/save_vmstate coroutine_fns
  qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
  block: Remove bs->zero_beyond_eof
  block: Fix snapshot=on with aio=native
  Merge remote-tracking branch 
'mreitz/tags/pull-block-for-kevin-2016-06-16' into queue-block

Max Reitz (5):
  block: Allow replacement of a BDS by its overlay
  block/mirror: Fix target backing BDS
  block/null: Implement bdrv_refresh_filename()
  iotests: Add test for post-mirror backing chains
  iotests: Add test for oVirt-like storage migration

Thomas Huth (1):
  doc: Fix mailing list address in tests/qemu-iotests/README

Vikhyat Umrao (1):
  rbd:change error_setg() to error_setg_errno()

Vladimir Sementsov-Ogievskiy (2):
  hmp: acquire aio_context in hmp_qemu_io
  hbitmap: add 'pos < size' asserts

 block.c|  32 +++--
 block/commit.c |  11 +-
 block/io.c | 306 +
 block/linux-aio.c  |  88 +
 block/mirror.c |  55 +---
 block/null.c   |  20 +++
 block/qcow.c   |  14 ++-
 block/qcow2-cluster.c  | 147 ++
 block/qcow2.c  | 239 +--
 block/qcow2.h  |  18 +--
 block/raw-aio.h|   3 +
 block/raw-posix.c  |  62 +
 block/rbd.c|  38 +++---
 block/sheepdog.c   |  13 +-
 blockdev.c |  42 ---
 blockjob.c |   6 +-
 hmp.c  |   5 +
 hw/block/m25p80.c  |   2 +-
 include/block/block.h  |  15 ++-
 include/block/block_int.h  |  31 +++--
 qemu-img.c |   2 +-
 tests/qemu-iotests/087.out |  12 +-
 tests/qemu-iotests/095 |   2 +
 tests/qemu-iotests/155 | 261 ++
 tests/qemu-iotests/155.out |   5 +
 tests/qemu-iotests/156 | 174 ++
 tests/qemu-iotests/156.out |  48 +++
 tests/qemu-iotests/README  |   3 +-
 tests/qemu-iotests/group   |   2 +
 trace-events   |   8 +-
 util/hbitmap.c |   3 +
 31 files changed, 1183 insertions(+), 484 deletions(-)
 create mode 100755 tests/qemu-iotests/155
 create mode 100644 tests/qemu-iotests/155.out
 create mode 100755 tests/qemu-iotests/156
 create mode 100644 tests/qemu-iotests/156.out



[Qemu-block] [PULL 14/39] block: Prepare bdrv_aligned_preadv() for byte-aligned requests

2016-06-16 Thread Kevin Wolf
This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
requests. Note that this doesn't mean that such requests are actually
made. The caller still ensures that all requests are aligned to at least
512 bytes.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/block/io.c b/block/io.c
index b6a2c80..e75bce2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -963,11 +963,9 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 {
 int ret;
 
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
-
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(is_power_of_2(align));
+assert((offset & (align - 1)) == 0);
+assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -987,9 +985,12 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 if (flags & BDRV_REQ_COPY_ON_READ) {
+int64_t start_sector = offset >> BDRV_SECTOR_BITS;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+unsigned int nb_sectors = end_sector - start_sector;
 int pnum;
 
-ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);
+ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
 if (ret < 0) {
 goto out;
 }
@@ -1005,28 +1006,24 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else {
 /* Read zeros after EOF */
-int64_t total_sectors, max_nb_sectors;
+int64_t total_bytes, max_bytes;
 
-total_sectors = bdrv_nb_sectors(bs);
-if (total_sectors < 0) {
-ret = total_sectors;
+total_bytes = bdrv_getlength(bs);
+if (total_bytes < 0) {
+ret = total_bytes;
 goto out;
 }
 
-max_nb_sectors = ROUND_UP(MAX(0, total_sectors - sector_num),
-  align >> BDRV_SECTOR_BITS);
-if (nb_sectors < max_nb_sectors) {
+max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
+if (bytes < max_bytes) {
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
-} else if (max_nb_sectors > 0) {
+} else if (max_bytes > 0) {
 QEMUIOVector local_qiov;
 
 qemu_iovec_init(&local_qiov, qiov->niov);
-qemu_iovec_concat(&local_qiov, qiov, 0,
-  max_nb_sectors * BDRV_SECTOR_SIZE);
+qemu_iovec_concat(&local_qiov, qiov, 0, max_bytes);
 
-ret = bdrv_driver_preadv(bs, offset,
- max_nb_sectors * BDRV_SECTOR_SIZE,
- &local_qiov, 0);
+ret = bdrv_driver_preadv(bs, offset, max_bytes, &local_qiov, 0);
 
 qemu_iovec_destroy(&local_qiov);
 } else {
@@ -1034,11 +1031,10 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 /* Reading beyond end of file is supposed to produce zeroes */
-if (ret == 0 && total_sectors < sector_num + nb_sectors) {
-uint64_t offset = MAX(0, total_sectors - sector_num);
-uint64_t bytes = (sector_num + nb_sectors - offset) *
-  BDRV_SECTOR_SIZE;
-qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
+if (ret == 0 && total_bytes < offset + bytes) {
+uint64_t zero_offset = MAX(0, total_bytes - offset);
+uint64_t zero_bytes = offset + bytes - zero_offset;
+qemu_iovec_memset(qiov, zero_offset, 0, zero_bytes);
 }
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 06/39] blockdev: clarify error on attempt to open locked tray

2016-06-16 Thread Kevin Wolf
From: Colin Lord 

When opening a device with a locked tray, gives an error explaining the
device tray is locked and that the user should wait and try again. This
is less confusing than the previous error, which simply stated that the
tray was locked.

Signed-off-by: Colin Lord 
Signed-off-by: Kevin Wolf 
---
 blockdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7fd515a..11177b4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2544,6 +2544,7 @@ void qmp_blockdev_change_medium(const char *device, const 
char *filename,
 BlockBackend *blk;
 BlockDriverState *medium_bs = NULL;
 int bdrv_flags;
+int rc;
 QDict *options = NULL;
 Error *err = NULL;
 
@@ -2598,11 +2599,13 @@ void qmp_blockdev_change_medium(const char *device, 
const char *filename,
 goto fail;
 }
 
-qmp_blockdev_open_tray(device, false, false, &err);
-if (err) {
+rc = do_open_tray(device, false, &err);
+if (rc && rc != -ENOSYS) {
 error_propagate(errp, err);
 goto fail;
 }
+error_free(err);
+err = NULL;
 
 qmp_x_blockdev_remove_medium(device, &err);
 if (err) {
-- 
1.8.3.1




[Qemu-block] [PULL 12/39] block: drop support for using qcow[2] encryption with system emulators

2016-06-16 Thread Kevin Wolf
From: "Daniel P. Berrange" 

Back in the 2.3.0 release we declared qcow[2] encryption as
deprecated, warning people that it would be removed in a future
release.

  commit a1f688f4152e65260b94f37543521ceff8bfebe4
  Author: Markus Armbruster 
  Date:   Fri Mar 13 21:09:40 2015 +0100

block: Deprecate QCOW/QCOW2 encryption

The code still exists today, but by a (happy?) accident we entirely
broke the ability to use qcow[2] encryption in the system emulators
in the 2.4.0 release due to

  commit 8336aafae1451d54c81dd2b187b45f7c45d2428e
  Author: Daniel P. Berrange 
  Date:   Tue May 12 17:09:18 2015 +0100

qcow2/qcow: protect against uninitialized encryption key

This commit was designed to prevent future coding bugs which
might cause QEMU to read/write data on an encrypted block
device in plain text mode before a decryption key is set.

It turns out this preventative measure was a little too good,
because we already had a long standing bug where QEMU read
encrypted data in plain text mode during system emulator
startup, in order to guess disk geometry:

  Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)):
  #0  0x7fffe90b1a28 in raise () at /lib64/libc.so.6
  #1  0x7fffe90b362a in abort () at /lib64/libc.so.6
  #2  0x7fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6
  #3  0x7fffe90aa2d2 in  () at /lib64/libc.so.6
  #4  0x5587ae19 in qcow2_co_readv (bs=0x562accb0, sector_num=0, 
remaining_sectors=1, qiov=0x7fffd260) at block/qcow2.c:1229
  #5  0x5589b60d in bdrv_aligned_preadv (bs=bs@entry=0x562accb0, 
req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, 
align=align@entry=512, qiov=qiov@entry=0x7fffd260, flags=0) at 
block/io.c:908
  #6  0x5589b8bc in bdrv_co_do_preadv (bs=0x562accb0, offset=0, 
bytes=512, qiov=0x7fffd260, flags=) at block/io.c:999
  #7  0x5589c375 in bdrv_rw_co_entry (opaque=0x7fffd210) at 
block/io.c:544
  #8  0x5586933b in coroutine_thread (opaque=0x57876310) at 
coroutine-gthread.c:134
  #9  0x764e1835 in g_thread_proxy (data=0x562b5590) at 
gthread.c:778
  #10 0x76bb760a in start_thread () at /lib64/libpthread.so.0
  #11 0x7fffe917f59d in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x77ecab40 (LWP 30343)):
  #0  0x7fffe91797a9 in syscall () at /lib64/libc.so.6
  #1  0x764ff87f in g_cond_wait (cond=cond@entry=0x55e085f0 
, mutex=mutex@entry=0x55e08600 ) at 
gthread-posix.c:1397
  #2  0x558692c3 in qemu_coroutine_switch (co=) at 
coroutine-gthread.c:117
  #3  0x558692c3 in qemu_coroutine_switch (from_=0x562b5e30, 
to_=to_@entry=0x57876310, action=action@entry=COROUTINE_ENTER) at 
coroutine-gthread.c:175
  #4  0x55868a90 in qemu_coroutine_enter (co=0x57876310, 
opaque=0x0) at qemu-coroutine.c:116
  #5  0x55859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) 
at thread-pool.c:187
  #6  0x55859514 in aio_bh_poll (ctx=ctx@entry=0x562953b0) at 
async.c:85
  #7  0x55864d10 in aio_dispatch (ctx=ctx@entry=0x562953b0) at 
aio-posix.c:135
  #8  0x55864f75 in aio_poll (ctx=ctx@entry=0x562953b0, 
blocking=blocking@entry=true) at aio-posix.c:291
  #9  0x5589c40d in bdrv_prwv_co (bs=bs@entry=0x562accb0, 
offset=offset@entry=0, qiov=qiov@entry=0x7fffd260, 
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591
  #10 0x5589c503 in bdrv_rw_co (bs=bs@entry=0x562accb0, 
sector_num=sector_num@entry=0, buf=buf@entry=0x7fffd2e0 "\321,", 
nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, 
flags=flags@entry=(unknown: 0)) at block/io.c:614
  #11 0x5589c562 in bdrv_read_unthrottled (nb_sectors=21845, 
buf=0x7fffd2e0 "\321,", sector_num=0, bs=0x562accb0) at block/io.c:622
  #12 0x5589c562 in bdrv_read_unthrottled (bs=0x562accb0, 
sector_num=sector_num@entry=0, buf=buf@entry=0x7fffd2e0 "\321,", 
nb_sectors=nb_sectors@entry=21845) at block/io.c:634
nb_sectors@entry=1) at block/block-backend.c:504
  #14 0x55752e9f in guess_disk_lchs (blk=blk@entry=0x562a5290, 
pcylinders=pcylinders@entry=0x7fffd52c, pheads=pheads@entry=0x7fffd530, 
psectors=psectors@entry=0x7fffd534) at hw/block/hd-geometry.c:68
  #15 0x55752ff7 in hd_geometry_guess (blk=0x562a5290, 
pcyls=pcyls@entry=0x57875d1c, pheads=pheads@entry=0x57875d20, 
psecs=psecs@entry=0x57875d24, ptrans=ptrans@entry=0x57875d28) at 
hw/block/hd-geometry.c:133
  #16 0x55752b87 in blkconf_geometry (conf=conf@entry=0x57875d00, 
ptrans=ptrans@entry=0x57875d28, cyls_max=cyls_max@entry=65536, 
heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, 
errp=errp@entry=0x7fffd5e0) at hw/block/block.c:71
  #17 0x55799bc4 in ide_dev_initfn (dev=0x57875c80, kind=IDE_HD) at 
hw/ide/qdev.c:174
  #18 0x55768394 in device_realize 

[Qemu-block] [PULL 03/39] qcow2: Make copy_sectors() byte based

2016-06-16 Thread Kevin Wolf
This will allow copy on write operations where the overwritten part of
the cluster is not aligned to sector boundaries.

Also rename the function because it has nothing to do with sectors any
more.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c | 55 +--
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 277764f..b7671fc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -390,22 +390,18 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 return 0;
 }
 
-static int coroutine_fn copy_sectors(BlockDriverState *bs,
- uint64_t start_sect,
- uint64_t cluster_offset,
- int n_start, int n_end)
+static int coroutine_fn do_perform_cow(BlockDriverState *bs,
+   uint64_t src_cluster_offset,
+   uint64_t cluster_offset,
+   int offset_in_cluster,
+   int bytes)
 {
 BDRVQcow2State *s = bs->opaque;
 QEMUIOVector qiov;
 struct iovec iov;
-int n, ret;
-
-n = n_end - n_start;
-if (n <= 0) {
-return 0;
-}
+int ret;
 
-iov.iov_len = n * BDRV_SECTOR_SIZE;
+iov.iov_len = bytes;
 iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
 if (iov.iov_base == NULL) {
 return -ENOMEM;
@@ -424,18 +420,21 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
  * interface.  This avoids double I/O throttling and request tracking,
  * which can lead to deadlock when block layer copy-on-read is enabled.
  */
-ret = bs->drv->bdrv_co_preadv(bs, (start_sect + n_start) * 
BDRV_SECTOR_SIZE,
-  n * BDRV_SECTOR_SIZE, &qiov, 0);
+ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
+  bytes, &qiov, 0);
 if (ret < 0) {
 goto out;
 }
 
 if (bs->encrypted) {
 Error *err = NULL;
+int64_t sector = (cluster_offset + offset_in_cluster)
+ >> BDRV_SECTOR_BITS;
 assert(s->cipher);
-if (qcow2_encrypt_sectors(s, start_sect + n_start,
-  iov.iov_base, iov.iov_base, n,
-  true, &err) < 0) {
+assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
+assert((bytes & ~BDRV_SECTOR_MASK) == 0);
+if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+  bytes >> BDRV_SECTOR_BITS, true, &err) < 0) {
 ret = -EIO;
 error_free(err);
 goto out;
@@ -443,14 +442,14 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
 }
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
-cluster_offset + n_start * BDRV_SECTOR_SIZE, n * BDRV_SECTOR_SIZE);
+cluster_offset + offset_in_cluster, bytes);
 if (ret < 0) {
 goto out;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-ret = bdrv_co_writev(bs->file->bs, (cluster_offset >> 9) + n_start, n,
- &qiov);
+ret = bdrv_co_pwritev(bs->file->bs, cluster_offset + offset_in_cluster,
+  bytes, &qiov, 0);
 if (ret < 0) {
 goto out;
 }
@@ -745,9 +744,8 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, 
Qcow2COWRegion *r)
 }
 
 qemu_co_mutex_unlock(&s->lock);
-ret = copy_sectors(bs, m->offset / BDRV_SECTOR_SIZE, m->alloc_offset,
-   r->offset / BDRV_SECTOR_SIZE,
-   r->offset / BDRV_SECTOR_SIZE + r->nb_sectors);
+ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+ r->offset, r->nb_sectors * BDRV_SECTOR_SIZE);
 qemu_co_mutex_lock(&s->lock);
 
 if (ret < 0) {
@@ -809,13 +807,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 assert(l2_index + m->nb_clusters <= s->l2_size);
 for (i = 0; i < m->nb_clusters; i++) {
 /* if two concurrent writes happen to the same unallocated cluster
-* each write allocates separate cluster and writes data concurrently.
-* The first one to complete updates l2 table with pointer to its
-* cluster the second one has to do RMW (which is done above by
-* copy_sectors()), update l2 table with its cluster pointer and free
-* old cluster. This is what this loop does */
-if(l2_table[l2_index + i] != 0)
+ * each write allocates separate cluster and writes data concurrently.
+ * The first one to complete updates l2 table with pointer to its
+ * cluster the second one has to do RMW (which is done above by
+ * perform_cow()), update l2 table with its cluster 

[Qemu-block] [PULL 08/39] m25p80: fix test on blk_pread() return value

2016-06-16 Thread Kevin Wolf
From: CĂ©dric Le Goater 

commit 243e6f69c129 ("m25p80: Switch to byte-based block access")
replaced blk_read() calls with blk_pread() but return values are
different.

Signed-off-by: CĂ©dric Le Goater 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/block/m25p80.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 4c856f5..51d8596 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -900,7 +900,7 @@ static int m25p80_init(SSISlave *ss)
 s->storage = blk_blockalign(s->blk, s->size);
 
 /* FIXME: Move to late init */
-if (blk_pread(s->blk, 0, s->storage, s->size)) {
+if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
 fprintf(stderr, "Failed to initialize SPI flash!\n");
 return 1;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 05/39] qcow2: Implement .bdrv_co_pwritev()

2016-06-16 Thread Kevin Wolf
This changes qcow2 to implement the byte-based .bdrv_co_pwritev
interface rather than the sector-based old one.

As preallocation uses the same allocation function as normal writes, and
the interface of that function needs to be changed, it is converted in
the same patch.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c | 13 
 block/qcow2.c | 89 ---
 block/qcow2.h |  3 +-
 trace-events  |  6 ++--
 4 files changed, 52 insertions(+), 59 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b24230b..893ddf6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1265,7 +1265,8 @@ fail:
  * Return 0 on success and -errno in error cases
  */
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
-int *num, uint64_t *host_offset, QCowL2Meta **m)
+   unsigned int *bytes, uint64_t *host_offset,
+   QCowL2Meta **m)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t start, remaining;
@@ -1273,13 +1274,11 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 uint64_t cur_bytes;
 int ret;
 
-trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset, *num);
-
-assert((offset & ~BDRV_SECTOR_MASK) == 0);
+trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset, *bytes);
 
 again:
 start = offset;
-remaining = (uint64_t)*num << BDRV_SECTOR_BITS;
+remaining = *bytes;
 cluster_offset = 0;
 *host_offset = 0;
 cur_bytes = 0;
@@ -1365,8 +1364,8 @@ again:
 }
 }
 
-*num -= remaining >> BDRV_SECTOR_BITS;
-assert(*num > 0);
+*bytes -= remaining;
+assert(*bytes > 0);
 assert(*host_offset != 0);
 
 return 0;
diff --git a/block/qcow2.c b/block/qcow2.c
index 545dea0..cb55e2d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1544,23 +1544,21 @@ fail:
 return ret;
 }
 
-static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
-   int64_t sector_num,
-   int remaining_sectors,
-   QEMUIOVector *qiov)
+static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, QEMUIOVector *qiov,
+ int flags)
 {
 BDRVQcow2State *s = bs->opaque;
-int index_in_cluster;
+int offset_in_cluster;
 int ret;
-int cur_nr_sectors; /* number of sectors in current iteration */
+unsigned int cur_bytes; /* number of sectors in current iteration */
 uint64_t cluster_offset;
 QEMUIOVector hd_qiov;
 uint64_t bytes_done = 0;
 uint8_t *cluster_data = NULL;
 QCowL2Meta *l2meta = NULL;
 
-trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
- remaining_sectors);
+trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
 qemu_iovec_init(&hd_qiov, qiov->niov);
 
@@ -1568,22 +1566,21 @@ static coroutine_fn int 
qcow2_co_writev(BlockDriverState *bs,
 
 qemu_co_mutex_lock(&s->lock);
 
-while (remaining_sectors != 0) {
+while (bytes != 0) {
 
 l2meta = NULL;
 
 trace_qcow2_writev_start_part(qemu_coroutine_self());
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cur_nr_sectors = remaining_sectors;
-if (bs->encrypted &&
-cur_nr_sectors >
-QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors - index_in_cluster) {
-cur_nr_sectors =
-QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors - 
index_in_cluster;
+offset_in_cluster = offset_into_cluster(s, offset);
+cur_bytes = MIN(bytes, INT_MAX);
+if (bs->encrypted) {
+cur_bytes = MIN(cur_bytes,
+QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
+- offset_in_cluster);
 }
 
-ret = qcow2_alloc_cluster_offset(bs, sector_num << 9,
-&cur_nr_sectors, &cluster_offset, &l2meta);
+ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
+ &cluster_offset, &l2meta);
 if (ret < 0) {
 goto fail;
 }
@@ -1591,8 +1588,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState 
*bs,
 assert((cluster_offset & 511) == 0);
 
 qemu_iovec_reset(&hd_qiov);
-qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
-cur_nr_sectors * 512);
+qemu_iovec_concat(&hd_qiov, qiov, bytes_done, cur_bytes);
 
 if (bs->encrypted) {
 Error *err = NULL;
@@ -1611,8 +1607,9 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState 
*bs,
QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
 qemu_iovec_to_buf(&hd_qiov, 0, cluster_data, hd_qiov.size);
 
-if (qcow2_encr

[Qemu-block] [PULL 02/39] qcow2: Implement .bdrv_co_preadv()

2016-06-16 Thread Kevin Wolf
Reading from qcow2 images is now byte granularity.

Most of the affected code in qcow2 actually gets simpler with this
change. The only exception is encryption, which is fixed on 512 bytes
blocks; in order to keep this working, bs->request_alignment is set for
encrypted images.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c |  27 ++---
 block/qcow2.c | 108 +++---
 block/qcow2.h |   2 +-
 3 files changed, 72 insertions(+), 65 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a59fb34..277764f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -424,7 +424,8 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
  * interface.  This avoids double I/O throttling and request tracking,
  * which can lead to deadlock when block layer copy-on-read is enabled.
  */
-ret = bs->drv->bdrv_co_readv(bs, start_sect + n_start, n, &qiov);
+ret = bs->drv->bdrv_co_preadv(bs, (start_sect + n_start) * 
BDRV_SECTOR_SIZE,
+  n * BDRV_SECTOR_SIZE, &qiov, 0);
 if (ret < 0) {
 goto out;
 }
@@ -464,19 +465,21 @@ out:
 /*
  * get_cluster_offset
  *
- * For a given offset of the disk image, find the cluster offset in
- * qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk, find the cluster type and offset in
+ * the qcow2 file. The offset is stored in *cluster_offset.
  *
- * on entry, *num is the number of contiguous sectors we'd like to
- * access following offset.
+ * On entry, *bytes is the maximum number of contiguous bytes starting at
+ * offset that we are interested in.
  *
- * on exit, *num is the number of contiguous sectors we can read.
+ * On exit, *bytes is the number of bytes starting at offset that have the same
+ * cluster type and (if applicable) are stored contiguously in the image file.
+ * Compressed clusters are always returned one by one.
  *
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
 int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-int *num, uint64_t *cluster_offset)
+ unsigned int *bytes, uint64_t *cluster_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -485,13 +488,9 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 unsigned int offset_in_cluster, nb_clusters;
 uint64_t bytes_available, bytes_needed;
 int ret;
-unsigned int bytes;
-
-assert(*num <= BDRV_REQUEST_MAX_SECTORS);
-bytes = *num * BDRV_SECTOR_SIZE;
 
 offset_in_cluster = offset_into_cluster(s, offset);
-bytes_needed = (uint64_t) bytes + offset_in_cluster;
+bytes_needed = (uint64_t) *bytes + offset_in_cluster;
 
 l1_bits = s->l2_bits + s->cluster_bits;
 
@@ -595,9 +594,7 @@ out:
 bytes_available = bytes_needed;
 }
 
-bytes = bytes_available - offset_in_cluster;
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
-*num = bytes >> BDRV_SECTOR_BITS;
+*bytes = bytes_available - offset_in_cluster;
 
 return ret;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 6f5fb81..545dea0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -975,6 +975,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 bs->encrypted = 1;
+
+/* Encryption works on a sector granularity */
+bs->request_alignment = BDRV_SECTOR_SIZE;
 }
 
 s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
@@ -1331,16 +1334,20 @@ static int64_t coroutine_fn 
qcow2_co_get_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
 int index_in_cluster, ret;
+unsigned int bytes;
 int64_t status = 0;
 
-*pnum = nb_sectors;
+bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
 qemu_co_mutex_lock(&s->lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << 9, pnum, &cluster_offset);
+ret = qcow2_get_cluster_offset(bs, sector_num << 9, &bytes,
+   &cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
 return ret;
 }
 
+*pnum = bytes >> BDRV_SECTOR_BITS;
+
 if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
 !s->cipher) {
 index_in_cluster = sector_num & (s->cluster_sectors - 1);
@@ -1358,28 +1365,34 @@ static int64_t coroutine_fn 
qcow2_co_get_block_status(BlockDriverState *bs,
 
 /* handle reading after the end of the backing file */
 int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
-  int64_t sector_num, int nb_sectors)
+int64_t offset, int bytes)
 {
+uint64_t bs_size = bs->total_sectors * BDRV_SECTOR_SIZE;
 int n1;
-if ((sector_num + nb_sectors) <= bs->total_sectors)
-return nb_sectors;
-if (secto

[Qemu-block] [PULL 09/39] qemu-img bench: Fix uninitialised writethrough mode

2016-06-16 Thread Kevin Wolf
If no -t option is specified, bool writethrough stayed uninitialised.
Initialise it as false, which makes cache=writeback the default cache
mode.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Stefan Hajnoczi 
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 251386b..14e2661 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3570,7 +3570,7 @@ static int img_bench(int argc, char **argv)
 BlockBackend *blk = NULL;
 BenchData data = {};
 int flags = 0;
-bool writethrough;
+bool writethrough = false;
 struct timeval t1, t2;
 int i;
 
-- 
1.8.3.1




[Qemu-block] [PULL 07/39] hmp: acquire aio_context in hmp_qemu_io

2016-06-16 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Acquire aio context before run command, this is mandatory for unit tests.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 hmp.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hmp.c b/hmp.c
index 4b8e987..30897af 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1955,7 +1955,12 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
 blk = blk_by_name(device);
 if (blk) {
+AioContext *aio_context = blk_get_aio_context(blk);
+aio_context_acquire(aio_context);
+
 qemuio_command(blk, command);
+
+aio_context_release(aio_context);
 } else {
 error_set(&err, ERROR_CLASS_DEVICE_NOT_FOUND,
   "Device '%s' not found", device);
-- 
1.8.3.1




[Qemu-block] [PULL 15/39] block: Prepare bdrv_aligned_pwritev() for byte-aligned requests

2016-06-16 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index e75bce2..b261cc6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1249,11 +1249,9 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 bool waited;
 int ret;
 
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+int64_t start_sector = offset >> BDRV_SECTOR_BITS;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -1278,22 +1276,21 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 /* Do nothing, write notifier decided to fail this request */
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
 bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
-ret = bdrv_co_do_pwrite_zeroes(bs, sector_num << BDRV_SECTOR_BITS,
-   nb_sectors << BDRV_SECTOR_BITS, flags);
+ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
 } else {
 bdrv_debug_event(bs, BLKDBG_PWRITEV);
 ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
 }
 bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
 
-bdrv_set_dirty(bs, sector_num, nb_sectors);
+bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
 
 if (bs->wr_highest_offset < offset + bytes) {
 bs->wr_highest_offset = offset + bytes;
 }
 
 if (ret >= 0) {
-bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
+bs->total_sectors = MAX(bs->total_sectors, end_sector);
 }
 
 return ret;
-- 
1.8.3.1




[Qemu-block] [PULL 10/39] block: Avoid bogus flags during mirroring

2016-06-16 Thread Kevin Wolf
From: Eric Blake 

Commit e253f4b8 converted mirroring from sector-based bdrv_aio_*
to byte-based blk_aio_*, but failed to account for the subtle
difference in signatures (the former takes a semi-redundant length,
the latter takes a flags parameter).  Since all of our flags are
currently smaller in size than BDRV_SECTOR_SIZE, it has no ill
effects until we either perform sub-sector mirroring, or we start
asserting that no unexpected flags are set.  I found it while
testing new asserts when qemu-iotests 132 started warning about an
unknown flag 0x20.

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..1f01f24 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -157,8 +157,7 @@ static void mirror_read_complete(void *opaque, int ret)
 return;
 }
 blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
-op->nb_sectors * BDRV_SECTOR_SIZE,
-mirror_write_complete, op);
+0, mirror_write_complete, op);
 }
 
 static inline void mirror_clip_sectors(MirrorBlockJob *s,
@@ -274,8 +273,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov,
-   nb_sectors * BDRV_SECTOR_SIZE,
+blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
mirror_read_complete, op);
 return ret;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 13/39] block: Byte-based bdrv_co_do_copy_on_readv()

2016-06-16 Thread Kevin Wolf
In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 63 +++
 block/mirror.c| 10 
 include/block/block.h | 10 +---
 trace-events  |  2 +-
 4 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5b2017f..b6a2c80 100644
--- a/block/io.c
+++ b/block/io.c
@@ -404,12 +404,12 @@ static void mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
 }
 
 /**
- * Round a region to cluster boundaries
+ * Round a region to cluster boundaries (sector-based)
  */
-void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors)
+void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors,
+int64_t *cluster_sector_num,
+int *cluster_nb_sectors)
 {
 BlockDriverInfo bdi;
 
@@ -424,6 +424,26 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 }
 }
 
+/**
+ * Round a region to cluster boundaries
+ */
+void bdrv_round_to_clusters(BlockDriverState *bs,
+int64_t offset, unsigned int bytes,
+int64_t *cluster_offset,
+unsigned int *cluster_bytes)
+{
+BlockDriverInfo bdi;
+
+if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+*cluster_offset = offset;
+*cluster_bytes = bytes;
+} else {
+int64_t c = bdi.cluster_size;
+*cluster_offset = QEMU_ALIGN_DOWN(offset, c);
+*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c);
+}
+}
+
 static int bdrv_get_cluster_size(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
@@ -865,7 +885,7 @@ emulate_flags:
 }
 
 static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov)
 {
 /* Perform I/O through a temporary buffer so that users who scribble over
  * their read buffer while the operation is in progress do not end up
@@ -877,21 +897,20 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
-int64_t cluster_sector_num;
-int cluster_nb_sectors;
+int64_t cluster_offset;
+unsigned int cluster_bytes;
 size_t skip_bytes;
 int ret;
 
 /* Cover entire cluster so no additional backing file I/O is required when
  * allocating cluster in the image file.
  */
-bdrv_round_to_clusters(bs, sector_num, nb_sectors,
-   &cluster_sector_num, &cluster_nb_sectors);
+bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
 
-trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
-   cluster_sector_num, cluster_nb_sectors);
+trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
+   cluster_offset, cluster_bytes);
 
-iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
+iov.iov_len = cluster_bytes;
 iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
 if (bounce_buffer == NULL) {
 ret = -ENOMEM;
@@ -900,8 +919,7 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 
 qemu_iovec_init_external(&bounce_qiov, &iov, 1);
 
-ret = bdrv_driver_preadv(bs, cluster_sector_num * BDRV_SECTOR_SIZE,
- cluster_nb_sectors * BDRV_SECTOR_SIZE,
+ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes,
  &bounce_qiov, 0);
 if (ret < 0) {
 goto err;
@@ -909,16 +927,12 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 
 if (drv->bdrv_co_pwrite_zeroes &&
 buffer_is_zero(bounce_buffer, iov.iov_len)) {
-ret = bdrv_co_do_pwrite_zeroes(bs,
-   cluster_sector_num * BDRV_SECTOR_SIZE,
-   cluster_nb_sectors * BDRV_SECTOR_SIZE,
-   0);
+ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
 } else {
 /* This does not change the data on the disk, it is not necessary
  * to flush even in cache=writethrough mode.
  */
-ret = bdrv_driver_pwritev(bs, cluster_sector_num * BDRV_SECTOR_SIZE,
-  cluster_nb_sectors * BDRV_SECTOR_SIZE,
+ret = bdrv_drive

[Qemu-block] [PULL 11/39] block: Assert that flags are in range

2016-06-16 Thread Kevin Wolf
From: Eric Blake 

Add a new BDRV_REQ_MASK constant, and use it to make sure that
caller flags are always valid.

Tested with 'make check' and with qemu-iotests on both '-raw'
and '-qcow2'; the only failure turned up was fixed in the
previous commit.

Signed-off-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c| 6 ++
 include/block/block.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/block/io.c b/block/io.c
index fb99a71..5b2017f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -776,6 +776,8 @@ static int coroutine_fn bdrv_driver_preadv(BlockDriverState 
*bs,
 int64_t sector_num;
 unsigned int nb_sectors;
 
+assert(!(flags & ~BDRV_REQ_MASK));
+
 if (drv->bdrv_co_preadv) {
 return drv->bdrv_co_preadv(bs, offset, bytes, qiov, flags);
 }
@@ -815,6 +817,8 @@ static int coroutine_fn 
bdrv_driver_pwritev(BlockDriverState *bs,
 unsigned int nb_sectors;
 int ret;
 
+assert(!(flags & ~BDRV_REQ_MASK));
+
 if (drv->bdrv_co_pwritev) {
 ret = drv->bdrv_co_pwritev(bs, offset, bytes, qiov,
flags & bs->supported_write_flags);
@@ -953,6 +957,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
+assert(!(flags & ~BDRV_REQ_MASK));
 
 /* Handle Copy on Read and associated serialisation */
 if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1242,6 +1247,7 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
+assert(!(flags & ~BDRV_REQ_MASK));
 
 waited = wait_serialising_requests(req);
 assert(!waited || !req->serialising);
diff --git a/include/block/block.h b/include/block/block.h
index 54cca28..8cabcdd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,6 +65,9 @@ typedef enum {
 BDRV_REQ_MAY_UNMAP  = 0x4,
 BDRV_REQ_NO_SERIALISING = 0x8,
 BDRV_REQ_FUA= 0x10,
+
+/* Mask of valid flags */
+BDRV_REQ_MASK   = 0x1f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
-- 
1.8.3.1




[Qemu-block] [PULL 21/39] block: Introduce bdrv_preadv()

2016-06-16 Thread Kevin Wolf
We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 20 +---
 include/block/block.h |  1 +
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index b3ff9be..72d7210 100644
--- a/block/io.c
+++ b/block/io.c
@@ -700,6 +700,18 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags 
flags)
 }
 }
 
+int bdrv_preadv(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
+{
+int ret;
+
+ret = bdrv_prwv_co(bs, offset, qiov, false, 0);
+if (ret < 0) {
+return ret;
+}
+
+return qiov->size;
+}
+
 int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int bytes)
 {
 QEMUIOVector qiov;
@@ -707,19 +719,13 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset, void 
*buf, int bytes)
 .iov_base = (void *)buf,
 .iov_len = bytes,
 };
-int ret;
 
 if (bytes < 0) {
 return -EINVAL;
 }
 
 qemu_iovec_init_external(&qiov, &iov, 1);
-ret = bdrv_prwv_co(bs, offset, &qiov, false, 0);
-if (ret < 0) {
-return ret;
-}
-
-return bytes;
+return bdrv_preadv(bs, offset, &qiov);
 }
 
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
diff --git a/include/block/block.h b/include/block/block.h
index 9c3a62c..9c63d07 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -235,6 +235,7 @@ int bdrv_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
 int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags);
 int bdrv_pread(BlockDriverState *bs, int64_t offset,
void *buf, int count);
+int bdrv_preadv(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov);
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov);
-- 
1.8.3.1




[Qemu-block] [PULL 24/39] block: Make bdrv_load/save_vmstate coroutine_fns

2016-06-16 Thread Kevin Wolf
This allows drivers to share code between normal I/O and vmstate
accesses.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 80 ++-
 include/block/block_int.h | 10 +++---
 2 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/block/io.c b/block/io.c
index af4d43e..e1e8948 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1840,6 +1840,62 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t 
sector_num,
 return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
 }
 
+typedef struct BdrvVmstateCo {
+BlockDriverState*bs;
+QEMUIOVector*qiov;
+int64_t pos;
+boolis_read;
+int ret;
+} BdrvVmstateCo;
+
+static int coroutine_fn
+bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+   bool is_read)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv) {
+return -ENOMEDIUM;
+} else if (drv->bdrv_load_vmstate) {
+return is_read ? drv->bdrv_load_vmstate(bs, qiov, pos)
+   : drv->bdrv_save_vmstate(bs, qiov, pos);
+} else if (bs->file) {
+return bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+}
+
+return -ENOTSUP;
+}
+
+static void coroutine_fn bdrv_co_rw_vmstate_entry(void *opaque)
+{
+BdrvVmstateCo *co = opaque;
+co->ret = bdrv_co_rw_vmstate(co->bs, co->qiov, co->pos, co->is_read);
+}
+
+static inline int
+bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+bool is_read)
+{
+if (qemu_in_coroutine()) {
+return bdrv_co_rw_vmstate(bs, qiov, pos, is_read);
+} else {
+BdrvVmstateCo data = {
+.bs = bs,
+.qiov   = qiov,
+.pos= pos,
+.is_read= is_read,
+.ret= -EINPROGRESS,
+};
+Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry);
+
+qemu_coroutine_enter(co, &data);
+while (data.ret == -EINPROGRESS) {
+aio_poll(bdrv_get_aio_context(bs), true);
+}
+return data.ret;
+}
+}
+
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size)
 {
@@ -1862,17 +1918,7 @@ int bdrv_save_vmstate(BlockDriverState *bs, const 
uint8_t *buf,
 
 int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
-BlockDriver *drv = bs->drv;
-
-if (!drv) {
-return -ENOMEDIUM;
-} else if (drv->bdrv_save_vmstate) {
-return drv->bdrv_save_vmstate(bs, qiov, pos);
-} else if (bs->file) {
-return bdrv_writev_vmstate(bs->file->bs, qiov, pos);
-}
-
-return -ENOTSUP;
+return bdrv_rw_vmstate(bs, qiov, pos, false);
 }
 
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
@@ -1896,17 +1942,7 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
 
 int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
-BlockDriver *drv = bs->drv;
-
-if (!drv) {
-return -ENOMEDIUM;
-} else if (drv->bdrv_load_vmstate) {
-return drv->bdrv_load_vmstate(bs, qiov, pos);
-} else if (bs->file) {
-return bdrv_readv_vmstate(bs->file->bs, qiov, pos);
-}
-
-return -ENOTSUP;
+return bdrv_rw_vmstate(bs, qiov, pos, true);
 }
 
 /**/
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f9a32cc..1fe0811 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -224,10 +224,12 @@ struct BlockDriver {
 int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
 
-int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
- int64_t pos);
-int (*bdrv_load_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
- int64_t pos);
+int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
+  QEMUIOVector *qiov,
+  int64_t pos);
+int coroutine_fn (*bdrv_load_vmstate)(BlockDriverState *bs,
+  QEMUIOVector *qiov,
+  int64_t pos);
 
 int (*bdrv_change_backing_file)(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
-- 
1.8.3.1




[Qemu-block] [PULL 20/39] doc: Fix mailing list address in tests/qemu-iotests/README

2016-06-16 Thread Kevin Wolf
From: Thomas Huth 

The address of the mailing list is qemu-de...@nongnu.org
instead of qemu-de...@savannah.nongnu.org. And while we're
at it, also mention the qemu-block mailing list here.

Signed-off-by: Thomas Huth 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/README | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/README b/tests/qemu-iotests/README
index 4ccfdd1..6079b40 100644
--- a/tests/qemu-iotests/README
+++ b/tests/qemu-iotests/README
@@ -17,4 +17,5 @@ additional options to test further image formats or I/O 
methods.
 * Feedback and patches
 
 Please send improvements to the test suite, general feedback or just
-reports of failing tests cases to qemu-de...@savannah.nongnu.org.
+reports of failing tests cases to qemu-de...@nongnu.org with a CC:
+to qemu-block@nongnu.org.
-- 
1.8.3.1




[Qemu-block] [PULL 27/39] block: Fix snapshot=on with aio=native

2016-06-16 Thread Kevin Wolf
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.

Reproducer without this patch:

$ x86_64-softmmu/qemu-system-x86_64 -drive 
file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive 
file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.

Signed-off-by: Kevin Wolf 
---
 block.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block.c b/block.c
index b350794..8104225 100644
--- a/block.c
+++ b/block.c
@@ -684,6 +684,10 @@ static void bdrv_temp_snapshot_options(int *child_flags, 
QDict *child_options,
 /* For temporary files, unconditional cache=unsafe is fine */
 qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
 qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");
+
+/* aio=native doesn't work for cache.direct=off, so disable it for the
+ * temporary snapshot */
+*child_flags &= ~BDRV_O_NATIVE_AIO;
 }
 
 /*
-- 
1.8.3.1




[Qemu-block] [PULL 30/39] block: Prevent sleeping jobs from resuming if they have been paused

2016-06-16 Thread Kevin Wolf
From: Alberto Garcia 

If we pause a block job and drain its BlockDriverState we want that
the job remains inactive until we call block_job_resume() again.

However if we pause the job while it is sleeping then it will resume
when the sleep timer fires.

This patch prevents that from happening by checking if the job has
been paused after it comes back from sleeping.

Signed-off-by: Alberto Garcia 
Suggested-by: Kevin Wolf 
Message-id: 
3d9011151512326b890d22bdab3530244ef349d7.1464346103.git.be...@igalia.com
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 blockjob.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index c095cc5..01b896b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -361,10 +361,12 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType 
type, int64_t ns)
 }
 
 job->busy = false;
+if (!block_job_is_paused(job)) {
+co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
+}
+/* The job can be paused while sleeping, so check this again */
 if (block_job_is_paused(job)) {
 qemu_coroutine_yield();
-} else {
-co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
 }
 job->busy = true;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 28/39] block: use the block job list in bdrv_drain_all()

2016-06-16 Thread Kevin Wolf
From: Alberto Garcia 

bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.

This patch uses block_job_next() to iterate over all block jobs.

Signed-off-by: Alberto Garcia 
Message-id: 
55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.be...@igalia.com
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 block/io.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index f5ce5f1..ebdb9d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -289,15 +289,21 @@ void bdrv_drain_all(void)
 bool busy = true;
 BlockDriverState *bs;
 BdrvNextIterator it;
+BlockJob *job = NULL;
 GSList *aio_ctxs = NULL, *ctx;
 
+while ((job = block_job_next(job))) {
+AioContext *aio_context = blk_get_aio_context(job->blk);
+
+aio_context_acquire(aio_context);
+block_job_pause(job);
+aio_context_release(aio_context);
+}
+
 for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_pause(bs->job);
-}
 bdrv_parent_drained_begin(bs);
 bdrv_io_unplugged_begin(bs);
 bdrv_drain_recurse(bs);
@@ -340,12 +346,18 @@ void bdrv_drain_all(void)
 aio_context_acquire(aio_context);
 bdrv_io_unplugged_end(bs);
 bdrv_parent_drained_end(bs);
-if (bs->job) {
-block_job_resume(bs->job);
-}
 aio_context_release(aio_context);
 }
 g_slist_free(aio_ctxs);
+
+job = NULL;
+while ((job = block_job_next(job))) {
+AioContext *aio_context = blk_get_aio_context(job->blk);
+
+aio_context_acquire(aio_context);
+block_job_resume(job);
+aio_context_release(aio_context);
+}
 }
 
 /**
-- 
1.8.3.1




[Qemu-block] [PULL 25/39] qcow2: Let vmstate call qcow2_co_preadv/pwrite directly

2016-06-16 Thread Kevin Wolf
We don't really want to go through the block layer in order to read from
or write to the vmstate in a qcow2 image. Doing so required a few ugly
hacks like saving and restoring the old image size (because writing to
vmstate offsets would increase the image size) or disabling the "reads
after EOF = zeroes" logic. When calling the right functions directly,
these hacks aren't necessary any more.

Note that .bdrv_vmstate_load/save() return 0 instead of the number of
bytes in case of success now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/qcow2.c | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 362ada2..4718f82 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2909,36 +2909,20 @@ static int qcow2_save_vmstate(BlockDriverState *bs, 
QEMUIOVector *qiov,
   int64_t pos)
 {
 BDRVQcow2State *s = bs->opaque;
-int64_t total_sectors = bs->total_sectors;
-bool zero_beyond_eof = bs->zero_beyond_eof;
-int ret;
 
 BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_SAVE);
-bs->zero_beyond_eof = false;
-ret = bdrv_pwritev(bs, qcow2_vm_state_offset(s) + pos, qiov);
-bs->zero_beyond_eof = zero_beyond_eof;
-
-/* bdrv_co_do_writev will have increased the total_sectors value to include
- * the VM state - the VM state is however not an actual part of the block
- * device, therefore, we need to restore the old value. */
-bs->total_sectors = total_sectors;
-
-return ret;
+return bs->drv->bdrv_co_pwritev(bs, qcow2_vm_state_offset(s) + pos,
+qiov->size, qiov, 0);
 }
 
 static int qcow2_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
   int64_t pos)
 {
 BDRVQcow2State *s = bs->opaque;
-bool zero_beyond_eof = bs->zero_beyond_eof;
-int ret;
 
 BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD);
-bs->zero_beyond_eof = false;
-ret = bdrv_preadv(bs, qcow2_vm_state_offset(s) + pos, qiov);
-bs->zero_beyond_eof = zero_beyond_eof;
-
-return ret;
+return bs->drv->bdrv_co_preadv(bs, qcow2_vm_state_offset(s) + pos,
+   qiov->size, qiov, 0);
 }
 
 /*
-- 
1.8.3.1




[Qemu-block] [PULL 26/39] block: Remove bs->zero_beyond_eof

2016-06-16 Thread Kevin Wolf
It is always true for open images now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block.c   |  2 --
 block/io.c| 52 +--
 include/block/block_int.h |  3 ---
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/block.c b/block.c
index 3d850a2..b350794 100644
--- a/block.c
+++ b/block.c
@@ -938,7 +938,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 }
 
 bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
-bs->zero_beyond_eof = true;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
@@ -2192,7 +2191,6 @@ static void bdrv_close(BlockDriverState *bs)
 bs->encrypted = 0;
 bs->valid_key = 0;
 bs->sg = 0;
-bs->zero_beyond_eof = false;
 QDECREF(bs->options);
 QDECREF(bs->explicit_options);
 bs->options = NULL;
diff --git a/block/io.c b/block/io.c
index e1e8948..f5ce5f1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -967,6 +967,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
 int64_t align, QEMUIOVector *qiov, int flags)
 {
+int64_t total_bytes, max_bytes;
 int ret;
 
 assert(is_power_of_2(align));
@@ -1008,40 +1009,33 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 /* Forward the request to the BlockDriver */
-if (!bs->zero_beyond_eof) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
-} else {
-/* Read zeros after EOF */
-int64_t total_bytes, max_bytes;
-
-total_bytes = bdrv_getlength(bs);
-if (total_bytes < 0) {
-ret = total_bytes;
-goto out;
-}
+total_bytes = bdrv_getlength(bs);
+if (total_bytes < 0) {
+ret = total_bytes;
+goto out;
+}
 
-max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
-if (bytes < max_bytes) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
-} else if (max_bytes > 0) {
-QEMUIOVector local_qiov;
+max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
+if (bytes < max_bytes) {
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
+} else if (max_bytes > 0) {
+QEMUIOVector local_qiov;
 
-qemu_iovec_init(&local_qiov, qiov->niov);
-qemu_iovec_concat(&local_qiov, qiov, 0, max_bytes);
+qemu_iovec_init(&local_qiov, qiov->niov);
+qemu_iovec_concat(&local_qiov, qiov, 0, max_bytes);
 
-ret = bdrv_driver_preadv(bs, offset, max_bytes, &local_qiov, 0);
+ret = bdrv_driver_preadv(bs, offset, max_bytes, &local_qiov, 0);
 
-qemu_iovec_destroy(&local_qiov);
-} else {
-ret = 0;
-}
+qemu_iovec_destroy(&local_qiov);
+} else {
+ret = 0;
+}
 
-/* Reading beyond end of file is supposed to produce zeroes */
-if (ret == 0 && total_bytes < offset + bytes) {
-uint64_t zero_offset = MAX(0, total_bytes - offset);
-uint64_t zero_bytes = offset + bytes - zero_offset;
-qemu_iovec_memset(qiov, zero_offset, 0, zero_bytes);
-}
+/* Reading beyond end of file is supposed to produce zeroes */
+if (ret == 0 && total_bytes < offset + bytes) {
+uint64_t zero_offset = MAX(0, total_bytes - offset);
+uint64_t zero_bytes = offset + bytes - zero_offset;
+qemu_iovec_memset(qiov, zero_offset, 0, zero_bytes);
 }
 
 out:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1fe0811..16c43e2 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -451,9 +451,6 @@ struct BlockDriverState {
 /* I/O Limits */
 BlockLimits bl;
 
-/* Whether produces zeros when read beyond eof */
-bool zero_beyond_eof;
-
 /* Alignment requirement for offset/length of I/O requests */
 unsigned int request_alignment;
 /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
-- 
1.8.3.1




[Qemu-block] [PULL 18/39] block: Don't enforce 512 byte minimum alignment

2016-06-16 Thread Kevin Wolf
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block.c| 2 +-
 block/io.c | 8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index f54bc25..3d850a2 100644
--- a/block.c
+++ b/block.c
@@ -937,7 +937,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 goto fail_opts;
 }
 
-bs->request_alignment = 512;
+bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
 bs->zero_beyond_eof = true;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
diff --git a/block/io.c b/block/io.c
index b261cc6..b3ff9be 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1052,8 +1052,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 BdrvTrackedRequest req;
 
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
@@ -1305,7 +1304,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 uint8_t *buf = NULL;
 QEMUIOVector local_qiov;
 struct iovec iov;
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 unsigned int head_padding_bytes, tail_padding_bytes;
 int ret = 0;
 
@@ -1392,8 +1391,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
 BdrvRequestFlags flags)
 {
 BdrvTrackedRequest req;
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
-- 
1.8.3.1




[Qemu-block] [PULL 17/39] raw-posix: Implement .bdrv_co_preadv/pwritev

2016-06-16 Thread Kevin Wolf
The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.

For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/linux-aio.c |  7 ++-
 block/raw-aio.h   |  3 +--
 block/raw-posix.c | 43 +++
 3 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 657577a..fe7cece 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -271,15 +271,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 }
 
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov,
-int nb_sectors, int type)
+uint64_t offset, QEMUIOVector *qiov, int type)
 {
-off_t offset = sector_num * BDRV_SECTOR_SIZE;
 int ret;
-
 struct qemu_laiocb laiocb = {
 .co = qemu_coroutine_self(),
-.nbytes = nb_sectors * BDRV_SECTOR_SIZE,
+.nbytes = qiov->size,
 .ctx= s,
 .is_read= (type == QEMU_AIO_READ),
 .qiov   = qiov,
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 03bbfba..a4cdbbf 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -40,8 +40,7 @@ typedef struct LinuxAioState LinuxAioState;
 LinuxAioState *laio_init(void);
 void laio_cleanup(LinuxAioState *s);
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov,
-int nb_sectors, int type);
+uint64_t offset, QEMUIOVector *qiov, int type);
 BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque, int type);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index cb98769..aacf132 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1325,8 +1325,8 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, int 
fd,
 return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
-static int coroutine_fn raw_co_rw(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, QEMUIOVector *qiov, int type)
+static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov, int 
type)
 {
 BDRVRawState *s = bs->opaque;
 
@@ -1344,26 +1344,28 @@ static int coroutine_fn raw_co_rw(BlockDriverState *bs, 
int64_t sector_num,
 type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
 } else if (s->use_aio) {
-return laio_co_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-  nb_sectors, type);
+assert(qiov->size == bytes);
+return laio_co_submit(bs, s->aio_ctx, s->fd, offset, qiov, type);
 #endif
 }
 }
 
-return paio_submit_co(bs, s->fd, sector_num * BDRV_SECTOR_SIZE, qiov,
-  nb_sectors * BDRV_SECTOR_SIZE, type);
+return paio_submit_co(bs, s->fd, offset, qiov, bytes, type);
 }
 
-static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov)
+static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, QEMUIOVector *qiov,
+  int flags)
 {
-return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_READ);
+return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);
 }
 
-static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, QEMUIOVector *qiov)
+static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov,
+   int flags)
 {
-return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_WRITE);
+assert(flags == 0);
+return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
 static void raw_aio_plug(BlockDriverState *bs)
@@ -1952,8 +1954,8 @@ BlockDriver bdrv_file = {
 .bdrv_co_get_block_status = raw_co_get_block_status,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
 
-.bdrv_co_readv  = raw_co_readv,
-.bdrv_co_writev = raw_co_writev,
+.bdrv_co_preadv = raw_co_preadv,
+.bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_aio_flush = raw_aio_flush,
 .bdr

[Qemu-block] [PULL 19/39] linux-aio: Cancel BH if not needed

2016-06-16 Thread Kevin Wolf
linux-aio uses a BH in order to make sure that the remaining completions
are processed even in nested event loops of completion callbacks in
order to avoid deadlocks.

There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Paolo Bonzini 
---
 block/linux-aio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index fe7cece..e468960 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -149,6 +149,8 @@ static void qemu_laio_completion_bh(void *opaque)
 if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
 ioq_submit(s);
 }
+
+qemu_bh_cancel(s->completion_bh);
 }
 
 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -156,7 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
 LinuxAioState *s = container_of(e, LinuxAioState, e);
 
 if (event_notifier_test_and_clear(&s->e)) {
-qemu_bh_schedule(s->completion_bh);
+qemu_laio_completion_bh(s);
 }
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 23/39] block: Allow .bdrv_load/save_vmstate() to return 0/-errno

2016-06-16 Thread Kevin Wolf
The return value of .bdrv_load/save_vmstate() can be any non-negative
number in case of success now. It used to be bytes/-errno.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index aac9b67..af4d43e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1848,9 +1848,16 @@ int bdrv_save_vmstate(BlockDriverState *bs, const 
uint8_t *buf,
 .iov_base   = (void *) buf,
 .iov_len= size,
 };
+int ret;
 
 qemu_iovec_init_external(&qiov, &iov, 1);
-return bdrv_writev_vmstate(bs, &qiov, pos);
+
+ret = bdrv_writev_vmstate(bs, &qiov, pos);
+if (ret < 0) {
+return ret;
+}
+
+return size;
 }
 
 int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
@@ -1876,9 +1883,15 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
 .iov_base   = buf,
 .iov_len= size,
 };
+int ret;
 
 qemu_iovec_init_external(&qiov, &iov, 1);
-return bdrv_readv_vmstate(bs, &qiov, pos);
+ret = bdrv_readv_vmstate(bs, &qiov, pos);
+if (ret < 0) {
+return ret;
+}
+
+return size;
 }
 
 int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-- 
1.8.3.1




[Qemu-block] [PULL 32/39] iotests: 095: Clean up QEMU before showing image info

2016-06-16 Thread Kevin Wolf
From: Fam Zheng 

Signed-off-by: Fam Zheng 
Message-id: 1464944872-24484-1-git-send-email-f...@redhat.com
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/095 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/095 b/tests/qemu-iotests/095
index dad04b9..030adb2 100755
--- a/tests/qemu-iotests/095
+++ b/tests/qemu-iotests/095
@@ -74,6 +74,8 @@ _send_qemu_cmd $h "{ 'execute': 'block-commit',
  'arguments': { 'device': 'test',
  'top': '"${TEST_IMG}.snp1"' } }" 
"BLOCK_JOB_COMPLETED"
 
+_cleanup_qemu
+
 echo
 echo "=== Base image info after commit and resize ==="
 TEST_IMG="${TEST_IMG}.base" _img_info | _filter_img_info
-- 
1.8.3.1




[Qemu-block] [PULL 16/39] raw-posix: Switch to bdrv_co_* interfaces

2016-06-16 Thread Kevin Wolf
In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c

As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/linux-aio.c | 87 +--
 block/raw-aio.h   |  4 +++
 block/raw-posix.c | 59 +
 3 files changed, 96 insertions(+), 54 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 90ec98e..657577a 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -11,8 +11,10 @@
 #include "qemu-common.h"
 #include "block/aio.h"
 #include "qemu/queue.h"
+#include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
+#include "qemu/coroutine.h"
 
 #include 
 
@@ -30,6 +32,7 @@
 
 struct qemu_laiocb {
 BlockAIOCB common;
+Coroutine *co;
 LinuxAioState *ctx;
 struct iocb iocb;
 ssize_t ret;
@@ -88,9 +91,14 @@ static void qemu_laio_process_completion(struct qemu_laiocb 
*laiocb)
 }
 }
 }
-laiocb->common.cb(laiocb->common.opaque, ret);
 
-qemu_aio_unref(laiocb);
+laiocb->ret = ret;
+if (laiocb->co) {
+qemu_coroutine_enter(laiocb->co, NULL);
+} else {
+laiocb->common.cb(laiocb->common.opaque, ret);
+qemu_aio_unref(laiocb);
+}
 }
 
 /* The completion BH fetches completed I/O requests and invokes their
@@ -230,22 +238,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState 
*s)
 }
 }
 
-BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-BlockCompletionFunc *cb, void *opaque, int type)
+static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
+  int type)
 {
-struct qemu_laiocb *laiocb;
-struct iocb *iocbs;
-off_t offset = sector_num * 512;
-
-laiocb = qemu_aio_get(&laio_aiocb_info, bs, cb, opaque);
-laiocb->nbytes = nb_sectors * 512;
-laiocb->ctx = s;
-laiocb->ret = -EINPROGRESS;
-laiocb->is_read = (type == QEMU_AIO_READ);
-laiocb->qiov = qiov;
-
-iocbs = &laiocb->iocb;
+LinuxAioState *s = laiocb->ctx;
+struct iocb *iocbs = &laiocb->iocb;
+QEMUIOVector *qiov = laiocb->qiov;
 
 switch (type) {
 case QEMU_AIO_WRITE:
@@ -258,7 +256,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState 
*s, int fd,
 default:
 fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
 __func__, type);
-goto out_free_aiocb;
+return -EIO;
 }
 io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
 
@@ -268,11 +266,56 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
 (!s->io_q.plugged || s->io_q.n >= MAX_QUEUED_IO)) {
 ioq_submit(s);
 }
-return &laiocb->common;
 
-out_free_aiocb:
-qemu_aio_unref(laiocb);
-return NULL;
+return 0;
+}
+
+int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
+int64_t sector_num, QEMUIOVector *qiov,
+int nb_sectors, int type)
+{
+off_t offset = sector_num * BDRV_SECTOR_SIZE;
+int ret;
+
+struct qemu_laiocb laiocb = {
+.co = qemu_coroutine_self(),
+.nbytes = nb_sectors * BDRV_SECTOR_SIZE,
+.ctx= s,
+.is_read= (type == QEMU_AIO_READ),
+.qiov   = qiov,
+};
+
+ret = laio_do_submit(fd, &laiocb, offset, type);
+if (ret < 0) {
+return ret;
+}
+
+qemu_coroutine_yield();
+return laiocb.ret;
+}
+
+BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
+int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+BlockCompletionFunc *cb, void *opaque, int type)
+{
+struct qemu_laiocb *laiocb;
+off_t offset = sector_num * BDRV_SECTOR_SIZE;
+int ret;
+
+laiocb = qemu_aio_get(&laio_aiocb_info, bs, cb, opaque);
+laiocb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
+laiocb->ctx = s;
+laiocb->ret = -EINPROGRESS;
+laiocb->is_read = (type == QEMU_AIO_READ);
+laiocb->qiov = qiov;
+
+ret = laio_do_submit(fd, laiocb, offset, type);
+if (ret < 0) {
+qemu_aio_unref(laiocb);
+return NULL;
+}
+
+return &laiocb->common;
 }
 
 void laio_detach_aio_context(

[Qemu-block] [PULL 22/39] block: Make .bdrv_load_vmstate() vectored

2016-06-16 Thread Kevin Wolf
This brings it in line with .bdrv_save_vmstate().

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 25 -
 block/qcow2.c |  6 +++---
 block/sheepdog.c  | 13 ++---
 include/block/block.h |  1 +
 include/block/block_int.h |  4 ++--
 5 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 72d7210..aac9b67 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1871,13 +1871,28 @@ int bdrv_writev_vmstate(BlockDriverState *bs, 
QEMUIOVector *qiov, int64_t pos)
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
   int64_t pos, int size)
 {
+QEMUIOVector qiov;
+struct iovec iov = {
+.iov_base   = buf,
+.iov_len= size,
+};
+
+qemu_iovec_init_external(&qiov, &iov, 1);
+return bdrv_readv_vmstate(bs, &qiov, pos);
+}
+
+int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
+{
 BlockDriver *drv = bs->drv;
-if (!drv)
+
+if (!drv) {
 return -ENOMEDIUM;
-if (drv->bdrv_load_vmstate)
-return drv->bdrv_load_vmstate(bs, buf, pos, size);
-if (bs->file)
-return bdrv_load_vmstate(bs->file->bs, buf, pos, size);
+} else if (drv->bdrv_load_vmstate) {
+return drv->bdrv_load_vmstate(bs, qiov, pos);
+} else if (bs->file) {
+return bdrv_readv_vmstate(bs->file->bs, qiov, pos);
+}
+
 return -ENOTSUP;
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 9de716a..362ada2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2926,8 +2926,8 @@ static int qcow2_save_vmstate(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return ret;
 }
 
-static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
-  int64_t pos, int size)
+static int qcow2_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+  int64_t pos)
 {
 BDRVQcow2State *s = bs->opaque;
 bool zero_beyond_eof = bs->zero_beyond_eof;
@@ -2935,7 +2935,7 @@ static int qcow2_load_vmstate(BlockDriverState *bs, 
uint8_t *buf,
 
 BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD);
 bs->zero_beyond_eof = false;
-ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size);
+ret = bdrv_preadv(bs, qcow2_vm_state_offset(s) + pos, qiov);
 bs->zero_beyond_eof = zero_beyond_eof;
 
 return ret;
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 23fbace..ef5d044 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2784,12 +2784,19 @@ static int sd_save_vmstate(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return ret;
 }
 
-static int sd_load_vmstate(BlockDriverState *bs, uint8_t *data,
-   int64_t pos, int size)
+static int sd_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+   int64_t pos)
 {
 BDRVSheepdogState *s = bs->opaque;
+void *buf;
+int ret;
 
-return do_load_save_vmstate(s, data, pos, size, 1);
+buf = qemu_blockalign(bs, qiov->size);
+ret = do_load_save_vmstate(s, buf, pos, qiov->size, 1);
+qemu_iovec_from_buf(qiov, 0, buf, qiov->size);
+qemu_vfree(buf);
+
+return ret;
 }
 
 
diff --git a/include/block/block.h b/include/block/block.h
index 9c63d07..733a8ec 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -431,6 +431,7 @@ void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
 
+int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8a4963c..f9a32cc 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -226,8 +226,8 @@ struct BlockDriver {
 
 int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
  int64_t pos);
-int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,
- int64_t pos, int size);
+int (*bdrv_load_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
+ int64_t pos);
 
 int (*bdrv_change_backing_file)(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
-- 
1.8.3.1




[Qemu-block] [PULL 31/39] block: Create the commit block job before reopening any image

2016-06-16 Thread Kevin Wolf
From: Alberto Garcia 

If the base or overlay images need to be reopened in read-write mode
but the block_job_create() call fails then no one will put those
images back in read-only mode.

We can solve this problem easily by calling block_job_create() first.

Signed-off-by: Alberto Garcia 
Message-id: 
aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.be...@igalia.com
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 block/commit.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8a00e11..444333b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -236,6 +236,11 @@ void commit_start(BlockDriverState *bs, BlockDriverState 
*base,
 return;
 }
 
+s = block_job_create(&commit_job_driver, bs, speed, cb, opaque, errp);
+if (!s) {
+return;
+}
+
 orig_base_flags= bdrv_get_flags(base);
 orig_overlay_flags = bdrv_get_flags(overlay_bs);
 
@@ -252,16 +257,12 @@ void commit_start(BlockDriverState *bs, BlockDriverState 
*base,
 bdrv_reopen_multiple(reopen_queue, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
+block_job_unref(&s->common);
 return;
 }
 }
 
 
-s = block_job_create(&commit_job_driver, bs, speed, cb, opaque, errp);
-if (!s) {
-return;
-}
-
 s->base = blk_new();
 blk_insert_bs(s->base, base);
 
-- 
1.8.3.1




[Qemu-block] [PULL 37/39] iotests: Add test for post-mirror backing chains

2016-06-16 Thread Kevin Wolf
From: Max Reitz 

Signed-off-by: Max Reitz 
Message-id: 20160610185750.30956-5-mre...@redhat.com
Reviewed-by: Fam Zheng 
[mre...@redhat.com: Removed unnecessary imports]
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/155 | 261 +
 tests/qemu-iotests/155.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 267 insertions(+)
 create mode 100755 tests/qemu-iotests/155
 create mode 100644 tests/qemu-iotests/155.out

diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
new file mode 100755
index 000..4057b5e
--- /dev/null
+++ b/tests/qemu-iotests/155
@@ -0,0 +1,261 @@
+#!/usr/bin/env python
+#
+# Test whether the backing BDSs are correct after completion of a
+# mirror block job; in "existing" modes (drive-mirror with
+# mode=existing and blockdev-mirror) the backing chain should not be
+# overridden.
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img
+
+back0_img = os.path.join(iotests.test_dir, 'back0.' + iotests.imgfmt)
+back1_img = os.path.join(iotests.test_dir, 'back1.' + iotests.imgfmt)
+back2_img = os.path.join(iotests.test_dir, 'back2.' + iotests.imgfmt)
+source_img = os.path.join(iotests.test_dir, 'source.' + iotests.imgfmt)
+target_img = os.path.join(iotests.test_dir, 'target.' + iotests.imgfmt)
+
+
+# Class variables for controlling its behavior:
+#
+# existing: If True, explicitly create the target image and blockdev-add it
+# target_backing: If existing is True: Use this filename as the backing file
+# of the target image
+# (None: no backing file)
+# target_blockdev_backing: If existing is True: Pass this dict as "backing"
+#  for the blockdev-add command
+#  (None: do not pass "backing")
+# target_real_backing: If existing is True: The real filename of the backing
+#  image during runtime, only makes sense if
+#  target_blockdev_backing is not None
+#  (None: same as target_backing)
+
+class BaseClass(iotests.QMPTestCase):
+target_blockdev_backing = None
+target_real_backing = None
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, back0_img, '1M')
+qemu_img('create', '-f', iotests.imgfmt, '-b', back0_img, back1_img)
+qemu_img('create', '-f', iotests.imgfmt, '-b', back1_img, back2_img)
+qemu_img('create', '-f', iotests.imgfmt, '-b', back2_img, source_img)
+
+self.vm = iotests.VM()
+self.vm.add_drive(None, '', 'none')
+self.vm.launch()
+
+# Add the BDS via blockdev-add so it stays around after the mirror 
block
+# job has been completed
+result = self.vm.qmp('blockdev-add',
+ options={'node-name': 'source',
+  'driver': iotests.imgfmt,
+  'file': {'driver': 'file',
+   'filename': source_img}})
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('x-blockdev-insert-medium',
+ device='drive0', node_name='source')
+self.assert_qmp(result, 'return', {})
+
+self.assertIntactSourceBackingChain()
+
+if self.existing:
+if self.target_backing:
+qemu_img('create', '-f', iotests.imgfmt,
+ '-b', self.target_backing, target_img, '1M')
+else:
+qemu_img('create', '-f', iotests.imgfmt, target_img, '1M')
+
+if self.cmd == 'blockdev-mirror':
+options = { 'node-name': 'target',
+'driver': iotests.imgfmt,
+'file': { 'driver': 'file',
+  'filename': target_img } }
+if self.target_blockdev_backing:
+options['backing'] = self.target_blockdev_backing
+
+result = self.vm.qmp('blockdev-add', options=options)
+self.assert_qmp(result, 'return', {})
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(source_img)
+os.remove(back2_img)
+os.remove(back1_img)
+os.remove(back0_img)
+try:
+  

[Qemu-block] [PULL 34/39] block: Allow replacement of a BDS by its overlay

2016-06-16 Thread Kevin Wolf
From: Max Reitz 

change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.

For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:

  target

  base <- source <- BlockBackend

During job completion, we want to establish the source BDS as the
target's backing node:

  target
|
v
  base <- source <- BlockBackend

This makes the target a valid replacement for the source:

  target <- BlockBackend
|
v
  base <- source

Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:

  target <- BlockBackend

  base <- source

Signed-off-by: Max Reitz 
Message-id: 20160610185750.30956-2-mre...@redhat.com
Reviewed-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Signed-off-by: Max Reitz 
---
 block.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 8104225..d090324 100644
--- a/block.c
+++ b/block.c
@@ -2226,9 +2226,23 @@ void bdrv_close_all(void)
 static void change_parent_backing_link(BlockDriverState *from,
BlockDriverState *to)
 {
-BdrvChild *c, *next;
+BdrvChild *c, *next, *to_c;
 
 QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
+if (c->role == &child_backing) {
+/* @from is generally not allowed to be a backing file, except for
+ * when @to is the overlay. In that case, @from may not be replaced
+ * by @to as @to's backing node. */
+QLIST_FOREACH(to_c, &to->children, next) {
+if (to_c == c) {
+break;
+}
+}
+if (to_c) {
+continue;
+}
+}
+
 assert(c->role != &child_backing);
 bdrv_ref(to);
 bdrv_replace_child(c, to);
-- 
1.8.3.1




[Qemu-block] [PULL 36/39] block/null: Implement bdrv_refresh_filename()

2016-06-16 Thread Kevin Wolf
From: Max Reitz 

The null block driver ignores any filename used for creating its BDSs,
which allows creating such BDSs even without any filename at all. In
that case, we currently construct a JSON filename when queried instead
of a plain "null-co://" or "null-aio://". This patch implements
bdrv_refresh_filename() to remedy this behavior.

Signed-off-by: Max Reitz 
Message-id: 20160610185750.30956-4-mre...@redhat.com
[mre...@redhat.com: Added commit message]
Signed-off-by: Max Reitz 
---
 block/null.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/null.c b/block/null.c
index 396500b..b511010 100644
--- a/block/null.c
+++ b/block/null.c
@@ -12,6 +12,8 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
 #include "block/block_int.h"
 
 #define NULL_OPT_LATENCY "latency-ns"
@@ -223,6 +225,20 @@ static int64_t coroutine_fn 
null_co_get_block_status(BlockDriverState *bs,
 }
 }
 
+static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
+{
+QINCREF(opts);
+qdict_del(opts, "filename");
+
+if (!qdict_size(opts)) {
+snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s://",
+ bs->drv->format_name);
+}
+
+qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name));
+bs->full_open_options = opts;
+}
+
 static BlockDriver bdrv_null_co = {
 .format_name= "null-co",
 .protocol_name  = "null-co",
@@ -238,6 +254,8 @@ static BlockDriver bdrv_null_co = {
 .bdrv_reopen_prepare= null_reopen_prepare,
 
 .bdrv_co_get_block_status   = null_co_get_block_status,
+
+.bdrv_refresh_filename  = null_refresh_filename,
 };
 
 static BlockDriver bdrv_null_aio = {
@@ -255,6 +273,8 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_reopen_prepare= null_reopen_prepare,
 
 .bdrv_co_get_block_status   = null_co_get_block_status,
+
+.bdrv_refresh_filename  = null_refresh_filename,
 };
 
 static void bdrv_null_init(void)
-- 
1.8.3.1




[Qemu-block] [PULL 35/39] block/mirror: Fix target backing BDS

2016-06-16 Thread Kevin Wolf
From: Max Reitz 

Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.

Signed-off-by: Max Reitz 
Message-id: 20160610185750.30956-3-mre...@redhat.com
Reviewed-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Signed-off-by: Max Reitz 
---
 block.c   |  8 
 block/mirror.c| 39 ---
 blockdev.c| 15 ---
 include/block/block_int.h | 18 +-
 4 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index d090324..b331eb9 100644
--- a/block.c
+++ b/block.c
@@ -2291,14 +2291,6 @@ void bdrv_replace_in_backing_chain(BlockDriverState 
*old, BlockDriverState *new)
 
 change_parent_backing_link(old, new);
 
-/* Change backing files if a previously independent node is added to the
- * chain. For active commit, we replace top by its own (indirect) backing
- * file and don't do anything here so we don't build a loop. */
-if (new->backing == NULL && !bdrv_chain_contains(backing_bs(old), new)) {
-bdrv_set_backing_hd(new, backing_bs(old));
-bdrv_set_backing_hd(old, NULL);
-}
-
 bdrv_unref(old);
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index 41848b2..075384a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -44,6 +44,7 @@ typedef struct MirrorBlockJob {
 /* Used to block operations on the drive-mirror-replace target */
 Error *replace_blocker;
 bool is_none_mode;
+BlockMirrorBackingMode backing_mode;
 BlockdevOnError on_source_error, on_target_error;
 bool synced;
 bool should_complete;
@@ -742,20 +743,26 @@ static void mirror_set_speed(BlockJob *job, int64_t 
speed, Error **errp)
 static void mirror_complete(BlockJob *job, Error **errp)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-Error *local_err = NULL;
-int ret;
+BlockDriverState *src, *target;
+
+src = blk_bs(job->blk);
+target = blk_bs(s->target);
 
-ret = bdrv_open_backing_file(blk_bs(s->target), NULL, "backing",
- &local_err);
-if (ret < 0) {
-error_propagate(errp, local_err);
-return;
-}
 if (!s->synced) {
 error_setg(errp, QERR_BLOCK_JOB_NOT_READY, job->id);
 return;
 }
 
+if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
+int ret;
+
+assert(!target->backing);
+ret = bdrv_open_backing_file(target, NULL, "backing", errp);
+if (ret < 0) {
+return;
+}
+}
+
 /* check the target bs is not blocked and block all operations on it */
 if (s->replaces) {
 AioContext *replace_aio_context;
@@ -777,6 +784,13 @@ static void mirror_complete(BlockJob *job, Error **errp)
 aio_context_release(replace_aio_context);
 }
 
+if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
+BlockDriverState *backing = s->is_none_mode ? src : s->base;
+if (backing_bs(target) != backing) {
+bdrv_set_backing_hd(target, backing);
+}
+}
+
 s->shou

[Qemu-block] [PULL 33/39] rbd:change error_setg() to error_setg_errno()

2016-06-16 Thread Kevin Wolf
From: Vikhyat Umrao 

Ceph RBD block driver does not use error_setg_errno() where
it is possible to use. This patch replaces error_setg()
from error_setg_errno().

Signed-off-by: Vikhyat Umrao 
Message-id: 1462780319-5796-1-git-send-email-vum...@redhat.com
Reviewed-by: Josh Durgin 
Signed-off-by: Max Reitz 
---
 block/rbd.c | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 5bc5b32..5226b6f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -290,7 +290,8 @@ static int qemu_rbd_set_conf(rados_t cluster, const char 
*conf,
 if (only_read_conf_file) {
 ret = rados_conf_read_file(cluster, value);
 if (ret < 0) {
-error_setg(errp, "error reading conf file %s", value);
+error_setg_errno(errp, -ret, "error reading conf file %s",
+ value);
 break;
 }
 }
@@ -299,7 +300,7 @@ static int qemu_rbd_set_conf(rados_t cluster, const char 
*conf,
 } else if (!only_read_conf_file) {
 ret = rados_conf_set(cluster, name, value);
 if (ret < 0) {
-error_setg(errp, "invalid conf option %s", name);
+error_setg_errno(errp, -ret, "invalid conf option %s", name);
 ret = -EINVAL;
 break;
 }
@@ -354,9 +355,10 @@ static int qemu_rbd_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 clientname = qemu_rbd_parse_clientname(conf, clientname_buf);
-if (rados_create(&cluster, clientname) < 0) {
-error_setg(errp, "error initializing");
-return -EIO;
+ret = rados_create(&cluster, clientname);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "error initializing");
+return ret;
 }
 
 if (strstr(conf, "conf=") == NULL) {
@@ -381,21 +383,27 @@ static int qemu_rbd_create(const char *filename, QemuOpts 
*opts, Error **errp)
 return -EIO;
 }
 
-if (rados_connect(cluster) < 0) {
-error_setg(errp, "error connecting");
+ret = rados_connect(cluster);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "error connecting");
 rados_shutdown(cluster);
-return -EIO;
+return ret;
 }
 
-if (rados_ioctx_create(cluster, pool, &io_ctx) < 0) {
-error_setg(errp, "error opening pool %s", pool);
+ret = rados_ioctx_create(cluster, pool, &io_ctx);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "error opening pool %s", pool);
 rados_shutdown(cluster);
-return -EIO;
+return ret;
 }
 
 ret = rbd_create(io_ctx, name, bytes, &obj_order);
 rados_ioctx_destroy(io_ctx);
 rados_shutdown(cluster);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "error rbd create");
+return ret;
+}
 
 return ret;
 }
@@ -500,7 +508,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 clientname = qemu_rbd_parse_clientname(conf, clientname_buf);
 r = rados_create(&s->cluster, clientname);
 if (r < 0) {
-error_setg(errp, "error initializing");
+error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
@@ -546,19 +554,19 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 r = rados_connect(s->cluster);
 if (r < 0) {
-error_setg(errp, "error connecting");
+error_setg_errno(errp, -r, "error connecting");
 goto failed_shutdown;
 }
 
 r = rados_ioctx_create(s->cluster, pool, &s->io_ctx);
 if (r < 0) {
-error_setg(errp, "error opening pool %s", pool);
+error_setg_errno(errp, -r, "error opening pool %s", pool);
 goto failed_shutdown;
 }
 
 r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
 if (r < 0) {
-error_setg(errp, "error reading header from %s", s->name);
+error_setg_errno(errp, -r, "error reading header from %s", s->name);
 goto failed_open;
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 29/39] block: use the block job list in qmp_query_block_jobs()

2016-06-16 Thread Kevin Wolf
From: Alberto Garcia 

qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
this function can only find those in top-level BlockDriverStates.

This patch uses block_job_next() instead.

Signed-off-by: Alberto Garcia 
Message-id: 
a8b7e5497b7c1fa67c12fcceae1630d01c3b1f96.1464346103.git.be...@igalia.com
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 blockdev.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 11177b4..1d498c7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4157,22 +4157,18 @@ void qmp_x_blockdev_change(const char *parent, bool 
has_child,
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
-BlockDriverState *bs;
-BdrvNextIterator it;
+BlockJob *job;
 
-for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+AioContext *aio_context = blk_get_aio_context(job->blk);
 
 aio_context_acquire(aio_context);
-
-if (bs->job) {
-BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
-elem->value = block_job_query(bs->job);
-*p_next = elem;
-p_next = &elem->next;
-}
-
+elem->value = block_job_query(job);
 aio_context_release(aio_context);
+
+*p_next = elem;
+p_next = &elem->next;
 }
 
 return head;
-- 
1.8.3.1




[Qemu-block] [PULL 38/39] iotests: Add test for oVirt-like storage migration

2016-06-16 Thread Kevin Wolf
From: Max Reitz 

Signed-off-by: Max Reitz 
Message-id: 20160610185750.30956-6-mre...@redhat.com
Reviewed-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/156 | 174 +
 tests/qemu-iotests/156.out |  48 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 223 insertions(+)
 create mode 100755 tests/qemu-iotests/156
 create mode 100644 tests/qemu-iotests/156.out

diff --git a/tests/qemu-iotests/156 b/tests/qemu-iotests/156
new file mode 100755
index 000..cc95ff1
--- /dev/null
+++ b/tests/qemu-iotests/156
@@ -0,0 +1,174 @@
+#!/bin/bash
+#
+# Tests oVirt-like storage migration:
+#  - Create snapshot
+#  - Create target image with (not yet existing) target backing chain
+#(i.e. just write the name of a soon-to-be-copied-over backing file into 
it)
+#  - drive-mirror the snapshot to the target with mode=existing and sync=top
+#  - In the meantime, copy the original source files to the destination via
+#conventional means (i.e. outside of qemu)
+#  - Complete the drive-mirror job
+#  - Delete all source images
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm -f "$TEST_IMG{,.target}{,.backing,.overlay}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 qed
+_supported_proto generic
+_supported_os Linux
+
+# Create source disk
+TEST_IMG="$TEST_IMG.backing" _make_test_img 1M
+_make_test_img -b "$TEST_IMG.backing" 1M
+
+$QEMU_IO -c 'write -P 1 0 256k' "$TEST_IMG.backing" | _filter_qemu_io
+$QEMU_IO -c 'write -P 2 64k 192k' "$TEST_IMG" | _filter_qemu_io
+
+_launch_qemu -drive if=none,id=source,file="$TEST_IMG"
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'qmp_capabilities' }" \
+'return'
+
+# Create snapshot
+TEST_IMG="$TEST_IMG.overlay" _make_test_img -b "$TEST_IMG" 1M
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'blockdev-snapshot-sync',
+   'arguments': { 'device': 'source',
+  'snapshot-file': '$TEST_IMG.overlay',
+  'format': '$IMGFMT',
+  'mode': 'existing' } }" \
+'return'
+
+# Write something to the snapshot
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io source \"write -P 3 128k 128k\"' } }" \
+'return'
+
+# Create target image
+TEST_IMG="$TEST_IMG.target.overlay" _make_test_img -b "$TEST_IMG.target" 1M
+
+# Mirror snapshot
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'drive-mirror',
+   'arguments': { 'device': 'source',
+  'target': '$TEST_IMG.target.overlay',
+  'mode': 'existing',
+  'sync': 'top' } }" \
+'return'
+
+# Wait for convergence
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_JOB_READY'
+
+# Write some more
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io source \"write -P 4 192k 64k\"' } }" \
+'return'
+
+# Copy source backing chain to the target before completing the job
+cp "$TEST_IMG.backing" "$TEST_IMG.target.backing"
+cp "$TEST_IMG" "$TEST_IMG.target"
+$QEMU_IMG rebase -u -b "$TEST_IMG.target.backing" "$TEST_IMG.target"
+
+# Complete block job
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'block-job-complete',
+   'arguments': { 'device': 'source' } }" \
+''
+
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_JOB_COMPLETED'
+
+# Remove the source images
+rm -f "$TEST_IMG{,.backing,.overlay}"
+
+echo
+
+# Check online disk contents
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io source \"read -P 1 0k 64k\"' } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io source \"read -P 2 64k 64k\"' } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'human-moni

[Qemu-block] [PULL 39/39] hbitmap: add 'pos < size' asserts

2016-06-16 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

For now, fail in hbitmap_set on start + count > size will come from
hbitmap_set
  hb_count_between
hbitmap_iter_init
  assert(pos < hb->size)

This patch adds such checks to set/get/reset functions of hbitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 1465924093-76875-2-git-send-email-vsement...@virtuozzo.com
Signed-off-by: Max Reitz 
---
 util/hbitmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/util/hbitmap.c b/util/hbitmap.c
index 7121b11..99fd2ba 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -269,6 +269,7 @@ void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t 
count)
 start >>= hb->granularity;
 last >>= hb->granularity;
 count = last - start + 1;
+assert(last < hb->size);
 
 hb->count += count - hb_count_between(hb, start, last);
 hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
@@ -348,6 +349,7 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t 
count)
 
 start >>= hb->granularity;
 last >>= hb->granularity;
+assert(last < hb->size);
 
 hb->count -= hb_count_between(hb, start, last);
 hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
@@ -371,6 +373,7 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
 /* Compute position and bit in the last layer.  */
 uint64_t pos = item >> hb->granularity;
 unsigned long bit = 1UL << (pos & (BITS_PER_LONG - 1));
+assert(pos < hb->size);
 
 return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
-- 
1.8.3.1




Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-16 Thread Paolo Bonzini


On 16/06/2016 16:10, Colin Lord wrote:
> On 06/16/2016 10:05 AM, Paolo Bonzini wrote:
>>
>>
>> On 16/06/2016 16:00, Colin Lord wrote:
>> The only block drivers that can be converted into modules are the drivers
>> that don't perform any init operation except for registering themselves. 
>> This
>> is why libiscsi has been disabled as a module.

 I don't think it has in this patch :) but you can also move the
 iscsi_opts registration to vl.c.
>>>
>>> Yeah I think Stefan mentioned this point in one of the earlier threads
>>> but he said to do it in a separate commit, which I took to mean I
>>> shouldn't include it here. Should I add it as a third part to this patch
>>> series or leave it for a completely separate commit?
>>
>> The patches in the series are left separate when including them in QEMU.
>>  Therefore, a separate patch *is* (or will become :)) a separate commit.
> 
> Yep, mostly just wasn't sure whether it was considered to be related
> enough to include with the other two.

I think it is, since you had to refer to (the lack of) it in a commit
message.

In general, the beauty of patch series is that it's relatively cheap to
add a patch.  One should not overdo it, but a long series of
dependencies benefits neither the author nor the reviewer.

Paolo

>> Therefore, putting the change before this patch, or alternatively as the
>> first patch in the series, will be fine.
>>
>> Thanks,
>>
>> Paolo
>>
> 
> Sounds good, I'll work on putting that in the next version then.
> 
> Colin
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-16 Thread Colin Lord
On 06/16/2016 10:05 AM, Paolo Bonzini wrote:
> 
> 
> On 16/06/2016 16:00, Colin Lord wrote:
> The only block drivers that can be converted into modules are the drivers
> that don't perform any init operation except for registering themselves. 
> This
> is why libiscsi has been disabled as a module.
>>>
>>> I don't think it has in this patch :) but you can also move the
>>> iscsi_opts registration to vl.c.
>>
>> Yeah I think Stefan mentioned this point in one of the earlier threads
>> but he said to do it in a separate commit, which I took to mean I
>> shouldn't include it here. Should I add it as a third part to this patch
>> series or leave it for a completely separate commit?
> 
> The patches in the series are left separate when including them in QEMU.
>  Therefore, a separate patch *is* (or will become :)) a separate commit.

Yep, mostly just wasn't sure whether it was considered to be related
enough to include with the other two.
> 
> Therefore, putting the change before this patch, or alternatively as the
> first patch in the series, will be fine.
> 
> Thanks,
> 
> Paolo
> 

Sounds good, I'll work on putting that in the next version then.

Colin



Re: [Qemu-block] [PATCH v2 15/17] block: Switch discard length bounds to byte-based

2016-06-16 Thread Eric Blake
On 06/15/2016 11:46 PM, Fam Zheng wrote:
> On Tue, 06/14 15:30, Eric Blake wrote:
>> Sector-based limits are awkward to think about; in our on-going
>> quest to move to byte-based interfaces, convert max_discard and
>> discard_alignment.  Rename them, using 'pdiscard' as an aid to
>> track which remaining discard interfaces need conversion, and so
>> that the compiler will help us catch the change in semantics
>> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
>> is no longer needed; and the BlockLimits type is now completely
>> byte-based.
>>
>> Signed-off-by: Eric Blake 
>>

>>
>>  typedef struct BlockLimits {
>> -/* maximum number of sectors that can be discarded at once */
>> -int max_discard;
>> +/* maximum number of bytes that can be discarded at once (since it
>> + * is signed, it must be < 2G, if set), should be multiple of
>> + * pdiscard_alignment, but need not be power of 2. May be 0 if no
>> + * inherent 32-bit limit */
>> +int32_t max_pdiscard;
> 
> Why not use uint32_t?
> 
>>
>> -/* optimal alignment for discard requests in sectors */
>> -int64_t discard_alignment;
>> +/* optimal alignment for discard requests in bytes, must be power
>> + * of 2, less than max_discard if that is set, and multiple of
>> + * bs->request_alignment. May be 0 if bs->request_alignment is
>> + * good enough */
>> +uint32_t pdiscard_alignment;
>>
>>  /* maximum number of bytes that can zeroized at once (since it is
>> - * signed, it must be < 2G, if set) */
>> + * signed, it must be < 2G, if set), should be multiple of
>> + * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>  int32_t max_pwrite_zeroes;

Because max_pwrite_zeroes didn't.

And because we're still limited by INT_MAX (or it's alternative spelling
BDRV_REQUEST_MAX_SECTORS).

Maybe we should switch both to uint32_t, but that can be a followup.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()

2016-06-16 Thread Eric Blake
On 06/15/2016 11:50 PM, Fam Zheng wrote:
> On Tue, 06/14 15:30, Eric Blake wrote:
>> The raw block driver was blindly copying all limits from bs->file,
>> even though: 1. the main bdrv_refresh_limits() already does this
>> for many of gthe limits, and 2. blindly copying from the children
> 
> s/gthe/the ?

Yep. [Sometimes it's interesting to stick in a typo, just to see who
notices. Other times it's just me fat-fingering things]

> 
>> can weaken any stricter limits that were already inherited from
>> the backing dhain during the main bdrv_refresh_limits().  Also,
>> the next patch is about to move .request_alignment into
>> BlockLimits, and that is a limit that should NOT be copied from
>> other layers in the BDS chain.
>>
>> Solve the issue by factoring out a new bdrv_merge_limits(),
>> and using that function to properly merge limits when comparing
>> two BlockDriverState objects.
>>
>> Signed-off-by: Eric Blake 
> 
> Reviewed-by: Fam Zheng 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PULL 00/39] Block layer patches

2016-06-16 Thread Peter Maydell
On 16 June 2016 at 15:07, Kevin Wolf  wrote:
> The following changes since commit a66370b08d53837eb233cad090b3c2638084cc44:
>
>   Merge remote-tracking branch 
> 'remotes/amit-migration/tags/migration-for-2.7-4' into staging (2016-06-16 
> 10:53:33 +0100)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 60251f4d3ecfc705c137ff505aaf7c46f31cb91b:
>
>   Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' 
> into queue-block (2016-06-16 15:22:18 +0200)
>
> 
>
> Block layer patches
>



Applied to target-arm.next, thanks.

-- PMM



[Qemu-block] [PATCH] doc: Fix mailing list address in tests/qemu-iotests/README

2016-06-16 Thread Thomas Huth
The address of the mailing list is qemu-de...@nongnu.org
instead of qemu-de...@savannah.nongnu.org. And while we're
at it, also mention the qemu-block mailing list here.

Signed-off-by: Thomas Huth 
---
 tests/qemu-iotests/README | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/README b/tests/qemu-iotests/README
index 4ccfdd1..6079b40 100644
--- a/tests/qemu-iotests/README
+++ b/tests/qemu-iotests/README
@@ -17,4 +17,5 @@ additional options to test further image formats or I/O 
methods.
 * Feedback and patches
 
 Please send improvements to the test suite, general feedback or just
-reports of failing tests cases to qemu-de...@savannah.nongnu.org.
+reports of failing tests cases to qemu-de...@nongnu.org with a CC:
+to qemu-block@nongnu.org.
-- 
1.8.3.1




Re: [Qemu-block] [PATCH] linux-aio: Cancel BH if not needed

2016-06-16 Thread Stefan Hajnoczi
On Wed, Jun 15, 2016 at 01:16:42PM +0200, Kevin Wolf wrote:
> linux-aio uses a BH in order to make sure that the remaining completions
> are processed even in nested event loops of completion callbacks in
> order to avoid deadlocks.
> 
> There is no need, however, to have the BH overhead for the first call
> into qemu_laio_completion_bh() or after all pending completions have
> already been processed. Therefore, this patch calls directly into
> qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
> the BH after qemu_laio_completion_bh() has processed all pending
> completions.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/linux-aio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[Qemu-block] [PATCH] block/qcow2: Don't use cpu_to_*w()

2016-06-16 Thread Peter Maydell
Don't use the cpu_to_*w() functions, which we are trying to deprecate.
Instead either just use cpu_to_*() to do the byteswap, or use
st*_be_p() if we need to do the store somewhere other than to a
variable that's already the correct type.

Signed-off-by: Peter Maydell 
---
 block/qcow2-cluster.c  |  2 +-
 block/qcow2-refcount.c | 11 +--
 block/qcow2.c  |  6 +++---
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b04bfaf..4acf0e3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -117,7 +117,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 
 /* set new table */
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
-cpu_to_be32w((uint32_t*)data, new_l1_size);
+stl_be_p(data, new_l1_size);
 stq_be_p(data + 4, new_l1_table_offset);
 ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader, l1_size),
data, sizeof(data));
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 66f187a..088c00f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -565,8 +565,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 uint64_t d64;
 uint32_t d32;
 } data;
-cpu_to_be64w(&data.d64, table_offset);
-cpu_to_be32w(&data.d32, table_clusters);
+data.d64 = cpu_to_be64(table_offset);
+data.d32 = cpu_to_be32(table_clusters);
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_SWITCH_TABLE);
 ret = bdrv_pwrite_sync(bs->file->bs,
offsetof(QCowHeader, refcount_table_offset),
@@ -2158,10 +2158,9 @@ write_refblocks:
 }
 
 /* Enter new reftable into the image header */
-cpu_to_be64w(&reftable_offset_and_clusters.reftable_offset,
- reftable_offset);
-cpu_to_be32w(&reftable_offset_and_clusters.reftable_clusters,
- size_to_clusters(s, reftable_size * sizeof(uint64_t)));
+reftable_offset_and_clusters.reftable_offset = 
cpu_to_be64(reftable_offset);
+reftable_offset_and_clusters.reftable_clusters =
+cpu_to_be32(size_to_clusters(s, reftable_size * sizeof(uint64_t)));
 ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader,
   refcount_table_offset),
&reftable_offset_and_clusters,
diff --git a/block/qcow2.c b/block/qcow2.c
index 6f5fb81..860d7b7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,9 +2686,9 @@ static int make_completely_empty(BlockDriverState *bs)
 /* "Create" an empty reftable (one cluster) directly after the image
  * header and an empty L1 table three clusters after the image header;
  * the cluster between those two will be used as the first refblock */
-cpu_to_be64w(&l1_ofs_rt_ofs_cls.l1_offset, 3 * s->cluster_size);
-cpu_to_be64w(&l1_ofs_rt_ofs_cls.reftable_offset, s->cluster_size);
-cpu_to_be32w(&l1_ofs_rt_ofs_cls.reftable_clusters, 1);
+l1_ofs_rt_ofs_cls.l1_offset = cpu_to_be64(3 * s->cluster_size);
+l1_ofs_rt_ofs_cls.reftable_offset = cpu_to_be64(s->cluster_size);
+l1_ofs_rt_ofs_cls.reftable_clusters = cpu_to_be32(1);
 ret = bdrv_pwrite_sync(bs->file->bs, offsetof(QCowHeader, l1_table_offset),
&l1_ofs_rt_ofs_cls, sizeof(l1_ofs_rt_ofs_cls));
 if (ret < 0) {
-- 
1.9.1




[Qemu-block] [PATCH 1/3] block: fixed BdrvTrackedRequest filling in bdrv_co_discard

2016-06-16 Thread Denis V. Lunev
The request area is specified in bytes, not in sectors.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fam Zheng 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index fb99a71..aca175e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2261,8 +2261,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }
 
-tracked_request_begin(&req, bs, sector_num, nb_sectors,
-  BDRV_TRACKED_DISCARD);
+tracked_request_begin(&req, bs, sector_num << BDRV_SECTOR_BITS,
+  nb_sectors << BDRV_SECTOR_BITS, 
BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);
 
 max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
-- 
2.1.4




[Qemu-block] [PATCH v2 0/3] potentially broken drive-mirror/drive-backup in bdrv_co_discard

2016-06-16 Thread Denis V. Lunev
Actually I have found this problem running iotest 132 for active async
mirror I have sent yesturday. Anyway, the problem is actual for current
backup/mirror implementation.

bdrv_co_discard must mark areas dirty after writing zeroes, it must call
before_write_notifier chain to push underlying data to backup and it also
must properly fill tracked request information.

Changes from v1:
- fixed problem in patch as pointed out by Vova
- ported to current (was made on top of active block mirror)
- minor spelling changes in commit messages

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fam Zheng 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Max Reitz 

Denis V. Lunev (3):
  block: fixed BdrvTrackedRequest filling in bdrv_co_discard
  block: fix race in bdrv_co_discard with drive-mirror
  block: process before_write_notifiers in bdrv_co_discard

 block/io.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

-- 
2.1.4




[Qemu-block] [PATCH 3/3] block: process before_write_notifiers in bdrv_co_discard

2016-06-16 Thread Denis V. Lunev
This is mandatory for correct backup creation. In the other case the
content under this area would be lost.

Dirty bits are set exactly like in bdrv_aligned_pwritev, i.e. they are set
even if notifier has returned a error.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fam Zheng 
CC: Fam Zheng 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/io.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/io.c b/block/io.c
index 11fcfcb..2177cc6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2264,6 +2264,11 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 tracked_request_begin(&req, bs, sector_num << BDRV_SECTOR_BITS,
   nb_sectors << BDRV_SECTOR_BITS, 
BDRV_TRACKED_DISCARD);
 
+ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);
+if (ret < 0) {
+goto out;
+}
+
 max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors > 0) {
 int ret;
-- 
2.1.4




[Qemu-block] [PATCH 2/3] block: fix race in bdrv_co_discard with drive-mirror

2016-06-16 Thread Denis V. Lunev
Actually we must set dirty bitmap dirty after we have written all our
zeroes for correct processing in drive mirror code. In the other case
we can face not zeroes in this area in mirror_iteration.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fam Zheng 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index aca175e..11fcfcb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2263,7 +2263,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 
 tracked_request_begin(&req, bs, sector_num << BDRV_SECTOR_BITS,
   nb_sectors << BDRV_SECTOR_BITS, 
BDRV_TRACKED_DISCARD);
-bdrv_set_dirty(bs, sector_num, nb_sectors);
 
 max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors > 0) {
@@ -2312,6 +2311,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 }
 ret = 0;
 out:
+bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
+   req.bytes >> BDRV_SECTOR_BITS);
 tracked_request_end(&req);
 return ret;
 }
-- 
2.1.4




Re: [Qemu-block] [Qemu-devel] [PULL 00/39] Block layer patches

2016-06-16 Thread Peter Maydell
On 16 June 2016 at 18:04, Eric Blake  wrote:
> On 06/16/2016 09:06 AM, Peter Maydell wrote:
>>
>> Applied to target-arm.next, thanks.
>
> master, actually :)

Yep. Misclick in the "insert canned response" menu in my mail client...

-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH] block/qcow2: Don't use cpu_to_*w()

2016-06-16 Thread Eric Blake
On 06/16/2016 10:06 AM, Peter Maydell wrote:
> Don't use the cpu_to_*w() functions, which we are trying to deprecate.
> Instead either just use cpu_to_*() to do the byteswap, or use
> st*_be_p() if we need to do the store somewhere other than to a
> variable that's already the correct type.
> 
> Signed-off-by: Peter Maydell 
> ---
>  block/qcow2-cluster.c  |  2 +-
>  block/qcow2-refcount.c | 11 +--
>  block/qcow2.c  |  6 +++---
>  3 files changed, 9 insertions(+), 10 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PULL 00/39] Block layer patches

2016-06-16 Thread Eric Blake
On 06/16/2016 09:06 AM, Peter Maydell wrote:
> On 16 June 2016 at 15:07, Kevin Wolf  wrote:
>> The following changes since commit a66370b08d53837eb233cad090b3c2638084cc44:
>>
>>   Merge remote-tracking branch 
>> 'remotes/amit-migration/tags/migration-for-2.7-4' into staging (2016-06-16 
>> 10:53:33 +0100)
>>
>> are available in the git repository at:
>>
>>
>>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>>
>> for you to fetch changes up to 60251f4d3ecfc705c137ff505aaf7c46f31cb91b:
>>
>>   Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' 
>> into queue-block (2016-06-16 15:22:18 +0200)
>>
>> 
>>
>> Block layer patches
>>
> 
> 
> 
> Applied to target-arm.next, thanks.

master, actually :)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 0/3] potentially broken drive-mirror/drive-backup in bdrv_co_discard

2016-06-16 Thread Paolo Bonzini


On 16/06/2016 08:58, Denis V. Lunev wrote:
> Actually I have found this problem running iotest 132 for active async
> mirror I have sent yesturday. Anyway, the problem is actual for current
> backup/mirror implementation.
> 
> bdrv_co_discard must mark areas dirty after writing zeroes, it must call
> before_write_notifier chain to push underlying data to backup and it also
> must properly fill tracked request information.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 

Three good catches!  Thanks for these patches.  I think 2 and 3 should
be Cced to qemu-stable too.

Reviewed-by: Paolo Bonzini 

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] block: fixed BdrvTrackedRequest filling in bdrv_co_discard

2016-06-16 Thread Eric Blake
On 06/16/2016 10:09 AM, Denis V. Lunev wrote:
> The request area is specified in bytes, not in sectors.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> Reviewed-by: Fam Zheng 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake 

[By the way, next on my list of sector-to-byte conversions is discard,
so thanks for finding this first]

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 15/17] block: Switch discard length bounds to byte-based

2016-06-16 Thread Fam Zheng
On Thu, 06/16 08:21, Eric Blake wrote:
> On 06/15/2016 11:46 PM, Fam Zheng wrote:
> > On Tue, 06/14 15:30, Eric Blake wrote:
> >> Sector-based limits are awkward to think about; in our on-going
> >> quest to move to byte-based interfaces, convert max_discard and
> >> discard_alignment.  Rename them, using 'pdiscard' as an aid to
> >> track which remaining discard interfaces need conversion, and so
> >> that the compiler will help us catch the change in semantics
> >> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
> >> is no longer needed; and the BlockLimits type is now completely
> >> byte-based.
> >>
> >> Signed-off-by: Eric Blake 
> >>
> 
> >>
> >>  typedef struct BlockLimits {
> >> -/* maximum number of sectors that can be discarded at once */
> >> -int max_discard;
> >> +/* maximum number of bytes that can be discarded at once (since it
> >> + * is signed, it must be < 2G, if set), should be multiple of
> >> + * pdiscard_alignment, but need not be power of 2. May be 0 if no
> >> + * inherent 32-bit limit */
> >> +int32_t max_pdiscard;
> > 
> > Why not use uint32_t?
> > 
> >>
> >> -/* optimal alignment for discard requests in sectors */
> >> -int64_t discard_alignment;
> >> +/* optimal alignment for discard requests in bytes, must be power
> >> + * of 2, less than max_discard if that is set, and multiple of
> >> + * bs->request_alignment. May be 0 if bs->request_alignment is
> >> + * good enough */
> >> +uint32_t pdiscard_alignment;
> >>
> >>  /* maximum number of bytes that can zeroized at once (since it is
> >> - * signed, it must be < 2G, if set) */
> >> + * signed, it must be < 2G, if set), should be multiple of
> >> + * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
> >>  int32_t max_pwrite_zeroes;
> 
> Because max_pwrite_zeroes didn't.
> 
> And because we're still limited by INT_MAX (or it's alternative spelling
> BDRV_REQUEST_MAX_SECTORS).
> 
> Maybe we should switch both to uint32_t, but that can be a followup.
> 

OK, thanks!

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-16 Thread Eric Blake
On 06/16/2016 04:10 AM, Stefan Hajnoczi wrote:

> 
> io_sectors currently only accounts for bytes written, not bytes read.
> 
> Therefore, I think we need:
> 
> /* Don't charge for efficient zero writes */
> if (drv->bdrv_co_pwrite_zeroes) {
> io_sectors = 0;
> }

That's not sufficient.  NBD will have conditional support for write
zeroes, depending on whether the server supports it (that is, once my
patches for NBD_CMD_WRITE_ZEROES get out of a holding pattern on all the
other patches in the queue being flushed...).  So NBD will have the
bdrv_co_pwrite_zeroes callback, but that doesn't mean it will always
work - if the server doesn't support it, calling bdrv_co_pwrite_zeroes
will fail with -ENOTSUP and fall back to less-efficient writes that need
to be accounted for.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 7/7] sheepdog: remove useless casts

2016-06-16 Thread Hitoshi Mitake
At Wed, 15 Jun 2016 18:14:37 +0200,
Laurent Vivier wrote:
> 
> This patch is the result of coccinelle script
> scripts/coccinelle/typecast.cocci
> 
> CC: Hitoshi Mitake 
> CC: qemu-block@nongnu.org
> Signed-off-by: Laurent Vivier 
> ---
>  block/sheepdog.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Hitoshi Mitake 

Thanks,
Hitoshi

> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 23fbace..95e6a11 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -1049,7 +1049,7 @@ static int parse_vdiname(BDRVSheepdogState *s, const 
> char *filename,
>  const char *host_spec, *vdi_spec;
>  int nr_sep, ret;
>  
> -strstart(filename, "sheepdog:", (const char **)&filename);
> +strstart(filename, "sheepdog:", &filename);
>  p = q = g_strdup(filename);
>  
>  /* count the number of separators */
> @@ -2652,7 +2652,7 @@ static int sd_snapshot_list(BlockDriverState *bs, 
> QEMUSnapshotInfo **psn_tab)
>  req.opcode = SD_OP_READ_VDIS;
>  req.data_length = max;
>  
> -ret = do_req(fd, s->aio_context, (SheepdogReq *)&req,
> +ret = do_req(fd, s->aio_context, &req,
>   vdi_inuse, &wlen, &rlen);
>  
>  closesocket(fd);
> -- 
> 2.5.5
> 
> 



Re: [Qemu-block] [PATCH v20 Resend 00/10] Block replication for continuous checkpoints

2016-06-16 Thread Changlong Xie

For v19, Stefan said he had reviewed most part of this patchsets.
So, this series need more comments from block and block job maintainers.

@Jeff and/or Kevin, ping...

On 06/14/2016 03:53 PM, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v20-resend

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v19

TODO:
1. Continuous block replication. It will be started after basic functions
are accepted.

Changs Log:
V20 Resend:
1. Resend to avoid bothering qemu-trivial maintainers
2. Address comments from Eric, fix header file issue and add a brief commit 
message for p7
V20:
1. Rebase to the lastest code
2. Address comments from stefan
p8:
1. error_setg() with an error message when check_top_bs() fails.
2. remove bdrv_ref(s->hidden_disk->bs) since commit 5c438bc6
3. use bloc_job_cancel_sync() before active commit
p9:
1. fix uninitialized 'pattern_buf'
2. introduce mkstemp(3) to fix unique filenames
3. use qemu_vfree() for qemu_blockalign() memory
4. add missing replication_start_all()
5. remove useless pattern for io_write()
V19:
1. Rebase to v2.6.0
2. Address comments from stefan
p3: a new patch that export interfaces for extra serialization
p8:
1. call replication_stop() before freeing s->top_id
2. check top_bs
3. reopen file readonly in error return paths
4. enable extra serialization between read and COW
p9: try to hanlde SIGABRT
V18:
p6: add local_err in all replication callbacks to prevent "errp == NULL"
p7: add missing qemu_iovec_destroy(xxx)
V17:
1. Rebase to the lastest codes
p2: refactor backup_do_checkpoint addressed comments from Jeff Cody
p4: fix bugs in "drive_add buddy xxx" hmp commands
p6: add "since: 2.7"
p7: fix bug in replication_close(), add missing "qapi/error.h", add 
test-replication
p8: add "since: 2.7"
V16:
1. Rebase to the newest codes
2. Address comments from Stefan & hailiang
p3: we don't need this patch now
p4: add "top-id" parameters for secondary
p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
add doc comments that explain the semantics of Replication
p7: Refactor AioContext for thread-safe, remove unnecessary get_top_bs()
*Note*: I'm working on replication testcase now, will send out in V17
V15:
1. Rebase to the newest codes
2. Fix typos and coding style addresed Eric's comments
3. Address Stefan's comments
1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
2) Update the message and description for [PATCH 4/9]
3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant marcos and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replcace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting blcok replication if it is not
opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's 
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects