Re: [PATCH v5] iotests: Test NBD client reconnection

2019-11-14 Thread Andrey Shinkevich



On 14/11/2019 18:09, Eric Blake wrote:
> On 11/11/19 9:39 PM, Andrey Shinkevich wrote:
>> The test for an NBD client. The NBD server is disconnected after the
>> client write request. The NBD client should reconnect and complete
>> the write operation.
>>
>> Suggested-by: Denis V. Lunev 
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Andrey Shinkevich 
>> Reviewed-by: Eric Blake 
>> Tested-by: Eric Blake 
>> ---
>> v5:  "" were replaced with '' in the test except the function comments.
>>  The 'quick' word was added to the 'qroup' file next to the test 
>> #277.
> 
> Queuing for 4.2-rc2 through my NBD tree.
> 
> 

Eric,
Thank you very much.
-- 
With the best regards,
Andrey Shinkevich




[PATCH v3 4/4] tests: More iotest 223 improvements

2019-11-14 Thread Eric Blake
Run the core of the test twice, once without iothreads, and again
with, for more coverage of both setups.

Suggested-by: Nir Soffer 
Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/223 | 16 ++-
 tests/qemu-iotests/223.out | 85 +-
 2 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index b5a80e50bbc1..ea69cd4b8b5e 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -117,10 +117,19 @@ _send_qemu_cmd $QEMU_HANDLE 
'{"execute":"qmp_capabilities"}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
   "arguments":{"driver":"qcow2", "node-name":"n",
 "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
-_send_qemu_cmd $QEMU_HANDLE '{"execute":"x-blockdev-set-iothread",
-  "arguments":{"node-name":"n", "iothread":"io0"}}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"block-dirty-bitmap-disable",
   "arguments":{"node":"n", "name":"b"}}' "return"
+
+for attempt in normal iothread; do
+
+echo
+echo "=== Set up NBD with $attempt access ==="
+echo
+if [ $attempt = iothread ]; then
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"x-blockdev-set-iothread",
+  "arguments":{"node-name":"n", "iothread":"io0"}}' "return"
+fi
+
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
   "arguments":{"device":"n"}}' "error" # Attempt add without server
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
@@ -180,6 +189,9 @@ _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}' "error" # Attempt duplicate clean
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "error" # Again
+
+done
+
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' "return"
 wait=yes _cleanup_qemu

diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 861ddbd9e0a4..f17559880268 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -28,10 +28,91 @@ wrote 2097152/2097152 bytes at offset 2097152
 {"return": {}}
 {"execute":"blockdev-add", "arguments":{"driver":"IMGFMT", "node-name":"n", 
"file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT"}}}
 {"return": {}}
-{"execute":"x-blockdev-set-iothread", "arguments":{"node-name":"n", 
"iothread":"io0"}}
-{"return": {}}
 {"execute":"block-dirty-bitmap-disable", "arguments":{"node":"n", "name":"b"}}
 {"return": {}}
+
+=== Set up NBD with normal access ===
+
+{"execute":"nbd-server-add", "arguments":{"device":"n"}}
+{"error": {"class": "GenericError", "desc": "NBD server not running"}}
+{"execute":"nbd-server-start", "arguments":{"addr":{"type":"unix", 
"data":{"path":"SOCK_DIR/nbd"
+{"return": {}}
+{"execute":"nbd-server-start", "arguments":{"addr":{"type":"unix", 
"data":{"path":"SOCK_DIR/nbd1"
+{"error": {"class": "GenericError", "desc": "NBD server already running"}}
+exports available: 0
+{"execute":"nbd-server-add", "arguments":{"device":"n", "bitmap":"b"}}
+{"return": {}}
+{"execute":"nbd-server-add", "arguments":{"device":"nosuch"}}
+{"error": {"class": "GenericError", "desc": "Cannot find device=nosuch nor 
node_name=nosuch"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n"}}
+{"error": {"class": "GenericError", "desc": "NBD server already has export 
named 'n'"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", 
"bitmap":"b2"}}
+{"error": {"class": "GenericError", "desc": "Enabled bitmap 'b2' incompatible 
with readonly export"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", 
"bitmap":"b3"}}
+{"error": {"class": "GenericError", "desc": "Bitmap 'b3' is not found"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", 
"writable":true, "bitmap":"b2"}}
+{"return": {}}
+exports available: 2
+ export: 'n'
+  size:  4194304
+  flags: 0x58f ( readonly flush fua df multi cache )
+  min block: 1
+  opt block: 4096
+  max block: 33554432
+  available meta contexts: 2
+   base:allocation
+   qemu:dirty-bitmap:b
+ export: 'n2'
+  size:  4194304
+  flags: 0xced ( flush fua trim zeroes df cache fast-zero )
+  min block: 1
+  opt block: 4096
+  max block: 33554432
+  available meta contexts: 2
+   base:allocation
+   qemu:dirty-bitmap:b2
+
+=== Contrast normal status to large granularity dirty-bitmap ===
+
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2097152/2097152 bytes at offset 2097152
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET},
+{ "start": 104857

[PATCH v3 1/4] iotests: Fix 173

2019-11-14 Thread Eric Blake
This test has been broken since 3.0.  It used TEST_IMG to influence
the name of a file created during _make_test_img, but commit 655ae6bb
changed things so that the wrong file name is being created, which
then caused _launch_qemu to fail.  In the meantime, the set of events
issued for the actions of the test has increased.

Why haven't we noticed the failure? Because the test rarely gets run:
'./check -qcow2 173' is insufficient (that defaults to using file protocol)
'./check -nfs 173' is insufficient (that defaults to using raw format)
so the test is only run with:
./check -qcow2 -nfs 173

Note that we already have a number of other problems with -nfs:
./check -nfs (fails 18/30)
./check -qcow2 -nfs (fails 45/76 after this patch, if exports does
not permit 'insecure')
and it's not on my priority list to fix those.  Rather, I found this
because of my next patch's work on tests using _send_qemu_cmd.

Fixes: 655ae6b
Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/173 | 4 ++--
 tests/qemu-iotests/173.out | 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
index 9e2fa2e73cb9..29dcaa1960df 100755
--- a/tests/qemu-iotests/173
+++ b/tests/qemu-iotests/173
@@ -47,9 +47,9 @@ size=100M
 BASE_IMG="${TEST_DIR}/image.base"
 TOP_IMG="${TEST_DIR}/image.snp1"

-TEST_IMG="${BASE_IMG}" _make_test_img $size
+TEST_IMG_FILE="${BASE_IMG}" _make_test_img $size

-TEST_IMG="${TOP_IMG}" _make_test_img $size
+TEST_IMG_FILE="${TOP_IMG}" _make_test_img $size

 echo
 echo === Running QEMU, using block-stream to find backing image ===
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
index f477a0099a32..e83d17ec2f64 100644
--- a/tests/qemu-iotests/173.out
+++ b/tests/qemu-iotests/173.out
@@ -7,6 +7,10 @@ Formatting 'TEST_DIR/image.snp1', fmt=IMGFMT size=104857600
 {"return": {}}
 {"return": {}}
 {"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk2"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "disk2", "len": 104857600, "offset": 
104857600, "speed": 0, "type": "stream"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "disk2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "disk2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "disk2", "len": 0, "offset": 0, 
"speed": 0, "type": "stream"}}
 *** done
-- 
2.21.0




[PATCH v3 3/4] iotests: Include QMP input in .out files

2019-11-14 Thread Eric Blake
We generally include relevant HMP input in .out files, by virtue of
the fact that HMP echoes its input.  But QMP does not, so we have to
explicitly inject it in the output stream (appropriately filtered to
keep the tests passing), in order to make it easier to read .out files
to see what behavior is being tested (especially true where the output
file is a sequence of {'return': {}}).

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/common.qemu |  9 
 tests/qemu-iotests/085.out | 26 ++
 tests/qemu-iotests/094.out |  4 ++
 tests/qemu-iotests/095.out |  2 +
 tests/qemu-iotests/109.out | 88 ++
 tests/qemu-iotests/117.out |  5 ++
 tests/qemu-iotests/127.out |  4 ++
 tests/qemu-iotests/140.out |  5 ++
 tests/qemu-iotests/141.out | 26 ++
 tests/qemu-iotests/143.out |  3 ++
 tests/qemu-iotests/144.out |  5 ++
 tests/qemu-iotests/153.out | 11 +
 tests/qemu-iotests/156.out | 11 +
 tests/qemu-iotests/161.out |  8 
 tests/qemu-iotests/173.out |  4 ++
 tests/qemu-iotests/182.out |  8 
 tests/qemu-iotests/183.out | 11 +
 tests/qemu-iotests/185.out | 18 +++
 tests/qemu-iotests/191.out |  8 
 tests/qemu-iotests/200.out |  1 +
 tests/qemu-iotests/223.out | 19 
 tests/qemu-iotests/229.out |  3 ++
 tests/qemu-iotests/249.out |  6 +++
 23 files changed, 285 insertions(+)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 8d2021a7eb0c..de680cf1c7c9 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -123,6 +123,9 @@ _timed_wait_for()
 # until either timeout, or a response.  If it is not set, or <=0,
 # then the command is only sent once.
 #
+# If neither $silent nor $mismatch_only is set, and $cmd begins with '{',
+# echo the command before sending it the first time.
+#
 # If $qemu_error_no_exit is set, then even if the expected response
 # is not seen, we will not exit.  $QEMU_STATUS[$1] will be set it -1 in
 # that case.
@@ -152,6 +155,12 @@ _send_qemu_cmd()
 shift $(($# - 2))
 fi

+# Display QMP being sent, but not HMP (since HMP already echoes its
+# input back to output); decide based on leading '{'
+if [ -z "$silent" ] && [ -z "$mismatch_only" ] &&
+[ "$cmd" != "${cmd#\{}" ]; then
+echo "${cmd}" | _filter_testdir | _filter_imgfmt
+fi
 while [ ${count} -gt 0 ]
 do
 echo "${cmd}" >&${QEMU_IN[${h}]}
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 2a5f256cd3ec..bb50227b8265 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -7,48 +7,61 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728

 === Sending capabilities ===

+{ 'execute': 'qmp_capabilities' }
 {"return": {}}

 === Create a single snapshot on virtio0 ===

+{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 
'snapshot-file':'TEST_DIR/1-snapshot-v0.IMGFMT', 'format': 'IMGFMT' } }
 Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 
backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 
lazy_refcounts=off refcount_bits=16
 {"return": {}}

 === Invalid command - missing device and nodename ===

+{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 
'snapshot-file':'TEST_DIR/1-snapshot-v0.IMGFMT', 'format': 'IMGFMT' } }
 {"error": {"class": "GenericError", "desc": "Cannot find device= nor 
node_name="}}

 === Invalid command - missing snapshot-file ===

+{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 
'format': 'IMGFMT' } }
 {"error": {"class": "GenericError", "desc": "Parameter 'snapshot-file' is 
missing"}}


 === Create several transactional group snapshots ===

+{ 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 
'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 
'TEST_DIR/2-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' 
: { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/2-snapshot-v1.IMGFMT' } } ] 
} }
 Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 
backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 
lazy_refcounts=off refcount_bits=16
 Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 
backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 
lazy_refcounts=off refcount_bits=16
 {"return": {}}
+{ 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 
'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 
'TEST_DIR/3-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' 
: { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/3-snapshot-v1.IMGFMT' } } ] 
} }
 Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 
backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 
lazy_refcoun

[PATCH v3 2/4] iotests: Switch nbd tests to use Unix rather than TCP

2019-11-14 Thread Eric Blake
Up to now, all it took to cause a lot of iotest failures was to have a
background process such as 'nbdkit -p 10810 null' running, because we
hard-coded the TCP port.  Switching to a Unix socket eliminates this
contention.  We still have TCP coverage in test 233, and that test is
more careful to not pick a hard-coded port.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/common.filter | 6 --
 tests/qemu-iotests/common.rc | 8 
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index f870e00e4421..5367deea398e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -127,7 +127,8 @@ _filter_img_create()
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
--e 's#nbd:127.0.0.1:10810#TEST_DIR/t.IMGFMT#g' \
+-e 's#nbd:127.0.0.1:[0-9]\\+#TEST_DIR/t.IMGFMT#g' \
+-e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
 -e "s# encryption=off##g" \
 -e "s# cluster_size=[0-9]\\+##g" \
 -e "s# table_size=[0-9]\\+##g" \
@@ -164,7 +165,8 @@ _filter_img_info()
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
--e 's#nbd://127.0.0.1:10810$#TEST_DIR/t.IMGFMT#g' \
+-e 's#nbd://127.0.0.1:[0-9]\\+$#TEST_DIR/t.IMGFMT#g' \
+-e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
 -e 's#json.*vdisk-id.*vxhs"}}#TEST_DIR/t.IMGFMT#' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index fa7bae24226a..f772dcb67322 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -217,7 +217,7 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 TEST_IMG="$DRIVER,file.filename=$TEST_DIR/t.$IMGFMT"
 elif [ "$IMGPROTO" = "nbd" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
-TEST_IMG="$DRIVER,file.driver=nbd,file.host=127.0.0.1,file.port=10810"
+
TEST_IMG="$DRIVER,file.driver=nbd,file.type=unix,file.path=$SOCKDIR/$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
TEST_IMG="$DRIVER,file.driver=ssh,file.host=127.0.0.1,file.path=$TEST_IMG_FILE"
@@ -233,7 +233,7 @@ else
 TEST_IMG=$TEST_DIR/t.$IMGFMT
 elif [ "$IMGPROTO" = "nbd" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
-TEST_IMG="nbd:127.0.0.1:10810"
+TEST_IMG="nbd+unix:///?socket=$SOCK_DIR/nbd"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
REMOTE_TEST_DIR="ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?$TEST_DIR"
@@ -293,7 +293,7 @@ _stop_nbd_server()
 local QEMU_NBD_PID
 read QEMU_NBD_PID < "${QEMU_TEST_DIR}/qemu-nbd.pid"
 kill ${QEMU_NBD_PID}
-rm -f "${QEMU_TEST_DIR}/qemu-nbd.pid"
+rm -f "${QEMU_TEST_DIR}/qemu-nbd.pid" "$SOCK_DIR/nbd"
 fi
 }

@@ -353,7 +353,7 @@ _make_test_img()
 if [ $IMGPROTO = "nbd" ]; then
 # Pass a sufficiently high number to -e that should be enough for all
 # tests
-eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT -e 42 -x '' 
$TEST_IMG_FILE >/dev/null &"
+eval "$QEMU_NBD -v -t -k '$SOCK_DIR/nbd' -f $IMGFMT -e 42 -x '' 
$TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi

-- 
2.21.0




Re: [PATCH v2 1/3] hw/block/pflash: Remove dynamic field width from trace events

2019-11-14 Thread Philippe Mathieu-Daudé

Hi Eric,

On 11/8/19 4:56 PM, Eric Blake wrote:

On 11/8/19 8:40 AM, Philippe Mathieu-Daudé wrote:

Since not all trace backends support dynamic field width in
format (dtrace via stap does not), replace by a static field
width instead.

Reported-by: Eric Blake 
Buglink: https://bugs.launchpad.net/qemu/+bug/1844817
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/block/pflash_cfi01.c | 8 
  hw/block/pflash_cfi02.c | 8 
  hw/block/trace-events   | 8 
  3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 566c0acb77..787d1196f2 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -276,7 +276,7 @@ static uint32_t pflash_data_read(PFlashCFI01 *pfl, 
hwaddr offset,

  DPRINTF("BUG in %s\n", __func__);
  abort();
  }
-    trace_pflash_data_read(offset, width << 1, ret);
+    trace_pflash_data_read(offset, width << 3, ret);


Umm, why is width changing?  That's not mentioned in the commit message.


Previously it was used to set the format width: [1, 2, 4] -> [2, 4, 8].

We usually log the width in byte (accessed at memory location) or bits 
(used by the bus). When using this device I'm custom to think in bus 
access width.


Regardless whichever format we prefer, a change is needed.



@@ -389,7 +389,7 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, 
hwaddr offset,

  break;
  }
-    trace_pflash_io_read(offset, width, width << 1, ret, pfl->cmd, 
pfl->wcycle);
+    trace_pflash_io_read(offset, width << 3, ret, pfl->cmd, 
pfl->wcycle);


And even this one is odd.  Matching up to the trace messages:


-pflash_io_read(uint64_t offset, int width, int fmt_width, uint32_t 
value, uint8_t cmd, uint8_t wcycle) "offset:0x%04"PRIx64" width:%d 
value:0x%0*x cmd:0x%02x wcycle:%u"


+pflash_io_read(uint64_t offset, int width, uint32_t value, uint8_t 
cmd, uint8_t wcycle) "offset:0x%04"PRIx64" width:%d value:0x%04x 
cmd:0x%02x wcycle:%u"


you are changing from:

"%04"PRIx64" %d %0*x...", offset, width, width << 1, ret,...

(where width<<1, ret matches *x)

into

"%04"PRIx64" %d %04x...", offset, width << 3, ret,...

where you are now printing a different value for width.


Do you prefer using a "-bit" suffix? As

"offset:0x%04"PRIx64" width:%d-bit value:0x%04x cmd:0x%02x wcycle:%u"

I can also simply remove this information. Ideally I'd revert this patch 
once the we get this format parsable by the SystemTap backend.





Re: [PATCH v2 2/3] hw/mips/gt64xxx: Remove dynamic field width from trace events

2019-11-14 Thread Philippe Mathieu-Daudé

On 11/8/19 4:58 PM, Eric Blake wrote:

On 11/8/19 8:40 AM, Philippe Mathieu-Daudé wrote:

Since not all trace backends support dynamic field width in
format (dtrace via stap does not), replace by a static field
width instead.

Reported-by: Eric Blake 
Buglink: https://bugs.launchpad.net/qemu/+bug/1844817
Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Do not update qemu_log_mask()
---
  hw/mips/gt64xxx_pci.c | 16 
  hw/mips/trace-events  |  4 ++--
  2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
index 5cab9c1ee1..6743e7c929 100644
--- a/hw/mips/gt64xxx_pci.c
+++ b/hw/mips/gt64xxx_pci.c
@@ -642,19 +642,19 @@ static void gt64120_writel(void *opaque, hwaddr 
addr,

  /* not really implemented */
  s->regs[saddr] = ~(~(s->regs[saddr]) | ~(val & 0xfffe));
  s->regs[saddr] |= !!(s->regs[saddr] & 0xfffe);
-    trace_gt64120_write("INTRCAUSE", size << 1, val);
+    trace_gt64120_write("INTRCAUSE", size << 3, val);


Again, this isn't mentioned in the commit message.  Why are you changing 
parameter values?




+++ b/hw/mips/trace-events
@@ -1,4 +1,4 @@
  # gt64xxx.c
-gt64120_read(const char *regname, int width, uint64_t value) "gt64120 
read %s value:0x%0*" PRIx64
-gt64120_write(const char *regname, int width, uint64_t value) 
"gt64120 write %s value:0x%0*" PRIx64
+gt64120_read(const char *regname, int width, uint64_t value) "gt64120 
read %s width:%d value:0x%08" PRIx64
+gt64120_write(const char *regname, int width, uint64_t value) 
"gt64120 write %s width:%d value:0x%08" PRIx64


Huh, we were really broken - the old code (if passed to printf) would 
try to parse 4 parameters, even though it was only passed 3.  But it 
looks like you still need a v3.


Oops. I am surprise the compiler doesn't emit a warning here...




Re: [PATCH v2 00/10] Further bitmaps improvements

2019-11-14 Thread Vladimir Sementsov-Ogievskiy
14.11.2019 21:47, Eric Blake wrote:
> On 10/22/19 7:58 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> The main feature here is improvement of _next_dirty_area API, which I'm
>> going to use then for backup / block-copy.
>>
>> v2:
>> 01: just use INT64_MAX instead of adding new constant
>> 08: add separate function nbd_extent_array_convert_to_be and converted
>>  state of NBDExtentArray, to make these things explicit, and avoid
>>  extra memdup.
>> 09: Save part of comment for bitmap_to_extents(), add Eric's r-b
> 
> Is any of this series a bug fix important to get into -rc2?

Nothing

> Or is it safe to defer to the 5.0 timeframe?

Yes, no doubts.

> 
>>
>> Vladimir Sementsov-Ogievskiy (10):
>>    hbitmap: assert that we don't create bitmap larger than INT64_MAX
>>    hbitmap: move hbitmap_iter_next_word to hbitmap.c
>>    hbitmap: unpublish hbitmap_iter_skip_words
>>    hbitmap: drop meta bitmaps as they are unused
>>    block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
>>    block/dirty-bitmap: add _next_dirty API
>>    block/dirty-bitmap: improve _next_dirty_area API
>>    nbd/server: introduce NBDExtentArray
>>    nbd/server: use bdrv_dirty_bitmap_next_dirty_area
>>    block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty
>>
> 


-- 
Best regards,
Vladimir


Re: [PATCH v2 00/10] Further bitmaps improvements

2019-11-14 Thread Eric Blake

On 10/22/19 7:58 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi!

The main feature here is improvement of _next_dirty_area API, which I'm
going to use then for backup / block-copy.

v2:
01: just use INT64_MAX instead of adding new constant
08: add separate function nbd_extent_array_convert_to_be and converted
 state of NBDExtentArray, to make these things explicit, and avoid
 extra memdup.
09: Save part of comment for bitmap_to_extents(), add Eric's r-b


Is any of this series a bug fix important to get into -rc2?  Or is it 
safe to defer to the 5.0 timeframe?




Vladimir Sementsov-Ogievskiy (10):
   hbitmap: assert that we don't create bitmap larger than INT64_MAX
   hbitmap: move hbitmap_iter_next_word to hbitmap.c
   hbitmap: unpublish hbitmap_iter_skip_words
   hbitmap: drop meta bitmaps as they are unused
   block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
   block/dirty-bitmap: add _next_dirty API
   block/dirty-bitmap: improve _next_dirty_area API
   nbd/server: introduce NBDExtentArray
   nbd/server: use bdrv_dirty_bitmap_next_dirty_area
   block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [QEMU-SECURITY] ide: fix assertion in ide_dma_cb() to prevent qemu DoS from quest

2019-11-14 Thread Alexander Popov
On 07.11.2019 01:05, Alexander Popov wrote:
> On 06.11.2019 15:05, Michael S. Tsirkin wrote:
>> Do you want to cook up a patch like this then?
> 
> Yes, I will take this task and return with a patch.
> 
> Thanks!

I've just sent the v2 of the patch.
Looking forward to your feedback.

Best regards,
Alexander



[PATCH v2 1/1] ide: check DMA transfer size in ide_dma_cb() to prevent qemu DoS from quests

2019-11-14 Thread Alexander Popov
The commit a718978ed58a from July 2015 introduced the assertion which
implies that the size of successful DMA transfers handled in ide_dma_cb()
should be multiple of 512 (the size of a sector). But guest systems can
initiate DMA transfers that don't fit this requirement.

PoC for Linux that uses SCSI_IOCTL_SEND_COMMAND to perform such an ATA
command and crash qemu:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define CMD_SIZE 2048

struct scsi_ioctl_cmd_6 {
unsigned int inlen;
unsigned int outlen;
unsigned char cmd[6];
unsigned char data[];
};

int main(void)
{
intptr_t fd = 0;
struct scsi_ioctl_cmd_6 *cmd = NULL;

cmd = malloc(CMD_SIZE);
if (!cmd) {
perror("[-] malloc");
return 1;
}

memset(cmd, 0, CMD_SIZE);
cmd->inlen = 1337;
cmd->cmd[0] = READ_6;

fd = open("/dev/sg0", O_RDONLY);
if (fd == -1) {
perror("[-] opening sg");
return 1;
}

printf("[+] sg0 is opened\n");

printf("[.] qemu should break here:\n");
fflush(stdout);
ioctl(fd, SCSI_IOCTL_SEND_COMMAND, cmd);
printf("[-] qemu didn't break\n");

free(cmd);

return 1;
}

For fixing that let's check the number of bytes prepared for the transfer
by the prepare_buf() handler. If it is not a multiple of 512 then end
the DMA transfer with an error.

That also fixes the I/O stall in guests after a DMA transfer request
for less than the size of a sector.

Signed-off-by: Alexander Popov 
---
 hw/ide/core.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 754ff4dc34..85aac614f0 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -849,6 +849,7 @@ static void ide_dma_cb(void *opaque, int ret)
 int64_t sector_num;
 uint64_t offset;
 bool stay_active = false;
+int32_t prepared = 0;
 
 if (ret == -EINVAL) {
 ide_dma_error(s);
@@ -892,12 +893,10 @@ static void ide_dma_cb(void *opaque, int ret)
 n = s->nsector;
 s->io_buffer_index = 0;
 s->io_buffer_size = n * 512;
-if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 512) {
-/* The PRDs were too short. Reset the Active bit, but don't raise an
- * interrupt. */
-s->status = READY_STAT | SEEK_STAT;
-dma_buf_commit(s, 0);
-goto eot;
+prepared = s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size);
+if (prepared % 512) {
+ide_dma_error(s);
+return;
 }
 
 trace_ide_dma_cb(s, sector_num, n, IDE_DMA_CMD_str(s->dma_cmd));
-- 
2.21.0




Re: Convert VMDK to RAW

2019-11-14 Thread Max Reitz
On 14.11.19 17:12, janine.schnei...@fau.de wrote:
> Ladies and Gentlemen,
> 
>  
> 
> I am a PhD student at the Friedrich-Alexander-University
> Erlangen-Nuremberg in Bavaria, Germany and am currently working on a
> forensic reconstruction tool. The tool can be used to analyze physical
> and virtual hard disks and to reconstruct files. I would now like to
> extend the tool so that it is able to analyze VMDK files and convert
> them to raw. Unfortunately I have not been able to understand how to
> correctly unpack and assemble VMDK containers. Since qemu is able to
> convert VMDK to raw, I wanted to ask you if you could explain to me how
> to put the grains together?

Hi,

I’m not quite sure what you mean by a “VMDK container”.  VMDK disk
images can consist of multiple files that are linked together by a
descriptor file.  In theory all you need to do is tell qemu-img to
convert that descriptor file into a raw image.  For example:

(Sorry, I don’t know much about VMware, so all I can do is use qemu
tools to demonstrate)

$ qemu-img create -f vmdk -o subformat=twoGbMaxExtentSparse foo.vmdk 4G
Formatting 'foo.vmdk', fmt=vmdk size=4294967296 compat6=off
hwversion=undefined subformat=twoGbMaxExtentSparse
$ ls
foo-s001.vmdk  foo-s002.vmdk  foo.vmdk
$

In this example, foo.vmdk is the descriptor file and it points to the
other two (data) files:

$ cat foo.vmdk
# Disk DescriptorFile
version=1
CID=6d8d65ed
parentCID=
createType="twoGbMaxExtentSparse"

# Extent description
RW 4194304 SPARSE "foo-s001.vmdk"
RW 4194304 SPARSE "foo-s002.vmdk"

# The Disk Data Base
#DDB

ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "8322"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "ide"
$


So to convert this VMDK disk image to a raw image, you’d simply do this:

$ qemu-img convert -f vmdk -O raw -p foo.vmdk foo.img
(100.00/100%)
$

Max



signature.asc
Description: OpenPGP digital signature


Convert VMDK to RAW

2019-11-14 Thread janine.schneider
Ladies and Gentlemen,

 

I am a PhD student at the Friedrich-Alexander-University Erlangen-Nuremberg
in Bavaria, Germany and am currently working on a forensic reconstruction
tool. The tool can be used to analyze physical and virtual hard disks and to
reconstruct files. I would now like to extend the tool so that it is able to
analyze VMDK files and convert them to raw. Unfortunately I have not been
able to understand how to correctly unpack and assemble VMDK containers.
Since qemu is able to convert VMDK to raw, I wanted to ask you if you could
explain to me how to put the grains together?

 

Many thanks and greetings

Janine Schneider



Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize

2019-11-14 Thread Max Reitz
On 14.11.19 18:15, Max Reitz wrote:
> On 14.11.19 17:27, Christoph Hellwig wrote:
>> On Fri, Nov 01, 2019 at 04:25:10PM +0100, Max Reitz wrote:
>>> The XFS kernel driver has a bug that may cause data corruption for qcow2
>>> images as of qemu commit c8bb23cbdbe32f.  We can work around it by
>>> treating post-EOF fallocates as serializing up until infinity (INT64_MAX
>>> in practice).
>>
>> This has been fixed in the kernel a while ago.  I don't think it makes
>> sense to work around it in qemu.
> 
> Has it?  It was my understanding that this is fixed by
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=249bd9087a5264d2b8a974081870e2e27671b4dcwhich

Sorry, broke the link.  Let me try again:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=249bd9087a5264d2b8a974081870e2e27671b4dc

Max

> has been merged only very recently and is on track to be part of Linux
> 5.5, as far as I understand.
> 
> Max
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize

2019-11-14 Thread Max Reitz
On 14.11.19 17:27, Christoph Hellwig wrote:
> On Fri, Nov 01, 2019 at 04:25:10PM +0100, Max Reitz wrote:
>> The XFS kernel driver has a bug that may cause data corruption for qcow2
>> images as of qemu commit c8bb23cbdbe32f.  We can work around it by
>> treating post-EOF fallocates as serializing up until infinity (INT64_MAX
>> in practice).
> 
> This has been fixed in the kernel a while ago.  I don't think it makes
> sense to work around it in qemu.

Has it?  It was my understanding that this is fixed by
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=249bd9087a5264d2b8a974081870e2e27671b4dcwhich
has been merged only very recently and is on track to be part of Linux
5.5, as far as I understand.

Max



signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2019-11-14 Thread Alberto Garcia
On Tue 05 Nov 2019 12:43:16 PM CET, Max Reitz wrote:

> Speaking of handle_copied(); both elements of Qcow2COWRegion are of
> type unsigned.  handle_copied() doesn’t look like it takes any
> precautions to limit the range to even UINT_MAX (and it should
> probably limit it to INT_MAX).

Or rather, both handle_copied() and handle_alloc() should probably limit
it to BDRV_REQUEST_MAX_BYTES.

Berto



Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize

2019-11-14 Thread Christoph Hellwig
On Fri, Nov 01, 2019 at 04:25:10PM +0100, Max Reitz wrote:
> The XFS kernel driver has a bug that may cause data corruption for qcow2
> images as of qemu commit c8bb23cbdbe32f.  We can work around it by
> treating post-EOF fallocates as serializing up until infinity (INT64_MAX
> in practice).

This has been fixed in the kernel a while ago.  I don't think it makes
sense to work around it in qemu.



Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()

2019-11-14 Thread Max Reitz
On 14.11.19 16:33, Alberto Garcia wrote:
> On Mon 04 Nov 2019 04:07:35 PM CET, Max Reitz wrote:
>>>  /* First remove L2 entries */
>>>  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>> -if (!full_discard && s->qcow_version >= 3) {
>>> +if (has_subclusters(s)) {
>>> +set_l2_entry(s, l2_slice, l2_index + i, 0);
>>> +set_l2_bitmap(s, l2_slice, l2_index + i,
>>> +  full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
>>
>> But only for !full_discard, right?
> 
> Hence the conditional operator in my patch, maybe you didn't see it?

Oops, yep, I was just hung up on the if conditional.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 03/14] qapi: Introduce default values for struct members

2019-11-14 Thread Markus Armbruster
Less than thorough review, because I expect the necessary rebase will
require a bit of rewriting here and there.

Max Reitz  writes:

> With this change, it is possible to give default values for struct
> members, as follows:
>
>   What you had to do so far:
>
> # @member: Some description, defaults to 42.
> { 'struct': 'Foo',
>   'data': { '*member': 'int' } }
>
>   What you can do now:
>
> { 'struct': 'Foo',
>   'data': { '*member': { 'type': 'int', 'default': 42 } }
>
> On the C side, this change would remove Foo.has_member, because
> Foo.member is always valid now.  The input visitor deals with setting
> it.  (Naturally, this means that such defaults are useful only for input
> parameters.)
>
> At least three things are left unimplemented:
>
> First, support for alternate data types.  This is because supporting
> them would mean having to allocate the object in the input visitor, and
> then potentially going through multiple levels of nested types.  In any
> case, it would have been difficult and I do not think there is need for
> such support at this point.

I don't mind restricting the 'default' feature to uses we actually have,
at least initially.

I'm afraid I don't fully understand the difficulties you describe, but I
guess that's okay.

> Second, support for null.  The most important reason for this is that
> introspection already uses "'default': null" for "no default, but this
> field is optional".  The second reason is that without support for
> alternate data types, there is not really a point in supporting null.

Also, commit 9d55380b5a "qapi: Remove null from schema language" :)

> Third, full support for default lists.  This has a similar reason to the
> lack of support for alternate data types: Allocating a default list is
> not trivial -- unless the list is empty, which is exactly what we have
> support for.

Your commit message says "for struct members".  What about union
members?  Cases:

* Flat union 'base' members: 'base' is a a struct, possibly implicit.
  Do defaults work in implicit bases, like BlockdevOption's?

* Flat union branch members: these are always struct types, so there's
  nothing for me to ask.  I think.

* Simple union branch members: these are each wrapped in an implicit
  struct type.  Do defaults work?  I'd be totally fine with "nope, not
  implemented, not going to implement it" here.

> Signed-off-by: Max Reitz 
> ---
>  qapi/introspect.json   |   9 +-
>  scripts/qapi/commands.py   |   2 +-
>  scripts/qapi/common.py | 167 +++--
>  scripts/qapi/doc.py|  20 -
>  scripts/qapi/introspect.py |   2 +-
>  scripts/qapi/types.py  |   2 +-
>  scripts/qapi/visit.py  |  38 -
>  7 files changed, 217 insertions(+), 23 deletions(-)

Missing: docs/devel/qapi-code-gen.txt update.

>
> diff --git a/qapi/introspect.json b/qapi/introspect.json
> index 1843c1cb17..db703135f9 100644
> --- a/qapi/introspect.json
> +++ b/qapi/introspect.json
> @@ -198,11 +198,10 @@
>  #
>  # @default: default when used as command parameter.
>  #   If absent, the parameter is mandatory.
> -#   If present, the value must be null.  The parameter is
> -#   optional, and behavior when it's missing is not specified
> -#   here.
> -#   Future extension: if present and non-null, the parameter
> -#   is optional, and defaults to this value.
> +#   If present and null, the parameter is optional, and
> +#   behavior when it's missing is not specified here.
> +#   If present and non-null, the parameter is optional, and
> +#   defaults to this value.
>  #
>  # Since: 2.5
>  ##
> diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
> index b929e07be4..6c407cd4ba 100644
> --- a/scripts/qapi/commands.py
> +++ b/scripts/qapi/commands.py
> @@ -35,7 +35,7 @@ def gen_call(name, arg_type, boxed, ret_type):
>  elif arg_type:
>  assert not arg_type.variants
>  for memb in arg_type.members:
> -if memb.optional:
> +if memb.optional and memb.default is None:
>  argstr += 'arg.has_%s, ' % c_name(memb.name)
>  argstr += 'arg.%s, ' % c_name(memb.name)
>  
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index c6754a5856..8c57d0c67a 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -14,6 +14,7 @@
>  from __future__ import print_function
>  from contextlib import contextmanager
>  import errno
> +import math
>  import os
>  import re
>  import string
> @@ -800,6 +801,136 @@ def check_if(expr, info):
>  check_if_str(ifcond, info)
>  
>  
> +def check_value_str(info, value):
> +return 'g_strdup(%s)' % to_c_string(value) if type(value) is str else 
> False

What's wrong with isinstance(value, str)?

I'm a happy user of ternaries myself, but this one results in a rather
long line.  Easier to read, I think:

   if isinstance(value, str

Re: [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()

2019-11-14 Thread Alberto Garcia
On Tue 05 Nov 2019 12:05:02 PM CET, Max Reitz wrote:
>> @@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
>> *bs, uint64_t *l1_table,
>>  } else {
>>  set_l2_entry(s, l2_slice, j, offset);
>>  }
>> +set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
>>  l2_dirty = true;
>>  }
>
> Technically this isn’t needed because this function is only called
> when downgrading v3 to v2 images (which can’t have subclusters), but
> of course it won’t hurt.

Right, but we need to change the function anyway to either do this or
assert that there are no subclusters. I prefer to do this because it's
trivial but I won't oppose if someone prefers the alternative.

Berto



Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()

2019-11-14 Thread Alberto Garcia
On Mon 04 Nov 2019 04:07:35 PM CET, Max Reitz wrote:
>>  /* First remove L2 entries */
>>  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>> -if (!full_discard && s->qcow_version >= 3) {
>> +if (has_subclusters(s)) {
>> +set_l2_entry(s, l2_slice, l2_index + i, 0);
>> +set_l2_bitmap(s, l2_slice, l2_index + i,
>> +  full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
>
> But only for !full_discard, right?

Hence the conditional operator in my patch, maybe you didn't see it?

Berto



Re: [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()

2019-11-14 Thread Alberto Garcia
On Mon 04 Nov 2019 04:10:58 PM CET, Max Reitz wrote:
>>>  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>>  if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
>>> -set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>>>  qcow2_free_any_clusters(bs, old_offset, 1, 
>>> QCOW2_DISCARD_REQUEST);
>> 
>> It feels wrong to me to free the cluster before updating the L2
>> entry.
>
> (Although it’s pre-existing, as set_l2_entry() is just an in-cache
> operation anyway :-/)

Yes, I think that if you want to do it afterwards you need to add
another loop after the qcow2_cache_put() call.

Berto



Re: [PATCH] MAINTAINERS: add more bitmap-related to Dirty Bitmaps section

2019-11-14 Thread Eric Blake

On 10/26/19 11:56 AM, Vladimir Sementsov-Ogievskiy wrote:

Let's add bitmaps persistence qcow2 feature and postcopy bitmaps
migration to Dirty Bitmaps section.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  MAINTAINERS | 2 ++
  1 file changed, 2 insertions(+)



I see no reason why this can't go in -rc2, so I'll queue it through my 
NBD tree.



diff --git a/MAINTAINERS b/MAINTAINERS
index 556ce0bfe3..51f31b4011 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1829,6 +1829,8 @@ F: util/hbitmap.c
  F: block/dirty-bitmap.c
  F: include/qemu/hbitmap.h
  F: include/block/dirty-bitmap.h
+F: qcow2-bitmap.c
+F: migration/block-dirty-bitmap.c
  F: tests/test-hbitmap.c
  F: docs/interop/bitmaps.rst
  T: git https://github.com/jnsnow/qemu.git bitmaps



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v5] iotests: Test NBD client reconnection

2019-11-14 Thread Eric Blake

On 11/11/19 9:39 PM, Andrey Shinkevich wrote:

The test for an NBD client. The NBD server is disconnected after the
client write request. The NBD client should reconnect and complete
the write operation.

Suggested-by: Denis V. Lunev 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
Tested-by: Eric Blake 
---
v5:  "" were replaced with '' in the test except the function comments.
 The 'quick' word was added to the 'qroup' file next to the test #277.


Queuing for 4.2-rc2 through my NBD tree.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2019-11-14 Thread Alberto Garcia
On Wed 30 Oct 2019 05:55:04 PM CET, Max Reitz wrote:
>> This patch also adds the get/set_l2_bitmap() functions that are used
>> to access the bitmaps. For convenience, these functions are no-ops
>> when used in traditional qcow2 images.
>
> Granted, I haven’t seen the following patches yet, but if these
> functions are indeed called for images that don’t have subclusters,
> shouldn’t they return 0x0*0f*f then? (i.e. everything allocated)
>
> If they aren’t, they should probably just abort().  Well,
> set_l2_bitmap() should probably always abort() if there aren’t any
> subclusters.

Yeah, set_l2_bitmap() should abort (I had this changed already).

About get_l2_bitmap() ... I decided not to abort for convenience, for
cases like this one:

uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);

Here the value of l2_bitmap is going to be ignored anyway so it doesn't
matter what we return, but perhaps for consistency we should return
QCOW_OFLAG_SUB_ALLOC(0), which means that the first (and only, in this
case) subcluster is allocated.

Berto



Re: [PATCH v3 1/4] nbd/server: Prefer heap over stack for parsing client names

2019-11-14 Thread Eric Blake

On 11/14/19 4:04 AM, Maxim Levitsky wrote:

On Wed, 2019-11-13 at 20:46 -0600, Eric Blake wrote:

As long as we limit NBD names to 256 bytes (the bare minimum permitted
by the standard), stack-allocation works for parsing a name received
from the client.  But as mentioned in a comment, we eventually want to
permit up to the 4k maximum of the NBD standard, which is too large
for stack allocation; so switch everything in the server to use heap
allocation.  For now, there is no change in actually supported name
length.


I am just curios, why is this so?
I know that kernel uses 8K stacks due to historical limitation
of 1:1 physical memory mapping which creates fragmentation,
but in the userspace stacks shouldn't really be limited and grow on demand.


Actually, 4k rather than 8k stack overflow guard pages are typical on 
some OS.  The problem with stack-allocating anything larger than the 
guard page size is that you can end up overshooting the guard page, and 
then the OS is unable to catch stack overflow in the normal manner of 
sending SIGSEGV.  Also, when using coroutines, it is very common to have 
limited stack size in the first place, where large stack allocations can 
run into issues.  So in general, it's a good rule of thumb to never 
stack-allocate something if it can be larger than 4k.



Some gcc security option limits this?


Not by default, but you can compile with -Wframe-larger-than=4096 (or 
even smaller) to catch instances where stack allocation is likely to run 
into trouble.




@@ -427,7 +431,7 @@ static void nbd_check_meta_export(NBDClient *client)
  static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
  Error **errp)
  {
-char name[NBD_MAX_NAME_SIZE + 1];
+g_autofree char *name;


That is what patchew complained about I think.


Yes, and I've already fixed the missing initializer.



Isn't it wonderful how g_autofree fixes one issue
and introduces another. I mean 'name' isn't really
used here prior to allocation according to plain C,
but due to g_autofree, it can be now on any error
path. Nothing against g_autofree though, just noting this.


Yes, and our documentation for g_auto* reminds that all such variables 
with automatic cleanup must have an initializer or be set prior to any 
exit path.  I think I see why I didn't catch it beforehand - I'm 
compiling with --enable-debug, which passes CFLAGS=-g, while the 
compiler warning occurs when -O2 is in effect; but it is rather annoying 
that gcc doesn't catch the bug when not optimizing.




Looks correct, but I might have missed something.

Reviewed-by: Maxim Levitsky 



Thanks, and assuming that's with my initializer fix squashed in.


Best regards,
Maxim Levitsky




--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-11-14 Thread Max Reitz
On 11.09.19 13:00, Max Reitz wrote:
> On 11.09.19 12:31, Kevin Wolf wrote:
>> Am 11.09.2019 um 12:00 hat Max Reitz geschrieben:
>>> On 11.09.19 10:27, Kevin Wolf wrote:
 Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
> On 11.09.19 08:55, Kevin Wolf wrote:
>> Well, by default the primary child, which should cover like 90% of the
>> drivers?
>
> Hm, yes.
>
> But I still think that the drivers that do not want to count every
> single non-COW child are the exception.

 They are, but drivers that want to count more than their primary node
 are exceptions, too. And I think you're more likely to remember adding
 the callback when you want to have a certain feature, not when you don't
 want to have it.

 I really think we're likely to forget adding the callback where we need
 to disable the feature.
>>>
>>> Well, I mean, we did forget adding it for qcow2.
>>
>> I'm afraid I have to agree. So the conclusion is that we won't get it
>> right anyway?
>>
 I can see two options that should address both of our views:

 1. Just don't have a fallback at all, make the callback mandatory and
provide implementations in block.c that can be referred to in
BlockDriver. Not specifying the callback causes an assertion failure,
so we'd hopefully notice it quite early (assuming that we run either
'qemu-img info' or 'query-block' on a configuration with the block
driver, but I think that's faily safe to assume).
>>>
>>> Hm.  Seems a bit much, but if we can’t agree on what’s a good general
>>> implementation that works for everything, this is probably the only
>>> thing that would actually keep us from forgetting to add special cases.
>>>
>>> Though I actually don’t know.  I’d probably add two globally available
>>> helpers, one that returns the sum of everything but the backing node,
>>> and one that just returns the primary node.
>>
>> Yes, I think this is the same as I meant by "provide implementations in
>> block.c".
>>
>>> Now if I were to make qcow2 use the primary node helper function, would
>>> we have remembered changing it once we added a data file?
>>>
>>> Hmm.  Maybe not, but it should be OK to just make everything use the sum
>>> helper, except the drivers that want the primary node.  That should work
>>> for all cases.  (I think that whenever a format driver suddenly gains
>>> more child nodes, we probably will want to count them.  OTOH, everything
>>> that has nodes that shouldn’t be counted probably always wants to use
>>> the primary node helper function from the start.)
>>
>> The job filter nodes have only one child currently, which should be
>> counted. We'll add other children that shouldn't be counted only later.
>>
>> But we already have an idea of what possible extensions look like, so we
>> can probably choose the right function from the start.
> 
> Yep.
> 
 2. Make the 90% solution a 100% solution: Allow drivers to have multiple
storage children (for vmdk) and then have the fallback add up the
primary child plus all storage children. This is what I suggested as
the documented semantics in my initial reply to this patch (that you
chose not to answer).
>>>
>>> I didn’t answer that because I didn’t disagree.
>>>
Adding the size of storage children covers qcow2 and vmdk.
>>>
>>> That’s of course exactly what we’re trying to do, but the question is,
>>> how do we figure out that storage children?  Make it a per-BdrvChild
>>> attribute?  That seems rather heavy-handed, because I think we’d need it
>>> only here.
>>
>> Well, you added bdrv_storage_child().I'd argue this interface is wrong
> 
> Yes, it probably is.
> 
>> because it assumes that only one storage child exists. You just didn't
>> implement it for vmdk so that the problem didn't become apparent. It
>> would have to return a list rather than a single child. So fixing the
>> interface and then using it is what I was thinking.
>>
>> Now that you mention a per-BdrvChild attribute, however, I start to
>> wonder if the distinction between COW children, filter children, storage
>> children, metadata children, etc. isn't really what BdrvChildRole was
>> supposed to represent?
> 
> That’s a good point.
> 
>> Maybe we want to split off child_storage from child_file, though it's
>> not strictly necessary for this specific case because we want to treat
>> both metadata and storage nodes the same. But it could be useful for
>> other users of bdrv_storage_child(), if there are any.
> 
> Possible.  Maybe it turns out that at least for this series I don’t need
> bdrv_storage_child() at all.
> 
As the job filter won't declare the target or any other involved
nodes their storage nodes (I hope), this will do the right thing for
them, too.

For quorum and blkverify both ways could be justifiable. I think they
probably shouldn't declare their children as storage nodes.

Re: [RFC PATCH 02/18] qemu-storage-daemon: Add --object option

2019-11-14 Thread Kevin Wolf
Am 07.11.2019 um 21:36 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Add a command line option to create user-creatable QOM objects.
> >
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qemu-storage-daemon.c | 35 +++
> >  1 file changed, 35 insertions(+)
> >
> > diff --git a/qemu-storage-daemon.c b/qemu-storage-daemon.c
> > index a251dc255c..48d6af43a6 100644
> > --- a/qemu-storage-daemon.c
> > +++ b/qemu-storage-daemon.c
> > @@ -35,6 +35,8 @@
> >  #include "qemu/log.h"
> >  #include "qemu/main-loop.h"
> >  #include "qemu/module.h"
> > +#include "qemu/option.h"
> > +#include "qom/object_interfaces.h"
> >  
> >  #include "trace/control.h"
> >  
> > @@ -51,10 +53,26 @@ static void help(void)
> >  " specify tracing options\n"
> >  "  -V, --version  output version information and exit\n"
> >  "\n"
> > +"  --object   define a QOM object such as 'secret' for\n"
> > +" passwords and/or encryption keys\n"
> 
> This is less helpful than qemu-system-FOO's help:
> 
> -object TYPENAME[,PROP1=VALUE1,...]
> create a new object of type TYPENAME setting properties
> in the order they are specified.  Note that the 'id'
> property must be set.  These objects are placed in the
> '/objects' path.

Hm, yes. I took the description from the tools. I can switch to the vl.c
one (should the tools, too?).

But honestly, neither of the two is enough to tell anyone how to
actually use this... Considering how many different objects there are,
maybe the best we can do is referring to the man page for details.

> > +"\n"
> >  QEMU_HELP_BOTTOM "\n",
> >  error_get_progname());
> >  }
> >  
> > +enum {
> > +OPTION_OBJECT = 256,
> > +};
> > +
> > +static QemuOptsList qemu_object_opts = {
> > +.name = "object",
> > +.implied_opt_name = "qom-type",
> > +.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
> > +.desc = {
> > +{ }
> > +},
> > +};
> > +
> 
> Note for later: copied from vl.c.
> 
> >  static int process_options(int argc, char *argv[], Error **errp)
> >  {
> >  int c;
> > @@ -63,6 +81,7 @@ static int process_options(int argc, char *argv[], Error 
> > **errp)
> >  
> >  static const struct option long_options[] = {
> >  {"help", no_argument, 0, 'h'},
> > +{"object", required_argument, 0, OPTION_OBJECT},
> >  {"version", no_argument, 0, 'V'},
> >  {"trace", required_argument, NULL, 'T'},
> >  {0, 0, 0, 0}
> > @@ -88,6 +107,22 @@ static int process_options(int argc, char *argv[], 
> > Error **errp)
> >  g_free(trace_file);
> >  trace_file = trace_opt_parse(optarg);
> >  break;
> > +case OPTION_OBJECT:
> > +{
> > +QemuOpts *opts;
> > +const char *type;
> > +
> > +opts = qemu_opts_parse(&qemu_object_opts,
> > +   optarg, true, &error_fatal);
> > +type = qemu_opt_get(opts, "qom-type");
> > +
> > +if (user_creatable_print_help(type, opts)) {
> > +exit(EXIT_SUCCESS);
> > +}
> > +user_creatable_add_opts(opts, &error_fatal);
> > +qemu_opts_del(opts);
> > +break;
> > +}
> >  }
> >  }
> >  if (optind != argc) {
> 
> PATCH 01 duplicates case QEMU_OPTION_trace pretty much verbatim.  Makes
> sense, as qemu-storage-daemon is basically qemu-system-FOO with "FOO"
> and most "system" cut away.
> 
> This patch adds vl.c's case QEMU_OPTION_object in a much simpler form.
> This is one of my least favourite options, and I'll tell you why below.
> Let's compare the two versions.
> 
> vl.c:
> 
> case QEMU_OPTION_object:
> opts = qemu_opts_parse_noisily(qemu_find_opts("object"),
>optarg, true);
> if (!opts) {
> exit(1);
> }
> break;
> 
> Further down:
> 
> qemu_opts_foreach(qemu_find_opts("object"),
>   user_creatable_add_opts_foreach,
>   object_create_initial, &error_fatal);
> 
> Still further down:
> 
> qemu_opts_foreach(qemu_find_opts("object"),
>   user_creatable_add_opts_foreach,
>   object_create_delayed, &error_fatal);
> 
> These are basically
> 
> for opts in qemu_object_opts {
> type = qemu_opt_get(opts, "qom-type");
> if (type) {
> if (user_creatable_print_help(type, opts)) {
> exit(0);
> }
> if (!predicate(type)) {
> continue;
> }
> }
> obj = user_creatable_add_opts(opts, &error_fatal);
> object_unref(obj);
> }
> 
> where predicate(type) is true in exactly one of the

Re: [Qemu-devel] [PATCH v4 01/14] qapi: Parse numeric values

2019-11-14 Thread Markus Armbruster
Max Reitz  writes:

> On 14.11.19 10:15, Markus Armbruster wrote:
>> Max Reitz  writes:
>> 
>>> Signed-off-by: Max Reitz 
>>> ---
>>>  tests/qapi-schema/bad-type-int.json  |  1 -
>>>  tests/qapi-schema/enum-int-member.json   |  1 -
>>>  scripts/qapi/common.py   | 25 
>>>  scripts/qapi/introspect.py   |  2 ++
>>>  tests/qapi-schema/bad-type-int.err   |  2 +-
>>>  tests/qapi-schema/enum-int-member.err|  2 +-
>>>  tests/qapi-schema/leading-comma-list.err |  2 +-
>>>  7 files changed, 26 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/tests/qapi-schema/bad-type-int.json 
>>> b/tests/qapi-schema/bad-type-int.json
>>> index 56fc6f8126..81355eb196 100644
>>> --- a/tests/qapi-schema/bad-type-int.json
>>> +++ b/tests/qapi-schema/bad-type-int.json
>>> @@ -1,3 +1,2 @@
>>>  # we reject an expression with a metatype that is not a string
>>> -# FIXME: once the parser understands integer inputs, improve the error 
>>> message
>>>  { 'struct': 1, 'data': { } }
>>> diff --git a/tests/qapi-schema/enum-int-member.json 
>>> b/tests/qapi-schema/enum-int-member.json
>>> index 6c9c32e149..6958440c6d 100644
>>> --- a/tests/qapi-schema/enum-int-member.json
>>> +++ b/tests/qapi-schema/enum-int-member.json
>>> @@ -1,3 +1,2 @@
>>>  # we reject any enum member that is not a string
>>> -# FIXME: once the parser understands integer inputs, improve the error 
>>> message
>>>  { 'enum': 'MyEnum', 'data': [ 1 ] }
>>> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
>>> index d61bfdc526..3396ea4a09 100644
>>> --- a/scripts/qapi/common.py
>>> +++ b/scripts/qapi/common.py
>>> @@ -498,6 +498,8 @@ class QAPISchemaParser(object):
>>>  raise QAPISemError(info, "Unknown pragma '%s'" % name)
>>>  
>>>  def accept(self, skip_comment=True):
>>> +num_match = re.compile(r'([-+]?inf|nan|[-+0-9.][0-9a-f.ex]*)')
>>> +
>>>  while True:
>>>  self.tok = self.src[self.cursor]
>>>  self.pos = self.cursor
>> 
>> This is yet another extension over plain JSON.  RFC 8259:
>> 
>>   number = [ minus ] int [ frac ] [ exp ]
>>   decimal-point = %x2E   ; .
>>   digit1-9 = %x31-39 ; 1-9
>>   e = %x65 / %x45; e E
>>   exp = e [ minus / plus ] 1*DIGIT
>>   frac = decimal-point 1*DIGIT
>>   int = zero / ( digit1-9 *DIGIT )
>>   minus = %x2D   ; -
>>   plus = %x2B; +
>>   zero = %x30; 0
>> 
>> Extensions are acceptable when we have an actual use for it, and we
>> document them properly.
>
> Well, it isn’t really an extension, because this isn’t a JSON parser but
> just something that accepts anything that looks like a number and then
> lets Python try a conversion on it.

I'm totally cool with deviating from JSON, all I really care about is
proper schema language documentation.

If we stick to JSON form number syntax, this is easy: replace "Numbers
and null are not supported" by "null is not supported", done.
Implementation should also be easy enough: convert RFC 8259's EBNF to a
regexp, feed the matched string to Python's number parser, done.

Fancier syntax we'd need to document ourselves.  I'm willing to deal
with that if we have a sufficiently compelling use for them.

>> Isn't the parenthesis in your regular expression redundant?
>
> You’re right, but on second thought, maybe I should surround it by \<
> and \>.
>
>> What use do you have in mind for 'inf' and 'nan'?
>
> I could imagine inf being a useful default value, actually.  nan,
> probably not so much.
>
>> Why accept leading '+' as in '+123'?
>> 
>> Why accept empty integral part as in '.123'?
>> 
>> Why accept '.xe.'?  Kidding you, that must be a bug in your regexp.
>
> Well, kind of.
>
> I wanted to accept anything that looks in any way like a number and then
> let Python try to convert it.  That’s also the reason why the case comes
> last.
>
> For that reason, I decided to keep the regex as simple as possible,
> because the attempted conversions would reject anything that isn’t (to
> Python) a valid number later.

Ah, now I see.

> It was my impression that the QAPI schema isn’t really JSON anyway and
> that our QAPI schema parser isn’t a JSON parser.  Under that assumption
> it simply seemed useful to me to accept anything that could potentially
> be a number to Python and convert it.
>
> Now, honestly, I still don’t see the point of having a strict JSON
> “parser” here, but if you insist.  Seems possible to do in a regex.
>
> Though I do think it makes sense to support hex integers as an extension.

Let's start simple & stupid.  Extensions can be added on top.
Hexadecimal integers may well be compelling enough to justify an
extension.

>> Please decide what number syntax you'd like to accept, then specify it
>> in docs/devel/qapi-code-gen.txt, so we can first discuss the
>> specification, and then check the regexp implements it.
>> 
>> docs/devel/qapi-code-gen.txt upda

Re: [PATCH v7 1/3] block: introduce compress filter driver

2019-11-14 Thread Vladimir Sementsov-Ogievskiy
14.11.2019 14:27, Max Reitz wrote:
> On 13.11.19 19:43, Andrey Shinkevich wrote:
>> Allow writing all the data compressed through the filter driver.
>> The written data will be aligned by the cluster size.
>> Based on the QEMU current implementation, that data can be written to
>> unallocated clusters only. May be used for a backup job.
>>
>> Suggested-by: Max Reitz 
>> Signed-off-by: Andrey Shinkevich 
>> ---
>>   block/Makefile.objs |   1 +
>>   block/filter-compress.c | 201 
>> 
>>   qapi/block-core.json|  10 ++-
>>   3 files changed, 208 insertions(+), 4 deletions(-)
>>   create mode 100644 block/filter-compress.c
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index e394fe0..330529b 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -43,6 +43,7 @@ block-obj-y += crypto.o
>>   
>>   block-obj-y += aio_task.o
>>   block-obj-y += backup-top.o
>> +block-obj-y += filter-compress.o
>>   
>>   common-obj-y += stream.o
>>   
>> diff --git a/block/filter-compress.c b/block/filter-compress.c
>> new file mode 100644
>> index 000..64b1ee5
>> --- /dev/null
>> +++ b/block/filter-compress.c
>> @@ -0,0 +1,201 @@
>> +/*
>> + * Compress filter block driver
>> + *
>> + * Copyright (c) 2019 Virtuozzo International GmbH
>> + *
>> + * Author:
>> + *   Andrey Shinkevich 
>> + *   (based on block/copy-on-read.c by Max Reitz)
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License as
>> + * published by the Free Software Foundation; either version 2 or
>> + * (at your option) any later version of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see .
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "block/block_int.h"
>> +#include "qemu/module.h"
>> +
>> +
>> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
>> + Error **errp)
>> +{
>> +bs->backing = bdrv_open_child(NULL, options, "file", bs, &child_file, 
>> false,
>> +  errp);
> 
> Please don’t attach something that the QAPI schema calls “file” as
> bs->backing.


Agree, it's a mistake. If we want backing and user set backing in options, it's 
opened automatically, I think..

> 
> Yes, attaching it as bs->file would break backing chains.  That’s a bug
> in the block layer.  I’ve been working on a fix for a long time.
> 
> Please don’t introduce more weirdness just because we have a bug in the
> block layer.
> 
> (Note that I’d strongly oppose calling the child “backing” in the QAPI
> schema, as this would go against what all other user-creatable filters do.)
> 

So, are you opposite to correct backing-based user-creatable filter (with 
backing both
in QAPI and code)?

Do you think, that if we make backup-top to be user-creatable, we should move 
it to be
file-child-based, or support both backing and file child?


-- 
Best regards,
Vladimir



Re: [PATCH v7 1/3] block: introduce compress filter driver

2019-11-14 Thread Max Reitz
On 13.11.19 19:43, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the QEMU current implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/Makefile.objs |   1 +
>  block/filter-compress.c | 201 
> 
>  qapi/block-core.json|  10 ++-
>  3 files changed, 208 insertions(+), 4 deletions(-)
>  create mode 100644 block/filter-compress.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index e394fe0..330529b 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -43,6 +43,7 @@ block-obj-y += crypto.o
>  
>  block-obj-y += aio_task.o
>  block-obj-y += backup-top.o
> +block-obj-y += filter-compress.o
>  
>  common-obj-y += stream.o
>  
> diff --git a/block/filter-compress.c b/block/filter-compress.c
> new file mode 100644
> index 000..64b1ee5
> --- /dev/null
> +++ b/block/filter-compress.c
> @@ -0,0 +1,201 @@
> +/*
> + * Compress filter block driver
> + *
> + * Copyright (c) 2019 Virtuozzo International GmbH
> + *
> + * Author:
> + *   Andrey Shinkevich 
> + *   (based on block/copy-on-read.c by Max Reitz)
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 or
> + * (at your option) any later version of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +
> +
> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
> + Error **errp)
> +{
> +bs->backing = bdrv_open_child(NULL, options, "file", bs, &child_file, 
> false,
> +  errp);

Please don’t attach something that the QAPI schema calls “file” as
bs->backing.

Yes, attaching it as bs->file would break backing chains.  That’s a bug
in the block layer.  I’ve been working on a fix for a long time.

Please don’t introduce more weirdness just because we have a bug in the
block layer.

(Note that I’d strongly oppose calling the child “backing” in the QAPI
schema, as this would go against what all other user-creatable filters do.)

Max



signature.asc
Description: OpenPGP digital signature


Re: API definition for LUKS key management

2019-11-14 Thread Maxim Levitsky
On Tue, 2019-11-12 at 10:12 +0100, Kevin Wolf wrote:
> Am 11.11.2019 um 19:34 hat Daniel P. Berrangé geschrieben:
> > On Mon, Nov 11, 2019 at 05:58:20PM +0200, Maxim Levitsky wrote:
> > > One of the concerns that was raised during the review was that amend 
> > > interface for luks that I propose is
> > > different from the amend inteface used currently for qcow2.
> > > 
> > > qcow2 amend interface specifies all the format options, thus overwrites 
> > > the existing options.
> > > Thus it seems natural to make the luks amend interface work the same way, 
> > > that it receive an array
> > > of 8 slots, and for each slot specify if it is active, and if true what 
> > > password to put in it.
> > > This does allow to add and erase the keyslots, but it doesn't allow:
> > > 
> > >* add a password without knowing all other passwords that exist in 
> > > existing keyslots
> > >  this can be mitigated by specifying which keyslots to modify for 
> > > example by omitting the
> > >  keyslots that shouldn't be touched from the array (passing null 
> > > placeholder instead)
> > >  but then it already doesn't follow the 'specify all the options each 
> > > time' principle.
> > 
> > I think this is highly undesirable, as we must not assume that the
> > mgmt app has access to all the passwords currently set.
> 
> And I think this shows the problem that we realy have with the crypto
> driver and amend: For every other driver, if you must, you can query the
> current settings and just write them back.
> 
> The difference here is that crypto doesn't allow to directly query or
> specify the content of some options (the keyslots), but provides only a
> way to derives that content from a secret, and obviously there is no way
> back from the stored data to the secret (that's what it's for).
> 
> I think we have two options here:
> 
> 1. Add a special "don't touch this" value for keyslots. Normally, just
>leaving out the value would be suitable syntax for this. Here,
>however, we have a list of keyslots, so we can't leave anything out.
> 
>We could use something like an alternate between str (new secret ID),
>null (erase keyslot) and empty dict (leave it alone) - the latter
>feels a bit hackish, but maybe it's not too bad. If the list is
>shorter than 8 entries, the rest is assumed to mean "leave it alone",
>too.
> 
> 2. Allow to query and set the raw key, which doesn't require a password
> 
> > The two key use cases for having multiple key slots are
> > 
> >   - To enable a two-phase change of passwords to ensure new password
> > is safely written out before erasing the old password
> > 
> >   - To allow for multiple access passwords with different controls
> > or access to when each password is made available.
> > 
> > eg each VM may have a separate "backup password" securely
> > stored off host that is only made available for use when
> > doing disaster recovery.
> > 
> > the second use case is doomed if you need to always provide all
> > current passwords when changing any key slots.
> 
> That providing all current passwords doesn't work is obvious.

I also want to *emphasise* that not being able to provide all the keyslots
is the smaller problem here, since it is relatively easy to omit slots that
should be left untouched.
The bigger problem is supporting the 'erase all keyslots that match the 
password'
That doesn't fit into current amend definition at all.

> 
> > >* erase all keyslots matching a password - this is really hard to do 
> > > using this approach,
> > >  unless we give user some kind of api to try each keyslot with given 
> > > password,
> > >  which is kind of ugly and might be racy as well.
> > > So what do you think?
> > 
> > The point of using "amend" is that we already have some of the boilerplate
> > supporting framework around that, so it saves effort for both QEMU and
> > our users. If the semantics of "amend" don't fit nicely though, then the
> > benefit of re-using "amend" is cancelled out and we should go back to
> > considering a separate "key-manage" command.
> 
> This wouldn't solve the fundamental problem that the crypto block
> driver, as it currently is, isn't able to provide a blockdev-amend
> callback. It's worse for qcow2 because qcow2 already implements amend.
> 
> I think we need to find a solution for the amend API.
> 
> Kevin

Best regards,
Maxim Levitsky





Re: API definition for LUKS key management

2019-11-14 Thread Maxim Levitsky
On Tue, 2019-11-12 at 11:02 +, Daniel P. Berrangé wrote:
> On Tue, Nov 12, 2019 at 10:12:45AM +0100, Kevin Wolf wrote:
> > Am 11.11.2019 um 19:34 hat Daniel P. Berrangé geschrieben:
> > > On Mon, Nov 11, 2019 at 05:58:20PM +0200, Maxim Levitsky wrote:
> > > > One of the concerns that was raised during the review was that amend 
> > > > interface for luks that I propose is
> > > > different from the amend inteface used currently for qcow2.
> > > > 
> > > > qcow2 amend interface specifies all the format options, thus overwrites 
> > > > the existing options.
> > > > Thus it seems natural to make the luks amend interface work the same 
> > > > way, that it receive an array
> > > > of 8 slots, and for each slot specify if it is active, and if true what 
> > > > password to put in it.
> > > > This does allow to add and erase the keyslots, but it doesn't allow:
> > > > 
> > > >* add a password without knowing all other passwords that exist in 
> > > > existing keyslots
> > > >  this can be mitigated by specifying which keyslots to modify for 
> > > > example by omitting the
> > > >  keyslots that shouldn't be touched from the array (passing null 
> > > > placeholder instead)
> > > >  but then it already doesn't follow the 'specify all the options 
> > > > each time' principle.
> > > 
> > > I think this is highly undesirable, as we must not assume that the
> > > mgmt app has access to all the passwords currently set.
> > 
> > And I think this shows the problem that we realy have with the crypto
> > driver and amend: For every other driver, if you must, you can query the
> > current settings and just write them back.
> > 
> > The difference here is that crypto doesn't allow to directly query or
> > specify the content of some options (the keyslots), but provides only a
> > way to derives that content from a secret, and obviously there is no way
> > back from the stored data to the secret (that's what it's for).
> > 
> > I think we have two options here:
> > 
> > 1. Add a special "don't touch this" value for keyslots. Normally, just
> >leaving out the value would be suitable syntax for this. Here,
> >however, we have a list of keyslots, so we can't leave anything out.
> > 
> >We could use something like an alternate between str (new secret ID),
> >null (erase keyslot) and empty dict (leave it alone) - the latter
> >feels a bit hackish, but maybe it's not too bad. If the list is
> >shorter than 8 entries, the rest is assumed to mean "leave it alone",
> >too.
> 
> I'd be very wary of having a "null" vs "empty dict" distinction to
> mean "erase" vs "don't touch".
> 
> It feels like that is designed to maximise the chances of someone
> shooting themselves in the foot by accidentally using "null" instead
> of an "empty dict".
> 
> The reason for the use of "active=yes" / "active=no" is because that
> was reasonable explicit about wanting to erase a keyslot, and it does
> does actually map to the key slot on disk which has an "active" field
> taking two magic values.
> 
> > 2. Allow to query and set the raw key, which doesn't require a password
> 
> I don't think I understand what you mean here. If you don't have a
> password the only change you can make is to erase key slots.
Well in the theory the keyslot has the hash of the password, the salt, the hash 
function
iteration count and the active field. 

In theory you can let the user read these values directly and write them back 
as is without knowing the password.
This is very ugly IMHO but will fit the classical amend definition.


> 
> > > >* erase all keyslots matching a password - this is really hard to do 
> > > > using this approach,
> > > >  unless we give user some kind of api to try each keyslot with 
> > > > given password,
> > > >  which is kind of ugly and might be racy as well.
> > > > So what do you think?
> > > 
> > > The point of using "amend" is that we already have some of the boilerplate
> > > supporting framework around that, so it saves effort for both QEMU and
> > > our users. If the semantics of "amend" don't fit nicely though, then the
> > > benefit of re-using "amend" is cancelled out and we should go back to
> > > considering a separate "key-manage" command.
> > 
> > This wouldn't solve the fundamental problem that the crypto block
> > driver, as it currently is, isn't able to provide a blockdev-amend
> > callback. It's worse for qcow2 because qcow2 already implements amend.
> > 
> > I think we need to find a solution for the amend API.
I also think so. Amend interface can be *ahem* amended to be more generic :-)
Currently it is designed for just one use case.

> 
> 
> BTW, looking forward to the future, if we ever implement LUKS version 2
> support there are a bunch more things can be tweaked at runtime. Most
> notable is that it is possible to change the master key, and change the
> encryption algorithm choices. Both of these then need to trigger a bulk
> re-encrypt of the entire d

Re: [RFC PATCH 00/18] Add qemu-storage-daemon

2019-11-14 Thread Kevin Wolf
Am 24.10.2019 um 15:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> This reflects our idea of using qemu binary instead of qemu-img for doing
> block-layer operations offline.
> 
> What is the practical difference between qemu-storage-daemon and starting
> qemu binary in stopped state?

If I'm doing things right, QEMU should be able to do the exact same
things as the storage daemon (and more). So functionality isn't what
makes the daemon desirable.

One point to consider is that the QEMU binary with a full system
emulator is (and has to be) relatively heavyweight compared to the
daemon.

I think libvirt once said they didn't want to use the qemu binary on the
grounds that it takes too long to start, though I'm not sure if that's
really an argument. I just did a quick test (qemu with -M none
-nodefaults -display none) and it's 30 vs. 60 ms on my laptop. I get the
same factor 2 for RSS.

More interesting is maybe the overhead in terms of binary size and
dependencies, especially if you need only either the storage daemon _or_
the QEMU binary (e.g. consider a case where the storage daemon runs in
one container and the VM in another).

Having two binaries allows to cut down the dependencies, and to some
extent also the binary size, for both binaries: The storage daemon
doesn't need anything related to system emulation, UI, network etc., and
in QEMU we can probably compile out most of the block layer then, which
gets rid of dependencies like libiscsi, librbd, libgfapi, etc.

As a bonus, this might even reduce the attack surface a little.

So yes, I agree that the storage daemon doesn't offer any functionality
that QEMU can't offer and we don't have a clear-cut requirement that
unambiguously calls for a separate storage daemon, but I still see some
advantage in having two separate binaries.

Another thing to mention is that on IRC, Stefan suggested the other day
to export block devices from the storage daemon not as vhost-user, but
using muser instead (i.e. providing a whole PCI device) and exporting
the existing virtio-blk-pci implementation. This would pull qdev and
device emulations into the storage daemon. I think that would be the
point where using the QEMU binary instead might make more sense (and
maybe compile it twice with different options if need be).

Kevin




Re: API definition for LUKS key management

2019-11-14 Thread Maxim Levitsky
On Mon, 2019-11-11 at 18:34 +, Daniel P. Berrangé wrote:
> On Mon, Nov 11, 2019 at 05:58:20PM +0200, Maxim Levitsky wrote:
> > Hi!
> > 
> > I would like to discuss the API for LUKS key management.
> > 
> > First of all very brief overview of LUKS v1 format:
> > 
> > Each sector of the image is encrypted with same master key, which
> > is not stored directly on the disk.
> > 
> > Instead in the LUKS header we have 8 slots. Each slot optionally stores
> > an encrypted version of the master key, encrypted by the user password.
> > Knowing the password, you can retrieve the master key from the keyslot.
> > Slot can be marked as active or inactive, inactive slots are not considered
> > when opening the image.
> > 
> > In addition to that LUKS header has a hash of the master key, so that
> > you can check if the password 'opens' a keyslot by decrypting it
> > with given the password, and then checking if 
> > the hash of the decrypted master key candidate obtained matches the stored 
> > hash.
> > 
> > That basically means that you can have up to 8 different passwords that will
> > open a luks volume and you can change them as you wish without re-encrypting
> > everything.
> > 
> > Now for raw luks volumes you have cryptsetup which allows to manage these
> > keyslots, but we also have so called encrypted qcow2 format which
> > basically has the luks header, together with keyslots embedded, plus each
> > cluster is encrypted with the master key as in raw luks.
> > Cryptsetup doesn't support this, thus I implemented this in qemu block 
> > layer.
> 
> Even for raw luks volumes, the traditional "cryptsetup" tool is
> undesirable. eg consider LUKS on an RBD or ISCSI volume where
> you are using the in-QEMU RBD/ISCSI client. You don't want to
> have to configure the host kernel client just to change the
> keyslot info. You don't want to use the in-QEMU clients for
> qemu-img.

I didn't thought about it. This is a very good point!

> 
> > 
> > Link to bugzilla here: https://bugzilla.redhat.com/show_bug.cgi?id=1662412
> > 
> > 
> > Relevant to the API,
> > first of all qemu has the notion of amend (qemu-img amend), which allows
> > currently to change format specific extensions of qcow2.
> > 
> > Since luks, especially luks inside qcow2 is a format on its own, it fits to 
> > use that interface to change the 'format' options, in this case,
> > the encryption key slots.
> > 
> > 
> > There are the following requirements (they are 100% hardcoded, we might 
> > discuss
> > to drop some of these):
> > 
> > 
> > 1. ability to add a new password to a free keyslot 
> > (best is to let api to pick a free keyslot)
> > Also user should not need to know all the passwords in existing keyslots.
> > 
> > 
> > 2. ability to erase a keyslot, usually by giving the password that should 
> > be erased, and erasing all
> > the keyslots that match the password, or by giving a keyslot index.
> > This will usually be done after adding a new password.
> > 
> > 
> > 3. Allow to do so online, that is while qemu is running, but also support 
> > offline management.
> > Note that online management is even useful for raw luks volumes, since its 
> > not safe
> > to run cryptsetup on them while qemu is using the images.
> > 
> > 
> > I implemented those requirements using the following interface.
> > (I have sent the patches already)
> > 
> > I will try to explain the interface with bunch of examples:
> > 
> > 
> > # adds a new password, defined by qemu secret 'sec0' to first unused slot
> > # give user a error if all keyslots are occupied
> > qemu-img amend --secret ... -o key-secret=sec1 image.luks
> 
> I think you meant "--object secret," instead of "--secret ..."
> 
True, sorry about that!

> Also, this example needs to have 2 secrets provided. The first
> secret to unlock the image using the existing password, and the
> second secret is the one being added.
> 
> > # erases all keyslots that can be opened by password that is contained in a 
> > qemu secret 'sec0'
> > # active=off means that the given password/keyslot won't be active after 
> > the operation
> > qemu-img amend --secret ... -o key-secret=sec0,active=off image.luks
> > 
> > 
> > # erase the slot 5 (this is more low level command, less expected to be 
> > used)
> > qemu-img amend --secret ... -o slot=5,active=off image.luks
> > 
> > # add new secret to slot 5 (will fail if the slot is already marked as 
> > active)
> > qemu-img amend --secret ... -o slot=5,key-secret=sec1 image.luks
> 
> This also needs two secrets provideed.
> 
> > 
> > 
> > This is basically it.
> > 
> > The full option syntax is as following:
> > 
> > active=on/off (optional, default to on) - toggles if we enabling a keyslot 
> > or are erasing it.
> > 
> > slot=number (optional, advanced option) - specifies which exactly slot to 
> > erase or which
> > slot to put the new key on
> > 
> > key-secret = id of the secret object - specifies the secret. when slot is 
> > enabled,
> > it will be put into the n

Re: [PATCH v3 1/4] nbd/server: Prefer heap over stack for parsing client names

2019-11-14 Thread Maxim Levitsky
On Wed, 2019-11-13 at 20:46 -0600, Eric Blake wrote:
> As long as we limit NBD names to 256 bytes (the bare minimum permitted
> by the standard), stack-allocation works for parsing a name received
> from the client.  But as mentioned in a comment, we eventually want to
> permit up to the 4k maximum of the NBD standard, which is too large
> for stack allocation; so switch everything in the server to use heap
> allocation.  For now, there is no change in actually supported name
> length.

I am just curios, why is this so?
I know that kernel uses 8K stacks due to historical limitation
of 1:1 physical memory mapping which creates fragmentation,
but in the userspace stacks shouldn't really be limited and grow on demand.
Some gcc security option limits this?

> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/nbd.h | 10 +-
>  nbd/server.c| 25 +++--
>  2 files changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 316fd705a9e4..c306423dc85c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -226,11 +226,11 @@ enum {
>  /* Maximum size of a single READ/WRITE data buffer */
>  #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> 
> -/* Maximum size of an export name. The NBD spec requires 256 and
> - * suggests that servers support up to 4096, but we stick to only the
> - * required size so that we can stack-allocate the names, and because
> - * going larger would require an audit of more code to make sure we
> - * aren't overflowing some other buffer. */
> +/*
> + * Maximum size of an export name. The NBD spec requires a minimum of
> + * 256 and recommends that servers support up to 4096; all users use
> + * malloc so we can bump this constant without worry.
> + */
>  #define NBD_MAX_NAME_SIZE 256
> 
>  /* Two types of reply structures */
> diff --git a/nbd/server.c b/nbd/server.c
> index d8d1e6245532..c63b76b22735 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -324,18 +324,20 @@ static int nbd_opt_skip(NBDClient *client, size_t size, 
> Error **errp)
>   *   uint32_t len (<= NBD_MAX_NAME_SIZE)
>   *   len bytes string (not 0-terminated)
>   *
> - * @name should be enough to store NBD_MAX_NAME_SIZE+1.
> + * On success, @name will be allocated.
>   * If @length is non-null, it will be set to the actual string length.
>   *
>   * Return -errno on I/O error, 0 if option was completely handled by
>   * sending a reply about inconsistent lengths, or 1 on success.
>   */
> -static int nbd_opt_read_name(NBDClient *client, char *name, uint32_t *length,
> +static int nbd_opt_read_name(NBDClient *client, char **name, uint32_t 
> *length,
>   Error **errp)
>  {
>  int ret;
>  uint32_t len;
> +g_autofree char *local_name = NULL;
> 
> +*name = NULL;
>  ret = nbd_opt_read(client, &len, sizeof(len), errp);
>  if (ret <= 0) {
>  return ret;
> @@ -347,15 +349,17 @@ static int nbd_opt_read_name(NBDClient *client, char 
> *name, uint32_t *length,
> "Invalid name length: %" PRIu32, len);
>  }
> 
> -ret = nbd_opt_read(client, name, len, errp);
> +local_name = g_malloc(len + 1);
> +ret = nbd_opt_read(client, local_name, len, errp);
>  if (ret <= 0) {
>  return ret;
>  }
> -name[len] = '\0';
> +local_name[len] = '\0';
> 
>  if (length) {
>  *length = len;
>  }
> +*name = g_steal_pointer(&local_name);
> 
>  return 1;
>  }
> @@ -427,7 +431,7 @@ static void nbd_check_meta_export(NBDClient *client)
>  static int nbd_negotiate_handle_export_name(NBDClient *client, bool 
> no_zeroes,
>  Error **errp)
>  {
> -char name[NBD_MAX_NAME_SIZE + 1];
> +g_autofree char *name;

That is what patchew complained about I think.

Isn't it wonderful how g_autofree fixes one issue
and introduces another. I mean 'name' isn't really
used here prior to allocation according to plain C,
but due to g_autofree, it can be now on any error
path. Nothing against g_autofree though, just noting this.

>  char buf[NBD_REPLY_EXPORT_NAME_SIZE] = "";
>  size_t len;
>  int ret;
> @@ -441,10 +445,11 @@ static int nbd_negotiate_handle_export_name(NBDClient 
> *client, bool no_zeroes,
>  [10 .. 133]   reserved (0) [unless no_zeroes]
>   */
>  trace_nbd_negotiate_handle_export_name();
> -if (client->optlen >= sizeof(name)) {
> +if (client->optlen > NBD_MAX_NAME_SIZE) {
>  error_setg(errp, "Bad length received");
>  return -EINVAL;
>  }
> +name = g_malloc(client->optlen + 1);
>  if (nbd_read(client->ioc, name, client->optlen, "export name", errp) < 
> 0) {
>  return -EIO;
>  }
> @@ -533,7 +538,7 @@ static int nbd_reject_length(NBDClient *client, bool 
> fatal, Error **errp)
>  static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
>  {
>  int rc;
> -char nam

Re: [PATCH v3 3/4] nbd: Don't send oversize strings

2019-11-14 Thread Maxim Levitsky
On Wed, 2019-11-13 at 20:46 -0600, Eric Blake wrote:
> Qemu as server currently won't accept export names larger than 256
> bytes, nor create dirty bitmap names longer than 1023 bytes, so most
> uses of qemu as client or server have no reason to get anywhere near
> the NBD spec maximum of a 4k limit per string.
> 
> However, we weren't actually enforcing things, ignoring when the
> remote side violates the protocol on input, and also having several
> code paths where we send oversize strings on output (for example,
> qemu-nbd --description could easily send more than 4k).  Tighten
> things up as follows:
> 
> client:
> - Perform bounds check on export name and dirty bitmap request prior
>   to handing it to server
> - Validate that copied server replies are not too long (ignoring
>   NBD_INFO_* replies that are not copied is not too bad)
> server:
> - Perform bounds check on export name and description prior to
>   advertising it to client
> - Reject client name or metadata query that is too long
> - Adjust things to allow full 4k name limit rather than previous
>   256 byte limit
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/nbd.h |  8 
>  block/nbd.c | 10 ++
>  blockdev-nbd.c  |  5 +
>  nbd/client.c| 18 +++---
>  nbd/server.c| 20 +++-
>  qemu-nbd.c  |  9 +
>  6 files changed, 58 insertions(+), 12 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index c306423dc85c..7f46932d80f1 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -227,11 +227,11 @@ enum {
>  #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> 
>  /*
> - * Maximum size of an export name. The NBD spec requires a minimum of
> - * 256 and recommends that servers support up to 4096; all users use
> - * malloc so we can bump this constant without worry.
> + * Maximum size of a protocol string (export name, meta context name,
> + * etc.).  Use malloc rather than stack allocation for storage of a
> + * string.
>   */
> -#define NBD_MAX_NAME_SIZE 256
> +#define NBD_MAX_STRING_SIZE 4096
> 
>  /* Two types of reply structures */
>  #define NBD_SIMPLE_REPLY_MAGIC  0x67446698
> diff --git a/block/nbd.c b/block/nbd.c
> index 123976171cf4..5f18f78a9471 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1832,6 +1832,10 @@ static int nbd_process_options(BlockDriverState *bs, 
> QDict *options,
>  }
> 
>  s->export = g_strdup(qemu_opt_get(opts, "export"));
> +if (s->export && strlen(s->export) > NBD_MAX_STRING_SIZE) {
> +error_setg(errp, "export name too long to send to server");
> +goto error;
> +}
> 
>  s->tlscredsid = g_strdup(qemu_opt_get(opts, "tls-creds"));
>  if (s->tlscredsid) {
> @@ -1849,6 +1853,11 @@ static int nbd_process_options(BlockDriverState *bs, 
> QDict *options,
>  }
> 
>  s->x_dirty_bitmap = g_strdup(qemu_opt_get(opts, "x-dirty-bitmap"));
> +if (s->x_dirty_bitmap && strlen(s->x_dirty_bitmap) > 
> NBD_MAX_STRING_SIZE) {
> +error_setg(errp, "x-dirty-bitmap query too long to send to server");
> +goto error;
> +}
> +
>  s->reconnect_delay = qemu_opt_get_number(opts, "reconnect-delay", 0);
> 
>  ret = 0;
> @@ -1859,6 +1868,7 @@ static int nbd_process_options(BlockDriverState *bs, 
> QDict *options,
>  qapi_free_SocketAddress(s->saddr);
>  g_free(s->export);
>  g_free(s->tlscredsid);
> +g_free(s->x_dirty_bitmap);
>  }
>  qemu_opts_del(opts);
>  return ret;
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 6a8b206e1d74..8c20baa4a4b9 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -162,6 +162,11 @@ void qmp_nbd_server_add(const char *device, bool 
> has_name, const char *name,
>  name = device;
>  }
> 
> +if (strlen(name) > NBD_MAX_STRING_SIZE) {
> +error_setg(errp, "export name '%s' too long", name);
> +return;
> +}
> +
>  if (nbd_export_find(name)) {
>  error_setg(errp, "NBD server already has export named '%s'", name);
>  return;
> diff --git a/nbd/client.c b/nbd/client.c
> index f6733962b49b..ba173108baab 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -289,8 +289,8 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
> char **description,
>  return -1;
>  }
>  len -= sizeof(namelen);
> -if (len < namelen) {
> -error_setg(errp, "incorrect option name length");
> +if (len < namelen || namelen > NBD_MAX_STRING_SIZE) {
> +error_setg(errp, "incorrect name length in server's list response");
>  nbd_send_opt_abort(ioc);
>  return -1;
>  }
> @@ -303,6 +303,12 @@ static int nbd_receive_list(QIOChannel *ioc, char 
> **name, char **description,
>  local_name[namelen] = '\0';
>  len -= namelen;
>  if (len) {
> +if (len > NBD_MAX_STRING_SIZE) {
> +error_setg(errp, "incorrect description length in server's "
> +  

Re: [PATCH v3 2/4] bitmap: Enforce maximum bitmap name length

2019-11-14 Thread Maxim Levitsky
On Wed, 2019-11-13 at 20:46 -0600, Eric Blake wrote:
> We document that for qcow2 persistent bitmaps, the name cannot exceed
> 1023 bytes.  It is inconsistent if transient bitmaps do not have to
> abide by the same limit, and it is unlikely that any existing client
> even cares about using bitmap names this long.  It's time to codify
> that ALL bitmaps managed by qemu (whether persistent in qcow2 or not)
> have a documented maximum length.

Strange a bit that 1023 was choosen, I guess implementation
uses a 1024 zero terminated string for storing the names
in memory.

> 
> Signed-off-by: Eric Blake 
> ---
>  qapi/block-core.json |  2 +-
>  include/block/dirty-bitmap.h |  2 ++
>  block/dirty-bitmap.c | 12 +---
>  block/qcow2-bitmap.c |  2 ++
>  4 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index aa97ee264112..0cf68fea1450 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2042,7 +2042,7 @@
>  #
>  # @node: name of device/node which the bitmap is tracking
>  #
> -# @name: name of the dirty bitmap
> +# @name: name of the dirty bitmap (must be less than 1024 bytes)
>  #
>  # @granularity: the bitmap granularity, default is 64k for
>  #   block-dirty-bitmap-add
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 958e7474fb51..e2b20ecab9a3 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -14,6 +14,8 @@ typedef enum BitmapCheckFlags {
>   BDRV_BITMAP_INCONSISTENT)
>  #define BDRV_BITMAP_ALLOW_RO (BDRV_BITMAP_BUSY | BDRV_BITMAP_INCONSISTENT)
> 
> +#define BDRV_BITMAP_MAX_NAME_SIZE 1023
> +
>  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>uint32_t granularity,
>const char *name,
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 4bbb251b2c9c..7039e8252009 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -104,9 +104,15 @@ BdrvDirtyBitmap 
> *bdrv_create_dirty_bitmap(BlockDriverState *bs,
> 
>  assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);
> 
> -if (name && bdrv_find_dirty_bitmap(bs, name)) {
> -error_setg(errp, "Bitmap already exists: %s", name);
> -return NULL;
> +if (name) {
> +if (bdrv_find_dirty_bitmap(bs, name)) {
> +error_setg(errp, "Bitmap already exists: %s", name);
> +return NULL;
> +}
> +if (strlen(name) > BDRV_BITMAP_MAX_NAME_SIZE) {
> +error_setg(errp, "Bitmap name too long: %s", name);
> +return NULL;
> +}
>  }
>  bitmap_size = bdrv_getlength(bs);
>  if (bitmap_size < 0) {
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index ef9ef628a0d0..809bbc5d20c8 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -42,6 +42,8 @@
>  #define BME_MIN_GRANULARITY_BITS 9
>  #define BME_MAX_NAME_SIZE 1023
> 
> +QEMU_BUILD_BUG_ON(BME_MAX_NAME_SIZE != BDRV_BITMAP_MAX_NAME_SIZE);
> +
>  #if BME_MAX_TABLE_SIZE * 8ULL > INT_MAX
>  #error In the code bitmap table physical size assumed to fit into int
>  #endif


Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky






Re: [Qemu-devel] [PATCH v4 02/14] qapi: Move to_c_string() to common.py

2019-11-14 Thread Max Reitz
On 14.11.19 10:20, Markus Armbruster wrote:
> Max Reitz  writes:
> 
>> This function will be useful for code generation once we allow default
>> values, so move it to the other "C helper functions".  In the process,
>> rewrite it so it supports all nonprintable and non-ASCII characters.
>>
>> Signed-off-by: Max Reitz 
> 
> Please have a close look at commit 56a8caff92 "qapi: Restrict strings to
> printable ASCII".  Do we still need the rewrite?

If that’s all that has changed, I think we will still need at least some
bits, like the " or \ escaping.

Also, actually, it looks like 56a8caff92 didn’t change the fact that
control characters are verbatim parts of the string, i.e. \u000a will
still be a literal 0xa byte, and as such must be escaped anew in the C
string.

So without having tried, I think this is still very much necessary.

> If yes: the commit message title promises code motion, but the patch is
> anything but.  Adjust the title, please.

OK.

Max




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 01/14] qapi: Parse numeric values

2019-11-14 Thread Max Reitz
On 14.11.19 10:15, Markus Armbruster wrote:
> Max Reitz  writes:
> 
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qapi-schema/bad-type-int.json  |  1 -
>>  tests/qapi-schema/enum-int-member.json   |  1 -
>>  scripts/qapi/common.py   | 25 
>>  scripts/qapi/introspect.py   |  2 ++
>>  tests/qapi-schema/bad-type-int.err   |  2 +-
>>  tests/qapi-schema/enum-int-member.err|  2 +-
>>  tests/qapi-schema/leading-comma-list.err |  2 +-
>>  7 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/tests/qapi-schema/bad-type-int.json 
>> b/tests/qapi-schema/bad-type-int.json
>> index 56fc6f8126..81355eb196 100644
>> --- a/tests/qapi-schema/bad-type-int.json
>> +++ b/tests/qapi-schema/bad-type-int.json
>> @@ -1,3 +1,2 @@
>>  # we reject an expression with a metatype that is not a string
>> -# FIXME: once the parser understands integer inputs, improve the error 
>> message
>>  { 'struct': 1, 'data': { } }
>> diff --git a/tests/qapi-schema/enum-int-member.json 
>> b/tests/qapi-schema/enum-int-member.json
>> index 6c9c32e149..6958440c6d 100644
>> --- a/tests/qapi-schema/enum-int-member.json
>> +++ b/tests/qapi-schema/enum-int-member.json
>> @@ -1,3 +1,2 @@
>>  # we reject any enum member that is not a string
>> -# FIXME: once the parser understands integer inputs, improve the error 
>> message
>>  { 'enum': 'MyEnum', 'data': [ 1 ] }
>> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
>> index d61bfdc526..3396ea4a09 100644
>> --- a/scripts/qapi/common.py
>> +++ b/scripts/qapi/common.py
>> @@ -498,6 +498,8 @@ class QAPISchemaParser(object):
>>  raise QAPISemError(info, "Unknown pragma '%s'" % name)
>>  
>>  def accept(self, skip_comment=True):
>> +num_match = re.compile(r'([-+]?inf|nan|[-+0-9.][0-9a-f.ex]*)')
>> +
>>  while True:
>>  self.tok = self.src[self.cursor]
>>  self.pos = self.cursor
> 
> This is yet another extension over plain JSON.  RFC 8259:
> 
>   number = [ minus ] int [ frac ] [ exp ]
>   decimal-point = %x2E   ; .
>   digit1-9 = %x31-39 ; 1-9
>   e = %x65 / %x45; e E
>   exp = e [ minus / plus ] 1*DIGIT
>   frac = decimal-point 1*DIGIT
>   int = zero / ( digit1-9 *DIGIT )
>   minus = %x2D   ; -
>   plus = %x2B; +
>   zero = %x30; 0
> 
> Extensions are acceptable when we have an actual use for it, and we
> document them properly.

Well, it isn’t really an extension, because this isn’t a JSON parser but
just something that accepts anything that looks like a number and then
lets Python try a conversion on it.

> Isn't the parenthesis in your regular expression redundant?

You’re right, but on second thought, maybe I should surround it by \<
and \>.

> What use do you have in mind for 'inf' and 'nan'?

I could imagine inf being a useful default value, actually.  nan,
probably not so much.

> Why accept leading '+' as in '+123'?
> 
> Why accept empty integral part as in '.123'?
> 
> Why accept '.xe.'?  Kidding you, that must be a bug in your regexp.

Well, kind of.

I wanted to accept anything that looks in any way like a number and then
let Python try to convert it.  That’s also the reason why the case comes
last.

For that reason, I decided to keep the regex as simple as possible,
because the attempted conversions would reject anything that isn’t (to
Python) a valid number later.

It was my impression that the QAPI schema isn’t really JSON anyway and
that our QAPI schema parser isn’t a JSON parser.  Under that assumption
it simply seemed useful to me to accept anything that could potentially
be a number to Python and convert it.

Now, honestly, I still don’t see the point of having a strict JSON
“parser” here, but if you insist.  Seems possible to do in a regex.

Though I do think it makes sense to support hex integers as an extension.

> Please decide what number syntax you'd like to accept, then specify it
> in docs/devel/qapi-code-gen.txt, so we can first discuss the
> specification, and then check the regexp implements it.
> 
> docs/devel/qapi-code-gen.txt update goes here:
> 
> === Schema syntax ===
> 
> Syntax is loosely based on JSON (http://www.ietf.org/rfc/rfc8259.txt).
> Differences:
> 
> * Comments: start with a hash character (#) that is not part of a
>   string, and extend to the end of the line.
> 
> * Strings are enclosed in 'single quotes', not "double quotes".
> 
> * Strings are restricted to printable ASCII, and escape sequences to
>   just '\\'.
> 
> --> * Numbers and null are not supported.

OK.

> Hrmm, commit 9d55380b5a "qapi: Remove null from schema language" left
> two instances in error messages behind.  I'll fix them.
> 
>> @@ -584,7 +586,22 @@ class QAPISchemaParser(object):
>>  return
>>  self.line += 1
>>  self.line_pos = self.cursor
>> -el

Re: [Qemu-devel] [PATCH v4 02/14] qapi: Move to_c_string() to common.py

2019-11-14 Thread Markus Armbruster
Max Reitz  writes:

> This function will be useful for code generation once we allow default
> values, so move it to the other "C helper functions".  In the process,
> rewrite it so it supports all nonprintable and non-ASCII characters.
>
> Signed-off-by: Max Reitz 

Please have a close look at commit 56a8caff92 "qapi: Restrict strings to
printable ASCII".  Do we still need the rewrite?

If yes: the commit message title promises code motion, but the patch is
anything but.  Adjust the title, please.




Re: [Qemu-devel] [PATCH v4 01/14] qapi: Parse numeric values

2019-11-14 Thread Markus Armbruster
Max Reitz  writes:

> Signed-off-by: Max Reitz 
> ---
>  tests/qapi-schema/bad-type-int.json  |  1 -
>  tests/qapi-schema/enum-int-member.json   |  1 -
>  scripts/qapi/common.py   | 25 
>  scripts/qapi/introspect.py   |  2 ++
>  tests/qapi-schema/bad-type-int.err   |  2 +-
>  tests/qapi-schema/enum-int-member.err|  2 +-
>  tests/qapi-schema/leading-comma-list.err |  2 +-
>  7 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/tests/qapi-schema/bad-type-int.json 
> b/tests/qapi-schema/bad-type-int.json
> index 56fc6f8126..81355eb196 100644
> --- a/tests/qapi-schema/bad-type-int.json
> +++ b/tests/qapi-schema/bad-type-int.json
> @@ -1,3 +1,2 @@
>  # we reject an expression with a metatype that is not a string
> -# FIXME: once the parser understands integer inputs, improve the error 
> message
>  { 'struct': 1, 'data': { } }
> diff --git a/tests/qapi-schema/enum-int-member.json 
> b/tests/qapi-schema/enum-int-member.json
> index 6c9c32e149..6958440c6d 100644
> --- a/tests/qapi-schema/enum-int-member.json
> +++ b/tests/qapi-schema/enum-int-member.json
> @@ -1,3 +1,2 @@
>  # we reject any enum member that is not a string
> -# FIXME: once the parser understands integer inputs, improve the error 
> message
>  { 'enum': 'MyEnum', 'data': [ 1 ] }
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index d61bfdc526..3396ea4a09 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -498,6 +498,8 @@ class QAPISchemaParser(object):
>  raise QAPISemError(info, "Unknown pragma '%s'" % name)
>  
>  def accept(self, skip_comment=True):
> +num_match = re.compile(r'([-+]?inf|nan|[-+0-9.][0-9a-f.ex]*)')
> +
>  while True:
>  self.tok = self.src[self.cursor]
>  self.pos = self.cursor

This is yet another extension over plain JSON.  RFC 8259:

  number = [ minus ] int [ frac ] [ exp ]
  decimal-point = %x2E   ; .
  digit1-9 = %x31-39 ; 1-9
  e = %x65 / %x45; e E
  exp = e [ minus / plus ] 1*DIGIT
  frac = decimal-point 1*DIGIT
  int = zero / ( digit1-9 *DIGIT )
  minus = %x2D   ; -
  plus = %x2B; +
  zero = %x30; 0

Extensions are acceptable when we have an actual use for it, and we
document them properly.

Isn't the parenthesis in your regular expression redundant?

What use do you have in mind for 'inf' and 'nan'?

Why accept leading '+' as in '+123'?

Why accept empty integral part as in '.123'?

Why accept '.xe.'?  Kidding you, that must be a bug in your regexp.
Please decide what number syntax you'd like to accept, then specify it
in docs/devel/qapi-code-gen.txt, so we can first discuss the
specification, and then check the regexp implements it.

docs/devel/qapi-code-gen.txt update goes here:

=== Schema syntax ===

Syntax is loosely based on JSON (http://www.ietf.org/rfc/rfc8259.txt).
Differences:

* Comments: start with a hash character (#) that is not part of a
  string, and extend to the end of the line.

* Strings are enclosed in 'single quotes', not "double quotes".

* Strings are restricted to printable ASCII, and escape sequences to
  just '\\'.

--> * Numbers and null are not supported.

Hrmm, commit 9d55380b5a "qapi: Remove null from schema language" left
two instances in error messages behind.  I'll fix them.

> @@ -584,7 +586,22 @@ class QAPISchemaParser(object):
>  return
>  self.line += 1
>  self.line_pos = self.cursor
> -elif not self.tok.isspace():
> +elif self.tok.isspace():
> +pass
> +elif num_match.match(self.src[self.pos:]):
> +match = num_match.match(self.src[self.pos:]).group(0)

Sadly, the walrus operator is Python 3.8.

> +try:
> +self.val = int(match, 0)
> +except ValueError:
> +try:
> +self.val = float(match)
> +except ValueError:
> +raise QAPIParseError(self,
> +'"%s" is not a valid integer or float' % 
> match)
> +
> +self.cursor += len(match) - 1
> +return
> +else:
>  raise QAPIParseError(self, 'Stray "%s"' % self.tok)

Any particular reason for putting the number case last?

>  
>  def get_members(self):
> @@ -617,9 +634,9 @@ class QAPISchemaParser(object):
>  if self.tok == ']':
>  self.accept()
>  return expr
> -if self.tok not in "{['tfn":
> +if self.tok not in "{['tfn-+0123456789.i":

This is getting a bit ugly.  Let's not worry about it now.

>  raise QAPIParseError(self, 'Expected "{", "[", "]", string, '
> - 'boolean or "null"')
> +   

Re: [PATCH v4 00/14] block: Try to create well-typed json:{} filenames

2019-11-14 Thread Markus Armbruster
Max Reitz  writes:

> On 13.09.19 13:49, Max Reitz wrote:
>> Another gentle ping.
>
> And another.

Conflicts with the refactoring merged in commit 69717d0f890.  Please
accept my apologies for the inconvenience caused by the excessive delay.

I'll try to review anyway.