Re: [PULL 0/1] Block layer patches for 5.2.0-rc3

2020-11-24 Thread Peter Maydell
On Tue, 24 Nov 2020 at 14:25, Kevin Wolf  wrote:
>
> The following changes since commit 23895cbd82be95428e90168b12e925d0d3ca2f06:
>
>   Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201123.0' 
> into staging (2020-11-23 18:51:13 +)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to c8bf9a9169db094aaed680cdba570758c0dc18b9:
>
>   qcow2: Fix corruption on write_zeroes with MAY_UNMAP (2020-11-24 11:29:41 
> +0100)
>
> 
> Patches for 5.2.0-rc3:
>
> - qcow2: Fix corruption on write_zeroes with MAY_UNMAP
>
> 
> Maxim Levitsky (1):
>   qcow2: Fix corruption on write_zeroes with MAY_UNMAP


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM



Re: [PATCH 1/1] Fix qcow2 corruption on discard

2020-11-24 Thread Maxim Levitsky
On Tue, 2020-11-24 at 20:59 +0200, Maxim Levitsky wrote:
> On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> > On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > > We can then continue work to find a minimal reproducer and merge the
> > > test case in the early 6.0 cycle.
> > 
> > I haven't been able to reproduce the problem yet, do you have any
> > findings?
> > 
> > Berto
> > 
> 
> I have a working reproducer script. I'll send it in an hour or so.
> Best regards,
>   Maxim Levitsky

I have attached a minimal reproducer for this issue.
I can convert this to an iotest if you think that this is worth it.


So these are the exact conditions for the corruption to happen:

1. The image must have at least 5 refcount tables
   (one more than the refcount table cache size, which is 4 by default).

2. An IO pattern that fully populates the 4-entry refcount table cache:

   The easiest way to do this is to have 4 L2 entries populated in the base image,
   such that each entry references a physical cluster that is served by a different
   refcount table.

   Then discard these entries in the snapshot, triggering discards in the
   base file during the commit, which populates the refcount table cache.

3. A discard of a cluster that belongs to the 5th refcount table (done in the
   exact same way as the discards above).
   It should be done soon, before the L2 cache is flushed due to some unrelated
   IO.

   This triggers the corruption:
The call stack is:

qcow2_free_any_cluster ->
  qcow2_free_clusters ->
    update_refcount():

        /* This sets a dependency between flushing the refcount
         * cache and the L2 cache. */
        if (decrease)
            qcow2_cache_set_dependency(bs, s->refcount_block_cache,
                                       s->l2_table_cache);

        ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
          return load_refcount_block(bs, refcount_block_offset, refcount_block);
            return qcow2_cache_get(...)
              qcow2_cache_do_get()
                /* because of a cache miss, we have to evict an entry */
                ret = qcow2_cache_entry_flush(bs, c, i);
                    if (c->depends) {
                        /* this flushes the L2 cache */
                        ret = qcow2_cache_flush_dependency(bs, c);
                    }
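
To make this chain easier to follow, here is a small self-contained toy model
(plain C, deliberately not the QEMU code) of the flush-dependency mechanism:
evicting an entry from a cache that carries a dependency first flushes the
cache it depends on, which marks that cache clean even though a later update
to it is still intended.

/*
 * Toy model (not QEMU code) of the flush-dependency mechanism in the call
 * chain above: evicting an entry from a cache that has a dependency first
 * flushes the cache it depends on, marking that cache clean even though a
 * later update to it is still intended.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct Cache {
    const char *name;
    bool dirty;             /* "some entry is dirty", simplified to one flag */
    struct Cache *depends;  /* cache that must hit the disk before this one */
} Cache;

static void cache_flush(Cache *c)
{
    if (c->depends) {
        cache_flush(c->depends);        /* flush the dependency first */
        c->depends = NULL;
    }
    if (c->dirty) {
        printf("writing %s to disk, marking it clean\n", c->name);
        c->dirty = false;
    }
}

/* A cache miss that has to evict an entry behaves like a flush here. */
static void cache_get_with_eviction(Cache *c)
{
    printf("cache miss in %s: evicting an entry\n", c->name);
    cache_flush(c);
}

int main(void)
{
    Cache l2       = { .name = "L2 table cache" };
    Cache refcount = { .name = "refcount block cache" };

    /* zero_in_l2_slice() marks the L2 slice dirty ... */
    l2.dirty = true;

    /* ... qcow2_free_any_cluster() -> update_refcount() then makes the
     * refcount cache depend on the L2 cache and dirties it ... */
    refcount.depends = &l2;
    refcount.dirty = true;

    /* ... and a refcount cache miss (the 5th refcount block) evicts an
     * entry, which flushes the L2 cache and marks it clean.  A later
     * set_l2_entry() on that "clean" slice can then be lost. */
    cache_get_with_eviction(&refcount);

    printf("L2 cache dirty after eviction: %s\n", l2.dirty ? "yes" : "no");
    return 0;
}

Running it shows the L2 cache ending up clean right after the refcount-cache
eviction, which is exactly the window in which a later set_l2_entry() update
can be lost.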


I have attached a reproducer that works with almost any cluster size and
refcount block size.
Cluster sizes below 4K don't work because the commit is done by the mirror job,
which works at 4K granularity, and that results in it not doing any discards
due to various alignment restrictions.

If I patch qemu to make the mirror job work at 512B granularity, the test
reproduces for small clusters as well.

The reproducer creates a qcow2 image in the current directory; it needs about
11G for the default parameters (64K cluster size, 16-bit refcounts).
For a 4K cluster size and 64-bit refcounts, it needs only 11M.
(This can be changed by editing the script.)
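
For reference, these sizes are consistent with the refcount geometry (assuming
the usual qcow2 layout): one refcount block (a "refcount table" in the
conditions above) holds cluster_size / (refcount_bits / 8) entries, so with 64K
clusters and 16-bit refcounts a single block covers 32768 clusters, i.e. 2 GiB
of data, and touching five of them needs a bit more than 8 GiB of allocated
clusters; with 4K clusters and 64-bit refcounts one block covers only 512
clusters, i.e. 2 MiB, hence a bit more than 8 MiB.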

Best regards,
Maxim Levitsky


reproducer.sh
Description: application/shellscript


Re: [PATCH 1/1] Fix qcow2 corruption on discard

2020-11-24 Thread Maxim Levitsky
On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > We can then continue work to find a minimal reproducer and merge the
> > test case in the early 6.0 cycle.
> 
> I haven't been able to reproduce the problem yet, do you have any
> findings?
> 
> Berto
> 

I have a working reproducer script. I'll send it in an hour or so.
Best regards,
Maxim Levitsky




Re: [PATCH 1/1] Fix qcow2 corruption on discard

2020-11-24 Thread Alberto Garcia
On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> We can then continue work to find a minimal reproducer and merge the
> test case in the early 6.0 cycle.

I haven't been able to reproduce the problem yet, do you have any
findings?

Berto



Re: [PATCH v2 2/2] monitor: increase amount of data for monitor to read

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

24.11.2020 14:03, Vladimir Sementsov-Ogievskiy wrote:

23.11.2020 18:44, Andrey Shinkevich wrote:

QMP and HMP monitors read one byte at a time from the socket or stdin,
which is very inefficient. With 100+ VMs on the host, this results in
multiple extra system calls and CPU overuse.
This patch increases the amount of data read at once to up to 4096 bytes,
which matches the buffer size at the channel level.
A small JSON parser is introduced to throttle the QMP commands read from
the buffer so that incoming requests do not overflow the monitor input
queue.

Suggested-by: Denis V. Lunev
Signed-off-by: Andrey Shinkevich
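
The parser itself is not quoted in this thread, so purely as an illustration of
the throttling idea (knowing how many complete QMP requests are already sitting
in the read buffer before handing them to the monitor), here is a hypothetical
sketch of a brace-depth scanner; the function name and logic are made up for
illustration and are not the actual parser from the patch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count complete top-level {...} objects in buf[0..len), ignoring braces
 * that appear inside JSON string literals. */
static size_t count_complete_json_objects(const char *buf, size_t len)
{
    size_t complete = 0;
    int depth = 0;
    bool in_string = false, escaped = false;

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (in_string) {
            if (escaped) {
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{') {
            depth++;
        } else if (c == '}' && depth > 0 && --depth == 0) {
            complete++;    /* one full top-level object seen */
        }
    }
    return complete;
}

int main(void)
{
    const char *chunk =
        "{\"execute\":\"query-status\"}"
        "{\"execute\":\"human-monitor-command\","
        " \"arguments\":{\"command-line\":\"info block\"}}"
        "{\"execute\":\"qu";    /* truncated request: not yet complete */

    /* Prints 2: only complete requests would be handed to the monitor. */
    printf("complete requests in chunk: %zu\n",
           count_complete_json_objects(chunk, strlen(chunk)));
    return 0;
}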



Can't we just increase the QMP queue instead? It seems a lot simpler:

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 348bfad3d5..7e721eee3f 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -8,7 +8,7 @@
  typedef struct MonitorHMP MonitorHMP;
  typedef struct MonitorOptions MonitorOptions;

-#define QMP_REQ_QUEUE_LEN_MAX 8
+#define QMP_REQ_QUEUE_LEN_MAX 4096

  extern QemuOptsList qemu_mon_opts;

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 84222cd130..1588f00306 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -566,7 +566,7 @@ int monitor_can_read(void *opaque)
  {
  Monitor *mon = opaque;

-    return !qatomic_mb_read(&mon->suspend_cnt);
+    return !qatomic_mb_read(&mon->suspend_cnt) ? 4096 : 0;
  }


- with this patch tests pass and performance is even better.




I have just discovered that this patch ^^ was already sent a year ago:

  https://patchew.org/QEMU/20190610105906.28524-1-dplotni...@virtuozzo.com/

Some questions were asked there, so I think we should start from it.


--
Best regards,
Vladimir



[PULL 0/1] Block layer patches for 5.2.0-rc3

2020-11-24 Thread Kevin Wolf
The following changes since commit 23895cbd82be95428e90168b12e925d0d3ca2f06:

  Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201123.0' 
into staging (2020-11-23 18:51:13 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to c8bf9a9169db094aaed680cdba570758c0dc18b9:

  qcow2: Fix corruption on write_zeroes with MAY_UNMAP (2020-11-24 11:29:41 
+0100)


Patches for 5.2.0-rc3:

- qcow2: Fix corruption on write_zeroes with MAY_UNMAP


Maxim Levitsky (1):
  qcow2: Fix corruption on write_zeroes with MAY_UNMAP

 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)




[PULL 1/1] qcow2: Fix corruption on write_zeroes with MAY_UNMAP

2020-11-24 Thread Kevin Wolf
From: Maxim Levitsky 

Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky 
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf 
Message-Id: <20201124092815.39056-1-kw...@redhat.com>
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 485b4cb92e..bd0597842f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2010,14 +2010,17 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 continue;
 }
 
+/* First update L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (unmap) {
-qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
-}
 set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
 if (has_subclusters(s)) {
 set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
+
+/* Then decrease the refcount */
+if (unmap) {
+qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+}
 }
 
 qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.28.0




RE: [raw] Guest stuck during live-migration

2020-11-24 Thread Quentin Grolleau
Thanks Kevin, 

> > Hello,
> >
> > In our company, we are hosting a large number of VMs behind
> > OpenStack (so libvirt/qemu).
> > A large majority of our VMs run with local data only, stored on
> > NVMe, and most of them are RAW disks.
> >
> > With Qemu 4.0 (and even with older versions) we see strange
> > live-migration behaviour:

> First of all, 4.0 is relatively old. Generally it is worth retrying with
> the most recent code (git master or 5.2.0-rc2) before having a closer
> look at problems, because it is frustrating to spend considerable time
> debugging an issue and then find out it has already been fixed a year
> ago.

I will try to build with the most recent code, but it will take me some time
to do it.


> > - some VMs live-migrate at very high speed without issue (> 6 Gbps)
> > - some VMs are running correctly, but migrate at a strangely low speed
> > (3 Gbps)
> > - some VMs are migrating at a very low speed (1 Gbps, sometimes less) and
> > during the migration the guest is completely I/O stuck
> >
> > When this issue happens the VM is completely blocked; iostat in the VM shows
> > a latency of 30 seconds

> Can you get the stack backtraces of all QEMU threads while the VM is
> blocked (e.g. with gdb or pstack)?

(gdb) thread apply all bt 

 
Thread 20 (Thread 0x7f8a0effd700 (LWP 201248)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x56520139878b in qemu_cond_wait_impl (cond=0x5652020f27b0, mutex=0x5652020f27e8, file=0x5652014e4178 "/root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c", line=214) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:161
#2  0x5652012a264d in vnc_worker_thread_loop (queue=queue@entry=0x5652020f27b0) at /root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c:214
#3  0x5652012a2c18 in vnc_worker_thread (arg=arg@entry=0x5652020f27b0) at /root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c:324
#4  0x565201398116 in qemu_thread_start (args=) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:502
#5  0x7f8a5e24a6ba in start_thread (arg=0x7f8a0effd700) at pthread_create.c:333
#6  0x7f8a5df8041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 19 (Thread 0x7f8a0700 (LWP 201222)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f8a5e24cdbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x565201adb680 ) at ../nptl/pthread_mutex_lock.c:80
#2  0x565201398263 in qemu_mutex_lock_impl (mutex=0x565201adb680 , file=0x5652013d7c68 "/root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c", line=2089) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:66
#3  0x565200f7d00e in qemu_mutex_lock_iothread_impl (file=file@entry=0x5652013d7c68 "/root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c", line=line@entry=2089) at /root/qemu_debug_LSEEK/qemu_debug/qemu/cpus.c:1850
#4  0x565200fa7ca8 in kvm_cpu_exec (cpu=cpu@entry=0x565202354480) at /root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c:2089
#5  0x565200f7d1ce in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x565202354480) at /root/qemu_debug_LSEEK/qemu_debug/qemu/cpus.c:1281
#6  0x565201398116 in qemu_thread_start (args=) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:502
#7  0x7f8a5e24a6ba in start_thread (arg=0x7f8a0700) at pthread_create.c:333
#8  0x7f8a5df8041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 18 (Thread 

Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247

2020-11-24 Thread Andrey Shinkevich

On 23.11.2020 18:44, Andrey Shinkevich wrote:

This patch paves the way for the one that follows. The following patch
makes the QMP monitor read up to 4K from stdin at once. That results
in the bash 'sleep' command running before _qemu_proc_exec() starts
in the subshell. Another 'sleep' command with an unobtrusive 'query-status'
serves as a workaround.

Signed-off-by: Andrey Shinkevich 
---
  tests/qemu-iotests/247 | 2 ++
  tests/qemu-iotests/247.out | 1 +
  2 files changed, 3 insertions(+)



[...]

With patch 2/2 of the current version 2, test case 247 passes
without this patch 1/2, so it may be excluded from the series.

Thanks to Vladimir for the idea to check.

Andrey



Re: [PATCH for-5.2 05/10] vhost-user-blk-test: close fork child file descriptors

2020-11-24 Thread Coiby Xu

Hi Stefan,

On Wed, Nov 11, 2020 at 12:43:26PM +, Stefan Hajnoczi wrote:

Do not leave stdin and stdout open after fork. stdout is the
tap-driver.pl pipe. If we keep the pipe open then tap-driver.pl will not
detect that qos-test has terminated and it will hang.


I wonder in which situation this would happen. I couldn't reproduce
this issue locally.
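
For what it's worth, the effect Stefan describes boils down to pipe EOF
semantics: the reader only sees EOF once every process holding a write end has
closed it. The program below is a standalone illustration of that, not taken
from the QEMU test code:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        /* The child inherits the write end. If it did NOT close it, the
         * parent's read() below would block forever, which is the same
         * situation as tap-driver.pl waiting for qos-test's pipe to reach
         * EOF while a forked child still holds the inherited stdout. */
        close(fds[0]);
        close(fds[1]);          /* comment this out to see the hang */
        pause();                /* keep running, like the storage daemon */
        _exit(0);
    }

    close(fds[1]);              /* parent: drop its own write end */

    char buf[16];
    ssize_t n = read(fds[0], buf, sizeof(buf));  /* EOF only once every
                                                    write end is closed */
    printf("read() returned %zd (0 means EOF was seen)\n", n);

    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
    return 0;
}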



Signed-off-by: Stefan Hajnoczi 
---
tests/qtest/vhost-user-blk-test.c | 9 +
1 file changed, 9 insertions(+)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index f05f14c192..4019a72ac0 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -749,6 +749,15 @@ static char *start_vhost_user_blk(GString *cmd_line, int vus_instances,
   storage_daemon_command->str);
pid_t pid = fork();
if (pid == 0) {
+/*
+ * Close standard file descriptors so tap-driver.pl pipe detects when
+ * our parent terminates.
+ */
+close(0);
+close(1);
+open("/dev/null", O_RDONLY);
+open("/dev/null", O_WRONLY);
+
execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
exit(1);
}
--
2.28.0



--
Best regards,
Coiby



Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247

2020-11-24 Thread Andrey Shinkevich

On 24.11.2020 13:04, Vladimir Sementsov-Ogievskiy wrote:

23.11.2020 18:44, Andrey Shinkevich wrote:

This patch paves the way for the one that follows. The following patch
makes the QMP monitor read up to 4K from stdin at once. That results
in the bash 'sleep' command running before _qemu_proc_exec() starts


But how? Before _qemu_proc_exec() starts, the qemu monitor is not running,
so its new behavior can't have any influence..



I am not enough of a bash expert to explain 'how', but this workaround works.
It's just a test. Maybe other colleagues can say.



If the bash subshell works in an unpredictable way, it may be better to refactor
the test to send the commands one by one with the help of _send_qemu_cmd. Then
sleep would naturally be executed between sending the commands.



Or maybe write a similar test case in Python if Kevin agrees.


in the subshell. Another 'sleep' command with an unobtrusive 'query-status'
serves as a workaround.

Signed-off-by: Andrey Shinkevich 
---
  tests/qemu-iotests/247 | 2 ++
  tests/qemu-iotests/247.out | 1 +
  2 files changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/247 b/tests/qemu-iotests/247
index 87e37b3..7d316ec 100755
--- a/tests/qemu-iotests/247
+++ b/tests/qemu-iotests/247
@@ -59,6 +59,8 @@ TEST_IMG="$TEST_IMG.4" _make_test_img $size
  {"execute":"block-commit",
   "arguments":{"device":"format-4", "top-node": "format-2", 
"base-node":"format-0", "job-id":"job0"}}

  EOF
+sleep 1
+echo '{"execute":"query-status"}'
  if [ "${VALGRIND_QEMU}" == "y" ]; then
  sleep 10
  else
diff --git a/tests/qemu-iotests/247.out b/tests/qemu-iotests/247.out
index e909e83..13d9547 100644
--- a/tests/qemu-iotests/247.out
+++ b/tests/qemu-iotests/247.out
@@ -17,6 +17,7 @@ QMP_VERSION
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
"event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 
134217728, "offset": 134217728, "speed": 0, "type": "commit"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
"event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": 
"job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
"event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}

+{"return": {"status": "running", "singlestep": false, "running": true}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
"event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}

  *** done








Re: [PATCH v2 2/2] monitor: increase amount of data for monitor to read

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

23.11.2020 18:44, Andrey Shinkevich wrote:

QMP and HMP monitors read one byte at a time from the socket or stdin,
which is very inefficient. With 100+ VMs on the host, this results in
multiple extra system calls and CPU overuse.
This patch increases the amount of data read at once to up to 4096 bytes,
which matches the buffer size at the channel level.
A small JSON parser is introduced to throttle the QMP commands read from
the buffer so that incoming requests do not overflow the monitor input
queue.

Suggested-by: Denis V. Lunev
Signed-off-by: Andrey Shinkevich



Can't we just increase the QMP queue instead? It seems a lot simpler:

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 348bfad3d5..7e721eee3f 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -8,7 +8,7 @@
 typedef struct MonitorHMP MonitorHMP;
 typedef struct MonitorOptions MonitorOptions;
 
-#define QMP_REQ_QUEUE_LEN_MAX 8

+#define QMP_REQ_QUEUE_LEN_MAX 4096
 
 extern QemuOptsList qemu_mon_opts;
 
diff --git a/monitor/monitor.c b/monitor/monitor.c

index 84222cd130..1588f00306 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -566,7 +566,7 @@ int monitor_can_read(void *opaque)
 {
 Monitor *mon = opaque;
 
-    return !qatomic_mb_read(&mon->suspend_cnt);
+    return !qatomic_mb_read(&mon->suspend_cnt) ? 4096 : 0;
 }


- with this patch tests pass and performance is even better.


--
Best regards,
Vladimir



Re: [PATCH for-5.2 v2] qcow2: Fix corruption on write_zeroes with MAY_UNMAP

2020-11-24 Thread Alberto Garcia
On Tue 24 Nov 2020 10:28:15 AM CET, Kevin Wolf wrote:
> From: Maxim Levitsky 
>
> Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> introduced a subtle change to code in zero_in_l2_slice:
>
> It swapped the order of
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>
> To
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>
> It seems harmless, however the call to qcow2_free_any_clusters can
> trigger a cache flush which can mark the L2 table as clean, and
> assuming that this was the last write to it, a stale version of it
> will remain on the disk.
>
> Now we have a valid L2 entry pointing to a freed cluster. Oops.
>
> Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> Signed-off-by: Maxim Levitsky 
> [ kwolf: Fixed to restore the correct original order from before
>   205fa50750; added comments like in discard_in_l2_slice(). ]

Reviewed-by: Alberto Garcia 

Berto



[PATCH] block: Don't inactivate bs if it is aleady inactive

2020-11-24 Thread Tuguoyi
The following steps cause a qemu assertion failure:
- pause the vm
- save a memory snapshot into a local file through fd migration
- doing the above operation again triggers the assertion failure

The backtrace looks like:
#0  0x7fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x7fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x55ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
#5  0x55ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
#6  0x55ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0) at /build/qemu-5.0/migration/savevm.c:1401
#7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at /build/qemu-5.0/migration/savevm.c:1453
#8  0x55ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
#9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
#10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
#11 0x55ca8dfc6716 in qemu_thread_start (args=) at /build/qemu-5.0/util/qemu-thread-posix.c:519
#12 0x7fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x7fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6

When the first migration completes, bdrv_inactivate_recurse() sets the
BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set
in bs->open_flags, which causes the crash.

This patch just makes bdrv_inactivate_all() skip a bs that is
already inactive.

Signed-off-by: Tuguoyi 
---
 block.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f1cedac..02361e1 100644
--- a/block.c
+++ b/block.c
@@ -5938,6 +5938,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
 return 0;
 }
 
+static bool bdrv_is_inactive(BlockDriverState *bs)
+{
+return bs->open_flags & BDRV_O_INACTIVE;
+}
+
 int bdrv_inactivate_all(void)
 {
 BlockDriverState *bs = NULL;
@@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
 /* Nodes with BDS parents are covered by recursion from the last
  * parent that gets inactivated. Don't inactivate them a second
  * time if that has already happened. */
-if (bdrv_has_bds_parent(bs, false)) {
+if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
 continue;
 }
 ret = bdrv_inactivate_recurse(bs);
-- 
2.7.4


--
Best regards,
Guoyi




Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

23.11.2020 18:44, Andrey Shinkevich wrote:

This patch paves the way for the one that follows. The following patch
makes the QMP monitor read up to 4K from stdin at once. That results
in the bash 'sleep' command running before _qemu_proc_exec() starts


But how? Before _qemu_proc_exec() starts, the qemu monitor is not running,
so its new behavior can't have any influence..

If the bash subshell works in an unpredictable way, it may be better to refactor
the test to send the commands one by one with the help of _send_qemu_cmd. Then
sleep would naturally be executed between sending the commands.


in the subshell. Another 'sleep' command with an unobtrusive 'query-status'
serves as a workaround.

Signed-off-by: Andrey Shinkevich 
---
  tests/qemu-iotests/247 | 2 ++
  tests/qemu-iotests/247.out | 1 +
  2 files changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/247 b/tests/qemu-iotests/247
index 87e37b3..7d316ec 100755
--- a/tests/qemu-iotests/247
+++ b/tests/qemu-iotests/247
@@ -59,6 +59,8 @@ TEST_IMG="$TEST_IMG.4" _make_test_img $size
  {"execute":"block-commit",
   "arguments":{"device":"format-4", "top-node": "format-2", "base-node":"format-0", 
"job-id":"job0"}}
  EOF
+sleep 1
+echo '{"execute":"query-status"}'
  if [ "${VALGRIND_QEMU}" == "y" ]; then
  sleep 10
  else
diff --git a/tests/qemu-iotests/247.out b/tests/qemu-iotests/247.out
index e909e83..13d9547 100644
--- a/tests/qemu-iotests/247.out
+++ b/tests/qemu-iotests/247.out
@@ -17,6 +17,7 @@ QMP_VERSION
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", 
"len": 134217728, "offset": 134217728, "speed": 0, "type": "commit"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "concluded", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "null", "id": "job0"}}
+{"return": {"status": "running", "singlestep": false, "running": true}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": 
{"guest": false, "reason": "host-qmp-quit"}}
  *** done




--
Best regards,
Vladimir



Re: [PATCH 04/21] block: bdrv_refresh_perms: check parents compliance

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:

Add an additional check that node parents do not interfere with each
other. This should not hurt existing callers and allows a further
patch to use bdrv_refresh_perms() to update the subtree of a changed
BdrvChild (checking that the change is correct).

The new check will substitute bdrv_check_update_perm() in the following
permissions refactoring, so keep the error messages the same to avoid
unit test result changes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c | 62 -
  1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 0dd28f0902..0d0f065db4 100644
--- a/block.c
+++ b/block.c
@@ -1945,6 +1945,56 @@ bool bdrv_is_writable(BlockDriverState *bs)
  return bdrv_is_writable_after_reopen(bs, NULL);
  }
  
+static char *bdrv_child_user_desc(BdrvChild *c)

+{
+if (c->klass->get_parent_desc) {
+return c->klass->get_parent_desc(c);
+}
+
+return g_strdup("another user");
+}
+
+static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
+{
+g_autofree char *user = NULL;
+g_autofree char *perm_names = NULL;
+
+if ((b->perm & a->shared_perm) == b->perm) {
+return true;
+}
+
+perm_names = bdrv_perm_names(b->perm & ~a->shared_perm);
+user = bdrv_child_user_desc(a);
+error_setg(errp, "Conflicts with use by %s as '%s', which does not "
+   "allow '%s' on %s",
+   user, a->name, perm_names, bdrv_get_node_name(b->bs));
+
+return false;
+}
+
+static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
+{
+BdrvChild *a, *b;
+
+QLIST_FOREACH(a, &bs->parents, next_parent) {
+QLIST_FOREACH(b, &bs->parents, next_parent) {
+if (a == b) {
+continue;
+}
+
+if (!bdrv_a_allow_b(a, b, errp)) {
+return false;
+}
+
+if (!bdrv_a_allow_b(b, a, errp)) {
+return false;
+}


drop this if. We look at each pair twice anyway.

--
Best regards,
Vladimir



Re: [PATCH 2/2] block: assert that permission commit sets same permissions

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:

On permission update commit we must set the same permissions as on _check_.
Let's add assertions. A next step may be to drop the permission parameters
from _set_.

Note that prior to the previous commit, which fixes bdrv_drop_intermediate(),
the new assertion in bdrv_child_set_perm() crashes on iotests 30 and 40.

Signed-off-by: Vladimir Sementsov-Ogievskiy


this is an accidental patch, please ignore it

--
Best regards,
Vladimir



Re: [PATCH 1/2] block: make bdrv_drop_intermediate() less wrong

2020-11-24 Thread Vladimir Sementsov-Ogievskiy

23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:

First, the permission update loop tries to do its iterations transactionally,
but the whole update is not transactional: nobody rolls back successful
loop iterations when some iteration fails.

Second, in the iteration we have a nested permission update:
c->klass->update_filename may point to bdrv_child_cb_update_filename(),
which calls bdrv_backing_update_filename(), which may reopen the node
read-write.

The permission update system is not prepared for nested updates; at least, it
has intermediate permission-update state stored in the BdrvChild
structures: has_backup_perm, backup_perm and backup_shared_perm.

So, let's first do bdrv_replace_node() (which is more transactional
than the open-coded update in bdrv_drop_intermediate()) and then call
update_filename() separately. We still do not roll back changes in case
of update_filename() failure, but it's not much worse than the pre-patch
behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy


this is an accidental patch, please ignore it

--
Best regards,
Vladimir



[PATCH for-5.2 v2] qcow2: Fix corruption on write_zeroes with MAY_UNMAP

2020-11-24 Thread Kevin Wolf
From: Maxim Levitsky 

Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky 
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 485b4cb92e..bd0597842f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2010,14 +2010,17 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 continue;
 }
 
+/* First update L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (unmap) {
-qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
-}
 set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
 if (has_subclusters(s)) {
 set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
+
+/* Then decrease the refcount */
+if (unmap) {
+qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+}
 }
 
 qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.28.0




Re: [PATCH 1/1] Fix qcow2 corruption on discard

2020-11-24 Thread Kevin Wolf
Am 23.11.2020 um 19:11 hat Maxim Levitsky geschrieben:
> On Mon, 2020-11-23 at 18:38 +0100, Kevin Wolf wrote:
> > Am 23.11.2020 um 16:49 hat Maxim Levitsky geschrieben:
> > > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > > introduced a subtle change to code in zero_in_l2_slice:
> > > 
> > > It swapped the order of
> > > 
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > > 
> > > To
> > > 
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > > 
> > > It seems harmless, however the call to qcow2_free_any_clusters
> > > can trigger a cache flush which can mark the L2 table as clean,
> > > and assuming that this was the last write to it,
> > > a stale version of it will remain on the disk.
> > 
> > Do you have more details on this last paragraph? I'm trying to come up
> > with a reproducer, but I don't see how qcow2_free_any_clusters() could
> > flush the L2 table cache. (It's easy to get it to flush the refcount
> > block cache, but that's useless for a reproducer.)
> > 
> > The only way I see to flush any cache with it is in update_refcount()
> > the qcow2_cache_set_dependency() call. This will always flush the cache
> > that the L2 cache depends on - which will never be the L2 cache itself,
> > but always either the refcount cache or nothing.
> > 
> > There are more options in alloc_refcount_block() if we're allocating a
> > new refcount block, but in the context of freeing clusters we'll never
> > need to do that.
> > 
> > Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
> > table and a dirty refcount block in the cache, with a dependency that
> > makes sure that the L2 table will be written out first.
> > 
> > If you don't have the information yet, can you try to debug your manual
> > reproducer a bit more to find out how this happens?
> I'll do this tomorrow.

As the last RC for 5.2 is today, I will send a v2 that changes the fix
to restore the original order.

We can then continue work to find a minimal reproducer and merge the
test case in the early 6.0 cycle.

Kevin