Re: [PULL 0/1] Block layer patches for 5.2.0-rc3
On Tue, 24 Nov 2020 at 14:25, Kevin Wolf wrote:
>
> The following changes since commit 23895cbd82be95428e90168b12e925d0d3ca2f06:
>
>   Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201123.0'
>   into staging (2020-11-23 18:51:13 +)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to c8bf9a9169db094aaed680cdba570758c0dc18b9:
>
>   qcow2: Fix corruption on write_zeroes with MAY_UNMAP (2020-11-24 11:29:41 +0100)
>
> Patches for 5.2.0-rc3:
>
> - qcow2: Fix corruption on write_zeroes with MAY_UNMAP
>
> Maxim Levitsky (1):
>       qcow2: Fix corruption on write_zeroes with MAY_UNMAP

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2 for any
user-visible changes.

-- PMM
Re: [PATCH 1/1] Fix qcow2 corruption on discard
On Tue, 2020-11-24 at 20:59 +0200, Maxim Levitsky wrote:
> On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> > On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > > We can then continue work to find a minimal reproducer and merge the
> > > test case in the early 6.0 cycle.
> >
> > I haven't been able to reproduce the problem yet, do you have any
> > findings?
> >
> > Berto
>
> I have a working reproducer script. I'll send it in an hour or so.
> Best regards,
> Maxim Levitsky

I have attached a minimal reproducer for this issue.
I can convert this to an iotest if you think that this is worth it.

These are the exact conditions for the corruption to happen:

1. The image must have at least 5 refcount tables (one more than the
   default refcount table cache size, which is 4).

2. An IO pattern that fully populates the 4-entry refcount table cache.
   The easiest way to do this is to have 4 L2 entries populated in the
   base image, such that each entry references a physical cluster served
   by a different refcount table.

3. Discarding these entries in the snapshot, which triggers discards in
   the base file during the commit and populates the refcount table cache.

4. A discard of a cluster that belongs to the 5th refcount table (done in
   exactly the same way as the discards above). It has to happen soon,
   before the L2 cache is flushed by some unrelated IO.

This triggers the corruption. The call stack is:

qcow2_free_any_cluster -> qcow2_free_clusters -> update_refcount:

    // This sets a dependency between flushing the refcount cache and the L2 cache
    if (decrease)
        qcow2_cache_set_dependency(bs, s->refcount_block_cache, s->l2_table_cache);
    ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
        return load_refcount_block(bs, refcount_block_offset, refcount_block);
            return qcow2_cache_get(...

qcow2_cache_do_get:

    /* because of a cache miss, we have to evict an entry */
    ret = qcow2_cache_entry_flush(bs, c, i);
        if (c->depends) {
            /* this flushes the L2 cache */
            ret = qcow2_cache_flush_dependency(bs, c);
        }

The attached reproducer works with almost any cluster size and refcount
block size. Cluster sizes below 4K don't work because the commit is done
by the mirror job, which works at 4K granularity, and that results in it
not doing any discards due to various alignment restrictions. If I patch
qemu to make the mirror job work at 512B granularity, the test reproduces
for small clusters as well.

The reproducer creates a qcow2 image in the current directory and it
needs about 11G for the default parameters (64K cluster size, 16-bit
refcounts). For 4K cluster size and 64-bit refcounts, it needs only 11M.
(This can be changed by editing the script.)

Best regards,
Maxim Levitsky

reproducer.sh
Description: application/shellscript
Re: [PATCH 1/1] Fix qcow2 corruption on discard
On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > We can then continue work to find a minimal reproducer and merge the
> > test case in the early 6.0 cycle.
>
> I haven't been able to reproduce the problem yet, do you have any
> findings?
>
> Berto

I have a working reproducer script. I'll send it in an hour or so.

Best regards,
Maxim Levitsky
Re: [PATCH 1/1] Fix qcow2 corruption on discard
On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> We can then continue work to find a minimal reproducer and merge the
> test case in the early 6.0 cycle.

I haven't been able to reproduce the problem yet, do you have any
findings?

Berto
Re: [PATCH v2 2/2] monitor: increase amount of data for monitor to read
24.11.2020 14:03, Vladimir Sementsov-Ogievskiy wrote:
> 23.11.2020 18:44, Andrey Shinkevich wrote:
> > QMP and HMP monitors read one byte at a time from the socket or stdin,
> > which is very inefficient. With 100+ VMs on the host, this results in
> > multiple extra system calls and CPU overuse. This patch increases the
> > amount of read data up to 4096 bytes, which fits the buffer size on the
> > channel level. A little JSON parser is introduced to throttle QMP
> > commands read from the buffer so that incoming requests do not overflow
> > the monitor input queue.
> >
> > Suggested-by: Denis V. Lunev
> > Signed-off-by: Andrey Shinkevich
>
> Can't we just increase the qmp queue instead? It seems a lot simpler:
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 348bfad3d5..7e721eee3f 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -8,7 +8,7 @@
>  typedef struct MonitorHMP MonitorHMP;
>  typedef struct MonitorOptions MonitorOptions;
>
> -#define QMP_REQ_QUEUE_LEN_MAX 8
> +#define QMP_REQ_QUEUE_LEN_MAX 4096
>
>  extern QemuOptsList qemu_mon_opts;
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 84222cd130..1588f00306 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -566,7 +566,7 @@ int monitor_can_read(void *opaque)
>  {
>      Monitor *mon = opaque;
>
> -    return !qatomic_mb_read(&mon->suspend_cnt);
> +    return !qatomic_mb_read(&mon->suspend_cnt) ? 4096 : 0;
>  }
>
> - with this patch tests pass and performance is even better.

Suddenly I found that this patch ^^ was sent a year ago:
https://patchew.org/QEMU/20190610105906.28524-1-dplotni...@virtuozzo.com/
Some questions were asked, so I think we should start from it.

-- Best regards, Vladimir
[PULL 0/1] Block layer patches for 5.2.0-rc3
The following changes since commit 23895cbd82be95428e90168b12e925d0d3ca2f06:

  Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201123.0'
  into staging (2020-11-23 18:51:13 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to c8bf9a9169db094aaed680cdba570758c0dc18b9:

  qcow2: Fix corruption on write_zeroes with MAY_UNMAP (2020-11-24 11:29:41 +0100)

Patches for 5.2.0-rc3:

- qcow2: Fix corruption on write_zeroes with MAY_UNMAP

Maxim Levitsky (1):
      qcow2: Fix corruption on write_zeroes with MAY_UNMAP

 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)
[PULL 1/1] qcow2: Fix corruption on write_zeroes with MAY_UNMAP
From: Maxim Levitsky

Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf
Message-Id: <20201124092815.39056-1-kw...@redhat.com>
Reviewed-by: Alberto Garcia
Signed-off-by: Kevin Wolf
---
 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 485b4cb92e..bd0597842f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2010,14 +2010,17 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
             continue;
         }
 
+        /* First update L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (unmap) {
-            qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
-        }
         set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
         if (has_subclusters(s)) {
             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
         }
+
+        /* Then decrease the refcount */
+        if (unmap) {
+            qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+        }
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.28.0
RE: [raw] Guest stuck during live live-migration
Thanks Kevin,

> > Hello,
> >
> > In our company, we are hosting a large number of VMs, hosted behind
> > Openstack (so libvirt/qemu).
> > A large majority of our VMs are running with local data only, stored on
> > NVME, and most of them are RAW disks.
> >
> > With Qemu 4.0 (can be even with older versions) we see strange
> > live-migration behaviour:
>
> First of all, 4.0 is relatively old. Generally it is worth retrying with
> the most recent code (git master or 5.2.0-rc2) before having a closer
> look at problems, because it is frustrating to spend considerable time
> debugging an issue and then find out it has already been fixed a year
> ago.

I will try to build with the most recent code, but it will take me some
time to do it.

> > - some VMs live migrate at very high speed without issue (> 6 Gbps)
> > - some VMs are running correctly, but migrating at a strangely low
> >   speed (3 Gbps)
> > - some VMs are migrating at a very low speed (1 Gbps, sometimes less)
> >   and during the migration the guest is completely I/O stuck
> >
> > When this issue happens the VM is completely blocked; iostat in the VM
> > shows us a latency of 30 secs
>
> Can you get the stack backtraces of all QEMU threads while the VM is
> blocked (e.g. with gdb or pstack)?
(gdb) thread apply all bt

Thread 20 (Thread 0x7f8a0effd700 (LWP 201248)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x56520139878b in qemu_cond_wait_impl (cond=0x5652020f27b0, mutex=0x5652020f27e8, file=0x5652014e4178 "/root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c", line=214) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:161
#2  0x5652012a264d in vnc_worker_thread_loop (queue=queue@entry=0x5652020f27b0) at /root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c:214
#3  0x5652012a2c18 in vnc_worker_thread (arg=arg@entry=0x5652020f27b0) at /root/qemu_debug_LSEEK/qemu_debug/qemu/ui/vnc-jobs.c:324
#4  0x565201398116 in qemu_thread_start (args=<optimized out>) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:502
#5  0x7f8a5e24a6ba in start_thread (arg=0x7f8a0effd700) at pthread_create.c:333
#6  0x7f8a5df8041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 19 (Thread 0x7f8a0700 (LWP 201222)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f8a5e24cdbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x565201adb680) at ../nptl/pthread_mutex_lock.c:80
#2  0x565201398263 in qemu_mutex_lock_impl (mutex=0x565201adb680, file=0x5652013d7c68 "/root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c", line=2089) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:66
#3  0x565200f7d00e in qemu_mutex_lock_iothread_impl (file=file@entry=0x5652013d7c68 "/root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c", line=line@entry=2089) at /root/qemu_debug_LSEEK/qemu_debug/qemu/cpus.c:1850
#4  0x565200fa7ca8 in kvm_cpu_exec (cpu=cpu@entry=0x565202354480) at /root/qemu_debug_LSEEK/qemu_debug/qemu/accel/kvm/kvm-all.c:2089
#5  0x565200f7d1ce in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x565202354480) at /root/qemu_debug_LSEEK/qemu_debug/qemu/cpus.c:1281
#6  0x565201398116 in qemu_thread_start (args=<optimized out>) at /root/qemu_debug_LSEEK/qemu_debug/qemu/util/qemu-thread-posix.c:502
#7  0x7f8a5e24a6ba in start_thread (arg=0x7f8a0700) at pthread_create.c:333
#8  0x7f8a5df8041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 18 (Thread 0x7f8a2
Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247
On 23.11.2020 18:44, Andrey Shinkevich wrote:
> This patch paves the way for the one that follows. The following patch
> makes the QMP monitor read up to 4K from stdin at once. That results in
> running the bash 'sleep' command before _qemu_proc_exec() starts in the
> subshell. Another 'sleep' command with an unobtrusive 'query-status'
> plays as a workaround.
>
> Signed-off-by: Andrey Shinkevich
> ---
>  tests/qemu-iotests/247     | 2 ++
>  tests/qemu-iotests/247.out | 1 +
>  2 files changed, 3 insertions(+)

[...]

With the patch 2/2 of the current version 2, the test case #247 passes
without this patch 1/2. So, it may be excluded from the series.
Thanks to Vladimir for the idea to check.

Andrey
Re: [PATCH for-5.2 05/10] vhost-user-blk-test: close fork child file descriptors
Hi Stefan,

On Wed, Nov 11, 2020 at 12:43:26PM +, Stefan Hajnoczi wrote:
> Do not leave stdin and stdout open after fork. stdout is the
> tap-driver.pl pipe. If we keep the pipe open then tap-driver.pl will
> not detect that qos-test has terminated and it will hang.

I wonder under which situation this would happen. I couldn't reproduce
this issue locally.

> Signed-off-by: Stefan Hajnoczi
> ---
>  tests/qtest/vhost-user-blk-test.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
> index f05f14c192..4019a72ac0 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -749,6 +749,15 @@ static char *start_vhost_user_blk(GString *cmd_line, int vus_instances,
>                              storage_daemon_command->str);
>      pid_t pid = fork();
>      if (pid == 0) {
> +        /*
> +         * Close standard file descriptors so tap-driver.pl pipe detects when
> +         * our parent terminates.
> +         */
> +        close(0);
> +        close(1);
> +        open("/dev/null", O_RDONLY);
> +        open("/dev/null", O_WRONLY);
> +
>          execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
>          exit(1);
>      }
> -- 
> 2.28.0

-- Best regards, Coiby
Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247
On 24.11.2020 13:04, Vladimir Sementsov-Ogievskiy wrote:
> 23.11.2020 18:44, Andrey Shinkevich wrote:
> > This patch paves the way for the one that follows. The following patch
> > makes the QMP monitor read up to 4K from stdin at once. That results in
> > running the bash 'sleep' command before _qemu_proc_exec() starts
>
> But how? Before _qemu_proc_exec() starts, the qemu monitor is not
> running, and its new behavior can't influence anything..

I am not enough of a bash expert to explain 'how', but this workaround
works. It's just a test. Maybe other colleagues can say.

> If the bash subshell works in an unpredictable way, it may be better to
> refactor the test to send commands one by one with the help of
> _send_qemu_cmd. Then the sleep will be natively executed between
> sending commands.

Or maybe write a similar test case in Python if Kevin agrees.

> > in the subshell. Another 'sleep' command with an unobtrusive
> > 'query-status' plays as a workaround.
> >
> > Signed-off-by: Andrey Shinkevich
> > ---
> >  tests/qemu-iotests/247     | 2 ++
> >  tests/qemu-iotests/247.out | 1 +
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/tests/qemu-iotests/247 b/tests/qemu-iotests/247
> > index 87e37b3..7d316ec 100755
> > --- a/tests/qemu-iotests/247
> > +++ b/tests/qemu-iotests/247
> > @@ -59,6 +59,8 @@ TEST_IMG="$TEST_IMG.4" _make_test_img $size
> >  {"execute":"block-commit", "arguments":{"device":"format-4", "top-node": "format-2", "base-node":"format-0", "job-id":"job0"}}
> >  EOF
> > +sleep 1
> > +echo '{"execute":"query-status"}'
> >  if [ "${VALGRIND_QEMU}" == "y" ]; then
> >      sleep 10
> >  else
> > diff --git a/tests/qemu-iotests/247.out b/tests/qemu-iotests/247.out
> > index e909e83..13d9547 100644
> > --- a/tests/qemu-iotests/247.out
> > +++ b/tests/qemu-iotests/247.out
> > @@ -17,6 +17,7 @@ QMP_VERSION
> >  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 134217728, "offset": 134217728, "speed": 0, "type": "commit"}}
> >  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
> >  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
> > +{"return": {"status": "running", "singlestep": false, "running": true}}
> >  {"return": {}}
> >  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
> >  *** done
Re: [PATCH v2 2/2] monitor: increase amount of data for monitor to read
23.11.2020 18:44, Andrey Shinkevich wrote:
> QMP and HMP monitors read one byte at a time from the socket or stdin,
> which is very inefficient. With 100+ VMs on the host, this results in
> multiple extra system calls and CPU overuse. This patch increases the
> amount of read data up to 4096 bytes, which fits the buffer size on the
> channel level. A little JSON parser is introduced to throttle QMP
> commands read from the buffer so that incoming requests do not overflow
> the monitor input queue.
>
> Suggested-by: Denis V. Lunev
> Signed-off-by: Andrey Shinkevich

Can't we just increase the qmp queue instead? It seems a lot simpler:

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 348bfad3d5..7e721eee3f 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -8,7 +8,7 @@
 typedef struct MonitorHMP MonitorHMP;
 typedef struct MonitorOptions MonitorOptions;
 
-#define QMP_REQ_QUEUE_LEN_MAX 8
+#define QMP_REQ_QUEUE_LEN_MAX 4096
 
 extern QemuOptsList qemu_mon_opts;

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 84222cd130..1588f00306 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -566,7 +566,7 @@ int monitor_can_read(void *opaque)
 {
     Monitor *mon = opaque;
 
-    return !qatomic_mb_read(&mon->suspend_cnt);
+    return !qatomic_mb_read(&mon->suspend_cnt) ? 4096 : 0;
 }

- with this patch tests pass and performance is even better.

-- Best regards, Vladimir
Re: [PATCH for-5.2 v2] qcow2: Fix corruption on write_zeroes with MAY_UNMAP
On Tue 24 Nov 2020 10:28:15 AM CET, Kevin Wolf wrote:
> From: Maxim Levitsky
>
> Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> introduced a subtle change to code in zero_in_l2_slice:
>
> It swapped the order of
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>
> To
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>
> It seems harmless, however the call to qcow2_free_any_clusters can
> trigger a cache flush which can mark the L2 table as clean, and
> assuming that this was the last write to it, a stale version of it
> will remain on the disk.
>
> Now we have a valid L2 entry pointing to a freed cluster. Oops.
>
> Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> Signed-off-by: Maxim Levitsky
> [ kwolf: Fixed to restore the correct original order from before
>   205fa50750; added comments like in discard_in_l2_slice(). ]

Reviewed-by: Alberto Garcia

Berto
[PATCH] block: Don't inactivate bs if it is aleady inactive
The following steps will cause a qemu assertion failure:
- pause the vm
- save a memory snapshot into a local file through fd migration
- doing the above operation again causes the assertion failure

The backtrace looks like:

#0  0x7fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x7fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x55ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
#5  0x55ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
#6  0x55ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0) at /build/qemu-5.0/migration/savevm.c:1401
#7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at /build/qemu-5.0/migration/savevm.c:1453
#8  0x55ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
#9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
#10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
#11 0x55ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
#12 0x7fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x7fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6

When the first migration completes, bdrv_inactivate_recurse() sets the
BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already
set, which causes the crash.
This patch makes bdrv_inactivate_all() skip nodes that are already
inactive.

Signed-off-by: Tuguoyi
---
 block.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f1cedac..02361e1 100644
--- a/block.c
+++ b/block.c
@@ -5938,6 +5938,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
     return 0;
 }
 
+static bool bdrv_is_inactive(BlockDriverState *bs)
+{
+    return bs->open_flags & BDRV_O_INACTIVE;
+}
+
 int bdrv_inactivate_all(void)
 {
     BlockDriverState *bs = NULL;
@@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
         /* Nodes with BDS parents are covered by recursion from the last
          * parent that gets inactivated. Don't inactivate them a second
          * time if that has already happened. */
-        if (bdrv_has_bds_parent(bs, false)) {
+        if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
             continue;
         }
         ret = bdrv_inactivate_recurse(bs);
-- 
2.7.4

-- Best regards, Guoyi
Re: [PATCH v2 1/2] iotests: add another bash sleep command to 247
23.11.2020 18:44, Andrey Shinkevich wrote:
> This patch paves the way for the one that follows. The following patch
> makes the QMP monitor read up to 4K from stdin at once. That results in
> running the bash 'sleep' command before _qemu_proc_exec() starts

But how? Before _qemu_proc_exec() starts, the qemu monitor is not
running, and its new behavior can't influence anything..

If the bash subshell works in an unpredictable way, it may be better to
refactor the test to send commands one by one with the help of
_send_qemu_cmd. Then the sleep will be natively executed between sending
commands.

> in the subshell. Another 'sleep' command with an unobtrusive
> 'query-status' plays as a workaround.
>
> Signed-off-by: Andrey Shinkevich
> ---
>  tests/qemu-iotests/247     | 2 ++
>  tests/qemu-iotests/247.out | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/tests/qemu-iotests/247 b/tests/qemu-iotests/247
> index 87e37b3..7d316ec 100755
> --- a/tests/qemu-iotests/247
> +++ b/tests/qemu-iotests/247
> @@ -59,6 +59,8 @@ TEST_IMG="$TEST_IMG.4" _make_test_img $size
>  {"execute":"block-commit", "arguments":{"device":"format-4", "top-node": "format-2", "base-node":"format-0", "job-id":"job0"}}
>  EOF
> +sleep 1
> +echo '{"execute":"query-status"}'
>  if [ "${VALGRIND_QEMU}" == "y" ]; then
>      sleep 10
>  else
> diff --git a/tests/qemu-iotests/247.out b/tests/qemu-iotests/247.out
> index e909e83..13d9547 100644
> --- a/tests/qemu-iotests/247.out
> +++ b/tests/qemu-iotests/247.out
> @@ -17,6 +17,7 @@ QMP_VERSION
>  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 134217728, "offset": 134217728, "speed": 0, "type": "commit"}}
>  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
>  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
> +{"return": {"status": "running", "singlestep": false, "running": true}}
>  {"return": {}}
>  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
>  *** done

-- Best regards, Vladimir
Re: [PATCH 04/21] block: bdrv_refresh_perms: check parents compliance
23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:
> Add an additional check that node parents do not interfere with each
> other. This should not hurt existing callers and allows a further patch
> to use bdrv_refresh_perms() to update a subtree of a changed BdrvChild
> (checking that the change is correct).
>
> The new check will substitute bdrv_check_update_perm() in the following
> permission refactoring, so keep the error messages the same to avoid
> unit test result changes.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block.c | 62 -
>  1 file changed, 53 insertions(+), 9 deletions(-)
>
> diff --git a/block.c b/block.c
> index 0dd28f0902..0d0f065db4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1945,6 +1945,56 @@ bool bdrv_is_writable(BlockDriverState *bs)
>      return bdrv_is_writable_after_reopen(bs, NULL);
>  }
>
> +static char *bdrv_child_user_desc(BdrvChild *c)
> +{
> +    if (c->klass->get_parent_desc) {
> +        return c->klass->get_parent_desc(c);
> +    }
> +
> +    return g_strdup("another user");
> +}
> +
> +static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
> +{
> +    g_autofree char *user = NULL;
> +    g_autofree char *perm_names = NULL;
> +
> +    if ((b->perm & a->shared_perm) == b->perm) {
> +        return true;
> +    }
> +
> +    perm_names = bdrv_perm_names(b->perm & ~a->shared_perm);
> +    user = bdrv_child_user_desc(a);
> +    error_setg(errp, "Conflicts with use by %s as '%s', which does not "
> +               "allow '%s' on %s",
> +               user, a->name, perm_names, bdrv_get_node_name(b->bs));
> +
> +    return false;
> +}
> +
> +static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
> +{
> +    BdrvChild *a, *b;
> +
> +    QLIST_FOREACH(a, &bs->parents, next_parent) {
> +        QLIST_FOREACH(b, &bs->parents, next_parent) {
> +            if (a == b) {
> +                continue;
> +            }
> +
> +            if (!bdrv_a_allow_b(a, b, errp)) {
> +                return false;
> +            }
> +
> +            if (!bdrv_a_allow_b(b, a, errp)) {
> +                return false;
> +            }

Drop this second if; we look at each pair twice anyway.

-- Best regards, Vladimir
Re: [PATCH 2/2] block: assert that permission commit sets same permissions
23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:
> On permission update commit we must set the same permissions as on
> _check_. Let's add assertions. The next step may be to drop the
> permission parameters from _set_.
>
> Note that prior to the previous commit, fixing bdrv_drop_intermediate(),
> the new assertion in bdrv_child_set_perm() crashes on iotests 30 and 40.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy

This is an accidental patch, ignore it.

-- Best regards, Vladimir
Re: [PATCH 1/2] block: make bdrv_drop_intermediate() less wrong
23.11.2020 23:12, Vladimir Sementsov-Ogievskiy wrote:
> First, the permission update loop tries to do iterations
> transactionally, but the whole update is not transactional: nobody rolls
> back successful loop iterations when some iteration fails.
>
> Second, in the iteration we have a nested permission update:
> c->klass->update_filename may point to bdrv_child_cb_update_filename(),
> which calls bdrv_backing_update_filename(), which may reopen the node
> to RW. The permission update system is not prepared for nested updates;
> at least it has intermediate permission-update state stored in the
> BdrvChild structure: has_backup_perm, backup_perm and
> backup_shared_perm.
>
> So, let's first do bdrv_replace_node() (which is more transactional than
> the open-coded update in bdrv_drop_intermediate()) and then call
> update_filename() separately. We still do not roll back changes in case
> of update_filename() failure, but it's not much worse than the pre-patch
> behavior.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy

This is an accidental patch, ignore it.

-- Best regards, Vladimir
[PATCH for-5.2 v2] qcow2: Fix corruption on write_zeroes with MAY_UNMAP
From: Maxim Levitsky

Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf
---
 block/qcow2-cluster.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 485b4cb92e..bd0597842f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2010,14 +2010,17 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
             continue;
         }
 
+        /* First update L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (unmap) {
-            qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
-        }
         set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
         if (has_subclusters(s)) {
             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
         }
+
+        /* Then decrease the refcount */
+        if (unmap) {
+            qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+        }
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.28.0
Re: [PATCH 1/1] Fix qcow2 corruption on discard
Am 23.11.2020 um 19:11 hat Maxim Levitsky geschrieben:
> On Mon, 2020-11-23 at 18:38 +0100, Kevin Wolf wrote:
> > Am 23.11.2020 um 16:49 hat Maxim Levitsky geschrieben:
> > > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > > introduced a subtle change to code in zero_in_l2_slice:
> > >
> > > It swapped the order of
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > >
> > > To
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > >
> > > It seems harmless, however the call to qcow2_free_any_clusters
> > > can trigger a cache flush which can mark the L2 table as clean,
> > > and assuming that this was the last write to it,
> > > a stale version of it will remain on the disk.
> >
> > Do you have more details on this last paragraph? I'm trying to come up
> > with a reproducer, but I don't see how qcow2_free_any_clusters() could
> > flush the L2 table cache. (It's easy to get it to flush the refcount
> > block cache, but that's useless for a reproducer.)
> >
> > The only way I see to flush any cache with it is in update_refcount()
> > the qcow2_cache_set_dependency() call. This will always flush the cache
> > that the L2 cache depends on - which will never be the L2 cache itself,
> > but always either the refcount cache or nothing.
> >
> > There are more options in alloc_refcount_block() if we're allocating a
> > new refcount block, but in the context of freeing clusters we'll never
> > need to do that.
> >
> > Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
> > table and a dirty refcount block in the cache, with a dependency that
> > makes sure that the L2 table will be written out first.
> >
> > If you don't have the information yet, can you try to debug your manual
> > reproducer a bit more to find out how this happens?
>
> I'll do this tomorrow.

As the last RC for 5.2 is today, I will send a v2 that changes the fix
to restore the original order. We can then continue work to find a
minimal reproducer and merge the test case in the early 6.0 cycle.

Kevin