Re: [PATCH] migration: for snapshots, hold the BQL during setup callbacks

2023-05-23 Thread Fiona Ebner
Am 10.05.23 um 08:31 schrieb Juan Quintela:
> I am more in favor of completely reverting
> 9b0950375277467fd74a9075624477ae43b9bb22
> 
> and call it a day.  On migration we don't use coroutines on the sending
> side (I mean the migration code, the block layer uses coroutines for
> everything/anything).
> 

So I was preparing v2 for this patch, also taking the BQL during setup
for migration to avoid the conditional locking. But it turns out that
the original issue (i.e. snapshot code running in the vCPU thread) is
not solved completely. (And this is not only the virtio-net issue I ran
into the other time [0].)

This time I ran it without the net device and ended up with a hang [1].
I think what happens is the following, but I am not sure:

vCPU thread:
  calls vm_stop(RUN_STATE_SAVE_VM)

main thread:
  calls pause_all_vcpus() and starts waiting

vCPU thread:
  finishes the snapshot and calls vm_start()

Now the main thread will never progress, because all_vcpus_paused() was
never true.
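
For reference, a rough sketch of the pause logic as I understand it,
paraphrased from softmmu/cpus.c (not the exact code; the function name
below is just an illustrative one):

static void pause_all_vcpus_sketch(void)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        cpu->stop = true;        /* ask every vCPU to stop */
        qemu_cpu_kick(cpu);
    }
    while (!all_vcpus_paused()) {
        /* Signalled by vCPU threads as they actually halt.  If the vCPU
         * thread that requested the stop has already called vm_start()
         * again and cleared the stop requests, all_vcpus_paused() can
         * never become true and this wait never finishes. */
        qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
    }
}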

Not sure what can be done about that, and not sure if there are other
code paths where a vCPU thread could trigger vm_stop() followed by
vm_start() without stopping itself in between?

Best Regards,
Fiona

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg00140.html

[1]:

> Thread 1 (Thread 0x7359e2c0 (LWP 121263) "qemu-system-x86"):
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, 
> op=393, expected=0, futex_word=0x56a1f868 ) at 
> ./nptl/futex-internal.c:57
> #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x56a1f868 
> , expected=expected@entry=0, clockid=clockid@entry=0, 
> abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) 
> at ./nptl/futex-internal.c:87
> #2  0x762ded9b in __GI___futex_abstimed_wait_cancelable64 
> (futex_word=futex_word@entry=0x56a1f868 , 
> expected=expected@entry=0, clockid=clockid@entry=0, 
> abstime=abstime@entry=0x0, private=private@entry=0) at 
> ./nptl/futex-internal.c:139
> #3  0x762e13f8 in __pthread_cond_wait_common (abstime=0x0, clockid=0, 
> mutex=0x56a1f7a0 , cond=0x56a1f840 
> ) at ./nptl/pthread_cond_wait.c:503
> #4  ___pthread_cond_wait (cond=0x56a1f840 , 
> mutex=0x56a1f7a0 ) at ./nptl/pthread_cond_wait.c:618
> #5  0x5605ef9d in qemu_cond_wait_impl (cond=0x56a1f840 
> , mutex=0x56a1f7a0 , 
> file=0x561d483b "../softmmu/cpus.c", line=573) at 
> ../util/qemu-thread-posix.c:225
> #6  0x55b36891 in pause_all_vcpus () at ../softmmu/cpus.c:573
> #7  0x55b35ef3 in do_vm_stop (state=RUN_STATE_SAVE_VM, 
> send_stop=true) at ../softmmu/cpus.c:262
> #8  0x55b36cab in vm_stop (state=RUN_STATE_SAVE_VM) at 
> ../softmmu/cpus.c:676
> #9  0x55b443ed in main_loop_should_exit (status=0x7fffd944) at 
> ../softmmu/runstate.c:723
> #10 0x55b4443e in qemu_main_loop () at ../softmmu/runstate.c:735
> #11 0x55e57ec3 in qemu_default_main () at ../softmmu/main.c:37
> #12 0x55e57ef9 in main (argc=56, argv=0x7fffdaa8) at 
> ../softmmu/main.c:48
> (gdb) bt
> #0  resume_all_vcpus () at ../softmmu/cpus.c:592
> #1  0x55b36d85 in vm_start () at ../softmmu/cpus.c:724
> #2  0x55b932ee in save_snapshot (name=0x7fff5c197340 "snap0", 
> overwrite=false, vmstate=0x7fff5c302ca0 "sata0", has_devices=true, 
> devices=0x7fff5c1968d0, errp=0x7fff5c39aec8) at ../migration/savevm.c:2992
> #3  0x55b93c49 in snapshot_save_job_bh (opaque=0x7fff5c39ae00) at 
> ../migration/savevm.c:3255
> #4  0x5607733f in aio_bh_call (bh=0x7fff5c1f9560) at 
> ../util/async.c:169
> #5  0x5607745a in aio_bh_poll (ctx=0x56b806b0) at 
> ../util/async.c:216
> #6  0x5605a95c in aio_poll (ctx=0x56b806b0, blocking=true) at 
> ../util/aio-posix.c:732
> #7  0x55ea7e70 in bdrv_poll_co (s=0x70ee7e50) at 
> /home/febner/repos/qemu/block/block-gen.h:43
> #8  0x55eaaa76 in blk_pwrite (blk=0x56df79a0, offset=189440, 
> bytes=512, buf=0x7fff6322e400, flags=0) at block/block-gen.c:1754
> #9  0x559077c9 in pflash_update (pfl=0x56dda2e0, offset=189440, 
> size=1) at ../hw/block/pflash_cfi01.c:394
> #10 0x55907ecd in pflash_write (pfl=0x56dda2e0, offset=189507, 
> value=114, width=1, be=0) at ../hw/block/pflash_cfi01.c:522
> #11 0x55908479 in pflash_mem_write_with_attrs (opaque=0x56dda2e0, 
> addr=189507, value=114, len=1, attrs=...) at ../hw/block/pflash_cfi01.c:681
> #12 0x55da3c17 in memory_region_write_with_attrs_accessor 
> (mr=0x56dda6a0, addr=189507, value=0x70ee8068, size=1, shift=0, 
> mask=255, attrs=...) at ../softmmu/memory.c:514
> #13 0x55da3e33 in access_with_adjusted_size (addr=189507, 
> value=0x70ee8068, size=1, access_size_min=1, access_size_max=4, 
> access_fn=0x55da3b1b , 
> mr=0x56dda6a0, attrs=...) at ../softmmu/memory.c:569
> #14 0x55da711b in memory_region_dispatch_write 

Re: [PATCH] migration: for snapshots, hold the BQL during setup callbacks

2023-05-17 Thread Peter Xu
On Wed, May 10, 2023 at 08:31:13AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> 
> Hi
> 
> [Adding Kevin to the party]
> 
> > On Fri, May 05, 2023 at 03:46:52PM +0200, Fiona Ebner wrote:
> >> To fix it, ensure that the BQL is held during setup. To avoid changing
> >> the behavior for migration too, introduce conditionals for the setup
> >> callbacks that need the BQL and only take the lock if it's not already
> >> held.
> >
> > The major complexity of this patch is the "conditionally taking" part.
> 
> Yeap.
> 
> I don't want that bit.
> 
> This is another case of:
> - I have a problem
> - I will use recursive mutexes to solve it
> 
> Now you have two problems O:-)
> 
> > Pure question: what is the benefit of not always holding the BQL during
> > save_setup(), if after all we have this coroutine issue to be solved?
> 
> Dunno.
> 
> I would like Paolo to comment on this.  I "reviewed the code" 10
> years ago.  I don't remember at all why we wanted to change that.
> 
> > I can understand that it helps us avoid holding the BQL for too long, but
> > we'll need to take it anyway during ramblock dirty tracking initialization,
> > and so far IIUC that is where most of the setup() time is spent.
> >
> > Commit message of 9b0950375277467 says, "Only the migration_bitmap_sync()
> > call needs the iothread lock". Firstly I think it's also covering
> > enablement of dirty tracking:
> >
> > +qemu_mutex_lock_iothread();
> > +qemu_mutex_lock_ramlist();
> > +bytes_transferred = 0;
> > +reset_ram_globals();
> > +
> >  memory_global_dirty_log_start();
> >  migration_bitmap_sync();
> > +qemu_mutex_unlock_iothread();
> >
> > And I think enablement itself can be slow too, maybe even slower than
> > migration_bitmap_sync() especially with KVM_DIRTY_LOG_INITIALLY_SET
> > supported in the kernel.
> >
> > Meanwhile I always got confused about why we need to sync the dirty
> > bitmap during setup at all.  Say, what if we drop migration_bitmap_sync()
> > here?  After all, shouldn't all pages be dirty from the start
> > (ram_list_init_bitmaps())?
> 
> How do you convince KVM (or the other lists) to start doing dirty
> tracking?  Doing a bitmap sync.

I think memory_global_dirty_log_start() kicks off the tracking already.

Take KVM as an example: normally KVM_MEM_LOG_DIRTY_PAGES is set there, and
then ioctl(KVM_SET_USER_MEMORY_REGION) will start dirty tracking for all of
the necessary guest memory slots (including write-protecting all pages).

KVM_DIRTY_LOG_INITIALLY_SET is slightly special: it skips that procedure
during ioctl(KVM_SET_USER_MEMORY_REGION), but that also means the kernel
assumes the userspace application (QEMU) has marked all pages dirty
initially (always the case for QEMU, I think).  Hence in this case the sync
doesn't help either, because we'll simply get no new dirty bits from it.
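
Roughly, per memory slot, something like the following happens (a sketch of
the kernel ABI usage only, not the actual QEMU code in accel/kvm/kvm-all.c;
the function name is made up):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative sketch: enable dirty tracking for one memory slot. */
static int enable_dirty_tracking_sketch(int vm_fd,
                                        struct kvm_userspace_memory_region *mem)
{
    mem->flags |= KVM_MEM_LOG_DIRTY_PAGES;
    /* Without KVM_DIRTY_LOG_INITIALLY_SET the kernel write-protects every
     * page of the slot during this ioctl, which is the potentially slow
     * part; with it, that work is skipped and userspace must treat all
     * pages as dirty already. */
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);
}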

> 
> And yeap, probably there is a better way of doing it.
> 
> > This is slightly off-topic, but I'd like to know if someone can help
> > answer.
> >
> > My whole point is still questioning whether we can unconditionally take
> > the BQL during save_setup().
> 
> I agree with you.
> 
> > I could have missed something, though, that we want to do in setup() but
> > without holding the BQL.  Help needed in figuring this out (and if there
> > is such a case, IMHO it'll be worthwhile to put that into a comment on the
> > save_setup() hook).
> 
> I am more in favor of completely reverting
> 9b0950375277467fd74a9075624477ae43b9bb22
> 
> and call it a day.  On migration we don't use coroutines on the sending
> side (I mean the migration code, the block layer uses coroutines for
> everything/anything).
> 
> Paolo, Stefan, any clues for not running setup with the BQL?
> 
> Later, Juan.
> 

-- 
Peter Xu




Re: [PATCH] migration: for snapshots, hold the BQL during setup callbacks

2023-05-10 Thread Juan Quintela
Peter Xu  wrote:

Hi

[Adding Kevin to the party]

> On Fri, May 05, 2023 at 03:46:52PM +0200, Fiona Ebner wrote:
>> To fix it, ensure that the BQL is held during setup. To avoid changing
>> the behavior for migration too, introduce conditionals for the setup
>> callbacks that need the BQL and only take the lock if it's not already
>> held.
>
> The major complexity of this patch is the "conditionally taking" part.

Yeap.

I don't want that bit.

This is another case of:
- I have a problem
- I will use recursive mutexes to solve it

Now you have two problems O:-)

> Pure question: what is the benefit of not always holding the BQL during
> save_setup(), if after all we have this coroutine issue to be solved?

Dunno.

I would like Paolo to comment on this.  I "reviewed the code" 10
years ago.  I don't remember at all why we wanted to change that.

> I can understand that it helps us avoid holding the BQL for too long, but
> we'll need to take it anyway during ramblock dirty tracking initialization,
> and so far IIUC that is where most of the setup() time is spent.
>
> Commit message of 9b0950375277467 says, "Only the migration_bitmap_sync()
> call needs the iothread lock". Firstly I think it's also covering
> enablement of dirty tracking:
>
> +qemu_mutex_lock_iothread();
> +qemu_mutex_lock_ramlist();
> +bytes_transferred = 0;
> +reset_ram_globals();
> +
>  memory_global_dirty_log_start();
>  migration_bitmap_sync();
> +qemu_mutex_unlock_iothread();
>
> And I think enablement itself can be slow too, maybe even slower than
> migration_bitmap_sync() especially with KVM_DIRTY_LOG_INITIALLY_SET
> supported in the kernel.
>
> Meanwhile I always got confused about why we need to sync the dirty
> bitmap during setup at all.  Say, what if we drop migration_bitmap_sync()
> here?  After all, shouldn't all pages be dirty from the start
> (ram_list_init_bitmaps())?

How do you convince KVM (or the other lists) to start doing dirty
tracking?  Doing a bitmap sync.

And yeap, probably there is a better way of doing it.

> This is slightly off-topic, but I'd like to know if someone can help
> answer.
>
> My whole point is still questioning whether we can unconditionally take
> the BQL during save_setup().

I agree with you.

> I could have missed something, though, that we want to do in setup() but
> without holding the BQL.  Help needed in figuring this out (and if there
> is such a case, IMHO it'll be worthwhile to put that into a comment on the
> save_setup() hook).

I am more in favor of completely reverting
9b0950375277467fd74a9075624477ae43b9bb22

and call it a day.  On migration we don't use coroutines on the sending
side (I mean the migration code, the block layer uses coroutines for
everything/anything).

Paolo, Stefan, any clues for not running setup with the BQL?

Later, Juan.




Re: [PATCH] migration: for snapshots, hold the BQL during setup callbacks

2023-05-05 Thread Peter Xu
On Fri, May 05, 2023 at 03:46:52PM +0200, Fiona Ebner wrote:
> To fix it, ensure that the BQL is held during setup. To avoid changing
> the behavior for migration too, introduce conditionals for the setup
> callbacks that need the BQL and only take the lock if it's not already
> held.

The major complexity of this patch is the "conditionally taking" part.
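
I.e. every setup callback that needs the BQL ends up with a pattern roughly
like the following (an illustrative sketch, not the actual patch code; the
function name is made up, the lock helpers are QEMU's usual ones):

static void foo_save_setup_sketch(void)
{
    bool need_lock = !qemu_mutex_iothread_locked();

    if (need_lock) {
        qemu_mutex_lock_iothread();
    }
    /* ... setup work that must run under the BQL ... */
    if (need_lock) {
        qemu_mutex_unlock_iothread();
    }
}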

Pure question: what is the benefit of not always holding the BQL during
save_setup(), if after all we have this coroutine issue to be solved?

I can understand that it helps us avoid holding the BQL for too long, but
we'll need to take it anyway during ramblock dirty tracking initialization,
and so far IIUC that is where most of the setup() time is spent.

Commit message of 9b0950375277467 says, "Only the migration_bitmap_sync()
call needs the iothread lock". Firstly I think it's also covering
enablement of dirty tracking:

+qemu_mutex_lock_iothread();
+qemu_mutex_lock_ramlist();
+bytes_transferred = 0;
+reset_ram_globals();
+
 memory_global_dirty_log_start();
 migration_bitmap_sync();
+qemu_mutex_unlock_iothread();

And I think enablement itself can be slow too, maybe even slower than
migration_bitmap_sync() especially with KVM_DIRTY_LOG_INITIALLY_SET
supported in the kernel.

Meanwhile I always got confused about why we need to sync the dirty bitmap
during setup at all.  Say, what if we drop migration_bitmap_sync() here?
After all, shouldn't all pages be dirty from the start
(ram_list_init_bitmaps())?
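
For context, init already does roughly the following per RAMBlock (a sketch
paraphrasing ram_list_init_bitmaps() in migration/ram.c, not the exact
code):

static void init_block_bitmap_sketch(RAMBlock *block)
{
    unsigned long pages = block->max_length >> TARGET_PAGE_BITS;

    block->bmap = bitmap_new(pages);
    bitmap_set(block->bmap, 0, pages);   /* every page starts out dirty */
}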

This is slightly off-topic, but I'd like to know if someone can help
answer.

My whole point is still questioning whether we can unconditionally take the
BQL during save_setup().

I could have missed something, though, that we want to do in setup() but
without holding the BQL.  Help needed in figuring this out (and if there is
such a case, IMHO it'll be worthwhile to put that into a comment on the
save_setup() hook).

Thanks,

-- 
Peter Xu