On 4/3/2024 5:39, Peter Xu wrote:>>>>>
>>>>> I'm not 100% sure such thing (no matter here or moved into it, which
>>>>> does look cleaner) would work for us.
>>>>>
>>>>> The problem is I still don't yet see an ordering restricted on top of
>>>>> (1)
>>>>> accept() happens, and (2) receive LISTEN cmd here.  What happens if
>>>>> the
>>>>> accept() request is not yet received when reaching LISTEN?  Or is it
>>>>> always guaranteed the accept(fd) will always be polled here?
>>>>>
>>>>> For example, the source QEMU (no matter pre-7.2 or later) will always
>>>>> setup the preempt channel asynchrounously, then IIUC it can connect()
>>>>> after sending the whole chunk of packed data which should include this
>>>>> LISTEN.  I think it means it's not guaranteed this will 100% work, but
>>>>> maybe further reduce the possibility of the race.
>>>>
>>>> I think the following code:
>>>>
>>>> postcopy_start() ->
>>>>    postcopy_preempt_establish_channel() ->
>>>>            qemu_sem_wait(&s->postcopy_qemufile_src_sem);
>>>>
>>>> can guarantee that the connect() syscall is successful so the destination 
>>>> side
>>>> receives the connect() request before it loads the LISTEN command, 
>>>> otherwise
>>>> it won't post the sem:
>>>>
>>>> postcopy_preempt_send_channel_new() ->
>>>>    postcopy_preempt_send_channel_done() ->
>>>>                    qemu_sem_post(&s->postcopy_qemufile_src_sem);
>>>>
>>>
>>> Yes. But as mentioned in another thread, connect() and accept() are async.
>>> So in theory accept() could still come later after the LISTEN command.
>>
>> IIUC accept() is the callback and will be triggered by the connect() event.
>>
>> The reason accept() is not called in the destination is the main loop doesn't
>> get a chance to handle other events (connect()), so if we can guarantee
>> connect() is before LISTEN, then when handling the LISTEN cmd, we yield to 
>> the
>> main loop and the connect() event is there, then we can handle it by calling 
>> the
>> accept():
>>
>> qio_net_listener_channel_func
>>      qio_channel_socket_accept
>>              qemu_accept
>>                      accept
>>
>> so it seems the case accept() comes alter after LISTEN is in our expectation?
> 
> The major thing uncertain to me is "accept() will return with a valid fd"
> on dest host is not guaranteed to order against anything.
> 
> For example, I won't be surprised if a kernel implementation provides an
> async model of "accept()" syscall, so that even if the other side returned
> with "connect()", the "accept()" can still fetch nothing if the async model
> will need a delay for the new channel to be finally delivered to the
> "accept()" thread context.  It just sounds like tricky to rely on such
> thing.
> 
> What I proposed below shouldn't rely on any kernel details on how accept()
> could be implemented, it simply waits for the fd to be created before doing
> anything else (including creating the preempt thread and process packed
> data).

Thanks for the detailed explanation!

> 
>>
>>>
>>>>>
>>>>> One right fix that I can think of is moving the sem_wait(&done) into
>>>>> the main thread too, so we wait for the sem _before_ reading the
>>>>> packed data, so there's no chance of fault.  However I don't think
>>>>> sem_wait() will be smart enough to yield when in a coroutine..  In the
>>>>> long term run I think we should really make migration loadvm to do
>>>>> work in the thread rather than the main thread.  I think it means we
>>>>> have one more example to be listed in this todo so that's preferred..
>>>>>
>>>>> https://wiki.qemu.org/ToDo/LiveMigration#Create_a_thread_for_migration
>>>>> _destination
>>>>>
>>>>> I attached such draft patch below, but I'm not sure it'll work.  Let
>>>>> me know how both of you think about it.
>>>>
>>>> Sadly it doesn't work, there is an unknown segfault.
> 
> Could you paste the stack of the segfault ("(gdb) thread apply all bt")?
> Or help to figure out what is wrong?
> 
> Since I cannot reproduce myself, I may need your help debugging this.  If
> you agree with what I said above and agree on such fix, please also feel
> free to go ahead and fix the segfault.

We should change the following line from

        while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {

to

        while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {

After that fix, test passed and no segfault.

Given that the test shows a yield to the main loop won't introduce much overhead
(<1ms), how about first yield unconditionally, then we enter the while loop to
wait for several ms and yield periodically?

Reply via email to