Peter Xu <pet...@redhat.com> writes:

> On Thu, Feb 06, 2025 at 02:32:12PM -0300, Fabiano Rosas wrote:
>> > In any case we'd still need some kind of a compatibility behavior for
>> > the TLS bit stream emitted by older QEMU versions (which is always
>> > improperly terminated).
>> >
>> 
>> There is no compat issue. For <= 9.2, QEMU is still doing an extra
>> multifd_send_sync_main(), which results in an extra MULTIFD_FLAG_SYNC on
>> the destination and it gets stuck waiting for the
>> RAM_SAVE_FLAG_MULTIFD_FLUSH that never comes. Therefore the src always
>> closes the connection before dst reaches the extra recv().
>> 
>> I test migration both ways with 2 previous QEMU versions and the
>> gnutls_bye() series passes all tests. I also put an assert at
>> tlssession.c and never triggers for GNUTLS_E_PREMATURE_TERMINATION. The
>> MULTIFD_FLAG_EOS should behave the same.
>
> Which are the versions you tried?  As only 9.1 and 9.2 has 637280aeb2, so I
> wonder if the same issue would hit too with 9.0 or older.

Good point. 9.0 indeed breaks.

>
> I'd confess I feel unreliable relying on the side effect of 637280aeb2,
> because fundamentally it works based on the fact that multifd threads need
> to be kicked out by the main load thread SYNC event on dest QEMU to avoid
> the readv() from going wrong.
>

We're relying on the opposite: mutlifd_recv NOT getting kicked. Which is
a bug that 1d457daf86 fixed.

> What I'm not sure here is, is it sheer luck that the main channel SYNC will
> always arrive _before_ pre-mature terminations of the multifd channels?  It
> sounds like it could also happen when the multifd channels got its
> pre-mature termination early, before the main thread got the SYNC.

You lost me here, what main channel sync? Its the MULTIFD_FLAG_SYNC that
puts the recv thread in the "won't see the termination" state and that
is serialized:

   SEND                        RECV
   -------------------------+----------------------------
1  multifd_send_sync_main()
2  pending_sync==true,
3  send thread sends SYNC      recv thread gets SYNC
4  <some work>                 recv gets stuck.
5  multifd_send_shutdown()     <time passes>
6  shutdown()                  multifd_recv_shutdown()
                               recv_terminate_threads()
                               recv exits without recv()

In other words, RECV would need to see the shutdown (6) before the SYNC
(3), which I don't think it possible.

>
> Maybe we still need a compat property at the end..

This is actually similar to preempt_pre_7_2, what about:

    /*
     * This variable only makes sense when set on the machine that is
     * the destination of a multifd migration with TLS enabled. It
     * affects the behavior of the last send->recv iteration with
     * regards to termination of the TLS session. Defaults to true.
     *
     * When set:
     *
     * - the destination QEMU instance can expect to never get a
     *   GNUTLS_E_PREMATURE_TERMINATION error. Manifested as the error
     *   message: "The TLS connection was non-properly terminated".
     *
     * When clear:
     *
     * - the destination QEMU instance can expect to see a
     *   GNUTLS_E_PREMATURE_TERMINATION error in any multifd channel
     *   whenever the last recv() call of that channel happens after
     *   the source QEMU instance has already issued shutdown() on the
     *   channel. This is affected by (at least) commits 637280aeb2
     *   and 1d457daf86.
     *
     * NOTE: Regardless of the state of this option, a premature
     * termination of the TLS connection might happen due to error at
     * any moment prior to the last send->recv iteration.
     */
    bool multifd_clean_tls_termination;

And I think the more straight-forward implementation is to incorporate
Maciej's premature_ok patches (in some form), otherwise that option will
have to take effect on the QIOChannel which is a layering violation.

Reply via email to