Peter Xu <pet...@redhat.com> writes: > On Thu, Feb 06, 2025 at 02:32:12PM -0300, Fabiano Rosas wrote: >> > In any case we'd still need some kind of a compatibility behavior for >> > the TLS bit stream emitted by older QEMU versions (which is always >> > improperly terminated). >> > >> >> There is no compat issue. For <= 9.2, QEMU is still doing an extra >> multifd_send_sync_main(), which results in an extra MULTIFD_FLAG_SYNC on >> the destination and it gets stuck waiting for the >> RAM_SAVE_FLAG_MULTIFD_FLUSH that never comes. Therefore the src always >> closes the connection before dst reaches the extra recv(). >> >> I test migration both ways with 2 previous QEMU versions and the >> gnutls_bye() series passes all tests. I also put an assert at >> tlssession.c and never triggers for GNUTLS_E_PREMATURE_TERMINATION. The >> MULTIFD_FLAG_EOS should behave the same. > > Which are the versions you tried? As only 9.1 and 9.2 has 637280aeb2, so I > wonder if the same issue would hit too with 9.0 or older.
Good point. 9.0 indeed breaks. > > I'd confess I feel unreliable relying on the side effect of 637280aeb2, > because fundamentally it works based on the fact that multifd threads need > to be kicked out by the main load thread SYNC event on dest QEMU to avoid > the readv() from going wrong. > We're relying on the opposite: mutlifd_recv NOT getting kicked. Which is a bug that 1d457daf86 fixed. > What I'm not sure here is, is it sheer luck that the main channel SYNC will > always arrive _before_ pre-mature terminations of the multifd channels? It > sounds like it could also happen when the multifd channels got its > pre-mature termination early, before the main thread got the SYNC. You lost me here, what main channel sync? Its the MULTIFD_FLAG_SYNC that puts the recv thread in the "won't see the termination" state and that is serialized: SEND RECV -------------------------+---------------------------- 1 multifd_send_sync_main() 2 pending_sync==true, 3 send thread sends SYNC recv thread gets SYNC 4 <some work> recv gets stuck. 5 multifd_send_shutdown() <time passes> 6 shutdown() multifd_recv_shutdown() recv_terminate_threads() recv exits without recv() In other words, RECV would need to see the shutdown (6) before the SYNC (3), which I don't think it possible. > > Maybe we still need a compat property at the end.. This is actually similar to preempt_pre_7_2, what about: /* * This variable only makes sense when set on the machine that is * the destination of a multifd migration with TLS enabled. It * affects the behavior of the last send->recv iteration with * regards to termination of the TLS session. Defaults to true. * * When set: * * - the destination QEMU instance can expect to never get a * GNUTLS_E_PREMATURE_TERMINATION error. Manifested as the error * message: "The TLS connection was non-properly terminated". * * When clear: * * - the destination QEMU instance can expect to see a * GNUTLS_E_PREMATURE_TERMINATION error in any multifd channel * whenever the last recv() call of that channel happens after * the source QEMU instance has already issued shutdown() on the * channel. This is affected by (at least) commits 637280aeb2 * and 1d457daf86. * * NOTE: Regardless of the state of this option, a premature * termination of the TLS connection might happen due to error at * any moment prior to the last send->recv iteration. */ bool multifd_clean_tls_termination; And I think the more straight-forward implementation is to incorporate Maciej's premature_ok patches (in some form), otherwise that option will have to take effect on the QIOChannel which is a layering violation.