On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 19:20, Peter Xu wrote: > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote: > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> > > > > > > Multifd send channels are terminated by calling > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in > > > multifd_send_terminate_threads(), which in the TLS case essentially > > > calls shutdown(SHUT_RDWR) on the underlying raw socket. > > > > > > Unfortunately, this does not terminate the TLS session properly and > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error. > > > > > > The only reason why this wasn't causing migration failures is because > > > the current migration code apparently does not check for migration > > > error being set after the end of the multifd receive process. > > > > > > However, this will change soon so the multifd receive code has to be > > > prepared to not return an error on such premature TLS session EOF. > > > Use the newly introduced QIOChannelTLS method for that. > > > > > > It's worth noting that even if the sender were to be changed to terminate > > > the TLS connection properly the receive side still needs to remain > > > compatible with older QEMU bit stream which does not do this. > > > > If this is an existing bug, we could add a Fixes. > > It is an existing issue but only uncovered by this patch set. > > As far as I can see it was always there, so it would need some > thought where to point that Fixes tag.
If there's no way to trigger a real functional bug anyway, it's also ok we omit the Fixes. > > Two pure questions.. > > > > - What is the correct way to terminate the TLS session without this flag? > > I guess one would need to call gnutls_bye() like in this GnuTLS example: > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 > > > - Why this is only needed by multifd sessions? > > What uncovered the issue was switching the load threads to using > migrate_set_error() instead of their own result variable > (load_threads_ret) which you had requested during the previous > patch set version review: > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ > > Turns out that the multifd receive code always returned > error in the TLS case, just nothing was previously checking for > that error presence. What I was curious is whether this issue also exists for the main migration channel when with tls, especially when e.g. multifd not enabled at all. As I don't see anywhere that qemu uses gnutls_bye() for any tls session. I think it's a good to find that we overlooked this before.. and IMHO it's always good we could fix this. Does it mean we need proper gnutls_bye() somewhere? If we need an explicit gnutls_bye(), then I wonder if that should be done on the main channel as well. If we don't need gnutls_bye(), then should we always ignore pre-mature termination of tls no matter if it's multifd or non-multifd channel (or even a tls session that is not migration-related)? Thanks, > > Another option would be to simply return to using > load_threads_ret like the previous versions did and not > experiment with touching global migration state because > as we can see other places can unintentionally break. > > If we go this route then these TLS EOF patches could be > dropped. > > > Thanks, > > > > Thanks, > Maciej > -- Peter Xu