On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 19:20, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > 
> > > Multifd send channels are terminated by calling
> > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
> > > multifd_send_terminate_threads(), which in the TLS case essentially
> > > calls shutdown(SHUT_RDWR) on the underlying raw socket.
> > > 
> > > Unfortunately, this does not terminate the TLS session properly and
> > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
> > > 
> > > The only reason why this wasn't causing migration failures is because
> > > the current migration code apparently does not check for migration
> > > error being set after the end of the multifd receive process.
> > > 
> > > However, this will change soon so the multifd receive code has to be
> > > prepared to not return an error on such premature TLS session EOF.
> > > Use the newly introduced QIOChannelTLS method for that.
> > > 
> > > It's worth noting that even if the sender were to be changed to terminate
> > > the TLS connection properly the receive side still needs to remain
> > > compatible with older QEMU bit stream which does not do this.
> > 
> > If this is an existing bug, we could add a Fixes.
> 
> It is an existing issue but only uncovered by this patch set.
> 
> As far as I can see it was always there, so it would need some
> thought where to point that Fixes tag.

If there's no way to trigger a real functional bug anyway, it's also ok we
omit the Fixes.

> > Two pure questions..
> > 
> >    - What is the correct way to terminate the TLS session without this flag?
> 
> I guess one would need to call gnutls_bye() like in this GnuTLS example:
> https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102
> 
> >    - Why this is only needed by multifd sessions?
> 
> What uncovered the issue was switching the load threads to using
> migrate_set_error() instead of their own result variable
> (load_threads_ret) which you had requested during the previous
> patch set version review:
> https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/
> 
> Turns out that the multifd receive code always returned
> error in the TLS case, just nothing was previously checking for
> that error presence.

What I was curious is whether this issue also exists for the main migration
channel when with tls, especially when e.g. multifd not enabled at all.  As
I don't see anywhere that qemu uses gnutls_bye() for any tls session.

I think it's a good to find that we overlooked this before.. and IMHO it's
always good we could fix this.

Does it mean we need proper gnutls_bye() somewhere?

If we need an explicit gnutls_bye(), then I wonder if that should be done
on the main channel as well.

If we don't need gnutls_bye(), then should we always ignore pre-mature
termination of tls no matter if it's multifd or non-multifd channel (or
even a tls session that is not migration-related)?

Thanks,

> 
> Another option would be to simply return to using
> load_threads_ret like the previous versions did and not
> experiment with touching global migration state because
> as we can see other places can unintentionally break.
> 
> If we go this route then these TLS EOF patches could be
> dropped.
> 
> > Thanks,
> > 
> 
> Thanks,
> Maciej
> 

-- 
Peter Xu


Reply via email to