On Tue, Oct 18, 2022 at 09:18:28AM +0100, Daniel P. Berrangé wrote:
> On Mon, Oct 17, 2022 at 05:15:35PM -0400, Peter Xu wrote:
> > On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote:
> > > > Hi Daniel,
> > > > 
> > > > I was thinking about some solutions for this, so I wanted to discuss
> > > > them before going ahead. I have also added Juan and Peter to the loop.
> > > > 
> > > > 1. Earlier I was thinking that, on the destination side, the first
> > > > data sent on the default and multifd channels is currently MAGIC_NUMBER
> > > > and VERSION, so maybe we can decide the mapping based on that. But that
> > > > does not work for the newly added postcopy preempt channel, as it does
> > > > not send any MAGIC number. Also, even for multifd, the MAGIC number
> > > > alone does not tell which multifd channel number it is, though as far
> > > > as I can tell that does not matter. So the MAGIC number should be good
> > > > enough for identifying default vs multifd channel?
> > > 
> > > Yep, you don't need to know more than the MAGIC value.
> > > 
> > > In migration_io_process_incoming, we need to use MSG_PEEK to look at
> > > the first 4 bytes pending on the wire. If those bytes are 'QEVM' that's
> > > the primary channel, if those bytes are big endian 0x11223344, that's
> > > a multifd channel.  Using MSG_PEEK avoids the need to modify the later
> > > code that actually reads this data.
> > > 
> > > The challenge is how long to wait with the MSG_PEEK. If we do it
> > > in a blocking mode, it's fine for main channel and multifd, but
> > > IIUC for the post-copy pre-empt channel we'd be waiting for
> > > something that will never arrive.
> > > 
> > > Having suggested MSG_PEEK though, this may well not work if the
> > > channel has TLS present. In fact it almost definitely won't work.
> > > 
> > > To cope with TLS migration_io_process_incoming would need to
> > > actually read the data off the wire, and later methods be
> > > taught to skip reading the magic.
> > > 
> > > > 2. For post-copy preempt maybe we can initiate this channel only
> > > > after we have received a request from the remote, e.g. a remote page
> > > > fault. This looks safest to me, considering the post-copy recovery
> > > > case too. I cannot think of any dependency on the post-copy preempt
> > > > channel which requires it to be initialised very early. Maybe Peter
> > > > can confirm this.
> > > 
> > > I guess that could work
> > 
> > Currently all preempt code still assumes that when postcopy is activated
> > it's in preempt mode.  IIUC such a change will bring an extra phase of
> > postcopy with no-preempt before preempt is enabled.  We may need to teach
> > qemu to understand that if it's needed.
> > 
> > Meanwhile the initial page requests will not be able to benefit from the
> > new preempt channel either.
> > 
> > > 
> > > > 3. Another thing we can do is to have a 2-way handshake on every
> > > > channel creation with some additional metadata. This looks like the
> > > > cleanest and most durable approach to me. I understand that it can
> > > > break migration to/from old qemu, but then that can come as a
> > > > migration capability?
> > > 
> > > The benefit of (1) is that the fix can be deployed for all existing
> > > QEMU releases by backporting it.  (3) will meanwhile need mgmt app
> > > updates to make it work, which is much more work to deploy.
> > > 
> > > We really should have had a more formal handshake, and I've described
> > > ways to achieve this in the past, but it is quite a lot of work.
> > 
> > I don't know whether (1) is a valid option if there are use cases that it
> > cannot cover (on either tls or preempt).  The handshake is definitely the
> > clean approach.
> > 
> > What's the outcome of such wrongly ordered connections?  Will migration
> > fail immediately and safely?
> > 
> > For multifd, I think it should fail immediately after the connection is
> > established.
> > 
> > For preempt, I'd also expect the same thing because the only wrong order
> > that can happen right now is the preempt channel being taken as the
> > migration channel, then it should also fail immediately on the first
> > qemu_get_byte().
> > 
> > Hopefully that's still not too bad - I mean, if we can fail consistently
> > and safely (never fail during postcopy), we can always retry, and as long
> > as the connections are created successfully we can start the migration
> > safely.  But please correct me if it's not the case.
> 
> It should typically fail as the magic bytes are different, which will not
> pass validation. The exception being the postcopy pre-empt channel which
> may well cause migration to stall as nothing will be sent initially by
> the src.

Hmm right..

Actually if preempt channel is special we can fix it alone.  As both of you
discussed, we can postpone the preempt channel setup, maybe not as late as
when we receive the 1st page request, but:

  (1) For newly established migration, we can postpone preempt channel
      setup (postcopy_preempt_setup, resume=false) to the entrance of
      postcopy_start().

  (2) For a postcopy recovery process, we can postpone preempt channel
      setup (postcopy_preempt_setup, resume=true) to postcopy_do_resume(),
      maybe between qemu_savevm_state_resume_prepare() and the final
      handshake of postcopy_resume_handshake().

I need to try and test the above idea a bit.  But the same trick may not
play well on multifd even if it works.
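
For reference, the MSG_PEEK check Daniel described above could look roughly
like the sketch below.  It is only an illustration on a plain socket: the
enum and the function names (classify_channel, classify_bytes) are made up
here and are not QEMU code, only the two magic values come from this thread.
It peeks in blocking mode, so it has exactly the problem already mentioned
for the preempt channel (nothing ever arrives), and it does not cover TLS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for this sketch; only the magic values are from the thread. */
enum chan_kind { CHAN_ERROR = -1, CHAN_UNKNOWN = 0, CHAN_MAIN, CHAN_MULTIFD };

#define MULTIFD_MAGIC 0x11223344u   /* sent big endian on the wire */

/*
 * Peek at the first 4 bytes pending on fd without consuming them, so the
 * code that later reads the stream still sees the magic.  A blocking peek
 * would wait forever on a channel where the source sends nothing first.
 */
enum chan_kind classify_channel(int fd)
{
    unsigned char buf[4];
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);

    if (n != (ssize_t)sizeof(buf)) {
        return CHAN_ERROR;          /* error, EOF, or fewer than 4 bytes so far */
    }
    if (memcmp(buf, "QEVM", 4) == 0) {
        return CHAN_MAIN;
    }
    /* Assemble the bytes as a big-endian 32-bit value. */
    uint32_t v = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return v == MULTIFD_MAGIC ? CHAN_MULTIFD : CHAN_UNKNOWN;
}

/*
 * Demo helper: send data through a local socketpair, classify the receiving
 * end, then check that a normal read still returns the same first 4 bytes.
 */
enum chan_kind classify_bytes(const void *data, size_t len)
{
    int sv[2];
    unsigned char again[4];
    enum chan_kind kind = CHAN_ERROR;

    assert(len >= 4);               /* the demo only handles a full magic */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return CHAN_ERROR;
    }
    if (send(sv[0], data, len, 0) == (ssize_t)len) {
        kind = classify_channel(sv[1]);
        /* The peek must not have consumed anything. */
        if (kind != CHAN_ERROR &&
            (recv(sv[1], again, sizeof(again), 0) != (ssize_t)sizeof(again) ||
             memcmp(again, data, sizeof(again)) != 0)) {
            kind = CHAN_ERROR;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return kind;
}
```

With this, classify_bytes("QEVM", 4) yields CHAN_MAIN and the four bytes
0x11 0x22 0x33 0x44 yield CHAN_MULTIFD; anything else is CHAN_UNKNOWN.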

-- 
Peter Xu

