Fabiano Rosas <faro...@suse.de> wrote:
> We currently use the 'sem_sync' semaphore on the sending side:
>
> 1) to know when the multifd_send_thread() has finished sending the
>    MULTIFD_FLAG_SYNC packet;
>
>   This is unnecessary. Multifd sends packets one by one and completion
>   is already bound by the 'sem' semaphore. The SYNC packet has nothing
>   special that would require it to have a separate semaphore on the
>   sending side.

What migration basically does is:

sync_dirty_bitmap()
while (too_much_dirty_memory)
   foreach(dirty_page)
      send(dirty_page)
   sync_dirty_bitmap()

I know, this is an over simplification, but it is enough to explain the
problem that this code tries to fix.

Once that we have multifd, we can have several channels, each of one
going through a different network connection.  Yes, networks are a black
box and there is no guarantees about how packets arrive on different
sockets.

In one iteration, page 99 is dirty.  We send it through channel 0.
We end the iteration and we synchronize the bitmap again.
We send the SYNC packet in both channels.
Page 99 is dirty again, and this time it gets sent through channel 1.

What we want, we want the communication to be:

    Channel 0          migration thread    Channel 1

(1) sent page 99
(2)                    sync bitmap
(3)                                        sent page 99

And we want destination to make sure that it never gets packet with page
99 from channel 0 AFTER page 99 from channel 1.

Notice, this is something that it is highly improbable to happen, but it
_can_ happen (and zero copy increases the probability of it).

So we create this SYNC packet that does that:

foreach (channel)
   create_job_to_send_sync() packet
foreach (channel)
   wait_until_it_has_sent_the_sync_packet()

Notice that this is the main migration thread, it will net send new work
to any channel until it receives the sem_sync for every channel.

Now, how do we deal on the target:

foreach (channel)
   wait(sem_sync)
foreach (channel)
   send(sem_sync)

So, trying to do my best at ASCII art, what happens when we end a round
of memory iteration

MigrationThread(send)           MigrationThread(recv)   channel_n (send)        
 channel_n(recv)

sync_bitmap()
foreach(channel)
   create_job_with_SYNC
   post(channel->sem)
                                                        wait(channel->sem)
                                                        write(SYNC)
                                                        post(channel->sem_sync)
foreach(channel)
   wait(channel->sem_sync)

write(MULTIFD_FLUSH)
                                                                                
  read(SYNC)
                                                                                
  post(main->sem_sync)
                                                                                
  wait(channel->sem_sync)
                                read(MULTIFD_FLUSH)
                                foreach(channel)
                                   wait(main->sem_sync)
                                foreach(channel)
                                   post(channel->sem_sync)

Interesting points:

1- We guarantee that packets inside the same channel are in order.

2- Migration send thread don't send a MULTIFD_FLUSH packet until every
   channel has sent a SYNC packet

3- After reception of a SYNC packet.  A channel:
   a -> communicates to the main migration thread that it has received
        it (post(main->sem_sync))
   b -> it waits on (channel->sem_sync)

4- Main recv thread receives a MULTIFD_FLUSH
   a -> waits for every channel to say that it has received a SYNC
        packet
   b -> communicates to every channel that they can continue.

Send channels can send new data after the main channel does a
write(MULTIFD_FLUSH).  But reception channels will not read it until the
main recv thread is sure that every reception channel has received a
SYNC, so we are sure that (in the previous example) page 99 from thread
0 is already written.

Is it clearer now?

And yes, after discussion I will convert this email in documentation.

> 2) to wait for the multifd threads to finish before cleaning up;
>
>    This happens because multifd_send_sync_main() blocks
>    ram_save_complete() from finishing until the semaphore is
>    posted. This is surprising and not documented.
>
> Clarify the above situation by renaming 'sem_sync' to 'sem_done' and
> making the #2 usage the main one. Post to 'sem_sync' only when there's
> no more pending_jobs.
>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
> I remove the parts about the receiving side. I wasn't sure about them
> and we don't need to mix the two. Potentially we need the sem_sync on
> the recv to ensure all channels wait before becoming available to read
> once again after a FLUSH.

I think this is not needed.  It is the source how decides when it is
needed to wait for all the packets in the middle.

Later, Juan.


Reply via email to