On Wed, Feb 07, 2024 at 02:33:47PM +0100, Cédric Le Goater wrote:
> In case of error, close_return_path_on_source() can perform a shutdown
> to exit the return-path thread.  However, in migrate_fd_cleanup(),
> 'to_dst_file' is closed before calling close_return_path_on_source()
> and the shutdown fails, leaving the source and destination waiting for
> an event to occur.
> 
> Close the file after calling close_return_path_on_source() so that the
> shutdown succeeds and the return-path thread exits.
> 
> Signed-off-by: Cédric Le Goater <c...@redhat.com>
> ---
> 
>  This is an RFC because the correct fix implies reworking the QEMUFile
>  construct, built on top of the QEMU I/O channel.
> 
>  migration/migration.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 
> 5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
>  100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int 
> new_state)
>  
>  static void migrate_fd_cleanup(MigrationState *s)
>  {
> +    QEMUFile *tmp = NULL;
> +
>      g_free(s->hostname);
>      s->hostname = NULL;
>      json_writer_free(s->vmdesc);
> @@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>      qemu_savevm_state_cleanup();
>  
>      if (s->to_dst_file) {
> -        QEMUFile *tmp;
> -
>          trace_migrate_fd_cleanup();
>          bql_unlock();
>          if (s->migration_thread_running) {
> @@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
>           * critical section won't block for long.
>           */
>          migration_ioc_unregister_yank_from_file(tmp);
> -        qemu_fclose(tmp);
>      }
>  
> -    /*
> -     * We already cleaned up to_dst_file, so errors from the return
> -     * path might be due to that, ignore them.
> -     */
>      close_return_path_on_source(s);
>  
> +    if (tmp) {
> +        qemu_fclose(tmp);
> +    }
> +
>      assert(!migration_is_active(s));
>  
>      if (s->state == MIGRATION_STATUS_CANCELLING) {

I think this is okay to me for a short term plan.  I'll see how others
think, also add Dan into the loop.

If so, would you please add a rich comment explaining why tmp needs to be
closed later?  Especially, explicit comment on the ordering requirement
would be helpful: IMHO here it's an order that qemu_fclose() must happen
after close_return_path_on_source().  So when others work on this code we
don't easily break it without noticing.

Also please feel free to post separately on migration patches if you'd like
us to merge the patches when repost.

Thanks,

-- 
Peter Xu


Reply via email to