Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Cédric Le Goater

On 3/28/24 16:50, Avihai Horon wrote:


On 28/03/2024 17:21, Cédric Le Goater wrote:



Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.


Out of curiosity, could you describe the context a bit more? Did
vfio_save_complete_precopy() fail? Why?


Yep, vfio_save_complete_precopy() failed (but it failed while I was 
experimenting with an unofficial debug FW).



We should propagate errors from the .save_live_complete_precopy() handlers,
as was done for the .save_setup() handlers. For 9.1.


Agreed.




This, in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.


Yes, but this test:

    if (ret < 0) {
        goto fail;
    }

will skip the close_return_path_on_source() call, won't it? So I don't
understand how it can be an issue. Am I missing something?


It will skip the close_return_path_on_source() call in migration_completion(), 
but there is another close_return_path_on_source() call in migrate_fd_cleanup().


OK. Found it. This is a code path I hadn't explored yet.

Acked-by: Cédric Le Goater 

Thanks,

C.







Fix it by ensuring migration error is set in case of error in
migration_completion().


Why didn't you add a reference to commit 9425ef3f990a?


I thought this commit didn't introduce this bug, but looking again at the
mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file") and then 9425ef3f990a re-introduced the bug,
but only for the migration_completion() case.
So, you are right, a Fixes: line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-faro...@suse.de/





Signed-off-by: Avihai Horon 
---
  migration/migration.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");


The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.





+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
  }
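
A toy model of the hang discussed in this message may make the two code
paths easier to follow. It is only a sketch in plain C under stated
assumptions: ToyMigration and every toy_* function are hypothetical
stand-ins, not QEMU code, and the only behaviour modelled is the one
described above, namely that closing the return path issues shutdown()
only when a migration error has been recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the migration state; not the QEMU struct. */
typedef struct ToyMigration {
    const char *error;        /* NULL means no migration error is recorded */
    bool rp_shutdown_issued;  /* was shutdown() issued on the return path? */
} ToyMigration;

/* Models the "is a migration error set?" check. */
static bool toy_has_error(const ToyMigration *s)
{
    return s->error != NULL;
}

/* Records an error; keeping the first one is an assumption of this toy. */
static void toy_set_error(ToyMigration *s, const char *msg)
{
    if (s->error == NULL) {
        s->error = msg;
    }
}

/*
 * Models the return-path close on the cleanup path: shutdown() is issued
 * only if an error is recorded. Returns true if joining the return-path
 * thread would complete, false if it would block (the hang).
 */
static bool toy_close_return_path(ToyMigration *s)
{
    if (toy_has_error(s)) {
        s->rp_shutdown_issued = true;
    }
    return s->rp_shutdown_issued;
}

/*
 * Models a failure in completion (ret < 0). With patched == false no
 * error is recorded before cleanup; with patched == true it is.
 */
static bool toy_completion_failure(ToyMigration *s, int ret, bool patched)
{
    assert(ret < 0);
    if (patched) {
        toy_set_error(s, "Error in migration completion");
    }
    return toy_close_return_path(s);
}
```

With patched set to false the model reports that the join would block;
with patched set to true the error is recorded first, so the shutdown is
issued and the join can return.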










Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Avihai Horon



On 28/03/2024 17:09, Peter Xu wrote:



On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This, in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon 

Reviewed-by: Peter Xu 

I'll attach this if it looks all right to you:

Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in 
close_return_path_on_source()")


Yes, sure, go ahead.

Thanks.



Thanks,


---
  migration/migration.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }

--
2.26.3



--
Peter Xu





Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Avihai Horon



On 28/03/2024 17:21, Cédric Le Goater wrote:



Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.


Out of curiosity, could you describe the context a bit more? Did
vfio_save_complete_precopy() fail? Why?


Yep, vfio_save_complete_precopy() failed (but it failed while I was 
experimenting with an unofficial debug FW).




We should propagate errors from the .save_live_complete_precopy() handlers,
as was done for the .save_setup() handlers. For 9.1.


Agreed.




This, in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.


Yes, but this test:

    if (ret < 0) {
        goto fail;
    }

will skip the close_return_path_on_source() call, won't it? So I don't
understand how it can be an issue. Am I missing something?


It will skip the close_return_path_on_source() call in 
migration_completion(), but there is another 
close_return_path_on_source() call in migrate_fd_cleanup().





Fix it by ensuring migration error is set in case of error in
migration_completion().


Why didn't you add a reference to commit 9425ef3f990a?


I thought this commit didn't introduce this bug, but looking again at
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d5 ("migration: Join the
return path thread before releasing to_dst_file") and then 9425ef3f990a
re-introduced the bug, but only for the migration_completion() case.

So, you are right, a Fixes: line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-faro...@suse.de/





Signed-off-by: Avihai Horon 
---
  migration/migration.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");


The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.





+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }







Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Cédric Le Goater

Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.


Out of curiosity, could you describe the context a bit more? Did
vfio_save_complete_precopy() fail? Why?

We should propagate errors from the .save_live_complete_precopy() handlers,
as was done for the .save_setup() handlers. For 9.1.


This, in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.


Yes, but this test:

    if (ret < 0) {
        goto fail;
    }

will skip the close_return_path_on_source() call, won't it? So I don't
understand how it can be an issue. Am I missing something?


Fix it by ensuring migration error is set in case of error in
migration_completion().


Why didn't you add a reference to commit 9425ef3f990a?



Signed-off-by: Avihai Horon 
---
  migration/migration.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");


The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.





+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }
  





Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Peter Xu
On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
> 
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.
> 
> This, in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
> 
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
> 
> Signed-off-by: Avihai Horon 

Reviewed-by: Peter Xu 

I'll attach this if it looks all right to you:

Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in 
close_return_path_on_source()")

Thanks,

> ---
>  migration/migration.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>  {
>      int ret = 0;
>      int current_active_state = s->state;
> +    Error *local_err = NULL;
>  
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>      return;
>  
>  fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>      migration_completion_failed(s, current_active_state);
>  }
>  
> -- 
> 2.26.3
> 
> 

-- 
Peter Xu




[PATCH for-9.0 1/2] migration: Set migration error in migration_completion()

2024-03-28 Thread Avihai Horon
After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This, in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon 
---
 migration/migration.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }
 
-- 
2.26.3
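
The selection logic the patch adds at the fail label can be sketched
outside QEMU as follows. This is a hedged illustration, not the patch
itself: toy_pick_completion_error() is a hypothetical helper, a plain
string stands in for the error that qemu_file_get_error_obj() would
report, and strerror() stands in for the errno text that
error_setg_errno() appends to its message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the order of checks in the patch:
 *   1. if the channel already carries an error, report that one;
 *   2. otherwise, if ret is non-zero, build a message from -ret;
 *   3. otherwise there is nothing to record.
 * Writes the chosen message into buf and returns true if an error
 * should be recorded.
 */
static bool toy_pick_completion_error(const char *channel_err, int ret,
                                      char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (channel_err != NULL) {
        /* The channel error takes precedence over the return code. */
        snprintf(buf, len, "%s", channel_err);
        return true;
    }
    if (ret != 0) {
        /* ret is a negative errno value, so negate it for strerror(). */
        snprintf(buf, len, "Error in migration completion: %s",
                 strerror(-ret));
        return true;
    }
    buf[0] = '\0';
    return false;
}
```

Note that a caller passing a bare -1 would get the text for errno 1
(EPERM on Linux), which may be the reason a plain error_setg() was
suggested for that case as a follow-up.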