Re: [Qemu-devel] [PATCH] replication: interrupt failover if the main device is closed

2016-10-14 Thread Stefan Hajnoczi
On Fri, Oct 07, 2016 at 02:21:33PM +0200, Paolo Bonzini wrote:
> Without this change, there is a race condition in tests/test-replication.
> Depending on how fast the failover job (active commit) runs, there is a
> chance of two bad things happening:
> 
> 1) replication_done can be called after the secondary has been closed
> and hence when the BDRVReplicationState is not valid anymore.
> 
> 2) two copies of the active disk are present during the
> /replication/secondary/stop test (that test runs immediately after
> /replication/secondary/start, which tests failover).  This causes the
> corruption detector to fire.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/replication.c | 3 +++
>  1 file changed, 3 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] replication: interrupt failover if the main device is closed

2016-10-11 Thread Changlong Xie

On 10/07/2016 08:21 PM, Paolo Bonzini wrote:

Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:

1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.

2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover).  This causes the
corruption detector to fire.

Signed-off-by: Paolo Bonzini 


Reviewed-by: Changlong Xie 


---
  block/replication.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..5231a00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -133,6 +133,9 @@ static void replication_close(BlockDriverState *bs)
  if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
  replication_stop(s->rs, false, NULL);
  }
+if (s->replication_state == BLOCK_REPLICATION_FAILOVER) {
+block_job_cancel_sync(s->active_disk->bs->job);
+}

  if (s->mode == REPLICATION_MODE_SECONDARY) {
  g_free(s->top_id);







Re: [Qemu-devel] [PATCH] replication: interrupt failover if the main device is closed

2016-10-09 Thread Wen Congyang

At 2016/10/7 20:21, Paolo Bonzini wrote:

Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:

1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.

2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover).  This causes the
corruption detector to fire.

Signed-off-by: Paolo Bonzini 


This patch looks fine to me.
Reviewed-by: Wen Congyang 


---
  block/replication.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..5231a00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -133,6 +133,9 @@ static void replication_close(BlockDriverState *bs)
  if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
  replication_stop(s->rs, false, NULL);
  }
+if (s->replication_state == BLOCK_REPLICATION_FAILOVER) {
+block_job_cancel_sync(s->active_disk->bs->job);
+}

  if (s->mode == REPLICATION_MODE_SECONDARY) {
  g_free(s->top_id);





[Qemu-devel] [PATCH] replication: interrupt failover if the main device is closed

2016-10-07 Thread Paolo Bonzini
Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:

1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.

2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover).  This causes the
corruption detector to fire.

Signed-off-by: Paolo Bonzini 
---
 block/replication.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..5231a00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -133,6 +133,9 @@ static void replication_close(BlockDriverState *bs)
 if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
 replication_stop(s->rs, false, NULL);
 }
+if (s->replication_state == BLOCK_REPLICATION_FAILOVER) {
+block_job_cancel_sync(s->active_disk->bs->job);
+}
 
 if (s->mode == REPLICATION_MODE_SECONDARY) {
 g_free(s->top_id);
-- 
2.7.4