Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Vladimir Sementsov-Ogievskiy

10.06.2021 17:21, Emanuele Giuseppe Esposito wrote:



On 10/06/2021 13:12, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:



On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an IN parameter 
and should be protected somehow.


-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running the task */
+    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and can be read 
by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must 
protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also block_copy_wait_one() can do 
the same in parallel, correct?


yes



Here there is also another problem: if we add the task to the list and then 
shrink it in two different critical sections, we are going to have problems 
because in the meanwhile find_conflicting_tasks can be issued in parallel.


But we shrink task only once, and we do it under mutex, so we are OK I think?


I think you understood, but just in case: I am thinking the case where we have:




But maybe I am overcomplicating.



Both shrink and find_ are done under mutex, so they can't intersect. But yes, 
we should keep in mind that if we do find_ under mutex, and then release mutex, 
the information get from find_ may become incorrect.

Check callers of find_conflicting_task_locked():

block_copy_wait_one has one critical section.

if no conflicting tasks we are OK.. Are we? Ok, look at the only caller of 
block_copy_wait_one() - block_copy_common().

assume block_copy_dirty_clusters() returns 0, so there no dirty bits at some 
moment...

than in parallel thread some task may finish with failure, leaving some new 
dirty bits.. Then we check that there no conflicting tasks.. And then we go out 
of the loop, when actually we must retry for these new dirty bits.

So I'm afraid you are right, we are not threadsafe yet in block_copy_common(), 
as we should check conflicting tasks and dirty bits in same critical section to 
be consistent.


Wait, we are talking about two different problems:

- What I wanted to point out has to do with @bytes, not (as far as I 
understand) with the dirty bits. From the example I made below, I assume there 
are 3 separate non-overlapping critical sections:


T1: block_copy_task_create()
T2: find_conflicting_tasks() <-- sees the initial task
T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of bytes. This 
might or might not have consequences, I am not sure.


T1 creates the task, T2 iterates to search for conflicting tasks (called from a 
parallel block_copy_wait_one), T1 shrinks the current task. I think that T2 in 
this case misses the updated task, even though the worst it can happen is that 
the task is smaller, so a false positive (a task is not conflicting but might 
be marked as conflicting).
The outcome is that T2 is waiting for a task it shouldn't, but there is no 
error there.


That is handled in shrink: we do wake up all waiting tasks in shrink()



- Your point is about a task failing between block_copy_dirty_clusters and 
block_copy_wait_one. The task failing calls block_copy_task_end and sets the 
dirty bitmap, but at that point 

Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Emanuele Giuseppe Esposito




On 10/06/2021 13:12, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:



On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a 
lock.


.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 
++

  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an 
IN parameter and should be protected somehow.



-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running 
the task */
+    BlockCopyMethod method; /* initialized in 
block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and 
can be read by parallel block_copy_dirty_clusters(). So, @bytes is 
part of State, we must protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also 
block_copy_wait_one() can do the same in parallel, correct?


yes



Here there is also another problem: if we add the task to the list 
and then shrink it in two different critical sections, we are going 
to have problems because in the meanwhile find_conflicting_tasks can 
be issued in parallel.


But we shrink task only once, and we do it under mutex, so we are OK 
I think?


I think you understood, but just in case: I am thinking the case where 
we have:




But maybe I am overcomplicating.



Both shrink and find_ are done under mutex, so they can't intersect. But 
yes, we should keep in mind that if we do find_ under mutex, and then 
release mutex, the information get from find_ may become incorrect.


Check callers of find_conflicting_task_locked():

block_copy_wait_one has one critical section.

if no conflicting tasks we are OK.. Are we? Ok, look at the only caller 
of block_copy_wait_one() - block_copy_common().


assume block_copy_dirty_clusters() returns 0, so there no dirty bits at 
some moment...


than in parallel thread some task may finish with failure, leaving some 
new dirty bits.. Then we check that there no conflicting tasks.. And 
then we go out of the loop, when actually we must retry for these new 
dirty bits.


So I'm afraid you are right, we are not threadsafe yet in 
block_copy_common(), as we should check conflicting tasks and dirty bits 
in same critical section to be consistent.


Wait, we are talking about two different problems:

- What I wanted to point out has to do with @bytes, not (as far as I 
understand) with the dirty bits. From the example I made below, I assume 
there are 3 separate non-overlapping critical sections:



T1: block_copy_task_create()
T2: find_conflicting_tasks() <-- sees the initial task
T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of 
bytes. This might or might not have consequences, I am not sure.


T1 creates the task, T2 iterates to search for conflicting tasks (called 
from a parallel block_copy_wait_one), T1 shrinks the current task. I 
think that T2 in this case misses the updated task, even though the 
worst it can happen is that the task is smaller, so a false positive (a 
task is not conflicting but might be marked as conflicting).
The outcome is that T2 is waiting for a task it shouldn't, but there is 
no error there.


- Your point is about a task failing between block_copy_dirty_clusters 
and block_copy_wait_one. The task failing calls block_copy_task_end and 
sets the dirty bitmap, but at that point block_copy_wait_one won't check 
it anymore and the bitmap is left dirty. I think the default behavior 
here should be 

Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Vladimir Sementsov-Ogievskiy

10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:



On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an IN parameter 
and should be protected somehow.


-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running the task */
+    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and can be read 
by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must 
protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also block_copy_wait_one() can do 
the same in parallel, correct?


yes



Here there is also another problem: if we add the task to the list and then 
shrink it in two different critical sections, we are going to have problems 
because in the meanwhile find_conflicting_tasks can be issued in parallel.


But we shrink task only once, and we do it under mutex, so we are OK I think?


I think you understood, but just in case: I am thinking the case where we have:
T1: block_copy_task_create()
T2: find_conflicting_tasks() <-- sees the initial task
T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of bytes. This 
might or might not have consequences, I am not sure.

But maybe I am overcomplicating.



Both shrink and find_ are done under mutex, so they can't intersect. But yes, 
we should keep in mind that if we do find_ under mutex, and then release mutex, 
the information get from find_ may become incorrect.

Check callers of find_conflicting_task_locked():

block_copy_wait_one has one critical section.

if no conflicting tasks we are OK.. Are we? Ok, look at the only caller of 
block_copy_wait_one() - block_copy_common().

assume block_copy_dirty_clusters() returns 0, so there no dirty bits at some 
moment...

than in parallel thread some task may finish with failure, leaving some new 
dirty bits.. Then we check that there no conflicting tasks.. And then we go out 
of the loop, when actually we must retry for these new dirty bits.

So I'm afraid you are right, we are not threadsafe yet in block_copy_common(), 
as we should check conflicting tasks and dirty bits in same critical section to 
be consistent.







So, is there a reason why we don't want
QLIST_INSERT_HEAD(>tasks, task, list);
in block_copy_dirty_clusters()?

By doing that, I think we also spare @bytes from the critical section, since it 
is only read from that point onwards.


This way find_conflicting_tasks will just skip our new creating task.. And 
we'll get conflict when try to add our new task. No, we should add task to the 
list at same critical section where we clear dirty bits from the bitmap.



I agree, with the above.
So to me the most correct solution would be to call create and shrink in the 
same lock, but this creates a much wider critical section.

Alternatively, I can leave it as it is and just update the comment.



Then we shrink task in another critical section, it should be OK too.



I am also trying to see if I can group some critical sections.

Btw I think we already talked about @bytes and it's not the first time we 
switch it from IN to STATE and vice-versa...
I mean, I agree with you but it starts to be confusing.


On last review it seemed to me that you actually protect 

Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Emanuele Giuseppe Esposito




On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 
++

  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an IN 
parameter and should be protected somehow.



-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running 
the task */
+    BlockCopyMethod method; /* initialized in 
block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and 
can be read by parallel block_copy_dirty_clusters(). So, @bytes is 
part of State, we must protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also 
block_copy_wait_one() can do the same in parallel, correct?


yes



Here there is also another problem: if we add the task to the list and 
then shrink it in two different critical sections, we are going to 
have problems because in the meanwhile find_conflicting_tasks can be 
issued in parallel.


But we shrink task only once, and we do it under mutex, so we are OK I 
think?


I think you understood, but just in case: I am thinking the case where 
we have:

T1: block_copy_task_create()
T2: find_conflicting_tasks() <-- sees the initial task
T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of 
bytes. This might or might not have consequences, I am not sure.


But maybe I am overcomplicating.






So, is there a reason why we don't want
QLIST_INSERT_HEAD(>tasks, task, list);
in block_copy_dirty_clusters()?

By doing that, I think we also spare @bytes from the critical section, 
since it is only read from that point onwards.


This way find_conflicting_tasks will just skip our new creating task.. 
And we'll get conflict when try to add our new task. No, we should add 
task to the list at same critical section where we clear dirty bits from 
the bitmap.



I agree, with the above.
So to me the most correct solution would be to call create and shrink in 
the same lock, but this creates a much wider critical section.


Alternatively, I can leave it as it is and just update the comment.



Then we shrink task in another critical section, it should be OK too.



I am also trying to see if I can group some critical sections.

Btw I think we already talked about @bytes and it's not the first time 
we switch it from IN to STATE and vice-versa...

I mean, I agree with you but it starts to be confusing.


On last review it seemed to me that you actually protect bytes by 
critical section where it is needed. So here I'm saying only about the 
comment..





This also goes against your comment later in patch 4,
@@ -212,7 +222,7 @@ static BlockCopyTask 
*block_copy_task_create(BlockCopyState *s,

  bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
    /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!find_conflicting_task_locked(s, offset, bytes));
    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
  s->in_flight_bytes += bytes;
@@ -248,16 +258,19 @@ static void coroutine_fn 
block_copy_task_shrink(BlockCopyTask *task,


The function reads task->bytes not under mutex.. It's safe, as only 
that function is modifying the field, and it's called once. Still, 
let's make critical section a little bit wider, just for simplicity. 
I mean, simple QEMU_LOCK_GUARD() at start of function. 


Where if I understand correctly, it 

Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Vladimir Sementsov-Ogievskiy

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an IN parameter 
and should be protected somehow.


-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running the task */
+    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and can be read 
by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must 
protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also block_copy_wait_one() can do 
the same in parallel, correct?


yes



Here there is also another problem: if we add the task to the list and then 
shrink it in two different critical sections, we are going to have problems 
because in the meanwhile find_conflicting_tasks can be issued in parallel.


But we shrink task only once, and we do it under mutex, so we are OK I think?



So, is there a reason why we don't want
QLIST_INSERT_HEAD(>tasks, task, list);
in block_copy_dirty_clusters()?

By doing that, I think we also spare @bytes from the critical section, since it 
is only read from that point onwards.


This way find_conflicting_tasks will just skip our new creating task.. And 
we'll get conflict when try to add our new task. No, we should add task to the 
list at same critical section where we clear dirty bits from the bitmap.

Then we shrink task in another critical section, it should be OK too.



I am also trying to see if I can group some critical sections.

Btw I think we already talked about @bytes and it's not the first time we 
switch it from IN to STATE and vice-versa...
I mean, I agree with you but it starts to be confusing.


On last review it seemed to me that you actually protect bytes by critical 
section where it is needed. So here I'm saying only about the comment..




This also goes against your comment later in patch 4,

@@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState 
*s,
  bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
    /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!find_conflicting_task_locked(s, offset, bytes));
    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
  s->in_flight_bytes += bytes;
@@ -248,16 +258,19 @@ static void coroutine_fn 
block_copy_task_shrink(BlockCopyTask *task,


The function reads task->bytes not under mutex.. It's safe, as only that function is modifying the field, and it's called once. Still, let's make critical section a little bit wider, just for simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 


Where if I understand correctly, it is not safe, because find_conflicting_tasks 
might search the non-updated task.



find_conflicting_tasks only reads bytes, so it can't make damage.. Anyway 
making critical sections a bit wider won't hurt.


--
Best regards,
Vladimir



Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-10 Thread Emanuele Giuseppe Esposito




On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  /* State */


Why previous @list field is not in the state? For sure it's not an IN 
parameter and should be protected somehow.



-    int ret;
  bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
  /* OUT parameters */
+    bool cancelled;
  bool error_is_read;
+    int ret;
  } BlockCopyCallState;
  typedef struct BlockCopyTask {
  AioTask task;
+    /*
+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running the 
task */
+    BlockCopyMethod method; /* initialized in 
block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and can 
be read by parallel block_copy_dirty_clusters(). So, @bytes is part of 
State, we must protect it..


So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also block_copy_wait_one() 
can do the same in parallel, correct?


Here there is also another problem: if we add the task to the list and 
then shrink it in two different critical sections, we are going to have 
problems because in the meanwhile find_conflicting_tasks can be issued 
in parallel.


So, is there a reason why we don't want
QLIST_INSERT_HEAD(>tasks, task, list);
in block_copy_dirty_clusters()?

By doing that, I think we also spare @bytes from the critical section, 
since it is only read from that point onwards.


I am also trying to see if I can group some critical sections.

Btw I think we already talked about @bytes and it's not the first time 
we switch it from IN to STATE and vice-versa...

I mean, I agree with you but it starts to be confusing.


This also goes against your comment later in patch 4,

@@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState 
*s,
  bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
/* region is dirty, so no existent tasks possible in it */
-assert(!find_conflicting_task(s, offset, bytes));
+assert(!find_conflicting_task_locked(s, offset, bytes));
bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
  s->in_flight_bytes += bytes;
@@ -248,16 +258,19 @@ static void coroutine_fn 
block_copy_task_shrink(BlockCopyTask *task,
  


The function reads task->bytes not under mutex.. It's safe, as only that function is modifying the field, and it's called once. Still, let's make critical section a little bit wider, just for simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 


Where if I understand correctly, it is not safe, because 
find_conflicting_tasks might search the non-updated task.


Thank you,
Emanuele




Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions

2021-06-09 Thread Vladimir Sementsov-Ogievskiy

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

As done in BlockCopyCallState, categorize BlockCopyTask
and BlockCopyState in IN, State and OUT fields.
This is just to understand which field has to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/block-copy.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
  QLIST_ENTRY(BlockCopyCallState) list;
  
  /* State */


Why previous @list field is not in the state? For sure it's not an IN parameter 
and should be protected somehow.


-int ret;
  bool finished;
-QemuCoSleep sleep;
-bool cancelled;
+QemuCoSleep sleep; /* TODO: protect API with a lock */
  
  /* OUT parameters */

+bool cancelled;
  bool error_is_read;
+int ret;
  } BlockCopyCallState;
  
  typedef struct BlockCopyTask {

  AioTask task;
  
+/*

+ * IN parameters. Initialized in block_copy_task_create()
+ * and never changed.
+ */
  BlockCopyState *s;
  BlockCopyCallState *call_state;
  int64_t offset;
-int64_t bytes;
-BlockCopyMethod method;
-QLIST_ENTRY(BlockCopyTask) list;
+int64_t bytes; /* only re-set in task_shrink, before running the task */
+BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */


hmm. to be precise method is initialized in block_copy_task_create.

And after block_copy_task_create finished, task is in the list and can be read 
by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must 
protect it..

method is only read by block_copy_task_entry(), so can be modified without any 
protection before running the task.


+
+/* State */
  CoQueue wait_queue; /* coroutines blocked on this task */
+
+/* To reference all call states from BlockCopyState */


That's a misleading comment.. not all sates but all tasks. I don't think we 
need this new comment, just keep @list in State section.


+QLIST_ENTRY(BlockCopyTask) list;
  } BlockCopyTask;
  
  static int64_t task_end(BlockCopyTask *task)

@@ -90,15 +98,25 @@ typedef struct BlockCopyState {
   */
  BdrvChild *source;
  BdrvChild *target;
-BdrvDirtyBitmap *copy_bitmap;
+
+/* State */
  int64_t in_flight_bytes;
-int64_t cluster_size;
  BlockCopyMethod method;
-int64_t max_transfer;
-uint64_t len;
  QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls 
*/
  QLIST_HEAD(, BlockCopyCallState) calls;
+/* State fields that use a thread-safe API */
+BdrvDirtyBitmap *copy_bitmap;
+ProgressMeter *progress;
+SharedResource *mem;
+RateLimit rate_limit;
  
+/*

+ * IN parameters. Initialized in block_copy_state_new()
+ * and never changed.
+ */
+int64_t cluster_size;
+int64_t max_transfer;
+uint64_t len;
  BdrvRequestFlags write_flags;
  
  /*

@@ -114,14 +132,11 @@ typedef struct BlockCopyState {
   * In this case, block_copy() will query the source’s allocation status,
   * skip unallocated regions, clear them in the copy_bitmap, and invoke
   * block_copy_reset_unallocated() every time it does.
+ *
+ * This field is set in backup_run() before coroutines are run,
+ * therefore is an IN.
   */
  bool skip_unallocated;
-
-ProgressMeter *progress;
-
-SharedResource *mem;
-
-RateLimit rate_limit;
  } BlockCopyState;
  
  static BlockCopyTask *find_conflicting_task(BlockCopyState *s,





--
Best regards,
Vladimir