On 20/05/2021 17:19, Vladimir Sementsov-Ogievskiy wrote:
18.05.2021 13:07, Emanuele Giuseppe Esposito wrote:
Because the list of tasks is only modified by coroutine
functions, add a CoMutex in order to protect them.
Use the same mutex to protect also BlockCopyState in_flight_bytes
field to avoid adding additional syncronization primitives.
Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
---
block/block-copy.c | 55 +++++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/block/block-copy.c b/block/block-copy.c
index 2e610b4142..3a949fab64 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -83,7 +83,7 @@ typedef struct BlockCopyTask {
*/
bool zeroes;
- /* State */
+ /* State. Protected by tasks_lock */
CoQueue wait_queue; /* coroutines blocked on this task */
/* To reference all call states from BlockCopyState */
@@ -106,8 +106,9 @@ typedef struct BlockCopyState {
BdrvChild *target;
/* State */
- int64_t in_flight_bytes;
+ int64_t in_flight_bytes; /* protected by tasks_lock */
only this field is protected? or the whole State section?
I guess you figured it already, here there is only in_flight_bytes
because in next patches I take care of the others.
[...]
}
@@ -213,11 +219,7 @@ static coroutine_fn BlockCopyTask
*block_copy_task_create(BlockCopyState *s,
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
Looking at block_copy_task_create():
First, !bdrv_dirty_bitmap_next_dirty_area() doesn't take bitmaps lock,
so it's not protected at all.
Next, even if we take bitmaps lock in
bdrv_dirty_bitmap_next_dirty_area() or around it, it doesn't bring
thread-safety to block_copy_task_create():
The simplest solution here seems to protect
bdrv_dirty_bitmap_next_dirty_area and also bdrv_reset_dirty_bitmap with
the tasks lock, so that once it is released the bitmap is updated
accordingly and from my understanding no other task can get that region.
Btw, out of curiosity, why is bdrv_dirty_bitmap_next_dirty_area not
protected by a lock? Can't we have a case where two threads (like you
just mention below) check the bitmap? Or am I missing something?
Imagine block_copy_task_create() is called from two threads
simultaneously. Both calls will get same dirty area and create equal
tasks. Then, "assert(!find_conflicting_task_locked(s, offset, bytes));"
will crash.
- /* region is dirty, so no existent tasks possible in it */
- assert(!find_conflicting_task(s, offset, bytes));
-
bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
- s->in_flight_bytes += bytes;
task = g_new(BlockCopyTask, 1);
*task = (BlockCopyTask) {
@@ -228,7 +230,13 @@ static coroutine_fn BlockCopyTask
*block_copy_task_create(BlockCopyState *s,
.bytes = bytes,
};
qemu_co_queue_init(&task->wait_queue);
- QLIST_INSERT_HEAD(&s->tasks, task, list);
+
+ WITH_QEMU_LOCK_GUARD(&s->tasks_lock) {
+ s->in_flight_bytes += bytes;
+ /* region is dirty, so no existent tasks possible in it */
+ assert(!find_conflicting_task_locked(s, offset, bytes));
+ QLIST_INSERT_HEAD(&s->tasks, task, list);
+ }
return task;
}
@@ -249,25 +257,29 @@ static void coroutine_fn
block_copy_task_shrink(BlockCopyTask *task,
assert(new_bytes > 0 && new_bytes < task->bytes);
- task->s->in_flight_bytes -= task->bytes - new_bytes;
bdrv_set_dirty_bitmap(task->s->copy_bitmap,
task->offset + new_bytes, task->bytes -
new_bytes);
Then look here. Imagine, immediately after bdrv_set_dirty_bitmap() in
parallel thread the new task is created, which consumes these new dirty
bits. Again, it will find conflicting task (this one, as task->bytes is
not modified yet) and crash.
Also here, the lock guard should be enlarged to cover also the dirty
bitmap. Thank you for pointing these cases!
Emanuele
- task->bytes = new_bytes;
- qemu_co_queue_restart_all(&task->wait_queue);
+ WITH_QEMU_LOCK_GUARD(&task->s->tasks_lock) {
+ task->s->in_flight_bytes -= task->bytes - new_bytes;
+ task->bytes = new_bytes;
+ qemu_co_queue_restart_all(&task->wait_queue);
+ }
}