Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-27 Thread Paolo Bonzini
On Mon, Sep 26, 2022 at 14:21, Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru> wrote:

> On 9/18/22 20:12, Emanuele Giuseppe Esposito wrote:
> >>> --- a/qemu-img.c
> >>> +++ b/qemu-img.c
> >>> @@ -911,7 +911,6 @@ static void run_block_job(BlockJob *job, Error **errp)
> >>>      AioContext *aio_context = block_job_get_aio_context(job);
> >>>      int ret = 0;
> >>>
> >>> -    aio_context_acquire(aio_context);
> >>>      job_lock();
> >>>      job_ref_locked(&job->job);
> >>>      do {
> >> The aio_poll() call here, doesn't it require aio_context to be acquired?
> > On the contrary, I think: if you look at AIO_WAIT_WHILE, we explicitly
> > release it because we don't want to allow something else to run with the
> > aiocontext acquired.
> >
>
> Still, in AIO_WAIT_WHILE() we release ctx_, but do
> aio_poll(qemu_get_aio_context(), true), so we poll in another context.
>
> But here in qemu-img.c we drop the aiocontext lock exactly for aio_context,
> which is the argument of aio_poll().
>

It's the same: the acquire/release is done again in the file descriptor
callbacks or bottom halves (typically via aio_co_wake).
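(For reference, this is roughly the path being described: aio_co_wake()
funnels into aio_co_enter(), which takes the AioContext lock itself when it
directly enters a coroutine. The sketch below is condensed from memory from
util/async.c of that era; treat it as an approximation, not the exact tree
code.)

void aio_co_wake(Coroutine *co)
{
    AioContext *ctx = qatomic_read(&co->ctx);
    aio_co_enter(ctx, co);
}

void aio_co_enter(AioContext *ctx, Coroutine *co)
{
    if (ctx != qemu_get_current_aio_context()) {
        aio_co_schedule(ctx, co);   /* runs later as a bottom half */
        return;
    }

    if (qemu_in_coroutine()) {
        /* Queue behind the currently running coroutine. */
        Coroutine *self = qemu_coroutine_self();
        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
    } else {
        /* This is the acquire/release referred to above. */
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }
}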

Paolo


> --
> Best regards,
> Vladimir
>
>


Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-26 Thread Vladimir Sementsov-Ogievskiy

On 9/18/22 20:12, Emanuele Giuseppe Esposito wrote:

--- a/qemu-img.c
+++ b/qemu-img.c
@@ -911,7 +911,6 @@ static void run_block_job(BlockJob *job, Error **errp)
     AioContext *aio_context = block_job_get_aio_context(job);
     int ret = 0;
 
-    aio_context_acquire(aio_context);
     job_lock();
     job_ref_locked(&job->job);
     do {

The aio_poll() call here, doesn't it require aio_context to be acquired?

On the contrary, I think: if you look at AIO_WAIT_WHILE, we explicitly
release it because we don't want to allow something else to run with the
aiocontext acquired.



Still, in AIO_WAIT_WHILE() we release ctx_, but do
aio_poll(qemu_get_aio_context(), true), so we poll in another context.

But here in qemu-img.c we drop the aiocontext lock exactly for aio_context,
which is the argument of aio_poll().
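(For readers following along: the macro in include/block/aio-wait.h looked
roughly like the sketch below at the time, condensed from memory with the
waiter bookkeeping omitted. The else branch is the release-then-poll-the-
main-context behaviour described above.)

#define AIO_WAIT_WHILE(ctx, cond) ({                               \
    bool waited_ = false;                                          \
    AioContext *ctx_ = (ctx);                                      \
    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
        /* Polling our own context: no lock juggling needed. */    \
        while ((cond)) {                                           \
            aio_poll(ctx_, true);                                  \
            waited_ = true;                                        \
        }                                                          \
    } else {                                                       \
        /* Main-loop thread waiting on another context: release    \
         * ctx_ so its handlers can run, and poll the main         \
         * context instead. */                                     \
        while ((cond)) {                                           \
            if (ctx_) {                                            \
                aio_context_release(ctx_);                         \
            }                                                      \
            aio_poll(qemu_get_aio_context(), true);                \
            if (ctx_) {                                            \
                aio_context_acquire(ctx_);                         \
            }                                                      \
            waited_ = true;                                        \
        }                                                          \
    }                                                              \
    waited_; })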

--
Best regards,
Vladimir



Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-23 Thread Paolo Bonzini
On Thu, Sep 22, 2022 at 4:42 PM Emanuele Giuseppe Esposito wrote:
>
> On 18/09/2022 at 19:12, Emanuele Giuseppe Esposito wrote:
> >> In replication_stop, we call job_cancel_sync() inside an
> >> aio_context_acquire - aio_context_release section. Should it be fixed?
>
> > I don't think it breaks anything. The question is: what is the
> > aiocontext there protecting? Do we need it? I have no idea.
>
> OK, Paolo reminded me that job_cancel_sync() calls
> AIO_WAIT_WHILE_UNLOCKED. This means it indeed needs to be wrapped
> in an aiocontext release() + acquire() block.
>
> >> Another question: sometimes you move job_start out of the
> >> aio-context-acquire critical section, sometimes not. As I understand
> >> it, it's ok for job_start to be called either with the aio-context
> >> acquired or not?
> >>
> > Same as above, job_start does not need the aiocontext to be taken
> > (otherwise job_lock would be useless).
>
> I still think it does not matter here whether the aiocontext is taken or not.

What matters is whether the AioContext lock is taken inside or outside
job_lock.

job_start() takes the job mutex, so you _always_ need to ensure that
the job mutex is taken inside the AioContext lock and never the other
way around.

From a quick grep for AIO_WAIT_WHILE_UNLOCKED(), job_finish_sync_locked()
is the only user and the only place where the two locks interact. It
is explicitly called with the AioContext lock _not_ taken, and it releases
job_lock where the AioContext lock might be taken inside
AIO_WAIT_WHILE_UNLOCKED(); so it should be fine. This also validates
the idea of AIO_WAIT_WHILE_UNLOCKED().
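(To make the ordering rule concrete, an illustration; the wrapper function
is invented for the example, while job_lock()/job_ref_locked()/job_start()
are the real APIs discussed in this series.)

/* Illustration only; the surrounding function is hypothetical. */
static void lock_ordering_examples(AioContext *ctx, Job *job)
{
    /* OK: AioContext lock (outer), then job mutex (inner). */
    aio_context_acquire(ctx);
    job_lock();
    job_ref_locked(job);
    job_unref_locked(job);
    job_unlock();
    aio_context_release(ctx);

    /* Also OK: job_start() only takes the job mutex internally, so it
     * may be called with or without the AioContext lock held. */
    job_start(job);

    /* NOT OK: acquiring the AioContext lock while holding the job
     * mutex inverts the order and can deadlock against the cases
     * above. */
    job_lock();
    aio_context_acquire(ctx);   /* lock-order inversion */
    aio_context_release(ctx);
    job_unlock();
}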

Paolo




Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-22 Thread Emanuele Giuseppe Esposito
On 18/09/2022 at 19:12, Emanuele Giuseppe Esposito wrote:
>> In replication_stop, we call job_cancel_sync() inside an
>> aio_context_acquire - aio_context_release section. Should it be fixed?

> I don't think it breaks anything. The question is: what is the
> aiocontext there protecting? Do we need it? I have no idea.

OK, Paolo reminded me that job_cancel_sync() calls
AIO_WAIT_WHILE_UNLOCKED. This means it indeed needs to be wrapped
in an aiocontext release() + acquire() block.
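(Concretely, that would look something like the sketch below in
block/replication.c's replication_stop(); the s->backup_job field and the
local aio_context variable are assumed from the tree of that time.)

/* Hedged sketch of the fix being discussed; field names assumed. */
if (s->backup_job) {
    /* job_cancel_sync() reaches AIO_WAIT_WHILE_UNLOCKED(), so drop
     * the AioContext lock around the call and retake it after. */
    aio_context_release(aio_context);
    job_cancel_sync(&s->backup_job->job, true);
    aio_context_acquire(aio_context);
}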

>> Another question: sometimes you move job_start out of the
>> aio-context-acquire critical section, sometimes not. As I understand
>> it, it's ok for job_start to be called either with the aio-context
>> acquired or not?
>>
> Same as above, job_start does not need the aiocontext to be taken
> (otherwise job_lock would be useless).

I still think it does not matter here whether the aiocontext is taken or not.

> Thank you,
> Emanuele




Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-18 Thread Emanuele Giuseppe Esposito



On 15/09/2022 at 16:52, Vladimir Sementsov-Ogievskiy wrote:
> On 8/26/22 16:21, Emanuele Giuseppe Esposito wrote:
>> Change the job_{lock/unlock} and macros to use job_mutex.
>>
>> Now that they are no longer nops, remove the aiocontext lock
>> to avoid deadlocks.
>>
>> Therefore:
>> - when possible, completely remove the aiocontext lock/unlock pair
>> - if it is used by some other function too, reduce the locking
>>    section as much as possible, leaving the job API outside.
>> - change AIO_WAIT_WHILE to AIO_WAIT_WHILE_UNLOCKED, since we
>>    are no longer using the aiocontext lock
>>
>> The only functions that still need the aiocontext lock are the JobDriver
>> callbacks, already documented in job.h. Reduce the locking section to
>> only cover the callback invocation and document the functions that
>> take the AioContext lock, to avoid taking it twice.
>>
>> Also remove real_job_{lock/unlock}, as they are replaced by the
>> public functions.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> Reviewed-by: Stefan Hajnoczi 
>> ---
> 
> [..]
> 
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -911,7 +911,6 @@ static void run_block_job(BlockJob *job, Error **errp)
>>      AioContext *aio_context = block_job_get_aio_context(job);
>>      int ret = 0;
>>
>> -    aio_context_acquire(aio_context);
>>      job_lock();
>>      job_ref_locked(&job->job);
>>      do {
> 
> The aio_poll() call here, doesn't it require aio_context to be acquired?

On the contrary, I think: if you look at AIO_WAIT_WHILE, we explicitly
release it because we don't want to allow something else to run with the
aiocontext acquired.

> 
>> @@ -936,7 +935,6 @@ static void run_block_job(BlockJob *job, Error **errp)
>>      }
>>      job_unref_locked(&job->job);
>>      job_unlock();
>> -    aio_context_release(aio_context);
>>
>>      /* publish completion progress only when success */
>>      if (!ret) {
> 
> [..]
> 
> In replication_stop, we call job_cancel_sync() inside an
> aio_context_acquire - aio_context_release section. Should it be fixed?

I don't think it breaks anything. The question is: what is the
aiocontext there protecting? Do we need it? I have no idea.
> 
> Another question: sometimes you move job_start out of the
> aio-context-acquire critical section, sometimes not. As I understand
> it, it's ok for job_start to be called either with the aio-context
> acquired or not?
> 
Same as above, job_start does not need the aiocontext to be taken
(otherwise job_lock would be useless).

> 
> Otherwise looks good to me!
> 

Thank you,
Emanuele




Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-15 Thread Vladimir Sementsov-Ogievskiy

On 8/26/22 16:21, Emanuele Giuseppe Esposito wrote:

Change the job_{lock/unlock} and macros to use job_mutex.

Now that they are no longer nops, remove the aiocontext lock
to avoid deadlocks.

Therefore:
- when possible, completely remove the aiocontext lock/unlock pair
- if it is used by some other function too, reduce the locking
   section as much as possible, leaving the job API outside.
- change AIO_WAIT_WHILE to AIO_WAIT_WHILE_UNLOCKED, since we
   are no longer using the aiocontext lock

The only functions that still need the aiocontext lock are the JobDriver
callbacks, already documented in job.h. Reduce the locking section to
only cover the callback invocation and document the functions that
take the AioContext lock, to avoid taking it twice.

Also remove real_job_{lock/unlock}, as they are replaced by the
public functions.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---


[..]


--- a/qemu-img.c
+++ b/qemu-img.c
@@ -911,7 +911,6 @@ static void run_block_job(BlockJob *job, Error **errp)
     AioContext *aio_context = block_job_get_aio_context(job);
     int ret = 0;
 
-    aio_context_acquire(aio_context);
     job_lock();
     job_ref_locked(&job->job);
     do {


The aio_poll() call here, doesn't it require aio_context to be acquired?


@@ -936,7 +935,6 @@ static void run_block_job(BlockJob *job, Error **errp)
     }
     job_unref_locked(&job->job);
     job_unlock();
-    aio_context_release(aio_context);
 
     /* publish completion progress only when success */
     if (!ret) {
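(For context on the two hunks above, the resulting function should look
roughly like this abridged sketch; progress reporting is elided and the
*_locked helper names are taken from earlier patches in this series.)

/* Abridged sketch of qemu-img.c:run_block_job() after this patch. */
static int run_block_job(BlockJob *job, Error **errp)
{
    AioContext *aio_context = block_job_get_aio_context(job);
    int ret = 0;

    job_lock();
    job_ref_locked(&job->job);
    do {
        job_unlock();
        /* Poll holding neither the job mutex nor the AioContext lock. */
        aio_poll(aio_context, true);
        /* (progress reporting elided) */
        job_lock();
    } while (!job_is_ready_locked(&job->job) &&
             !job_is_completed_locked(&job->job));

    if (!job_is_completed_locked(&job->job)) {
        ret = job_complete_sync_locked(&job->job, errp);
    } else {
        ret = job->job.ret;
    }
    job_unref_locked(&job->job);
    job_unlock();

    return ret;
}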


[..]

In replication_stop, we call job_cancel_sync() inside an aio_context_acquire -
aio_context_release section. Should it be fixed?

Another question: sometimes you move job_start out of the aio-context-acquire
critical section, sometimes not. As I understand it, it's ok for job_start to
be called either with the aio-context acquired or not?


Otherwise looks good to me!

--
Best regards,
Vladimir



[PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-08-26 Thread Emanuele Giuseppe Esposito
Change the job_{lock/unlock} and macros to use job_mutex.

Now that they are no longer nops, remove the aiocontext lock
to avoid deadlocks.

Therefore:
- when possible, completely remove the aiocontext lock/unlock pair
- if it is used by some other function too, reduce the locking
  section as much as possible, leaving the job API outside.
- change AIO_WAIT_WHILE to AIO_WAIT_WHILE_UNLOCKED, since we
  are no longer using the aiocontext lock

The only functions that still need the aiocontext lock are the JobDriver
callbacks, already documented in job.h. Reduce the locking section to
only cover the callback invocation and document the functions that
take the AioContext lock, to avoid taking it twice.

Also remove real_job_{lock/unlock}, as they are replaced by the
public functions.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 blockdev.c   |  72 +++-
 include/qemu/job.h   |  17 ++---
 job-qmp.c|  46 +++--
 job.c| 111 +--
 qemu-img.c   |   2 -
 tests/unit/test-bdrv-drain.c |   4 +-
 tests/unit/test-block-iothread.c |   2 +-
 tests/unit/test-blockjob.c   |  19 +++---
 8 files changed, 70 insertions(+), 203 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index dddf5699a0..2cd84d206c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -155,12 +155,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
     for (job = block_job_next_locked(NULL); job;
          job = block_job_next_locked(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
             job_cancel_locked(&job->job, false);
-
-            aio_context_release(aio_context);
         }
     }
 
@@ -1836,14 +1831,7 @@ static void drive_backup_abort(BlkActionState *common)
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         job_cancel_sync(&state->job->job, true);
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -1937,14 +1925,7 @@ static void blockdev_backup_abort(BlkActionState *common)
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         job_cancel_sync(&state->job->job, true);
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -3306,19 +3287,14 @@ out:
 }
 
 /*
- * Get a block job using its ID and acquire its AioContext.
- * Called with job_mutex held.
+ * Get a block job using its ID. Called with job_mutex held.
  */
-static BlockJob *find_block_job_locked(const char *id,
-                                       AioContext **aio_context,
-                                       Error **errp)
+static BlockJob *find_block_job_locked(const char *id, Error **errp)
 {
     BlockJob *job;
 
     assert(id != NULL);
 
-    *aio_context = NULL;
-
     job = block_job_get_locked(id);
 
     if (!job) {
@@ -3327,36 +3303,30 @@ static BlockJob *find_block_job_locked(const char *id,
         return NULL;
     }
 
-    *aio_context = block_job_get_aio_context(job);
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
     }
 
     block_job_set_speed_locked(job, speed, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3369,22 +3339,19 @@ void qmp_block_job_cancel(const char *device,
     if (job_user_paused_locked(&job->job) && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
-        goto out;
+        return;
     }
 
     trace_qmp_block_job_cancel(job);
     job_user_cancel_locked(&job->job, force, errp);
-out:
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3392,16 +3359,14 @@ void