Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-09-01 Thread Hanna Reitz

On 01.09.21 12:05, Vladimir Sementsov-Ogievskiy wrote:

06.08.2021 12:38, Max Reitz wrote:

Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
  job.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
    static void job_completed_txn_abort(Job *job)
  {
-    AioContext *outer_ctx = job->aio_context;
  AioContext *ctx;
  JobTxn *txn = job->txn;
  Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
  txn->aborting = true;
  job_txn_ref(txn);
  -    /* We can only hold the single job's AioContext lock while 
calling

+    /*
+ * We can only hold the single job's AioContext lock while calling
   * job_finalize_single() because the finalization callbacks can 
involve

- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
-    aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+    job_ref(job);
+    aio_context_release(job->aio_context);
    /* Other jobs are effectively cancelled by us, set the status 
for

   * them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
  }
  while (!QLIST_EMPTY(&txn->jobs)) {
  other_job = QLIST_FIRST(&txn->jobs);
+    /*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
  ctx = other_job->aio_context;
  aio_context_acquire(ctx);
  if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
  aio_context_release(ctx);
  }
  -    aio_context_acquire(outer_ctx);
+    /*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+    ctx = job->aio_context;
+    job_unref(job);
+    aio_context_acquire(ctx);



why to use ctx variable and not do it exactly same as in 
job_txn_apply() :


   aio_context_acquire(job->aio_context);
   job_unref(job);

?


Oh, I just didn’t think of that.  Sounds good, thanks!

Hanna


anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 






Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
  job.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
  
  static void job_completed_txn_abort(Job *job)

  {
-AioContext *outer_ctx = job->aio_context;
  AioContext *ctx;
  JobTxn *txn = job->txn;
  Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
  txn->aborting = true;
  job_txn_ref(txn);
  
-/* We can only hold the single job's AioContext lock while calling

+/*
+ * We can only hold the single job's AioContext lock while calling
   * job_finalize_single() because the finalization callbacks can involve
- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
-aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+job_ref(job);
+aio_context_release(job->aio_context);
  
  /* Other jobs are effectively cancelled by us, set the status for

   * them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
  }
  while (!QLIST_EMPTY(&txn->jobs)) {
  other_job = QLIST_FIRST(&txn->jobs);
+/*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
  ctx = other_job->aio_context;
  aio_context_acquire(ctx);
  if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
  aio_context_release(ctx);
  }
  
-aio_context_acquire(outer_ctx);

+/*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+ctx = job->aio_context;
+job_unref(job);
+aio_context_acquire(ctx);



why to use ctx variable and not do it exactly same as in job_txn_apply() :

   aio_context_acquire(job->aio_context);
   job_unref(job);

?

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-08-09 Thread Max Reitz

On 06.08.21 21:16, Eric Blake wrote:

On Fri, Aug 06, 2021 at 11:38:48AM +0200, Max Reitz wrote:

Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Is this a bug fix that needs to make it into 6.1?


Well, I only encountered it as part of this series (which I really don’t 
think is 6.2 material at this point), and so I don’t know.


Can’t hurt, I suppose, but if we wanted this to be in 6.1, we’d better 
have a specific test for it, I think.



Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
  job.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

The commit message makes sense, and does a good job at explaining the
change.  I'm still a bit fuzzy on how jobs are supposed to play nice
with contexts,


I can relate :)


but since your patch matches the commit message, I'm
happy to give:

Reviewed-by: Eric Blake 


Thanks!




Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-08-06 Thread Eric Blake
On Fri, Aug 06, 2021 at 11:38:48AM +0200, Max Reitz wrote:
> Finalizing the job may cause its AioContext to change.  This is noted by
> job_exit(), which points at job_txn_apply() to take this fact into
> account.
> 
> However, job_completed() does not necessarily invoke job_txn_apply()
> (through job_completed_txn_success()), but potentially also
> job_completed_txn_abort().  The latter stores the context in a local
> variable, and so always acquires the same context at its end that it has
> released in the beginning -- which may be a different context from the
> one that job_exit() releases at its end.  If it is different, qemu
> aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Is this a bug fix that needs to make it into 6.1?

> 
> Drop the local @outer_ctx variable from job_completed_txn_abort(), and
> instead re-acquire the actual job's context at the end of the function,
> so job_exit() will release the same.
> 
> Signed-off-by: Max Reitz 
> ---
>  job.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)

The commit message makes sense, and does a good job at explaining the
change.  I'm still a bit fuzzy on how jobs are supposed to play nice
with contexts, but since your patch matches the commit message, I'm
happy to give:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-08-06 Thread Max Reitz
Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
 job.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
 
 static void job_completed_txn_abort(Job *job)
 {
-AioContext *outer_ctx = job->aio_context;
 AioContext *ctx;
 JobTxn *txn = job->txn;
 Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
 txn->aborting = true;
 job_txn_ref(txn);
 
-/* We can only hold the single job's AioContext lock while calling
+/*
+ * We can only hold the single job's AioContext lock while calling
  * job_finalize_single() because the finalization callbacks can involve
- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
-aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+job_ref(job);
+aio_context_release(job->aio_context);
 
 /* Other jobs are effectively cancelled by us, set the status for
  * them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
 }
 while (!QLIST_EMPTY(&txn->jobs)) {
 other_job = QLIST_FIRST(&txn->jobs);
+/*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
 ctx = other_job->aio_context;
 aio_context_acquire(ctx);
 if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
 aio_context_release(ctx);
 }
 
-aio_context_acquire(outer_ctx);
+/*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+ctx = job->aio_context;
+job_unref(job);
+aio_context_acquire(ctx);
 
 job_txn_unref(txn);
 }
-- 
2.31.1