Re: [Qemu-block] Problem with data miscompare using scsi-hd, cache=none and io=threads

2018-05-16 Thread Paolo Bonzini
On 15/05/2018 23:25, Daniel Henrique Barboza wrote:
> This is the current status of this investigation. I decided to start a
> discussion here, to see if someone can point out something that I overlooked
> or got wrong, before I start changing the POSIX thread pool
> behavior to enforce that a read() is done by the same POSIX
> thread that did a previous write() on the same fd. Any suggestions?

Copying from the bug:

> Unless we learn something new, my understanding is that we're dealing
> with a host side limitation/bug: when calling pwritev() in a different
> thread than a following preadv(), using the same file descriptor
> opened with O_DIRECT and no WCE on the host side, the kernel can't
> guarantee data coherency, e.g.:
> 
> - thread A executes a pwritev() writing dataA in the disk
> 
> - thread B executes a preadv() call to read the data, but this
> preadv() call isn't aware of the previous pwritev() call done in
> thread A, thus it isn't assured that the preadv() call reads dataA
> (as opposed to what is described in man 3 write)
> 
> - the physical disk, due to the heavy load of the stress test, didn't
> finish writing up dataA. Since the disk itself doesn't have any
> internal cache to rely on, the preadv() call goes in and reads old
> data that differs from dataA.

There is a problem in the reasoning of the third point: if the physical
disk hasn't yet finished writing up dataA, pwritev() shouldn't have
returned.  This could be a bug in the kernel, or even in the disk.  I
suspect the kernel because SCSI passthrough doesn't show the bug; SCSI
passthrough uses ioctl() which completes exactly when the disk tells
QEMU that the command is done---it cannot report completion too early.

(Another small problem with the third point is that the disk actually does
have a cache.  But the cache should be transparent; if it weren't, the
bug would be in the disk firmware.)

It has to be debugged and fixed in the kernel.  The thread pool is
just... a thread pool, and shouldn't be working around bugs, especially
ones as serious as this.

A more likely possibility: maybe the disk has 4K sectors and QEMU is
doing read-modify-write cycles to emulate 512-byte sectors?  Even in that
case, mismatches are not expected, since QEMU serializes RMW cycles, but
at least we would know that the bug would be in QEMU, and where.
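
For reference, a minimal, self-contained sketch of the access pattern under
discussion (not taken from HTX or QEMU; the file path and sizes are made up):
thread A completes a pwritev() on an O_DIRECT fd, and only afterwards thread B
issues a preadv() on the same fd and offset, which must observe the written data.

/*
 * Sketch only: write in one thread, read in another, same O_DIRECT fd.
 * Build with: cc -O2 -o rw rw.c -lpthread
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLK 4096                 /* one aligned block */

static int fd;
static const off_t off = 0;      /* byte offset of the LBA under test */

static void *writer(void *arg)
{
    void *buf;
    (void)arg;
    if (posix_memalign(&buf, BLK, BLK)) {   /* O_DIRECT wants aligned buffers */
        return NULL;
    }
    memset(buf, 0xaa, BLK);
    struct iovec iov = { .iov_base = buf, .iov_len = BLK };
    if (pwritev(fd, &iov, 1, off) != BLK) {
        perror("pwritev");
    }
    return NULL;
}

static void *reader(void *arg)
{
    void *buf;
    (void)arg;
    if (posix_memalign(&buf, BLK, BLK)) {
        return NULL;
    }
    struct iovec iov = { .iov_base = buf, .iov_len = BLK };
    if (preadv(fd, &iov, 1, off) != BLK) {
        perror("preadv");
    }
    /* After a completed pwritev() in the other thread, this must be 0xaa. */
    printf("first byte read back: 0x%02x\n", ((unsigned char *)buf)[0]);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    /* Illustrative path; must be on a filesystem/device that supports O_DIRECT. */
    fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    pthread_create(&a, NULL, writer, NULL);
    pthread_join(a, NULL);                   /* the write has fully completed... */
    pthread_create(&b, NULL, reader, NULL);  /* ...before the read even starts */
    pthread_join(b, NULL);
    close(fd);
    return 0;
}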

Paolo



[Qemu-block] [PATCH v2] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Jie Wang
From: w00251574 

When we call addIOThread, the epollfd is created in aio_context_setup(),
but it is not closed in the process of delIOThread, so the epollfd will leak.

Signed-off-by: Jie Wang 
---
 include/block/aio.h | 8 ++++++++
 util/aio-posix.c    | 7 +++++++
 util/aio-win32.c    | 4 ++++
 util/async.c        | 1 +
 4 files changed, 20 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e249..ae6f354e6c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -554,6 +554,14 @@ static inline bool in_aio_context_home_thread(AioContext 
*ctx)
  */
 void aio_context_setup(AioContext *ctx);
 
+/**
+ * aio_context_destroy:
+ * @ctx: the aio context
+ *
+ * Destroy the aio context.
+ */
+void aio_context_destroy(AioContext *ctx);
+
 /**
  * aio_context_set_poll_params:
  * @ctx: the aio context
diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4af8..bd81455851 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
 #endif
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+#ifdef CONFIG_EPOLL_CREATE1
+close(ctx->epollfd);
+#endif
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index a67b00c6ad..e676a8d9b2 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
 {
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e..03f62787f2 100644
--- a/util/async.c
+++ b/util/async.c
@@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
+aio_context_destroy(ctx);
 }
 
 static GSourceFuncs aio_source_funcs = {
-- 
2.15.0.windows.1




Re: [Qemu-block] [PATCH v2] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Fam Zheng
On Wed, 05/16 14:39, Jie Wang wrote:
> From: w00251574 
> 
> When we call addIOThread, the epollfd is created in aio_context_setup(),
> but it is not closed in the process of delIOThread, so the epollfd will leak.
> 
> Signed-off-by: Jie Wang 
> ---
>  include/block/aio.h | 8 
>  util/aio-posix.c| 7 +++
>  util/aio-win32.c| 4 
>  util/async.c| 1 +
>  4 files changed, 20 insertions(+)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e249..ae6f354e6c 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -554,6 +554,14 @@ static inline bool in_aio_context_home_thread(AioContext 
> *ctx)
>   */
>  void aio_context_setup(AioContext *ctx);
>  
> +/**
> + * aio_context_destroy:
> + * @ctx: the aio context
> + *
> + * Destroy the aio context.
> + */
> +void aio_context_destroy(AioContext *ctx);
> +
>  /**
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4af8..bd81455851 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +close(ctx->epollfd);

In aio_context_setup, epoll_create1 could fail. Putting this inside an "if
(ctx->epollfd >= 0)" condition is cleaner, I think.

> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index a67b00c6ad..e676a8d9b2 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95a9e..03f62787f2 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>  qemu_rec_mutex_destroy(&ctx->lock);
>  qemu_lockcnt_destroy(&ctx->list_lock);
>  timerlistgroup_deinit(&ctx->tlg);
> +aio_context_destroy(ctx);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> -- 
> 2.15.0.windows.1
> 



Re: [Qemu-block] [PATCH v2] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Peter Xu
On Wed, May 16, 2018 at 02:39:44PM +0800, Jie Wang wrote:
> From: w00251574 

(Maybe you'd prefer to still use "Jie Wang" here? :)

> 
> When we call addIOThread, the epollfd is created in aio_context_setup(),
> but it is not closed in the process of delIOThread, so the epollfd will leak.
> 
> Signed-off-by: Jie Wang 

[...]

> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4af8..bd81455851 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +close(ctx->epollfd);

Would it be better to call aio_epoll_disable() here?  Otherwise it
looks good to me.

> +#endif
> +}
> +

Regards,

-- 
Peter Xu



Re: [Qemu-block] [Qemu-devel] Problem with data miscompare using scsi-hd, cache=none and io=threads

2018-05-16 Thread Dr. David Alan Gilbert
* Daniel Henrique Barboza (danie...@linux.ibm.com) wrote:
> Hi,
> 
> I've been working for the last two months on a miscompare issue that happens
> when using a RAID device and a SATA drive as scsi-hd (emulated SCSI) with
> cache=none and io=threads during a hardware stress test. I'll summarize it
> here as best as I can without creating a great wall of text - Red Hat folks
> can check [1] for all the details.
> 
> Using the following setup:
> 
> - Host is a POWER9 RHEL 7.5-alt: kernel 4.14.0-49.1.1.el7a.ppc64le,
> qemu-kvm-ma 2.10.0-20.el7 (also reproducible with upstream QEMU)
> 
> - Guest is RHEL 7.5-alt using the same kernel as the host, using two storage
> disks (a 1.8 TB RAID and a 446 GB SATA drive) as follows:
> 
>     [libvirt <disk> XML elided by the list archive: both devices are attached
>     as scsi-hd (device='disk') with cache='none' and io='threads']
> 
> Both block devices have WCE off in the host.
> 
> With this env, we found problems when running a stress test called HTX [2].
> At a given time (usually after 24+ hours of test) HTX finds a data
> miscompare in one of the devices. This is an example:
> 
> ---
> 
> Device name: /dev/sdb
> Total blocks: 0x74706daf, Block size: 0x200
> Rule file name: /usr/lpp/htx/rules/reg/hxestorage/default.hdd
> Number of Rulefile passes (cycle) completed: 0
> Stanza running: rule_6, Thread no.: 8
> Oper performed: wrc, Current seek type: SEQ
> LBA no. where IO started: 0x94fa
> Transfer size: 0x8400
> 
> Miscompare Summary:
> ===
> LBA no. where miscomapre started: 0x94fa
> LBA no. where miscomapre ended:   0x94ff
> Miscompare start offset (in bytes):   0x8
> Miscomapre end offset (in bytes): 0xbff
> Miscompare size (in bytes):   0xbf8
> 
> Expected data (at miscomapre offset): 8c9aea5a736462007275
> Actual data (at miscomapre offset): 889aea5a736462007275

Are all the miscompares single bit errors like that one?
Is the test doing single bit manipulation or is that coming out of the
blue?

Dave

> -
> 
> 
> This means that the test executed a write at LBA 0x94fa and, after
> confirming that the write was completed, issued 2 reads at the same LBA to
> verify the written contents and found a mismatch.
> 
> 
> I've tested all sorts of configurations of disk vs LUN, cache modes and
> AIO. My findings are:
> 
> - using device='lun' instead of device='disk', I can't reproduce the issue,
> no matter what the other configurations are;
> - using device='disk' but with cache='writethrough', issue doesn't happen
> (haven't checked other cache modes);
> - using device='disk', cache='none' and io='native', issue doesn't happen.
> 
> 
> The issue seems to be tied to the combination device=disk + cache=none +
> io=threads. I've started digging into the SCSI layer all the way down to the
> block backend. With a shameful amount of logs I've discovered that, for the
> write in which the test finds a miscompare, in block/file-posix.c:
> 
> - when doing the write, handle_aiocb_rw_vector() returns success and pwritev()
> reports that all bytes were written
> - in both reads after the write, handle_aiocb_rw_vector() returns success and
> all bytes are read by preadv(). In both reads, the data read is different from
> the data written by the pwritev() that happened before
> 
> In the discussions at [1], Fam Zheng suggested a test in which we would reduce
> the number of threads created in the POSIX thread pool from 64 to 1.
> The idea is to ensure that we're using the same thread to write and read.
> There was a suspicion that the kernel can't guarantee data coherency between
> different threads, even if using the same fd, when using pwritev() and
> preadv(). This would explain why the subsequent reads on the same fd would
> fail to retrieve the same data that was written before. After this
> modification, the miscompare didn't reproduce.
> 
> After reverting the thread pool number change, I've made a couple of
> attempts trying to flush before read() and to flush after write(). Both
> attempts failed - the miscompare appears in both scenarios. This reinforces
> the suspicion we have above - if data coherency can't be guaranteed between
> different threads, flushing in different threads wouldn't make a difference
> either. I've also tested a suggestion from Fam where I started the disks with
> "cache.direct=on,cache.no-flush=off" - the bug still reproduces.
> 
> 
> This is the current status of this investigation. I decided to start a
> discussion here, to see if someone can point out something that I overlooked or
> got wrong, before I start changing the POSIX thread pool behavior to
> enforce that a read() is done by the same POSIX thread that did a previous
> write() on the same fd. Any suggestions?
> 
> 
> 
> ps: it is worth mentioning that I was able to reproduce this same bug on a
> POWER8 system running Ubuntu 18.04. Given that the code we're dealing with
> doesn't have any arch-specific behavior, I wouldn't be surprised if this bug
> is also reproducible on other archs like x86.
> 
> 
> Thanks,
> 
> Dani

[Qemu-block] [PATCH v3] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Jie Wang
From: w00251574 

When we call addIOThread, the epollfd is created in aio_context_setup(),
but it is not closed in the process of delIOThread, so the epollfd will leak.

Signed-off-by: Jie Wang 
---
 include/block/aio.h | 8 ++++++++
 util/aio-posix.c    | 9 +++++++++
 util/aio-win32.c    | 4 ++++
 util/async.c        | 1 +
 4 files changed, 22 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e249..ae6f354e6c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -554,6 +554,14 @@ static inline bool in_aio_context_home_thread(AioContext 
*ctx)
  */
 void aio_context_setup(AioContext *ctx);
 
+/**
+ * aio_context_destroy:
+ * @ctx: the aio context
+ *
+ * Destroy the aio context.
+ */
+void aio_context_destroy(AioContext *ctx);
+
 /**
  * aio_context_set_poll_params:
  * @ctx: the aio context
diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4af8..0ade2c7791 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -713,6 +713,15 @@ void aio_context_setup(AioContext *ctx)
 #endif
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+#ifdef CONFIG_EPOLL_CREATE1
+if (ctx->epollfd >= 0) {
+close(ctx->epollfd);
+}
+#endif
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index a67b00c6ad..e676a8d9b2 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
 {
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e..03f62787f2 100644
--- a/util/async.c
+++ b/util/async.c
@@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
+aio_context_destroy(ctx);
 }
 
 static GSourceFuncs aio_source_funcs = {
-- 
2.15.0.windows.1




[Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Jie Wang
When we call addIOThread, the epollfd is created in aio_context_setup(),
but it is not closed in the process of delIOThread, so the epollfd will leak.

Signed-off-by: Jie Wang 
---
 include/block/aio.h | 8 ++++++++
 util/aio-posix.c    | 9 +++++++++
 util/aio-win32.c    | 4 ++++
 util/async.c        | 1 +
 4 files changed, 22 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e..ae6f354 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
*ctx)
 void aio_context_setup(AioContext *ctx);
 
 /**
+ * aio_context_destroy:
+ * @ctx: the aio context
+ *
+ * Destroy the aio context.
+ */
+void aio_context_destroy(AioContext *ctx);
+
+/**
  * aio_context_set_poll_params:
  * @ctx: the aio context
  * @max_ns: how long to busy poll for, in nanoseconds
diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4..0ade2c7 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -713,6 +713,15 @@ void aio_context_setup(AioContext *ctx)
 #endif
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+#ifdef CONFIG_EPOLL_CREATE1
+if (ctx->epollfd >= 0) {
+close(ctx->epollfd);
+}
+#endif
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index a67b00c..e676a8d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
 {
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 4dd9d95..03f6278 100644
--- a/util/async.c
+++ b/util/async.c
@@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
+aio_context_destroy(ctx);
 }
 
 static GSourceFuncs aio_source_funcs = {
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 17/42] job: Move defer_to_main_loop to Job

2018-05-16 Thread Max Reitz
On 2018-05-15 14:17, Kevin Wolf wrote:
> Am 14.05.2018 um 17:52 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  include/block/blockjob.h |  5 
>>>  include/block/blockjob_int.h | 19 ---
>>>  include/qemu/job.h   | 20 
>>>  block/backup.c   |  7 +++---
>>>  block/commit.c   | 11 +
>>>  block/mirror.c   | 15 ++--
>>>  block/stream.c   | 14 +--
>>>  blockjob.c   | 57 
>>> 
>>>  job.c| 33 +
>>>  tests/test-bdrv-drain.c  |  7 +++---
>>>  tests/test-blockjob-txn.c| 13 +-
>>>  tests/test-blockjob.c|  7 +++---
>>>  12 files changed, 98 insertions(+), 110 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/commit.c b/block/commit.c
>>> index 85baea8f92..d326766e4d 100644
>>> --- a/block/commit.c
>>> +++ b/block/commit.c
>>> @@ -72,9 +72,10 @@ typedef struct {
>>>  int ret;
>>>  } CommitCompleteData;
>>>  
>>> -static void commit_complete(BlockJob *job, void *opaque)
>>> +static void commit_complete(Job *job, void *opaque)
>>>  {
>>> -CommitBlockJob *s = container_of(job, CommitBlockJob, common);
>>> +CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>
>> Now this is just abuse.
>>
>> ...but it's not the first time someone packs two container_of() into
>> one, it appears.  So, whatever, I guess.
> 
> I don't think it's abuse. Why wouldn't I directly cast to the type that
> I really want instead of casting to each of the uninteresting parent
> classes, too?

Because the final parameter is called "member" and not "path". :-)
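
For illustration, a small self-contained sketch (generic struct names, not
QEMU's types) of the two equivalent forms being discussed: chaining one
container_of() per level versus passing the full member path in a single call.

/* container_of() takes a member *path*, so the one-shot form and the
 * chained form resolve to the same pointer. */
#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct job      { int id; };
struct blockjob { struct job job; };
struct commit   { struct blockjob common; };

int main(void)
{
    struct commit c = { .common.job.id = 42 };
    struct job *j = &c.common.job;

    /* chained: cast to each of the parent classes in turn */
    struct blockjob *bj = container_of(j, struct blockjob, job);
    struct commit *s1   = container_of(bj, struct commit, common);

    /* direct: one cast using the full member path */
    struct commit *s2   = container_of(j, struct commit, common.job);

    printf("same object? %d (id=%d)\n", s1 == s2, s2->common.job.id);
    return 0;
}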

>>> @@ -170,3 +171,35 @@ void job_unref(Job *job)
>>>  g_free(job);
>>>  }
>>>  }
>>> +
>>> +typedef struct {
>>> +Job *job;
>>> +JobDeferToMainLoopFn *fn;
>>> +void *opaque;
>>> +} JobDeferToMainLoopData;
>>> +
>>> +static void job_defer_to_main_loop_bh(void *opaque)
>>> +{
>>> +JobDeferToMainLoopData *data = opaque;
>>> +Job *job = data->job;
>>> +AioContext *aio_context = job->aio_context;
>>> +
>>> +/* Prevent race with job_defer_to_main_loop() */
>>> +aio_context_acquire(aio_context);
>>
>> I don't have a good feeling about this.  The original code had this
>> comment above an aio_context_acquire() for a context that might
>> decidedly not have anything to do with the BB's context;
>> block_job_defer_to_main_loop()'s description was that it would acquire
>> the latter, so why did it acquire the former at all?
>>
>> We wouldn't need this comment here at all, because acquiring this
>> AioContext is part of the interface.  That's why I don't have a good
>> feeling.
> 
> To be honest, I don't fully understand what the old code was trying to
> do or what race it was talking about, because I don't see any potential
> race with job_defer_to_main_loop() (neither in the old nor in the new
> code).
> 
> My suspicion is that Stefan solved the race that you reported [1] (and
> which was not with job_defer_to_main_loop(), but with random code that
> runs between that and the BH) just by adding some more code that made
> the scenario safe, but missed that this also made the existing code
> redundant. In other words, I think taking data->aio_context didn't serve
> a purpose even in the old code.

Possible, yes.

Also seems more likely than any of the explanations I tried to come up with.

> The only thing it could possibly protect is the access of data->job->bs,
> but that's not something that changes between job_defer_to_main_loop()
> and the execution of the BH.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg00260.html
> 
>> The best explanation I can come up with is that the original code
>> acquired the AioContext both of the block device at the time of the BH
>> (because that needs to be done), and at the time of
>> block_job_defer_to_main_loop() -- because the latter is probably the
>> context the block_job_defer_to_main_loop() call came from, so it should
>> be (b)locked.
>>
>> But if that's the case, then the same should be done here.  The context
>> of the job may change between scheduling the BH and the BH being
>> executed, so we might lock a different context here than the one
>> job_defer_to_main_loop() ran in (i.e., job->aio_context at the time of
>> job_defer_to_main_loop() running).  And maybe we should lock that old
>> context, too -- just like block_job_defer_to_main_loop_bh() did.
> 
> Why should we lock the old context? We don't access anything protected
> by it. Even the data->job->bs access has gone away because we now have
> job->aio_context.

Because the old code did so and I don't know why. O:-)

Your explanation does make sense to me, though, so:

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH 32/42] job: Move completion and cancellation to Job

2018-05-16 Thread Max Reitz
On 2018-05-15 14:59, Kevin Wolf wrote:
> Am 14.05.2018 um 22:53 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> This moves the top-level job completion and cancellation functions from
>>> BlockJob to Job.
>>>
>>> Signed-off-by: Kevin Wolf 
> 
>>> @@ -3362,7 +3362,7 @@ static void bdrv_close(BlockDriverState *bs)
>>>  
>>>  void bdrv_close_all(void)
>>>  {
>>> -block_job_cancel_sync_all();
>>> +job_cancel_sync_all();
>>
>> Do we really want to cancel jobs that might have nothing to do with the
>> block layer?
> 
> Yes, though I admit it's a bit ugly to do it here. Alternatively, I
> could assert here that no jobs are running and pull the
> job_cancel_sync_all() into the callers. For vl.c, this is trivial.
> qemu-nbd.c uses 'atexit(bdrv_close_all);', so I'd have to introduce a
> wrapper function that calls both job_cancel_sync_all() and
> bdrv_close_all().
> 
> Would you prefer that?

Yes, I'd like that.

Thanks!

Max





Re: [Qemu-block] [PATCH 39/42] job: Add lifecycle QMP commands

2018-05-16 Thread Max Reitz
On 2018-05-15 16:08, Kevin Wolf wrote:
> Am 15.05.2018 um 00:31 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> This adds QMP commands that control the transition between states of the
>>> job lifecycle.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  qapi/job.json | 112 ++
>>>  job-qmp.c | 140 
>>> ++
>>>  MAINTAINERS   |   1 +
>>>  Makefile.objs |   2 +-
>>>  trace-events  |   9 
>>>  5 files changed, 263 insertions(+), 1 deletion(-)
>>>  create mode 100644 job-qmp.c
>>>
>>> diff --git a/qapi/job.json b/qapi/job.json
>>> index bd88a358d0..7b84158292 100644
>>> --- a/qapi/job.json
>>> +++ b/qapi/job.json
>>> @@ -63,3 +63,115 @@
>>>  { 'event': 'JOB_STATUS_CHANGE',
>>>'data': { 'id': 'str',
>>>  'status': 'JobStatus' } }
>>> +
>>> +##
>>> +# @job-pause:
>>> +#
>>> +# Pause an active job.
>>> +#
>>> +# This command returns immediately after marking the active job for 
>>> pausing.
>>> +# Pausing an already paused job has no cumulative effect; a single 
>>> job-resume
>>> +# command will resume the job.
>>
>> Pausing an already paused job is, in fact, an error.
>>
>> (Which should be noted here instead of making it appear like it'd be
>> idempotent.)
> 
> Aye. Interestingly, I managed to fix it below in job-resume, but missed
> it here. I should also stick in a patch somewhere early in the series
> that fixes the documentation of block-job-pause/resume.
> 
>>> +##
>>> +# @job-cancel:
>>> +#
>>> +# Instruct an active background job to cancel at the next opportunity.
>>> +# This command returns immediately after marking the active job for
>>> +# cancellation.
>>> +#
>>> +# The job will cancel as soon as possible and then emit a JOB_STATUS_CHANGE
>>> +# event. Usually, the status will change to ABORTING, but it is possible 
>>> that
>>> +# a job successfully completes (e.g. because it was almost done and there 
>>> was
>>> +# no opportunity to cancel earlier than completing the job) and 
>>> transitions to
>>> +# PENDING instead.
>>> +#
>>> +# Note that if you issue 'job-cancel' after a mirror block job has 
>>> indicated
>>> +# (via the event BLOCK_JOB_READY, and by transitioning into the READY 
>>> state)
>>> +# that the source and destination are synchronized, then the job always
>>> +# completes successfully and transitions to PENDING as well as triggering 
>>> the
>>> +# event BLOCK_JOB_COMPLETED, to indicate that the mirroring has ended and 
>>> the
>>> +# destination now has a point-in-time copy tied to the time of the
>>> +# cancellation.
>>> +#
>>> +# @id: The job identifier.
>>> +#
>>> +# @force: If true, and the job is already in the READY state, abandon the 
>>> job
>>> +# immediately (even if it is paused) instead of waiting for the
>>> +# destination to complete its final synchronization
>>
>> The note on "final synchronization" is extremely mirror-specific.  I see
>> three choices here:
>>
>> (1) If mirror stays the only job with this special cancel semantics,
>> then we are free to note that this is a mirror-specific flag here.
>>
>> (2) Even if some other job might come along at some point where use of
>> @force may make sense, that doesn't stop us from now noting that only
>> mirror supports this, which helps readers understand what "destination"
>> and "final synchronization" mean.
>>
>> (Yes, so (1) and (2) are basically the same.)
>>
>> (3) We try to find some general description and drop the last part.
>> Like "If a job would normally decide to complete instead of actually
>> aborting, this flag can be used to convince it otherwise."  But that's
>> so handwavy, I'd rather just mark it as a special mirror flag for now.
> 
> Or how about this one?
> 
> (4) Mirror is really abusing cancel for a second completion mode and we
> don't want to have this kind of ugliness in job-cancel.

Yeah, well, right, that was implicit and I tried to be more diplomatic
by calling it "special semantics". :-)

> Remove
> @force from the schema and internally always use force=true. For
> now, block-job-cancel does the job (no pun intended) for mirror, and
> maybe we can later introduce a way to select completion mode with
> job-complete.

Sounds good to me.

> This would also get us rid of that whole long paragraph above that
> explains how mirror jobs have an exceptional behaviour.

Yes.

Max

>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 3df8d58e49..253e0356f3 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -66,7 +66,7 @@ chardev-obj-y = chardev/
>>>  # block-obj-y is code used by both qemu system emulation and qemu-img
>>>  
>>>  block-obj-y += nbd/
>>> -block-obj-y += block.o blockjob.o job.o
>>> +block-obj-y += block.o blockjob.o job.o job-qmp.o
>>
>> Shouldn't this be in common-obj-y like blockdev?
> 
> Seems to build with that change, so it 

Re: [Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread WangJie (Pluto)
Hi, Peter Xu:
If we call aio_epoll_disable() here, aio_epoll_disable() will return
before closing ctx->epollfd,
because ctx->epoll_enabled is false at that moment.
In the process of addIOThread, aio_context_setup created the epoll fd without
calling aio_epoll_try_enable,
so ctx->epoll_enabled never gets set to true.

On 2018/5/16 16:36, Jie Wang wrote:
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +if (ctx->epollfd >= 0) {
> +close(ctx->epollfd);
> +}
> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)




Re: [Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread WangJie (Pluto)
Hi, Peter Xu:
If we call aio_epoll_disable() in aio_context_destroy, aio_epoll_disable()
will return before close(ctx->epollfd),
because ctx->epoll_enabled is false at that moment.
In the process of addIOThread, aio_context_setup created the epoll fd without
calling aio_epoll_try_enable,
so ctx->epoll_enabled never gets set to true.


On 2018/5/16 16:36, Jie Wang wrote:
> When we call addIOThread, the epollfd is created in aio_context_setup(),
> but it is not closed in the process of delIOThread, so the epollfd will leak.
> 
> Signed-off-by: Jie Wang 
> ---
>  include/block/aio.h | 8 
>  util/aio-posix.c| 9 +
>  util/aio-win32.c| 4 
>  util/async.c| 1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e..ae6f354 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
> *ctx)
>  void aio_context_setup(AioContext *ctx);
>  
>  /**
> + * aio_context_destroy:
> + * @ctx: the aio context
> + *
> + * Destroy the aio context.
> + */
> +void aio_context_destroy(AioContext *ctx);
> +
> +/**
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
>   * @max_ns: how long to busy poll for, in nanoseconds
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4..0ade2c7 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -713,6 +713,15 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +if (ctx->epollfd >= 0) {
> +close(ctx->epollfd);
> +}
> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index a67b00c..e676a8d 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95..03f6278 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>  qemu_rec_mutex_destroy(&ctx->lock);
>  qemu_lockcnt_destroy(&ctx->list_lock);
>  timerlistgroup_deinit(&ctx->tlg);
> +aio_context_destroy(ctx);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> 




Re: [Qemu-block] [PATCH 40/42] job: Add query-jobs QMP command

2018-05-16 Thread Kevin Wolf
Am 15.05.2018 um 01:09 hat Max Reitz geschrieben:
> On 2018-05-09 18:26, Kevin Wolf wrote:
> > This adds a minimal query-jobs implementation that shouldn't pose many
> > design questions. It can later be extended to expose more information,
> > and especially job-specific information.
> > 
> > Signed-off-by: Kevin Wolf 

> > +##
> > +# @JobInfo:
> > +#
> > +# Information about a job.
> > +#
> > +# @id:  The job identifier
> > +#
> > +# @type:The job type
> 
> I'm not really happy with this description that effectively provides no
> information that one cannot already get from the field name, but I
> cannot come up with something better, so I'd rather stop typing now.
> 
> (OK, my issue here is that "job type" can be anything.  That could mean
> e.g. "Is this a block job?".   Maybe an explicit reference to JobType
> might help here, although that is already part of the documentation.  On
> that note, maybe a quote from the documentation might help make my point
> that this description is not very useful:
> 
>"type: JobType"
>The job type
> 
> Maybe "The kind of job that is being performed"?)

IMHO, that's just a more verbose way of saying nothing new. The
"problem" is that "type: JobType" is already descriptive enough, there
is no real need for providing any additional information.

Also, I'd like to mention that almost all of the documentation is taken
from BlockJobInfo, so if we decide to change something, we should
probably change it in both places.

> > +# @status:  Current job state/status
> 
> Why the "state/status"?  Forgive me my incompetence, but I don't see a
> real difference here.  But in any case, one ought to be enough, no?
> 
> (OK, full disclosure: My gut tells me "status" is what we now call the
> "progress", and this field should be called "state".  But it's called
> "status" now and it doesn't really make a difference, so it's fine to
> describe it as either.)

I'll defer to John who wrote this description originally. I think we're
just using status/state inconsistently for the same thing (JobStatus,
which represents the states of the job state machine).

> > +#
> > +# @current-progress:The current progress value
> > +#
> > +# @total-progress:  The maximum progress value
> 
> Hm, not really.  This makes it sound like at any point, this will be the
> maximum even in the future, but that's not true.
> 
> Maybe "estimated progress maximum"?  Or be even more verbose (no, that
> doesn't hurt): "This is an estimation of the value @current-progress
> needs to reach for the job to complete."
> 
> (Actually, I find it important to note that it is an estimation, maybe
> we even want to be really explicit about the fact that this value may
> change all the time, in any direction.)

I'll try to improve the documentation of these fields (both here and in
BlockJobInfo) for v2.

Kevin




Re: [Qemu-block] [PATCH 40/42] job: Add query-jobs QMP command

2018-05-16 Thread Max Reitz
On 2018-05-16 13:21, Kevin Wolf wrote:
> Am 15.05.2018 um 01:09 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> This adds a minimal query-jobs implementation that shouldn't pose many
>>> design questions. It can later be extended to expose more information,
>>> and especially job-specific information.
>>>
>>> Signed-off-by: Kevin Wolf 
> 
>>> +##
>>> +# @JobInfo:
>>> +#
>>> +# Information about a job.
>>> +#
>>> +# @id:  The job identifier
>>> +#
>>> +# @type:The job type
>>
>> I'm not really happy with this description that effectively provides no
>> information that one cannot already get from the field name, but I
>> cannot come up with something better, so I'd rather stop typing now.
>>
>> (OK, my issue here is that "job type" can be anything.  That could mean
>> e.g. "Is this a block job?".   Maybe an explicit reference to JobType
>> might help here, although that is already part of the documentation.  On
>> that note, maybe a quote from the documentation might help make my point
>> that this description is not very useful:
>>
>>"type: JobType"
>>The job type
>>
>> Maybe "The kind of job that is being performed"?)
> 
> IMHO, that's just a more verbose way of saying nothing new.

True, but “Job.type: JobType -- The job type” is exactly the kind of
documentation people like to make fun of.

> The
> "problem" is that "type: JobType" is already descriptive enough, there
> is no real need for providing any additional information.

Yes, but it still doesn’t read nicely.

One could give examples (“e.g. mirror or backup”), but then again, if we
were to remove any of those, we’d have to update the documentation here.
 Also, you can again argue that such a list is precisely under JobType,
which is true.

> Also, I'd like to mention that almost all of the documentation is taken
> from BlockJobInfo, so if we decide to change something, we should
> probably change it in both places.

Ha!  Good excuse, but you know, you touched this, now you own it. :-)

>>> +# @status:  Current job state/status
>>
>> Why the "state/status"?  Forgive me my incompetence, but I don't see a
>> real difference here.  But in any case, one ought to be enough, no?
>>
>> (OK, full disclosure: My gut tells me "status" is what we now call the
>> "progress", and this field should be called "state".  But it's called
>> "status" now and it doesn't really make a difference, so it's fine to
>> describe it as either.)
> 
> I'll defer to John who wrote this description originally. I think we're
> just using status/state inconsistently for the same thing (JobStatus,
> which represents the states of the job state machine).
> 
>>> +#
>>> +# @current-progress:The current progress value
>>> +#
>>> +# @total-progress:  The maximum progress value
>>
>> Hm, not really.  This makes it sound like at any point, this will be the
>> maximum even in the future, but that's not true.
>>
>> Maybe "estimated progress maximum"?  Or be even more verbose (no, that
>> doesn't hurt): "This is an estimation of the value @current-progress
>> needs to reach for the job to complete."
>>
>> (Actually, I find it important to note that it is an estimation, maybe
>> we even want to be really explicit about the fact that this value may
>> change all the time, in any direction.)
> 
> I'll try to improve the documentation of these fields (both here and in
> BlockJobInfo) for v2.

Thanks!

Max





Re: [Qemu-block] [RFC 00/10] [TESTING NEEDED] python: futurize --stage1 (Python 3 compatibility)

2018-05-16 Thread Max Reitz
On 2018-05-12 00:20, Eduardo Habkost wrote:
> TESTING NEEDED: Due to the amount of changes, I didn't test all
> scripts touched by this series.  If you are responsible for any
> of the touched files, I would appreciate help on testing the
> series.

All the iotests* still pass, so for my part:

Tested-by: Max Reitz 


*That is, the ones I run, which includes all Python tests but 149.





Re: [Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Peter Xu
On Wed, May 16, 2018 at 07:14:53PM +0800, WangJie (Pluto) wrote:
> Hi, Peter Xu:
>   If we call aio_epoll_disable() here, aio_epoll_disable() will return
> before closing ctx->epollfd,
> because ctx->epoll_enabled is false at that moment.
>   In the process of addIOThread, aio_context_setup created the epoll fd without
> calling aio_epoll_try_enable,
> so ctx->epoll_enabled never gets set to true.

I see that epoll_available will only be set if epollfd != -1, so it
seems to me to make more sense if we swap the two variables in
aio_epoll_disable(), from current version:

static void aio_epoll_disable(AioContext *ctx)
{
    ctx->epoll_available = false;
    if (!ctx->epoll_enabled) {
        return;
    }
    ctx->epoll_enabled = false;
    close(ctx->epollfd);
}

To:

static void aio_epoll_disable(AioContext *ctx)
{
    ctx->epoll_enabled = false;
    if (!ctx->epoll_available) {
        return;
    }
    ctx->epoll_available = false;
    close(ctx->epollfd);
}

What do you think?  And Fam?

> 
> On 2018/5/16 16:36, Jie Wang wrote:
> > +void aio_context_destroy(AioContext *ctx)
> > +{
> > +#ifdef CONFIG_EPOLL_CREATE1
> > +if (ctx->epollfd >= 0) {
> > +close(ctx->epollfd);
> > +}
> > +#endif
> > +}
> > +
> >  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
> >   int64_t grow, int64_t shrink, Error 
> > **errp)
> 

-- 
Peter Xu



Re: [Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-05-16 Thread Kevin Wolf
Am 14.05.2018 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 14.05.2018 09:41, Fam Zheng wrote:
> > On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:
> > > Is it possible, that target will change the disk, and then we return 
> > > control
> > > to the source? In this case bitmaps will be invalid. So, should not we 
> > > drop
> > > all the bitmaps on inactivate?
> > Yes, dropping all live bitmaps upon inactivate sounds reasonable. If the dst
> > fails to start, and we want to resume VM at src, we could (optionally?) 
> > reload
> > the persistent bitmaps, I guess.
> 
> Reload from where? We didn't store them.

Maybe this just means that it turns out that not storing them was a bad
idea?

What was the motivation for not storing the bitmap? The additional
downtime? Is it really that bad, though? Bitmaps should be fairly small
for the usual image sizes and writing them out should be quick.

Kevin



Re: [Qemu-block] [PATCH v3 1/8] xen_backend: add grant table helpers

2018-05-16 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 16 May 2018 14:50
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; qemu-block@nongnu.org; qemu-
> de...@nongnu.org; Stefano Stabellini 
> Subject: Re: [PATCH v3 1/8] xen_backend: add grant table helpers
> 
> On Fri, May 04, 2018 at 08:26:00PM +0100, Paul Durrant wrote:
> > This patch adds grant table helper functions to the xen_backend code to
> > localize error reporting and use of xen_domid.
> >
> > The patch also defers the call to xengnttab_open() until just before the
> > initialise method in XenDevOps is invoked. This method is responsible for
> > mapping the shared ring. No prior method requires access to the grant
> table.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> >
> > v2:
> >  - New in v2
> > ---
> >  hw/xen/xen_backend.c | 123
> ++-
> >  include/hw/xen/xen_backend.h |  33 
> >  2 files changed, 144 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> > index 7445b50..50412d6 100644
> > --- a/hw/xen/xen_backend.c
> > +++ b/hw/xen/xen_backend.c
> > @@ -106,6 +106,103 @@ int xen_be_set_state(struct XenDevice *xendev,
> enum xenbus_state state)
> >  return 0;
> >  }
> >
> > +void xen_be_set_max_grant_refs(struct XenDevice *xendev,
> > +   unsigned int nr_refs)
> 
> Is it fine to ignore an error from set_max_grants and continue? xen_disk.c
> seems to fail the initialisation if the set_max_grants call fails. On the
> other hand, xen-usb.c just keeps going.
> 

I guess the upshot will be that a subsequent grant map would fail, so I think
it should be sufficient to deal with the failure there. As you say, its use is
inconsistent, and just plain missing in some cases.

> > +{
> > +assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
> > +
> > +if (xengnttab_set_max_grants(xendev->gnttabdev, nr_refs)) {
> > +xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> > +  strerror(errno));
> > +}
> > +}
> > +
> 
> > +int xen_be_copy_grant_refs(struct XenDevice *xendev,
> > +   bool to_domain,
> > +   XenGrantCopySegment segs[],
> > +   unsigned int nr_segs)
> > +{
> > +xengnttab_grant_copy_segment_t *xengnttab_segs;
> > +unsigned int i;
> > +int rc;
> > +
> > +assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
> > +
> > +xengnttab_segs = g_new0(xengnttab_grant_copy_segment_t,
> nr_segs);
> > +
> > +for (i = 0; i < nr_segs; i++) {
> > +XenGrantCopySegment *seg = &segs[i];
> > +xengnttab_grant_copy_segment_t *xengnttab_seg =
> &xengnttab_segs[i];
> > +
> > +if (to_domain) {
> > +xengnttab_seg->flags = GNTCOPY_dest_gref;
> > +xengnttab_seg->dest.foreign.domid = xen_domid;
> > +xengnttab_seg->dest.foreign.ref = seg->dest.foreign.ref;
> > +xengnttab_seg->dest.foreign.offset = seg->dest.foreign.offset;
> > +xengnttab_seg->source.virt = seg->source.virt;
> > +} else {
> > +xengnttab_seg->flags = GNTCOPY_source_gref;
> > +xengnttab_seg->source.foreign.domid = xen_domid;
> > +xengnttab_seg->source.foreign.ref = seg->source.foreign.ref;
> > +xengnttab_seg->source.foreign.offset =
> > +seg->source.foreign.offset;
> > +xengnttab_seg->dest.virt = seg->dest.virt;
> > +}
> 
> That's not going to work because xengnttab_grant_copy_segment_t doesn't
> exist on Xen 4.7.

Ah, I'd missed the ifdef around that in xen_disk. I'll add it here.

Cheers,

  Paul

> 
> > +
> > +xengnttab_seg->len = seg->len;
> > +}
> > +
> > +rc = xengnttab_grant_copy(xendev->gnttabdev, nr_segs,
> xengnttab_segs);
> 
> Thanks,
> 
> --
> Anthony PERARD



Re: [Qemu-block] [PATCH v3 2/8] xen_disk: remove open-coded use of libxengnttab

2018-05-16 Thread Anthony PERARD
On Fri, May 04, 2018 at 08:26:01PM +0100, Paul Durrant wrote:
> Now that helpers are present in xen_backend, this patch removes open-coded
> calls to libxengnttab from the xen_disk code.
> 
> This patch also fixes one whitespace error in the assignment of the
> XenDevOps initialise method.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Anthony PERARD 

-- 
Anthony PERARD



Re: [Qemu-block] [PATCH v3 3/8] xen: remove other open-coded use of libxengnttab

2018-05-16 Thread Anthony PERARD
On Fri, May 04, 2018 at 08:26:02PM +0100, Paul Durrant wrote:
> Now that helpers are available in xen_backend, use them throughout all
> Xen PV backends.
> 
> Signed-off-by: Paul Durrant 
> ---
> diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
> index 20c43a6..73d6f1b 100644
> --- a/hw/net/xen_nic.c
> +++ b/hw/net/xen_nic.c
> @@ -160,9 +160,8 @@ static void net_tx_packets(struct XenNetDev *netdev)
>(txreq.flags & NETTXF_more_data)  ? " 
> more_data"  : "",
>(txreq.flags & NETTXF_extra_info) ? " 
> extra_info" : "");
>  
> -page = xengnttab_map_grant_ref(netdev->xendev.gnttabdev,
> -   netdev->xendev.dom,
> -   txreq.gref, PROT_READ);
> +page = xen_be_map_grant_refs(&netdev->xendev,
> + &txreq.gref, 1, PROT_READ);

xen_be_map_grant_ref instead?

>  if (page == NULL) {
>  xen_pv_printf(&netdev->xendev, 0,
>"error: tx gref dereference failed (%d)\n",
> @@ -183,7 +182,7 @@ static void net_tx_packets(struct XenNetDev *netdev)
>  qemu_send_packet(qemu_get_queue(netdev->nic),
>   page + txreq.offset, txreq.size);
>  }
> -xengnttab_unmap(netdev->xendev.gnttabdev, page, 1);
> +xen_be_unmap_grant_ref(&netdev->xendev, page);
>  net_tx_response(netdev, &txreq, NETIF_RSP_OKAY);
>  }
>  if (!netdev->tx_work) {
> @@ -254,9 +253,8 @@ static ssize_t net_rx_packet(NetClientState *nc, const 
> uint8_t *buf, size_t size
>  memcpy(&rxreq, RING_GET_REQUEST(&netdev->rx_ring, rc), sizeof(rxreq));
>  netdev->rx_ring.req_cons = ++rc;
>  
> -page = xengnttab_map_grant_ref(netdev->xendev.gnttabdev,
> -   netdev->xendev.dom,
> -   rxreq.gref, PROT_WRITE);
> +page = xen_be_map_grant_refs(&netdev->xendev, &rxreq.gref, 1,
> + PROT_WRITE);

xen_be_map_grant_ref instead?

With that fix:
Acked-by: Anthony PERARD 

-- 
Anthony PERARD



Re: [Qemu-block] [PATCH v3 1/8] xen_backend: add grant table helpers

2018-05-16 Thread Anthony PERARD
On Fri, May 04, 2018 at 08:26:00PM +0100, Paul Durrant wrote:
> This patch adds grant table helper functions to the xen_backend code to
> localize error reporting and use of xen_domid.
> 
> The patch also defers the call to xengnttab_open() until just before the
> initialise method in XenDevOps is invoked. This method is responsible for
> mapping the shared ring. No prior method requires access to the grant table.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> 
> v2:
>  - New in v2
> ---
>  hw/xen/xen_backend.c | 123 
> ++-
>  include/hw/xen/xen_backend.h |  33 
>  2 files changed, 144 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index 7445b50..50412d6 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -106,6 +106,103 @@ int xen_be_set_state(struct XenDevice *xendev, enum 
> xenbus_state state)
>  return 0;
>  }
>  
> +void xen_be_set_max_grant_refs(struct XenDevice *xendev,
> +   unsigned int nr_refs)

Is it fine to ignore an error from set_max_grants and continue? xen_disk.c
seems to fail the initialisation if the set_max_grants call fails. On the
other hand, xen-usb.c just keeps going.

> +{
> +assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
> +
> +if (xengnttab_set_max_grants(xendev->gnttabdev, nr_refs)) {
> +xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> +  strerror(errno));
> +}
> +}
> +

> +int xen_be_copy_grant_refs(struct XenDevice *xendev,
> +   bool to_domain,
> +   XenGrantCopySegment segs[],
> +   unsigned int nr_segs)
> +{
> +xengnttab_grant_copy_segment_t *xengnttab_segs;
> +unsigned int i;
> +int rc;
> +
> +assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
> +
> +xengnttab_segs = g_new0(xengnttab_grant_copy_segment_t, nr_segs);
> +
> +for (i = 0; i < nr_segs; i++) {
> +XenGrantCopySegment *seg = &segs[i];
> +xengnttab_grant_copy_segment_t *xengnttab_seg = &xengnttab_segs[i];
> +
> +if (to_domain) {
> +xengnttab_seg->flags = GNTCOPY_dest_gref;
> +xengnttab_seg->dest.foreign.domid = xen_domid;
> +xengnttab_seg->dest.foreign.ref = seg->dest.foreign.ref;
> +xengnttab_seg->dest.foreign.offset = seg->dest.foreign.offset;
> +xengnttab_seg->source.virt = seg->source.virt;
> +} else {
> +xengnttab_seg->flags = GNTCOPY_source_gref;
> +xengnttab_seg->source.foreign.domid = xen_domid;
> +xengnttab_seg->source.foreign.ref = seg->source.foreign.ref;
> +xengnttab_seg->source.foreign.offset =
> +seg->source.foreign.offset;
> +xengnttab_seg->dest.virt = seg->dest.virt;
> +}

That's not going to work because xengnttab_grant_copy_segment_t doesn't
exist on Xen 4.7.

> +
> +xengnttab_seg->len = seg->len;
> +}
> +
> +rc = xengnttab_grant_copy(xendev->gnttabdev, nr_segs, xengnttab_segs);

Thanks,

-- 
Anthony PERARD



Re: [Qemu-block] [PATCH v3 3/8] xen: remove other open-coded use of libxengnttab

2018-05-16 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 16 May 2018 15:14
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; qemu-block@nongnu.org; qemu-
> de...@nongnu.org; Stefano Stabellini ; Greg Kurz
> ; Paolo Bonzini ; Jason Wang
> ; Gerd Hoffmann 
> Subject: Re: [PATCH v3 3/8] xen: remove other open-coded use of
> libxengnttab
> 
> On Fri, May 04, 2018 at 08:26:02PM +0100, Paul Durrant wrote:
> > Now that helpers are available in xen_backend, use them throughout all
> > Xen PV backends.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
> > index 20c43a6..73d6f1b 100644
> > --- a/hw/net/xen_nic.c
> > +++ b/hw/net/xen_nic.c
> > @@ -160,9 +160,8 @@ static void net_tx_packets(struct XenNetDev
> *netdev)
> >(txreq.flags & NETTXF_more_data)  ? " 
> > more_data"  : "",
> >(txreq.flags & NETTXF_extra_info) ? " 
> > extra_info" : "");
> >
> > -page = xengnttab_map_grant_ref(netdev->xendev.gnttabdev,
> > -   netdev->xendev.dom,
> > -   txreq.gref, PROT_READ);
> > +page = xen_be_map_grant_refs(&netdev->xendev,
> > + &txreq.gref, 1, PROT_READ);
> 
> xen_be_map_grant_ref instead?
> 

Yep.

> >  if (page == NULL) {
> >  xen_pv_printf(&netdev->xendev, 0,
> >"error: tx gref dereference failed (%d)\n",
> > @@ -183,7 +182,7 @@ static void net_tx_packets(struct XenNetDev
> *netdev)
> >  qemu_send_packet(qemu_get_queue(netdev->nic),
> >   page + txreq.offset, txreq.size);
> >  }
> > -xengnttab_unmap(netdev->xendev.gnttabdev, page, 1);
> > +xen_be_unmap_grant_ref(&netdev->xendev, page);
> >  net_tx_response(netdev, &txreq, NETIF_RSP_OKAY);
> >  }
> >  if (!netdev->tx_work) {
> > @@ -254,9 +253,8 @@ static ssize_t net_rx_packet(NetClientState *nc,
> const uint8_t *buf, size_t size
> >  memcpy(&rxreq, RING_GET_REQUEST(&netdev->rx_ring, rc),
> sizeof(rxreq));
> >  netdev->rx_ring.req_cons = ++rc;
> >
> > -page = xengnttab_map_grant_ref(netdev->xendev.gnttabdev,
> > -   netdev->xendev.dom,
> > -   rxreq.gref, PROT_WRITE);
> > +page = xen_be_map_grant_refs(&netdev->xendev, &rxreq.gref, 1,
> > + PROT_WRITE);
> 
> xen_be_map_grant_ref instead?
> 

And yep again.

> With that fix:
> Acked-by: Anthony PERARD 
> 

Thanks :-)

  Paul

> --
> Anthony PERARD



Re: [Qemu-block] [PATCH v3 4/8] xen_backend: add an emulation of grant copy

2018-05-16 Thread Anthony PERARD
On Fri, May 04, 2018 at 08:26:03PM +0100, Paul Durrant wrote:
> Not all Xen environments support the xengnttab_grant_copy() operation.
> E.g. where the OS is FreeBSD or Xen is older than 4.8.0.
> 
> This patch introduces an emulation of that operation using
> xengnttab_map_domain_grant_refs() and memcpy() for those environments.
> 
> Signed-off-by: Paul Durrant 
> ---

The patch looks ok, but since patch 1 is going to be changed, this one is
going to need to be changed as well.

-- 
Anthony PERARD



Re: [Qemu-block] [PATCH v3 4/8] xen_backend: add an emulation of grant copy

2018-05-16 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD [mailto:anthony.per...@citrix.com]
> Sent: 16 May 2018 15:31
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; qemu-block@nongnu.org; qemu-
> de...@nongnu.org; Stefano Stabellini 
> Subject: Re: [PATCH v3 4/8] xen_backend: add an emulation of grant copy
> 
> On Fri, May 04, 2018 at 08:26:03PM +0100, Paul Durrant wrote:
> > Not all Xen environments support the xengnttab_grant_copy() operation.
> > E.g. where the OS is FreeBSD or Xen is older than 4.8.0.
> >
> > This patch introduces an emulation of that operation using
> > xengnttab_map_domain_grant_refs() and memcpy() for those
> environments.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> 
> The patch looks ok, but since patch 1 is going to be changed, this one is
> going to need to be changed as well.
> 

Sure. Thanks for the review.

  Paul

> --
> Anthony PERARD



Re: [Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-05-16 Thread Vladimir Sementsov-Ogievskiy

16.05.2018 15:47, Kevin Wolf wrote:

Am 14.05.2018 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:

14.05.2018 09:41, Fam Zheng wrote:

On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:

Is it possible, that target will change the disk, and then we return control
to the source? In this case bitmaps will be invalid. So, should not we drop
all the bitmaps on inactivate?

Yes, dropping all live bitmaps upon inactivate sounds reasonable. If the dst
fails to start, and we want to resume VM at src, we could (optionally?) reload
the persistent bitmaps, I guess.

Reload from where? We didn't store them.

Maybe this just means that it turns out that not storing them was a bad
idea?

What was the motivation for not storing the bitmap? The additional
downtime? Is it really that bad, though? Bitmaps should be fairly small
for the usual image sizes and writing them out should be quick.

Kevin


What are the usual ones? A bitmap with the standard granularity of 64k for a
16 TB disk is ~30 MB. If we have several such bitmaps it may be a significant
downtime.
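
For reference, a quick check of that figure (assuming the usual 1 bit per
64 KiB cluster and a 16 TiB disk; this arithmetic is an illustration added
for clarity, not taken from the thread):

    16 TiB / 64 KiB = 2^44 / 2^16 = 2^28 clusters
    2^28 bits / 8   = 2^25 bytes  = 32 MiB

so roughly 30 MB per bitmap is about right.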




--
Best regards,
Vladimir




Re: [Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-05-16 Thread Kevin Wolf
Am 16.05.2018 um 17:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.05.2018 15:47, Kevin Wolf wrote:
> > Am 14.05.2018 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 14.05.2018 09:41, Fam Zheng wrote:
> > > > On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:
> > > > > Is it possible, that target will change the disk, and then we return 
> > > > > control
> > > > > to the source? In this case bitmaps will be invalid. So, should not 
> > > > > we drop
> > > > > all the bitmaps on inactivate?
> > > > Yes, dropping all live bitmaps upon inactivate sounds reasonable. If 
> > > > the dst
> > > > fails to start, and we want to resume VM at src, we could (optionally?) 
> > > > reload
> > > > the persistent bitmaps, I guess.
> > > Reload from where? We didn't store them.
> > Maybe this just means that it turns out that not storing them was a bad
> > idea?
> > 
> > What was the motivation for not storing the bitmap? The additional
> > downtime? Is it really that bad, though? Bitmaps should be fairly small
> > for the usual image sizes and writing them out should be quick.
> 
> What are usual ones? A bitmap of standard granularity of 64k for 16Tb disk
> is ~30mb. If we have several such bitmaps it may be significant downtime.

We could have an in-memory bitmap that tracks which parts of the
persistent bitmap are dirty so that you don't have to write out the
whole 30 MB during the migration downtime, but can already flush most of
the persistent bitmap before the VM is stopped.
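
For illustration, a rough sketch of that idea (invented names, not QEMU's dirty
bitmap API): split the persistent bitmap into chunks, track dirtied chunks in
memory, flush early while the VM still runs, and leave only the recently
dirtied chunks for the downtime window.

/* Sketch only; chunk count and size are arbitrary. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHUNKS      512
#define CHUNK_SIZE  4096

typedef struct {
    uint8_t persistent[CHUNKS][CHUNK_SIZE];  /* chunked view of the on-disk bitmap */
    bool chunk_dirty[CHUNKS];                /* in-memory: chunk needs rewriting */
} MetaBitmap;

/* Call whenever a bit inside chunk 'c' of the persistent bitmap changes. */
static void mark_chunk_dirty(MetaBitmap *mb, unsigned c)
{
    mb->chunk_dirty[c] = true;
}

/* Stand-in for writing one chunk back into the image (pwrite() in a real
 * implementation). */
static void write_chunk_to_image(unsigned c, const uint8_t *data, size_t len)
{
    (void)data;
    printf("flushing chunk %u (%zu bytes)\n", c, len);
}

/* Flush pass: can run repeatedly while the VM is running; at downtime only
 * the chunks dirtied since the last pass still need to be written. */
static void flush_dirty_chunks(MetaBitmap *mb)
{
    for (unsigned c = 0; c < CHUNKS; c++) {
        if (mb->chunk_dirty[c]) {
            mb->chunk_dirty[c] = false;
            write_chunk_to_image(c, mb->persistent[c], CHUNK_SIZE);
        }
    }
}

int main(void)
{
    static MetaBitmap mb;        /* static: ~2 MB, keep it off the stack */
    memset(&mb, 0, sizeof(mb));
    mark_chunk_dirty(&mb, 3);
    mark_chunk_dirty(&mb, 200);
    flush_dirty_chunks(&mb);     /* early pass while the VM still runs */
    mark_chunk_dirty(&mb, 3);
    flush_dirty_chunks(&mb);     /* downtime pass: only chunk 3 remains */
    return 0;
}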

Kevin



Re: [Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-05-16 Thread Vladimir Sementsov-Ogievskiy

16.05.2018 18:32, Kevin Wolf wrote:

Am 16.05.2018 um 17:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

16.05.2018 15:47, Kevin Wolf wrote:

Am 14.05.2018 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:

14.05.2018 09:41, Fam Zheng wrote:

On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:

Is it possible, that target will change the disk, and then we return control
to the source? In this case bitmaps will be invalid. So, should not we drop
all the bitmaps on inactivate?

Yes, dropping all live bitmaps upon inactivate sounds reasonable. If the dst
fails to start, and we want to resume VM at src, we could (optionally?) reload
the persistent bitmaps, I guess.

Reload from where? We didn't store them.

Maybe this just means that it turns out that not storing them was a bad
idea?

What was the motivation for not storing the bitmap? The additional
downtime? Is it really that bad, though? Bitmaps should be fairly small
for the usual image sizes and writing them out should be quick.

What are usual ones? A bitmap of standard granularity of 64k for 16Tb disk
is ~30mb. If we have several such bitmaps it may be significant downtime.

We could have an in-memory bitmap that tracks which parts of the
persistent bitmap are dirty so that you don't have to write out the
whole 30 MB during the migration downtime, but can already flush most of
the persistent bitmap before the VM is stopped.

Kevin


Yes, it looks possible. But how do we control that downtime? Introduce a 
migration state with a specific _pending function? However, it may not be 
necessary.


Anyway, I think we don't need to store it.

If we decide to resume the source, the bitmap is already in memory, so 
why reload it? If someone has already killed the source (which was in 
paused mode), it is inconsistent anyway and the loss of a dirty bitmap 
is not the worst possible problem.


So, finally, it looks safe enough to just make the bitmaps on the source 
persistent again (or, better, introduce another way to skip storing them, 
maybe with an additional flag, so everybody will be happy, instead of 
dropping the persistent flag). And, after the source resumes, we have one 
of the following situations:


1. The disk was not changed during migration, so all is ok and we have the bitmaps.
2. The disk was changed. The bitmaps are inconsistent. But not only the 
bitmaps: the whole VM state is inconsistent with its disk. This case is a 
bug in the management layer and should never happen. Possibly we need 
some separate way to catch such cases.


--
Best regards,
Vladimir




[Qemu-block] [PATCH] nfs: Fix error path in nfs_options_qdict_to_qapi()

2018-05-16 Thread Kevin Wolf
Don't throw away local_err, but propagate it to errp.

Signed-off-by: Kevin Wolf 
---
 block/nfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nfs.c b/block/nfs.c
index d6364d28bb..3349b67a76 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -571,6 +571,7 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 qobject_unref(crumpled);
 
 if (local_err) {
+error_propagate(errp, local_err);
 return NULL;
 }
 
-- 
2.13.6




[Qemu-block] [PATCH] nfs: Remove processed options from QDict

2018-05-16 Thread Kevin Wolf
Commit c22a03454 QAPIfied option parsing in the NFS block driver, but
forgot to remove all the options we processed. Therefore, we get an
error in bdrv_open_inherit(), which thinks the remaining options are
invalid. Trying to open an NFS image will result in an error like this:

Block protocol 'nfs' doesn't support the option 'server.host'

Remove all options from the QDict to make the NFS driver work again.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
---
 block/nfs.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 66fddf12d4..d6364d28bb 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -557,6 +557,7 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 BlockdevOptionsNfs *opts = NULL;
 QObject *crumpled = NULL;
 Visitor *v;
+const QDictEntry *e;
 Error *local_err = NULL;
 
 crumpled = qdict_crumple(options, errp);
@@ -573,6 +574,12 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 return NULL;
 }
 
+/* Remove the processed options from the QDict (the visitor processes
+ * _all_ options in the QDict) */
+while ((e = qdict_first(options))) {
+qdict_del(options, e->key);
+}
+
 return opts;
 }
 
-- 
2.13.6




Re: [Qemu-block] [PATCH 38/42] job: Add JOB_STATUS_CHANGE QMP event

2018-05-16 Thread Kevin Wolf
Am 15.05.2018 um 00:11 hat Max Reitz geschrieben:
> You forgot to adjust 094, and although I cannot prove it (I don't have
> an nfs setup handy right now), I have a hunch that this patch breaks 173
> as well.

NFS qemu-iotests look completely broken. And the block driver, too. I've
sent a fix for the block driver at least...

Not sure how the qemu-iotests are supposed to work, but I get messages
like these:

+mkdir: cannot create directory 
'nfs://127.0.0.1//home/kwolf/source/qemu/tests/qemu-iotests/scratch': No such 
file or directory
+common.config: Error: $TEST_DIR 
(nfs://127.0.0.1//home/kwolf/source/qemu/tests/qemu-iotests/scratch) is not a 
directory

I guess I'm okay with breaking 173 in this series when the NFS tests are
in this state...

Kevin




Re: [Qemu-block] [Qemu-devel] [PATCH] nfs: Remove processed options from QDict

2018-05-16 Thread Eric Blake

On 05/16/2018 11:08 AM, Kevin Wolf wrote:

Commit c22a03454 QAPIfied option parsing in the NFS block driver, but
forgot to remove all the options we processed. Therefore, we get an
error in bdrv_open_inherit(), which thinks the remaining options are
invalid. Trying to open an NFS image will result in an error like this:

 Block protocol 'nfs' doesn't support the option 'server.host'

Remove all options from the QDict to make the NFS driver work again.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
---
  block/nfs.c | 7 +++
  1 file changed, 7 insertions(+)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH] nfs: Fix error path in nfs_options_qdict_to_qapi()

2018-05-16 Thread Eric Blake

On 05/16/2018 11:10 AM, Kevin Wolf wrote:

Don't throw away local_err, but propagate it to errp.

Signed-off-by: Kevin Wolf 
---
  block/nfs.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Eric Blake 



diff --git a/block/nfs.c b/block/nfs.c
index d6364d28bb..3349b67a76 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -571,6 +571,7 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
  qobject_unref(crumpled);
  
  if (local_err) {

+error_propagate(errp, local_err);
  return NULL;
  }
  



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 18/42] job: Move coroutine and related code to Job

2018-05-16 Thread Kevin Wolf
Am 15.05.2018 um 01:02 hat John Snow geschrieben:
> 
> 
> On 05/09/2018 12:26 PM, Kevin Wolf wrote:
> > This commit moves some core functions for dealing with the job coroutine
> > from BlockJob to Job. This includes primarily entering the coroutine
> > (both for the first and reentering) and yielding explicitly and at pause
> > points.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  include/block/blockjob.h |  40 -
> >  include/block/blockjob_int.h |  26 --
> >  include/qemu/job.h   |  76 
> >  block/backup.c   |   2 +-
> >  block/commit.c   |   4 +-
> >  block/mirror.c   |  22 ++---
> >  block/replication.c  |   2 +-
> >  block/stream.c   |   4 +-
> >  blockdev.c   |   8 +-
> >  blockjob.c   | 201 
> > +++
> >  job.c| 137 +
> >  tests/test-bdrv-drain.c  |  38 
> >  tests/test-blockjob-txn.c|  12 +--
> >  tests/test-blockjob.c|  14 +--
> >  14 files changed, 296 insertions(+), 290 deletions(-)
> > 
> 
> [...]
> 
> >  
> >  /* Assumes the block_job_mutex is held */
> > -static bool block_job_timer_pending(BlockJob *job)
> > +static bool job_timer_pending(Job *job)
> 
> Is this intentionally left behind in blockjob.c, even once you've
> changed the name (and moved the state to job.c?)

Yes, it's just a small callback that is structurally part of
block_job_set_speed(). Maybe I shouldn't rename it, but it does get a
Job rather than a BlockJob now, so I'm not sure.

Kevin



Re: [Qemu-block] [PATCH] nfs: Remove processed options from QDict

2018-05-16 Thread Jeff Cody
On Wed, May 16, 2018 at 06:08:16PM +0200, Kevin Wolf wrote:
> Commit c22a03454 QAPIfied option parsing in the NFS block driver, but
> forgot to remove all the options we processed. Therefore, we get an
> error in bdrv_open_inherit(), which thinks the remaining options are
> invalid. Trying to open an NFS image will result in an error like this:
> 
> Block protocol 'nfs' doesn't support the option 'server.host'
> 
> Remove all options from the QDict to make the NFS driver work again.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Kevin Wolf 
> ---
>  block/nfs.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index 66fddf12d4..d6364d28bb 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -557,6 +557,7 @@ static BlockdevOptionsNfs 
> *nfs_options_qdict_to_qapi(QDict *options,
>  BlockdevOptionsNfs *opts = NULL;
>  QObject *crumpled = NULL;
>  Visitor *v;
> +const QDictEntry *e;
>  Error *local_err = NULL;
>  
>  crumpled = qdict_crumple(options, errp);
> @@ -573,6 +574,12 @@ static BlockdevOptionsNfs 
> *nfs_options_qdict_to_qapi(QDict *options,
>  return NULL;
>  }
>  
> +/* Remove the processed options from the QDict (the visitor processes
> + * _all_ options in the QDict) */
> +while ((e = qdict_first(options))) {
> +qdict_del(options, e->key);
> +}
> +
>  return opts;
>  }
>  
> -- 
> 2.13.6
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc block

-Jeff



Re: [Qemu-block] [PATCH] nfs: Fix error path in nfs_options_qdict_to_qapi()

2018-05-16 Thread Jeff Cody
On Wed, May 16, 2018 at 06:10:34PM +0200, Kevin Wolf wrote:
> Don't throw away local_err, but propagate it to errp.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/nfs.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index d6364d28bb..3349b67a76 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -571,6 +571,7 @@ static BlockdevOptionsNfs 
> *nfs_options_qdict_to_qapi(QDict *options,
>  qobject_unref(crumpled);
>  
>  if (local_err) {
> +error_propagate(errp, local_err);
>  return NULL;
>  }
>  
> -- 
> 2.13.6
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc block

-Jeff



Re: [Qemu-block] [PATCH 11/42] job: Add job_delete()

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This moves freeing the Job object and its fields from block_job_unref()
to job_delete().

Signed-off-by: Kevin Wolf 
---
  include/qemu/job.h | 3 +++
  blockjob.c | 3 +--
  job.c  | 6 ++
  3 files changed, 10 insertions(+), 2 deletions(-)




+
+void job_delete(Job *job)
+{
+g_free(job->id);
+g_free(job);
+}


Should this be free-like, acting as a no-op when job == NULL on input?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-block] [PULL 4/4] nfs: Remove processed options from QDict

2018-05-16 Thread Jeff Cody
From: Kevin Wolf 

Commit c22a03454 QAPIfied option parsing in the NFS block driver, but
forgot to remove all the options we processed. Therefore, we get an
error in bdrv_open_inherit(), which thinks the remaining options are
invalid. Trying to open an NFS image will result in an error like this:

Block protocol 'nfs' doesn't support the option 'server.host'

Remove all options from the QDict to make the NFS driver work again.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Message-id: 20180516160816.26259-1-kw...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 block/nfs.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 4ee2ad59d9..3349b67a76 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -557,6 +557,7 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 BlockdevOptionsNfs *opts = NULL;
 QObject *crumpled = NULL;
 Visitor *v;
+const QDictEntry *e;
 Error *local_err = NULL;
 
 crumpled = qdict_crumple(options, errp);
@@ -574,6 +575,12 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 return NULL;
 }
 
+/* Remove the processed options from the QDict (the visitor processes
+ * _all_ options in the QDict) */
+while ((e = qdict_first(options))) {
+qdict_del(options, e->key);
+}
+
 return opts;
 }
 
-- 
2.13.6




[Qemu-block] [PULL 0/4] Block patches

2018-05-16 Thread Jeff Cody
The following changes since commit c416eecea5f3aea863ab8fda5a36a24157b8f704:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
(2018-05-15 17:02:00 +0100)

are available in the git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to c82be42cc803b36fd7aed5dceec68312c7056fd5:

  nfs: Remove processed options from QDict (2018-05-16 13:37:47 -0400)


Block patches


Kevin Wolf (2):
  nfs: Fix error path in nfs_options_qdict_to_qapi()
  nfs: Remove processed options from QDict

Stefan Hajnoczi (2):
  qemu-iotests: reduce chance of races in 185
  blockjob: do not cancel timer in resume

 block/nfs.c|  8 
 blockjob.c | 22 +++---
 tests/qemu-iotests/185 | 17 +
 tests/qemu-iotests/185.out | 12 +---
 4 files changed, 41 insertions(+), 18 deletions(-)

-- 
2.13.6




[Qemu-block] [PULL 2/4] blockjob: do not cancel timer in resume

2018-05-16 Thread Jeff Cody
From: Stefan Hajnoczi 

Currently the timer is cancelled and the block job is entered by
block_job_resume().  This behavior causes drain to run extra blockjob
iterations when the job was sleeping due to the ratelimit.

This patch leaves the job asleep when block_job_resume() is called.
Jobs can still be forcibly woken up using block_job_enter(), which is
used to cancel jobs.

After this patch drain no longer runs extra blockjob iterations.  This
is the expected behavior that qemu-iotests 185 used to rely on.  We
temporarily changed the 185 test output to make it pass for the QEMU
2.12 release but now it's time to address this issue.

Cc: QingFeng Hao 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: QingFeng Hao 
Message-id: 20180508135436.30140-3-stefa...@redhat.com
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 blockjob.c | 22 +++---
 tests/qemu-iotests/185 |  5 +
 tests/qemu-iotests/185.out | 12 +---
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 36c5fdeb2f..112672a68b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -209,6 +209,18 @@ static void block_job_txn_del_job(BlockJob *job)
 }
 }
 
+/* Assumes the block_job_mutex is held */
+static bool block_job_timer_pending(BlockJob *job)
+{
+return timer_pending(&job->sleep_timer);
+}
+
+/* Assumes the block_job_mutex is held */
+static bool block_job_timer_not_pending(BlockJob *job)
+{
+return !block_job_timer_pending(job);
+}
+
 static void block_job_pause(BlockJob *job)
 {
 job->pause_count++;
@@ -221,7 +233,9 @@ static void block_job_resume(BlockJob *job)
 if (job->pause_count) {
 return;
 }
-block_job_enter(job);
+
+/* kick only if no timer is pending */
+block_job_enter_cond(job, block_job_timer_not_pending);
 }
 
 void block_job_ref(BlockJob *job)
@@ -656,12 +670,6 @@ static void block_job_completed_txn_success(BlockJob *job)
 }
 }
 
-/* Assumes the block_job_mutex is held */
-static bool block_job_timer_pending(BlockJob *job)
-{
-return timer_pending(&job->sleep_timer);
-}
-
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
 int64_t old_speed = job->speed;
diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 975404c99d..9a2d317414 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -101,14 +101,11 @@ echo
 # command to be received (after receiving the command, the rest runs
 # synchronously, so jobs can arbitrarily continue or complete).
 #
-# Jobs present while QEMU is terminating iterate once more due to
-# bdrv_drain_all().
-#
 # The buffer size for commit and streaming is 512k (waiting for 8 seconds after
 # the first request), for active commit and mirror it's large enough to cover
 # the full 4M, and for backup it's the qcow2 cluster size, which we know is
 # 64k. As all of these are at least as large as the speed, we are sure that the
-# offset advances exactly twice before qemu exits.
+# offset advances exactly once before qemu exits.
 
 _send_qemu_cmd $h \
 "{ 'execute': 'block-commit',
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 992162f418..57eaf8d699 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -20,7 +20,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 
backing_file=TEST_DIR/t.q
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 
1048576, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 
524288, "speed": 65536, "type": "commit"}}
 
 === Start active commit job and exit qemu ===
 
@@ -28,8 +28,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 
backing_file=TEST_DIR/t.q
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 
4194304, "speed": 65536, "type": "commit"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 
4194304, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 
4194304, "speed": 65536, "type": "commit"}}
 
 === Start mirror job and exit qemu ===
 
@@ -38,8 +37,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 
cluster_size=65536 l
 {"ret

Re: [Qemu-block] [Qemu-devel] [PATCH 18/42] job: Move coroutine and related code to Job

2018-05-16 Thread John Snow


On 05/16/2018 12:50 PM, Kevin Wolf wrote:
> Am 15.05.2018 um 01:02 hat John Snow geschrieben:
>>
>>
>> On 05/09/2018 12:26 PM, Kevin Wolf wrote:
>>> This commit moves some core functions for dealing with the job coroutine
>>> from BlockJob to Job. This includes primarily entering the coroutine
>>> (both for the first and reentering) and yielding explicitly and at pause
>>> points.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  include/block/blockjob.h |  40 -
>>>  include/block/blockjob_int.h |  26 --
>>>  include/qemu/job.h   |  76 
>>>  block/backup.c   |   2 +-
>>>  block/commit.c   |   4 +-
>>>  block/mirror.c   |  22 ++---
>>>  block/replication.c  |   2 +-
>>>  block/stream.c   |   4 +-
>>>  blockdev.c   |   8 +-
>>>  blockjob.c   | 201 
>>> +++
>>>  job.c| 137 +
>>>  tests/test-bdrv-drain.c  |  38 
>>>  tests/test-blockjob-txn.c|  12 +--
>>>  tests/test-blockjob.c|  14 +--
>>>  14 files changed, 296 insertions(+), 290 deletions(-)
>>>
>>
>> [...]
>>
>>>  
>>>  /* Assumes the block_job_mutex is held */
>>> -static bool block_job_timer_pending(BlockJob *job)
>>> +static bool job_timer_pending(Job *job)
>>
>> Is this intentionally left behind in blockjob.c, even once you've
>> changed the name (and moved the state to job.c?)
> 
> Yes, it's just a small callback that is structurally part of
> block_job_set_speed(). Maybe I shouldn't rename it, but it does get a
> Job rather than a BlockJob now, so I'm not sure.
> 
> Kevin
> 

Your choice; it's not clear to me yet which things truly belong in which
file because I haven't had a chance to look at the finished result. I'm
sure if it looks like it needs to move later that someone will
eventually move it.

--js



[Qemu-block] [PULL 1/4] qemu-iotests: reduce chance of races in 185

2018-05-16 Thread Jeff Cody
From: Stefan Hajnoczi 

Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
185") identified a race condition in a sub-test.

Similar issues also affect the other sub-tests.  If disk I/O completes
quickly, it races with the QMP 'quit' command.  This causes spurious
test failures because QMP events are emitted in an unpredictable order.

This test relies on QEMU internals and there is no QMP API for getting
deterministic behavior needed to make this test 100% reliable.  At the
same time, the test is useful and it would be a shame to remove it.

Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
appears to reduce spurious failures in practice.

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Message-id: 20180508135436.30140-2-stefa...@redhat.com
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/185 | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 298d88d04e..975404c99d 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -118,6 +118,9 @@ _send_qemu_cmd $h \
   'speed': 65536 } }" \
 "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -137,6 +140,9 @@ _send_qemu_cmd $h \
   'speed': 65536 } }" \
 "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -183,6 +189,9 @@ _send_qemu_cmd $h \
   'speed': 65536 } }" \
 "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -201,6 +210,9 @@ _send_qemu_cmd $h \
   'speed': 65536 } }" \
 "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
-- 
2.13.6




[Qemu-block] [PULL 3/4] nfs: Fix error path in nfs_options_qdict_to_qapi()

2018-05-16 Thread Jeff Cody
From: Kevin Wolf 

Don't throw away local_err, but propagate it to errp.

Signed-off-by: Kevin Wolf 
Message-id: 20180516161034.27440-1-kw...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 block/nfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nfs.c b/block/nfs.c
index 66fddf12d4..4ee2ad59d9 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -570,6 +570,7 @@ static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict 
*options,
 qobject_unref(crumpled);
 
 if (local_err) {
+error_propagate(errp, local_err);
 return NULL;
 }
 
-- 
2.13.6




Re: [Qemu-block] [PATCH 12/42] job: Maintain a list of all jobs

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This moves the job list from BlockJob to Job. Now we can check for
duplicate IDs in job_create().

Signed-off-by: Kevin Wolf 
---
  include/block/blockjob.h |  3 ---
  include/qemu/job.h   | 19 +++
  blockjob.c   | 47 +--
  job.c| 31 +++
  4 files changed, 75 insertions(+), 25 deletions(-)



@@ -71,4 +75,19 @@ JobType job_type(Job *job);
  /** Returns the enum string for the JobType of a given Job. */
  const char *job_type_str(Job *job);
  
+/**

+ * Get the next element from the list of block jobs after @job, or the
+ * first one if @job is %NULL.
+ *
+ * Returns the requested job, or %NULL if there are no more jobs left.
+ */
+Job *job_next(Job *job);


The new code is supposed to allow you to prime the search by passing in 
NULL.  This looks similar to the semantics originally held by 
block_job_next(), where




-BlockJob *block_job_next(BlockJob *job)
+static bool is_block_job(Job *job)
  {
-if (!job) {
-return QLIST_FIRST(&block_jobs);
-}


the old code did so by special-casing an incoming NULL.


-return QLIST_NEXT(job, job_list);
+return job_type(job) == JOB_TYPE_BACKUP ||
+   job_type(job) == JOB_TYPE_COMMIT ||
+   job_type(job) == JOB_TYPE_MIRROR ||
+   job_type(job) == JOB_TYPE_STREAM;
+}
+
+BlockJob *block_job_next(BlockJob *bjob)
+{
+Job *job = &bjob->job;


But the new code blindly dereferences what may be a NULL bjob.  Taking 
the address of a member of the NULL pointer may happen to work if that 
member is at offset 0, but is also likely to trigger compiler or 
sanitizer warnings; and does not work if the member is not at offset 0.


I think all you have to do is:

Job *job = bjob ? &bjob->job : NULL;


+
+do {
+job = job_next(job);
+} while (job && !is_block_job(job));
+
+
+return job ? container_of(job, BlockJob, job) : NULL;


Why two blank lines?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 13/42] job: Move state transitions to Job

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This moves BlockJob.status and the closely related functions
(block_)job_state_transition() and (block_)job_apply_verb to Job. The
two QAPI enums are renamed to JobStatus and JobVerb.

Signed-off-by: Kevin Wolf 
---
  qapi/block-core.json |  14 +++
  include/block/blockjob.h |   3 --
  include/qemu/job.h   |  13 ++
  blockjob.c   | 102 +++
  job.c|  56 ++
  tests/test-blockjob.c|  39 +-
  block/trace-events   |   2 -
  trace-events |   4 ++
  8 files changed, 122 insertions(+), 111 deletions(-)




@@ -90,4 +93,14 @@ Job *job_next(Job *job);
   */
  Job *job_get(const char *id);
  
+/**

+ * Check whether the verb @bv can be applied to @job in its current state.
+ * Returns 0 if the verb can be applied; otherwise errp is set and -EPERM
+ * returned.
+ */
+int job_apply_verb(Job *job, JobVerb bv, Error **errp);


Why 'JobVerb bv' rather than 'jv' or 'verb'?


@@ -968,7 +913,6 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
  job->refcnt= 1;
  job->auto_finalize = !(flags & BLOCK_JOB_MANUAL_FINALIZE);
  job->auto_dismiss  = !(flags & BLOCK_JOB_MANUAL_DISMISS);
-block_job_state_transition(job, BLOCK_JOB_STATUS_CREATED);


No replacement to the counterpart job_state_transition() here because 
that should be handled in the parent class now, right? [1]


...


+
+int job_apply_verb(Job *job, JobVerb bv, Error **errp)


Again, the name bv made sense for BlockJobVerb, but less so now.


@@ -81,6 +135,8 @@ void *job_create(const char *job_id, const JobDriver 
*driver, Error **errp)
  job->driver= driver;
  job->id= g_strdup(job_id);
  
+job_state_transition(job, JOB_STATUS_CREATED);

+


[1] Yes, just took me a while to get to it.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 14/42] job: Add reference counting

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This moves reference counting from BlockJob to Job.

In order to keep calling the BlockJob cleanup code when the job is
deleted via job_unref(), introduce a new JobDriver.free callback. Every
block job must use block_job_free() for this callback, this is asserted
in block_job_create().

Signed-off-by: Kevin Wolf 
---



+++ b/job.c



+
+void job_unref(Job *job)
+{
+if (--job->refcnt == 0) {


Should this be free()-like and allow an incoming job == NULL as a no-op?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 17/42] job: Move defer_to_main_loop to Job

2018-05-16 Thread Eric Blake

On 05/16/2018 05:51 AM, Max Reitz wrote:


-static void commit_complete(BlockJob *job, void *opaque)
+static void commit_complete(Job *job, void *opaque)
  {
-CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);


Now this is just abuse.

...but it's not the first time someone packs two container_of() into
one, it appears.  So, whatever, I guess.


I don't think it's abuse. Why wouldn't I directly cast to the type that
I really want instead of casting to each of the uninteresting parent
classes, too?


Because the final parameter is called "member" and not "path". :-)


container_of() is using offsetof(); and in C99 7.17, the parameter is 
named "member-designator" which can indeed jump through multiple layers 
(any valid address constant, as defined in 6.6P9).  I don't see this as 
abuse of the interface.
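
A tiny self-contained illustration of that point (generic C, not QEMU
code): the nested member-designator form and the chained two-step form
recover the same pointer.

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

typedef struct Job { int status; } Job;
typedef struct BlockJob { int speed; Job job; } BlockJob;
typedef struct CommitBlockJob { int base; BlockJob common; } CommitBlockJob;

int main(void)
{
    CommitBlockJob cbj = { .base = 1 };
    Job *job = &cbj.common.job;

    /* One call with a nested member-designator... */
    CommitBlockJob *a = container_of(job, CommitBlockJob, common.job);
    /* ...is equivalent to two single-level container_of() steps. */
    CommitBlockJob *b = container_of(container_of(job, BlockJob, job),
                                     CommitBlockJob, common);

    printf("%d\n", a == b && a == &cbj);   /* prints 1 */
    return 0;
}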




The best explanation I can come up with is that the original code
acquired the AioContext both of the block device at the time of the BH
(because that needs to be done), and at the time of
block_job_defer_to_main_loop() -- because the latter is probably the
context the block_job_defer_to_main_loop() call came from, so it should
be (b)locked.

But if that's the case, then the same should be done here.  The context
of the job may change between scheduling the BH and the BH being
executed, so we might lock a different context here than the one
job_defer_to_main_loop() ran in (i.e., job->aio_context at the time of
job_defer_to_main_loop() running).  And maybe we should lock that old
context, too -- just like block_job_defer_to_main_loop_bh() did.


Why should we lock the old context? We don't access anything protected
by it. Even the data->job->bs access has gone away because we now have
job->aio_context.


Because the old code did so and I don't know why. O:-)

Your explanation does make sense to me, though, so:


Then it's best to include that explanation in the commit message itself, 
to save the future reader the hassle of digging up this thread.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 17/42] job: Move defer_to_main_loop to Job

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 
---


Rather sparse on the commit message, given the other comments in this 
thread.



+++ b/include/qemu/job.h
@@ -58,6 +58,9 @@ typedef struct Job {
   */
  bool cancelled;
  
+/** Set to true when the job has deferred work to the main loop. */

+bool deferred_to_main_loop;
+
  /** Element of the list of jobs */
  QLIST_ENTRY(Job) job_list;
  } Job;
@@ -131,6 +134,23 @@ Job *job_get(const char *id);
   */
  int job_apply_verb(Job *job, JobVerb bv, Error **errp);
  
+typedef void JobDeferToMainLoopFn(Job *job, void *opaque);

+
+/**
+ * @job: The job
+ * @fn: The function to run in the main loop
+ * @opaque: The opaque value that is passed to @fn
+ *
+ * This function must be called by the main job coroutine just before it
+ * returns.  @fn is executed in the main loop with the job AioContext acquired.
+ *
+ * Block jobs must call bdrv_unref(), bdrv_close(), and anything that uses
+ * bdrv_drain_all() in the main loop.


Do we still want this block-job-specific comment in the main job.h header?


+ *
+ * The @job AioContext is held while @fn executes.
+ */
+void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque);
+



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 23/42] blockjob: Split block_job_event_pending()

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

block_job_event_pending() doesn't only send a QMP event, but it also
transitions to the PENDING state. Split the function so that we get one
part only sending the event (like other block_job_event_* functions) and
another part than does the state transition.


s/than/that/



Signed-off-by: Kevin Wolf 
---
  blockjob.c | 27 ++-
  1 file changed, 18 insertions(+), 9 deletions(-)



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 22/42] job: Move BlockJobCreateFlags to Job

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL
checks to job_create() and the auto_{finalize,dismiss} fields from
BlockJob to Job.


Do we still want to allow auto-finalize on all jobs, or should we keep 
it just for legacy support on block jobs?  Even more so for auto-dismiss 
(if you use the legacy interface, that's what you expect to happen; but 
other than for legacy block jobs, any sane management app is going to 
request auto-dismiss be false, so why expose it to generic jobs?)


Of course, it may still be easiest to plumb up auto- actions in the 
internal code (where it has to work to keep legacy block jobs from 
breaking, but no new callers have to enable it), so my argument may not 
apply until you actually expose a QMP interface for generic jobs.



+++ b/block/mirror.c
@@ -1282,7 +1282,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
  }
  is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
-mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
+mirror_start_job(job_id, bs, JOB_DEFAULT, target, replaces,
   speed, granularity, buf_size, backing_mode,
   on_source_error, on_target_error, unmap, NULL, NULL,
   &mirror_job_driver, is_none_mode, base, false,


Just an observation that this is a lot of parameters; would using boxed 
QAPI types make these calls any more obvious?  But that's a separate 
cleanup.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 38/42] job: Add JOB_STATUS_CHANGE QMP event

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

This adds a QMP event that is emitted whenever a job transitions from
one status to another. For the event, a new qapi/job.json schema file is
created which will contain all job-related definitions that aren't tied
to the block layer.

Signed-off-by: Kevin Wolf 
---



+++ b/qapi/job.json



+##
+# @JOB_STATUS_CHANGE:
+#
+# Emitted when a job transitions to a different status.
+#
+# @id: The job identifier
+# @status: The new job status
+#
+# Since: 2.13
+##
+{ 'event': 'JOB_STATUS_CHANGE',
+  'data': { 'id': 'str',
+'status': 'JobStatus' } }


Is it worth also trying to list the old state that the transition came 
from? But that's new compared to what block jobs are currently doing, so 
if we can't come up with a strong reason to add that, I'm okay.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 38/42] job: Add JOB_STATUS_CHANGE QMP event

2018-05-16 Thread Kevin Wolf
Am 16.05.2018 um 21:26 hat Eric Blake geschrieben:
> On 05/09/2018 11:26 AM, Kevin Wolf wrote:
> > This adds a QMP event that is emitted whenever a job transitions from
> > one status to another. For the event, a new qapi/job.json schema file is
> > created which will contain all job-related definitions that aren't tied
> > to the block layer.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> 
> > +++ b/qapi/job.json
> 
> > +##
> > +# @JOB_STATUS_CHANGE:
> > +#
> > +# Emitted when a job transitions to a different status.
> > +#
> > +# @id: The job identifier
> > +# @status: The new job status
> > +#
> > +# Since: 2.13
> > +##
> > +{ 'event': 'JOB_STATUS_CHANGE',
> > +  'data': { 'id': 'str',
> > +'status': 'JobStatus' } }
> 
> Is it worth also trying to list the old state that the transition came from?
> But that's new compared to what block jobs are currently doing, so if we
> can't come up with a strong reason to add that, I'm okay.

If a management tool needs this information (I don't see why), it just
needs to keep track of the last state it had seen, no?

Kevin



Re: [Qemu-block] [PATCH 38/42] job: Add JOB_STATUS_CHANGE QMP event

2018-05-16 Thread Eric Blake

On 05/16/2018 03:46 PM, Kevin Wolf wrote:


+{ 'event': 'JOB_STATUS_CHANGE',
+  'data': { 'id': 'str',
+'status': 'JobStatus' } }


Is it worth also trying to list the old state that the transition came from?
But that's new compared to what block jobs are currently doing, so if we
can't come up with a strong reason to add that, I'm okay.


If a management tool needs this information (I don't see why), it just
needs to keep track of the last state it had seen, no?


And if you miss an event, the previous state is not preserved for any 
later query-job command.  Exposing something in a (possibly-missed) 
event that is not available elsewhere doesn't buy us much.  So I agree - 
there is no strong reason to advertise the previous state.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 22/42] job: Move BlockJobCreateFlags to Job

2018-05-16 Thread Kevin Wolf
Am 16.05.2018 um 21:13 hat Eric Blake geschrieben:
> On 05/09/2018 11:26 AM, Kevin Wolf wrote:
> > This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL
> > checks to job_create() and the auto_{finalize,dismiss} fields from
> > BlockJob to Job.
> 
> Do we still want to allow auto-finalize on all jobs, or should we keep it
> just for legacy support on block jobs?  Even more so for auto-dismiss (if
> you use the legacy interface, that's what you expect to happen; but other
> than for legacy block jobs, any sane management app is going to request
> auto-dismiss be false, so why expose it to generic jobs?)
> 
> Of course, it may still be easiest to plumb up auto- actions in the internal
> code (where it has to work to keep legacy block jobs from breaking, but no
> new callers have to enable it), so my argument may not apply until you
> actually expose a QMP interface for generic jobs.

This series doesn't expose it anyway. We can later decide whether we
want to add it or not. A sophisticated management tool like libvirt that
needs to manage individual nodes and cope with daemon restarts will
never make use of them, but they might still be useful in hand-crafted
scripts for defined special cases.

> > +++ b/block/mirror.c
> > @@ -1282,7 +1282,7 @@ void mirror_start(const char *job_id, 
> > BlockDriverState *bs,
> >   }
> >   is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
> >   base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
> > -mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
> > +mirror_start_job(job_id, bs, JOB_DEFAULT, target, replaces,
> >speed, granularity, buf_size, backing_mode,
> >on_source_error, on_target_error, unmap, NULL, NULL,
> >&mirror_job_driver, is_none_mode, base, false,
> 
> Just an observation that this is a lot of parameters; would using boxed QAPI
> types make these calls any more obvious?  But that's a separate cleanup.

I'm not sure if we have a QAPI type that matches this. But maybe it
could become a QAPI struct and very few extra parameters.

Kevin



Re: [Qemu-block] [PATCH 14/42] job: Add reference counting

2018-05-16 Thread Kevin Wolf
Am 16.05.2018 um 20:17 hat Eric Blake geschrieben:
> On 05/09/2018 11:26 AM, Kevin Wolf wrote:
> > This moves reference counting from BlockJob to Job.
> > 
> > In order to keep calling the BlockJob cleanup code when the job is
> > deleted via job_unref(), introduce a new JobDriver.free callback. Every
> > block job must use block_job_free() for this callback, this is asserted
> > in block_job_create().
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> 
> > +++ b/job.c
> 
> > +
> > +void job_unref(Job *job)
> > +{
> > +if (--job->refcnt == 0) {
> 
> Should this be free()-like and allow an incoming job == NULL as a no-op?

This behaves like block_job_unref() has always behaved, and I don't see a
single caller having a NULL check before calling job_unref(), so is it
worth it?

Kevin



Re: [Qemu-block] [Qemu-devel] [RFC PATCH 06/12] qapi: add bitmap info

2018-05-16 Thread John Snow


On 05/14/2018 10:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.05.2018 04:25, John Snow wrote:
>> Add some of the necessary scaffolding for reporting bitmap information.
>>
>> Signed-off-by: John Snow 
>> ---
>>   qapi/block-core.json | 60
>> +++-
>>   1 file changed, 59 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index c50517bff3..8f33f41ce7 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -33,6 +33,61 @@
>>   'date-sec': 'int', 'date-nsec': 'int',
>>   'vm-clock-sec': 'int', 'vm-clock-nsec': 'int' } }
>>   +##
>> +# @BitmapTypeEnum:
>> +#
>> +# An enumeration of possible serialized bitmap types.
>> +#
>> +# @dirty-tracking: This bitmap records information on dirty
>> +#  segments within the file.
>> +#
>> +# @unknown: This bitmap is of an unknown or reserved type.
>> +#
>> +# Since: 2.13
>> +##
>> +{ 'enum': 'BitmapTypeEnum', 'data': [ 'dirty-tracking', 'unknown' ] }
>> +
>> +##
>> +# @BitmapFlagEnum:
>> +#
>> +# An enumeration of possible flags for serialized bitmaps.
>> +#
>> +# @in-use: This bitmap is considered to be in-use, and may now be
>> inconsistent.
>> +#
>> +# @auto: This bitmap must reflect any and all changes to the file it
>> describes.
>> +#    The type of this bitmap must be @DirtyTrackingBitmap.
> 
> logical, but I don't see this restriction in the spec. Maybe we need to
> update the spec.
> 
1: auto

"The bitmap must reflect all changes of the virtual disk by any
application that would write to this qcow2 file (including writes,
snapshot switching, etc.). The type of this bitmap must be 'dirty
tracking bitmap'."

Actually, this looks correct now that I'm looking at the spec again.
I've used a terser phrasing but I think it's correct.

>> +#
>> +# @extra-data-compatible: This bitmap has extra information
>> associated with it.
> 
> No, this flag means that the extra data is compatible. So, if you don't
> know what this extra data is, you can read and modify the bitmap,
> leaving the data as is. If this flag is unset and there is extra data,
> the bitmap must not be used.
> 
> Finally, this spec should be consistent with (or, maybe better,
> duplicate) the spec in docs/interop/qcow2.txt.

I might suggest a rewrite of this portion of the spec as it's a little
unclear to me.

I've given this portion a rewrite.

>> +#
>> +# @unknown: This bitmap has unknown or reserved properties.
> 
> Better is only "reserved flags" (not unknown and not properties), they
> are reserved by spec.
> 
>> +#
>> +# Since: 2.13
>> +##
>> +{ 'enum': 'BitmapFlagEnum', 'data': [ 'in-use', 'auto',
>> +  'extra-data-compatible',
>> 'unknown' ] }
>> +
>> +##
>> +# @BitmapInfo:
>> +#
>> +# @name: The name of the bitmap.
>> +#
>> +# @type: The type of bitmap.
>> +#
>> +# @granularity: Bitmap granularity, in bytes.
>> +#
>> +# @count: Overall bitmap dirtiness, in bytes.
>> +#
>> +# @flags: Bitmap flags, if any.
>> +#
>> +# Since: 2.13
>> +#
>> +##
>> +{ 'struct': 'BitmapInfo',
>> +  'data': { 'name': 'str', 'type': 'BitmapTypeEnum', 'granularity':
>> 'int',
>> +    'count': 'int', '*flags': ['BitmapFlagEnum']
> 
> maybe worth adding 'has-extra-data'
> 
>> +  }
>> +}
>> +
>>   ##
>>   # @ImageInfoSpecificQCow2EncryptionBase:
>>   #
>> @@ -69,6 +124,8 @@
>>   # @encrypt: details about encryption parameters; only set if image
>>   #   is encrypted (since 2.10)
>>   #
>> +# @bitmaps: list of image bitmaps (since 2.13)
>> +#
>>   # Since: 1.7
>>   ##
>>   { 'struct': 'ImageInfoSpecificQCow2',
>> @@ -77,7 +134,8 @@
>>     '*lazy-refcounts': 'bool',
>>     '*corrupt': 'bool',
>>     'refcount-bits': 'int',
>> -  '*encrypt': 'ImageInfoSpecificQCow2Encryption'
>> +  '*encrypt': 'ImageInfoSpecificQCow2Encryption',
>> +  '*bitmaps': ['BitmapInfo']
>>     } }
>>     ##
> 
> 



Re: [Qemu-block] [Qemu-devel] [RFC PATCH 07/12] qcow2-bitmap: add basic bitmaps info

2018-05-16 Thread John Snow


On 05/14/2018 11:12 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.05.2018 04:25, John Snow wrote:
>> Add functions for querying the basic information inside of bitmaps.
>> Restructure the bitmaps flags masks to facilitate providing a list of
>> flags belonging to the bitmap(s) being queried.
>>
>> Signed-off-by: John Snow 
>> ---
>>   block/qcow2-bitmap.c | 81
>> ++--
>>   block/qcow2.c    |  7 +
>>   block/qcow2.h    |  1 +
>>   3 files changed, 87 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 60e01abfd7..811b82743a 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
>> @@ -49,8 +49,28 @@
>>     /* Bitmap directory entry flags */
>>   #define BME_RESERVED_FLAGS 0xfffcU
>> -#define BME_FLAG_IN_USE (1U << 0)
>> -#define BME_FLAG_AUTO   (1U << 1)
>> +
>> +enum BME_FLAG_BITS {
>> +    BME_FLAG_BIT__BEGIN = 0,
>> +    BME_FLAG_BIT_IN_USE = BME_FLAG_BIT__BEGIN,
>> +    BME_FLAG_BIT_AUTO   = 1,
>> +    BME_FLAG_BIT_EXTRA  = 2,
>> +    BME_FLAG_BIT__MAX,
>> +};
>> +
>> +#define BME_FLAG_IN_USE (1U << BME_FLAG_BIT_IN_USE)
>> +#define BME_FLAG_AUTO   (1U << BME_FLAG_BIT_AUTO)
>> +#define BME_FLAG_EXTRA  (1U << BME_FLAG_BIT_EXTRA)
>> +
>> +/* Correlate canonical spec values to autogenerated QAPI values */
>> +struct {
>> +    uint32_t mask;
> 
> mask is unused in this patch
> 
>> +    int qapi_value;
>> +} BMEFlagMap[BME_FLAG_BIT__MAX] = {
>> +    [BME_FLAG_BIT_IN_USE] = { BME_FLAG_IN_USE,
>> BITMAP_FLAG_ENUM_IN_USE },
>> +    [BME_FLAG_BIT_AUTO]   = { BME_FLAG_AUTO,   BITMAP_FLAG_ENUM_AUTO },
>> +    [BME_FLAG_BIT_EXTRA]  = { BME_FLAG_EXTRA, 
>> BITMAP_FLAG_ENUM_EXTRA_DATA_COMPATIBLE },
>> +};
>>     /* bits [1, 8] U [56, 63] are reserved */
>>   #define BME_TABLE_ENTRY_RESERVED_MASK 0xff0001feULL
>> @@ -663,6 +683,63 @@ static void del_bitmap_list(BlockDriverState *bs)
>>   }
>>   }
>>   +static BitmapFlagEnumList *get_bitmap_flags(uint32_t flags)
>> +{
>> +    int i;
>> +    BitmapFlagEnumList *flist = NULL;
>> +    BitmapFlagEnumList *ftmp;
>> +
>> +    while (flags) {
>> +    i = ctz32(flags);
>> +    ftmp = g_new0(BitmapFlagEnumList, 1);
>> +    if (i >= BME_FLAG_BIT__BEGIN && i < BME_FLAG_BIT__MAX) {
>> +    ftmp->value = BMEFlagMap[i].qapi_value;
>> +    } else {
>> +    ftmp->value = BITMAP_FLAG_ENUM_UNKNOWN;
> 
> so, there may be several "unknown" entries. It's inconsistent with
> "@unknown: This bitmap has unknown or reserved properties.".
> 
> Finally, can we export values for unknown flags? It should be more
> informative.
> 

I changed my mind -- qemu-img is not a debugging tool and shouldn't get
lost in the weeds trying to enumerate the different kinds of reserved
values that are set.

Whether or not having them set is an error, which ones specifically are
set is not information that an end user needs to know or care about, and
I don't want to create an extra structure to contain the value.

I'm going to rewrite this loop so that it only ever emits a single
'unknown' entry at most.
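
Roughly along these lines (untested sketch against the names introduced in
this patch; the bit-clearing at the end of the loop is assumed, since the
quoted hunk cuts off before it):

static BitmapFlagEnumList *get_bitmap_flags(uint32_t flags)
{
    int i;
    bool seen_unknown = false;
    BitmapFlagEnumList *flist = NULL;
    BitmapFlagEnumList *ftmp;

    while (flags) {
        i = ctz32(flags);
        if (i >= BME_FLAG_BIT__BEGIN && i < BME_FLAG_BIT__MAX) {
            ftmp = g_new0(BitmapFlagEnumList, 1);
            ftmp->value = BMEFlagMap[i].qapi_value;
            ftmp->next = flist;
            flist = ftmp;
        } else if (!seen_unknown) {
            /* Any number of reserved bits collapses into one 'unknown' entry */
            ftmp = g_new0(BitmapFlagEnumList, 1);
            ftmp->value = BITMAP_FLAG_ENUM_UNKNOWN;
            ftmp->next = flist;
            flist = ftmp;
            seen_unknown = true;
        }
        flags &= ~(1u << i);
    }

    return flist;
}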

--js




Re: [Qemu-block] [PATCH 37/42] job: Move progress fields to Job

2018-05-16 Thread Eric Blake

On 05/09/2018 11:26 AM, Kevin Wolf wrote:

BlockJob has fields .offset and .len, which are actually misnomers today
because they are no longer tied to block device sizes, but just progress
counters. As such they make a lot of sense in generic Jobs.

This patch moves the fields to Job and renames them to .progress_current
and .progress_total to describe their function better.

Signed-off-by: Kevin Wolf 
---



+++ b/include/qemu/job.h
@@ -114,6 +114,16 @@ typedef struct Job {
  /** True if this job should automatically dismiss itself */
  bool auto_dismiss;
  
+/**

+ * Current progress. The unit is arbitrary as long as the ratio between
+ * progress_current and progress_total represents the estimated percentage
+ * of work already done.
+ */
+int64_t progress_current;
+
+/** Estimated progress_current value at the completion of the job */
+int64_t progress_total;
+
  /** ret code passed to block_job_completed. */
  int ret;
  
@@ -304,6 +314,23 @@ void job_ref(Job *job);

   */
  void job_unref(Job *job);
  
+/**

+ * @job: The job that has made progress
+ * @done: How much progress the job made
+ *
+ * Updates the progress counter of the job.


Which progress counter? It might be worth calling out that this 
specifically updates the progress_current counter.  Also, it might be 
worth mentioning that the value is added to the current value of 
progress_counter, rather than directly assigned.



+ */
+void job_progress_update(Job *job, uint64_t done);
+
+/**
+ * @job: The job whose expected progress end value is set
+ * @remaining: Expected end value of the progress counter of the job
+ *
+ * Sets the expected end value of the progress counter of a job so that a
+ * completion percentage can be calculated when the progress is updated.
+ */
+void job_progress_set_remaining(Job *job, uint64_t remaining);


The name 'remaining' sounds like this is a delta (that progress_total is 
computed by summing the current progress_current + remaining) rather 
than directly the new value of progress_total.  Is that intended?



+++ b/job.c
@@ -364,6 +364,16 @@ void job_unref(Job *job)
  }
  }
  
+void job_progress_update(Job *job, uint64_t done)

+{
+job->progress_current += done;
+}
+
+void job_progress_set_remaining(Job *job, uint64_t remaining)
+{
+job->progress_total = job->progress_current + remaining;


Okay, so implementation-wise, both functions ARE taking a delta, rather 
than directly assigning the internal field.  The delta can only be 
positive, but job->progress_total can still shrink over time as needed.
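
A compilable toy version of just those two functions, for anyone who wants
to see the delta semantics in isolation (the Job struct here is reduced to
the two counters; it is not the real one):

#include <stdint.h>
#include <stdio.h>

typedef struct Job {
    int64_t progress_current;
    int64_t progress_total;
} Job;

static void job_progress_update(Job *job, uint64_t done)
{
    job->progress_current += done;              /* delta, not an assignment */
}

static void job_progress_set_remaining(Job *job, uint64_t remaining)
{
    job->progress_total = job->progress_current + remaining;
}

int main(void)
{
    Job job = { 0, 0 };

    job_progress_set_remaining(&job, 100);      /* total = 0 + 100 */
    job_progress_update(&job, 30);              /* current = 30 */
    job_progress_set_remaining(&job, 50);       /* total shrinks: 30 + 50 = 80 */

    printf("%lld/%lld\n", (long long)job.progress_current,
           (long long)job.progress_total);      /* prints 30/80 */
    return 0;
}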


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 14/42] job: Add reference counting

2018-05-16 Thread Eric Blake

On 05/16/2018 03:56 PM, Kevin Wolf wrote:


+
+void job_unref(Job *job)
+{
+if (--job->refcnt == 0) {


Should this be free()-like and allow an incoming job == NULL as a no-op?


This behaves like block_job_unref() has always behaved, and I don't see a
single caller having a NULL check before calling job_unref(), so is it
worth it?


Only if it makes it easier to clean up a partially-constructed object 
(which is the most likely case of wanting to pass in NULL)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [RFC PATCH 08/12] qjson: allow caller to ask for arbitrary indent

2018-05-16 Thread Eric Blake

On 05/11/2018 08:25 PM, John Snow wrote:

The function already exists, just expose it.

Signed-off-by: John Snow 
---
  include/qapi/qmp/qjson.h |  1 +
  qobject/qjson.c  | 21 +++--
  2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/qapi/qmp/qjson.h b/include/qapi/qmp/qjson.h
index b274ac3a86..0e8624c1fa 100644
--- a/include/qapi/qmp/qjson.h
+++ b/include/qapi/qmp/qjson.h
@@ -19,6 +19,7 @@ QObject *qobject_from_jsonf(const char *string, ...) 
GCC_FMT_ATTR(1, 2);
  QObject *qobject_from_jsonv(const char *string, va_list *ap, Error **errp)
  GCC_FMT_ATTR(1, 0);
  
+QString *qobject_to_json_opt(const QObject *obj, int pretty, int indent);


pretty seems like it is better as a bool.  Even though to_json() isn't 
typed correctly, that's no excuse for the public function to copy a bad 
interface (and maybe it's worth a separate cleanup patch to fix 
to_json() first).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] Problem with data miscompare using scsi-hd, cache=none and io=threads

2018-05-16 Thread Daniel Henrique Barboza



On 05/16/2018 04:47 AM, Paolo Bonzini wrote:

On 15/05/2018 23:25, Daniel Henrique Barboza wrote:

This is the current status of this investigation. I decided to start a
discussion here, see if someone can point me something that I overlooked
or got it wrong, before I started changing the POSIX thread pool
behavior to see if I can enforce one specific POSIX thread to do a
read() if we had a write() done in the same fd. Any suggestions?

Copying from the bug:


Unless we learn something new, my understanding is that we're dealing
with a host side limitation/bug when calling pwritev() in a different
thread than a following preadv(), using the same file descriptor
opened with O_DIRECT and no WCE in the host side, the kernel can't
grant data coherency, e.g:

- thread A executes a pwritev() writing dataA in the disk

- thread B executes a preadv() call to read the data, but this
preadv() call isn't aware of the previous pwritev() call done in
thread A, thus the guarantee of the preadv() call reading dataA isn't
assured (as opposed to what is described in man 3 write)

- the physical disk, due to the heavy load of the stress test, didn't
finish writing up dataA. Since the disk itself doesn't have any
internal cache to rely on, the preadv() call goes in and read an old
data that differs from dataA.

There is a problem in the reasoning of the third point: if the physical
disk hasn't yet finished writing up dataA, pwritev() shouldn't have
returned.  This could be a bug in the kernel, or even in the disk.  I
suspect the kernel because SCSI passthrough doesn't show the bug; SCSI
passthrough uses ioctl() which completes exactly when the disk tells
QEMU that the command is done---it cannot report completion too early.

(Another small problem in the third point is that the disk actually does
have a cache.  But the cache should be transparent, if it weren't the
bug would be in the disk firmware).

It has to be debugged and fixed in the kernel.  The thread pool is
just... a thread pool, and shouldn't be working around bugs, especially
as serious as these.


Fixing this in the thread pool would only make sense if we were sure that
the kernel was working as intended. I think the next step would be to
look at it at the kernel level and see what is not working there.



A more likely possibility: maybe the disk has 4K sectors and QEMU is
doing read-modify-write cycles to emulate 512 byte sectors?  In this
case, mismatches are not expected, since QEMU serializes RMW cycles, but
at least we would know that the bug would be in QEMU, and where.


I hadn't considered this possibility. I'll check whether the disk has 4K
sectors and whether QEMU is emulating 512-byte sectors.




Paolo






Re: [Qemu-block] [RFC PATCH 09/12] qapi/block-core: add BitmapMapping and BitmapEntry structs

2018-05-16 Thread Eric Blake

On 05/11/2018 08:25 PM, John Snow wrote:

Add two new structures for detailing the marked regions of bitmaps as
saved in e.g. qcow2 files.

Signed-off-by: John Snow 
---
  qapi/block-core.json | 32 
  1 file changed, 32 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8f33f41ce7..de8ad73a78 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -298,6 +298,38 @@
 'zero': 'bool', 'depth': 'int', '*offset': 'int',
 '*filename': 'str' } }
  
+##

+# @BitmapEntry:
+#
+# Dirty Bitmap region information for a virtual block range
+#
+# @offset: the start byte of the dirty virtual range
+#
+# @length: the number of bytes of the dirty virtual range
+#
+# Since: 2.13
+#
+##
+{ 'struct': 'BitmapEntry',
+  'data': { 'offset': 'int', 'length': 'int' } }
+
+##
+# @BitmapMapping:
+#
+# List of described regions correlated to a named bitmap.
+#
+# @name: The name of the bitmap whose range is described here
+#
+# @entries: A list of zero or more @BitmapEntry elements representing
+#   the range(s) described by the bitmap.


Is it also worth documenting that the list will be in ascending order, 
with no overlaps (no two entries covering the same offset); and in fact 
with a gap between all entries (as otherwise those two consecutive 
entries could have been consolidated to one)?



+#
+# Since: 2.13
+#
+##
+{ 'struct': 'BitmapMapping',
+  'data': { 'name': 'str',
+'entries': [ 'BitmapEntry' ] } }
+
  ##
  # @BlockdevCacheInfo:
  #



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] Problem with data miscompare using scsi-hd, cache=none and io=threads

2018-05-16 Thread Daniel Henrique Barboza



On 05/16/2018 06:47 AM, Dr. David Alan Gilbert wrote:

* Daniel Henrique Barboza (danie...@linux.ibm.com) wrote:

Hi,

I've been working for the last two months on a miscompare issue that happens
when using a raid device and a SATA as scsi-hd (emulated SCSI) with
cache=none and io=threads during a hardware stress test. I'll summarize it
here as best as I can without creating a great wall of text - Red Hat folks
can check [1] for all the details.

Using the following setup:

- Host is a POWER9 RHEL 7.5-alt: kernel 4.14.0-49.1.1.el7a.ppc64le,
qemu-kvm-ma 2.10.0-20.el7 (also reproducible with upstream QEMU)

- Guest is RHEL 7.5-alt using the same kernel as the host, using two storage
disks (a 1.8 Tb raid and a 446Gb SATA drive) as follows:

    [libvirt <disk> XML not preserved in the list archive]

Both block devices have WCE off in the host.

With this env, we found problems when running a stress test called HTX [2].
At a given time (usually after 24+ hours of test) HTX finds a data
miscompare in one of the devices. This is an example:

---

Device name: /dev/sdb
Total blocks: 0x74706daf, Block size: 0x200
Rule file name: /usr/lpp/htx/rules/reg/hxestorage/default.hdd
Number of Rulefile passes (cycle) completed: 0
Stanza running: rule_6, Thread no.: 8
Oper performed: wrc, Current seek type: SEQ
LBA no. where IO started: 0x94fa
Transfer size: 0x8400

Miscompare Summary:
===
LBA no. where miscomapre started: 0x94fa
LBA no. where miscomapre ended:   0x94ff
Miscompare start offset (in bytes):   0x8
Miscomapre end offset (in bytes): 0xbff
Miscompare size (in bytes):   0xbf8

Expected data (at miscomapre offset): 8c9aea5a736462007275
Actual data (at miscomapre offset): 889aea5a736462007275

Are all the miscompares single bit errors like that one?


The miscompares differ in size. What is displayed here is the first
snippet of the miscompare data, but in this case the miscompare is
0xbf8 bytes long.


I've seen cases where the miscompare has the same size as the data
written - the test initializes the disk with a known pattern (bbb for
example), then a miscompare happens and it finds that the disk still had
the starting pattern.



Is the test doing single bit manipulation or is that coming out of the
blue?


As far as I've read in the test suite code, it is writing several
sectors at once, then asserting that the contents were written.



Dave


-


This means that the test executed a write at LBA 0x94fa and, after
confirming that the write was completed, issued 2 reads at the same LBA to
verify the written contents and found a mismatch.


I've tested all sorts of configurations of disk vs LUN, cache modes and
AIO. My findings are:

- using device='lun' instead of device='disk', I can't reproduce the issue
no matter what the other configurations are;
- using device='disk' but with cache='writethrough', issue doesn't happen
(haven't checked other cache modes);
- using device='disk', cache='none' and io='native', issue doesn't happen.


The issue seems to be tied to the combination device=disk + cache=none +
io=threads. I've started digging into the SCSI layer all the way down to the
block backend. With a shameful amount of logging I've discovered that, for the
write in which the test finds a miscompare, in block/file-posix.c:

- when doing the write, handle_aiocb_rw_vector() returns success and pwritev()
reports that all bytes were written
- in both reads after the write, handle_aiocb_rw_vector() returns success and
all bytes are read by preadv(). In both reads, the data read is different from
the data written by the pwritev() that happened before

In the discussions at [1], Fam Zheng suggested a test in which we would take
down the number of threads created in the POSIX thread pool from 64 to 1.
The idea is to ensure that we're using the same thread to write and read.
There was a suspicion that the kernel can't guarantee data coherency between
different threads, even if using the same fd, when using pwritev() and
preadv(). This would explain why the following reads in the same fd would
fail to retrieve the same data that was written before. After doing this
modification, the miscompare didn't reproduce.

After reverting the thread pool number change, I've made a couple of
attempts trying to flush before read() and flushing after write(). Both
attempts failed - the miscompare appears in both scenarios. This reinforces
the suspicion we have above - if data coherency can't be guaranteed between
different threads, flushing in different threads wouldn't make a difference
either. I've also tested a suggestion from Fam where I started the disks with
"cache.direct=on,cache.no-flush=off" - bug still reproduces.


This is the current status of this investigation. I decided to start a
discussion here, see if someone can point me something that I overlooked or
got it wrong, before I started changing the POSIX thread pool behavior to
see if I can enforce one specific POSIX thread to do a read() if we had a
write() done in the same fd. Any suggestions?

Re: [Qemu-block] [Qemu-devel] [RFC PATCH 08/12] qjson: allow caller to ask for arbitrary indent

2018-05-16 Thread John Snow


On 05/16/2018 05:34 PM, Eric Blake wrote:
> On 05/11/2018 08:25 PM, John Snow wrote:
>> The function already exists, just expose it.
>>
>> Signed-off-by: John Snow 
>> ---
>>   include/qapi/qmp/qjson.h |  1 +
>>   qobject/qjson.c  | 21 +++--
>>   2 files changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/qapi/qmp/qjson.h b/include/qapi/qmp/qjson.h
>> index b274ac3a86..0e8624c1fa 100644
>> --- a/include/qapi/qmp/qjson.h
>> +++ b/include/qapi/qmp/qjson.h
>> @@ -19,6 +19,7 @@ QObject *qobject_from_jsonf(const char *string, ...)
>> GCC_FMT_ATTR(1, 2);
>>   QObject *qobject_from_jsonv(const char *string, va_list *ap, Error
>> **errp)
>>   GCC_FMT_ATTR(1, 0);
>>   +QString *qobject_to_json_opt(const QObject *obj, int pretty, int
>> indent);
> 
> pretty seems like it is better as a bool.  Even though to_json() isn't
> typed correctly, that's no excuse for the public function to copy a bad
> interface (and maybe it's worth a separate cleanup patch to fix
> to_json() first).
> 

Eh, sure, why not. Done.



Re: [Qemu-block] [Qemu-devel] [RFC PATCH 09/12] qapi/block-core: add BitmapMapping and BitmapEntry structs

2018-05-16 Thread John Snow


On 05/16/2018 05:37 PM, Eric Blake wrote:
>> +##
>> +# @BitmapMapping:
>> +#
>> +# List of described regions correlated to a named bitmap.
>> +#
>> +# @name: The name of the bitmap whose range is described here
>> +#
>> +# @entries: A list of zero or more @BitmapEntry elements representing
>> +#   the range(s) described by the bitmap.
> 
> Is it also worth documenting that the list will be in ascending order,
> with no overlaps (no two entries covering the same offset); and in fact
> with a gap between all entries (as otherwise those two consecutive
> entries could have been consolidated to one)?
> 

Hm. I didn't necessarily want to guarantee the order, but being more
specific about the output will assist legibility of the spec.

I'll amend the documentation and make some stronger guarantees.

>> +#
>> +# Since: 2.13
>> +#
>> +##
>> +{ 'struct': 'BitmapMapping',
>> +  'data': { 'name': 'str',
>> +    'entries': [ 'BitmapEntry' ] } }
>> +
>>   ##
>>   # @BlockdevCacheInfo:
>>   # 



Re: [Qemu-block] [Qemu-devel] Problem with data miscompare using scsi-hd, cache=none and io=threads

2018-05-16 Thread Daniel Henrique Barboza



On 05/16/2018 06:35 PM, Daniel Henrique Barboza wrote:



On 05/16/2018 04:47 AM, Paolo Bonzini wrote:

On 15/05/2018 23:25, Daniel Henrique Barboza wrote:

This is the current status of this investigation. I decided to start a
discussion here, see if someone can point me something that I overlooked
or got it wrong, before I started changing the POSIX thread pool
behavior to see if I can enforce one specific POSIX thread to do a
read() if we had a write() done in the same fd. Any suggestions?

Copying from the bug:


Unless we learn something new, my understanding is that we're dealing
with a host side limitation/bug when calling pwritev() in a different
thread than a following preadv(), using the same file descriptor
opened with O_DIRECT and no WCE in the host side, the kernel can't
grant data coherency, e.g:

- thread A executes a pwritev() writing dataA in the disk

- thread B executes a preadv() call to read the data, but this
preadv() call isn't aware of the previous pwritev() call done in
thread A, thus the guarantee of the preadv() call reading dataA isn't
assured (as opposed to what is described in man 3 write)

- the physical disk, due to the heavy load of the stress test, didn't
finish writing up dataA. Since the disk itself doesn't have any
internal cache to rely on, the preadv() call goes in and read an old
data that differs from dataA.

There is a problem in the reasoning of the third point: if the physical
disk hasn't yet finished writing up dataA, pwritev() shouldn't have
returned.  This could be a bug in the kernel, or even in the disk.  I
suspect the kernel because SCSI passthrough doesn't show the bug; SCSI
passthrough uses ioctl() which completes exactly when the disk tells
QEMU that the command is done---it cannot report completion too early.

(Another small problem in the third point is that the disk actually does
have a cache.  But the cache should be transparent, if it weren't the
bug would be in the disk firmware).

It has to be debugged and fixed in the kernel.  The thread pool is
just... a thread pool, and shouldn't be working around bugs, especially
as serious as these.


Fixing this in the thread pool would only make sense if we were sure that
the kernel was working as intended. I think the next step would be to
look into it at the kernel level and see what is not working there.



A more likely possibility: maybe the disk has 4K sectors and QEMU is
doing read-modify-write cycles to emulate 512 byte sectors?  In this
case, mismatches are not expected, since QEMU serializes RMW cycles, but
at least we would know that the bug would be in QEMU, and where.


I hadn't considered this possibility. I'll check whether the disk has 4K
sectors and whether QEMU is emulating 512-byte sectors.


There are several differences between the guest and the host device
regarding the kernel parameters. This is how the guest configured the SATA disk:


# grep . /sys/block/sdb/queue/*
/sys/block/sdb/queue/add_random:1
/sys/block/sdb/queue/chunk_sectors:0
/sys/block/sdb/queue/dax:0
/sys/block/sdb/queue/discard_granularity:4096
/sys/block/sdb/queue/discard_max_bytes:1073741824
/sys/block/sdb/queue/discard_max_hw_bytes:1073741824
/sys/block/sdb/queue/discard_zeroes_data:0
/sys/block/sdb/queue/hw_sector_size:512
/sys/block/sdb/queue/io_poll:0
/sys/block/sdb/queue/io_poll_delay:0
grep: /sys/block/sdb/queue/iosched: Is a directory
/sys/block/sdb/queue/iostats:1
/sys/block/sdb/queue/logical_block_size:512
/sys/block/sdb/queue/max_discard_segments:1
/sys/block/sdb/queue/max_hw_sectors_kb:32767
/sys/block/sdb/queue/max_integrity_segments:0
/sys/block/sdb/queue/max_sectors_kb:256
/sys/block/sdb/queue/max_segments:126
/sys/block/sdb/queue/max_segment_size:65536
/sys/block/sdb/queue/minimum_io_size:262144
/sys/block/sdb/queue/nomerges:0
/sys/block/sdb/queue/nr_requests:128
/sys/block/sdb/queue/optimal_io_size:262144
/sys/block/sdb/queue/physical_block_size:512
/sys/block/sdb/queue/read_ahead_kb:4096
/sys/block/sdb/queue/rotational:1
/sys/block/sdb/queue/rq_affinity:1
/sys/block/sdb/queue/scheduler:noop [deadline] cfq
/sys/block/sdb/queue/unpriv_sgio:0
grep: /sys/block/sdb/queue/wbt_lat_usec: Invalid argument
/sys/block/sdb/queue/write_cache:write back
/sys/block/sdb/queue/write_same_max_bytes:262144
/sys/block/sdb/queue/write_zeroes_max_bytes:262144
/sys/block/sdb/queue/zoned:none


The same device in the host:

$ grep . /sys/block/sdc/queue/*
/sys/block/sdc/queue/add_random:1
/sys/block/sdc/queue/chunk_sectors:0
/sys/block/sdc/queue/dax:0
/sys/block/sdc/queue/discard_granularity:0
/sys/block/sdc/queue/discard_max_bytes:0
/sys/block/sdc/queue/discard_max_hw_bytes:0
/sys/block/sdc/queue/discard_zeroes_data:0
/sys/block/sdc/queue/hw_sector_size:512
/sys/block/sdc/queue/io_poll:0
/sys/block/sdc/queue/io_poll_delay:0
grep: /sys/block/sdc/queue/iosched: Is a directory
/sys/block/sdc/queue/iostats:1
/sys/block/sdc/queue/logical_block_size:512
/sys/block/sdc/queue/max_discard_segments:1
/sys/block/sdc/queue/max_hw_sectors_

Re: [Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread WangJie (Pluto)
I agree; let's wait for a reply from Fam.

On 2018/5/16 19:43, Peter Xu wrote:
> On Wed, May 16, 2018 at 07:14:53PM +0800, WangJie (Pluto) wrote:
>> Hi, Peter Xu:
>>  If we call aio_epoll_disable() here, aio_epoll_disable() will return
>> before closing ctx->epollfd,
>> because ctx->epoll_enabled is false at that moment.
>>  In the process of addIOThread, aio_context_setup creates the epoll fd
>> without calling aio_epoll_try_enable,
>> so ctx->epoll_enabled never gets set to true.
> 
> I see that epoll_available will only be set if epollfd != -1, so it
> seems to me to make more sense if we swap the two variables in
> aio_epoll_disable(), from current version:
> 
> static void aio_epoll_disable(AioContext *ctx)
> {
> ctx->epoll_available = false;
> if (!ctx->epoll_enabled) {
> return;
> }
> ctx->epoll_enabled = false;
> close(ctx->epollfd);
> }
> 
> To:
> 
> static void aio_epoll_disable(AioContext *ctx)
> {
> ctx->epoll_enabled = false;
> if (!ctx->epoll_available) {
> return;
> }
> ctx->epoll_available = false;
> close(ctx->epollfd);
> }
> 
> What do you think?  And Fam?
> 
>>
>> On 2018/5/16 16:36, Jie Wang wrote:
>>> +void aio_context_destroy(AioContext *ctx)
>>> +{
>>> +#ifdef CONFIG_EPOLL_CREATE1
>>> +if (ctx->epollfd >= 0) {
>>> +close(ctx->epollfd);
>>> +}
>>> +#endif
>>> +}
>>> +
>>>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>>>   int64_t grow, int64_t shrink, Error 
>>> **errp)
>>
> 




Re: [Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-05-16 Thread Fam Zheng
On Wed, 05/16 18:52, Vladimir Sementsov-Ogievskiy wrote:
> 16.05.2018 18:32, Kevin Wolf wrote:
> > Am 16.05.2018 um 17:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 16.05.2018 15:47, Kevin Wolf wrote:
> > > > Am 14.05.2018 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > 14.05.2018 09:41, Fam Zheng wrote:
> > > > > > On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > Is it possible, that target will change the disk, and then we 
> > > > > > > return control
> > > > > > > to the source? In this case bitmaps will be invalid. So, should 
> > > > > > > not we drop
> > > > > > > all the bitmaps on inactivate?
> > > > > > Yes, dropping all live bitmaps upon inactivate sounds reasonable. 
> > > > > > If the dst
> > > > > > fails to start, and we want to resume VM at src, we could 
> > > > > > (optionally?) reload
> > > > > > the persistent bitmaps, I guess.
> > > > > Reload from where? We didn't store them.
> > > > Maybe this just means that it turns out that not storing them was a bad
> > > > idea?
> > > > 
> > > > What was the motivation for not storing the bitmap? The additional
> > > > downtime? Is it really that bad, though? Bitmaps should be fairly small
> > > > for the usual image sizes and writing them out should be quick.
> > > What are usual ones? A bitmap of standard granularity of 64k for 16Tb disk
> > > is ~30mb. If we have several such bitmaps it may be significant downtime.
> > We could have an in-memory bitmap that tracks which parts of the
> > persistent bitmap are dirty so that you don't have to write out the
> > whole 30 MB during the migration downtime, but can already flush most of
> > the persistent bitmap before the VM is stopped.
> > 
> > Kevin
> 
> Yes, it looks possible. But how to control that downtime? Introduce a migration
> state, with a specific _pending function? However, it may not be necessary.
> 
> Anyway, I think we don't need to store it.
> 
> If we decide to resume the source, the bitmap is already in memory, so why
> reload it? If someone already killed the source (which was in paused mode), it is
> inconsistent anyway and loss of the dirty bitmap is not the worst possible
> problem.
> 
> So, finally, it looks safe enough just to make bitmaps on the source persistent
> again (or better, introduce another way to skip storing (maybe with an
> additional flag, so everybody will be happy), rather than dropping the persistent flag).

This makes some sense to me. We'll then use the current persistent flag to
indicate the bitmap "is" a persistent one, instead of "should it be persisted".
They are apparently two different properties in the case discussed in this
thread.

> And, after the source resumes, we have one of the following situations:
> 
> 1. the disk was not changed during migration, so all is ok and we have the bitmaps
> 2. the disk was changed. The bitmaps are inconsistent. But not only the bitmaps, the
> whole VM state is inconsistent with its disk. This case is a bug in the
> management layer and it should never happen. And possibly, we need some
> separate way to catch such cases.

Fam
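
Kevin's suggestion quoted above (an in-memory record of which parts of the
persistent bitmap still need rewriting, so most of it can be flushed while the
guest runs and only the remainder during downtime) could look roughly like the
sketch below. All names, types and the chunk granularity are invented; this is
not QEMU code:

/* Illustrative sketch: a meta-bitmap with one bit per chunk of the real
 * dirty bitmap, marking which chunks must still be written to the image. */
#include <stdint.h>

#define CHUNK_BITS  (1u << 15)           /* 32768 bitmap bits per chunk */

typedef struct PersistentBitmap {
    uint8_t  *bits;                      /* the real dirty bitmap */
    uint64_t  nbits;
    uint8_t  *chunk_dirty;               /* 1 bit per CHUNK_BITS chunk */
} PersistentBitmap;

/* Called whenever a bit of the real bitmap is set. */
static void bitmap_set_bit(PersistentBitmap *b, uint64_t bit)
{
    uint64_t chunk = bit / CHUNK_BITS;

    b->bits[bit / 8] |= 1u << (bit % 8);
    b->chunk_dirty[chunk / 8] |= 1u << (chunk % 8);
}

/* Flush only the chunks whose meta-bit is set; callable repeatedly while
 * the guest still runs, so the final flush during downtime stays small. */
static void bitmap_flush_dirty_chunks(PersistentBitmap *b,
                                      void (*store)(uint64_t first_bit,
                                                    uint64_t nbits))
{
    uint64_t nchunks = (b->nbits + CHUNK_BITS - 1) / CHUNK_BITS;

    for (uint64_t c = 0; c < nchunks; c++) {
        if (b->chunk_dirty[c / 8] & (1u << (c % 8))) {
            uint64_t first = c * CHUNK_BITS;
            uint64_t n = (first + CHUNK_BITS <= b->nbits)
                         ? CHUNK_BITS : b->nbits - first;

            b->chunk_dirty[c / 8] &= ~(1u << (c % 8));
            store(first, n);             /* write this chunk to the image */
        }
    }
}

The final flush after the VM stops then only covers chunks dirtied since the
last call, which is the downtime reduction being discussed.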



Re: [Qemu-block] [PATCH v4] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Fam Zheng
On Wed, 05/16 19:43, Peter Xu wrote:
> On Wed, May 16, 2018 at 07:14:53PM +0800, WangJie (Pluto) wrote:
> > Hi, Peter Xu:
> > If we call aio_epoll_disable() here, aio_epoll_disable() will return
> > before closing ctx->epollfd,
> > because ctx->epoll_enabled is false at that moment.
> > In the process of addIOThread, aio_context_setup creates the epoll fd
> > without calling aio_epoll_try_enable,
> > so ctx->epoll_enabled never gets set to true.
> 
> I see that epoll_available will only be set if epollfd != -1, so it
> seems to me to make more sense if we swap the two variables in
> aio_epoll_disable(), from current version:
> 
> static void aio_epoll_disable(AioContext *ctx)
> {
> ctx->epoll_available = false;
> if (!ctx->epoll_enabled) {
> return;
> }
> ctx->epoll_enabled = false;
> close(ctx->epollfd);
> }
> 
> To:
> 
> static void aio_epoll_disable(AioContext *ctx)
> {
> ctx->epoll_enabled = false;
> if (!ctx->epoll_available) {
> return;
> }
> ctx->epoll_available = false;
> close(ctx->epollfd);
> }
> 
> What do you think?  And Fam?

Looks good.

Fam

> 
> > 
> > On 2018/5/16 16:36, Jie Wang wrote:
> > > +void aio_context_destroy(AioContext *ctx)
> > > +{
> > > +#ifdef CONFIG_EPOLL_CREATE1
> > > +if (ctx->epollfd >= 0) {
> > > +close(ctx->epollfd);
> > > +}
> > > +#endif
> > > +}
> > > +
> > >  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
> > >   int64_t grow, int64_t shrink, Error 
> > > **errp)
> > 
> 
> -- 
> Peter Xu



[Qemu-block] [PATCH v5] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Jie Wang
When we call addIOThread, the epollfd created in aio_context_setup,
but not close it in the process of delIOThread, so the epollfd will leak.

Signed-off-by: Jie Wang 
---
 include/block/aio.h |  8 
 util/aio-posix.c| 13 ++---
 util/aio-win32.c|  4 
 util/async.c|  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e..ae6f354 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
*ctx)
 void aio_context_setup(AioContext *ctx);
 
 /**
+ * aio_context_destroy:
+ * @ctx: the aio context
+ *
+ * Destroy the aio context.
+ */
+void aio_context_destroy(AioContext *ctx);
+
+/**
  * aio_context_set_poll_params:
  * @ctx: the aio context
  * @max_ns: how long to busy poll for, in nanoseconds
diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4..118bf57 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -45,11 +45,11 @@ struct AioHandler
 
 static void aio_epoll_disable(AioContext *ctx)
 {
-ctx->epoll_available = false;
-if (!ctx->epoll_enabled) {
+ctx->epoll_enabled = false;
+if (!ctx->epoll_available) {
 return;
 }
-ctx->epoll_enabled = false;
+ctx->epoll_available = false;
 close(ctx->epollfd);
 }
 
@@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
 #endif
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+#ifdef CONFIG_EPOLL_CREATE1
+aio_epoll_disable(ctx);
+#endif
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index a67b00c..e676a8d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
 {
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 4dd9d95..03f6278 100644
--- a/util/async.c
+++ b/util/async.c
@@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
+aio_context_destroy(ctx);
 }
 
 static GSourceFuncs aio_source_funcs = {
-- 
1.8.3.1




Re: [Qemu-block] [PATCH v5] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Fam Zheng
On Thu, 05/17 08:42, Jie Wang wrote:
> When we call addIOThread, the epollfd created in aio_context_setup,
> but not close it in the process of delIOThread, so the epollfd will leak.
> 
> Signed-off-by: Jie Wang 
> ---
>  include/block/aio.h |  8 
>  util/aio-posix.c| 13 ++---
>  util/aio-win32.c|  4 
>  util/async.c|  1 +
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e..ae6f354 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
> *ctx)
>  void aio_context_setup(AioContext *ctx);
>  
>  /**
> + * aio_context_destroy:
> + * @ctx: the aio context
> + *
> + * Destroy the aio context.
> + */
> +void aio_context_destroy(AioContext *ctx);
> +
> +/**
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
>   * @max_ns: how long to busy poll for, in nanoseconds
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4..118bf57 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -45,11 +45,11 @@ struct AioHandler
>  
>  static void aio_epoll_disable(AioContext *ctx)
>  {
> -ctx->epoll_available = false;
> -if (!ctx->epoll_enabled) {
> +ctx->epoll_enabled = false;
> +if (!ctx->epoll_available) {
>  return;
>  }
> -ctx->epoll_enabled = false;
> +ctx->epoll_available = false;
>  close(ctx->epollfd);
>  }
>  
> @@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +aio_epoll_disable(ctx);
> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index a67b00c..e676a8d 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95..03f6278 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>  qemu_rec_mutex_destroy(&ctx->lock);
>  qemu_lockcnt_destroy(&ctx->list_lock);
>  timerlistgroup_deinit(&ctx->tlg);
> +aio_context_destroy(ctx);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> -- 
> 1.8.3.1
> 

Reviewed-by: Fam Zheng 




Re: [Qemu-block] [PATCH v5] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Peter Xu
On Thu, May 17, 2018 at 08:42:43AM +0800, Jie Wang wrote:
> When we call addIOThread, the epollfd created in aio_context_setup,
> but not close it in the process of delIOThread, so the epollfd will leak.

(maybe also mention about the aio_epoll_disable change, or split into
 two patches?)

> 
> Signed-off-by: Jie Wang 

Reviewed-by: Peter Xu 

> ---
>  include/block/aio.h |  8 
>  util/aio-posix.c| 13 ++---
>  util/aio-win32.c|  4 
>  util/async.c|  1 +
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e..ae6f354 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
> *ctx)
>  void aio_context_setup(AioContext *ctx);
>  
>  /**
> + * aio_context_destroy:
> + * @ctx: the aio context
> + *
> + * Destroy the aio context.
> + */
> +void aio_context_destroy(AioContext *ctx);
> +
> +/**
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
>   * @max_ns: how long to busy poll for, in nanoseconds
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4..118bf57 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -45,11 +45,11 @@ struct AioHandler
>  
>  static void aio_epoll_disable(AioContext *ctx)
>  {
> -ctx->epoll_available = false;
> -if (!ctx->epoll_enabled) {
> +ctx->epoll_enabled = false;
> +if (!ctx->epoll_available) {
>  return;
>  }
> -ctx->epoll_enabled = false;
> +ctx->epoll_available = false;
>  close(ctx->epollfd);
>  }
>  
> @@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +aio_epoll_disable(ctx);
> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index a67b00c..e676a8d 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95..03f6278 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>  qemu_rec_mutex_destroy(&ctx->lock);
>  qemu_lockcnt_destroy(&ctx->list_lock);
>  timerlistgroup_deinit(&ctx->tlg);
> +aio_context_destroy(ctx);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> -- 
> 1.8.3.1
> 

-- 
Peter Xu



Re: [Qemu-block] [PATCH v5] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread WangJie (Pluto)
OK! I will split into two patches

On 2018/5/17 11:04, Peter Xu wrote:
> On Thu, May 17, 2018 at 08:42:43AM +0800, Jie Wang wrote:
>> When we call addIOThread, the epollfd created in aio_context_setup,
>> but not close it in the process of delIOThread, so the epollfd will leak.
> 
> (maybe also mention about the aio_epoll_disable change, or split into
>  two patches?)
> 
>>
>> Signed-off-by: Jie Wang 
> 
> Reviewed-by: Peter Xu 
> 
>> ---
>>  include/block/aio.h |  8 
>>  util/aio-posix.c| 13 ++---
>>  util/aio-win32.c|  4 
>>  util/async.c|  1 +
>>  4 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/block/aio.h b/include/block/aio.h
>> index a1d6b9e..ae6f354 100644
>> --- a/include/block/aio.h
>> +++ b/include/block/aio.h
>> @@ -555,6 +555,14 @@ static inline bool 
>> in_aio_context_home_thread(AioContext *ctx)
>>  void aio_context_setup(AioContext *ctx);
>>  
>>  /**
>> + * aio_context_destroy:
>> + * @ctx: the aio context
>> + *
>> + * Destroy the aio context.
>> + */
>> +void aio_context_destroy(AioContext *ctx);
>> +
>> +/**
>>   * aio_context_set_poll_params:
>>   * @ctx: the aio context
>>   * @max_ns: how long to busy poll for, in nanoseconds
>> diff --git a/util/aio-posix.c b/util/aio-posix.c
>> index d8f0cb4..118bf57 100644
>> --- a/util/aio-posix.c
>> +++ b/util/aio-posix.c
>> @@ -45,11 +45,11 @@ struct AioHandler
>>  
>>  static void aio_epoll_disable(AioContext *ctx)
>>  {
>> -ctx->epoll_available = false;
>> -if (!ctx->epoll_enabled) {
>> +ctx->epoll_enabled = false;
>> +if (!ctx->epoll_available) {
>>  return;
>>  }
>> -ctx->epoll_enabled = false;
>> +ctx->epoll_available = false;
>>  close(ctx->epollfd);
>>  }
>>  
>> @@ -713,6 +713,13 @@ void aio_context_setup(AioContext *ctx)
>>  #endif
>>  }
>>  
>> +void aio_context_destroy(AioContext *ctx)
>> +{
>> +#ifdef CONFIG_EPOLL_CREATE1
>> +aio_epoll_disable(ctx);
>> +#endif
>> +}
>> +
>>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>>   int64_t grow, int64_t shrink, Error **errp)
>>  {
>> diff --git a/util/aio-win32.c b/util/aio-win32.c
>> index a67b00c..e676a8d 100644
>> --- a/util/aio-win32.c
>> +++ b/util/aio-win32.c
>> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>>  {
>>  }
>>  
>> +void aio_context_destroy(AioContext *ctx)
>> +{
>> +}
>> +
>>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>>   int64_t grow, int64_t shrink, Error **errp)
>>  {
>> diff --git a/util/async.c b/util/async.c
>> index 4dd9d95..03f6278 100644
>> --- a/util/async.c
>> +++ b/util/async.c
>> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>>  qemu_rec_mutex_destroy(&ctx->lock);
>>  qemu_lockcnt_destroy(&ctx->list_lock);
>>  timerlistgroup_deinit(&ctx->tlg);
>> +aio_context_destroy(ctx);
>>  }
>>  
>>  static GSourceFuncs aio_source_funcs = {
>> -- 
>> 1.8.3.1
>>
> 




[Qemu-block] [PATCH v6 1/2] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Jie Wang
When we call addIOThread, the epollfd created in aio_context_setup,
but not close it in the process of delIOThread, so the epollfd will leak.

Signed-off-by: Jie Wang 
---
 include/block/aio.h | 8 
 util/aio-posix.c| 9 +
 util/aio-win32.c| 4 
 util/async.c| 1 +
 4 files changed, 22 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e..ae6f354 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
*ctx)
 void aio_context_setup(AioContext *ctx);
 
 /**
+ * aio_context_destroy:
+ * @ctx: the aio context
+ *
+ * Destroy the aio context.
+ */
+void aio_context_destroy(AioContext *ctx);
+
+/**
  * aio_context_set_poll_params:
  * @ctx: the aio context
  * @max_ns: how long to busy poll for, in nanoseconds
diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4..0ade2c7 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -713,6 +713,15 @@ void aio_context_setup(AioContext *ctx)
 #endif
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+#ifdef CONFIG_EPOLL_CREATE1
+if (ctx->epollfd >= 0) {
+close(ctx->epollfd);
+}
+#endif
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index a67b00c..e676a8d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
 {
 }
 
+void aio_context_destroy(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 4dd9d95..03f6278 100644
--- a/util/async.c
+++ b/util/async.c
@@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
+aio_context_destroy(ctx);
 }
 
 static GSourceFuncs aio_source_funcs = {
-- 
1.8.3.1




[Qemu-block] [PATCH v6 2/2] iothread: let aio_epoll_disable fit to aio_context_destroy

2018-05-16 Thread Jie Wang
epoll_available will only be set if epollfd != -1, os we
can swap the two variables in aio_epoll_disable, and
aio_context_destroy can call aio_epoll_disable directly.

Signed-off-by: Jie Wang 
---
 util/aio-posix.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 0ade2c7..118bf57 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -45,11 +45,11 @@ struct AioHandler
 
 static void aio_epoll_disable(AioContext *ctx)
 {
-ctx->epoll_available = false;
-if (!ctx->epoll_enabled) {
+ctx->epoll_enabled = false;
+if (!ctx->epoll_available) {
 return;
 }
-ctx->epoll_enabled = false;
+ctx->epoll_available = false;
 close(ctx->epollfd);
 }
 
@@ -716,9 +716,7 @@ void aio_context_setup(AioContext *ctx)
 void aio_context_destroy(AioContext *ctx)
 {
 #ifdef CONFIG_EPOLL_CREATE1
-if (ctx->epollfd >= 0) {
-close(ctx->epollfd);
-}
+aio_epoll_disable(ctx);
 #endif
 }
 
-- 
1.8.3.1




Re: [Qemu-block] [PATCH v6 1/2] iothread: fix epollfd leak in the process of delIOThread

2018-05-16 Thread Fam Zheng
Thanks for the quick iterations. A note on the patch submission:

When sending patch series with git, please use git-send-email's "--thread
--no-chain-reply-to" options, and also create a cover letter using "git
format-patch --cover-letter". Or you can use git-publish. For more complete
information on how to submit patches:

https://wiki.qemu.org/Contribute/SubmitAPatch

In particular, as noted in that page, it is preferred to include a revision log in
the cover letter, such as:

v6: Improve foo.

v5: Tweak bar.
Add baz.

...

On Thu, 05/17 10:25, Jie Wang wrote:
> When we call addIOThread, the epollfd created in aio_context_setup,
> but not close it in the process of delIOThread, so the epollfd will leak.
> 
> Signed-off-by: Jie Wang 
> ---
>  include/block/aio.h | 8 
>  util/aio-posix.c| 9 +
>  util/aio-win32.c| 4 
>  util/async.c| 1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e..ae6f354 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -555,6 +555,14 @@ static inline bool in_aio_context_home_thread(AioContext 
> *ctx)
>  void aio_context_setup(AioContext *ctx);
>  
>  /**
> + * aio_context_destroy:
> + * @ctx: the aio context
> + *
> + * Destroy the aio context.
> + */
> +void aio_context_destroy(AioContext *ctx);
> +
> +/**
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
>   * @max_ns: how long to busy poll for, in nanoseconds
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index d8f0cb4..0ade2c7 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -713,6 +713,15 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +#ifdef CONFIG_EPOLL_CREATE1
> +if (ctx->epollfd >= 0) {
> +close(ctx->epollfd);
> +}

This is changed to aio_epoll_disable in patch 2. It is unnecessary code churn.
Just swap these two patches so that you can avoid it.

> +#endif
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/aio-win32.c b/util/aio-win32.c
> index a67b00c..e676a8d 100644
> --- a/util/aio-win32.c
> +++ b/util/aio-win32.c
> @@ -407,6 +407,10 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> +void aio_context_destroy(AioContext *ctx)
> +{
> +}
> +
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
>   int64_t grow, int64_t shrink, Error **errp)
>  {
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95..03f6278 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -298,6 +298,7 @@ aio_ctx_finalize(GSource *source)
>  qemu_rec_mutex_destroy(&ctx->lock);
>  qemu_lockcnt_destroy(&ctx->list_lock);
>  timerlistgroup_deinit(&ctx->tlg);
> +aio_context_destroy(ctx);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> -- 
> 1.8.3.1
> 

Fam



Re: [Qemu-block] [PATCH v6 2/2] iothread: let aio_epoll_disable fit to aio_context_destroy

2018-05-16 Thread Fam Zheng
On Thu, 05/17 10:26, Jie Wang wrote:
> epoll_available will only be set if epollfd != -1, os we

s/os/so/

> can swap the two variables in aio_epoll_disable, and
> aio_context_destroy can call aio_epoll_disable directly.

If you put this as 1/2 in v7, you do not want to mention the
yet-to-be-introduced aio_context_destroy this way. Maybe you can write it as "the
coming aio_context_destroy can call it directly.".

Fam

> 
> Signed-off-by: Jie Wang 
> ---
>  util/aio-posix.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 0ade2c7..118bf57 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -45,11 +45,11 @@ struct AioHandler
>  
>  static void aio_epoll_disable(AioContext *ctx)
>  {
> -ctx->epoll_available = false;
> -if (!ctx->epoll_enabled) {
> +ctx->epoll_enabled = false;
> +if (!ctx->epoll_available) {
>  return;
>  }
> -ctx->epoll_enabled = false;
> +ctx->epoll_available = false;
>  close(ctx->epollfd);
>  }
>  
> @@ -716,9 +716,7 @@ void aio_context_setup(AioContext *ctx)
>  void aio_context_destroy(AioContext *ctx)
>  {
>  #ifdef CONFIG_EPOLL_CREATE1
> -if (ctx->epollfd >= 0) {
> -close(ctx->epollfd);
> -}
> +aio_epoll_disable(ctx);
>  #endif
>  }
>  
> -- 
> 1.8.3.1
> 



Re: [Qemu-block] [PATCH v6 2/2] iothread: let aio_epoll_disable fit to aio_context_destroy

2018-05-16 Thread Peter Xu
On Thu, May 17, 2018 at 10:26:17AM +0800, Jie Wang wrote:
> epoll_available will only be set if epollfd != -1, os we
> can swap the two variables in aio_epoll_disable, and
> aio_context_destroy can call aio_epoll_disable directly.
> 
> Signed-off-by: Jie Wang 
> ---
>  util/aio-posix.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 0ade2c7..118bf57 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -45,11 +45,11 @@ struct AioHandler
>  
>  static void aio_epoll_disable(AioContext *ctx)
>  {
> -ctx->epoll_available = false;
> -if (!ctx->epoll_enabled) {
> +ctx->epoll_enabled = false;
> +if (!ctx->epoll_available) {
>  return;
>  }
> -ctx->epoll_enabled = false;
> +ctx->epoll_available = false;
>  close(ctx->epollfd);
>  }
>  
> @@ -716,9 +716,7 @@ void aio_context_setup(AioContext *ctx)
>  void aio_context_destroy(AioContext *ctx)
>  {
>  #ifdef CONFIG_EPOLL_CREATE1
> -if (ctx->epollfd >= 0) {
> -close(ctx->epollfd);
> -}
> +aio_epoll_disable(ctx);

Hmm... I think this patch should be the first one if we split.

Anyway, IMHO version 5 is already good enough and has got r-bs, no
need to bother reposting a version 7.  Maintainer could possibly touch
up the commit message if necessary.

Thanks,

>  #endif
>  }
>  
> -- 
> 1.8.3.1
> 

-- 
Peter Xu



Re: [Qemu-block] [PATCH v6 2/2] iothread: let aio_epoll_disable fit to aio_context_destroy

2018-05-16 Thread WangJie (Pluto)
I benefited greatly from your suggestions, and I will improve next time. :)
This time, I ask the maintainers to touch up the commit message based on version 5
and merge it, thanks very much.

On 2018/5/17 14:22, Peter Xu wrote:
> On Thu, May 17, 2018 at 10:26:17AM +0800, Jie Wang wrote:
>> epoll_available will only be set if epollfd != -1, os we
>> can swap the two variables in aio_epoll_disable, and
>> aio_context_destroy can call aio_epoll_disable directly.
>>
>> Signed-off-by: Jie Wang 
>> ---
>>  util/aio-posix.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/util/aio-posix.c b/util/aio-posix.c
>> index 0ade2c7..118bf57 100644
>> --- a/util/aio-posix.c
>> +++ b/util/aio-posix.c
>> @@ -45,11 +45,11 @@ struct AioHandler
>>  
>>  static void aio_epoll_disable(AioContext *ctx)
>>  {
>> -ctx->epoll_available = false;
>> -if (!ctx->epoll_enabled) {
>> +ctx->epoll_enabled = false;
>> +if (!ctx->epoll_available) {
>>  return;
>>  }
>> -ctx->epoll_enabled = false;
>> +ctx->epoll_available = false;
>>  close(ctx->epollfd);
>>  }
>>  
>> @@ -716,9 +716,7 @@ void aio_context_setup(AioContext *ctx)
>>  void aio_context_destroy(AioContext *ctx)
>>  {
>>  #ifdef CONFIG_EPOLL_CREATE1
>> -if (ctx->epollfd >= 0) {
>> -close(ctx->epollfd);
>> -}
>> +aio_epoll_disable(ctx);
> 
> Hmm... I think this patch should be the first if to split.
> 
> Anyway, IMHO version 5 is already good enough and has got r-bs, no
> need to bother reposting a version 7.  Maintainer could possibly touch
> up the commit message if necessary.
> 
> Thanks,
> 
>>  #endif
>>  }
>>  
>> -- 
>> 1.8.3.1
>>
>