Re: [Qemu-block] [PATCH v1 1/1] vmstate: fix failed iotests case 68 and 91

2017-03-16 Thread QingFeng Hao



On 2017/3/16 16:01, Juan Quintela wrote:

QingFeng Hao  wrote:

This problem affects s390x only if we are running without KVM.
Basically, S390CPU.irqstate is unused if we do not use KVM,
and thus no buffer is allocated.
This causes size=0, first_elem=NULL and n_elems=1 in
vmstate_load_state and vmstate_save_state. And the assert fails.
With this fix we can go back to the old behavior and support
VMS_VBUFFER with size 0 and nullptr.

Signed-off-by: QingFeng Hao 
Signed-off-by: Halil Pasic 
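
For illustration, a minimal standalone sketch of the failure mode described
above and of a relaxed check that tolerates it (simplified names, not the
actual vmstate.c code):

#include <assert.h>
#include <stddef.h>

/* A VMS_VBUFFER field whose runtime size is 0 has no allocated buffer,
 * so first_elem is NULL while n_elems is still 1. */
static void save_field(void *first_elem, size_t size, int n_elems)
{
    /* Strict check: fails for first_elem == NULL, n_elems == 1. */
    /* assert(first_elem || !n_elems); */

    /* Relaxed check in the spirit of the fix: a zero-sized buffer may
     * legitimately be NULL. */
    assert(first_elem || !n_elems || !size);

    for (int i = 0; i < n_elems; i++) {
        /* With size == 0 there is nothing to read or write for this element. */
        if (size) {
            char *elem = (char *)first_elem + size * (size_t)i;
            (void)elem; /* the real code would process 'size' bytes from here */
        }
    }
}

int main(void)
{
    save_field(NULL, 0, 1); /* S390CPU.irqstate without KVM: size 0, no buffer */
    return 0;
}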

queued

Thanks!

--
Regards
QingFeng Hao




[Qemu-block] [PATCH for-2.9] block: Propagate error in bdrv_open_backing_file

2017-03-16 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block.c b/block.c
index cb57370..6621167 100644
--- a/block.c
+++ b/block.c
@@ -2025,6 +2025,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 bdrv_set_backing_hd(bs, backing_hd, &local_err);
 bdrv_unref(backing_hd);
 if (local_err) {
+error_propagate(errp, local_err);
 ret = -EINVAL;
 goto free_exit;
 }
-- 
2.9.3




Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 06:20 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 06:15:27PM +0300, Denis V. Lunev wrote:
>> On 03/16/2017 06:08 PM, Daniel P. Berrange wrote:
>>> On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
 On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
>> Hello, All!
>>
>> There is a problem in the current libvirt implementation. domain.xml
>> allows to specify only basic set of options, especially in the case
>> of QEMU, when there are really a lot of tweaks in format drivers.
>> Most likely these options will never be supported in a good way
>> in libvirt as recognizable entities.
>>
>> Right now in order to debug libvirt QEMU VM in production I am using
>> very strange approach:
>> - disk section of domain XML is removed
>> - exact command line options to start the disk are specified at the end
>>   of domain.xml within <qemu:commandline> as described by Stefan
>>  
>> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>>
>> The problem is that when debug is finished and viable combinations of
>> options is found I can not drop VM in such state in the production. This
>> is the pain and problem. For example, I have spend 3 days with the
>> VM of one customer which blames us for slow IO in the guest. I have
>> found very good combination of non-standard options which increases
>> disk performance 5 times (not 5%). Currently I can not put this 
>> combination
>> in the production as libvirt does not see the disk.
>>
>> I propose to do a very simple thing; maybe I am not the first one here,
>> but it would be nice to allow passing arbitrary options to the QEMU
>> command line. This could be done in a very generic way if we allow
>> specifying additional options inside the <disk> section like this:
>>
>> <disk>
>>   <driver ... iothread='1'>
>>   ...
>>   <address ... unit='0'/>
>> </disk>
>>
>> and so on. The meaning (at least for QEMU) is quite simple -
>> these options will just be added to the end of the -drive command
>> line. The meaning for other drivers should be the same and I
>> think that there are ways to pass generic options in them.
> It is a general policy that we do *not* do generic option passthrough
> in this kind of manner. We always want to represent concepts explicitly
> with named attributes, so that if 2 hypervisors support the same concept
> we can map it the same way in the XML
 OK. How could I change L2 cache size for QCOW2 image?

 For 1 Tb disk, fragmented in guest, the performance loss is
 around 10 times. 10 TIMES. 1000%. The customer could not
 wait until proper fix in the next QEMU release especially
 if we are able to provide the kludge specifically for him.
>>> We can explicitly allow L2 cache size set in the XML but that
>>> is a pretty poor solution to the problem IMHO, as the mgmt
>>> application has no apriori knowledge of whether a particular
>>> cache size is going to be right for a particular QCow2 image.
>>>
>>> For a sustainable solution, IMHO this really needs to be fixed
>>> in QEMU so it has either a more appropriate default, or if a
>>> single default is not possible, have QEMU auto-tune its cache
>>> size dynamically to suit the characteristics of the qcow2 image.
>> Yes, I agree. That is why I was talking about the kludge.
>>
>>
 There is an option <qemu:commandline> which specifically
 works like this. It is enabled specifically with a changed schema.
 OK, we can have this option enabled only under the same
 condition. But we have to have a way to solve the problem
 at the moment. Not in 3 months of painful dances within
 the driver. Maybe with limitations like an increased memory
 footprint, but still.
>>> Sure, you can use <qemu:commandline> passthrough - that is the explicit
>>> temporary workaround - we don't provide any guarantee that your guest
>>> won't break when upgrading either libvirt or QEMU though, hence we
>>> mark it as tainted.
>> No and yes. Yes, <qemu:commandline> partially solves the situation.
>> No, this solution has too strong drawbacks IMHO. The configuration
>> of this VM cannot be changed anymore in any viable way, and
>> there are a lot of problems as one disk is absent at the libvirt level.
>>
>> Can we add the option, when the VM config is tainted and the debug
>> schema is enabled, specifically at the disk level? This would be the best
>> partial solution, which will not ruin other management tasks
>> like backup, disk add, etc.
> We really don't want to propagate the custom passthrough into further
> areas of the XML. It is intentionally limited because it is not
> something we want people/apps to use for anything other than a short
> term hack. You should be able to use qemu's  '-set' argument to set
> fields against existing QEMU args, without 
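
For reference, the general form of that option is -set group.id.property=value.
A hedged example (the drive id and whether the qcow2 l2-cache-size property is
reachable this way are assumptions, not something confirmed in this thread):

    -set drive.drive-virtio-disk0.l2-cache-size=128M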

Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 06:08 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
>> On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
>>> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
 Hello, All!

 There is a problem in the current libvirt implementation. domain.xml
 allows to specify only basic set of options, especially in the case
 of QEMU, when there are really a lot of tweaks in format drivers.
 Most likely these options will never be supported in a good way
 in libvirt as recognizable entities.

 Right now in order to debug libvirt QEMU VM in production I am using
 very strange approach:
 - disk section of domain XML is removed
 - exact command line options to start the disk are specified at the end
   of domain.xml within <qemu:commandline> as described by Stefan
  
 http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html

 The problem is that when debug is finished and viable combinations of
 options is found I can not drop VM in such state in the production. This
 is the pain and problem. For example, I have spend 3 days with the
 VM of one customer which blames us for slow IO in the guest. I have
 found very good combination of non-standard options which increases
 disk performance 5 times (not 5%). Currently I can not put this combination
 in the production as libvirt does not see the disk.

 I propose to do a very simple thing; maybe I am not the first one here,
 but it would be nice to allow passing arbitrary options to the QEMU
 command line. This could be done in a very generic way if we allow
 specifying additional options inside the <disk> section like this:

 <disk>
   <driver ... iothread='1'>
   ...
   <address ... unit='0'/>
 </disk>

 and so on. The meaning (at least for QEMU) is quite simple -
 these options will just be added to the end of the -drive command
 line. The meaning for other drivers should be the same and I
 think that there are ways to pass generic options in them.
>>> It is a general policy that we do *not* do generic option passthrough
>>> in this kind of manner. We always want to represent concepts explicitly
>>> with named attributes, so that if 2 hypervisors support the same concept
>>> we can map it the same way in the XML
>> OK. How could I change L2 cache size for QCOW2 image?
>>
>> For 1 Tb disk, fragmented in guest, the performance loss is
>> around 10 times. 10 TIMES. 1000%. The customer could not
>> wait until proper fix in the next QEMU release especially
>> if we are able to provide the kludge specifically for him.
> We can explicitly allow L2 cache size set in the XML but that
> is a pretty poor solution to the problem IMHO, as the mgmt
> application has no apriori knowledge of whether a particular
> cache size is going to be right for a particular QCow2 image.
>
> For a sustainable solution, IMHO this really needs to be fixed
> in QEMU so it has either a more appropriate default, or if a
> single default is not possible, have QEMU auto-tune its cache
> size dynamically to suit the characteristics of the qcow2 image.
Yes, I agree. That is why I was talking about the kludge.


>> There is an option <qemu:commandline> which specifically
>> works like this. It is enabled specifically with a changed schema.
>> OK, we can have this option enabled only under the same
>> condition. But we have to have a way to solve the problem
>> at the moment. Not in 3 months of painful dances within
>> the driver. Maybe with limitations like an increased memory
>> footprint, but still.
> Sure, you can use <qemu:commandline> passthrough - that is the explicit
> temporary workaround - we don't provide any guarantee that your guest
> won't break when upgrading either libvirt or QEMU though, hence we
> mark it as tainted.
No and yes. Yes, <qemu:commandline> partially solves the situation.
No, this solution has too strong drawbacks IMHO. The configuration
of this VM cannot be changed anymore in any viable way, and
there are a lot of problems as one disk is absent at the libvirt level.

Can we add the option, when the VM config is tainted and the debug
schema is enabled, specifically at the disk level? This would be the best
partial solution, which will not ruin other management tasks
like backup, disk add, etc.

Den



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end

2017-03-16 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for 
bdrv_drain_begin/end
Message-id: 20170316212351.13797-1-js...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20170316212351.13797-1-js...@redhat.com -> 
patchew/20170316212351.13797-1-js...@redhat.com
Switched to a new branch 'test'
1cca6f3 blockjob: add devops to blockjob backends
864d906 block-backend: add drained_begin / drained_end ops
5e4f22d blockjob: add block_job_start_shim

=== OUTPUT BEGIN ===
Checking PATCH 1/3: blockjob: add block_job_start_shim...
Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
ERROR: suspect code indent for conditional statements (8, 14)
#70: FILE: block/block-backend.c:1903:
+if (blk->dev_ops && blk->dev_ops->drained_end) {
+  blk->dev_ops->drained_end(blk->dev_opaque);

total: 1 errors, 0 warnings, 67 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/3: blockjob: add devops to blockjob backends...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-block] [PATCH v2 3/3] blockjob: add devops to blockjob backends

2017-03-16 Thread John Snow
This lets us hook into drained_begin and drained_end requests from the
backend level, which is particularly useful for making sure that all
jobs associated with a particular node (whether the source or the target)
receive a drain request.

Suggested-by: Kevin Wolf 
Signed-off-by: John Snow 
---
 blockjob.c | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 69b4ec6..a11d5ce 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -68,6 +68,23 @@ static const BdrvChildRole child_job = {
 .stay_at_node   = true,
 };
 
+static void block_job_drained_begin(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_pause(job);
+}
+
+static void block_job_drained_end(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_resume(job);
+}
+
+static const BlockDevOps block_job_dev_ops = {
+.drained_begin = block_job_drained_begin,
+.drained_end = block_job_drained_end,
+};
+
 BlockJob *block_job_next(BlockJob *job)
 {
 if (!job) {
@@ -205,11 +222,6 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 }
 
 job = g_malloc0(driver->instance_size);
-error_setg(&job->blocker, "block device is in use by block job: %s",
-   BlockJobType_lookup[driver->job_type]);
-block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, &error_abort);
-bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
-
 job->driver= driver;
 job->id= g_strdup(job_id);
 job->blk   = blk;
@@ -219,8 +231,15 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->paused= true;
 job->pause_count   = 1;
 job->refcnt= 1;
+
+error_setg(&job->blocker, "block device is in use by block job: %s",
+   BlockJobType_lookup[driver->job_type]);
+block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, &error_abort);
 bs->job = job;
 
+blk_set_dev_ops(blk, &block_job_dev_ops, job);
+bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
+
 QLIST_INSERT_HEAD(&block_jobs, job, job_list);
 
 blk_add_aio_context_notifier(blk, block_job_attached_aio_context,
-- 
2.9.3




[Qemu-block] [PATCH v2 2/3] block-backend: add drained_begin / drained_end ops

2017-03-16 Thread John Snow
Allow block backends to forward drain requests to their devices/users.
The initial intended purpose for this patch is to allow BBs to forward
requests along to BlockJobs, which will want to pause if their associated
BB has entered a drained region.

Signed-off-by: John Snow 
---
 block/block-backend.c  | 24 ++--
 include/sysemu/block-backend.h |  8 
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..dec4a1c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -65,6 +65,8 @@ struct BlockBackend {
 bool allow_write_beyond_eof;
 
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
+
+int quiesce_counter;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -699,12 +701,17 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
  void *opaque)
 {
 /* All drivers that use blk_set_dev_ops() are qdevified and we want to keep
- * it that way, so we can assume blk->dev is a DeviceState if blk->dev_ops
- * is set. */
+ * it that way, so we can assume blk->dev, if present, is a DeviceState if
+ * blk->dev_ops is set. Non-device users may use dev_ops without device. */
 assert(!blk->legacy_dev);
 
 blk->dev_ops = ops;
 blk->dev_opaque = opaque;
+
+/* Are we currently quiesced? Should we enforce this right now? */
+if (blk->quiesce_counter && ops->drained_begin) {
+ops->drained_begin(opaque);
+}
 }
 
 /*
@@ -1870,6 +1877,12 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
 
+if (++blk->quiesce_counter == 1) {
+if (blk->dev_ops && blk->dev_ops->drained_begin) {
+blk->dev_ops->drained_begin(blk->dev_opaque);
+}
+}
+
 /* Note that blk->root may not be accessible here yet if we are just
  * attaching to a BlockDriverState that is drained. Use child instead. */
 
@@ -1881,7 +1894,14 @@ static void blk_root_drained_begin(BdrvChild *child)
 static void blk_root_drained_end(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
+assert(blk->quiesce_counter);
 
 assert(blk->public.io_limits_disabled);
 --blk->public.io_limits_disabled;
+
+if (--blk->quiesce_counter == 0) {
+if (blk->dev_ops && blk->dev_ops->drained_end) {
+  blk->dev_ops->drained_end(blk->dev_opaque);
+}
+}
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 096c17f..7462228 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -58,6 +58,14 @@ typedef struct BlockDevOps {
  * Runs when the size changed (e.g. monitor command block_resize)
  */
 void (*resize_cb)(void *opaque);
+/*
+ * Runs when the backend receives a drain request.
+ */
+void (*drained_begin)(void *opaque);
+/*
+ * Runs when the backend's last drain request ends.
+ */
+void (*drained_end)(void *opaque);
 } BlockDevOps;
 
 /* This struct is embedded in (the private) BlockBackend struct and contains
-- 
2.9.3




[Qemu-block] [PATCH v2 1/3] blockjob: add block_job_start_shim

2017-03-16 Thread John Snow
The purpose of this shim is to allow us to pause pre-started jobs.
The purpose of *that* is to allow us to buffer a pause request that
will be able to take effect before the job ever does any work, allowing
us to create jobs during a quiescent state (under which they will be
automatically paused), then resuming the jobs after the critical section
in any order, either:

(1) -block_job_start
-block_job_resume (via e.g. drained_end)

(2) -block_job_resume (via e.g. drained_end)
-block_job_start

The problem that requires a startup wrapper is the idea that a job must
start in the busy=true state only its first time-- all subsequent entries
require busy to be false, and the toggling of this state is otherwise
handled during existing pause and yield points.

The wrapper simply allows us to mandate that a job can "start," set busy
to true, then immediately pause only if necessary. We could avoid
requiring a wrapper, but all jobs would need to do it, so it's been
factored out here.

Signed-off-by: John Snow 
---
 blockjob.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 69126af..69b4ec6 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -250,16 +250,28 @@ static bool block_job_started(BlockJob *job)
 return job->co;
 }
 
+/**
+ * All jobs must allow a pause point before entering their job proper. This
+ * ensures that jobs can be paused prior to being started, then resumed later.
+ */
+static void coroutine_fn block_job_co_entry(void *opaque)
+{
+BlockJob *job = opaque;
+
+assert(job && job->driver && job->driver->start);
+block_job_pause_point(job);
+job->driver->start(job);
+}
+
 void block_job_start(BlockJob *job)
 {
 assert(job && !block_job_started(job) && job->paused &&
-   !job->busy && job->driver->start);
-job->co = qemu_coroutine_create(job->driver->start, job);
-if (--job->pause_count == 0) {
-job->paused = false;
-job->busy = true;
-qemu_coroutine_enter(job->co);
-}
+   job->driver && job->driver->start);
+job->co = qemu_coroutine_create(block_job_co_entry, job);
+job->pause_count--;
+job->busy = true;
+job->paused = false;
+qemu_coroutine_enter(job->co);
 }
 
 void block_job_ref(BlockJob *job)
-- 
2.9.3




[Qemu-block] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end

2017-03-16 Thread John Snow
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1367369#c8

It's possible to wedge QEMU if the guest tries to reset a virtio-pci
device as QEMU is also using the drive for a blockjob. This patchset
aims to allow us to safely pause/resume jobs attached to individual
nodes in a manner similar to how bdrv_drain_all_begin/end do.

John Snow (3):
  blockjob: add block_job_start_shim
  block-backend: add drained_begin / drained_end ops
  blockjob: add devops to blockjob backends

 block/block-backend.c  | 24 --
 blockjob.c | 55 +-
 include/sysemu/block-backend.h |  8 ++
 3 files changed, 73 insertions(+), 14 deletions(-)

-- 
2.9.3




Re: [Qemu-block] [Qemu-stable] [PATCH for-2.9] thread-pool: add missing qemu_bh_cancel in completion function

2017-03-16 Thread Peter Lieven

> On 16.03.2017 at 17:18, Paolo Bonzini wrote:
> 
> 
> 
> On 16/03/2017 17:02, Peter Lieven wrote:
>> commit 3c80ca15 fixed a deadlock scenario with nested aio_poll invocations.
>> 
>> However, the rescheduling of the completion BH introduced unnecessary 
>> spinning
>> in the main-loop. On very fast file backends this can even lead to the
>> "WARNING: I/O thread spun for 1000 iterations" message popping up.
>> 
>> Callgrind reports about 3-4% less instructions with this patch running
>> qemu-img bench on a ramdisk based VMDK file.
>> 
>> Fixes: 3c80ca158c96ff902a30883a8933e755988948b1
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Peter Lieven 
>> ---
>> util/thread-pool.c | 7 +++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/util/thread-pool.c b/util/thread-pool.c
>> index ce6cd30..610646d 100644
>> --- a/util/thread-pool.c
>> +++ b/util/thread-pool.c
>> @@ -188,6 +188,13 @@ restart:
>> aio_context_release(pool->ctx);
>> elem->common.cb(elem->common.opaque, elem->ret);
>> aio_context_acquire(pool->ctx);
>> +
>> +/* We can safely cancel the completion_bh here regardless of 
>> someone
>> + * else having scheduled it meanwhile because we reenter the
>> + * completion function anyway (goto restart).
>> + */
>> +qemu_bh_cancel(pool->completion_bh);
>> +
>> qemu_aio_unref(elem);
>> goto restart;
>> } else {
>> 
> 
> Right, this is the same thing that block/linux-aio.c does.


Okay, so you also think it's safe to do this? As far as I have seen you have 
been working a lot on the aio code recently.

Thanks,
Peter

Re: [Qemu-block] [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops

2017-03-16 Thread John Snow


On 03/16/2017 01:25 PM, Kevin Wolf wrote:
> On 16.03.2017 at 01:46, John Snow wrote:
>> Signed-off-by: John Snow 
>>
>> ---
>>
>> RFC questions:
>>
>> - Does the presence of blk->quiesce_counter relieve the burden of needing
>>   blk->public.io_limits_disabled? I could probably eliminate this counter
>>   entirely and just spy on the root node's quiescent state at key moments
>>   instead. I am confident I'm traipsing on delicate drain semantics.
> 
> Not for 2.9 anyway. :-)
> 
>> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>>   as an analogue to how blk_insert_bs (in this patch) handles this?
> 
> It's already taken care of, you don't need to add anything for that
> (see below).
> 
>> Signed-off-by: John Snow 
>> ---
>>  block/block-backend.c  | 25 +
>>  include/sysemu/block-backend.h |  8 
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 5742c09..eb85e8b 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -65,6 +65,8 @@ struct BlockBackend {
>>  bool allow_write_beyond_eof;
>>  
>>  NotifierList remove_bs_notifiers, insert_bs_notifiers;
>> +
>> +int quiesce_counter;
>>  };
>>  
>>  typedef struct BlockBackendAIOCB {
>> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState 
>> *bs, Error **errp)
>>  }
>>  bdrv_ref(bs);
>>  
>> +/* The new BDS may be quiescent, we should attempt to match */
>> +if (bs->quiesce_counter) {
>> +blk_root_drained_begin(blk->root);
>> +}
> 
> Do you really need this hunk? BB and BDS should already be kept in sync,
> it's just the BB and its user that aren't. This is the call chain that
> already leads to blk_root_drained_begin():
> 

Oh, good call..!

> blk_insert_bs
> bdrv_root_attach_child
> bdrv_replace_child
> bdrv_replace_child_noperm
> child->role->drained_begin(child)
> 
> Did you check whether blk->public.io_limits_disabled ever goes back to 0
> with your patch? I think it's increased twice and decreased only once.
> 

I didn't, but with your change above, it works out just fine.

> To answer your RFC question, bdrv_replace_child_noperm() also takes care
> of calling drained_end() when the BDS is detached from the BB.
> 

Excellent.

>>  notifier_list_notify(&blk->insert_bs_notifiers, blk);
>>  if (blk->public.throttle_state) {
>>  throttle_timers_attach_aio_context(
>> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const 
>> BlockDevOps *ops,
>>  
>>  blk->dev_ops = ops;
>>  blk->dev_opaque = opaque;
>> +
>> +/* Are we currently quiesced? Should we enforce this right now? */
>> +if (blk->quiesce_counter && ops->drained_begin) {
>> +ops->drained_begin(opaque);
>> +}
>>  }
>>  
>>  /*
>> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>>  {
>>  BlockBackend *blk = child->opaque;
>>  
>> +blk->quiesce_counter++;
>> +
>>  /* Note that blk->root may not be accessible here yet if we are just
>>   * attaching to a BlockDriverState that is drained. Use child instead. 
>> */
>>  
>> +if (blk->dev_ops && blk->dev_ops->drained_begin) {
>> +blk->dev_ops->drained_begin(blk->dev_opaque);
>> +}
> 
> Should we do this only if blk->quiesce_counter was 0? (And
> dev_ops->drained_end() when it reaches 0 again.)
> 

Ah yes, actually, that led to the question about .io_limits_disabled,
because I could have both begin/end only operate on the edge between 0/1.

> The difference is that the implementation of the callback would have to
> have its own counter as it is in this patch. Not really that bad, but at
> the moment I don't see a reason why we would need to support nested
> drained sections for dev_ops.
> 

Coincidentally, block_job_pause already supports counters! Entirely by
accident this worked.

>> +
>>  if (blk->public.io_limits_disabled++ == 0) {
>>  throttle_group_restart_blk(blk);
>>  }
>> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>>  {
>>  BlockBackend *blk = child->opaque;
>>  
>> +assert(blk->quiesce_counter);
>> +blk->quiesce_counter--;
>> +
>>  assert(blk->public.io_limits_disabled);
>>  --blk->public.io_limits_disabled;
>> +
>> +if (blk->dev_ops && blk->dev_ops->drained_end) {
>> +blk->dev_ops->drained_end(blk->dev_opaque);
>> +}
> 
> The basic idea looks good to me.
> 
> Kevin
> 

I'll send a v2 shortly, ever-so-slightly touched up.

--js



Re: [Qemu-block] [RFC patch 3/3] blockjob: add devops to blockjob backends

2017-03-16 Thread Kevin Wolf
On 16.03.2017 at 01:46, John Snow wrote:
> This lets us hook into drained_begin and drained_end requests from the
> backend level, which is particularly useful for making sure that all
> jobs associated with a particular node (whether the source or the target)
> receive a drain request.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: John Snow 
> 
> --
> 
> RFC topics:
> 
> - BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
>   currently a 'device'... Do we want to loosen this restriction, find another
>   way to deliver callbacks to BlockJobs attached to BlkBackends, or do 
> something
>   crazy like make a BlockJob device object?
> 
>   struct JobDevice {
> DeviceState parent_obj;
> BlockJob *job;
>   } ...??

We should probably rename BlkDevOps to something that works not only for
devices, but for any user of a BlockBackend. I don't think the
implementation has to be changed, it's just a struct of callbacks and an
opaque pointer that is actually properly treated as opaque.

BlockBackend also has a dev field, which is indeed supposed to be a
DeviceState and is sometimes cast to one (if we can assert
!blk->legacy_dev), but it's a concept completely separate from
BlkDevOps. So we just need to be sure not to call blk_attach_dev() or
blk_attach_dev_legacy().

Kevin



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
>> Hello, All!
>>
>> There is a problem in the current libvirt implementation. domain.xml
>> allows to specify only basic set of options, especially in the case
>> of QEMU, when there are really a lot of tweaks in format drivers.
>> Most likely these options will never be supported in a good way
>> in libvirt as recognizable entities.
>>
>> Right now in order to debug libvirt QEMU VM in production I am using
>> very strange approach:
>> - disk section of domain XML is removed
>> - exact command line options to start the disk are specified at the end
>>   of domain.xml within <qemu:commandline> as described by Stefan
>>  
>> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>>
>> The problem is that when debug is finished and viable combinations of
>> options is found I can not drop VM in such state in the production. This
>> is the pain and problem. For example, I have spend 3 days with the
>> VM of one customer which blames us for slow IO in the guest. I have
>> found very good combination of non-standard options which increases
>> disk performance 5 times (not 5%). Currently I can not put this combination
>> in the production as libvirt does not see the disk.
>>
>> I propose to do a very simple thing; maybe I am not the first one here,
>> but it would be nice to allow passing arbitrary options to the QEMU
>> command line. This could be done in a very generic way if we allow
>> specifying additional options inside the <disk> section like this:
>>
>> <disk>
>>   <driver ... iothread='1'>
>>   ...
>>   <address ... unit='0'/>
>> </disk>
>>
>> and so on. The meaning (at least for QEMU) is quite simple -
>> these options will just be added to the end of the -drive command
>> line. The meaning for other drivers should be the same and I
>> think that there are ways to pass generic options in them.
> It is a general policy that we do *not* do generic option passthrough
> in this kind of manner. We always want to represent concepts explicitly
> with named attributes, so that if 2 hypervisors support the same concept
> we can map it the same way in the XML
>
> Regards,
> Daniel
In general this policy means that management software which
wants to implement some differentiation between VMs, e.g.
in disk tuning, is forced to use the qemu:commandline backdoor.
That is a pity. Exactly like in the case with additional logs.

OK.

Thank you for the discussion. At least I have found a new way
to perform some fine tuning.

Den



Re: [Qemu-block] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops

2017-03-16 Thread Kevin Wolf
On 16.03.2017 at 01:46, John Snow wrote:
> Signed-off-by: John Snow 
> 
> ---
> 
> RFC questions:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Not for 2.9 anyway. :-)

> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

It's already taken care of, you don't need to add anything for that
(see below).

> Signed-off-by: John Snow 
> ---
>  block/block-backend.c  | 25 +
>  include/sysemu/block-backend.h |  8 
>  2 files changed, 33 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..eb85e8b 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -65,6 +65,8 @@ struct BlockBackend {
>  bool allow_write_beyond_eof;
>  
>  NotifierList remove_bs_notifiers, insert_bs_notifiers;
> +
> +int quiesce_counter;
>  };
>  
>  typedef struct BlockBackendAIOCB {
> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState 
> *bs, Error **errp)
>  }
>  bdrv_ref(bs);
>  
> +/* The new BDS may be quiescent, we should attempt to match */
> +if (bs->quiesce_counter) {
> +blk_root_drained_begin(blk->root);
> +}

Do you really need this hunk? BB and BDS should already be kept in sync,
it's just the BB and its user that aren't. This is the call chain that
already leads to blk_root_drained_begin():

blk_insert_bs
bdrv_root_attach_child
bdrv_replace_child
bdrv_replace_child_noperm
child->role->drained_begin(child)

Did you check whether blk->public.io_limits_disabled ever goes back to 0
with your patch? I think it's increased twice and decreased only once.

To answer your RFC question, bdrv_replace_child_noperm() also takes care
of calling drained_end() when the BDS is detached from the BB.

>  notifier_list_notify(&blk->insert_bs_notifiers, blk);
>  if (blk->public.throttle_state) {
>  throttle_timers_attach_aio_context(
> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const 
> BlockDevOps *ops,
>  
>  blk->dev_ops = ops;
>  blk->dev_opaque = opaque;
> +
> +/* Are we currently quiesced? Should we enforce this right now? */
> +if (blk->quiesce_counter && ops->drained_begin) {
> +ops->drained_begin(opaque);
> +}
>  }
>  
>  /*
> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>  {
>  BlockBackend *blk = child->opaque;
>  
> +blk->quiesce_counter++;
> +
>  /* Note that blk->root may not be accessible here yet if we are just
>   * attaching to a BlockDriverState that is drained. Use child instead. */
>  
> +if (blk->dev_ops && blk->dev_ops->drained_begin) {
> +blk->dev_ops->drained_begin(blk->dev_opaque);
> +}

Should we do this only if blk->quiesce_counter was 0? (And
dev_ops->drained_end() when it reaches 0 again.)

The difference is that the implementation of the callback would have to
have its own counter as it is in this patch. Not really that bad, but at
the moment I don't see a reason why we would need to support nested
drained sections for dev_ops.

> +
>  if (blk->public.io_limits_disabled++ == 0) {
>  throttle_group_restart_blk(blk);
>  }
> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>  {
>  BlockBackend *blk = child->opaque;
>  
> +assert(blk->quiesce_counter);
> +blk->quiesce_counter--;
> +
>  assert(blk->public.io_limits_disabled);
>  --blk->public.io_limits_disabled;
> +
> +if (blk->dev_ops && blk->dev_ops->drained_end) {
> +blk->dev_ops->drained_end(blk->dev_opaque);
> +}

The basic idea looks good to me.

Kevin



[Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
Hello, All!

There is a problem in the current libvirt implementation: domain.xml
allows specifying only a basic set of options, especially in the case
of QEMU, where there are really a lot of tweaks in the format drivers.
Most likely these options will never be supported in a good way
in libvirt as recognizable entities.

Right now, in order to debug a libvirt QEMU VM in production, I am using
a very strange approach:
- the disk section of the domain XML is removed
- the exact command line options to start the disk are specified at the end
  of domain.xml within <qemu:commandline> as described by Stefan at
  http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
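
In domain XML that workaround looks roughly like this (a minimal sketch; the
drive string and path are made-up examples, while the qemu:commandline and
qemu:arg elements and the namespace declaration are the standard libvirt
mechanism):

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  ...
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/var/lib/libvirt/images/vm.qcow2,format=qcow2,if=virtio,l2-cache-size=16M'/>
  </qemu:commandline>
</domain>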

The problem is that when debugging is finished and a viable combination of
options is found, I cannot leave the VM in such a state in production. This
is the pain and the problem. For example, I have spent 3 days with the
VM of one customer who blames us for slow IO in the guest. I have
found a very good combination of non-standard options which increases
disk performance 5 times (not 5%). Currently I cannot put this combination
into production as libvirt does not see the disk.

I propose to do a very simple thing; maybe I am not the first one here,
but it would be nice to allow passing arbitrary options to the QEMU
command line. This could be done in a very generic way if we allow
specifying additional options inside the <disk> section like this:

<disk>
  <driver ... iothread='1'>
  ...
  <address ... unit='0'/>
</disk>

and so on. The meaning (at least for QEMU) is quite simple -
these options would just be added to the end of the -drive command
line. The meaning for other drivers should be the same, and I
think there are ways to pass generic options in them.

Any opinion?

Den



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
>> Hello, All!
>>
>> There is a problem in the current libvirt implementation. domain.xml
>> allows to specify only basic set of options, especially in the case
>> of QEMU, when there are really a lot of tweaks in format drivers.
>> Most likely these options will never be supported in a good way
>> in libvirt as recognizable entities.
>>
>> Right now in order to debug libvirt QEMU VM in production I am using
>> very strange approach:
>> - disk section of domain XML is removed
>> - exact command line options to start the disk are specified at the end
>>   of domain.xml within <qemu:commandline> as described by Stefan
>>  
>> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>>
>> The problem is that when debug is finished and viable combinations of
>> options is found I can not drop VM in such state in the production. This
>> is the pain and problem. For example, I have spend 3 days with the
>> VM of one customer which blames us for slow IO in the guest. I have
>> found very good combination of non-standard options which increases
>> disk performance 5 times (not 5%). Currently I can not put this combination
>> in the production as libvirt does not see the disk.
>>
>> I propose to do a very simple thing; maybe I am not the first one here,
>> but it would be nice to allow passing arbitrary options to the QEMU
>> command line. This could be done in a very generic way if we allow
>> specifying additional options inside the <disk> section like this:
>>
>> <disk>
>>   <driver ... iothread='1'>
>>   ...
>>   <address ... unit='0'/>
>> </disk>
>>
>> and so on. The meaning (at least for QEMU) is quite simple -
>> these options will just be added to the end of the -drive command
>> line. The meaning for other drivers should be the same and I
>> think that there are ways to pass generic options in them.
> It is a general policy that we do *not* do generic option passthrough
> in this kind of manner. We always want to represent concepts explicitly
> with named attributes, so that if 2 hypervisors support the same concept
> we can map it the same way in the XML
>
> Regards,
> Daniel
OK. How could I change the L2 cache size for a QCOW2 image?

For a 1 TB disk, fragmented in the guest, the performance loss is
around 10 times. 10 TIMES. 1000%. The customer could not
wait for a proper fix in the next QEMU release, especially
if we are able to provide the kludge specifically for him.

There is an option <qemu:commandline> which specifically
works like this. It is enabled specifically with a changed schema.
OK, we can have this option enabled only under the same
condition. But we have to have a way to solve the problem
at the moment. Not in 3 months of painful dances within
the driver. Maybe with limitations like an increased memory
footprint, but still.

Will it work for you?

Den



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Kevin Wolf
On 16.03.2017 at 16:52, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
> > On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
> > > On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> > > > On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> > > > > On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> > > > >> Hello, All!
> > > > >>
> > > > >> There is a problem in the current libvirt implementation. domain.xml
> > > > >> allows to specify only basic set of options, especially in the case
> > > > >> of QEMU, when there are really a lot of tweaks in format drivers.
> > > > >> Most likely these options will never be supported in a good way
> > > > >> in libvirt as recognizable entities.
> > > > >>
> > > > >> Right now in order to debug libvirt QEMU VM in production I am using
> > > > >> very strange approach:
> > > > >> - disk section of domain XML is removed
> > > > >> - exact command line options to start the disk are specified at the 
> > > > >> end
> > > > >>   of domain.xml within <qemu:commandline> as described by Stefan
> > > > >>  
> > > > >> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> > > > >>
> > > > >> The problem is that when debug is finished and viable combinations of
> > > > >> options is found I can not drop VM in such state in the production. 
> > > > >> This
> > > > >> is the pain and problem. For example, I have spend 3 days with the
> > > > >> VM of one customer which blames us for slow IO in the guest. I have
> > > > >> found very good combination of non-standard options which increases
> > > > >> disk performance 5 times (not 5%). Currently I can not put this 
> > > > >> combination
> > > > >> in the production as libvirt does not see the disk.
> > > > >>
> > > > >> I propose to do a very simple thing; maybe I am not the first one here,
> > > > >> but it would be nice to allow passing arbitrary options to the QEMU
> > > > >> command line. This could be done in a very generic way if we allow
> > > > >> specifying additional options inside the <disk> section like this:
> > > > >>
> > > > >> <disk>
> > > > >>   <driver ... iothread='1'>
> > > > >>   ...
> > > > >>   <address ... unit='0'/>
> > > > >> </disk>
> > > > >>
> > > > >> and so on. The meaning (at least for QEMU) is quite simple -
> > > > >> these options will just be added to the end of the -drive command
> > > > >> line. The meaning for other drivers should be the same and I
> > > > >> think that there are ways to pass generic options in them.
> > > > > It is a general policy that we do *not* do generic option passthrough
> > > > > in this kind of manner. We always want to represent concepts 
> > > > > explicitly
> > > > > with named attributes, so that if 2 hypervisors support the same 
> > > > > concept
> > > > > we can map it the same way in the XML
> > > >
> > > > OK. How could I change L2 cache size for QCOW2 image?
> > > > 
> > > > For 1 Tb disk, fragmented in guest, the performance loss is
> > > > around 10 times. 10 TIMES. 1000%. The customer could not
> > > > wait until proper fix in the next QEMU release especially
> > > > if we are able to provide the kludge specifically for him.
> > > 
> > > We can explicitly allow L2 cache size set in the XML but that
> > > is a pretty poor solution to the problem IMHO, as the mgmt
> > > application has no apriori knowledge of whether a particular
> > > cache size is going to be right for a particular QCow2 image.
> > > 
> > > For a sustainable solution, IMHO this really needs to be fixed
> > > in QEMU so it has either a more appropriate default, or if a
> > > single default is not possible, have QEMU auto-tune its cache
> > > size dynamically to suit the characteristics of the qcow2 image.
> > 
> > A tradeoff between memory usage and performance is policy, and setting
> > policy is the management layer's job, not qemu's. We can try to provide
> > good defaults, but they are meant for manual users of qemu. libvirt is
> > expected to configure everything exactly as it wants it instead of
> > relying on defaults.
> 
> The question though is how is an app supposed to figure out what the
> optimal setting for cache size is?  It seems to require knowledge
> of the level of disk fragmentation and guest I/O patterns, neither
> of which are things we can know upfront. Which means any attempt to
> set cache size is little more than ill-informed guesswork

No, it requires knowledge of whether the user prefers to spend more memory
for improved disk performance of this VM, at the cost of other VMs or
applications running on the machine. And this is something that qemu
can't really figure out any better than libvirt.

If you don't care about that at all, the optimal configuration in terms
of performance is to give qemu a cache large enough that the metadata of
the whole image fits in it. When setting cache-clean-interval, 
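
For context on the numbers discussed here: QEMU's qcow2 cache documentation
gives the rule of thumb that covering a whole image needs about
disk_size * 8 / cluster_size bytes of L2 cache. A minimal sketch of that
arithmetic for the 1 TB case mentioned above (image and cluster sizes are
assumptions):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t disk_size    = 1ULL << 40;  /* 1 TiB image            */
    uint64_t cluster_size = 64 * 1024;   /* default 64 KiB cluster */
    uint64_t l2_cache     = disk_size * 8 / cluster_size;

    /* Prints 128: covering the whole image needs ~128 MiB of L2 cache. */
    printf("l2-cache-size = %" PRIu64 " MiB\n", l2_cache >> 20);
    return 0;
}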

Re: [Qemu-block] [PATCH for-2.9] thread-pool: add missing qemu_bh_cancel in completion function

2017-03-16 Thread Paolo Bonzini


On 16/03/2017 17:02, Peter Lieven wrote:
> commit 3c80ca15 fixed a deadlock scenario with nested aio_poll invocations.
> 
> However, the rescheduling of the completion BH introduced unnecessary spinning
> in the main-loop. On very fast file backends this can even lead to the
> "WARNING: I/O thread spun for 1000 iterations" message popping up.
> 
> Callgrind reports about 3-4% less instructions with this patch running
> qemu-img bench on a ramdisk based VMDK file.
> 
> Fixes: 3c80ca158c96ff902a30883a8933e755988948b1
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Peter Lieven 
> ---
>  util/thread-pool.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/util/thread-pool.c b/util/thread-pool.c
> index ce6cd30..610646d 100644
> --- a/util/thread-pool.c
> +++ b/util/thread-pool.c
> @@ -188,6 +188,13 @@ restart:
>  aio_context_release(pool->ctx);
>  elem->common.cb(elem->common.opaque, elem->ret);
>  aio_context_acquire(pool->ctx);
> +
> +/* We can safely cancel the completion_bh here regardless of 
> someone
> + * else having scheduled it meanwhile because we reenter the
> + * completion function anyway (goto restart).
> + */
> +qemu_bh_cancel(pool->completion_bh);
> +
>  qemu_aio_unref(elem);
>  goto restart;
>  } else {
> 

Right, this is the same thing that block/linux-aio.c does.

Paolo



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 06:52 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
>> On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
>>> On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
 On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
>> Hello, All!
>>
>> There is a problem in the current libvirt implementation. domain.xml
>> allows to specify only basic set of options, especially in the case
>> of QEMU, when there are really a lot of tweaks in format drivers.
>> Most likely these options will never be supported in a good way
>> in libvirt as recognizable entities.
>>
>> Right now in order to debug libvirt QEMU VM in production I am using
>> very strange approach:
>> - disk section of domain XML is removed
>> - exact command line options to start the disk are specified at the end
>>   of domain.xml within <qemu:commandline> as described by Stefan
>>  
>> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>>
>> The problem is that when debug is finished and viable combinations of
>> options is found I can not drop VM in such state in the production. This
>> is the pain and problem. For example, I have spend 3 days with the
>> VM of one customer which blames us for slow IO in the guest. I have
>> found very good combination of non-standard options which increases
>> disk performance 5 times (not 5%). Currently I can not put this 
>> combination
>> in the production as libvirt does not see the disk.
>>
>> I propose to do a very simple thing; maybe I am not the first one here,
>> but it would be nice to allow passing arbitrary options to the QEMU
>> command line. This could be done in a very generic way if we allow
>> specifying additional options inside the <disk> section like this:
>>
>> <disk>
>>   <driver ... iothread='1'>
>>   ...
>>   <address ... unit='0'/>
>> </disk>
>>
>> and so on. The meaning (at least for QEMU) is quite simple -
>> these options will just be added to the end of the -drive command
>> line. The meaning for other drivers should be the same and I
>> think that there are ways to pass generic options in them.
> It is a general policy that we do *not* do generic option passthrough
> in this kind of manner. We always want to represent concepts explicitly
> with named attributes, so that if 2 hypervisors support the same concept
> we can map it the same way in the XML
 OK. How could I change L2 cache size for QCOW2 image?

 For 1 Tb disk, fragmented in guest, the performance loss is
 around 10 times. 10 TIMES. 1000%. The customer could not
 wait until proper fix in the next QEMU release especially
 if we are able to provide the kludge specifically for him.
>>> We can explicitly allow L2 cache size set in the XML but that
>>> is a pretty poor solution to the problem IMHO, as the mgmt
>>> application has no apriori knowledge of whether a particular
>>> cache size is going to be right for a particular QCow2 image.
>>>
>>> For a sustainable solution, IMHO this really needs to be fixed
>>> in QEMU so it has either a more appropriate default, or if a
>>> single default is not possible, have QEMU auto-tune its cache
>>> size dynamically to suit the characteristics of the qcow2 image.
>> A tradeoff between memory usage and performance is policy, and setting
>> policy is the management layer's job, not qemu's. We can try to provide
>> good defaults, but they are meant for manual users of qemu. libvirt is
>> expected to configure everything exactly as it wants it instead of
>> relying on defaults.
> The question though is how is an app supposed to figure out what the
> optimal setting for cache size is?  It seems to require knowledge
> of the level of disk fragmentation and guest I/O patterns, neither
> of which are things we can know upfront. Which means any attempt to
> set cache size is little more than ill-informed guesswork
>
>
> Regards,
> Daniel
The funny thing is that this information could come from the outside world,
e.g. from the SLA, which depends on the amount of money the
end user is paying to the hosting provider.

Den



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Denis V. Lunev
On 03/16/2017 06:35 PM, Kevin Wolf wrote:
> On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
>> On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
>>> On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
 On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> Hello, All!
>
> There is a problem in the current libvirt implementation. domain.xml
> allows to specify only basic set of options, especially in the case
> of QEMU, when there are really a lot of tweaks in format drivers.
> Most likely these options will never be supported in a good way
> in libvirt as recognizable entities.
>
> Right now in order to debug libvirt QEMU VM in production I am using
> very strange approach:
> - disk section of domain XML is removed
> - exact command line options to start the disk are specified at the end
>   of domain.xml within <qemu:commandline> as described by Stefan
>  
> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>
> The problem is that when debug is finished and viable combinations of
> options is found I can not drop VM in such state in the production. This
> is the pain and problem. For example, I have spend 3 days with the
> VM of one customer which blames us for slow IO in the guest. I have
> found very good combination of non-standard options which increases
> disk performance 5 times (not 5%). Currently I can not put this 
> combination
> in the production as libvirt does not see the disk.
>
> I propose to do a very simple thing; maybe I am not the first one here,
> but it would be nice to allow passing arbitrary options to the QEMU
> command line. This could be done in a very generic way if we allow
> specifying additional options inside the <disk> section like this:
>
> <disk>
>   <driver ... iothread='1'>
>   ...
>   <address ... unit='0'/>
> </disk>
>
> and so on. The meaning (at least for QEMU) is quite simple -
> these options will just be added to the end of the -drive command
> line. The meaning for other drivers should be the same and I
> think that there are ways to pass generic options in them.
 It is a general policy that we do *not* do generic option passthrough
 in this kind of manner. We always want to represent concepts explicitly
 with named attributes, so that if 2 hypervisors support the same concept
 we can map it the same way in the XML
>>> OK. How could I change L2 cache size for QCOW2 image?
>>>
>>> For 1 Tb disk, fragmented in guest, the performance loss is
>>> around 10 times. 10 TIMES. 1000%. The customer could not
>>> wait until proper fix in the next QEMU release especially
>>> if we are able to provide the kludge specifically for him.
>> We can explicitly allow L2 cache size set in the XML but that
>> is a pretty poor solution to the problem IMHO, as the mgmt
>> application has no apriori knowledge of whether a particular
>> cache size is going to be right for a particular QCow2 image.
>>
>> For a sustainable solution, IMHO this really needs to be fixed
>> in QEMU so it has either a more appropriate default, or if a
>> single default is not possible, have QEMU auto-tune its cache
>> size dynamically to suit the characteristics of the qcow2 image.
> A tradeoff between memory usage and performance is policy, and setting
> policy is the management layer's job, not qemu's. We can try to provide
> good defaults, but they are meant for manual users of qemu. libvirt is
> expected to configure everything exactly as it wants it instead of
> relying on defaults.
>
> Kevin
Exactly. We can have VMs with cold data and a reduced memory footprint,
and VMs with hot data and maximum IO capacity. The requirements from
management are completely different.

Den



[Qemu-block] [PATCH for-2.9] thread-pool: add missing qemu_bh_cancel in completion function

2017-03-16 Thread Peter Lieven
commit 3c80ca15 fixed a deadlock scenario with nested aio_poll invocations.

However, the rescheduling of the completion BH introduced unnecessary spinning
in the main-loop. On very fast file backends this can even lead to the
"WARNING: I/O thread spun for 1000 iterations" message popping up.

Callgrind reports about 3-4% less instructions with this patch running
qemu-img bench on a ramdisk based VMDK file.

Fixes: 3c80ca158c96ff902a30883a8933e755988948b1
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven 
---
 util/thread-pool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index ce6cd30..610646d 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -188,6 +188,13 @@ restart:
 aio_context_release(pool->ctx);
 elem->common.cb(elem->common.opaque, elem->ret);
 aio_context_acquire(pool->ctx);
+
+/* We can safely cancel the completion_bh here regardless of 
someone
+ * else having scheduled it meanwhile because we reenter the
+ * completion function anyway (goto restart).
+ */
+qemu_bh_cancel(pool->completion_bh);
+
 qemu_aio_unref(elem);
 goto restart;
 } else {
-- 
1.9.1




Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Daniel P. Berrange
On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
> On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
> > On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> > > On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> > > > On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> > > >> Hello, All!
> > > >>
> > > >> There is a problem in the current libvirt implementation. domain.xml
> > > >> allows to specify only basic set of options, especially in the case
> > > >> of QEMU, when there are really a lot of tweaks in format drivers.
> > > >> Most likely these options will never be supported in a good way
> > > >> in libvirt as recognizable entities.
> > > >>
> > > >> Right now in order to debug libvirt QEMU VM in production I am using
> > > >> very strange approach:
> > > >> - disk section of domain XML is removed
> > > >> - exact command line options to start the disk are specified at the end
> > > >>   of domain.xml within <qemu:commandline> as described by Stefan
> > > >>  
> > > >> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> > > >>
> > > >> The problem is that when debug is finished and viable combinations of
> > > >> options is found I can not drop VM in such state in the production. 
> > > >> This
> > > >> is the pain and problem. For example, I have spend 3 days with the
> > > >> VM of one customer which blames us for slow IO in the guest. I have
> > > >> found very good combination of non-standard options which increases
> > > >> disk performance 5 times (not 5%). Currently I can not put this 
> > > >> combination
> > > >> in the production as libvirt does not see the disk.
> > > >>
> > > >> I propose to do a very simple thing; maybe I am not the first one here,
> > > >> but it would be nice to allow passing arbitrary options to the QEMU
> > > >> command line. This could be done in a very generic way if we allow
> > > >> specifying additional options inside the <disk> section like this:
> > > >>
> > > >> <disk>
> > > >>   <driver ... iothread='1'>
> > > >>   ...
> > > >>   <address ... unit='0'/>
> > > >> </disk>
> > > >>
> > > >> and so on. The meaning (at least for QEMU) is quite simple -
> > > >> these options will just be added to the end of the -drive command
> > > >> line. The meaning for other drivers should be the same and I
> > > >> think that there are ways to pass generic options in them.
> > > > It is a general policy that we do *not* do generic option passthrough
> > > > in this kind of manner. We always want to represent concepts explicitly
> > > > with named attributes, so that if 2 hypervisors support the same concept
> > > > we can map it the same way in the XML
> > >
> > > OK. How could I change L2 cache size for QCOW2 image?
> > > 
> > > For 1 Tb disk, fragmented in guest, the performance loss is
> > > around 10 times. 10 TIMES. 1000%. The customer could not
> > > wait until proper fix in the next QEMU release especially
> > > if we are able to provide the kludge specifically for him.
> > 
> > We can explicitly allow L2 cache size set in the XML but that
> > is a pretty poor solution to the problem IMHO, as the mgmt
> > application has no apriori knowledge of whether a particular
> > cache size is going to be right for a particular QCow2 image.
> > 
> > For a sustainable solution, IMHO this really needs to be fixed
> > in QEMU so it has either a more appropriate default, or if a
> > single default is not possible, have QEMU auto-tune its cache
> > size dynamically to suit the characteristics of the qcow2 image.
> 
> A tradeoff between memory usage and performance is policy, and setting
> policy is the management layer's job, no qemu's. We can try to provide
> good defaults, but they are meant for manual users of qemu. libvirt is
> expected to configure everything exactly as it wants it instead of
> relying on defaults.

The question though is how is an app supposed to figure out what the
optimal setting for cache size is?  It seems to require knowledge
of the level of disk fragmentation and guest I/O patterns, neither
of which we can know upfront. Which means any attempt to
set cache size is little more than ill-informed guesswork.
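
(To make the numbers in this thread concrete: the qcow2 cache notes in the
QEMU tree, docs/qcow2-cache.txt, suggest that an L2 cache able to cover the
whole image needs roughly disk_size * 8 / cluster_size bytes. The fragment
below is only a rough sketch for a 1 TB image, assuming the default 64 KB
clusters; the path and the other -drive options are made up for
illustration:)

  # 1 TB / 64 KB clusters = 2^24 clusters, 8 bytes per L2 entry -> 128 MB
  -drive file=/images/big.qcow2,format=qcow2,if=virtio,cache=none,l2-cache-size=128M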


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Alberto Garcia
On Thu 16 Mar 2017 04:08:50 PM CET, Daniel P. Berrange wrote:
>> OK. How could I change L2 cache size for QCOW2 image?
>> 
>> For 1 Tb disk, fragmented in guest, the performance loss is
>> around 10 times. 10 TIMES. 1000%. The customer could not
>> wait until proper fix in the next QEMU release especially
>> if we are able to provide the kludge specifically for him.
>
> We can explicitly allow L2 cache size set in the XML but that
> is a pretty poor solution to the problem IMHO, as the mgmt
> application has no apriori knowledge of whether a particular
> cache size is going to be right for a particular QCow2 image.
>
> For a sustainable solution, IMHO this really needs to be fixed
> in QEMU so it has either a more appropriate default, or if a
> single default is not possible, have QEMU auto-tune its cache
> size dynamically to suit the characteristics of the qcow2 image.

Related bug report (and discussion):

https://bugzilla.redhat.com/show_bug.cgi?id=1377735

Berto



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Daniel P. Berrange
On Thu, Mar 16, 2017 at 06:15:27PM +0300, Denis V. Lunev wrote:
> On 03/16/2017 06:08 PM, Daniel P. Berrange wrote:
> > On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> >> On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> >>> On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
>  Hello, All!
> 
>  There is a problem in the current libvirt implementation. domain.xml
>  allows to specify only basic set of options, especially in the case
>  of QEMU, when there are really a lot of tweaks in format drivers.
>  Most likely these options will never be supported in a good way
>  in libvirt as recognizable entities.
> 
>  Right now in order to debug libvirt QEMU VM in production I am using
>  very strange approach:
>  - disk section of domain XML is removed
>  - exact command line options to start the disk are specified at the end
>    of domain.xml whithin  as described by Stefan
>   
>  http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> 
>  The problem is that when debug is finished and viable combinations of
>  options is found I can not drop VM in such state in the production. This
>  is the pain and problem. For example, I have spend 3 days with the
>  VM of one customer which blames us for slow IO in the guest. I have
>  found very good combination of non-standard options which increases
>  disk performance 5 times (not 5%). Currently I can not put this 
>  combination
>  in the production as libvirt does not see the disk.
> 
>  I propose to do very simple thing, may be I am not the first one here,
>  but it would be nice to allow to pass arbitrary option to the QEMU
>  command line. This could be done in a very generic way if we will
>  allow to specify additional options inside  section like this:
> 
>  
>  iothread='1'>
>    
>    
>    
>    
>    
>  
> 
>  and so on. The meaning (at least for QEMU) is quite simple -
>  these options will just be added to the end of the -drive command
>  line. The meaning for other drivers should be the same and I
>  think that there are ways to pass generic options in them.
> >>> It is a general policy that we do *not* do generic option passthrough
> >>> in this kind of manner. We always want to represent concepts explicitly
> >>> with named attributes, so that if 2 hypervisors support the same concept
> >>> we can map it the same way in the XML
> >> OK. How could I change L2 cache size for QCOW2 image?
> >>
> >> For 1 Tb disk, fragmented in guest, the performance loss is
> >> around 10 times. 10 TIMES. 1000%. The customer could not
> >> wait until proper fix in the next QEMU release especially
> >> if we are able to provide the kludge specifically for him.
> > We can explicitly allow L2 cache size set in the XML but that
> > is a pretty poor solution to the problem IMHO, as the mgmt
> > application has no apriori knowledge of whether a particular
> > cache size is going to be right for a particular QCow2 image.
> >
> > For a sustainable solution, IMHO this really needs to be fixed
> > in QEMU so it has either a more appropriate default, or if a
> > single default is not possible, have QEMU auto-tune its cache
> > size dynamically to suit the characteristics of the qcow2 image.
> Yes, I agree. That is why I am spoken about the kludge.
> 
> 
> >> There is an option  which specifically
> >> works like this. It is enabled specifically with changed scheme.
> >> OK, we can have this option enabled only under the same
> >> condition. But we have to have a way to solve the problem
> >> at the moment. Not in 3 month of painful dances within
> >> the driver. May be with limitations line increased memory
> >> footprint, but still.
> > Sure, you can use  passthrough - that is the explicit
> > temporary workaround - we don't provide any guarantee that your guest
> > won't break when upgrading either libvirt or QEMU though, hence we
> > mark it as tainted.
> No and yes. Yes,  partially solves the situation.
> No this solution has tooo strong drawbacks IMHO. The configuration
> of this VM could not be changed anymore in any viable way and
> there are a lot of problems as one disk is absent at libvirt level.
> 
> Can we add the option when the VM config is tainted and debug
> scheme enabled specifically to the disk level? This would the best
> partial solution, which will not ruin other management tasks
> like backup, disk add etc.

We really don't want to propagate the custom passthrough into further
areas of the XML. It is intentionally limited because it is not
something we want people/apps to use for anything other than a
short-term hack. You should be able to use qemu's '-set' argument to set
fields against existing QEMU args, without having to throw away the
entire config that libvirt builds.
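
(A minimal sketch of what that could look like; the syntax is
-set group.id.property=value, and drive-virtio-disk0 is a typical but
hypothetical libvirt-generated drive id. Appended through the existing
<qemu:commandline> passthrough it ends up after libvirt's own -drive
argument, which should let -set find the option group it modifies:)

  # hypothetical drive id; must appear after the -drive it refers to
  -set drive.drive-virtio-disk0.l2-cache-size=128M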

Regards,

Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Daniel P. Berrange
On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> > On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> >> Hello, All!
> >>
> >> There is a problem in the current libvirt implementation. domain.xml
> >> allows to specify only basic set of options, especially in the case
> >> of QEMU, when there are really a lot of tweaks in format drivers.
> >> Most likely these options will never be supported in a good way
> >> in libvirt as recognizable entities.
> >>
> >> Right now in order to debug libvirt QEMU VM in production I am using
> >> very strange approach:
> >> - disk section of domain XML is removed
> >> - exact command line options to start the disk are specified at the end
> >>   of domain.xml whithin  as described by Stefan
> >>  
> >> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> >>
> >> The problem is that when debug is finished and viable combinations of
> >> options is found I can not drop VM in such state in the production. This
> >> is the pain and problem. For example, I have spend 3 days with the
> >> VM of one customer which blames us for slow IO in the guest. I have
> >> found very good combination of non-standard options which increases
> >> disk performance 5 times (not 5%). Currently I can not put this combination
> >> in the production as libvirt does not see the disk.
> >>
> >> I propose to do very simple thing, may be I am not the first one here,
> >> but it would be nice to allow to pass arbitrary option to the QEMU
> >> command line. This could be done in a very generic way if we will
> >> allow to specify additional options inside  section like this:
> >>
> >> 
> >>>> iothread='1'>
> >>   
> >>   
> >>   
> >>   
> >>   
> >> 
> >>
> >> and so on. The meaning (at least for QEMU) is quite simple -
> >> these options will just be added to the end of the -drive command
> >> line. The meaning for other drivers should be the same and I
> >> think that there are ways to pass generic options in them.
> > It is a general policy that we do *not* do generic option passthrough
> > in this kind of manner. We always want to represent concepts explicitly
> > with named attributes, so that if 2 hypervisors support the same concept
> > we can map it the same way in the XML
>
> OK. How could I change L2 cache size for QCOW2 image?
> 
> For 1 Tb disk, fragmented in guest, the performance loss is
> around 10 times. 10 TIMES. 1000%. The customer could not
> wait until proper fix in the next QEMU release especially
> if we are able to provide the kludge specifically for him.

We can explicitly allow the L2 cache size to be set in the XML but that
is a pretty poor solution to the problem IMHO, as the mgmt
application has no a priori knowledge of whether a particular
cache size is going to be right for a particular QCow2 image.

For a sustainable solution, IMHO this really needs to be fixed
in QEMU so it has either a more appropriate default, or if a
single default is not possible, have QEMU auto-tune its cache
size dynamically to suit the characteristics of the qcow2 image.

> There is an option  which specifically
> works like this. It is enabled specifically with changed scheme.
> OK, we can have this option enabled only under the same
> condition. But we have to have a way to solve the problem
> at the moment. Not in 3 month of painful dances within
> the driver. May be with limitations line increased memory
> footprint, but still.

Sure, you can use <qemu:commandline> passthrough - that is the explicit
temporary workaround - we don't provide any guarantee that your guest
won't break when upgrading either libvirt or QEMU though, hence we
mark it as tainted.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-block] [RFC] finegrained disk driver options control

2017-03-16 Thread Daniel P. Berrange
On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> Hello, All!
> 
> There is a problem in the current libvirt implementation. domain.xml
> allows to specify only basic set of options, especially in the case
> of QEMU, when there are really a lot of tweaks in format drivers.
> Most likely these options will never be supported in a good way
> in libvirt as recognizable entities.
> 
> Right now in order to debug libvirt QEMU VM in production I am using
> very strange approach:
> - disk section of domain XML is removed
> - exact command line options to start the disk are specified at the end
>   of domain.xml whithin  as described by Stefan
>  
> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> 
> The problem is that when debug is finished and viable combinations of
> options is found I can not drop VM in such state in the production. This
> is the pain and problem. For example, I have spend 3 days with the
> VM of one customer which blames us for slow IO in the guest. I have
> found very good combination of non-standard options which increases
> disk performance 5 times (not 5%). Currently I can not put this combination
> in the production as libvirt does not see the disk.
> 
> I propose to do very simple thing, may be I am not the first one here,
> but it would be nice to allow to pass arbitrary option to the QEMU
> command line. This could be done in a very generic way if we will
> allow to specify additional options inside  section like this:
> 
> 
>iothread='1'>
>   
>   
>   
>   
>   
> 
> 
> and so on. The meaning (at least for QEMU) is quite simple -
> these options will just be added to the end of the -drive command
> line. The meaning for other drivers should be the same and I
> think that there are ways to pass generic options in them.

It is a general policy that we do *not* do generic option passthrough
in this kind of manner. We always want to represent concepts explicitly
with named attributes, so that if 2 hypervisors support the same concept
we can map it the same way in the XML.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-block] [PATCH] blockdev: fix bitmap clear undo

2017-03-16 Thread Kevin Wolf
Am 15.03.2017 um 22:28 hat John Snow geschrieben:
> Only undo the action if we actually prepared the action.
> 
> Signed-off-by: John Snow 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [RFC patch 3/3] blockjob: add devops to blockjob backends

2017-03-16 Thread Paolo Bonzini


On 16/03/2017 01:46, John Snow wrote:
> - BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
>   currently a 'device'... Do we want to loosen this restriction, find another
>   way to deliver callbacks to BlockJobs attached to BlkBackends, or do 
> something
>   crazy like make a BlockJob device object?
> 
>   struct JobDevice {
> DeviceState parent_obj;
> BlockJob *job;
>   } ...??

I think we want to loosen it.

> - Not so sure about leaving the initial job refcount at 1, since we are giving
>   away a handle to this job as dev_opaque. By incrementing it to 2, however,
>   it's not clear whose responsibility it is to decrement it again.

It would be another callback, sent at the time the dev_ops are removed.
But devices have been treating dev_opaque as a weak reference, so it
makes sense to do the same for jobs.

Paolo



Re: [Qemu-block] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops

2017-03-16 Thread Paolo Bonzini


On 16/03/2017 01:46, John Snow wrote:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Tricky question.  I believe io_limits_disabled is a bit of a hack.

Certainly you could get rid of disable_external now that we have
drained_begin/drained_end, but it would be a separate patch.

> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

I think so.

Paolo



Re: [Qemu-block] [RFC patch 1/3] blockjob: add block_job_start_shim

2017-03-16 Thread Paolo Bonzini


On 16/03/2017 01:46, John Snow wrote:
> +/**
> + * This code must run after the job's coroutine has been entered,
> + * but not before.
> + */
> +static void coroutine_fn block_job_start_shim(void *opaque)
> +{
> +BlockJob *job = opaque;
> +
> +assert(job && job->driver && job->driver->start);
> +block_job_pause_point(job);
> +job->driver->start(job);
> +}

Poor function, don't call it a shim :)  Let's just call it
block_job_co_entry.

Paolo

>  void block_job_start(BlockJob *job)
>  {
>  assert(job && !block_job_started(job) && job->paused &&
> -   !job->busy && job->driver->start);
> -job->co = qemu_coroutine_create(job->driver->start, job);
> -if (--job->pause_count == 0) {
> -job->paused = false;
> -job->busy = true;
> -qemu_coroutine_enter(job->co);
> -}
> +   job->driver && job->driver->start);
> +job->co = qemu_coroutine_create(block_job_start_shim, job);
> +job->pause_count--;
> +job->busy = true;
> +job->paused = false;
> +qemu_coroutine_enter(job->co);
>  }
>  
>  void block_job_ref(BlockJob *job)



Re: [Qemu-block] [PATCH v1 1/1] vmstate: fix failed iotests case 68 and 91

2017-03-16 Thread Juan Quintela
QingFeng Hao  wrote:
> This problem affects s390x only if we are running without KVM.
> Basically, S390CPU.irqstate is unused if we do not use KVM,
> and thus no buffer is allocated.
> This causes size=0, first_elem=NULL and n_elems=1 in
> vmstate_load_state and vmstate_save_state. And the assert fails.
> With this fix we can go back to the old behavior and support
> VMS_VBUFFER with size 0 and nullptr.
>
> Signed-off-by: QingFeng Hao 
> Signed-off-by: Halil Pasic 

queued



Re: [Qemu-block] [PATCH v2] migration/block: Avoid invoking blk_drain too frequently

2017-03-16 Thread Juan Quintela
Fam Zheng  wrote:
> On Wed, 03/15 17:31, Dr. David Alan Gilbert wrote:
>> * Fam Zheng (f...@redhat.com) wrote:
>> > On Wed, 03/15 11:37, Lidong Chen wrote:
>> > > Increase bmds->cur_dirty after submit io, so reduce the frequency
>> > > involve into blk_drain, and improve the performance obviously
>> > > when block migration.
>> > > 
>> > > The performance test result of this patch:
>> > > 
>> > > During the block dirty save phase, this patch improve guest os IOPS
>> > > from 4.0K to 9.5K. and improve the migration speed from
>> > > 505856 rsec/s to 855756 rsec/s.
>> > > 
>> > > Signed-off-by: Lidong Chen 
>> > > ---
>> > >  migration/block.c | 3 +++
>> > >  1 file changed, 3 insertions(+)
>> > > 
>> > > diff --git a/migration/block.c b/migration/block.c
>> > > index 6741228..7734ff7 100644
>> > > --- a/migration/block.c
>> > > +++ b/migration/block.c
>> > > @@ -576,6 +576,9 @@ static int mig_save_device_dirty(QEMUFile *f, 
>> > > BlkMigDevState *bmds,
>> > >  }
>> > >  
>> > >  bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, sector, 
>> > > nr_sectors);
>> > > +sector += nr_sectors;
>> > > +bmds->cur_dirty = sector;
>> > > +
>> > >  break;
>> > >  }
>> > >  sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
>> > > -- 
>> > > 1.8.3.1
>> > > 
>> > 
>> > Nice catch above all, thank you!
>> > 
>> > Reviewed-by: Fam Zheng 
>> 
>> Are you taking that via a block pull?
>
> I can do that, but I'm not sure whether it should go to 2.9. This is a
> performance improvement, which usually doesn't qualify as bug fixes. But this
> also looks like a mistake in original code.
>
> Fam

I am taking it through the migration tree and will push it. I agree with your description.



Re: [Qemu-block] [PATCH v2 01/30] trace: Fix backwards mirror_yield parameters

2017-03-16 Thread Stefan Hajnoczi
On Mon, Mar 13, 2017 at 02:55:18PM -0500, Eric Blake wrote:
> block/trace-events lists the parameters for mirror_yield
> consistently with other mirror events (cnt just after s, like in
> mirror_before_sleep; in_flight last, like in mirror_yield_in_flight).
> But the callers were passing parameters in the wrong order, leading
> to poor trace messages, including type truncation when there are
> more than 4G dirty sectors involved.  Broken since its introduction
> in commit bd48bde.
> 
> While touching this, ensure that all callers use the same type
> (uint64_t) for cnt, as a later patch will enable the compiler to do
> stricter type-checking.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/mirror.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Thanks, applied to my tracing-next tree:
https://github.com/stefanha/qemu/commits/tracing-next

Stefan




Re: [Qemu-block] [PATCH v2 02/30] trace: Fix incorrect megasas trace parameters

2017-03-16 Thread Stefan Hajnoczi
On Mon, Mar 13, 2017 at 02:55:19PM -0500, Eric Blake wrote:
> hw/scsi/trace-events lists cmd as the first parameter for both
> megasas_iovec_overflow and megasas_iovec_underflow, but the caller
> was mistakenly passing cmd->iov_size twice instead of the command
> index.  Also, trace_megasas_abort_invalid is called with parameters
> in the wrong order.  Broken since its introduction in commit
> e8f943c3.
> 
> Signed-off-by: Eric Blake 
> ---
>  hw/scsi/megasas.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Thanks, applied to my tracing-next tree:
https://github.com/stefanha/qemu/commits/tracing-next

Stefan

