Re: [Qemu-block] [PATCH v2 17/17] block: Move request_alignment into BlockLimit

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> It makes more sense to have ALL block size limit constraints
> in the same struct.  Improve the documentation while at it.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> The raw block driver was blindly copying all limits from bs->file,
> even though: 1. the main bdrv_refresh_limits() already does this
> for many of gthe limits, and 2. blindly copying from the children

s/gthe/the ?

> can weaken any stricter limits that were already inherited from
> the backing chain during the main bdrv_refresh_limits().  Also,
> the next patch is about to move .request_alignment into
> BlockLimits, and that is a limit that should NOT be copied from
> other layers in the BDS chain.
> 
> Solve the issue by factoring out a new bdrv_merge_limits(),
> and using that function to properly merge limits when comparing
> two BlockDriverState objects.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 
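The idea in a nutshell, as a sketch only (the real bdrv_merge_limits() in the
patch covers every BlockLimits field, and MIN_NON_ZERO/MAX are QEMU's usual
helpers): merging keeps the stricter of the parent's and the child's values
instead of blindly copying the child's.

static void bdrv_merge_limits_sketch(BlockLimits *dst, const BlockLimits *src)
{
    dst->max_discard = MIN_NON_ZERO(dst->max_discard, src->max_discard);
    dst->discard_alignment = MAX(dst->discard_alignment,
                                 src->discard_alignment);
    /* ... and so on for the remaining fields ... */
}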



Re: [Qemu-block] [PATCH v2 15/17] block: Switch discard length bounds to byte-based

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_discard and
> discard_alignment.  Rename them, using 'pdiscard' as an aid to
> track which remaining discard interfaces need conversion, and so
> that the compiler will help us catch the change in semantics
> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
> is no longer needed; and the BlockLimits type is now completely
> byte-based.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v2: rebase nbd and iscsi limits across earlier improvements
> ---
>  include/block/block_int.h | 21 +++--
>  block/io.c| 16 +---
>  block/iscsi.c | 19 ++-
>  block/nbd.c   |  2 +-
>  qemu-img.c|  3 ++-
>  5 files changed, 33 insertions(+), 28 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2b45d57..0169019 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -324,18 +324,27 @@ struct BlockDriver {
>  };
> 
>  typedef struct BlockLimits {
> -/* maximum number of sectors that can be discarded at once */
> -int max_discard;
> +/* maximum number of bytes that can be discarded at once (since it
> + * is signed, it must be < 2G, if set), should be multiple of
> + * pdiscard_alignment, but need not be power of 2. May be 0 if no
> + * inherent 32-bit limit */
> +int32_t max_pdiscard;

Why not use uint32_t?
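For illustration of the constraint the new comment describes: request sizes
elsewhere in the block layer still travel around as plain 'int', so a limit at
or above 2 GiB could not round-trip safely. A minimal standalone sketch (not
QEMU code) of what goes wrong with the full unsigned range:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t advertised = 3U * 1024 * 1024 * 1024;  /* 3 GiB from a device */
    int num = advertised;     /* no longer fits; typically ends up negative */
    printf("%d\n", num);
    return 0;
}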

> 
> -/* optimal alignment for discard requests in sectors */
> -int64_t discard_alignment;
> +/* optimal alignment for discard requests in bytes, must be power
> + * of 2, less than max_discard if that is set, and multiple of
> + * bs->request_alignment. May be 0 if bs->request_alignment is
> + * good enough */
> +uint32_t pdiscard_alignment;
> 
>  /* maximum number of bytes that can zeroized at once (since it is
> - * signed, it must be < 2G, if set) */
> + * signed, it must be < 2G, if set), should be multiple of
> + * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>  int32_t max_pwrite_zeroes;
> 
>  /* optimal alignment for write zeroes requests in bytes, must be
> - * power of 2, and less than max_pwrite_zeroes if that is set */
> + * power of 2, less than max_pwrite_zeroes if that is set, and
> + * multiple of bs->request_alignment. May be 0 if
> + * bs->request_alignment is good enough */
>  uint32_t pwrite_zeroes_alignment;
> 
>  /* optimal transfer length in bytes (must be power of 2, and
> diff --git a/block/io.c b/block/io.c
> index 5e38ab4..0b5c40d 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState 
> *bs, int64_t sector_num,
>BDRV_TRACKED_DISCARD);
>  bdrv_set_dirty(bs, sector_num, nb_sectors);
> 
> -max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
> +max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS,
> +   BDRV_REQUEST_MAX_SECTORS);
>  while (nb_sectors > 0) {
>  int ret;
>  int num = nb_sectors;
> +int discard_alignment = bs->bl.pdiscard_alignment >> 
> BDRV_SECTOR_BITS;
> 
>  /* align request */
> -if (bs->bl.discard_alignment &&
> -num >= bs->bl.discard_alignment &&
> -sector_num % bs->bl.discard_alignment) {
> -if (num > bs->bl.discard_alignment) {
> -num = bs->bl.discard_alignment;
> +if (discard_alignment &&
> +num >= discard_alignment &&
> +sector_num % discard_alignment) {
> +if (num > discard_alignment) {
> +num = discard_alignment;
>  }
> -num -= sector_num % bs->bl.discard_alignment;
> +num -= sector_num % discard_alignment;
>  }
> 
>  /* limit request size */
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 739d5ca..af9babf 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs)
>  memset(iscsilun, 0, sizeof(IscsiLun));
>  }
> 
> -static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
> -{
> -int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
> -
> -return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
> -}
> -
>  static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  /* We don't actually refresh here, but just return data queried in
> @@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState 
> *bs, Error **errp)
>  }
> 
>  if (iscsilun->lbp.lbpu) {
> -if (iscsilun->bl.max_unmap < 0x) {
> -bs->bl.max_discard =
> -

Re: [Qemu-block] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_transfer_length
> and opt_transfer_length.  Rename them (dropping the _length suffix)
> so that the compiler will help us catch the change in semantics
> across any rebased code, and improve the documentation.  Use unsigned
> values, so that we don't have to worry about negative values and
> so that bit-twiddling is easier; however, we are still constrained
> by 2^31 of signed int in most APIs.
> 
> Signed-off-by: Eric Blake 

Looks good apart from the scsi-generic blocksize calculation, which is not an
issue of this patch.

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Add a .bdrv_refresh_limits() to all four of our legacy devices
> that will always be sector-only (bochs, cloop, dmg, vvfat), in
> spite of their recent conversion to expose a byte interface.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 
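For the four sector-only drivers, the added callback presumably amounts to
little more than this (sketch only; the hunks are not quoted in this reply,
and the function name is illustrative):

static void bdrv_legacy_refresh_limits(BlockDriverState *bs, Error **errp)
{
    bs->request_alignment = BDRV_SECTOR_SIZE;   /* sector-only driver */
}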



Re: [Qemu-block] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Now that all drivers have been updated to supply an override
> of request_alignment during their .bdrv_refresh_limits(), as
> needed, the block layer itself can defer setting the default
> alignment until part of the overall bdrv_refresh_limits().
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 09/17] iscsi: Set request_alignment during .bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> In this case, raw_probe_alignment() already did what we needed,
> so just fix its signature and wire it in correctly.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> Making all callers special-case 0 as unlimited is awkward,
> and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
> our current block layer API limits.
> 
> In the case of scsi, this means that we now always advertise a
> limit to the guest, even in cases where the underlying layers
> previously use 0 for no inherent limit beyond the block layer.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v2: new patch
> ---
>  block/block-backend.c  |  7 ---
>  hw/block/virtio-blk.c  |  3 +--
>  hw/scsi/scsi-generic.c | 12 ++--
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 34500e6..1fb070b 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1303,15 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
>  }
>  }
> 
> +/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
>  int blk_get_max_transfer_length(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> +int max = 0;
> 
>  if (bs) {
> -return bs->bl.max_transfer_length;
> -} else {
> -return 0;
> +max = bs->bl.max_transfer_length;
>  }
> +return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
>  }
> 
>  int blk_get_max_iov(BlockBackend *blk)
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 284e646..1d2792e 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
>  void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>  {
>  int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
> -int max_xfer_len = 0;
> +int max_xfer_len;
>  int64_t sector_num = 0;
> 
>  if (mrb->num_reqs == 1) {
> @@ -392,7 +392,6 @@ void virtio_blk_submit_multireq(BlockBackend *blk, 
> MultiReqBuffer *mrb)
>  }
> 
>  max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
> -max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);
> 
>  qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs), &multireq_compare);
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 71372a8..db04a76 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -226,12 +226,12 @@ static void scsi_read_complete(void * opaque, int ret)
>  r->req.cmd.buf[0] == INQUIRY &&
>  r->req.cmd.buf[2] == 0xb0) {
>  uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
> -if (max_xfer_len) {
> -stl_be_p(&r->buf[8], max_xfer_len);
> -/* Also take care of the opt xfer len. */
> -if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> -stl_be_p(&r->buf[12], max_xfer_len);
> -}
> +
> +assert(max_xfer_len);
> +stl_be_p(&r->buf[8], max_xfer_len);
> +/* Also take care of the opt xfer len. */
> +if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> +stl_be_p(&r->buf[12], max_xfer_len);

Paolo: should we take s->blocksize into account? I missed it when I wrote this
piece of code.

Fam
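One possible adjustment, purely as a sketch (it assumes max_xfer_len is in
512-byte sectors and s->blocksize is the logical block size in bytes; neither
assumption is verified here):

    if (s->blocksize > BDRV_SECTOR_SIZE) {
        max_xfer_len /= s->blocksize / BDRV_SECTOR_SIZE;
    }
    stl_be_p(&r->buf[8], max_xfer_len);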

>  }
>  }
>  scsi_req_data(&r->req, len);
> -- 
> 2.5.5
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> The function sector_limits_lun2qemu() returns a value in units of
> the block layer's 512-byte sector, and can be as large as
> 0x4000, which is much larger than the block layer's inherent
> limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
> handles '0' as a synonym to the inherent limit, and it is nicer
> to return this value than it is to calculate an arbitrary
> maximum, for two reasons: we want to ensure that the block layer
> continues to special-case '0' as 'no limit beyond the inherent
> limits'; and we want to be able to someday expand the block
> layer to allow 64-bit limits, where auditing for uses of
> BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
> artificially constraining iscsi to old block layer limits.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 
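For readers less familiar with the convention relied on above: the block layer
treats a limit of 0 as "use the inherent limit", typically via a helper along
these lines (rough rendition only; QEMU's real MIN_NON_ZERO lives in its
headers and may differ in detail):

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MIN_NON_ZERO(a, b) \
    ((a) == 0 ? (b) : ((b) == 0 ? (a) : MIN((a), (b))))

/* MIN_NON_ZERO(0, BDRV_REQUEST_MAX_SECTORS) yields the inherent limit,
 * while MIN_NON_ZERO(0x200, BDRV_REQUEST_MAX_SECTORS) keeps the stricter
 * driver-supplied value. */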



Re: [Qemu-block] [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We were basing the advertisement of maximum discard and transfer
> length off of UINT32_MAX, but since the rest of the block layer
> has signed int limits on a transaction, nothing could ever reach
> that maximum, and we risk overflowing an int once things are
> converted to byte-based rather than sector-based limits.  What's
> more, we DO have a much smaller limit: both the current kernel
> and qemu-nbd have a hard limit of 32M on a read or write
> transaction, and while they may also permit up to a full 32 bits
> on a discard transaction, the upstream NBD protocol is proposing
> wording that without any explicit advertisement otherwise,
> clients should limit ALL requests to the same limits as read and
> write, even though the other requests do not actually require as
> many bytes across the wire.  So the better limit to tell the
> block layer is 32M for both values.
> 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> The NBD layer was breaking up requests at a limit of 2040 sectors
> (just under 1M) to cater to old qemu-nbd. But the server limit
> was raised to 32M in commit 2d8214885 to match the kernel, more
> than three years ago; and the upstream NBD Protocol is proposing
> documentation that without any explicit communication to state
> otherwise, a client should be able to safely assume that a 32M
> transaction will work.  It is time to rely on the larger sizing,
> and any downstream distro that cares about maximum
> interoperability to older qemu-nbd servers can just tweak the
> value of #define NBD_MAX_SECTORS.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> If the amount of data to read ends exactly on the total size
> of the bs, then we were wasting time creating a local qiov
> to read the data in preparation for what would normally be
> appending zeroes beyond the end, even though this corner case
> has nothing further to do.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Kevin Wolf 

Reviewed-by: Fam Zheng 




Re: [Qemu-block] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> We don't pass any flags on to drivers to handle.  Tighten an
> assert to explain why we pass 0 to bdrv_driver_preadv(), and add
> some comments on things to be aware of if we want to turn on
> per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
> document that we may want to consider using unmap during
> copy-on-read operations where the read is all zeroes.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()

2016-06-15 Thread Fam Zheng
On Tue, 06/14 15:30, Eric Blake wrote:
> For symmetry with bdrv_aligned_preadv(), assert that the caller
> really has aligned things properly. This requires adding an align
> parameter, which is used now only in the new asserts, but will
> come in handy in a later patch that adds auto-fragmentation to the
> max transfer size, since that value need not always be a multiple
> of the alignment, and therefore must be rounded down.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Fam Zheng 
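The assertions being described presumably look roughly like this (sketch only;
the hunk itself is not quoted in this reply, so the variable names are
illustrative):

    assert((offset & (align - 1)) == 0);
    assert((bytes & (align - 1)) == 0);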



Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] blockdev: Add dynamic generation of module_block.h

2016-06-15 Thread Fam Zheng
On Wed, 06/15 14:40, Colin Lord wrote:
> From: Marc Mari 
> 
> To simplify the addition of new block modules, add a script that generates
> include/qemu/module_block.h automatically from the modules' source code.
> 
> This script assumes that the QEMU coding style rules are followed.
> 
> Signed-off-by: Marc Marí 
> Signed-off-by: Colin Lord 
> ---
>  .gitignore  |   1 +
>  Makefile|   8 +++
>  scripts/modules/module_block.py | 134 
> 
>  3 files changed, 143 insertions(+)
>  create mode 100644 scripts/modules/module_block.py
> 
> diff --git a/.gitignore b/.gitignore
> index 38ee1c5..06aa064 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -110,3 +110,4 @@ tags
>  TAGS
>  docker-src.*
>  *~
> +/include/qemu/module_block.h
> diff --git a/Makefile b/Makefile
> index ed4032a..8f8b6a2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -76,6 +76,8 @@ GENERATED_HEADERS += trace/generated-ust-provider.h
>  GENERATED_SOURCES += trace/generated-ust.c
>  endif
>  
> +GENERATED_HEADERS += include/qemu/module_block.h
> +
>  # Don't try to regenerate Makefile or configure
>  # We don't generate any of them
>  Makefile: ;
> @@ -352,6 +354,12 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) 
> libqemuutil.a libqemustub.a
>  ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
>   $(call LINK, $^)
>  
> +include/qemu/module_block.h: $(SRC_PATH)/scripts/modules/module_block.py 
> config-host.mak
> + $(call quiet-command,$(PYTHON) \
> +$(SRC_PATH)/scripts/modules/module_block.py \
> + $(SRC_PATH)/"./include/qemu/" $(addprefix $(SRC_PATH)/,$(patsubst 
> %.mo,%.c,$(block-obj-m))), \
> + "  GEN   $@")
> +
>  clean:
>  # avoid old build problems by removing potentially incorrect old files
>   rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
> gen-op-arm.h
> diff --git a/scripts/modules/module_block.py b/scripts/modules/module_block.py
> new file mode 100644
> index 000..005bc49
> --- /dev/null
> +++ b/scripts/modules/module_block.py
> @@ -0,0 +1,134 @@
> +#!/usr/bin/python
> +#
> +# Module information generator
> +#
> +# Copyright Red Hat, Inc. 2015
> +#
> +# Authors:
> +#  Marc Mari 

Address hidden seems like a mistake during copy from web. :)

One more below..

> +#
> +# This work is licensed under the terms of the GNU GPL, version 2.
> +# See the COPYING file in the top-level directory.
> +
> +from __future__ import print_function
> +import sys
> +import os
> +
> +def get_string_struct(line):
> +data = line.split()
> +
> +# data[0] -> struct element name
> +# data[1] -> =
> +# data[2] -> value
> +
> +return data[2].replace('"', '')[:-1]
> +
> +def add_module(fhader, library, format_name, protocol_name,
> +probe, probe_device):
> +lines = []
> +lines.append('.library_name = "' + library + '",')
> +if format_name != "":
> +lines.append('.format_name = "' + format_name + '",')
> +if protocol_name != "":
> +lines.append('.protocol_name = "' + protocol_name + '",')
> +if probe:
> +lines.append('.has_probe = true,')
> +if probe_device:
> +lines.append('.has_probe_device = true,')
> +
> +text = '\n\t'.join(lines)
> +fheader.write('\n\t{\n\t' + text + '\n\t},')
> +
> +def process_file(fheader, filename):
> +# This parser assumes the coding style rules are being followed
> +with open(filename, "r") as cfile:
> +found_something = False
> +found_start = False
> +library, _ = os.path.splitext(os.path.basename(filename))
> +for line in cfile:
> +if found_start:
> +line = line.replace('\n', '')
> +if line.find(".format_name") != -1:
> +format_name = get_string_struct(line)
> +elif line.find(".protocol_name") != -1:
> +protocol_name = get_string_struct(line)
> +elif line.find(".bdrv_probe") != -1:
> +probe = True
> +elif line.find(".bdrv_probe_device") != -1:
> +probe_device = True
> +elif line == "};":
> +add_module(fheader, library, format_name, protocol_name,
> +probe, probe_device)
> +found_start = False
> +elif line.find("static BlockDriver") != -1:
> +found_something = True
> +found_start = True
> +format_name = ""
> +protocol_name = ""
> +probe = False
> +probe_device = False
> +
> +if not found_something:
> +print("No BlockDriver struct found in " + filename + ". \
> +Is this really a module?", file=sys.stderr)
> +sys.exit(1)
> +
> +def print_top(fheader):
> +

Re: [Qemu-block] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 20:40, Colin Lord wrote:
> 
> The only block drivers that can be converted into modules are the drivers
> that don't perform any init operation except for registering themselves. This
> is why libiscsi has been disabled as a module.

I don't think it has in this patch :) but you can also move the
iscsi_opts registration to vl.c.

Paolo



Re: [Qemu-block] [PATCH 1/2] blockdev: Add dynamic generation of module_block.h

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 20:40, Colin Lord wrote:
> +def add_module(fhader, library, format_name, protocol_name,

fhader looks like a typo.

Paolo

> +probe, probe_device):
> +lines = []
> +lines.append('.library_name = "' + library + '",')
> +if format_name != "":
> +lines.append('.format_name = "' + format_name + '",')
> +if protocol_name != "":
> +lines.append('.protocol_name = "' + protocol_name + '",')
> +if probe:
> +lines.append('.has_probe = true,')
> +if probe_device:
> +lines.append('.has_probe_device = true,')
> +
> +text = '\n\t'.join(lines)
> +fheader.write('\n\t{\n\t' + text + '\n\t},')
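For reference, with the formatting above each generated entry in
include/qemu/module_block.h comes out roughly like this (the script indents
with tabs, shown here as spaces; "qcow2" is only an illustrative module name):

    {
    .library_name = "qcow2",
    .format_name = "qcow2",
    .has_probe = true,
    },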



Re: [Qemu-block] [PATCH v2] ssi: change ssi_slave_init to be a realize ops

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 23:16, Cédric Le Goater wrote:
> This enables qemu to handle late inits and report errors. All the SSI
> slave routine names were changed accordingly. Code was modified to
> handle errors when possible (m25p80 and ssi-sd)
> 
> Tested with the m25p80 slave object.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  Should apply on top of :
>  
> m25p80: fix test on blk_pread() return value
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html
> 
>  Changes since v1 :
> 
>  - added a error_setg() report in ssi_sd_realize()
> 
>  hw/arm/spitz.c   |   12 
>  hw/arm/tosa.c|5 ++---
>  hw/arm/z2.c  |6 ++
>  hw/block/m25p80.c|   12 +---
>  hw/display/ads7846.c |5 ++---
>  hw/display/ssd0323.c |5 ++---
>  hw/misc/max111x.c|   12 ++--
>  hw/sd/ssi-sd.c   |9 +
>  hw/ssi/ssi.c |6 +++---
>  include/hw/ssi/ssi.h |2 +-
>  10 files changed, 32 insertions(+), 42 deletions(-)
> 
> Index: qemu-ast2400-mainline.git/hw/arm/spitz.c
> ===
> --- qemu-ast2400-mainline.git.orig/hw/arm/spitz.c
> +++ qemu-ast2400-mainline.git/hw/arm/spitz.c
> @@ -598,15 +598,13 @@ static uint32_t spitz_lcdtg_transfer(SSI
>  return 0;
>  }
>  
> -static int spitz_lcdtg_init(SSISlave *dev)
> +static void spitz_lcdtg_realize(SSISlave *dev, Error **errp)
>  {
>  SpitzLCDTG *s = FROM_SSI_SLAVE(SpitzLCDTG, dev);
>  
>  spitz_lcdtg = s;
>  s->bl_power = 0;
>  s->bl_intensity = 0x20;
> -
> -return 0;
>  }
>  
>  /* SSP devices */
> @@ -666,7 +664,7 @@ static void spitz_adc_temp_on(void *opaq
>  max111x_set_input(max, MAX_BATT_TEMP, 0);
>  }
>  
> -static int corgi_ssp_init(SSISlave *d)
> +static void corgi_ssp_realize(SSISlave *d, Error **errp)
>  {
>  DeviceState *dev = DEVICE(d);
>  CorgiSSPState *s = FROM_SSI_SLAVE(CorgiSSPState, d);
> @@ -675,8 +673,6 @@ static int corgi_ssp_init(SSISlave *d)
>  s->bus[0] = ssi_create_bus(dev, "ssi0");
>  s->bus[1] = ssi_create_bus(dev, "ssi1");
>  s->bus[2] = ssi_create_bus(dev, "ssi2");
> -
> -return 0;
>  }
>  
>  static void spitz_ssp_attach(PXA2xxState *cpu)
> @@ -1121,7 +1117,7 @@ static void corgi_ssp_class_init(ObjectC
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
>  
> -k->init = corgi_ssp_init;
> +k->realize = corgi_ssp_realize;
>  k->transfer = corgi_ssp_transfer;
>  dc->vmsd = _corgi_ssp_regs;
>  }
> @@ -1150,7 +1146,7 @@ static void spitz_lcdtg_class_init(Objec
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
>  
> -k->init = spitz_lcdtg_init;
> +k->realize = spitz_lcdtg_realize;
>  k->transfer = spitz_lcdtg_transfer;
>  dc->vmsd = _spitz_lcdtg_regs;
>  }
> Index: qemu-ast2400-mainline.git/hw/arm/tosa.c
> ===
> --- qemu-ast2400-mainline.git.orig/hw/arm/tosa.c
> +++ qemu-ast2400-mainline.git/hw/arm/tosa.c
> @@ -127,10 +127,9 @@ static uint32_t tosa_ssp_tansfer(SSISlav
>  return 0;
>  }
>  
> -static int tosa_ssp_init(SSISlave *dev)
> +static void tosa_ssp_realize(SSISlave *dev, Error **errp)
>  {
>  /* Nothing to do.  */
> -return 0;
>  }
>  
>  #define TYPE_TOSA_DAC "tosa_dac"
> @@ -283,7 +282,7 @@ static void tosa_ssp_class_init(ObjectCl
>  {
>  SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
>  
> -k->init = tosa_ssp_init;
> +k->realize = tosa_ssp_realize;
>  k->transfer = tosa_ssp_tansfer;
>  }
>  
> Index: qemu-ast2400-mainline.git/hw/arm/z2.c
> ===
> --- qemu-ast2400-mainline.git.orig/hw/arm/z2.c
> +++ qemu-ast2400-mainline.git/hw/arm/z2.c
> @@ -151,14 +151,12 @@ static void z2_lcd_cs(void *opaque, int
>  z2_lcd->selected = !level;
>  }
>  
> -static int zipit_lcd_init(SSISlave *dev)
> +static void zipit_lcd_realize(SSISlave *dev, Error **errp)
>  {
>  ZipitLCD *z = FROM_SSI_SLAVE(ZipitLCD, dev);
>  z->selected = 0;
>  z->enabled = 0;
>  z->pos = 0;
> -
> -return 0;
>  }
>  
>  static VMStateDescription vmstate_zipit_lcd_state = {
> @@ -181,7 +179,7 @@ static void zipit_lcd_class_init(ObjectC
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
>  
> -k->init = zipit_lcd_init;
> +k->realize = zipit_lcd_realize;
>  k->transfer = zipit_lcd_transfer;
>  dc->vmsd = &vmstate_zipit_lcd_state;
>  }
> Index: qemu-ast2400-mainline.git/hw/block/m25p80.c
> ===
> --- qemu-ast2400-mainline.git.orig/hw/block/m25p80.c
> +++ qemu-ast2400-mainline.git/hw/block/m25p80.c
> @@ -28,6 +28,7 @@
>  #include "hw/ssi/ssi.h"
>  #include "qemu/bitops.h"
>  #include "qemu/log.h"
> 

[Qemu-block] [PATCH v2] ssi: change ssi_slave_init to be a realize ops

2016-06-15 Thread Cédric Le Goater
This enables qemu to handle late inits and report errors. All the SSI
slave routine names were changed accordingly. Code was modified to
handle errors when possible (m25p80 and ssi-sd)

Tested with the m25p80 slave object.

Suggested-by: Paolo Bonzini 
Signed-off-by: Cédric Le Goater 
---

 Should apply on top of :
 
m25p80: fix test on blk_pread() return value
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html

 Changes since v1 :

 - added a error_setg() report in ssi_sd_realize()

 hw/arm/spitz.c   |   12 
 hw/arm/tosa.c|5 ++---
 hw/arm/z2.c  |6 ++
 hw/block/m25p80.c|   12 +---
 hw/display/ads7846.c |5 ++---
 hw/display/ssd0323.c |5 ++---
 hw/misc/max111x.c|   12 ++--
 hw/sd/ssi-sd.c   |9 +
 hw/ssi/ssi.c |6 +++---
 include/hw/ssi/ssi.h |2 +-
 10 files changed, 32 insertions(+), 42 deletions(-)

Index: qemu-ast2400-mainline.git/hw/arm/spitz.c
===
--- qemu-ast2400-mainline.git.orig/hw/arm/spitz.c
+++ qemu-ast2400-mainline.git/hw/arm/spitz.c
@@ -598,15 +598,13 @@ static uint32_t spitz_lcdtg_transfer(SSI
 return 0;
 }
 
-static int spitz_lcdtg_init(SSISlave *dev)
+static void spitz_lcdtg_realize(SSISlave *dev, Error **errp)
 {
 SpitzLCDTG *s = FROM_SSI_SLAVE(SpitzLCDTG, dev);
 
 spitz_lcdtg = s;
 s->bl_power = 0;
 s->bl_intensity = 0x20;
-
-return 0;
 }
 
 /* SSP devices */
@@ -666,7 +664,7 @@ static void spitz_adc_temp_on(void *opaq
 max111x_set_input(max, MAX_BATT_TEMP, 0);
 }
 
-static int corgi_ssp_init(SSISlave *d)
+static void corgi_ssp_realize(SSISlave *d, Error **errp)
 {
 DeviceState *dev = DEVICE(d);
 CorgiSSPState *s = FROM_SSI_SLAVE(CorgiSSPState, d);
@@ -675,8 +673,6 @@ static int corgi_ssp_init(SSISlave *d)
 s->bus[0] = ssi_create_bus(dev, "ssi0");
 s->bus[1] = ssi_create_bus(dev, "ssi1");
 s->bus[2] = ssi_create_bus(dev, "ssi2");
-
-return 0;
 }
 
 static void spitz_ssp_attach(PXA2xxState *cpu)
@@ -1121,7 +1117,7 @@ static void corgi_ssp_class_init(ObjectC
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = corgi_ssp_init;
+k->realize = corgi_ssp_realize;
 k->transfer = corgi_ssp_transfer;
 dc->vmsd = _corgi_ssp_regs;
 }
@@ -1150,7 +1146,7 @@ static void spitz_lcdtg_class_init(Objec
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = spitz_lcdtg_init;
+k->realize = spitz_lcdtg_realize;
 k->transfer = spitz_lcdtg_transfer;
 dc->vmsd = _spitz_lcdtg_regs;
 }
Index: qemu-ast2400-mainline.git/hw/arm/tosa.c
===
--- qemu-ast2400-mainline.git.orig/hw/arm/tosa.c
+++ qemu-ast2400-mainline.git/hw/arm/tosa.c
@@ -127,10 +127,9 @@ static uint32_t tosa_ssp_tansfer(SSISlav
 return 0;
 }
 
-static int tosa_ssp_init(SSISlave *dev)
+static void tosa_ssp_realize(SSISlave *dev, Error **errp)
 {
 /* Nothing to do.  */
-return 0;
 }
 
 #define TYPE_TOSA_DAC "tosa_dac"
@@ -283,7 +282,7 @@ static void tosa_ssp_class_init(ObjectCl
 {
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = tosa_ssp_init;
+k->realize = tosa_ssp_realize;
 k->transfer = tosa_ssp_tansfer;
 }
 
Index: qemu-ast2400-mainline.git/hw/arm/z2.c
===
--- qemu-ast2400-mainline.git.orig/hw/arm/z2.c
+++ qemu-ast2400-mainline.git/hw/arm/z2.c
@@ -151,14 +151,12 @@ static void z2_lcd_cs(void *opaque, int
 z2_lcd->selected = !level;
 }
 
-static int zipit_lcd_init(SSISlave *dev)
+static void zipit_lcd_realize(SSISlave *dev, Error **errp)
 {
 ZipitLCD *z = FROM_SSI_SLAVE(ZipitLCD, dev);
 z->selected = 0;
 z->enabled = 0;
 z->pos = 0;
-
-return 0;
 }
 
 static VMStateDescription vmstate_zipit_lcd_state = {
@@ -181,7 +179,7 @@ static void zipit_lcd_class_init(ObjectC
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = zipit_lcd_init;
+k->realize = zipit_lcd_realize;
 k->transfer = zipit_lcd_transfer;
 dc->vmsd = &vmstate_zipit_lcd_state;
 }
Index: qemu-ast2400-mainline.git/hw/block/m25p80.c
===
--- qemu-ast2400-mainline.git.orig/hw/block/m25p80.c
+++ qemu-ast2400-mainline.git/hw/block/m25p80.c
@@ -28,6 +28,7 @@
 #include "hw/ssi/ssi.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 
 #ifndef M25P80_ERR_DEBUG
 #define M25P80_ERR_DEBUG 0
@@ -878,7 +879,7 @@ static uint32_t m25p80_transfer8(SSISlav
 return r;
 }
 
-static int m25p80_init(SSISlave *ss)
+static void m25p80_realize(SSISlave *ss, Error **errp)
 {
 DriveInfo *dinfo;
 Flash *s = M25P80(ss);
@@ -899,18 +900,15 @@ static int 

[Qemu-block] [PATCH 2/2] blockdev: Add dynamic module loading for block drivers

2016-06-15 Thread Colin Lord
From: Marc Mari 

Extend the current module interface to allow for block drivers to be loaded
dynamically on request.

The only block drivers that can be converted into modules are the drivers
that don't perform any init operation except for registering themselves. This
is why libiscsi has been disabled as a module.

All the necessary module information is located in a new structure found in
include/qemu/module_block.h
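
Reconstructed from how block.c consumes it below, each entry in that structure
carries roughly the following fields (the array layout and field types are a
guess; the field names are the ones used in the diff):

static const struct {
    const char *format_name;
    const char *protocol_name;
    const char *library_name;
    bool has_probe;
    bool has_probe_device;
} block_driver_modules[] = {
    /* filled in by scripts/modules/module_block.py */
};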

Signed-off-by: Marc Marí 
Signed-off-by: Colin Lord 
---
 Makefile  |  3 --
 block.c   | 86 +--
 include/qemu/module.h |  3 ++
 util/module.c | 37 ++
 4 files changed, 90 insertions(+), 39 deletions(-)

diff --git a/Makefile b/Makefile
index 8f8b6a2..461187c 100644
--- a/Makefile
+++ b/Makefile
@@ -247,9 +247,6 @@ Makefile: $(version-obj-y) $(version-lobj-y)
 libqemustub.a: $(stub-obj-y)
 libqemuutil.a: $(util-obj-y)
 
-block-modules = $(foreach o,$(block-obj-m),"$(basename $(subst /,-,$o))",) NULL
-util/module.o-cflags = -D'CONFIG_BLOCK_MODULES=$(block-modules)'
-
 ##
 
 qemu-img.o: qemu-img-cmds.h
diff --git a/block.c b/block.c
index f54bc25..7a91434 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "qemu/error-report.h"
+#include "qemu/module_block.h"
 #include "qemu/module.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qbool.h"
@@ -242,11 +243,29 @@ BlockDriverState *bdrv_new(void)
 BlockDriver *bdrv_find_format(const char *format_name)
 {
 BlockDriver *drv1;
+size_t i;
+
 QLIST_FOREACH(drv1, _drivers, list) {
 if (!strcmp(drv1->format_name, format_name)) {
 return drv1;
 }
 }
+
+for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
+if (!strcmp(block_driver_modules[i].format_name, format_name)) {
+block_module_load_one(block_driver_modules[i].library_name);
+/* Copying code is not nice, but this way the current discovery is
+ * not modified. Calling recursively could fail if the library
+ * has been deleted.
+ */
+QLIST_FOREACH(drv1, _drivers, list) {
+if (!strcmp(drv1->format_name, format_name)) {
+return drv1;
+}
+}
+}
+}
+
 return NULL;
 }
 
@@ -447,8 +466,15 @@ int get_tmp_filename(char *filename, int size)
 static BlockDriver *find_hdev_driver(const char *filename)
 {
 int score_max = 0, score;
+size_t i;
 BlockDriver *drv = NULL, *d;
 
+for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
+if (block_driver_modules[i].has_probe_device) {
+block_module_load_one(block_driver_modules[i].library_name);
+}
+}
+
 QLIST_FOREACH(d, _drivers, list) {
 if (d->bdrv_probe_device) {
 score = d->bdrv_probe_device(filename);
@@ -470,6 +496,7 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 char protocol[128];
 int len;
 const char *p;
+size_t i;
 
 /* TODO Drivers without bdrv_file_open must be specified explicitly */
 
@@ -496,6 +523,7 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 len = sizeof(protocol) - 1;
 memcpy(protocol, filename, len);
 protocol[len] = '\0';
+
 QLIST_FOREACH(drv1, _drivers, list) {
 if (drv1->protocol_name &&
 !strcmp(drv1->protocol_name, protocol)) {
@@ -503,6 +531,23 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 }
 }
 
+for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
+if (block_driver_modules[i].protocol_name &&
+!strcmp(block_driver_modules[i].protocol_name, protocol)) {
+block_module_load_one(block_driver_modules[i].library_name);
+/* Copying code is not nice, but this way the current discovery is
+ * not modified. Calling recursively could fail if the library
+ * has been deleted.
+ */
+QLIST_FOREACH(drv1, _drivers, list) {
+if (drv1->protocol_name &&
+!strcmp(drv1->protocol_name, protocol)) {
+return drv1;
+}
+}
+}
+}
+
 error_setg(errp, "Unknown protocol '%s'", protocol);
 return NULL;
 }
@@ -525,8 +570,15 @@ BlockDriver *bdrv_probe_all(const uint8_t *buf, int 
buf_size,
 const char *filename)
 {
 int score_max = 0, score;
+size_t i;
 BlockDriver *drv = NULL, *d;
 
+for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
+if (block_driver_modules[i].has_probe) {
+block_module_load_one(block_driver_modules[i].library_name);
+}
+}
+
 QLIST_FOREACH(d, _drivers, list) {
 if (d->bdrv_probe) {
  

Re: [Qemu-block] [PATCH] ssi: change ssi_slave_init to be a realize ops

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 17:44, Cédric Le Goater wrote:
>  s->sd = sd_init(dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, true);
>  if (s->sd == NULL) {
> -return -1;

This needs an error_setg (see device_realize in hw/core/qdev.c for an
example) until sd_init is changed to take Error *.

Otherwise looks okay.

Paolo

> +return;
>  }
>  register_savevm(dev, "ssi_sd", -1, 1, ssi_sd_save, ssi_sd_load, s);
> -return 0;
>  }
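
Concretely, the realize hook could report the failure along these lines
(sketch only; the v2 posting of this patch says it adds exactly such an
error_setg() report in ssi_sd_realize()):

    s->sd = sd_init(dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, true);
    if (s->sd == NULL) {
        error_setg(errp, "Device initialization failed");
        return;
    }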



[Qemu-block] [PATCH 0/2] Dynamic module loading for block drivers

2016-06-15 Thread Colin Lord
This is a repost of some previous patches written by Marc Marí which
were also reposted by Richard Jones a few months ago. The original
series and reposted series are here:

https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg01995.html
https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg01994.html

I've tried to take some of the previous feedback and integrate it into
this series. There are also a few things I haven't touched that were
previously mentioned though:

1) Denis Lunev suggested having block_module_load_one return the
loaded driver to reduce duplicated for loops in many of the functions
in block.c. I'd be happy to do this but wasn't completely sure how
error handling would happen in that case since currently the return
value is an integer error code. Would I switch to using the
error handling mechanisms provided in util/error.c? (A rough sketch
of what that shape might look like follows the links below.)

2) In the past some debate was had about the design of the modules.
I haven't made any changes in this regard because it didn't seem a
conclusion had been reached. Some links for background:

Fam Zheng suggested a simpler parser:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02201.html

Denis Lunev suggested a design more similar to Linux kernel modules:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05331.html

Richard Jones suggested reasons to keep things as is:
https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg01994.html
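
Regarding 1) above, one possible shape for such a loader (a rough sketch only:
the Error-based signature and the function name are hypothetical, while the
array, helpers and list head follow what the posted block.c changes already
use):

static BlockDriver *bdrv_load_and_find_format(const char *format_name,
                                              Error **errp)
{
    BlockDriver *drv;
    size_t i;

    for (i = 0; i < ARRAY_SIZE(block_driver_modules); ++i) {
        if (!strcmp(block_driver_modules[i].format_name, format_name)) {
            block_module_load_one(block_driver_modules[i].library_name);
            break;
        }
    }
    QLIST_FOREACH(drv, &bdrv_drivers, list) {
        if (!strcmp(drv->format_name, format_name)) {
            return drv;
        }
    }
    error_setg(errp, "Unknown driver '%s'", format_name);
    return NULL;
}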

Marc Mari (2):
  blockdev: Add dynamic generation of module_block.h
  blockdev: Add dynamic module loading for block drivers

 .gitignore  |   1 +
 Makefile|  11 +++-
 block.c |  86 +++---
 include/qemu/module.h   |   3 +
 scripts/modules/module_block.py | 134 
 util/module.c   |  37 +++
 6 files changed, 233 insertions(+), 39 deletions(-)
 create mode 100644 scripts/modules/module_block.py

-- 
2.5.5




[Qemu-block] [PATCH 1/2] blockdev: Add dynamic generation of module_block.h

2016-06-15 Thread Colin Lord
From: Marc Mari 

To simplify the addition of new block modules, add a script that generates
include/qemu/module_block.h automatically from the modules' source code.

This script assumes that the QEMU coding style rules are followed.

Signed-off-by: Marc Marí 
Signed-off-by: Colin Lord 
---
 .gitignore  |   1 +
 Makefile|   8 +++
 scripts/modules/module_block.py | 134 
 3 files changed, 143 insertions(+)
 create mode 100644 scripts/modules/module_block.py

diff --git a/.gitignore b/.gitignore
index 38ee1c5..06aa064 100644
--- a/.gitignore
+++ b/.gitignore
@@ -110,3 +110,4 @@ tags
 TAGS
 docker-src.*
 *~
+/include/qemu/module_block.h
diff --git a/Makefile b/Makefile
index ed4032a..8f8b6a2 100644
--- a/Makefile
+++ b/Makefile
@@ -76,6 +76,8 @@ GENERATED_HEADERS += trace/generated-ust-provider.h
 GENERATED_SOURCES += trace/generated-ust.c
 endif
 
+GENERATED_HEADERS += include/qemu/module_block.h
+
 # Don't try to regenerate Makefile or configure
 # We don't generate any of them
 Makefile: ;
@@ -352,6 +354,12 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) 
libqemuutil.a libqemustub.a
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
 
+include/qemu/module_block.h: $(SRC_PATH)/scripts/modules/module_block.py 
config-host.mak
+   $(call quiet-command,$(PYTHON) \
+$(SRC_PATH)/scripts/modules/module_block.py \
+   $(SRC_PATH)/"./include/qemu/" $(addprefix $(SRC_PATH)/,$(patsubst 
%.mo,%.c,$(block-obj-m))), \
+   "  GEN   $@")
+
 clean:
 # avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
gen-op-arm.h
diff --git a/scripts/modules/module_block.py b/scripts/modules/module_block.py
new file mode 100644
index 000..005bc49
--- /dev/null
+++ b/scripts/modules/module_block.py
@@ -0,0 +1,134 @@
+#!/usr/bin/python
+#
+# Module information generator
+#
+# Copyright Red Hat, Inc. 2015
+#
+# Authors:
+#  Marc Mari 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+from __future__ import print_function
+import sys
+import os
+
+def get_string_struct(line):
+data = line.split()
+
+# data[0] -> struct element name
+# data[1] -> =
+# data[2] -> value
+
+return data[2].replace('"', '')[:-1]
+
+def add_module(fhader, library, format_name, protocol_name,
+probe, probe_device):
+lines = []
+lines.append('.library_name = "' + library + '",')
+if format_name != "":
+lines.append('.format_name = "' + format_name + '",')
+if protocol_name != "":
+lines.append('.protocol_name = "' + protocol_name + '",')
+if probe:
+lines.append('.has_probe = true,')
+if probe_device:
+lines.append('.has_probe_device = true,')
+
+text = '\n\t'.join(lines)
+fheader.write('\n\t{\n\t' + text + '\n\t},')
+
+def process_file(fheader, filename):
+# This parser assumes the coding style rules are being followed
+with open(filename, "r") as cfile:
+found_something = False
+found_start = False
+library, _ = os.path.splitext(os.path.basename(filename))
+for line in cfile:
+if found_start:
+line = line.replace('\n', '')
+if line.find(".format_name") != -1:
+format_name = get_string_struct(line)
+elif line.find(".protocol_name") != -1:
+protocol_name = get_string_struct(line)
+elif line.find(".bdrv_probe") != -1:
+probe = True
+elif line.find(".bdrv_probe_device") != -1:
+probe_device = True
+elif line == "};":
+add_module(fheader, library, format_name, protocol_name,
+probe, probe_device)
+found_start = False
+elif line.find("static BlockDriver") != -1:
+found_something = True
+found_start = True
+format_name = ""
+protocol_name = ""
+probe = False
+probe_device = False
+
+if not found_something:
+print("No BlockDriver struct found in " + filename + ". \
+Is this really a module?", file=sys.stderr)
+sys.exit(1)
+
+def print_top(fheader):
+fheader.write('''/* AUTOMATICALLY GENERATED, DO NOT MODIFY */
+/*
+ * QEMU Block Module Infrastructure
+ *
+ * Copyright Red Hat, Inc. 2015
+ *
+ * Authors:
+ *  Marc Mari   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+''')
+
+fheader.write('''#ifndef QEMU_MODULE_BLOCK_H

Re: [Qemu-block] [PATCH] m25p80: fix test on blk_pread() return value

2016-06-15 Thread Cédric Le Goater
Hello Eric,

On 06/13/2016 06:47 PM, Eric Blake wrote:
> On 06/13/2016 10:25 AM, Cédric Le Goater wrote:
> 
>>
>> It seems that commit 243e6f69c129 ("m25p80: Switch to byte-based block 
>> access") 
>> is bringing another issue :
>>
>> qemu-system-arm: 
>> /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1252: 
>> bdrv_aligned_pwritev: Assertion `!qiov || bytes == qiov->size' failed.
>> Aborted (core dumped)
> 
> Can you provide a more complete stack dump, and/or a recipe on how to
> repeat the assertion?

Here you are, it is a bit long ...
 
You need to get a few files :

* an OpenBMC kernel : 

wget 
https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto/cuImage

* an OpenBMC rootfs : 

wget 
https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto/obmc-phosphor-image-palmetto.cpio.gz

* an OpenBMC flash image containing the above (with which we should boot 
someday) : 

wget 
https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto/flash-palmetto

* an OpenPower flash image for the host : 

wget 
https://openpower.xyz/job/openpower-op-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto.pnor

Clone this qemu branch which adds to the ast24000 SOC its SPI/SMC controllers:

https://github.com/legoater/qemu/commits/aspeed-ssi

The extra commits bring :

 ast2400: add SMC controllers (FMC and SPI)
 ast2400: add SPI flash slave object
 ast2400: create SPI flash slaves

 hw/arm/ast2400.c|  31 +++
 hw/arm/palmetto-bmc.c   |   3 +
 hw/ssi/Makefile.objs|   1 +
 hw/ssi/aspeed_smc.c | 451 

 include/hw/arm/ast2400.h|   3 +
 include/hw/ssi/aspeed_smc.h | 105 +++
 6 files changed, 594 insertions(+)
 create mode 100644 hw/ssi/aspeed_smc.c
 create mode 100644 include/hw/ssi/aspeed_smc.h

and these, but we don't really care :

 m25p80: fix test on blk_pread() return value
 ssi: change ssi_slave_init to be a realize ops

Compile with arm-softmmu, should be enough, and run with :

qemu-system-arm -m 256 -M palmetto-bmc -kernel ./cuImage  -initrd 
./obmc-phosphor-image-palmetto.cpio.gz -mtdblock ./flash-palmetto -mtdblock 
./palmetto.pnor -snapshot -nographic -nodefaults -monitor stdio -serial pty -S

When booted, log with root/OpenBmc and run :

dd if=/dev/zero of=/dev/mtd0

you should get :

(qemu) qemu-system-arm: .../block/io.c:1243: bdrv_aligned_pwritev: 
Assertion `!qiov || bytes == qiov->size' failed.
Aborted (core dumped)

If there are some messages like :

qemu-system-arm: aspeed_smc_flash_read: flash not in usermode

This is because a userspace tool is trying to access the host
flash image (./palmetto.pnor) but the support for linear
addressing mode is not in the branch yet. You can ignore them.

If you have read up to here, that probably means you might try the
above :) I wish I had a simpler way, but no ... we need to work on 
some unit test I guess.

Thanks,

C.





[Qemu-block] [PATCH 7/7] sheepdog: remove useless casts

2016-06-15 Thread Laurent Vivier
This patch is the result of coccinelle script
scripts/coccinelle/typecast.cocci

CC: Hitoshi Mitake 
CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier 
---
 block/sheepdog.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 23fbace..95e6a11 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1049,7 +1049,7 @@ static int parse_vdiname(BDRVSheepdogState *s, const char 
*filename,
 const char *host_spec, *vdi_spec;
 int nr_sep, ret;
 
-strstart(filename, "sheepdog:", (const char **)&vdi_spec);
+strstart(filename, "sheepdog:", &vdi_spec);
 p = q = g_strdup(filename);
 
 /* count the number of separators */
@@ -2652,7 +2652,7 @@ static int sd_snapshot_list(BlockDriverState *bs, 
QEMUSnapshotInfo **psn_tab)
 req.opcode = SD_OP_READ_VDIS;
 req.data_length = max;
 
-ret = do_req(fd, s->aio_context, (SheepdogReq *)&req,
+ret = do_req(fd, s->aio_context, &req,
  vdi_inuse, , );
 
 closesocket(fd);
-- 
2.5.5




Re: [Qemu-block] [Qemu-devel] Supplying QCOW2 as 'file' driver to `blockdev-add` results in a QEMU crash

2016-06-15 Thread Eric Blake
On 06/15/2016 09:38 AM, Eric Blake wrote:
> On 06/15/2016 09:17 AM, Max Reitz wrote:
>> On 15.06.2016 11:58, Kashyap Chamarthy wrote:
>>> Seems like supplying "qcow2" file BlockdevDriver option to QMP
>>> `blockdev-add` results in a SIGSEGV:
>>>
>>> [...]
>>> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> 0x55a0121f in visit_type_BlockdevRef ()
>>> [...]
>>>
>>> Reproducer
>>> --
>>
>> Even simpler reproducer:
>>
>> {'execute':'blockdev-add','arguments':{'options':{'driver':'raw'}}}
>>
>> Seems like a QAPI problem to me, and bisecting yields
>> dbf11922622685934bfb41e7cf2be9bd4a0405c0 as the culprit.
> 
> I'm looking into it. Thanks for the testcase.

Okay, the problem is based on error handling - you have a missing 'file'
argument.  That patch consolidated things to do two things at once
instead of two calls where the second was skipped if the first failed;
and now ends up dereferencing NULL.  I didn't notice or test it at the
time, so I get to enhance the testsuite as part of my patch.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] ssi: change ssi_slave_init to be a realize ops

2016-06-15 Thread Cédric Le Goater
This enables qemu to handle late inits and report errors. All the SSI
slave routine names were changed accordingly. Code was modified to
handle errors when possible (m25p80)

Tested with the m25p80 slave object.

Suggested-by: Paolo Bonzini 
Signed-off-by: Cédric Le Goater 
---

 Should apply on top of :
 
m25p80: fix test on blk_pread() return value
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html

 hw/arm/spitz.c   | 12 
 hw/arm/tosa.c|  5 ++---
 hw/arm/z2.c  |  6 ++
 hw/block/m25p80.c| 12 +---
 hw/display/ads7846.c |  5 ++---
 hw/display/ssd0323.c |  5 ++---
 hw/misc/max111x.c| 12 ++--
 hw/sd/ssi-sd.c   |  7 +++
 hw/ssi/ssi.c |  6 +++---
 include/hw/ssi/ssi.h |  2 +-
 10 files changed, 30 insertions(+), 42 deletions(-)

diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index ba40f8302bc6..41cc2eeeb1ba 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -598,15 +598,13 @@ static uint32_t spitz_lcdtg_transfer(SSISlave *dev, 
uint32_t value)
 return 0;
 }
 
-static int spitz_lcdtg_init(SSISlave *dev)
+static void spitz_lcdtg_realize(SSISlave *dev, Error **errp)
 {
 SpitzLCDTG *s = FROM_SSI_SLAVE(SpitzLCDTG, dev);
 
 spitz_lcdtg = s;
 s->bl_power = 0;
 s->bl_intensity = 0x20;
-
-return 0;
 }
 
 /* SSP devices */
@@ -666,7 +664,7 @@ static void spitz_adc_temp_on(void *opaque, int line, int 
level)
 max111x_set_input(max, MAX_BATT_TEMP, 0);
 }
 
-static int corgi_ssp_init(SSISlave *d)
+static void corgi_ssp_realize(SSISlave *d, Error **errp)
 {
 DeviceState *dev = DEVICE(d);
 CorgiSSPState *s = FROM_SSI_SLAVE(CorgiSSPState, d);
@@ -675,8 +673,6 @@ static int corgi_ssp_init(SSISlave *d)
 s->bus[0] = ssi_create_bus(dev, "ssi0");
 s->bus[1] = ssi_create_bus(dev, "ssi1");
 s->bus[2] = ssi_create_bus(dev, "ssi2");
-
-return 0;
 }
 
 static void spitz_ssp_attach(PXA2xxState *cpu)
@@ -1121,7 +1117,7 @@ static void corgi_ssp_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = corgi_ssp_init;
+k->realize = corgi_ssp_realize;
 k->transfer = corgi_ssp_transfer;
 dc->vmsd = _corgi_ssp_regs;
 }
@@ -1150,7 +1146,7 @@ static void spitz_lcdtg_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = spitz_lcdtg_init;
+k->realize = spitz_lcdtg_realize;
 k->transfer = spitz_lcdtg_transfer;
 dc->vmsd = _spitz_lcdtg_regs;
 }
diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
index 4e9494f94c20..2db66508b5b6 100644
--- a/hw/arm/tosa.c
+++ b/hw/arm/tosa.c
@@ -127,10 +127,9 @@ static uint32_t tosa_ssp_tansfer(SSISlave *dev, uint32_t 
value)
 return 0;
 }
 
-static int tosa_ssp_init(SSISlave *dev)
+static void tosa_ssp_realize(SSISlave *dev, Error **errp)
 {
 /* Nothing to do.  */
-return 0;
 }
 
 #define TYPE_TOSA_DAC "tosa_dac"
@@ -283,7 +282,7 @@ static void tosa_ssp_class_init(ObjectClass *klass, void 
*data)
 {
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = tosa_ssp_init;
+k->realize = tosa_ssp_realize;
 k->transfer = tosa_ssp_tansfer;
 }
 
diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index aea895a500ef..68a92f3184d7 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -151,14 +151,12 @@ static void z2_lcd_cs(void *opaque, int line, int level)
 z2_lcd->selected = !level;
 }
 
-static int zipit_lcd_init(SSISlave *dev)
+static void zipit_lcd_realize(SSISlave *dev, Error **errp)
 {
 ZipitLCD *z = FROM_SSI_SLAVE(ZipitLCD, dev);
 z->selected = 0;
 z->enabled = 0;
 z->pos = 0;
-
-return 0;
 }
 
 static VMStateDescription vmstate_zipit_lcd_state = {
@@ -181,7 +179,7 @@ static void zipit_lcd_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 SSISlaveClass *k = SSI_SLAVE_CLASS(klass);
 
-k->init = zipit_lcd_init;
+k->realize = zipit_lcd_realize;
 k->transfer = zipit_lcd_transfer;
 dc->vmsd = &vmstate_zipit_lcd_state;
 }
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 51d85960566f..e6f6e23fb71d 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -28,6 +28,7 @@
 #include "hw/ssi/ssi.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 
 #ifndef M25P80_ERR_DEBUG
 #define M25P80_ERR_DEBUG 0
@@ -878,7 +879,7 @@ static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t tx)
 return r;
 }
 
-static int m25p80_init(SSISlave *ss)
+static void m25p80_realize(SSISlave *ss, Error **errp)
 {
 DriveInfo *dinfo;
 Flash *s = M25P80(ss);
@@ -899,18 +900,15 @@ static int m25p80_init(SSISlave *ss)
 
 s->storage = blk_blockalign(s->blk, s->size);
 
-/* FIXME: Move to late init */
 if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
-fprintf(stderr, "Failed to 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] qemu-img: Use strerror() for generic resize error

2016-06-15 Thread Eric Blake
On 06/15/2016 09:36 AM, Max Reitz wrote:
> Emitting the plain error number is not very helpful. Use strerror()
> instead.
> 
> Signed-off-by: Max Reitz 
> ---
>  qemu-img.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 14e2661..d5ccd9a 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3277,7 +3277,7 @@ static int img_resize(int argc, char **argv)
>  error_report("Image is read-only");
>  break;
>  default:
> -error_report("Error resizing image (%d)", -ret);
> +error_report("Error resizing image: %s", strerror(-ret));
>  break;

I've argued in the past that we have lots of error_report() callers
using strerror(), and may want to add an error_report_errno() (similar
to error_setg_errno()); this just adds to the list of things that would
benefit.  But such a conversion would be a separate BiteSized task, and
doesn't invalidate your patch.
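The helper being described would presumably look something like this (a
sketch, not code in the tree at this point; it mirrors error_setg_errno() but
reports instead of setting an Error):

void error_report_errno(int os_error, const char *fmt, ...)
{
    va_list ap;
    char *msg;

    va_start(ap, fmt);
    msg = g_strdup_vprintf(fmt, ap);
    va_end(ap);
    error_report("%s: %s", msg, strerror(os_error));
    g_free(msg);
}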

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qcow2: Avoid making the L1 table too big

2016-06-15 Thread Eric Blake
On 06/15/2016 09:36 AM, Max Reitz wrote:
> We refuse to open images whose L1 table we deem "too big". Consequently,
> we should not produce such images ourselves.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-cluster.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 893ddf6..335b9b0 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -65,7 +65,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
> min_size,
>  }
>  }
>  
> -if (new_l1_size > INT_MAX / sizeof(uint64_t)) {
> +if (new_l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
>  return -EFBIG;
>  }
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] Supplying QCOW2 as 'file' driver to `blockdev-add` results in a QEMU crash

2016-06-15 Thread Eric Blake
On 06/15/2016 09:17 AM, Max Reitz wrote:
> On 15.06.2016 11:58, Kashyap Chamarthy wrote:
>> Seems like supplying "qcow2" file BlockdevDriver option to QMP
>> `blockdev-add` results in a SIGSEGV:
>>
>>  [...]
>> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> 0x55a0121f in visit_type_BlockdevRef ()
>>  [...]
>>
>> Reproducer
>> --
> 
> Even simpler reproducer:
> 
> {'execute':'blockdev-add','arguments':{'options':{'driver':'raw'}}}
> 
> Seems like a QAPI problem to me, and bisecting yields
> dbf11922622685934bfb41e7cf2be9bd4a0405c0 as the culprit.

I'm looking into it. Thanks for the testcase.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 2/2] qcow2: Avoid making the L1 table too big

2016-06-15 Thread Max Reitz
We refuse to open images whose L1 table we deem "too big". Consequently,
we should not produce such images ourselves.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 893ddf6..335b9b0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -65,7 +65,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 }
 }
 
-if (new_l1_size > INT_MAX / sizeof(uint64_t)) {
+if (new_l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
 return -EFBIG;
 }
 
-- 
2.8.3




[Qemu-block] [PATCH 0/2] qcow2: Avoid making the L1 table too big

2016-06-15 Thread Max Reitz
See https://bugs.launchpad.net/qemu/+bug/1592590 for a bug report.

Reproducer:

$ ./qemu-img create -f qcow2 test.qcow2 1M
Formatting 'test.qcow2', fmt=qcow2 size=1048576 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-img resize test.qcow2 10T
Image resized.
$ ./qemu-img info test.qcow2
qemu-img: Could not open 'test.qcow2': Active L1 table too large

After this series:

$ ./qemu-img resize test.qcow2 10T
qemu-img: Error resizing image: File too large


Max Reitz (2):
  qemu-img: Use strerror() for generic resize error
  qcow2: Avoid making the L1 table too big

 block/qcow2-cluster.c | 2 +-
 qemu-img.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
2.8.3




[Qemu-block] [PATCH 1/2] qemu-img: Use strerror() for generic resize error

2016-06-15 Thread Max Reitz
Emitting the plain error number is not very helpful. Use strerror()
instead.

Signed-off-by: Max Reitz 
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 14e2661..d5ccd9a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3277,7 +3277,7 @@ static int img_resize(int argc, char **argv)
 error_report("Image is read-only");
 break;
 default:
-error_report("Error resizing image (%d)", -ret);
+error_report("Error resizing image: %s", strerror(-ret));
 break;
 }
 out:
-- 
2.8.3




Re: [Qemu-block] [Qemu-devel] Supplying QCOW2 as 'file' driver to `blockdev-add` results in a QEMU crash

2016-06-15 Thread Max Reitz
On 15.06.2016 11:58, Kashyap Chamarthy wrote:
> Seems like supplying "qcow2" file BlockdevDriver option to QMP
> `blockdev-add` results in a SIGSEGV:
> 
>   [...]
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x55a0121f in visit_type_BlockdevRef ()
>   [...]
> 
> Reproducer
> --

Even simpler reproducer:

{'execute':'blockdev-add','arguments':{'options':{'driver':'raw'}}}

Seems like a QAPI problem to me, and bisecting yields
dbf11922622685934bfb41e7cf2be9bd4a0405c0 as the culprit.

Max





Re: [Qemu-block] [PATCH v3 08/14] block/nbd: Accept SocketAddress

2016-06-15 Thread Max Reitz
On 15.06.2016 01:14, Eric Blake wrote:
> On 04/06/2016 12:28 PM, Max Reitz wrote:
>> Add a new option "address" to the NBD block driver which accepts a
>> SocketAddress.
>>
>> "path", "host" and "port" are still supported as legacy options and are
>> mapped to their corresponding SocketAddress representation.
> 
> The back-compat work is pretty slick.
> 
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/nbd.c   | 97 
>> ++-
>>  tests/qemu-iotests/051.out|  4 +-
>>  tests/qemu-iotests/051.pc.out |  4 +-
>>  3 files changed, 64 insertions(+), 41 deletions(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 3adf302..9ae66ba 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -32,6 +32,8 @@
>>  #include "qemu/uri.h"
>>  #include "block/block_int.h"
>>  #include "qemu/module.h"
>> +#include "qapi-visit.h"
>> +#include "qapi/qmp-input-visitor.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "qapi/qmp/qjson.h"
>>  #include "qapi/qmp/qint.h"
>> @@ -128,7 +130,9 @@ static bool nbd_has_filename_options_conflict(QDict 
>> *options, Error **errp)
>>  if (!strcmp(e->key, "host")
>>  || !strcmp(e->key, "port")
>>  || !strcmp(e->key, "path")
>> -|| !strcmp(e->key, "export"))
>> +|| !strcmp(e->key, "export")
>> +|| !strcmp(e->key, "address")
>> +|| !strncmp(e->key, "address.", 8))
> 
> May need a tweak if you break before '||'

Will do. (Same for patch 6.)

> 
>>  {
>>  error_setg(errp, "Option '%s' cannot be used with a file name",
>> e->key);
>> @@ -202,46 +206,66 @@ out:
>>  g_free(file);
>>  }
>>  
>> +static bool nbd_process_legacy_socket_options(QDict *options, Error **errp)
>> +{
>> +if (qdict_haskey(options, "path") && qdict_haskey(options, "host")) {
>> +error_setg(errp, "path and host may not be used at the same time");
>> +return false;
>> +} else if (qdict_haskey(options, "path")) {
>> +if (qdict_haskey(options, "port")) {
>> +error_setg(errp, "port may not be used without host");
>> +return false;
>> +}
>> +
>> +qdict_put(options, "address.type", qstring_from_str("unix"));
>> +qdict_change_key(options, "path", "address.data.path");
>> +} else if (qdict_haskey(options, "host")) {
>> +qdict_put(options, "address.type", qstring_from_str("inet"));
>> +qdict_change_key(options, "host", "address.data.host");
>> +if (!qdict_change_key(options, "port", "address.data.port")) {
>> +qdict_put(options, "address.data.port",
>> +  qstring_from_str(stringify(NBD_DEFAULT_PORT)));
>> +}
> 
> The rewrite from old to modern is rather nice.  I wonder if we could
> then use a generated qapi_visit_SocketAddress instead of manually
> handling all the complication of SocketAddress proper.
> 
>> +}
>> +
>> +return true;
>> +}
>> +
>>  static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char 
>> **export,
>>   Error **errp)
>>  {
>> -SocketAddress *saddr;
>> +SocketAddress *saddr = NULL;
>> +QDict *addr = NULL;
>> +QObject *crumpled_addr;
>> +QmpInputVisitor *iv;
>> +Error *local_err = NULL;
>>  
>> -if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
>> -if (qdict_haskey(options, "path")) {
>> -error_setg(errp, "path and host may not be used at the same 
>> time");
>> -} else {
>> -error_setg(errp, "one of path and host must be specified");
>> -}
>> -return NULL;
>> +if (!nbd_process_legacy_socket_options(options, errp)) {
>> +goto fail;
>>  }
>> -if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) {
>> -error_setg(errp, "port may not be used without host");
>> -return NULL;
>> +
>> +qdict_extract_subqdict(options, &addr, "address.");
>> +if (!qdict_size(addr)) {
>> +error_setg(errp, "NBD server address missing");
>> +goto fail;
>>  }
>>  
>> -saddr = g_new0(SocketAddress, 1);
>> +crumpled_addr = qdict_crumple(addr, true, errp);
>> +if (!crumpled_addr) {
>> +goto fail;
>> +}
>>  
> 
> Ah, so you ARE depending on Dan's qdict_crumple().

In v2, I was still using my own version (qdict_unflatten()), written
specifically for this. Since both fulfill the same purpose and I got to
review qdict_crumple() before anyone got to review qdict_unflatten(), I
dropped the qdict_unflatten() patch from v3; also, "crumple" is more
imaginative than "unflatten", so it has to be better. :-)

>> -if (qdict_haskey(options, "path")) {
>> -UnixSocketAddress *q_unix;
>> -saddr->type = SOCKET_ADDRESS_KIND_UNIX;
>> -q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
>> -q_unix->path = 

Re: [Qemu-block] [PATCH] m25p80: provide a realize to support late inits.

2016-06-15 Thread Cédric Le Goater
On 06/15/2016 04:20 PM, Paolo Bonzini wrote:
> 
> 
> On 15/06/2016 16:00, Cédric Le Goater wrote:
>> We also need to realize() the SSISlave part of the object. This is why
>> the previous realize() ops is stored in M25P80Class and called in the
>> object realize() ops.
>>
>> This is fully compatible with the existing users of m25p80 and it
>> provides a way to handle errors on the drive backend.
>>
>> Signed-off-by: Cédric Le Goater 
> 
> I think you should instead:
> 
> 1) change hw/ssi/ssi.c's ssi_slave_init to be an override of dc->realize
> 
> 2) change SSISlaveClass's init member to a realize function

OK. I will look into that. 

Thanks,

C.


> Thanks,
> 
> Paolo
> 
>> ---
>>
>>  Should apply on top of :
>>  
>> m25p80: fix test on blk_pread() return value
>> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html
>>
>>  hw/block/m25p80.c | 24 +---
>>  1 file changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>> index 51d85960566f..c47722d3a3e5 100644
>> --- a/hw/block/m25p80.c
>> +++ b/hw/block/m25p80.c
>> @@ -28,6 +28,7 @@
>>  #include "hw/ssi/ssi.h"
>>  #include "qemu/bitops.h"
>>  #include "qemu/log.h"
>> +#include "qapi/error.h"
>>  
>>  #ifndef M25P80_ERR_DEBUG
>>  #define M25P80_ERR_DEBUG 0
>> @@ -339,6 +340,7 @@ typedef struct Flash {
>>  
>>  typedef struct M25P80Class {
>>  SSISlaveClass parent_class;
>> +DeviceRealize parent_dc_realize;
>>  FlashPartInfo *pi;
>>  } M25P80Class;
>>  
>> @@ -880,7 +882,6 @@ static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t 
>> tx)
>>  
>>  static int m25p80_init(SSISlave *ss)
>>  {
>> -DriveInfo *dinfo;
>>  Flash *s = M25P80(ss);
>>  M25P80Class *mc = M25P80_GET_CLASS(s);
>>  
>> @@ -888,8 +889,18 @@ static int m25p80_init(SSISlave *ss)
>>  
>>  s->size = s->pi->sector_size * s->pi->n_sectors;
>>  s->dirty_page = -1;
>> +return 0;
>> +}
>> +
>> +static void m25p80_realize(DeviceState *dev, Error **errp)
>> +{
>> +Flash *s = M25P80(dev);
>> +M25P80Class *mc = M25P80_GET_CLASS(s);
>> +DriveInfo *dinfo;
>> +
>> +/* initialize the SSISlave part */
>> +mc->parent_dc_realize(dev, errp);
>>  
>> -/* FIXME use a qdev drive property instead of drive_get_next() */
>>  dinfo = drive_get_next(IF_MTD);
>>  
>>  if (dinfo) {
>> @@ -899,18 +910,15 @@ static int m25p80_init(SSISlave *ss)
>>  
>>  s->storage = blk_blockalign(s->blk, s->size);
>>  
>> -/* FIXME: Move to late init */
>>  if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
>> -fprintf(stderr, "Failed to initialize SPI flash!\n");
>> -return 1;
>> +error_setg(errp, "failed to read the initial flash content");
>> +return;
>>  }
>>  } else {
>>  DB_PRINT_L(0, "No BDRV - binding to RAM\n");
>>  s->storage = blk_blockalign(NULL, s->size);
>>  memset(s->storage, 0xFF, s->size);
>>  }
>> -
>> -return 0;
>>  }
>>  
>>  static void m25p80_reset(DeviceState *d)
>> @@ -967,6 +975,8 @@ static void m25p80_class_init(ObjectClass *klass, void 
>> *data)
>>  dc->vmsd = &vmstate_m25p80;
>>  dc->props = m25p80_properties;
>>  dc->reset = m25p80_reset;
>> +mc->parent_dc_realize = dc->realize;
>> +dc->realize = m25p80_realize;
>>  mc->pi = data;
>>  }
>>  
>>




Re: [Qemu-block] [PATCH] m25p80: provide a realize to support late inits.

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 16:00, Cédric Le Goater wrote:
> We also need to realize() the SSISlave part of the object. This is why
> the previous realize() ops is stored in M25P80Class and called in the
> object realize() ops.
> 
> This is fully compatible with the existing users of m25p80 and it
> provides a way to handle errors on the drive backend.
> 
> Signed-off-by: Cédric Le Goater 

I think you should instead:

1) change hw/ssi/ssi.c's ssi_slave_init to be an override of dc->realize

2) change SSISlaveClass's init member to a realize function

Thanks,

Paolo

> ---
> 
>  Should apply on top of :
>  
> m25p80: fix test on blk_pread() return value
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html
> 
>  hw/block/m25p80.c | 24 +---
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 51d85960566f..c47722d3a3e5 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -28,6 +28,7 @@
>  #include "hw/ssi/ssi.h"
>  #include "qemu/bitops.h"
>  #include "qemu/log.h"
> +#include "qapi/error.h"
>  
>  #ifndef M25P80_ERR_DEBUG
>  #define M25P80_ERR_DEBUG 0
> @@ -339,6 +340,7 @@ typedef struct Flash {
>  
>  typedef struct M25P80Class {
>  SSISlaveClass parent_class;
> +DeviceRealize parent_dc_realize;
>  FlashPartInfo *pi;
>  } M25P80Class;
>  
> @@ -880,7 +882,6 @@ static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t 
> tx)
>  
>  static int m25p80_init(SSISlave *ss)
>  {
> -DriveInfo *dinfo;
>  Flash *s = M25P80(ss);
>  M25P80Class *mc = M25P80_GET_CLASS(s);
>  
> @@ -888,8 +889,18 @@ static int m25p80_init(SSISlave *ss)
>  
>  s->size = s->pi->sector_size * s->pi->n_sectors;
>  s->dirty_page = -1;
> +return 0;
> +}
> +
> +static void m25p80_realize(DeviceState *dev, Error **errp)
> +{
> +Flash *s = M25P80(dev);
> +M25P80Class *mc = M25P80_GET_CLASS(s);
> +DriveInfo *dinfo;
> +
> +/* initialize the SSISlave part */
> +mc->parent_dc_realize(dev, errp);
>  
> -/* FIXME use a qdev drive property instead of drive_get_next() */
>  dinfo = drive_get_next(IF_MTD);
>  
>  if (dinfo) {
> @@ -899,18 +910,15 @@ static int m25p80_init(SSISlave *ss)
>  
>  s->storage = blk_blockalign(s->blk, s->size);
>  
> -/* FIXME: Move to late init */
>  if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
> -fprintf(stderr, "Failed to initialize SPI flash!\n");
> -return 1;
> +error_setg(errp, "failed to read the initial flash content");
> +return;
>  }
>  } else {
>  DB_PRINT_L(0, "No BDRV - binding to RAM\n");
>  s->storage = blk_blockalign(NULL, s->size);
>  memset(s->storage, 0xFF, s->size);
>  }
> -
> -return 0;
>  }
>  
>  static void m25p80_reset(DeviceState *d)
> @@ -967,6 +975,8 @@ static void m25p80_class_init(ObjectClass *klass, void 
> *data)
>  dc->vmsd = _m25p80;
>  dc->props = m25p80_properties;
>  dc->reset = m25p80_reset;
> +mc->parent_dc_realize = dc->realize;
> +dc->realize = m25p80_realize;
>  mc->pi = data;
>  }
>  
> 



[Qemu-block] [PATCH v2] m25p80: provide a realize to support late inits.

2016-06-15 Thread Cédric Le Goater
We also need to realize() the SSISlave part of the object. This is why
the previous realize() ops is stored in M25P80Class and called in the
object realize() ops.

This is fully compatible with the existing users of m25p80 and it
provides a way to handle errors on the drive backend.

Signed-off-by: Cédric Le Goater 
---

 Should apply on top of :
 
m25p80: fix test on blk_pread() return value
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html

 Changes since v1 :

 - Added a FIXME comment removed in v1

 hw/block/m25p80.c |   23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

Index: qemu-ast2400-mainline.git/hw/block/m25p80.c
===
--- qemu-ast2400-mainline.git.orig/hw/block/m25p80.c
+++ qemu-ast2400-mainline.git/hw/block/m25p80.c
@@ -28,6 +28,7 @@
 #include "hw/ssi/ssi.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 
 #ifndef M25P80_ERR_DEBUG
 #define M25P80_ERR_DEBUG 0
@@ -339,6 +340,7 @@ typedef struct Flash {
 
 typedef struct M25P80Class {
 SSISlaveClass parent_class;
+DeviceRealize parent_dc_realize;
 FlashPartInfo *pi;
 } M25P80Class;
 
@@ -880,7 +882,6 @@ static uint32_t m25p80_transfer8(SSISlav
 
 static int m25p80_init(SSISlave *ss)
 {
-DriveInfo *dinfo;
 Flash *s = M25P80(ss);
 M25P80Class *mc = M25P80_GET_CLASS(s);
 
@@ -888,6 +889,17 @@ static int m25p80_init(SSISlave *ss)
 
 s->size = s->pi->sector_size * s->pi->n_sectors;
 s->dirty_page = -1;
+return 0;
+}
+
+static void m25p80_realize(DeviceState *dev, Error **errp)
+{
+Flash *s = M25P80(dev);
+M25P80Class *mc = M25P80_GET_CLASS(s);
+DriveInfo *dinfo;
+
+/* initialize the SSISlave part */
+mc->parent_dc_realize(dev, errp);
 
 /* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_MTD);
@@ -899,18 +911,15 @@ static int m25p80_init(SSISlave *ss)
 
 s->storage = blk_blockalign(s->blk, s->size);
 
-/* FIXME: Move to late init */
 if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
-fprintf(stderr, "Failed to initialize SPI flash!\n");
-return 1;
+error_setg(errp, "failed to read the initial flash content");
+return;
 }
 } else {
 DB_PRINT_L(0, "No BDRV - binding to RAM\n");
 s->storage = blk_blockalign(NULL, s->size);
 memset(s->storage, 0xFF, s->size);
 }
-
-return 0;
 }
 
 static void m25p80_reset(DeviceState *d)
@@ -967,6 +976,8 @@ static void m25p80_class_init(ObjectClas
dc->vmsd = &vmstate_m25p80;
 dc->props = m25p80_properties;
 dc->reset = m25p80_reset;
+mc->parent_dc_realize = dc->realize;
+dc->realize = m25p80_realize;
 mc->pi = data;
 }
 



Re: [Qemu-block] [Qemu-devel] [PATCH] m25p80: provide a realize to support late inits.

2016-06-15 Thread Cédric Le Goater
On 06/15/2016 04:07 PM, Peter Maydell wrote:
> On 15 June 2016 at 15:00, Cédric Le Goater  wrote:
>> We also need to realize() the SSISlave part of the object. This is why
>> the previous realize() ops is stored in M25P80Class and called in the
>> object realize() ops.
>>
>> This is fully compatible with the existing users of m25p80 and it
>> provides a way to handle errors on the drive backend.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
> 
>> +static void m25p80_realize(DeviceState *dev, Error **errp)
>> +{
>> +Flash *s = M25P80(dev);
>> +M25P80Class *mc = M25P80_GET_CLASS(s);
>> +DriveInfo *dinfo;
>> +
>> +/* initialize the SSISlave part */
>> +mc->parent_dc_realize(dev, errp);
>>
>> -/* FIXME use a qdev drive property instead of drive_get_next() */
>>  dinfo = drive_get_next(IF_MTD);
> 
> Why have you removed the FIXME comment ?

This is an error. I should not have removed this one. Only the one
related to late init :/

C.




Re: [Qemu-block] [Qemu-devel] [PATCH] m25p80: provide a realize to support late inits.

2016-06-15 Thread Peter Maydell
On 15 June 2016 at 15:00, Cédric Le Goater  wrote:
> We also need to realize() the SSISlave part of the object. This is why
> the previous realize() ops is stored in M25P80Class and called in the
> object realize() ops.
>
> This is fully compatible with the existing users of m25p80 and it
> provides a way to handle errors on the drive backend.
>
> Signed-off-by: Cédric Le Goater 
> ---

> +static void m25p80_realize(DeviceState *dev, Error **errp)
> +{
> +Flash *s = M25P80(dev);
> +M25P80Class *mc = M25P80_GET_CLASS(s);
> +DriveInfo *dinfo;
> +
> +/* initialize the SSISlave part */
> +mc->parent_dc_realize(dev, errp);
>
> -/* FIXME use a qdev drive property instead of drive_get_next() */
>  dinfo = drive_get_next(IF_MTD);

Why have you removed the FIXME comment ?

thanks
-- PMM



Re: [Qemu-block] [PATCH 4/9] mirror: efficiently zero out target

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 03:34 PM, Eric Blake wrote:

On 06/15/2016 02:46 AM, Denis V. Lunev wrote:

On 06/15/2016 06:00 AM, Eric Blake wrote:

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

With a bdrv_co_write_zeroes method on a target BDS, zeroes will not be
placed on the wire. Thus the target can be zeroed out very efficiently.
This should be done with the largest chunk possible.


Probably nicer to track this in bytes.  And do you really want a
hard-coded arbitrary limit, or is it better to live with
MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?

Unfortunately we should. INT_MAX is not aligned as required.
Maybe we should align INT_MAX properly to fulfill the
write_zeroes alignment.

Hmm, maybe we can align INT_MAX down properly. OK,
I'll try to do that gracefully.

It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an
aligned value; we already have code in io.c that does that in
bdrv_co_do_pwrite_zeroes().


ok




@@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 end = s->bdev_length / BDRV_SECTOR_SIZE;
   -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
+if (base == NULL && !bdrv_has_zero_init(target_bs) &&
+target_bs->drv->bdrv_co_write_zeroes == NULL) {

Indentation is off, although if checkpatch.pl doesn't complain I guess
it doesn't matter that much.

Why should you care whether the target_bs->drv implements a callback?
Can't you just rely on the normal bdrv_*() functions to do the dirty
work of picking the most efficient implementation without you having to
bypass the block layer?  In fact, isn't that the whole goal of
bdrv_make_zero() - why not call that instead of reimplementing it?

This is the idea of the patch, actually. If the callback is not
implemented, we will have zeroes actually written or sent to the wire.
In this case there is not much sense in doing that: the amount of data
actually written will be significantly increased (some areas will be
written twice - with zeroes and with the actual data).


But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you
can use the public interface to learn whether bdrv_make_zero() will be
efficient or not, without having to probe what the backend supports.


bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs)
{
BlockDriverInfo bdi;

if (bs->backing || !(bs->open_flags & BDRV_O_UNMAP)) {
return false;
}

if (bdrv_get_info(bs, &bdi) == 0) {
return bdi.can_write_zeroes_with_unmap;
}

return false;
}


This function looks rotten. We CAN efficiently zero out
QCOW2 images even with a backing store available. Though
the availability of bdrv_co_write_zeroes does not
guarantee that it actually works (NFS, CIFS etc. for raw-posix.c).





On the other hand, if the callback is implemented, we will have a very
small amount of data on the wire and actually written, and thus will
have a benefit. I am trying to avoid very small chunks of data. Here
(during the migration process) the data is sent in 10 Mb chunks and
this takes a LOT of time with NBD. We can send 1.5 Gb chunks
(currently). They occupy the same 26 bytes of data
on the transport layer.

I agree that we don't want to pre-initialize the device to zero unless
write zeroes is an efficient operation, but I don't think that the
existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find
that out.

I also think that we need to push harder on the NBD list that under the
new block limits proposal, we WANT to be able to advertise when the new
NBD_CMD_WRITE_ZEROES command will accept a larger size than
NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal
states that if a server advertises a max transaction size to the client,
then the client must honor that size for all commands including
NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed
2G - 4k request] is invalid and would have to be a bunch of 32M requests).
https://sourceforge.net/p/nbd/mailman/message/35081223/




I see...



[Qemu-block] [PATCH] m25p80: provide a realize to support late inits.

2016-06-15 Thread Cédric Le Goater
We also need to realize() the SSISlave part of the object. This is why
the previous realize() ops is stored in M25P80Class and called in the
object realize() ops.

This is fully compatible with the existing users of m25p80 and it
provides a way to handle errors on the drive backend.

Signed-off-by: Cédric Le Goater 
---

 Should apply on top of :
 
m25p80: fix test on blk_pread() return value
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05574.html

 hw/block/m25p80.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 51d85960566f..c47722d3a3e5 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -28,6 +28,7 @@
 #include "hw/ssi/ssi.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 
 #ifndef M25P80_ERR_DEBUG
 #define M25P80_ERR_DEBUG 0
@@ -339,6 +340,7 @@ typedef struct Flash {
 
 typedef struct M25P80Class {
 SSISlaveClass parent_class;
+DeviceRealize parent_dc_realize;
 FlashPartInfo *pi;
 } M25P80Class;
 
@@ -880,7 +882,6 @@ static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t tx)
 
 static int m25p80_init(SSISlave *ss)
 {
-DriveInfo *dinfo;
 Flash *s = M25P80(ss);
 M25P80Class *mc = M25P80_GET_CLASS(s);
 
@@ -888,8 +889,18 @@ static int m25p80_init(SSISlave *ss)
 
 s->size = s->pi->sector_size * s->pi->n_sectors;
 s->dirty_page = -1;
+return 0;
+}
+
+static void m25p80_realize(DeviceState *dev, Error **errp)
+{
+Flash *s = M25P80(dev);
+M25P80Class *mc = M25P80_GET_CLASS(s);
+DriveInfo *dinfo;
+
+/* initialize the SSISlave part */
+mc->parent_dc_realize(dev, errp);
 
-/* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_MTD);
 
 if (dinfo) {
@@ -899,18 +910,15 @@ static int m25p80_init(SSISlave *ss)
 
 s->storage = blk_blockalign(s->blk, s->size);
 
-/* FIXME: Move to late init */
 if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
-fprintf(stderr, "Failed to initialize SPI flash!\n");
-return 1;
+error_setg(errp, "failed to read the initial flash content");
+return;
 }
 } else {
 DB_PRINT_L(0, "No BDRV - binding to RAM\n");
 s->storage = blk_blockalign(NULL, s->size);
 memset(s->storage, 0xFF, s->size);
 }
-
-return 0;
 }
 
 static void m25p80_reset(DeviceState *d)
@@ -967,6 +975,8 @@ static void m25p80_class_init(ObjectClass *klass, void 
*data)
>  dc->vmsd = &vmstate_m25p80;
 dc->props = m25p80_properties;
 dc->reset = m25p80_reset;
+mc->parent_dc_realize = dc->realize;
+dc->realize = m25p80_realize;
 mc->pi = data;
 }
 
-- 
2.1.4




Re: [Qemu-block] [PATCH v2 04/17] nbd: Allow larger requests

2016-06-15 Thread Paolo Bonzini


On 14/06/2016 23:30, Eric Blake wrote:
> The NBD layer was breaking up request at a limit of 2040 sectors
> (just under 1M) to cater to old qemu-nbd. But the server limit
> was raised to 32M in commit 2d8214885 to match the kernel, more
> than three years ago; and the upstream NBD Protocol is proposing
> documentation that without any explicit communication to state
> otherwise, a client should be able to safely assume that a 32M
> transaction will work.  It is time to rely on the larger sizing,
> and any downstream distro that cares about maximum
> interoperability to older qemu-nbd servers can just tweak the
> value of #define NBD_MAX_SECTORS.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v2: new patch
> ---
>  include/block/nbd.h | 1 +
>  block/nbd-client.c  | 4 
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index b86a976..36dde24 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,7 @@ enum {
> 
>  /* Maximum size of a single READ/WRITE data buffer */
>  #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE)
> 
>  ssize_t nbd_wr_syncv(QIOChannel *ioc,
>   struct iovec *iov,
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 4d13444..420bce8 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
> sector_num,
>  return -reply.error;
>  }
> 
> -/* qemu-nbd has a limit of slightly less than 1M per request.  Try to
> - * remain aligned to 4K. */
> -#define NBD_MAX_SECTORS 2040
> -
>  int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
>  int nb_sectors, QEMUIOVector *qiov)
>  {
> 

Acked-by: Paolo Bonzini 



Re: [Qemu-block] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer

2016-06-15 Thread Paolo Bonzini


On 14/06/2016 23:30, Eric Blake wrote:
> We were basing the advertisement of maximum discard and transfer
> length off of UINT32_MAX, but since the rest of the block layer
> has signed int limits on a transaction, nothing could ever reach
> that maximum, and we risk overflowing an int once things are
> converted to byte-based rather than sector-based limits.  What's
> more, we DO have a much smaller limit: both the current kernel
> and qemu-nbd have a hard limit of 32M on a read or write
> transaction, and while they may also permit up to a full 32 bits
> on a discard transaction, the upstream NBD protocol is proposing
> wording that without any explicit advertisement otherwise,
> clients should limit ALL requests to the same limits as read and
> write, even though the other requests do not actually require as
> many bytes across the wire.  So the better limit to tell the
> block layer is 32M for both values.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v2: new patch
> ---
>  block/nbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 6015e8b..bf67c8a 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -362,8 +362,8 @@ static int nbd_co_flush(BlockDriverState *bs)
> 
>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> -bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> -bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> +bs->bl.max_discard = NBD_MAX_SECTORS;
> +bs->bl.max_transfer_length = NBD_MAX_SECTORS;
>  }
> 
>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
> 

Acked-by: Paolo Bonzini 
Cc: qemu-sta...@nongnu.org



Re: [Qemu-block] [PATCH] m25p80: fix test on blk_pread() return value

2016-06-15 Thread Cédric Le Goater
On 06/15/2016 09:57 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 18:02 hat Cédric Le Goater geschrieben:
>> On 06/14/2016 10:38 AM, Kevin Wolf wrote:
>>> Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben:
>> #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
>> req=, offset=30878208, 
>> bytes=512, qiov=0x7fa7f47fee60, flags=0)
>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
>> #5  0x7fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, 
>> bytes=512, qiov=0x7fa80d5191c0, 
>> flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | 
>> BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 
>> 4278124256), flags@entry=(unknown: 0))
>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492
>
> That 'flags' value looks bogus...
>
>> #6  0x7fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, 
>> offset=30878208, bytes=256, qiov=0x7fa80d5191c0, 
>> flags=(unknown: 0)) at 
>> /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788
>> #7  0x7fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0)
>> at 
>> /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977
>> #8  0x7fa81c6c823a in coroutine_trampoline (i0=<optimized out>, 
>> i1=<optimized out>)
>> at 
>> /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78
>> #9  0x7fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
> and we don't get anything further in the backtrace beyond coroutines, to
> see who's sending the bad parameters.  I recently debugged a bogus flags
> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are
> used in blk_aio_prwv():
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html
>
> I've just posted v2 of that patch (now a 2/2 series), but in v2 no
> longer kept the assert at that point.  But maybe the correct fix, and/or
> the hack for catching the bug prior to coroutines, will help you debug
> where the bad arguments are coming from.

 That does not fix the assert.
  
>> #10 0x7fa80d5189d0 in ?? ()
>> #11 0x in ?? ()
>> (gdb) up 4
>> #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
>> req=<optimized out>, offset=30878208, 
>> bytes=512, qiov=0x7fa7f47fee60, flags=0)
>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
>> 1243 assert(!qiov || bytes == qiov->size);
>> (gdb) p *qiov 
>> $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256}

 So, it seems that the issue is coming from the fact that bdrv_co_pwritev()
 does not handle alignments less than BDRV_SECTOR_SIZE :

/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);

 It calls bdrv_aligned_pwritev() which does the assert : 

assert(!qiov || bytes == qiov->size);
>>>
>>> Yes, but between these two places, there is code that should actually
>>> enforce the right alignment:
>>>
>>> if ((offset + bytes) & (align - 1)) {
>>> ...
>>> }
>>>
>>> You can see in your backtrace that bdrv_aligned_pwritev() gets a
>>> different qiov than bdrv_co_pwritev() (which is local_qiov in the latter
>>> function).
>>>
>>> It's just unclear to me why this code extended bytes, but didn't add the
>>> tail_buf iovec to local_qiov.
>>
>> The gdb backtrace is bogus. It does not make sense. May be a gdb issue
>> with multithread on jessie.
>>
>> In the path tracking the tail bytes, we have : 
>>
>>  if ((offset + bytes) & (align - 1)) {
>>  ...
>   if (!use_local_qiov) {
>   qemu_iovec_init(&local_qiov, qiov->niov + 1);
>   qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
>   use_local_qiov = true;
>   }
>> tail_bytes = (offset + bytes) & (align - 1);
>> qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - 
>> tail_bytes);
>>
>> bytes = ROUND_UP(bytes, align);
>> }
>>
>> This is where the issue is I think. The qiov holds 256 and bytes 512.
>>
>> I have no idea how to fix that though.
> 
> Added some more context above. qiov->size as passed from the device is
> already 256 bytes, which are added to local_qiov with
> qemu_iovec_concat(). And then we add another 256 from tail_buf in the
> lines that you quoted, so in theory we should end up with a properly
> aligned 256 + 256 = 512 byte qiov.

yes. 

It seems that qiov is bogus after the bdrv_aligned_preadv() call. It gets
zeroed most of the time, sometimes ->size is 1, and then qemu_iovec_concat()
constructs an awful local_qiov, which triggers the assert in
bdrv_aligned_pwritev().

How's that possible? Could it be a serialization issue?

C.




Re: [Qemu-block] [PATCH] linux-aio: Cancel BH if not needed

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 13:16, Kevin Wolf wrote:
> linux-aio uses a BH in order to make sure that the remaining completions
> are processed even in nested event loops of completion callbacks in
> order to avoid deadlocks.
> 
> There is no need, however, to have the BH overhead for the first call
> into qemu_laio_completion_bh() or after all pending completions have
> already been processed. Therefore, this patch calls directly into
> qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
> the BH after qemu_laio_completion_bh() has processed all pending
> completions.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/linux-aio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index fe7cece..e468960 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -149,6 +149,8 @@ static void qemu_laio_completion_bh(void *opaque)
>  if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>  ioq_submit(s);
>  }
> +
> +qemu_bh_cancel(s->completion_bh);
>  }
>  
>  static void qemu_laio_completion_cb(EventNotifier *e)
> @@ -156,7 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
>  LinuxAioState *s = container_of(e, LinuxAioState, e);
>  
>  if (event_notifier_test_and_clear(&s->e)) {
> -qemu_bh_schedule(s->completion_bh);
> +qemu_laio_completion_bh(s);
>  }
>  }
>  
> 

Reviewed-by: Paolo Bonzini 



Re: [Qemu-block] [PATCH 4/9] mirror: efficiently zero out target

2016-06-15 Thread Eric Blake
On 06/15/2016 02:46 AM, Denis V. Lunev wrote:
> On 06/15/2016 06:00 AM, Eric Blake wrote:
>> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
>>> With a bdrv_co_write_zeroes method on a target BDS, zeroes will not be
>>> placed on the wire. Thus the target can be zeroed out very efficiently.
>>> This should be done with the largest chunk possible.
>>>

>> Probably nicer to track this in bytes.  And do you really want a
>> hard-coded arbitrary limit, or is it better to live with
>> MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?
> Unfortunately we should. INT_MAX is not aligned as required.
> Maybe we should align INT_MAX properly to fulfill the
> write_zeroes alignment.
> 
> Hmm, maybe we can align INT_MAX down properly. OK,
> I'll try to do that gracefully.

It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an
aligned value; we already have code in io.c that does that in
bdrv_co_do_pwrite_zeroes().
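
As a rough illustration of that idea (a sketch only, reusing the macro and
BlockLimits field names that appear elsewhere in this thread, not the
actual io.c code, and assuming the usual bs variable is in scope):

    /* Pick the largest usable zeroing chunk and round it down to the
     * write-zeroes alignment, so even an INT_MAX default stays aligned. */
    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
    int alignment = MAX(bs->bl.pwrite_zeroes_alignment, BDRV_SECTOR_SIZE);

    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);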

> 
>>> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>>> end = s->bdev_length / BDRV_SECTOR_SIZE;
>>>   -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>>> +if (base == NULL && !bdrv_has_zero_init(target_bs) &&
>>> +target_bs->drv->bdrv_co_write_zeroes == NULL) {
>> Indentation is off, although if checkpatch.pl doesn't complain I guess
>> it doesn't matter that much.
>>
>> Why should you care whether the target_bs->drv implements a callback?
>> Can't you just rely on the normal bdrv_*() functions to do the dirty
>> work of picking the most efficient implementation without you having to
>> bypass the block layer?  In fact, isn't that the whole goal of
>> bdrv_make_zero() - why not call that instead of reimplementing it?
> This is the idea of the patch, actually. If the callback is not
> implemented, we will have zeroes actually written or sent to the wire.
> In this case there is not much sense in doing that: the amount of data
> actually written will be significantly increased (some areas will be
> written twice - with zeroes and with the actual data).
>

But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you
can use the public interface to learn whether bdrv_make_zero() will be
efficient or not, without having to probe what the backend supports.
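
Roughly, the suggestion amounts to something like this (a sketch only,
written against the hunk quoted above, not an actual patch):

    /* Pre-zeroing only pays off when the target can write zeroes
     * cheaply; otherwise fall back to marking everything dirty. */
    if (base == NULL && !bdrv_has_zero_init(target_bs) &&
        !bdrv_can_write_zeroes_with_unmap(target_bs)) {
        bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
    }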

> On the other hand, if the callback is implemented, we will have a very
> small amount of data on the wire and actually written, and thus will
> have a benefit. I am trying to avoid very small chunks of data. Here
> (during the migration process) the data is sent in 10 Mb chunks and
> this takes a LOT of time with NBD. We can send 1.5 Gb chunks
> (currently). They occupy the same 26 bytes of data
> on the transport layer.

I agree that we don't want to pre-initialize the device to zero unless
write zeroes is an efficient operation, but I don't think that the
existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find
that out.

I also think that we need to push harder on the NBD list that under the
new block limits proposal, we WANT to be able to advertise when the new
NBD_CMD_WRITE_ZEROES command will accept a larger size than
NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal
states that if a server advertises a max transaction size to the client,
then the client must honor that size for all commands including
NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed
2G - 4k request] is invalid and would have to be a bunch of 32M requests).
https://sourceforge.net/p/nbd/mailman/message/35081223/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit

2016-06-15 Thread Eric Blake
On 06/15/2016 02:41 AM, Denis V. Lunev wrote:
> On 06/15/2016 05:36 AM, Eric Blake wrote:
>> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
>>> There is no need to scan allocation tables if we have mark_all_dirty
>>> flag
>>> set. Just mark it all dirty.
>>>

>>>   int ret, n;
>>> end = s->bdev_length / BDRV_SECTOR_SIZE;
>>>   +if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>>> +bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
>> Won't work as written.  'end' is 64 bits, but bdrv_set_dirty_bitmap() is
>> limited to a 32-bit sector count.  Might be first worth updating
>> bdrv_set_dirty_bitmap() and friends to be byte-based rather than
>> sector-based (but still tracking a sector per bit, obviously), as well
>> as expand it to operate on 64-bit sizes rather than 32-bit.
> very nice catch! thank you
> 

[meta-comment - your reply style is a bit hard to read. It's best to
include a blank line both before and after any text you write when
replying in context, as the extra spacing calls visual attention to your
reply rather than being part of a wall of text]

>> I'm also worried slightly that the existing code repeated things in a
>> loop, and therefore had pause points every iteration and could thus
>> remain responsive to an early cancel.  But doing the entire operation in
>> one chunk (assuming you fix bitmap code to handle a 64-bit size) may end
>> up running for so long without interruption that you lose the benefits
>> of an early interruption that you have by virtue of a 32-bit limit.
>>
> I do not think this is actually a concern. We just perform a memset
> over a not-that-big area (a 1 Tb disk will have a 2 Mb dirty bitmap)
> under default parameters.
> 

Okay, so the real slowdown in the loop was the rest of the code
(checking backing status in bdrv_is_allocated_above()) and not in
actually writing to the bitmap (bdrv_set_dirty_bitmap()).
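
As an aside, even with the current 32-bit sector-count parameter the
whole-device marking could be done in chunks; a rough sketch (illustrative
only, not a proposed patch, assuming the s and end variables from the hunk
quoted above):

    /* Mark [0, end) dirty in chunks small enough for the current
     * bdrv_set_dirty_bitmap() interface. */
    int64_t sector = 0;
    while (sector < end) {
        int chunk = MIN(end - sector, INT_MAX);
        bdrv_set_dirty_bitmap(s->dirty_bitmap, sector, chunk);
        sector += chunk;
    }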

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Wouter Verhelst
On Wed, Jun 15, 2016 at 11:27:21AM +0100, Alex Bligh wrote:
> Perhaps this should read "If an error occurs, the server MUST either initiate
> a hard disconnect before the entire payload has been sent or
> set the appropriate code in the error field and send the response header
> without any payload." if we want to go down this route.

Something along those lines is what I meant, yes.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-block] Supplying QCOW2 as 'file' driver to `blockdev-add` results in a QEMU crash

2016-06-15 Thread Kashyap Chamarthy
On Wed, Jun 15, 2016 at 11:58:31AM +0200, Kashyap Chamarthy wrote:
> Seems like supplying "qcow2" file BlockdevDriver option to QMP
> `blockdev-add` results in a SIGSEGV:
> 
>   [...]
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x55a0121f in visit_type_BlockdevRef ()
>   [...]
> 

[...]
 
> Then, invoke the 'blockdev-add' QMP command with these arguments and options:
> 
> $ socat UNIX:/export/qmp-sock 
> READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> {"QMP": {"version": {"qemu": {"micro": 0, "minor": 6, "major": 2}, 
> "package": " (qemu-2.6.0-3.fc24)"}, "capabilities": []}}
> QMP> {"execute":"qmp_capabilities"}
> {"return": {}}
> 
> QMP> { "execute": "blockdev-add",
>  "arguments": { "options" : { "driver": "qcow2", 
>  "id": "drive-ide1-0-0",
>  "file": { "driver": "qcow2",
>"filename": "backup1.qcow2" } 
> } } }
> 

[...]

Related SIGSEGV case:

(1) driver: raw, file: driver: raw

QMP> { "execute": "blockdev-add",
 "arguments": { "options" : { "driver": "raw", 
 "id": "drive-ide1-0-0",
 "file": { "driver": "raw",
   "filename": "/tmp/test1.raw" } } 
} }


And the below are the *good* cases, where the block device is added
successfully:

(2) driver: qcow2, file: driver: file

$ qemu-img create -f qcow2 /tmp/test2.qcow2 512M

QMP> { "execute": "blockdev-add",
 "arguments": { "options" : { "driver": "qcow2", 
 "id": "drive-ide2-0-0",
 "file": { "driver": "file",
   "filename": "/tmp/test2.qcow2" } 
} } }
{"return": {}}


(3) driver: raw, file: driver: file

$ qemu-img create -f raw /tmp/test3.raw 512M

QMP> { "execute": "blockdev-add",
 "arguments": { "options" : { "driver": "raw", 
 "id": "drive-ide3-0-0",
 "file": { "driver": "file",
   "filename": "/tmp/test3.raw" } } 
} }
{"return": {}}

-- 
/kashyap



[Qemu-block] [PATCH] linux-aio: Cancel BH if not needed

2016-06-15 Thread Kevin Wolf
linux-aio uses a BH in order to make sure that the remaining completions
are processed even in nested event loops of completion callbacks in
order to avoid deadlocks.

There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.

Signed-off-by: Kevin Wolf 
---
 block/linux-aio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index fe7cece..e468960 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -149,6 +149,8 @@ static void qemu_laio_completion_bh(void *opaque)
if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
 ioq_submit(s);
 }
+
+qemu_bh_cancel(s->completion_bh);
 }
 
 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -156,7 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
 LinuxAioState *s = container_of(e, LinuxAioState, e);
 
if (event_notifier_test_and_clear(&s->e)) {
-qemu_bh_schedule(s->completion_bh);
+qemu_laio_completion_bh(s);
 }
 }
 
-- 
1.8.3.1




Re: [Qemu-block] [Qemu-devel] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Dr. David Alan Gilbert
* Denis V. Lunev (d...@openvz.org) wrote:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example, a guest running the following loop
> inside:
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> on top of slow storage could not complete the operation within a
> reasonable amount of time:
> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (which make the dirty bitmap dirty) or
> switch to the synchronous scheme.
> 
> This series switches drive mirror to the sync scheme. Actually we can have
> something more intelligent and switch to sync mirroring just after
> the first pass over the bitmap. Though this could be done relatively
> easily during discussion. The most difficult things are here.

Some random thoughts:
   a) People have been asking for post-copy for block storage, and this
solves the same problem a different way.
   b) I guess the other way is to turn on write-throttling - that would
also guarantee the rate at which writes happen.
   c) Don't you end up with something very much like what the colo block
replication guys have ended up with?

Dave

> The set also adds some performance improvements dealing with
> known-to-be-zero areas.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> 
> Denis V. Lunev (9):
>   mirror: fix calling of blk_aio_pwritev/blk_aio_preadv
>   mirror: create mirror_dirty_init helper for mirror_run
>   mirror: optimize dirty bitmap filling in mirror_run a bit
>   mirror: efficiently zero out target
>   mirror: improve performance of mirroring of empty disk
>   block: pass qiov into before_write notifier
>   mirror: allow to save buffer for QEMUIOVector in MirrorOp
>   mirror: use synch scheme for drive mirror
>   mirror: replace bdrv_dirty_bitmap with plain hbitmap
> 
>  block/io.c|  12 +-
>  block/mirror.c| 290 
> +++---
>  include/block/block_int.h |   1 +
>  3 files changed, 229 insertions(+), 74 deletions(-)
> 
> -- 
> 2.5.0
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [Qemu-devel] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 12:19 PM, Stefan Hajnoczi wrote:

On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

We should not take into account zero blocks for delay calculations.
They are not read and thus IO throttling is not required. Otherwise,
VM migration with a 16 Tb QCOW2 disk with 4 Gb of data takes days.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
  block/mirror.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

Seems reasonable, but I'll let others more familiar with throttling give
the final say.

There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
that case we need to account for the bytes transferred.  I don't see
where the patch takes this into account.

Interesting point.

I think we are charging for IO performed. Thus, in the
absence of the callback, we should charge for io_sectors/2.
The write will be full scale; there is no reading.

Correct?



Re: [Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 01:25 PM, Kevin Wolf wrote:

Am 15.06.2016 um 11:34 hat Denis V. Lunev geschrieben:

On 06/15/2016 12:06 PM, Kevin Wolf wrote:

The second big thing is that I don't want to see new users of the
notifiers in I/O functions. Let's try if we can't add a filter
BlockDriver instead. Then we'd add an option to set the filter node-name
in the mirror QMP command so that the management tool is aware of the
node and can refer to it.

this will be much more difficult to implement, in my experience.

I agree that it will be more difficult, though I'm not sure whether it's
much more.


Can you share more details why filters are bad?

Basically, notifiers (and other code for supporting other features in
the common I/O path) just don't fit the block layer design very well.
They are more hacked on instead of designed in.

This shows in different ways in different places, so I can't fully tell
you the problems that using notifiers in the mirror code would cause
(which is a problem in itself), but for example the copy-on-read code
that has been added to the core block layer instead of a separate filter
driver had trouble with ignoring its own internal requests that it made
to the BDS, which was hacked around with a new request flag, i.e. making
the core block layer even more complicated.


An example that I can think of with respect to mirroring is taking a
snapshot during the operation. Op blockers prevent this from happening
at the moment, but it seems to be a very reasonable thing for a user to
want. Let me use the filter node approach to visualise this:

 (raw-posix)(qcow2)
 source_file <- source <- mirror <- virtio-blk
|
 target_file <- target <+

There are two ways to create a snapshot: Either you put it logically
between the mirror and virtio-blk, so that only source (now a backing
file), but no new writes will be mirrored. This is easy in both
approaches, but maybe the less commonly wanted thing.

The other option is putting the new overlay between source and mirror:

 (raw-posix)(qcow2)
 source_file <- source <- snap <- mirror <- virtio-blk
|
 target_file <- target <+

With the mirror intercept being its own BDS node, making this change is
very easy and doesn't involve any code that is specific to mirroring.

If it doesn't have a filter driver and uses notifiers, however, the
mirror is directly tied to the source BDS, and now it doesn't just work,
but you need extra code that explicitly moves the notifier from the
source BDS to the snap BDS. And you need such code not only for
mirroring, but for everything that potentially hooks into the I/O
functions.


Maybe these two examples give you an idea why I want to use the concepts
that are designed into the block layer with much thought rather than
hacked on notifiers. Notifiers may be simpler to implement in the first
place, but they lead to a less consistent and harder to maintain block
layer in the long run.

ok. great explanation, thank you



If we don't do this now, we'll have to introduce it later and can't be
sure that the management tool knows about it. This would complicate
things quite a bit because we would have to make sure that the added
node stays invisible to the management tool.


I think these two things are the big architectural questions. The rest
is hopefully more or less implementation details.

I completely agree with you.

We have the following choices:
1. implement parameter to use 'active'/'passive' mode from the very
beginning
2. switch to 'active' mode upon receiving "block-job-complete"
command unconditionally
3. switch to 'active' mode upon receiving "block-job-complete"
command with proper parameter
4. switch to 'active' mode after timeout (I personally do not like
this option)

I think that choices 1 and 3 do not contradict each other and
could be implemented together.

I think we definitely want 1. and some way to switch after the fact.
That you suggest three different conditions for doing the switch
suggests that it is policy and doesn't belong in qemu, but in the
management tools. So how about complementing 1. with 5.?

5. Provide a QMP command to switch between active and passive mode

reasonable. looks OK to me.

Den



Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Paolo Bonzini


On 15/06/2016 12:27, Alex Bligh wrote:
> 
> On 15 Jun 2016, at 10:18, Paolo Bonzini  wrote:
> 
>>> So what should those servers do (like 2 of mine) which don't buffer
>>> the entire read, if they get an error having already sent some data?
>>
>> They have sent an error code of zero, and it turned out to be wrong.  So
>> the only thing they can do safely is disconnect.
> 
> Right, but that is not what Wouter's change says:
> 
> +If an error occurs, the server SHOULD set the appropriate error code
> +in the error field. The server MAY then initiate a hard disconnect.
> +If it chooses not to, it MUST NOT send any payload for this request.
> 
> I read this as either
> 
> a) the server can issue a hard disconnect without sending any reply; or
> 
> b) it must send the reply header with no payload
> 
> It also seems to permit not setting the error code (it's only a 'SHOULD'),
> not disconnecting (it's a MAY), then not sending any payload, which is a
> nonsense.

Right.

> Perhaps this should read "If an error occurs, the server MUST either initiate
> a hard disconnect before the entire payload has been sent or
> set the appropriate code in the error field and send the response header
> without any payload." if we want to go down this route.

Yes, I agree.

I do believe we want to go down this route.  I think we agree that
partial buffering may always require the server to disconnect after an
error.  Therefore I don't see any benefit at all in sending a payload
after an error message.

Paolo



Re: [Qemu-block] [PATCH v2 0/6] block: Enable byte granularity I/O

2016-06-15 Thread Kevin Wolf
Am 15.06.2016 um 11:02 hat Stefan Hajnoczi geschrieben:
> On Tue, Jun 14, 2016 at 03:32:29PM +0200, Kevin Wolf wrote:
> > Previous series have already converted some block drivers to byte-based 
> > rather
> > than sector-based interfaces. However, the common I/O path as well as 
> > raw-posix
> > still enforced a minimum alignment of 512 bytes because some sector-based 
> > logic
> > was involved.
> > 
> > This patch series removes these limitations and a sub-sector request 
> > actually
> > ends up as a sub-sector syscall now.
> > 
> > v2:
> > - Updated trace-events for bdrv_co_do_copy_on_readv() [Eric]
> > - Added some assertions [Eric]
> > - Renamed laio_submit_co() -> laio_co_submit() and added coroutine_fn
> >   to its prototype [Stefan]
> > - linux-aio: Include block/block.h and use BDRV_SECTOR_SIZE instead
> >   of 512 [Eric]
> > 
> > Kevin Wolf (6):
> >   block: Byte-based bdrv_co_do_copy_on_readv()
> >   block: Prepare bdrv_aligned_preadv() for byte-aligned requests
> >   block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
> >   raw-posix: Switch to bdrv_co_* interfaces
> >   raw-posix: Implement .bdrv_co_preadv/pwritev
> >   block: Don't enforce 512 byte minimum alignment
> > 
> >  block.c   |   2 +-
> >  block/io.c| 128 
> > ++
> >  block/linux-aio.c |  84 -
> >  block/mirror.c|  10 ++--
> >  block/raw-aio.h   |   3 ++
> >  block/raw-posix.c |  62 
> >  include/block/block.h |  10 ++--
> >  trace-events  |   2 +-
> >  8 files changed, 176 insertions(+), 125 deletions(-)
> 
> Reviewed-by: Stefan Hajnoczi 

Thanks. Applied to my block branch.

Kevin




Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Alex Bligh

On 15 Jun 2016, at 10:18, Paolo Bonzini  wrote:

>> So what should those servers do (like 2 of mine) which don't buffer
>> the entire read, if they get an error having already sent some data?
> 
> They have sent an error code of zero, and it turned out to be wrong.  So
> the only thing they can do safely is disconnect.

Right, but that is not what Wouter's change says:

+If an error occurs, the server SHOULD set the appropriate error code
+in the error field. The server MAY then initiate a hard disconnect.
+If it chooses not to, it MUST NOT send any payload for this request.

I read this as either

a) the server can issue a hard disconnect without sending any reply; or

b) it must send the reply header with no payload

It also seems to permit not setting the error code (it's only a 'SHOULD'),
not disconnecting (it's a MAY), then not sending any payload, which is a
nonsense.

Perhaps this should read "If an error occurs, the server MUST either initiate
a hard disconnect before the entire payload has been sent or
set the appropriate code in the error field and send the response header
without any payload." if we want to go down this route.


-- 
Alex Bligh







Re: [Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Kevin Wolf
Am 15.06.2016 um 11:34 hat Denis V. Lunev geschrieben:
> On 06/15/2016 12:06 PM, Kevin Wolf wrote:
> >The second big thing is that I don't want to see new users of the
> >notifiers in I/O functions. Let's try if we can't add a filter
> >BlockDriver instead. Then we'd add an option to set the filter node-name
> >in the mirror QMP command so that the management tool is aware of the
> >node and can refer to it.
> this will be much more difficult to implement in my experience.

I agree that it will be more difficult, though I'm not sure whether it's
much more.

> Can you share more details why filters are bad?

Basically, notifiers (and other code for supporting other features in
the common I/O path) just don't fit the block layer design very well.
They are more hacked on instead of designed in.

This shows in different ways in different places, so I can't fully tell
you the problems that using notifiers in the mirror code would cause
(which is a problem in itself), but for example the copy-on-read code
that has been added to the core block layer instead of a separate filter
driver had trouble with ignoring its own internal requests that it made
to the BDS, which was hacked around with a new request flag, i.e. making
the core block layer even more complicated.


An example that I can think of with respect to mirroring is taking a
snapshot during the operation. Op blockers prevent this from happening
at the moment, but it seems to be a very reasonable thing for a user to
want. Let me use the filter node approach to visualise this:

(raw-posix)    (qcow2)
source_file <- source <- mirror <- virtio-blk
                           |
target_file <- target <----+

There are two ways to create a snapshot: Either you put it logically
between the mirror and virtio-blk, so that only source (now a backing
file), but no new writes will be mirrored. This is easy in both
approaches, but maybe the less commonly wanted thing.

The other option is putting the new overlay between source and mirror:

(raw-posix)    (qcow2)
source_file <- source <- snap <- mirror <- virtio-blk
                                   |
target_file <- target <------------+

With the mirror intercept being its own BDS node, making this change is
very easy and doesn't involve any code that is specific to mirroring.

If it doesn't have a filter driver and uses notifiers, however, the
mirror is directly tied to the source BDS, and now it doesn't just work,
but you need extra code that explicitly moves the notifier from the
source BDS to the snap BDS. And you need such code not only for
mirroring, but for everything that potentially hooks into the I/O
functions.


Maybe these two examples give you an idea why I want to use the concepts
that are designed into the block layer with much thought rather than
hacked on notifiers. Notifiers may be simpler to implement in the first
place, but they lead to a less consistent and harder to maintain block
layer in the long run.
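To make the filter idea a bit more concrete, a pass-through filter node is
essentially a BlockDriver whose I/O callbacks forward to bs->file and do the
extra work on the side. The sketch below is purely illustrative:
mirror_copy_to_target() is a made-up placeholder and the callback signature
only approximates the byte-based driver interface discussed elsewhere in
this thread:

    /* Sketch of a mirror filter node: guest writes are forwarded to the
     * filtered child and copied to the target from the same callback, so
     * inserting a snapshot node above or below it is purely a graph change. */
    static int coroutine_fn mirror_filter_co_pwritev(BlockDriverState *bs,
                                                     uint64_t offset,
                                                     uint64_t bytes,
                                                     QEMUIOVector *qiov,
                                                     int flags)
    {
        int ret = bdrv_co_pwritev(bs->file->bs, offset, bytes, qiov, flags);
        if (ret == 0) {
            ret = mirror_copy_to_target(bs->opaque, offset, bytes, qiov);
        }
        return ret;
    }

    static BlockDriver bdrv_mirror_filter = {
        .format_name     = "mirror-filter",           /* illustrative name */
        .bdrv_co_pwritev = mirror_filter_co_pwritev,
    };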

> >If we don't do this now, we'll have to introduce it later and can't be
> >sure that the management tool knows about it. This would complicate
> >things quite a bit because we would have to make sure that the added
> >node stays invisible to the management tool.
> >
> >
> >I think these two things are the big architectural questions. The rest
> >is hopefully more or less implementation details.
> I completely agree with you.
> 
> We have the following choices:
> 1. implement parameter to use 'active'/'passive' mode from the very
> beginning
> 2. switch to 'active' mode upon receiving "block-job-complete"
> command unconditionally
> 3. switch to 'active' mode upon receiving "block-job-complete"
> command with proper parameter
> 4. switch to 'active' mode after timeout (I personally do not like
> this option)
> 
> I think that choices 1 and 3 do not contradict each other and
> could be implemented together.

I think we definitely want 1. and some way to switch after the fact.
That you suggest three different conditions for doing the switch
suggests that it is policy and doesn't belong in qemu, but in the
management tools. So how about complementing 1. with 5.?

5. Provide a QMP command to switch between active and passive mode

Kevin



[Qemu-block] Supplying QCOW2 as 'file' driver to `blockdev-add` results in a QEMU crash

2016-06-15 Thread Kashyap Chamarthy
Seems like supplying "qcow2" as the 'file' BlockdevDriver option to QMP
`blockdev-add` results in a SIGSEGV:

[...]
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55a0121f in visit_type_BlockdevRef ()
[...]

Reproducer
--

Tested with: qemu-2.6.0-3.fc24

Invoke this QEMU command-line (QMP server over Unix socket) in GDB:

$ gdb /usr/bin/qemu-system-x86_64
[...]
(gdb) run -machine accel=kvm -name cirrvm -S -machine 
pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge -m 977 -realtime mlock=off 
-smp 1,sockets=1,cores=1,threads=1 -nographic -no-user-confi
g -nodefaults -chardev 
socket,id=charmonitor,path=/var/tmp/cirrvm.monitor,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global 
kvm-pit.lost_tick
_policy=discard -no-hpet -no-shutdown -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x3.0x7 -drive 
file=./cirros-0.3.3.qcow2,if=none,id=drive-ide0-0-0,driver=qcow2 -device ide
-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device 
virtio-balloon-pci,id=balloon0,bus=pci
.0,addr=0x4 -msg timestamp=on -qmp unix:./qmp-sock,server --monitor stdio
[...]

Then, invoke the 'blockdev-add' QMP command with these arguments and options:

$ socat UNIX:/export/qmp-sock 
READLINE,history=$HOME/.qmp_history,prompt='QMP> '
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 6, "major": 2}, 
"package": " (qemu-2.6.0-3.fc24)"}, "capabilities": []}}
QMP> {"execute":"qmp_capabilities"}
{"return": {}}

QMP> { "execute": "blockdev-add",
 "arguments": { "options" : { "driver": "qcow2", 
 "id": "drive-ide1-0-0",
 "file": { "driver": "qcow2",
   "filename": "backup1.qcow2" } } 
} }


Backtrace
-

[...]
Starting program: /usr/bin/qemu-system-x86_64 -machine accel=kvm -name cirrvm 
-S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge -m 977 -realtime 
mlock=off -smp 1,sockets=1,cores=1
,threads=1 -nographic -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/tmp/cirrvm.monitor,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc,dri
ftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot 
strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x3.0x7 -drive 
file=./cirros-0.3.3.qcow2,if=none,id=dri
ve-ide0-0-0,driver=qcow2 -device 
ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device vi
rtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -qmp 
unix:./qmp-sock,server --monitor stdio
[...]

[New Thread 0x7fffcb792700 (LWP 2169)]
char device redirected to /dev/pts/50 (label charserial0)
QEMU waiting for connection on: disconnected:unix:./qmp-sock,server
[New Thread 0x7fffcad7f700 (LWP 2234)]
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) [New Thread 0x7fffca57e700 (LWP 2235)]
[Thread 0x7fffcad7f700 (LWP 2234) exited]

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55a0121f in visit_type_BlockdevRef ()
(gdb) thread apply all bt full

Thread 4 (Thread 0x7fffca57e700 (LWP 2235)):
#0  0x7fffdabf4bd0 in pthread_cond_wait@@GLIBC_2.3.2 () at 
/lib64/libpthread.so.0
#1  0x55a199e9 in qemu_cond_wait ()
#2  0x5571e26f in qemu_kvm_cpu_thread_fn ()
#3  0x7fffdabef5ca in start_thread () at /lib64/libpthread.so.0
#4  0x7fffda928ead in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7fffcb792700 (LWP 2169)):
#0  0x7fffda922ff9 in syscall () at /lib64/libc.so.6
#1  0x55a19cf8 in qemu_event_wait ()
#2  0x55a27e6e in call_rcu_thread ()
#3  0x7fffdabef5ca in start_thread () at /lib64/libpthread.so.0
#4  0x7fffda928ead in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x77ed0f80 (LWP 2162)):
#0  0x55a0121f in visit_type_BlockdevRef ()
#1  0x55a016a2 in visit_type_BlockdevOptionsGenericFormat_members ()
#2  0x55a01903 in visit_type_BlockdevOptionsGenericCOWFormat_members ()
#3  0x55a01a53 in visit_type_BlockdevOptionsQcow2_members ()
#4  0x55a010d5 in visit_type_BlockdevOptions_members ()
#5  0x55a012c8 in visit_type_BlockdevRef ()
#6  0x55a016a2 in visit_type_BlockdevOptionsGenericFormat_members ()
#7  0x55a01903 in visit_type_BlockdevOptionsGenericCOWFormat_members ()
#8  0x55a01a53 in visit_type_BlockdevOptionsQcow2_members ()
#9  0x55a010d5 in visit_type_BlockdevOptions_members ()
#10 0x55a0116f in visit_type_BlockdevOptions ()
#11 0x55a077a2 in visit_type_q_obj_blockdev_add_arg_members ()
#12 0x5580691b in qmp_marshal_blockdev_add ()
#13 0x55721460 in handle_qmp_command ()
#14 

Re: [Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 06:25:07PM +0300, Denis V. Lunev wrote:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example with the guest with the following loop
> inside
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> running above slow storage could not complete the operation with a
> resonable amount of time:
> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (making dirty bitmap dirty) or switch
> to the sycnchronous scheme.
> 
> This series switches driver mirror to synch scheme. Actually we can have
> something more intelligent and switch to sync mirroring just after
> the first pass over the bitmap. Though this could be done relatively
> easily during discussion. The most difficult things are here.
> 
> The set also adds some performance improvements dealing with
> known-to-be-zero areas.

Patches 1-5 look pretty good to me.

I think the synchronous I/O mirroring approach will need more discussion
and it could be split into a separate series so the earlier patches can
be merged already.

Stefan




Re: [Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 06:25:15PM +0300, Denis V. Lunev wrote:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example with the guest with the following loop
> inside
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> running above slow storage could not complete the operation with a
> resonable amount of time:
> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (making dirty bitmap dirty) or switch
> to the sycnchronous scheme.
> 
> This patch implements the latter. It adds mirror_before_write_notify
> callback. In this case all data written from the guest is synchnonously
> written to the mirror target. Though the problem is solved partially.
> We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be
> done in the next patch.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 78 
> ++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 7471211..086256c 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -58,6 +58,9 @@ typedef struct MirrorBlockJob {
>  QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
>  int buf_free_count;
>  
> +NotifierWithReturn before_write;
> +CoQueue dependent_writes;
> +
>  unsigned long *in_flight_bitmap;
>  int in_flight;
>  int sectors_in_flight;
> @@ -125,6 +128,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  g_free(op->buf);

qemu_vfree() must be used for qemu_blockalign() memory.

>  g_free(op);
>  
> +qemu_co_queue_restart_all(&s->dependent_writes);
>  if (s->waiting_for_io) {
>  qemu_coroutine_enter(s->common.co, NULL);
>  }
> @@ -511,6 +515,74 @@ static void mirror_exit(BlockJob *job, void *opaque)
>  bdrv_unref(src);
>  }
>  
> +static int coroutine_fn mirror_before_write_notify(
> +NotifierWithReturn *notifier, void *opaque)
> +{
> +MirrorBlockJob *s = container_of(notifier, MirrorBlockJob, before_write);
> +BdrvTrackedRequest *req = opaque;
> +MirrorOp *op;
> +int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
> +int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
> +int64_t end_sector = sector_num + nb_sectors;
> +int64_t aligned_start, aligned_end;
> +
> +if (req->type != BDRV_TRACKED_DISCARD && req->type != 
> BDRV_TRACKED_WRITE) {
> +/* this is not discard and write, we do not care */
> +return 0;
> +}
> +
> +while (1) {
> +bool waited = false;
> +int64_t sn;
> +
> +for (sn = sector_num; sn < end_sector; sn += sectors_per_chunk) {
> +int64_t chunk = sn / sectors_per_chunk;
> +if (test_bit(chunk, s->in_flight_bitmap)) {
> +trace_mirror_yield_in_flight(s, chunk, s->in_flight);
> +qemu_co_queue_wait(&s->dependent_writes);
> +waited = true;
> +}
> +}
> +
> +if (!waited) {
> +break;
> +}
> +}
> +
> +aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk);
> +aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, 
> sectors_per_chunk);
> +if (aligned_end > aligned_start) {
> +bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start,
> +aligned_end - aligned_start);
> +}
> +
> +if (req->type == BDRV_TRACKED_DISCARD) {
> +mirror_do_zero_or_discard(s, sector_num, nb_sectors, true);
> +return 0;
> +}
> +
> +s->in_flight++;
> +s->sectors_in_flight += nb_sectors;
> +
> +/* Allocate a MirrorOp that is used as an AIO callback.  */
> +op = g_new(MirrorOp, 1);
> +op->s = s;
> +op->sector_num = sector_num;
> +op->nb_sectors = nb_sectors;
> +op->buf = qemu_try_blockalign(blk_bs(s->target), req->qiov->size);
> +if (op->buf == NULL) {
> +g_free(op);
> +return -ENOMEM;
> +}
> +qemu_iovec_init(&op->qiov, req->qiov->niov);
> +qemu_iovec_clone(&op->qiov, req->qiov, op->buf);

Now op->qiov's iovec[] array is equivalent to req->qiov but points to
op->buf.  But you never copied the data from req->qiov to op->buf so
junk gets written to the target!

> +
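For what it's worth, one possible way to add the missing copy (a sketch
only; whether an eager copy is the right design here is a separate question)
is to fill the bounce buffer right after cloning the iovec:

    /* Sketch: op->qiov now points into op->buf, so the guest data has to be
     * copied there before the write to the target is issued. */
    qemu_iovec_to_buf(req->qiov, 0, op->buf, req->qiov->size);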

Re: [Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 12:06 PM, Kevin Wolf wrote:

Am 14.06.2016 um 17:25 hat Denis V. Lunev geschrieben:

Block commit of the active image to the backing store on a slow disk
could never end. For example with the guest with the following loop
inside
 while true; do
 dd bs=1k count=1 if=/dev/zero of=x
 done
running above slow storage could not complete the operation with a
resonable amount of time:
 virsh blockcommit rhel7 sda --active --shallow
 virsh qemu-monitor-event
 virsh qemu-monitor-command rhel7 \
 '{"execute":"block-job-complete",\
   "arguments":{"device":"drive-scsi0-0-0-0"} }'
 virsh qemu-monitor-event
Completion event is never received.

This problem could not be fixed easily with the current architecture. We
should either prohibit guest writes (making dirty bitmap dirty) or switch
to the sycnchronous scheme.

This series switches driver mirror to synch scheme. Actually we can have
something more intelligent and switch to sync mirroring just after
the first pass over the bitmap. Though this could be done relatively
easily during discussion. The most difficult things are here.

The set also adds some performance improvements dealing with
known-to-be-zero areas.

I only read the cover letter and had a quick look at the patch doing the
actual switch, so this is by far not a real review, but I have a few
general comments anyway:


First of all, let's make sure we're all using the same terminology. In
past discussions about mirror modes, we distinguished active/passive and
synchronous/asynchronous.

* An active mirror mirrors requests immediately when they are made by
   the guest. A passive mirror just remembers that it needs to mirror
   something and does it whenever it wants.

* A synchronous mirror completes the guest request only after the data
   has successfully been written to both the live image and the target.
   An asynchronous one can complete the guest request before the mirror
   I/O has completed.

In these terms, the currently implemented mirror is a passive
asynchronous one. If I understand correctly, what you are doing in this
series is to convert it unconditionally to an active asynchronous one.


The "unconditionally" part is my first complaint: The active mirror does
potentially a lot more I/O, so it's not clear that you want to use it.
This should be the user's choice. (We always intended to add an active
mirror sooner or later, but so far nobody needed it desperately enough.)

this sounds reasonable




The second big thing is that I don't want to see new users of the
notifiers in I/O functions. Let's try if we can't add a filter
BlockDriver instead. Then we'd add an option to set the filter node-name
in the mirror QMP command so that the management tool is aware of the
node and can refer to it.

this will be much more difficult to implement in my experience. Can you
share more details why filters are bad?


If we don't do this now, we'll have to introduce it later and can't be
sure that the management tool knows about it. This would complicate
things quite a bit because we would have to make sure that the added
node stays invisible to the management tool.


I think these two things are the big architectural questions. The rest
is hopefully more or less implementation details.

I completely agree with you.

We have the following choices:
1. implement parameter to use 'active'/'passive' mode from the very 
beginning
2. switch to 'active' mode upon receiving "block-job-complete" command 
unconditionally
3. switch to 'active' mode upon receiving "block-job-complete" command 
with proper parameter
4. switch to 'active' mode after timeout (I personally do not like this 
option)


I think that choices 1 and 3 do not contradict each other and
could be implemented together.

Den



Re: [Qemu-block] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:
> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> > We should not take into account zero blocks for delay calculations.
> > They are not read and thus IO throttling is not required. In the
> > other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
> > days.
> > 
> > Signed-off-by: Denis V. Lunev 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > CC: Stefan Hajnoczi 
> > CC: Fam Zheng 
> > CC: Kevin Wolf 
> > CC: Max Reitz 
> > CC: Jeff Cody 
> > CC: Eric Blake 
> > ---
> >  block/mirror.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> Seems reasonable, but I'll let others more familiar with throttling give
> the final say.

There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
that case we need to account for the bytes transferred.  I don't see
where the patch takes this into account.
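A rough sketch of what such accounting could look like; the field and helper
names only approximate the mirror code (is_zero_chunk is invented for the
example, and the exact name of the write-zeroes callback depends on the
rebase Eric mentioned):

    /* Sketch: skip throttling only when the target can really zero without
     * transferring data; otherwise account for the bytes as usual. */
    if (is_zero_chunk && target_bs->drv->bdrv_co_pwrite_zeroes) {
        delay_ns = 0;   /* nothing is read or sent over the wire */
    } else {
        delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors);
    }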




Re: [Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 06:25:13PM +0300, Denis V. Lunev wrote:
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/io.c| 12 +++-
>  include/block/block_int.h |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)

I read Eric's reply and realized the duplication with my earlier reply.
Feel free to ignore mine and just respond to Eric.




Re: [Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 06:25:13PM +0300, Denis V. Lunev wrote:
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/io.c| 12 +++-
>  include/block/block_int.h |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)

This patch is missing a commit description.  Why is this necessary?

If you add .qiov then can we remove the .bytes field?




Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Paolo Bonzini


- Original Message -
> From: "Alex Bligh" 
> To: "Wouter Verhelst" 
> Cc: "Alex Bligh" , nbd-gene...@lists.sourceforge.net, 
> "Paolo Bonzini" ,
> qemu-de...@nongnu.org, "qemu block" 
> Sent: Wednesday, June 15, 2016 10:52:21 AM
> Subject: Re: [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus 
> commands
> 
> 
> > On 15 Jun 2016, at 09:03, Wouter Verhelst  wrote:
> > 
> > On Wed, Jun 15, 2016 at 09:05:22AM +0200, Wouter Verhelst wrote:
> >> There are more clients than the Linux and qemu ones, but I think it's
> >> fair to say that those two are the most important ones. If they agree
> >> that a read reply which errors should come without payload, then I think
> >> we should update the standard to say that, too.
> > 
> > I've just pushed a commit that changes the spec (and the implementation)
> > so that if a server encounters a read error, it does not send a payload.
> > 
> > In other words, the current behaviour of qemu is correct, is now
> > documented to be correct, and should not be changed.
> 
> So what should those servers do (like 2 of mine) which don't buffer
> the entire read, if they get an error having already sent some data?

They have sent an error code of zero, and it turned out to be wrong.  So
the only thing they can do safely is disconnect.

Paolo



Re: [Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-15 Thread Kevin Wolf
Am 14.06.2016 um 17:25 hat Denis V. Lunev geschrieben:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example with the guest with the following loop
> inside
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> running above slow storage could not complete the operation with a
> resonable amount of time:
> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (making dirty bitmap dirty) or switch
> to the sycnchronous scheme.
> 
> This series switches driver mirror to synch scheme. Actually we can have
> something more intelligent and switch to sync mirroring just after
> the first pass over the bitmap. Though this could be done relatively
> easily during discussion. The most difficult things are here.
> 
> The set also adds some performance improvements dealing with
> known-to-be-zero areas.

I only read the cover letter and had a quick look at the patch doing the
actual switch, so this is by far not a real review, but I have a few
general comments anyway:


First of all, let's make sure we're all using the same terminology. In
past discussions about mirror modes, we distinguished active/passive and
synchronous/asynchronous.

* An active mirror mirrors requests immediately when they are made by
  the guest. A passive mirror just remembers that it needs to mirror
  something and does it whenever it wants.

* A synchronous mirror completes the guest request only after the data
   has successfully been written to both the live image and the target.
  An asynchronous one can complete the guest request before the mirror
  I/O has completed.

In these terms, the currently implemented mirror is a passive
asynchronous one. If I understand correctly, what you are doing in this
series is to convert it unconditionally to an active asynchronous one.


The "unconditionally" part is my first complaint: The active mirror does
potentially a lot more I/O, so it's not clear that you want to use it.
This should be the user's choice. (We always intended to add an active
mirror sooner or later, but so far nobody needed it desperately enough.)
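If it does become the user's choice, one could imagine surfacing it as an
explicit mode on the mirror job, roughly along these lines (purely
illustrative names, not an existing QEMU interface):

    /* Sketch: an explicit copy mode for the mirror job. */
    typedef enum MirrorCopyMode {
        MIRROR_COPY_MODE_BACKGROUND,      /* passive: copy whenever convenient */
        MIRROR_COPY_MODE_WRITE_BLOCKING,  /* active: mirror guest writes as they happen */
    } MirrorCopyMode;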


The second big thing is that I don't want to see new users of the
notifiers in I/O functions. Let's try if we can't add a filter
BlockDriver instead. Then we'd add an option to set the filter node-name
in the mirror QMP command so that the management tool is aware of the
node and can refer to it.

If we don't do this now, we'll have to introduce it later and can't be
sure that the management tool knows about it. This would complicate
things quite a bit because we would have to make sure that the added
node stays invisible to the management tool.


I think these two things are the big architectural questions. The rest
is hopefully more or less implementation details.

Kevin



Re: [Qemu-block] [PATCH v2 0/6] block: Enable byte granularity I/O

2016-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2016 at 03:32:29PM +0200, Kevin Wolf wrote:
> Previous series have already converted some block drivers to byte-based rather
> than sector-based interfaces. However, the common I/O path as well as 
> raw-posix
> still enforced a minimum alignment of 512 bytes because some sector-based 
> logic
> was involved.
> 
> This patch series removes these limitations and a sub-sector request actually
> ends up as a sub-sector syscall now.
> 
> v2:
> - Updated trace-events for bdrv_co_do_copy_on_readv() [Eric]
> - Added some assertions [Eric]
> - Renamed laio_submit_co() -> laio_co_submit() and added coroutine_fn
>   to its prototype [Stefan]
> - linux-aio: Include block/block.h and use BDRV_SECTOR_SIZE instead
>   of 512 [Eric]
> 
> Kevin Wolf (6):
>   block: Byte-based bdrv_co_do_copy_on_readv()
>   block: Prepare bdrv_aligned_preadv() for byte-aligned requests
>   block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
>   raw-posix: Switch to bdrv_co_* interfaces
>   raw-posix: Implement .bdrv_co_preadv/pwritev
>   block: Don't enforce 512 byte minimum alignment
> 
>  block.c   |   2 +-
>  block/io.c| 128 
> ++
>  block/linux-aio.c |  84 -
>  block/mirror.c|  10 ++--
>  block/raw-aio.h   |   3 ++
>  block/raw-posix.c |  62 
>  include/block/block.h |  10 ++--
>  trace-events  |   2 +-
>  8 files changed, 176 insertions(+), 125 deletions(-)
> 
> -- 
> 1.8.3.1
> 

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 07:18 AM, Eric Blake wrote:

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

Block commit of the active image to the backing store on a slow disk
could never end. For example with the guest with the following loop
inside
 while true; do
 dd bs=1k count=1 if=/dev/zero of=x
 done
running above slow storage could not complete the operation with a

s/with/within/


resonable amount of time:

s/resonable/reasonable/


 virsh blockcommit rhel7 sda --active --shallow
 virsh qemu-monitor-event
 virsh qemu-monitor-command rhel7 \
 '{"execute":"block-job-complete",\
   "arguments":{"device":"drive-scsi0-0-0-0"} }'
 virsh qemu-monitor-event
Completion event is never received.

This problem could not be fixed easily with the current architecture. We
should either prohibit guest writes (making dirty bitmap dirty) or switch
to the sycnchronous scheme.

s/sycnchronous/synchronous/


This patch implements the latter. It adds mirror_before_write_notify
callback. In this case all data written from the guest is synchnonously

s/synchnonously/synchronously/


written to the mirror target. Though the problem is solved partially.
We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be
done in the next patch.


In other words, the mere act of mirroring a guest will now be
guest-visible in that the guest is auto-throttled while waiting for the
mirroring to be written out.  It seems like you would want to be able to
opt in or out of this scheme.  Is it something that can be toggled
mid-operation (try asynchronous, and switch to synchronous if a timeout
elapses)?


Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
  block/mirror.c | 78 ++
  1 file changed, 78 insertions(+)


I'll leave the actual idea to others to review, because there may be
some ramifications that I'm not thinking of.


I would like to start the discussion with this series.
Yes, maybe we need the policy and should switch
to the synchronous scheme after the first stage of mirroring
(when the 'complete' command is sent by the management
layer).

This could be done relatively easily on the basis of these
patches. Really. Though I want to obtain some general
acceptance in advance.

Den

P.S. Thank you very much for looking at this ;)



Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Alex Bligh

> On 15 Jun 2016, at 09:03, Wouter Verhelst  wrote:
> 
> On Wed, Jun 15, 2016 at 09:05:22AM +0200, Wouter Verhelst wrote:
>> There are more clients than the Linux and qemu ones, but I think it's
>> fair to say that those two are the most important ones. If they agree
>> that a read reply which errors should come without payload, then I think
>> we should update the standard to say that, too.
> 
> I've just pushed a commit that changes the spec (and the implementation)
> so that if a server encounters a read error, it does not send a payload.
> 
> In other words, the current behaviour of qemu is correct, is now
> documented to be correct, and should not be changed.

So what should those servers do (like 2 of mine) which don't buffer
the entire read, if they get an error having already sent some data?

--
Alex Bligh








Re: [Qemu-block] [PATCH 4/9] mirror: efficiently zero out target

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 06:00 AM, Eric Blake wrote:

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed
into the wire. Thus the target could be very efficiently zeroed out. This
should be done with the largest chunk possible.

This improves the performance of the live migration of the empty disk by
150 times if NBD supports write_zeroes.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
  block/mirror.c | 32 +++-
  1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index c7b3639..c2f8773 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -21,6 +21,7 @@
  #include "qemu/ratelimit.h"
  #include "qemu/bitmap.h"
  
+#define MIRROR_ZERO_CHUNK   (3u << (29 - BDRV_SECTOR_BITS))  /* 1.5 Gb */

Probably nicer to track this in bytes.  And do you really want a
hard-coded arbitrary limit, or is it better to live with
MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?

Unfortunately we should. INT_MAX is not aligned as required.
Maybe we should align INT_MAX properly to fulfill the
write_zeroes alignment.

Hmm, maybe we can align INT_MAX properly down. OK,
I'll try to do that gracefully.
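Something along these lines might work in place of the MIRROR_ZERO_CHUNK
line in the loop below (a sketch only; it assumes the byte-based
pwrite_zeroes fields from the BlockLimits discussion and guards against a
zero alignment):

    /* Sketch: clamp the zeroing chunk to the target's limit and align it
     * down to the write-zeroes alignment instead of a hard-coded 1.5 GB. */
    int max_bytes = MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX);
    if (target_bs->bl.pwrite_zeroes_alignment) {
        max_bytes = QEMU_ALIGN_DOWN(max_bytes,
                                    target_bs->bl.pwrite_zeroes_alignment);
    }
    nb_sectors = MIN(max_bytes >> BDRV_SECTOR_BITS, end - sector_num);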


@@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
  
  end = s->bdev_length / BDRV_SECTOR_SIZE;
  
-if (base == NULL && !bdrv_has_zero_init(target_bs)) {

+if (base == NULL && !bdrv_has_zero_init(target_bs) &&
+target_bs->drv->bdrv_co_write_zeroes == NULL) {

Indentation is off, although if checkpatch.pl doesn't complain I guess
it doesn't matter that much.

Why should you care whether the target_bs->drv implements a callback?
Can't you just rely on the normal bdrv_*() functions to do the dirty
work of picking the most efficient implementation without you having to
bypass the block layer?  In fact, isn't that the whole goal of
bdrv_make_zero() - why not call that instead of reimplementing it?
this is the idea of the patch actually. If the callback is not
implemented, we will have zeroes actually written or sent to the wire.
In this case there is not much sense in doing that: the amount of data
actually written will be significantly increased (some areas will be
written twice - with zeroes and with the actual data).

On the other hand, if the callback is implemented, we will have a very
small amount of data on the wire and actually written, and thus will
have a benefit. I am trying to avoid very small chunks of data. Here
(during the migration process) the data is sent in 10 Mb chunks, which
takes a LOT of time with NBD. We can send chunks of 1.5 Gb (currently).
They occupy the same 26 bytes of data on the transport layer.


Patch needs rebasing - we've redone this into bdrv_co_pwrite_zeroes and
a byte interface, since upstream commit c1499a5e.

sure!


  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
  return 0;
  }
@@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s)
  }
  sector_num += n;
  }
+
+if (base != NULL || bdrv_has_zero_init(target_bs)) {

You're now repeating the conditional that used to be 'bool
mark_all_dirty' (well, this is !mark_all_dirty); is it worth keeping the
simpler bool around?

not quite. The difference is in the presence of the callback,
but sure I can cache it. no prob.


+/* no need to zero out entire disk */
+return 0;
+}
+
+for (sector_num = 0; sector_num < end; ) {
+int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num);

Why limit yourself to 1.5G? It's either too small for what you can
really do, or too large for what the device permits.  See my above
comment about MIN_NON_ZERO.

alignment, covered above


+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+if (now - last_pause_ns > SLICE_TIME) {
+last_pause_ns = now;
> +block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
+}
+
> +if (block_job_is_cancelled(&s->common)) {
+return -EINTR;
+}
+
+if (s->in_flight == MAX_IN_FLIGHT) {
+trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
+mirror_wait_for_io(s);
+continue;
+}

Hmm - I guess your mirror yield points are why you couldn't just
directly use bdrv_make_zero(); but is that something where some code
refactoring can share more code rather than duplicating it?

the purpose is to put several requests onto the wire in parallel.
The original mirror code does this nicely and thus is reused.


+
+mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+sector_num += nb_sectors;
+}
  return 0;
  

Re: [Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit

2016-06-15 Thread Denis V. Lunev

On 06/15/2016 05:36 AM, Eric Blake wrote:

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

There is no need to scan allocation tables if we have mark_all_dirty flag
set. Just mark it all dirty.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
  block/mirror.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 797659d..c7b3639 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s)
  BlockDriverState *base = s->base;
  BlockDriverState *bs = blk_bs(s->common.blk);
  BlockDriverState *target_bs = blk_bs(s->target);
-bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs);
  uint64_t last_pause_ns;
  int ret, n;
  
  end = s->bdev_length / BDRV_SECTOR_SIZE;
  
+if (base == NULL && !bdrv_has_zero_init(target_bs)) {

+bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);

Won't work as written.  'end' is 64 bits, but bdrv_set_dirty_bitmap() is
limited to a 32-bit sector count.  Might be first worth updating
bdrv_set_dirty_bitmap() and friends to be byte-based rather than
sector-based (but still tracking a sector per bit, obviously), as well
as expand it to operate on 64-bit sizes rather than 32-bit.

very nice catch! thank you


I'm also worried slightly that the existing code repeated things in a
loop, and therefore had pause points every iteration and could thus
remain responsive to an early cancel.  But doing the entire operation in
one chunk (assuming you fix bitmap code to handle a 64-bit size) may end
up running for so long without interruption that you lose the benefits
of an early interruption that you have by virtue of a 32-bit limit.


I do not think this is something to worry about, actually. We just
perform a memset over a not that big area (a 1 Tb disk will have a 2 Mb
dirty bitmap) under default parameters.
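For reference, a chunked variant would sidestep the 32-bit sector-count
limit while keeping natural pause points, roughly like this (a sketch
against the sector-based bitmap API of the time; the exact prototype may
differ):

    /* Sketch: mark the whole device dirty in bounded chunks rather than in
     * one call, so the 32-bit nr_sectors argument is never exceeded. */
    for (int64_t sector_num = 0; sector_num < end; ) {
        int nb_sectors = MIN(end - sector_num, INT_MAX);
        bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
        sector_num += nb_sectors;
    }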



Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Wouter Verhelst
On Wed, Jun 15, 2016 at 09:05:22AM +0200, Wouter Verhelst wrote:
> There are more clients than the Linux and qemu ones, but I think it's
> fair to say that those two are the most important ones. If they agree
> that a read reply which errors should come without payload, then I think
> we should update the standard to say that, too.

I've just pushed a commit that changes the spec (and the implementation)
so that if a server encounters a read error, it does not send a payload.

In other words, the current behaviour of qemu is correct, is now
documented to be correct, and should not be changed.

Regards,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12




Re: [Qemu-block] [PATCH v2] block: drop support for using qcow[2] encryption with system emulators

2016-06-15 Thread Kevin Wolf
Am 13.06.2016 um 17:36 hat Kevin Wolf geschrieben:
> Am 13.06.2016 um 13:30 hat Daniel P. Berrange geschrieben:
> > So rather than fix the crash, and backport it to stable
> > releases, just go ahead with what we have warned users about
> > and disable any use of qcow2 encryption in the system
> > emulators. qemu-img/qemu-io/qemu-nbd are still able to access
> > qcow2 encrypted images for the sake of data conversion.
> > 
> > In the future, qcow2 will gain support for the alternative
> > luks format, but when this happens it'll be using the
> > '-object secret' infrastructure for gettings keys, which
> > avoids this problematic scenario entirely.
> > 
> > Signed-off-by: Daniel P. Berrange 
> 
> Apart from the commit message comments above and from Eric, this looks
> good to me. I'll give a little more time for review before merging this.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH] m25p80: fix test on blk_pread() return value

2016-06-15 Thread Kevin Wolf
Am 14.06.2016 um 18:02 hat Cédric Le Goater geschrieben:
> On 06/14/2016 10:38 AM, Kevin Wolf wrote:
> > Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben:
>  #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
>  req=, offset=30878208, 
>  bytes=512, qiov=0x7fa7f47fee60, flags=0)
>  at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
>  #5  0x7fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, 
>  bytes=512, qiov=0x7fa80d5191c0, 
>  flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | 
>  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 
>  4278124256), flags@entry=(unknown: 0))
>  at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492
> >>>
> >>> That 'flags' value looks bogus...
> >>>
>  #6  0x7fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, 
>  offset=30878208, bytes=256, qiov=0x7fa80d5191c0, 
>  flags=(unknown: 0)) at 
>  /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788
>  #7  0x7fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0)
>  at 
>  /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977
>  #8  0x7fa81c6c823a in coroutine_trampoline (i0=, 
>  i1=)
>  at 
>  /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78
>  #9  0x7fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> >>>
> >>> and we don't get anything further in the backtrace beyond coroutines, to
> >>> see who's sending the bad parameters.  I recently debugged a bogus flags
> >>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are
> >>> used in blk_aio_prwv():
> >>>
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html
> >>>
> >>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no
> >>> longer kept the assert at that point.  But maybe the correct fix, and/or
> >>> the hack for catching the bug prior to coroutines, will help you debug
> >>> where the bad arguments are coming from.
> >>
> >> That does not fix the assert.
> >>  
>  #10 0x7fa80d5189d0 in ?? ()
>  #11 0x in ?? ()
>  (gdb) up 4
>  #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
>  req=, offset=30878208, 
>  bytes=512, qiov=0x7fa7f47fee60, flags=0)
>  at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
>  1243 assert(!qiov || bytes == qiov->size);
>  (gdb) p *qiov 
>  $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256}
> >>
> >> So, it seems that the issue is coming from the fact that bdrv_co_pwritev()
> >> does not handle alignments less than BDRV_SECTOR_SIZE :
> >>
> >>/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
> >>uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
> >>
> >> It calls bdrv_aligned_pwritev() which does the assert : 
> >>
> >>assert(!qiov || bytes == qiov->size);
> > 
> > Yes, but between these two places, there is code that should actually
> > enforce the right alignment:
> > 
> > if ((offset + bytes) & (align - 1)) {
> > ...
> > }
> > 
> > You can see in your backtrace that bdrv_aligned_pwritev() gets a
> > different qiov than bdrv_co_pwritev() (which is local_qiov in the latter
> > function).
> > 
> > It's just unclear to me why this code extended bytes, but didn't add the
> > tail_buf iovec to local_qiov.
> 
> The gdb backtrace is bogus. It does not make sense. Maybe a gdb issue
> with multithreading on jessie.
> 
> In the path tracking the tail bytes, we have : 
> 
>  if ((offset + bytes) & (align - 1)) {
>   ...
  if (!use_local_qiov) {
  qemu_iovec_init(&local_qiov, qiov->niov + 1);
  qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
  use_local_qiov = true;
  }
> tail_bytes = (offset + bytes) & (align - 1);
> qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - tail_bytes);
> 
> bytes = ROUND_UP(bytes, align);
> }
> 
> This is where the issue is I think. The qiov holds 256 and bytes 512.
> 
> I have no idea how to fix that though.

Added some more context above. qiov->size as passed from the device is
already 256 bytes, which are added to local_qiov with
qemu_iovec_concat(). And then we add another 256 from tail_buf in the
lines that you quoted, so in theory we should end up with a properly
aligned 256 + 256 = 512 byte qiov.
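Spelling the numbers out for this particular request (just restating the
arithmetic, as an illustration):

    /* offset = 30878208 (512-aligned), bytes = 256, align = 512            */
    /* head:  offset & (align - 1) == 0             -> no head padding      */
    /* tail: (offset + bytes) & (align - 1) == 256  -> tail_bytes = 256     */
    /* local_qiov = 256 guest bytes + (align - tail_bytes) = 256 + 256      */
    /* bytes = ROUND_UP(256, 512) = 512, matching local_qiov.size           */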

Kevin



Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray

2016-06-15 Thread Kevin Wolf
Am 14.06.2016 um 18:13 hat Max Reitz geschrieben:
> On 14.06.2016 17:54, John Snow wrote:
> > 
> > 
> > On 06/14/2016 09:19 AM, Max Reitz wrote:
> >> On 10.06.2016 23:59, John Snow wrote:
> >>> If a device still has an attached BDS because the medium has not yet
> >>> been removed, we will be unable to migrate to a new host because
> >>> blk_flush will return an error for that backend.
> >>>
> >>> Replace the call to blk_is_available to blk_is_inserted to weaken
> >>> the check and allow flushes from the backend to work, while still
> >>> disallowing flushes from the frontend/device model to work.
> >>>
> >>> This fixes a regression present in 2.6.0 caused by the following commit:
> >>> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> >>> block: Move some bdrv_*_all() functions to BB
> >>>
> >>> Signed-off-by: John Snow 
> >>> ---
> >>>  block/block-backend.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I
> >> guess you exclude them here because you specifically want to fix the
> >> issue mentioned in the commit message, but then we could just make
> >> blk_flush_all() ignore an -ENOMEDIUM.
> > 
> > Yeah, I didn't investigate the full path. Just making the minimal fixes.
> > Is there a concern that this may still leave certain pathways broken
> > when the CDROM tray is open?
> > 
> > I don't know of any immediately without digging again.
> > 
> >>
> >> I personally think we should make all blk_*flush() functions use
> >> blk_is_inserted() instead of blk_is_available(). As we have discussed on
> >> IRC, there are probably not that many cases a guest can flush a medium
> >> in an open tray anyway (because the main use case are read-only
> >> CD-ROMs), and even if so, that wouldn't change any data, so even if the
> >> guest can actually flush something on an open tray, I don't think anyone
> >> would complain.
> >>
> >> Max
> >>
> > 
> > I have difficulty making pragmatic arguments when purity is at stake,
> > but I've already wandered outside of my device model, so I will defer to
> > your judgment.
> > 
> >>> diff --git a/block/block-backend.c b/block/block-backend.c
> >>> index 34500e6..d1e875e 100644
> >>> --- a/block/block-backend.c
> >>> +++ b/block/block-backend.c
> >>> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk)
> >>>  
> >>>  int blk_flush(BlockBackend *blk)
> >>>  {
> >>> -if (!blk_is_available(blk)) {
> >>> +if (!blk_is_inserted(blk)) {
> >>>  return -ENOMEDIUM;
> >>>  }
> >>>  
> >>>
> >>
> >>
> > 
> > Is this a NACK unless I attempt to address the wider design issue?
> 
> I just don't see a point in using blk_is_inserted() here but
> blk_is_available() in the other blk_*flush() functions. If
> blk_is_inserted() is correct for blk_flush(), it should be correct for
> blk_co_flush() and blk_aio_flush(), too.

I agree, if we can, we should keep the behaviour consistent between all
interfaces types (sync/AIO/coroutine, byte-based/sector-based) for the
same operation.
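Concretely, keeping the variants consistent would just repeat the same
substitution in the coroutine (and AIO) paths, e.g. something like the
following; this is sketched from memory of the 2.6-era block-backend.c, so
treat the body as approximate:

    int blk_co_flush(BlockBackend *blk)
    {
        if (!blk_is_inserted(blk)) {   /* was blk_is_available() */
            return -ENOMEDIUM;
        }
        return bdrv_co_flush(blk_bs(blk));
    }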

Eric also rightfully said that we need test cases, so a v2 would be
good anyway.

Kevin




Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Wouter Verhelst
On Tue, Jun 14, 2016 at 04:02:15PM +0100, Alex Bligh wrote:
> 
> On 14 Jun 2016, at 14:32, Paolo Bonzini  wrote:
> 
> > 
> > On 13/06/2016 23:41, Alex Bligh wrote:
> >> That's one of the reasons that there is a proposal to add
> >> STRUCTURED_READ to the spec (although I still haven't had time to
> >> implement that for qemu), so that we have a newer approach that allows
> >> for proper error handling without ambiguity on whether bogus bytes must
> >> be sent on a failed read.  But you'd have to convince me that ALL
> >> existing NBD server and client implementations expect to handle a read
> >> error without read payload, otherwise, I will stick with the notion that
> >> the current spec wording is correct, and that read errors CANNOT be
> >> gracefully recovered from unless BOTH sides transfer (possibly bogus)
> >> bytes along with the error message, and which is why BOTH sides of the
> >> protocol are warned that read errors usually result in a disconnection
> >> rather than clean continuation, without the addition of STRUCTURED_READ.
> > 
> > I suspect that there are exactly two client implementations,
> 
> My understanding is that there are more than 2 client implementations.
> A quick google found at least one BSD client. I bet read error handling
> is a mess in all of them.
> 
> > namely
> > Linux and QEMU's, and both do the right thing.
> 
> This depends what you mean by 'right'. Both appear to be non-compliant
> with the standard.
> 
> Note the standard is not defined by the client implementation, but
> by the protocol document.

No, it isn't. At least not yet.

The standard document has only become formal a few months ago. It's
certainly possible that we made a mistake formalizing things, and if so,
we should fix the document rather than saying "whatever you've been
doing these years, even though it worked, is wrong".

There are more clients than the Linux and qemu ones, but I think it's
fair to say that those two are the most important ones. If they agree
that a read reply which errors should come without payload, then I think
we should update the standard to say that, too.

> > What servers do doesn't matter, if all the clients agree.
> 
> The spec originally was not clear on how errors on reads should be
> handled, leading to any read causing a protocol drop. The spec is
> now clear. Unfortunately it is not possible to make a back compatible
> fix. Hence the real fix here is to implement structured replies,
> which is what Eric and I have been working on.

That much, at any rate, is true.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Wouter Verhelst
On Mon, Jun 13, 2016 at 10:41:05PM +0100, Alex Bligh wrote:
> For amusement value, the non-threaded handler (which is not used
> any more) does not send any payload on an error:
>  https://github.com/yoe/nbd/blob/master/nbd-server.c#L1734

nbd-server used to just drop the connection on read error.

> In essence read error handling is a horrible mess in NBD, and
> I would not expect it to work in general :-(

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12