[PATCH v1] block/raw-format: implement .bdrv_get_specific_info handler
When using the raw format, allow the underlying storage to expose
format-specific information. In particular, this enables RBD images
using the raw format to indicate a LUKS2-encrypted image in the output
of qemu-img info.

Signed-off-by: Or Ozeri
---
 block/raw-format.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 7717578ed6..f6e70e2356 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -369,6 +369,12 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     return bdrv_get_info(bs->file->bs, bdi);
 }
 
+static ImageInfoSpecific *raw_get_specific_info(BlockDriverState *bs,
+                                                Error **errp)
+{
+    return bdrv_get_specific_info(bs->file->bs, errp);
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     if (bs->probed) {
@@ -603,6 +609,7 @@ BlockDriver bdrv_raw = {
     .has_variable_length  = true,
     .bdrv_measure         = &raw_measure,
     .bdrv_get_info        = &raw_get_info,
+    .bdrv_get_specific_info = &raw_get_specific_info,
     .bdrv_refresh_limits  = &raw_refresh_limits,
     .bdrv_probe_blocksizes = &raw_probe_blocksizes,
     .bdrv_probe_geometry  = &raw_probe_geometry,
--
2.27.0
Re: [RFC PATCH v2 1/7] hw/misc: Add device to help managing aliased memory regions
On 4/22/21 3:33 AM, Richard Henderson wrote:
> On 4/19/21 2:43 AM, Philippe Mathieu-Daudé wrote:
>> Not really RFC, simply that I've to add the technical description,
>> but I'd like to know if there could be a possibility to not accept
>> this device (because I missed something) before keeping working on
>> it. So far it is only used in hobbyist boards.
>>
>> Cc: Peter Xu
>> Cc: Paolo Bonzini
>> ---
>>  include/hw/misc/aliased_region.h |  87
>>  hw/misc/aliased_region.c         | 132 +++
>>  MAINTAINERS                      |   6 ++
>>  hw/misc/Kconfig                  |   3 +
>>  hw/misc/meson.build              |   1 +
>>  5 files changed, 229 insertions(+)
>>  create mode 100644 include/hw/misc/aliased_region.h
>>  create mode 100644 hw/misc/aliased_region.c
>
> Looks reasonable to me.
>
>> +    subregion_bits = 64 - clz64(s->span_size - 1);
>> +    s->mem.count = s->region_size >> subregion_bits;
>> +    assert(s->mem.count > 1);
>> +    subregion_size = 1ULL << subregion_bits;
>
> So... subregion_size = pow2floor(s->span_size)?

No... it should be subregion_size = pow2ceil(s->span_size).

> Why use a floor-ish computation here and pow2ceil later in
> aliased_mr_realize?

I missed it :)

> Why use either floor or ceil as opposed to
> asserting that the size is in fact a power of 2?

Unfortunately not all memory regions are power of 2 :( See for example
the armv7m_systick device (size 0xe0).

> Why must the region be a power of 2, as opposed to e.g. a multiple of
> page_size?

I/O regions don't have to be multiple of page_size...
See AVR devices for example:

(qemu) info mtree
address-space: memory
  - (prio 0, i/o): system
    -7fff (prio 0, rom): flash
    0080-008000ff (prio -1234, i/o): I/O
      00800023-00800025 (prio -1000, i/o): atmega-gpio-b
      00800026-00800028 (prio -1000, i/o): atmega-gpio-c
      00800029-0080002b (prio -1000, i/o): atmega-gpio-d
      00800035-00800035 (prio -1000, i/o): avr-timer8-intflag
      00800036-00800036 (prio 0, i/o): avr-timer16-intflag
      00800037-00800037 (prio -1000, i/o): avr-timer8-intflag
      0080003f-00800041 (prio -1000, i/o): avr-eeprom
      00800044-00800048 (prio -1000, i/o): avr-timer8
      0080004c-0080004e (prio -1000, i/o): avr-spi
      00800060-00800060 (prio -1000, i/o): avr-watchdog
      00800064-00800064 (prio 0, i/o): avr-power
      0080006e-0080006e (prio -1000, i/o): avr-timer8-intmask
      0080006f-0080006f (prio 0, i/o): avr-timer16-intmask
      00800070-00800070 (prio -1000, i/o): avr-timer8-intmask
      00800074-00800075 (prio -1000, i/o): avr-ext-mem-ctrl
      00800078-0080007f (prio -1000, i/o): avr-adc
      00800080-0080008d (prio 0, i/o): avr-timer16
      008000b0-008000b4 (prio -1000, i/o): avr-timer8
      008000b8-008000bd (prio -1000, i/o): avr-twi
      008000c0-008000c6 (prio 0, i/o): avr-usart
    00800100-008008ff (prio 0, ram): sram

> Or just the most basic requirement that region_size be a multiple of
> span_size?

OK. Thanks for the review! Now I need to fill the documentation part :/

Phil.
Re: [PATCH] hw/sd: sdhci: Enable 64-bit system bus capability in the default SD/MMC host controller
Awesome! Thank you, Philippe and Bin!

On Mon, Jul 5, 2021 at 2:54 AM Philippe Mathieu-Daudé wrote:
> On 6/23/21 8:59 PM, Joanne Koong wrote:
>> The default SD/MMC host controller uses SD spec v2.00. 64-bit system
>> bus capability was added in v2.
>>
>> In this change, we arrive at 0x157834b4 by computing
>> (0x057834b4 | (1ul << 28)), where 28 represents the BUS64BIT
>> SDHC_CAPAB field.
>>
>> Signed-off-by: Joanne Koong
>> ---
>>  hw/sd/sdhci-internal.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Thanks, series applied to sdmmc-next tree.
[PATCH] block/file-posix: Use O_RDWR for locking on macOS
qemu_lock_fd_test always returns 0 when fd is not open for writing and
exclusive is true on macOS 11.3.1.

Signed-off-by: Akihiko Odaki
---
 block/file-posix.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index b3fbb9bd63a..017fbc6b055 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -130,6 +130,14 @@
 #define RAW_LOCK_PERM_BASE 100
 #define RAW_LOCK_SHARED_BASE 200
 
+/*
+ * qemu_lock_fd_test always returns 0 when fd is not open for writing and
+ * exclusive is true on macOS 11.3.1.
+ */
+#ifdef __APPLE__
+#define RAW_LOCK_WRITES
+#endif
+
 typedef struct BDRVRawState {
     int fd;
     bool use_lock;
@@ -638,7 +646,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
                            false);
 
     s->open_flags = open_flags;
+#ifdef RAW_LOCK_WRITES
+    raw_parse_flags(bdrv_flags, &s->open_flags, s->use_lock);
+#else
     raw_parse_flags(bdrv_flags, &s->open_flags, false);
+#endif
 
     s->fd = -1;
     fd = qemu_open(filename, s->open_flags, errp);
@@ -1004,6 +1016,11 @@ static int raw_reconfigure_getfd(BlockDriverState *bs, int flags,
     bool has_writers = perm &
         (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED | BLK_PERM_RESIZE);
     int fcntl_flags = O_APPEND | O_NONBLOCK;
+#ifdef RAW_LOCK_WRITES
+    if (s->use_lock) {
+        has_writers = true;
+    }
+#endif
 #ifdef O_NOATIME
     fcntl_flags |= O_NOATIME;
 #endif
--
2.30.1 (Apple Git-130)
Re: [PATCH v2 0/6] block: block-status cache for data regions
Am 23.06.2021 um 17:01 hat Max Reitz geschrieben:
> Hi,
>
> See the cover letter from v1 for the general idea:
>
> https://lists.nongnu.org/archive/html/qemu-block/2021-06/msg00843.html
>
> The biggest change here in v2 is that instead of having a common CoMutex
> protect the block-status cache, we’re using RCU now. So to read from
> the cache, or even to invalidate it, no lock is needed, only to update
> it with new data.
>
> Disclaimer: I have no experience with RCU in practice so far, neither in
> qemu nor anywhere else. So I hope I’ve used it correctly...

This I hope as well. :-)

With the missing qatomic_rcu_read() added:

Reviewed-by: Kevin Wolf
Re: [PATCH v4 01/34] modules: add modinfo macros
On 24/06/21 22:37, Eduardo Habkost wrote:
> On Thu, Jun 24, 2021 at 12:38:03PM +0200, Gerd Hoffmann wrote:
>> Add macros for module info annotations. Instead of having that module
>> meta-data stored in lists in util/module.c, place directly in the
>> module source code.
>>
>> [...]
>>
>> +/* module implements QOM type */
>> +#define module_obj(name) modinfo(obj, name)
>
> Can we make OBJECT_DEFINE_TYPE*() use this macro automatically?

Yeah, that's possible. I would do it as a separate patch though, because
Gerd is on vacation and he asked me to include it in a pull request
before soft freeze.

Thanks,

Paolo
Re: [PATCH v2 2/6] block: block-status cache for data regions
Am 23.06.2021 um 17:01 hat Max Reitz geschrieben:
> As we have attempted before
> (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
> "file-posix: Cache lseek result for data regions";
> https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
> "file-posix: Cache next hole"), this patch seeks to reduce the number of
> SEEK_DATA/HOLE operations the file-posix driver has to perform. The
> main difference is that this time it is implemented as part of the
> general block layer code.
>
> The problem we face is that on some filesystems or in some
> circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
> implementation is outside of qemu, there is little we can do about its
> performance.
>
> We have already introduced the want_zero parameter to
> bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
> unless we really want zero information; but sometimes we do want that
> information, because for files that consist largely of zero areas,
> special-casing those areas can give large performance boosts. So the
> real problem is with files that consist largely of data, so that
> inquiring the block status does not gain us much performance, but where
> such an inquiry itself takes a lot of time.
>
> To address this, we want to cache data regions. Most of the time, when
> bad performance is reported, it is in places where the image is iterated
> over from start to end (qemu-img convert or the mirror job), so a simple
> yet effective solution is to cache only the current data region.
>
> (Note that only caching data regions but not zero regions means that
> returning false information from the cache is not catastrophic: Treating
> zeroes as data is fine. While we try to invalidate the cache on zero
> writes and discards, such incongruences may still occur when there are
> other processes writing to the image.)
>
> We only use the cache for nodes without children (i.e. protocol nodes),
> because that is where the problem is: Drivers that rely on block-status
> implementations outside of qemu (e.g. SEEK_DATA/HOLE).
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
> Signed-off-by: Max Reitz

Since you indicated that you'll respin the patch, I'll add my minor
comments:

> @@ -2442,9 +2445,58 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>          aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>
>      if (bs->drv->bdrv_co_block_status) {
> +        bool from_cache = false;
> +
> +        /*
> +         * Use the block-status cache only for protocol nodes: Format
> +         * drivers are generally quick to inquire the status, but protocol
> +         * drivers often need to get information from outside of qemu, so
> +         * we do not have control over the actual implementation. There
> +         * have been cases where inquiring the status took an unreasonably
> +         * long time, and we can do nothing in qemu to fix it.
> +         * This is especially problematic for images with large data areas,
> +         * because finding the few holes in them and giving them special
> +         * treatment does not gain much performance. Therefore, we try to
> +         * cache the last-identified data region.
> +         *
> +         * Second, limiting ourselves to protocol nodes allows us to assume
> +         * the block status for data regions to be DATA | OFFSET_VALID, and
> +         * that the host offset is the same as the guest offset.
> +         *
> +         * Note that it is possible that external writers zero parts of
> +         * the cached regions without the cache being invalidated, and so
> +         * we may report zeroes as data. This is not catastrophic,
> +         * however, because reporting zeroes as data is fine.
> +         */
> +        if (QLIST_EMPTY(&bs->children)) {
> +            if (bdrv_bsc_is_data(bs, aligned_offset, pnum)) {
> +                ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +                local_file = bs;
> +                local_map = aligned_offset;
> +
> +                from_cache = true;
> +            }
> +        }
> +
> +        if (!from_cache) {

Is having a separate variable from_cache really useful? This looks like
it could just be:

    if (QLIST_EMPTY() && bdrv_bsc_is_data()) {
        // The code above
    } else {
        // The code below
    }

> +            ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
> +                                                aligned_bytes, pnum, &local_map,
> +                                                &local_file);
> +
> +            /*
> +             * Note that checking QLIST_EMPTY(&bs->children) is also done when
> +             * t
[PATCH] block/replication.c: Properly attach children
The replication driver needs access to the children block-nodes of its
child so it can issue bdrv_make_empty to manage the replication.
However, it does this by directly copying the BdrvChilds, which is
wrong. Fix this by properly attaching the block-nodes with
bdrv_attach_child().

Also, remove a workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

Signed-off-by: Lukas Straub
---
-v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since
     bs->file might not be set yet. (Vladimir)

 block/replication.c | 94 +
 1 file changed, 61 insertions(+), 33 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 52163f2d1f..fd8cb728a3 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -166,7 +166,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -340,17 +345,7 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
         return;
     }
 
-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
@@ -365,27 +360,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
                                 Error **errp)
 {
     BDRVReplicationState *s = bs->opaque;
+    BdrvChild *hidden_disk, *secondary_disk;
     BlockReopenQueue *reopen_queue = NULL;
 
+    /*
+     * s->hidden_disk and s->secondary_disk may not be set yet, as they will
+     * only be set after the children are writable.
+     */
+    hidden_disk = bs->file->bs->backing;
+    secondary_disk = hidden_disk->bs->backing;
+
     if (writable) {
-        s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs);
-        s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs);
+        s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs);
+        s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs);
     }
 
-    bdrv_subtree_drained_begin(s->hidden_disk->bs);
-    bdrv_subtree_drained_begin(s->secondary_disk->bs);
+    bdrv_subtree_drained_begin(hidden_disk->bs);
+    bdrv_subtree_drained_begin(secondary_disk->bs);
 
     if (s->orig_hidden_read_only) {
         QDict *opts = qdict_new();
         qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable);
-        reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs,
+        reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs,
                                          opts, true);
     }
 
     if (s->orig_secondary_read_only) {
         QDict *opts = qdict_new();
         qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable);
-        reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs,
+        reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs,
                                          opts, true);
     }
 
@@ -393,8 +396,8 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
         bdrv_reopen_multiple(reopen_queue, errp);
     }
 
-    bdrv_subtree_drained_end(s->hidden_disk->bs);
-    bdrv_subtree_drained_end(s->secondary_disk->bs);
+    bdrv_subtree_drained_end(hidden_disk->bs);
+    bdrv_subtree_drained_end(secondary_disk->bs);
 }
 
 static void backup_job_cleanup(BlockDriverState *bs)
@@ -451,6 +454,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     BlockDriverState *bs = rs->opaque;
     BDRVReplicationState *s;
     BlockDriverState *top_bs;
+    BdrvChild *active_disk, *hidden_disk, *secondary_disk;
     int64_t active_length, hidden_length, disk_length;
     AioContext *aio_context;
     Error *local_err = NULL;
@@ -488,32 +492,32 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     case REPLICATION_MODE_PRIMARY:
         break;
     case REPLICATION_MODE_SECONDARY:
-        s->active_disk = bs->file;
-        if (!s->active_disk || !s->active_disk->bs ||
-            !s->active_disk->bs->backing) {
+        active_disk = bs->file;
+        if (!active_disk || !active_disk->bs ||
+            !active_disk->bs->backing) {
Re: [PATCH v5 6/6] block: Make blockdev-reopen stable API
06.07.2021 14:23, Kevin Wolf wrote:
> From: Alberto Garcia
>
> This patch drops the 'x-' prefix from x-blockdev-reopen.
>
> Signed-off-by: Alberto Garcia
> Signed-off-by: Kevin Wolf

Reviewed-by: Vladimir Sementsov-Ogievskiy

--
Best regards,
Vladimir
Re: [PATCH v5 5/6] iotests: Test reopening multiple devices at the same time
06.07.2021 14:23, Kevin Wolf wrote:
> From: Alberto Garcia
>
> This test swaps the images used by two active block devices. This is
> now possible thanks to the new ability to run x-blockdev-reopen on
> multiple devices at the same time.
>
> Signed-off-by: Alberto Garcia
> Signed-off-by: Kevin Wolf

Reviewed-by: Vladimir Sementsov-Ogievskiy

--
Best regards,
Vladimir
Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support
> Am 06.07.2021 um 17:25 schrieb Kevin Wolf:
>
> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>> Am 06.07.2021 um 15:19 schrieb Kevin Wolf:
>>>
>>> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>>>> This series migrates the qemu rbd driver from the old aio emulation
>>>> to native coroutines and adds write zeroes support which is
>>>> important for block operations.
>>>>
>>>> To achieve this we first bump the librbd requirement to the already
>>>> outdated luminous release of ceph to get rid of some wrappers and
>>>> ifdef'ry in the code.
>>>
>>> Thanks, applied to the block branch.
>>>
>>> I've only had a very quick look at the patches, but I think there is
>>> one suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
>>> indirection is probably unnecessary now because aio_co_wake() is
>>> thread safe.
>>
>> But this is new, isn't it?
>
> Not sure in what sense you mean. aio_co_wake() has always been thread
> safe, as far as I know.
>
> Obviously, the old code didn't use aio_co_wake(), but directly called
> some callbacks, so the part that is new is your coroutine conversion
> that enables getting rid of the BH.
>
>> We also have this indirection in iscsi and nfs drivers I think.
>
> Indeed, the resulting code looks the same. In iscsi and nfs it doesn't
> come from an incomplete conversion to coroutines, but they both used
> qemu_coroutine_enter() originally, which resumes the coroutine in the
> current thread...

If I remember correctly, this would also serialize requests and thus we
used BHs. libnfs and libiscsi are not thread safe as well and they
completely run in qemu's threads, so this wasn't the original reason.

Thanks for the hints to the relevant commit. I will send a follow-up
for rbd/nfs/iscsi in about 2 weeks.

Peter

>> Does it matter that the completion callback is called from an librbd
>> thread? Will the coroutine continue to run in the right thread?
>
> ...whereas aio_co_wake() resumes the coroutine in the thread where it
> was running before.
>
> (Before commit 5f50be9b5, this would have been buggy from an librbd
> thread, but now it should work correctly even for threads that are
> neither iothreads nor vcpu threads.)
>
>> I will have a decent look after my vacation.
>
> Sounds good, thanks. Enjoy your vacation!
>
> Kevin
Re: [PATCH v5 4/6] block: Support multiple reopening with x-blockdev-reopen
06.07.2021 14:23, Kevin Wolf wrote:
> From: Alberto Garcia
>
> [ kwolf: Fixed AioContext locking ]
>
> Signed-off-by: Alberto Garcia
> Signed-off-by: Kevin Wolf

Reviewed-by: Vladimir Sementsov-Ogievskiy

> ---
>  qapi/block-core.json       | 18 +++--
>  blockdev.c                 | 81 ++-
>  tests/qemu-iotests/155     |  9 ++-
>  tests/qemu-iotests/165     |  4 +-
>  tests/qemu-iotests/245     | 27 ---
>  tests/qemu-iotests/248     |  2 +-
>  tests/qemu-iotests/248.out |  2 +-
>  tests/qemu-iotests/296     |  9 ++-

[..]

>  ##
>  # @blockdev-del:
>
> diff --git a/blockdev.c b/blockdev.c
> index f657d090d3..1e8c946828 100644
> --- a/blockdev.c
> +++ b/blockdev.c

[..]

> -    bs = bdrv_find_node(options->node_name);
> -    if (!bs) {
> -        error_setg(errp, "Failed to find node with node-name='%s'",
> +    bs = bdrv_find_node(options->node_name);
> +    if (!bs) {
> +        error_setg(errp, "Failed to find node with node-name='%s'",
>                    options->node_name);

indentation broken

> -        goto fail;
> -    }
> +        goto fail;
> +    }
>
> -    /* Put all options in a QDict and flatten it */
> -    visit_type_BlockdevOptions(v, NULL, &options, &error_abort);
> -    visit_complete(v, &obj);
> -    qdict = qobject_to(QDict, obj);
> +    /* Put all options in a QDict and flatten it */
> +    v = qobject_output_visitor_new(&obj);
> +    visit_type_BlockdevOptions(v, NULL, &options, &error_abort);
> +    visit_complete(v, &obj);
> +    visit_free(v);

[..]

> +    # Run x-blockdev-reopen on a list of block devices
> +    def reopenMultiple(self, opts, errmsg = None):
> +        result = self.vm.qmp('x-blockdev-reopen', conv_keys = False, options = opts)

I don't really care if this doesn't break iotest 297, but PEP8 doesn't
like whitespaces around '=' for named arguments..

> +        if errmsg:
> +            self.assert_qmp(result, 'error/class', 'GenericError')
> +            self.assert_qmp(result, 'error/desc', errmsg)
> +        else:
> +            self.assert_qmp(result, 'return', {})

--
Best regards,
Vladimir
Re: [ovirt-users] Re: Any way to terminate stuck export task
On Tue, Jul 6, 2021 at 5:55 PM Gianluca Cecchi wrote:
>
> On Tue, Jul 6, 2021 at 2:52 PM Nir Soffer wrote:
>>
>> Too bad.
>>
>> You can evaluate how ovirt 4.4 will work with this appliance using
>> this dd command:
>>
>>     dd if=/dev/zero bs=8M count=38400 of=/path/to/new/disk \
>>         oflag=direct conv=fsync
>>
>> We don't use dd for this, but the operation is the same on NFS < 4.2.
>
> I confirm I'm able to saturate the 1Gb/s link. I tried creating a 10Gb
> file on the StoreOnce appliance:
>
> # time dd if=/dev/zero bs=8M count=1280 \
>     of=/rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/ansible_ova/test.img \
>     oflag=direct conv=fsync
> 1280+0 records in
> 1280+0 records out
> 10737418240 bytes (11 GB) copied, 98.0172 s, 110 MB/s
>
> real 1m38.035s
> user 0m0.003s
> sys 0m2.366s
>
> So are you saying that after upgrading to 4.4.6 (or just released
> 4.4.7) I should be able to export with this speed?

The preallocation part will run at the same speed, and then you need to
copy the used parts of the disk, time depending on how much data is
used.

> Or anyway I do need NFS v4.2?

Without NFS 4.2. With NFS 4.2 the entire allocation will take less than
a second without consuming any network bandwidth.

> BTW: is there any capping put in place by oVirt to the export phase
> (the qemu-img command in practice)? Designed for example not to
> disturb the activity of the hypervisor? Or do you think that if I have
> a 10Gb/s network backend and powerful disks on oVirt and powerful NFS
> server processing power I should have much more speed?

We don't have any capping in place; usually people complain that
copying images is too slow.

In general when copying to file-based storage we don't use the -W
option (unordered writes), so the copy will be slower compared with
block-based storage, where qemu-img uses 8 concurrent writes. So in a
way we always cap the copies to file-based storage.

To get maximum throughput you need to run multiple copies at the same
time.

Nir
Re: [PATCH v5 3/6] block: Acquire AioContexts during bdrv_reopen_multiple()
06.07.2021 14:23, Kevin Wolf wrote:
> As the BlockReopenQueue can contain nodes in multiple AioContexts, only
> one of which may be locked when AIO_WAIT_WHILE() can be called, we
> can't let the caller lock the right contexts. Instead, individually
> lock the AioContext of a single node when iterating the queue.
>
> Reintroduce bdrv_reopen() as a wrapper for reopening a single node that
> drains the node and temporarily drops the AioContext lock for
> bdrv_reopen_multiple().
>
> Signed-off-by: Kevin Wolf

I don't have the whole picture in mind. But looks clear:

Reviewed-by: Vladimir Sementsov-Ogievskiy

--
Best regards,
Vladimir
Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support
Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>> Am 06.07.2021 um 15:19 schrieb Kevin Wolf:
>>
>> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>>> This series migrates the qemu rbd driver from the old aio emulation
>>> to native coroutines and adds write zeroes support which is
>>> important for block operations.
>>>
>>> To achieve this we first bump the librbd requirement to the already
>>> outdated luminous release of ceph to get rid of some wrappers and
>>> ifdef'ry in the code.
>>
>> Thanks, applied to the block branch.
>>
>> I've only had a very quick look at the patches, but I think there is
>> one suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
>> indirection is probably unnecessary now because aio_co_wake() is
>> thread safe.
>
> But this is new, isn't it?

Not sure in what sense you mean. aio_co_wake() has always been thread
safe, as far as I know.

Obviously, the old code didn't use aio_co_wake(), but directly called
some callbacks, so the part that is new is your coroutine conversion
that enables getting rid of the BH.

> We also have this indirection in iscsi and nfs drivers I think.

Indeed, the resulting code looks the same. In iscsi and nfs it doesn't
come from an incomplete conversion to coroutines, but they both used
qemu_coroutine_enter() originally, which resumes the coroutine in the
current thread...

> Does it matter that the completion callback is called from an librbd
> thread? Will the coroutine continue to run in the right thread?

...whereas aio_co_wake() resumes the coroutine in the thread where it
was running before.

(Before commit 5f50be9b5, this would have been buggy from an librbd
thread, but now it should work correctly even for threads that are
neither iothreads nor vcpu threads.)

> I will have a decent look after my vacation.

Sounds good, thanks. Enjoy your vacation!

Kevin
Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support
> Am 06.07.2021 um 15:19 schrieb Kevin Wolf:
>
> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>> This series migrates the qemu rbd driver from the old aio emulation
>> to native coroutines and adds write zeroes support which is important
>> for block operations.
>>
>> To achieve this we first bump the librbd requirement to the already
>> outdated luminous release of ceph to get rid of some wrappers and
>> ifdef'ry in the code.
>
> Thanks, applied to the block branch.
>
> I've only had a very quick look at the patches, but I think there is
> one suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
> indirection is probably unnecessary now because aio_co_wake() is
> thread safe.

But this is new, isn't it? We also have this indirection in iscsi and
nfs drivers I think.

Does it matter that the completion callback is called from an librbd
thread? Will the coroutine continue to run in the right thread?

I will have a decent look after my vacation.

Anyway, thanks for applying,

Peter

> (Also, if I were the responsible maintainer, I would prefer true/false
> rather than 0/1 for bools, but that's minor. :-))
>
> Kevin
Re: [ovirt-users] Re: Any way to terminate stuck export task
On Tue, Jul 6, 2021 at 2:52 PM Nir Soffer wrote:
>
> Too bad.
>
> You can evaluate how ovirt 4.4 will work with this appliance using
> this dd command:
>
>     dd if=/dev/zero bs=8M count=38400 of=/path/to/new/disk \
>         oflag=direct conv=fsync
>
> We don't use dd for this, but the operation is the same on NFS < 4.2.

I confirm I'm able to saturate the 1Gb/s link. I tried creating a 10Gb
file on the StoreOnce appliance:

# time dd if=/dev/zero bs=8M count=1280 \
    of=/rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/ansible_ova/test.img \
    oflag=direct conv=fsync
1280+0 records in
1280+0 records out
10737418240 bytes (11 GB) copied, 98.0172 s, 110 MB/s

real 1m38.035s
user 0m0.003s
sys 0m2.366s

So are you saying that after upgrading to 4.4.6 (or the just-released
4.4.7) I should be able to export with this speed? Or anyway do I need
NFS v4.2?

BTW: is there any capping put in place by oVirt on the export phase
(the qemu-img command in practice), designed for example not to disturb
the activity of the hypervisor? Or do you think that if I have a 10Gb/s
network backend and powerful disks on oVirt and powerful NFS server
processing power I should have much more speed?

> Based on the 50 MiB/s rate you reported earlier, I guess you have a
> 1Gbit network to this appliance, so zeroing can do up to 128 MiB/s,
> which will take about 40 minutes for 300G.
>
> Using NFS 4.2, fallocate will complete in less than a second.

I can sort of confirm this also for 4.3.10. I have a test CentOS 7.4 VM
configured as an NFS server and, if I configure it as an export domain
using the default autonegotiate option, it is (strangely enough)
mounted as NFS v4.1 and the initial fallocate takes some minutes (55Gb
disk). If I reconfigure it forcing NFS v4.2, the initial fallocate is
immediate, in the sense that "ls -l" on the export domain quite
immediately shows the size of the virtual disk.

Thanks,
Gianluca
Re: [PATCH v5 1/6] qcow2: Fix dangling pointer after reopen for 'file'
Am 06.07.2021 um 15:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 06.07.2021 14:23, Kevin Wolf wrote:
>> Without an external data file, s->data_file is a second pointer with
>> the same value as bs->file. When changing bs->file to a different
>> BdrvChild and freeing the old BdrvChild, s->data_file must also be
>> updated, otherwise it points to freed memory and causes crashes.
>>
>> This problem was caught by iotests case 245.
>>
>> Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820
>> Signed-off-by: Kevin Wolf
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy
>
> Still, some ideas below:
>
>> ---
>>  block/qcow2.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index ee4530cdbd..cb459ef6a6 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -962,6 +962,7 @@ static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>>  }
>>
>>  typedef struct Qcow2ReopenState {
>> +    bool had_data_file;
>>      Qcow2Cache *l2_table_cache;
>>      Qcow2Cache *refcount_block_cache;
>>      int l2_slice_size; /* Number of entries in a slice of the L2 table */
>> @@ -1932,6 +1933,8 @@ static int qcow2_reopen_prepare(BDRVReopenState *state,
>>      r = g_new0(Qcow2ReopenState, 1);
>>      state->opaque = r;
>>
>> +    r->had_data_file = has_data_file(state->bs);
>> +
>
> So, during reopen, at some moment s->data_file become invalid. So, we
> shouldn't rely on it.. Maybe we need
>
>     s->data_file = NULL;
>
> here..

"need" is a strong word, but I guess we shouldn't access it between
prepare and commit, so I agree setting it to NULL would make bugs in
this area very visible. In fact, we wouldn't even need r->had_data_file
then because commit could just set s->data_file = state->bs->file if
it's NULL.

>>      ret = qcow2_update_options_prepare(state->bs, r, state->options,
>>                                         state->flags, errp);
>>      if (ret < 0) {
>> @@ -1966,7 +1969,18 @@ fail:
>>
>>  static void qcow2_reopen_commit(BDRVReopenState *state)
>>  {
>> +    BDRVQcow2State *s = state->bs->opaque;
>> +    Qcow2ReopenState *r = state->opaque;
>> +
>>      qcow2_update_options_commit(state->bs, state->opaque);
>
> Worth doing
>
>     assert(r->had_data_file == has_data_file(state->bs));
>
> here, to be double sure?

This would be wrong because at the time that this runs, state->bs->file
is already updated and this is what has_data_file() checks against. So
you can't use has_data_file() any more until it's synced again with the
code below. In fact, this is why I even added r->had_data_file. At
first I directly used has_data_file(state->bs) here and watched it
break.

>> +    if (!r->had_data_file && s->data_file != state->bs->file) {
>> +        /*
>> +         * If s->data_file is just a second pointer to bs->file (which is the
>> +         * case without an external data file), it may need to be updated.
>> +         */
>> +        s->data_file = state->bs->file;
>> +        assert(!has_data_file(state->bs));
>> +    }
>>      g_free(state->opaque);
>>  }

Kevin
Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support
Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben: > This series migrates the qemu rbd driver from the old aio emulation > to native coroutines and adds write zeroes support which is important > for block operations. > > To achieve this we first bump the librbd requirement to the already > outdated luminous release of ceph to get rid of some wrappers and > ifdef'ry in the code. Thanks, applied to the block branch. I've only had a very quick look at the patches, but I think there is one suggestion for a cleanup I can make: The qemu_rbd_finish_bh() indirection is probably unnecessary now because aio_co_wake() is thread safe. (Also, if I were the responsible maintainer, I would prefer true/false rather than 0/1 for bools, but that's minor. :-)) Kevin
Re: [PATCH v5 1/3] block/file-posix: Optimize for macOS
On Mon, Jul 05, 2021 at 10:04:56PM +0900, Akihiko Odaki wrote: > This commit introduces "punch hole" operation and optimizes transfer > block size for macOS. > > Thanks to Konstantin Nazarov for detailed analysis of a flaw in an > old version of this change: > https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667 > > Signed-off-by: Akihiko Odaki > --- > block/file-posix.c | 27 +-- > 1 file changed, 25 insertions(+), 2 deletions(-) Thanks, applied to my block tree: https://gitlab.com/stefanha/qemu/commits/block Stefan
Re: [PATCH v5 2/6] block: Add bdrv_reopen_queue_free()
06.07.2021 14:23, Kevin Wolf wrote: From: Alberto Garcia Move the code to free a BlockReopenQueue to a separate function. It will be used in a subsequent patch. [ kwolf: Also free explicit_options and options, and explicitly qobject_ref() the value when it continues to be used. This avoids memory leaks as we saw them in two recent patches. ] Hmm, not clear from the context which two patches you mean Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v5 1/6] qcow2: Fix dangling pointer after reopen for 'file'
06.07.2021 14:23, Kevin Wolf wrote: Without an external data file, s->data_file is a second pointer with the same value as bs->file. When changing bs->file to a different BdrvChild and freeing the old BdrvChild, s->data_file must also be updated, otherwise it points to freed memory and causes crashes. This problem was caught by iotests case 245. Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf Reviewed-by: Vladimir Sementsov-Ogievskiy Still, some ideas below: --- block/qcow2.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index ee4530cdbd..cb459ef6a6 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -962,6 +962,7 @@ static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, } typedef struct Qcow2ReopenState { +bool had_data_file; Qcow2Cache *l2_table_cache; Qcow2Cache *refcount_block_cache; int l2_slice_size; /* Number of entries in a slice of the L2 table */ @@ -1932,6 +1933,8 @@ static int qcow2_reopen_prepare(BDRVReopenState *state, r = g_new0(Qcow2ReopenState, 1); state->opaque = r; +r->had_data_file = has_data_file(state->bs); + So, during reopen, at some moment s->data_file become invalid. So, we shouldn't rely on it.. Maybe we need s->data_file = NULL; here.. ret = qcow2_update_options_prepare(state->bs, r, state->options, state->flags, errp); if (ret < 0) { @@ -1966,7 +1969,18 @@ fail: static void qcow2_reopen_commit(BDRVReopenState *state) { +BDRVQcow2State *s = state->bs->opaque; +Qcow2ReopenState *r = state->opaque; + qcow2_update_options_commit(state->bs, state->opaque); Worth doing assert(r->had_data_file == has_data_file(state->bs)); here, to be double sure? +if (!r->had_data_file && s->data_file != state->bs->file) { +/* + * If s->data_file is just a second pointer to bs->file (which is the + * case without an external data file), it may need to be updated. 
+ */ +s->data_file = state->bs->file; +assert(!has_data_file(state->bs)); +} g_free(state->opaque); } -- Best regards, Vladimir
Re: [ovirt-users] Re: Any way to terminate stuck export task
On Tue, Jul 6, 2021 at 10:21 AM Gianluca Cecchi wrote: > > On Mon, Jul 5, 2021 at 5:06 PM Nir Soffer wrote: > >> >> >> qemu-img is busy in posix_fallocate(), wiring one byte to every 4k block. >> >> If you add -tt -T (as I suggested), we can see how much time each write >> takes, >> which may explain why this takes so much time. >> >> strace -f -p 14342 --tt -T >> > > It seems I missed part of your suggestion... i didn't get the "-tt -T" (or I > didn't see it...) > > With it I get this during the export (in networking of host console 4 > mbit/s): > > # strace -f -p 25243 -tt -T > strace: Process 25243 attached with 2 threads > [pid 25243] 09:17:32.503907 ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, > NULL, NULL, 8 > [pid 25244] 09:17:32.694207 pwrite64(12, "\0", 1, 3773509631) = 1 <0.59> > [pid 25244] 09:17:32.694412 pwrite64(12, "\0", 1, 3773513727) = 1 <0.78> > [pid 25244] 09:17:32.694608 pwrite64(12, "\0", 1, 3773517823) = 1 <0.56> > [pid 25244] 09:17:32.694729 pwrite64(12, "\0", 1, 3773521919) = 1 <0.24> > [pid 25244] 09:17:32.694796 pwrite64(12, "\0", 1, 3773526015) = 1 <0.20> > [pid 25244] 09:17:32.694855 pwrite64(12, "\0", 1, 3773530111) = 1 <0.15> > [pid 25244] 09:17:32.694908 pwrite64(12, "\0", 1, 3773534207) = 1 <0.14> > [pid 25244] 09:17:32.694950 pwrite64(12, "\0", 1, 3773538303) = 1 <0.16> > [pid 25244] 09:17:32.694993 pwrite64(12, "\0", 1, 3773542399) = 1 <0.200032> > [pid 25244] 09:17:32.895140 pwrite64(12, "\0", 1, 3773546495) = 1 <0.34> > [pid 25244] 09:17:32.895227 pwrite64(12, "\0", 1, 3773550591) = 1 <0.29> > [pid 25244] 09:17:32.895296 pwrite64(12, "\0", 1, 3773554687) = 1 <0.24> > [pid 25244] 09:17:32.895353 pwrite64(12, "\0", 1, 3773558783) = 1 <0.16> > [pid 25244] 09:17:32.895400 pwrite64(12, "\0", 1, 3773562879) = 1 <0.15> > [pid 25244] 09:17:32.895443 pwrite64(12, "\0", 1, 3773566975) = 1 <0.15> > [pid 25244] 09:17:32.895485 pwrite64(12, "\0", 1, 3773571071) = 1 <0.15> > [pid 25244] 09:17:32.895527 pwrite64(12, "\0", 1, 3773575167) = 
1 <0.17> > [pid 25244] 09:17:32.895570 pwrite64(12, "\0", 1, 3773579263) = 1 <0.199493>
> [pid 25244] 09:17:33.095147 pwrite64(12, "\0", 1, 3773583359) = 1 <0.31>
> [pid 25244] 09:17:33.095262 pwrite64(12, "\0", 1, 3773587455) = 1 <0.61>
> [pid 25244] 09:17:33.095378 pwrite64(12, "\0", 1, 3773591551) = 1 <0.27>
> [pid 25244] 09:17:33.095445 pwrite64(12, "\0", 1, 3773595647) = 1 <0.21>
> [pid 25244] 09:17:33.095498 pwrite64(12, "\0", 1, 3773599743) = 1 <0.16>
> [pid 25244] 09:17:33.095542 pwrite64(12, "\0", 1, 3773603839) = 1 <0.14>

Most writes are pretty fast, but from time to time there is a very slow write. From the small sample you posted, we have:

awk '{print $11}' strace.out | sed -e "s///" | awk '{sum+=$1; if ($1 < 0.1) {fast+=$1; fast_nr++} else {slow+=$1; slow_nr++}} END{printf "average: %.6f slow: %.6f fast: %.6f\n", sum/NR, slow/slow_nr, fast/fast_nr}'
average: 0.016673 slow: 0.199763 fast: 0.000028

Preallocating a 300 GiB disk will take about 15 days :-)

>>> 300*1024**3 / 4096 * 0.016673 / 3600 / 24
15.176135

If all writes would be fast, it will take less than an hour:

>>> 300*1024**3 / 4096 * 0.000028 / 3600
0.61166933

> . . .
>
> BTW: it seems my NAS appliance doesn't support 4.2 version of NFS, because if I force it, I then get an error in mount and in engine.log this error for both nodes as they try to mount:
>
> 2021-07-05 17:01:56,082+02 ERROR [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] (EE-ManagedThreadFactory-engine-Thread-2554190) [642eb6be] The connection with details '172.16.1.137:/nas/EXPORT-DOMAIN' failed because of error code '477' and error message is: problem while trying to mount target
>
> and in vdsm.log:
> MountError: (32, ';mount.nfs: Protocol not supported\n')

Too bad. You can evaluate how ovirt 4.4 will work with this appliance using this dd command:

dd if=/dev/zero bs=8M count=38400 of=/path/to/new/disk oflag=direct conv=fsync

We don't use dd for this, but the operation is the same on NFS < 4.2.
Based on the 50 MiB/s rate you reported earlier, I guess you have a 1Gbit network to this appliance, so zeroing can do up to 128 MiB/s, which will take about 40 minutes for 300G. Using NFS 4.2, fallocate will complete in less than a second. Here is example from my test system, creating 90g raw preallocated volume: 2021-07-06 15:46:40,382+0300 INFO (tasks/1) [storage.Volume] Request to create RAW volume /rhev/data-center/mnt/storage2:_exp ort_00/a600ba04-34f9-4793-a5dc-6d4150716d14/images/bcf7c623-8fd8-47b3-aaee-a65c0872536d/82def38d-b41b-4126-826e-0513d669f1b5 with capacity = 96636764160 (fileVolume:493) ... 2021-07-06 15:46:40,447+0300 INFO (tasks/1) [storage.Volume] Preallocating volume /rhev/data-center/mnt/storage2:_export_00/a600ba04-34f9-4793-a5dc-6d4150716d14/images/bcf7c623-8fd8-47b3-aaee-a65c0872536d/82def38d-b41b-4126-826e-0513d66
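Nir's estimates above can be reproduced directly; a small sketch (the 0.016673 s average is the measured value from the strace sample, and 128 MiB/s is an assumed ceiling for a 1 Gbit link):

```python
# Reproducing the estimates: 300 GiB preallocated with one 1-byte write
# per 4 KiB block, vs. zeroing the whole image over a ~1 Gbit/s link.
GiB = 1024**3
writes = 300 * GiB // 4096            # one pwrite per 4k block

avg = 0.016673                        # measured average seconds per write
days = writes * avg / 3600 / 24
print(f"posix_fallocate emulation: {days:.1f} days")      # ~15.2 days

rate = 128 * 1024**2                  # ~1 Gbit/s in bytes per second
minutes = 300 * GiB / rate / 60
print(f"dd zeroing at 128 MiB/s: {minutes:.0f} minutes")  # 40 minutes
```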
Re: [PATCH]: /hw/nvme/ctrl error handling if descriptors are greater than 1024
Hi Padmakar, Patch looks good, but it got messed up by your MUA. You've previously contributed correctly formatted patches, so please revert to that method when sending patches in the future ;) Reviewed-by: Klaus Jensen On Jul 6 16:13, Padmakar Kalghatgi wrote: From: padmakar if the number of descriptors or pages is more than 1024, dma writes or reads will result in failure. Hence, we check if the number of descriptors or pages is more than 1024 in the nvme module and return Internal Device error. Signed-off-by: Padmakar Kalghatgi --- hw/nvme/ctrl.c | 14 ++ hw/nvme/trace-events | 1 + 2 files changed, 15 insertions(+) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 40a7efc..082592f 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -602,6 +602,20 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len) return NVME_SUCCESS; } +/* + * The QEMU has an inherent issue where in if the no. + * of descriptors is more than 1024, it will result in + * failure during the dma write or reads. Hence, we need + * to return the error. 
+ */ + +if (((sg->flags & NVME_SG_DMA) && ((sg->qsg.nsg + 1) > IOV_MAX)) || +((sg->iov.niov + 1) > IOV_MAX)) {+NVME_GUEST_ERR(pci_nvme_ub_sg_desc_toohigh, + "number of descriptors is greater than 1024"); +return NVME_INTERNAL_DEV_ERROR; +} + trace_pci_nvme_map_addr(addr, len); if (nvme_addr_is_cmb(n, addr)) { diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index ea33d0c..bfe1a3b 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -202,3 +202,4 @@ pci_nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion qu pci_nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring" pci_nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring" pci_nvme_ub_unknown_css_value(void) "unknown value in cc.css field" +pci_nvme_ub_sg_desc_toohigh(void) "the number of sg descriptors is too high" -- 2.7.0.windows.1
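The guard the patch adds boils down to a bounds check on the segment count; a Python illustration (IOV_MAX is 1024 on Linux, and the internal-device-error status value here is illustrative of NVMe's Internal Error code, not taken from QEMU's headers):

```python
# Refuse a mapping whose segment count would exceed what writev/readv
# can handle in one call (IOV_MAX entries).
IOV_MAX = 1024
NVME_SUCCESS = 0x0
NVME_INTERNAL_DEV_ERROR = 0x6

def nvme_map_addr_check(nsg):
    """Return an NVMe status for a map already holding nsg segments."""
    if nsg + 1 > IOV_MAX:
        return NVME_INTERNAL_DEV_ERROR
    return NVME_SUCCESS

assert nvme_map_addr_check(1023) == NVME_SUCCESS            # 1024th entry fits
assert nvme_map_addr_check(1024) == NVME_INTERNAL_DEV_ERROR # 1025th does not
```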
[PATCH v5 6/6] block: Make blockdev-reopen stable API
From: Alberto Garcia This patch drops the 'x-' prefix from x-blockdev-reopen. Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- qapi/block-core.json| 6 +++--- blockdev.c | 2 +- tests/qemu-iotests/155 | 2 +- tests/qemu-iotests/165 | 2 +- tests/qemu-iotests/245 | 10 +- tests/qemu-iotests/248 | 2 +- tests/qemu-iotests/248.out | 2 +- tests/qemu-iotests/296 | 2 +- tests/qemu-iotests/298 | 2 +- tests/qemu-iotests/tests/remove-bitmap-from-backing | 4 ++-- 10 files changed, 17 insertions(+), 17 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 052520331e..2eb399f0d4 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4219,7 +4219,7 @@ { 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true } ## -# @x-blockdev-reopen: +# @blockdev-reopen: # # Reopens one or more block devices using the given set of options. # Any option not specified will be reset to its default value regardless @@ -4257,9 +4257,9 @@ # image does not have a default backing file name as part of its # metadata. 
# -# Since: 4.0 +# Since: 6.0 ## -{ 'command': 'x-blockdev-reopen', +{ 'command': 'blockdev-reopen', 'data': { 'options': ['BlockdevOptions'] } } ## diff --git a/blockdev.c b/blockdev.c index 1e8c946828..7639f2108e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3560,7 +3560,7 @@ fail: visit_free(v); } -void qmp_x_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp) +void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp) { BlockReopenQueue *queue = NULL; GSList *drained = NULL; diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155 index 3400b0312a..fec43d662d 100755 --- a/tests/qemu-iotests/155 +++ b/tests/qemu-iotests/155 @@ -261,7 +261,7 @@ class TestBlockdevMirrorReopen(MirrorBaseClass): result = self.vm.qmp('blockdev-add', node_name="backing", driver="null-co") self.assert_qmp(result, 'return', {}) -result = self.vm.qmp('x-blockdev-reopen', options = [{ +result = self.vm.qmp('blockdev-reopen', options = [{ 'node-name': "target", 'driver': iotests.imgfmt, 'file': "target-file", diff --git a/tests/qemu-iotests/165 b/tests/qemu-iotests/165 index 57aa88ecae..92a431315b 100755 --- a/tests/qemu-iotests/165 +++ b/tests/qemu-iotests/165 @@ -137,7 +137,7 @@ class TestPersistentDirtyBitmap(iotests.QMPTestCase): assert sha256_1 == self.getSha256() # Reopen to RW -result = self.vm.qmp('x-blockdev-reopen', options = [{ +result = self.vm.qmp('blockdev-reopen', options = [{ 'node-name': 'node0', 'driver': iotests.imgfmt, 'file': { diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245 index 6eff352099..28a116a6aa 100755 --- a/tests/qemu-iotests/245 +++ b/tests/qemu-iotests/245 @@ -1,7 +1,7 @@ #!/usr/bin/env python3 # group: rw # -# Test cases for the QMP 'x-blockdev-reopen' command +# Test cases for the QMP 'blockdev-reopen' command # # Copyright (C) 2018-2019 Igalia, S.L. 
# Author: Alberto Garcia @@ -85,16 +85,16 @@ class TestBlockdevReopen(iotests.QMPTestCase): "Expected output of %d qemu-io commands, found %d" % (found, self.total_io_cmds)) -# Run x-blockdev-reopen on a list of block devices +# Run blockdev-reopen on a list of block devices def reopenMultiple(self, opts, errmsg = None): -result = self.vm.qmp('x-blockdev-reopen', conv_keys = False, options = opts) +result = self.vm.qmp('blockdev-reopen', conv_keys = False, options = opts) if errmsg: self.assert_qmp(result, 'error/class', 'GenericError') self.assert_qmp(result, 'error/desc', errmsg) else: self.assert_qmp(result, 'return', {}) -# Run x-blockdev-reopen on a single block device (specified by +# Run blockdev-reopen on a single block device (specified by # 'opts') but applying 'newopts' on top of it. The original 'opts' # dict is unmodified def reopen(self, opts, newopts = {}, errmsg = None): @@ -161,7 +161,7 @@ class TestBlockdevReopen(iotests.QMPTestCase): self.reopen(opts, {'file.locking': 'off'}, "Cannot change the option 'locking'") self.reopen(opts, {'file.filename': None}, "Invalid parameter type for 'options[0].file.filename', expected: string") -# node-name is optional in BlockdevOptions, but x-blockdev-reopen needs it +# node-name is optional in BlockdevOptions, but blo
[PATCH v5 1/6] qcow2: Fix dangling pointer after reopen for 'file'
Without an external data file, s->data_file is a second pointer with the same value as bs->file. When changing bs->file to a different BdrvChild and freeing the old BdrvChild, s->data_file must also be updated, otherwise it points to freed memory and causes crashes. This problem was caught by iotests case 245. Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf --- block/qcow2.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index ee4530cdbd..cb459ef6a6 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -962,6 +962,7 @@ static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, } typedef struct Qcow2ReopenState { +bool had_data_file; Qcow2Cache *l2_table_cache; Qcow2Cache *refcount_block_cache; int l2_slice_size; /* Number of entries in a slice of the L2 table */ @@ -1932,6 +1933,8 @@ static int qcow2_reopen_prepare(BDRVReopenState *state, r = g_new0(Qcow2ReopenState, 1); state->opaque = r; +r->had_data_file = has_data_file(state->bs); + ret = qcow2_update_options_prepare(state->bs, r, state->options, state->flags, errp); if (ret < 0) { @@ -1966,7 +1969,18 @@ fail: static void qcow2_reopen_commit(BDRVReopenState *state) { +BDRVQcow2State *s = state->bs->opaque; +Qcow2ReopenState *r = state->opaque; + qcow2_update_options_commit(state->bs, state->opaque); +if (!r->had_data_file && s->data_file != state->bs->file) { +/* + * If s->data_file is just a second pointer to bs->file (which is the + * case without an external data file), it may need to be updated. + */ +s->data_file = state->bs->file; +assert(!has_data_file(state->bs)); +} g_free(state->opaque); } -- 2.31.1
[PATCH v5 2/6] block: Add bdrv_reopen_queue_free()
From: Alberto Garcia Move the code to free a BlockReopenQueue to a separate function. It will be used in a subsequent patch. [ kwolf: Also free explicit_options and options, and explicitly qobject_ref() the value when it continues to be used. This avoids memory leaks as we saw them in two recent patches. ] Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- include/block/block.h | 1 + block.c | 22 -- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 7ec77ecb1a..6d42992985 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -386,6 +386,7 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver *drv, const char *node_name, BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue, BlockDriverState *bs, QDict *options, bool keep_old_opts); +void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue); int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp); int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only, Error **errp); diff --git a/block.c b/block.c index acd35cb0cb..ee9b46e95e 100644 --- a/block.c +++ b/block.c @@ -4095,6 +4095,19 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue, NULL, 0, keep_old_opts); } +void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue) +{ +if (bs_queue) { +BlockReopenQueueEntry *bs_entry, *next; +QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) { +qobject_unref(bs_entry->state.explicit_options); +qobject_unref(bs_entry->state.options); +g_free(bs_entry); +} +g_free(bs_queue); +} +} + /* * Reopen multiple BlockDriverStates atomically & transactionally. 
* @@ -4197,15 +4210,10 @@ abort: if (bs_entry->prepared) { bdrv_reopen_abort(&bs_entry->state); } -qobject_unref(bs_entry->state.explicit_options); -qobject_unref(bs_entry->state.options); } cleanup: -QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) { -g_free(bs_entry); -} -g_free(bs_queue); +bdrv_reopen_queue_free(bs_queue); return ret; } @@ -4573,6 +4581,8 @@ static void bdrv_reopen_commit(BDRVReopenState *reopen_state) /* set BDS specific flags now */ qobject_unref(bs->explicit_options); qobject_unref(bs->options); +qobject_ref(reopen_state->explicit_options); +qobject_ref(reopen_state->options); bs->explicit_options = reopen_state->explicit_options; bs->options= reopen_state->options; -- 2.31.1
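The kwolf note in the commit message — commit now takes its own `qobject_ref()` on the options it keeps, so the new free function can unconditionally unref the queue's copies — can be modeled with a toy refcount (names here are illustrative, not QEMU's QObject API):

```python
# Toy refcount model: commit keeps using the options value, so it must
# take a reference before queue-free unconditionally drops the queue's.
class QObject:
    def __init__(self):
        self.refcnt = 1
    def ref(self):
        self.refcnt += 1
    def unref(self):
        self.refcnt -= 1
        assert self.refcnt >= 0, "use after free"

def reopen_commit(bs, state_options):
    state_options.ref()        # commit keeps the value: take a reference
    bs["options"] = state_options

def reopen_queue_free(state_options):
    state_options.unref()      # the queue always drops its own reference

opts = QObject()
bs = {}
reopen_commit(bs, opts)
reopen_queue_free(opts)
assert bs["options"].refcnt == 1   # exactly the reference bs now owns
```

Without the `ref()` in commit, the free would drop the count to zero while `bs` still holds the value, which is the class of bug the bracketed note refers to.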
[PATCH v5 0/6] Make blockdev-reopen stable
This series picks up the remaining patches from Berto's series "[PATCH v4 0/6] Allow changing bs->file on reopen", which are not merged into master yet. Apart from renaming 'x-blockdev-reopen' into 'blockdev-reopen', the remaining functional change in this series is taking a list of nodes to reopen as an argument so that multiple changes to the graph can be made atomically that would be invalid separately (e.g. due to permission checks on the intermediate state). It also contains a qcow2 fix for a bug introduced by the part of the series that was already picked up in Vladimir's "[PATCH v6 0/9] Allow changing bs->file on reopen". Alberto Garcia (4): block: Add bdrv_reopen_queue_free() block: Support multiple reopening with x-blockdev-reopen iotests: Test reopening multiple devices at the same time block: Make blockdev-reopen stable API Kevin Wolf (2): qcow2: Fix dangling pointer after reopen for 'file' block: Acquire AioContexts during bdrv_reopen_multiple() qapi/block-core.json | 24 +++--- include/block/block.h | 3 + block.c | 71 + block/qcow2.c | 14 block/replication.c | 7 ++ blockdev.c| 76 ++ qemu-io-cmds.c| 7 +- tests/qemu-iotests/155| 9 ++- tests/qemu-iotests/165| 4 +- tests/qemu-iotests/245| 78 +++ tests/qemu-iotests/245.out| 4 +- tests/qemu-iotests/248| 4 +- tests/qemu-iotests/248.out| 2 +- tests/qemu-iotests/296| 11 ++- tests/qemu-iotests/298| 4 +- .../tests/remove-bitmap-from-backing | 22 +++--- 16 files changed, 240 insertions(+), 100 deletions(-) -- 2.31.1
[PATCH v5 4/6] block: Support multiple reopening with x-blockdev-reopen
From: Alberto Garcia [ kwolf: Fixed AioContext locking ] Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- qapi/block-core.json | 18 +++-- blockdev.c| 81 ++- tests/qemu-iotests/155| 9 ++- tests/qemu-iotests/165| 4 +- tests/qemu-iotests/245| 27 --- tests/qemu-iotests/248| 2 +- tests/qemu-iotests/248.out| 2 +- tests/qemu-iotests/296| 9 ++- tests/qemu-iotests/298| 4 +- .../tests/remove-bitmap-from-backing | 18 +++-- 10 files changed, 99 insertions(+), 75 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 4a46552816..052520331e 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4221,13 +4221,15 @@ ## # @x-blockdev-reopen: # -# Reopens a block device using the given set of options. Any option -# not specified will be reset to its default value regardless of its -# previous status. If an option cannot be changed or a particular +# Reopens one or more block devices using the given set of options. +# Any option not specified will be reset to its default value regardless +# of its previous status. If an option cannot be changed or a particular # driver does not support reopening then the command will return an -# error. +# error. All devices in the list are reopened in one transaction, so +# if one of them fails then the whole transaction is cancelled. # -# The top-level @node-name option (from BlockdevOptions) must be +# The command receives a list of block devices to reopen. For each one +# of them, the top-level @node-name option (from BlockdevOptions) must be # specified and is used to select the block device to be reopened. # Other @node-name options must be either omitted or set to the # current name of the appropriate node. This command won't change any @@ -4247,8 +4249,8 @@ # # 4) NULL: the current child (if any) is detached. # -# Options (1) and (2) are supported in all cases, but at the moment -# only @backing allows replacing or detaching an existing child. +# Options (1) and (2) are supported in all cases. 
Option (3) is +# supported for @file and @backing, and option (4) for @backing only. # # Unlike with blockdev-add, the @backing option must always be present # unless the node being reopened does not have a backing file and its @@ -4258,7 +4260,7 @@ # Since: 4.0 ## { 'command': 'x-blockdev-reopen', - 'data': 'BlockdevOptions', 'boxed': true } + 'data': { 'options': ['BlockdevOptions'] } } ## # @blockdev-del: diff --git a/blockdev.c b/blockdev.c index f657d090d3..1e8c946828 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3560,51 +3560,60 @@ fail: visit_free(v); } -void qmp_x_blockdev_reopen(BlockdevOptions *options, Error **errp) -{ -BlockDriverState *bs; -AioContext *ctx; -QObject *obj; -Visitor *v = qobject_output_visitor_new(&obj); -BlockReopenQueue *queue; -QDict *qdict; - -/* Check for the selected node name */ -if (!options->has_node_name) { -error_setg(errp, "node-name not specified"); -goto fail; -} +void qmp_x_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp) +{ +BlockReopenQueue *queue = NULL; +GSList *drained = NULL; + +/* Add each one of the BDS that we want to reopen to the queue */ +for (; reopen_list != NULL; reopen_list = reopen_list->next) { +BlockdevOptions *options = reopen_list->value; +BlockDriverState *bs; +AioContext *ctx; +QObject *obj; +Visitor *v; +QDict *qdict; + +/* Check for the selected node name */ +if (!options->has_node_name) { +error_setg(errp, "node-name not specified"); +goto fail; +} -bs = bdrv_find_node(options->node_name); -if (!bs) { -error_setg(errp, "Failed to find node with node-name='%s'", +bs = bdrv_find_node(options->node_name); +if (!bs) { +error_setg(errp, "Failed to find node with node-name='%s'", options->node_name); -goto fail; -} +goto fail; +} -/* Put all options in a QDict and flatten it */ -visit_type_BlockdevOptions(v, NULL, &options, &error_abort); -visit_complete(v, &obj); -qdict = qobject_to(QDict, obj); +/* Put all options in a QDict and flatten it */ +v = qobject_output_visitor_new(&obj); 
+visit_type_BlockdevOptions(v, NULL, &options, &error_abort); +visit_complete(v, &obj); +visit_free(v); -qdict_flatten(qdict); +qdict = qobject_to(QDict, obj); -/* Perform the reopen operation */ -ctx = bdrv_get_aio_context(bs); -aio_context_acquire(ctx); -bdrv_subtree_drained_begin(bs); -aio_context_release(ctx); +qdict_flatten(qdict);
[PATCH v5 3/6] block: Acquire AioContexts during bdrv_reopen_multiple()
As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf --- include/block/block.h | 2 ++ block.c | 49 --- block/replication.c | 7 +++ blockdev.c| 5 + qemu-io-cmds.c| 7 +-- 5 files changed, 57 insertions(+), 13 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 6d42992985..3477290f9a 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -388,6 +388,8 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue, bool keep_old_opts); void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue); int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp); +int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts, +Error **errp); int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only, Error **errp); int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset, diff --git a/block.c b/block.c index ee9b46e95e..06fb69df5c 100644 --- a/block.c +++ b/block.c @@ -4124,19 +4124,26 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue) * * All affected nodes must be drained between bdrv_reopen_queue() and * bdrv_reopen_multiple(). + * + * To be called from the main thread, with all other AioContexts unlocked. 
*/ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp) { int ret = -1; BlockReopenQueueEntry *bs_entry, *next; +AioContext *ctx; Transaction *tran = tran_new(); g_autoptr(GHashTable) found = NULL; g_autoptr(GSList) refresh_list = NULL; +assert(qemu_get_current_aio_context() == qemu_get_aio_context()); assert(bs_queue != NULL); QTAILQ_FOREACH(bs_entry, bs_queue, entry) { +ctx = bdrv_get_aio_context(bs_entry->state.bs); +aio_context_acquire(ctx); ret = bdrv_flush(bs_entry->state.bs); +aio_context_release(ctx); if (ret < 0) { error_setg_errno(errp, -ret, "Error flushing drive"); goto abort; @@ -4145,7 +4152,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp) QTAILQ_FOREACH(bs_entry, bs_queue, entry) { assert(bs_entry->state.bs->quiesce_counter > 0); +ctx = bdrv_get_aio_context(bs_entry->state.bs); +aio_context_acquire(ctx); ret = bdrv_reopen_prepare(&bs_entry->state, bs_queue, tran, errp); +aio_context_release(ctx); if (ret < 0) { goto abort; } @@ -4188,7 +4198,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp) * to first element. 
*/ QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) { +ctx = bdrv_get_aio_context(bs_entry->state.bs); +aio_context_acquire(ctx); bdrv_reopen_commit(&bs_entry->state); +aio_context_release(ctx); } tran_commit(tran); @@ -4197,7 +4210,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp) BlockDriverState *bs = bs_entry->state.bs; if (bs->drv->bdrv_reopen_commit_post) { +ctx = bdrv_get_aio_context(bs); +aio_context_acquire(ctx); bs->drv->bdrv_reopen_commit_post(&bs_entry->state); +aio_context_release(ctx); } } @@ -4208,7 +4224,10 @@ abort: tran_abort(tran); QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) { if (bs_entry->prepared) { +ctx = bdrv_get_aio_context(bs_entry->state.bs); +aio_context_acquire(ctx); bdrv_reopen_abort(&bs_entry->state); +aio_context_release(ctx); } } @@ -4218,23 +4237,39 @@ cleanup: return ret; } -int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only, - Error **errp) +int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts, +Error **errp) { -int ret; +AioContext *ctx = bdrv_get_aio_context(bs); BlockReopenQueue *queue; -QDict *opts = qdict_new(); - -qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only); +int ret; bdrv_subtree_drained_begin(bs); -queue = bdrv_reopen_queue(NULL, bs, opts, true); +if (ctx != qemu_get_aio_context()) { +aio_context_release(ctx); +} + +queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts); ret = bdrv_reopen_multiple(queue, errp); + +if (ctx != qemu_get_aio_context()) { +aio_context_acquire(ctx); +} bdrv_subtree_drained_end(b
[PATCH v5 5/6] iotests: Test reopening multiple devices at the same time
From: Alberto Garcia This test swaps the images used by two active block devices. This is now possible thanks to the new ability to run x-blockdev-reopen on multiple devices at the same time. Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- tests/qemu-iotests/245 | 47 ++ tests/qemu-iotests/245.out | 4 ++-- 2 files changed, 49 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245 index b81fde8f12..6eff352099 100755 --- a/tests/qemu-iotests/245 +++ b/tests/qemu-iotests/245 @@ -649,6 +649,53 @@ class TestBlockdevReopen(iotests.QMPTestCase): '-c', 'read -P 0x40 0x40008 1', '-c', 'read -P 0x80 0x40010 1', hd_path[0]) +# Swap the disk images of two active block devices +def test_swap_files(self): +# Add hd0 and hd2 (none of them with backing files) +opts0 = hd_opts(0) +result = self.vm.qmp('blockdev-add', conv_keys = False, **opts0) +self.assert_qmp(result, 'return', {}) + +opts2 = hd_opts(2) +result = self.vm.qmp('blockdev-add', conv_keys = False, **opts2) +self.assert_qmp(result, 'return', {}) + +# Write different data to both block devices +self.run_qemu_io("hd0", "write -P 0xa0 0 1k") +self.run_qemu_io("hd2", "write -P 0xa2 0 1k") + +# Check that the data reads correctly +self.run_qemu_io("hd0", "read -P 0xa0 0 1k") +self.run_qemu_io("hd2", "read -P 0xa2 0 1k") + +# It's not possible to make a block device use an image that +# is already being used by the other device. 
+self.reopen(opts0, {'file': 'hd2-file'}, +"Permission conflict on node 'hd2-file': permissions " +"'write, resize' are both required by node 'hd2' (uses " +"node 'hd2-file' as 'file' child) and unshared by node " +"'hd0' (uses node 'hd2-file' as 'file' child).") +self.reopen(opts2, {'file': 'hd0-file'}, +"Permission conflict on node 'hd0-file': permissions " +"'write, resize' are both required by node 'hd0' (uses " +"node 'hd0-file' as 'file' child) and unshared by node " +"'hd2' (uses node 'hd0-file' as 'file' child).") + +# But we can swap the images if we reopen both devices at the +# same time +opts0['file'] = 'hd2-file' +opts2['file'] = 'hd0-file' +self.reopenMultiple([opts0, opts2]) +self.run_qemu_io("hd0", "read -P 0xa2 0 1k") +self.run_qemu_io("hd2", "read -P 0xa0 0 1k") + +# And we can of course come back to the original state +opts0['file'] = 'hd0-file' +opts2['file'] = 'hd2-file' +self.reopenMultiple([opts0, opts2]) +self.run_qemu_io("hd0", "read -P 0xa0 0 1k") +self.run_qemu_io("hd2", "read -P 0xa2 0 1k") + # Misc reopen tests with different block drivers @iotests.skip_if_unsupported(['quorum', 'throttle']) def test_misc_drivers(self): diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out index daf1e51922..4eced19294 100644 --- a/tests/qemu-iotests/245.out +++ b/tests/qemu-iotests/245.out @@ -17,8 +17,8 @@ read 1/1 bytes at offset 262152 read 1/1 bytes at offset 262160 1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -.. +... -- -Ran 24 tests +Ran 25 tests OK -- 2.31.1
[PATCH]: /hw/nvme/ctrl error handling if descriptors are greater than 1024
From: Padmakar Kalghatgi If the number of descriptors or pages is more than 1024, DMA writes or reads will result in failure. Hence, check in the nvme module whether the number of descriptors or pages exceeds 1024 and return an Internal Device Error if so. Signed-off-by: Padmakar Kalghatgi --- hw/nvme/ctrl.c | 14 ++ hw/nvme/trace-events | 1 + 2 files changed, 15 insertions(+) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 40a7efc..082592f 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -602,6 +602,20 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len) return NVME_SUCCESS; } +/* + * QEMU has an inherent limitation: if the number of + * descriptors exceeds 1024, the DMA write or read will + * fail. Hence, we need to return an error in that case. + */ + +if (((sg->flags & NVME_SG_DMA) && ((sg->qsg.nsg + 1) > IOV_MAX)) || +((sg->iov.niov + 1) > IOV_MAX)) { +NVME_GUEST_ERR(pci_nvme_ub_sg_desc_toohigh, + "number of descriptors is greater than 1024"); +return NVME_INTERNAL_DEV_ERROR; +} + trace_pci_nvme_map_addr(addr, len); if (nvme_addr_is_cmb(n, addr)) { diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index ea33d0c..bfe1a3b 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -202,3 +202,4 @@ pci_nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion qu pci_nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring" pci_nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring" pci_nvme_ub_unknown_css_value(void) "unknown value in cc.css field" +pci_nvme_ub_sg_desc_toohigh(void) "the number of sg descriptors is too high" -- 2.7.0.windows.1
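To make the 1024 limit concrete: a guest transfer maps to roughly one scatter-gather descriptor per page touched, and the check keeps one slot of headroom (the `+ 1`). A back-of-the-envelope sketch of that arithmetic; the 4 KiB page size and the ceiling-division model are illustrative assumptions, not the exact QEMU mapping logic:

```python
IOV_MAX = 1024  # per-syscall iovec limit in the host I/O path

def descriptors_needed(transfer_bytes, page_size=4096):
    """Worst-case descriptor count: one per page touched (ceiling division)."""
    return -(-transfer_bytes // page_size)

def exceeds_limit(transfer_bytes, page_size=4096):
    # Mirror the patch's "+ 1" headroom for one possible extra segment.
    return descriptors_needed(transfer_bytes, page_size) + 1 > IOV_MAX

# A 4 MiB transfer needs 1024 descriptors, so with the +1 headroom it
# already trips the check; one page less stays within bounds.
print(descriptors_needed(4 << 20))       # 1024
print(exceeds_limit(4 << 20))            # True
print(exceeds_limit((4 << 20) - 4096))   # False
```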
Re: [PATCH 07/10] iotests/297: return error code from run_linters()
25.06.2021 21:20, John Snow wrote: This turns run_linters() into a bit of a hybrid test; returning non-zero on failed execution while also printing diffable information. This is done for the benefit of the avocado simple test runner, which will soon be attempting to execute this test from a different environment. (Note: universal_newlines is added to the pylint invocation for type consistency with the mypy run -- it's not strictly necessary, but it avoids some typing errors caused by our re-use of the 'p' variable.) Signed-off-by: John Snow --- tests/qemu-iotests/297 | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297 index 1e8334d1d4..7db1f9ed45 100755 --- a/tests/qemu-iotests/297 +++ b/tests/qemu-iotests/297 @@ -68,19 +68,22 @@ def run_linters( files: List[str], directory: str = '.', env: Optional[Mapping[str, str]] = None, -) -> None: +) -> int: +ret = 0 print('=== pylint ===') sys.stdout.flush() # Todo notes are fine, but fixme's or xxx's should probably just be # fixed (in tests, at least) -subprocess.run( +p = subprocess.run( ('python3', '-m', 'pylint', '--score=n', '--notes=FIXME,XXX', *files), cwd=directory, env=env, check=False, +universal_newlines=True, ) +ret += p.returncode print('=== mypy ===') sys.stdout.flush() @@ -113,9 +116,12 @@ def run_linters( universal_newlines=True ) +ret += p.returncode if p.returncode != 0: print(p.stdout) +return ret + def main() -> None: for linter in ('pylint-3', 'mypy'): Hmm.. 1. It's rather unusual for a Python function to return an integer error code; raising exceptions is more idiomatic. 2. Summing the return codes looks odd to me. 3. Do we really want to run mypy if pylint failed? Maybe better not to, and just switch s/check=False/check=True/? This way: 3.1 the function becomes a thin wrapper around subprocess.run and raises the same exceptions; 3.2 we don't waste CI time by running mypy when pylint has already failed. -- Best regards, Vladimir
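Vladimir's `check=True` suggestion in concrete form: with `check=True`, `subprocess.run()` raises `CalledProcessError` on a non-zero exit, so the first failing linter aborts the run and no return codes need to be summed. A minimal, self-contained sketch (a trivial `python -c` command stands in for the pylint/mypy invocations):

```python
import subprocess
import sys

def run_linter(argv, env=None, cwd=None):
    """Run one linter; raises CalledProcessError on non-zero exit."""
    return subprocess.run(
        argv,
        cwd=cwd,
        env=env,
        check=True,                 # raise instead of returning a code
        universal_newlines=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )

# Stand-in for a failing pylint invocation:
try:
    run_linter([sys.executable, '-c', 'import sys; sys.exit(2)'])
except subprocess.CalledProcessError as exc:
    print(f"linter failed with code {exc.returncode}")
# A subsequent mypy run would simply never be reached.
```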
Re: [PATCH 06/10] iotests/297: Add 'directory' argument to run_linters
25.06.2021 21:20, John Snow wrote: Allow run_linters to work well if it's executed from a different directory. Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH 05/10] iotests/297: Separate environment setup from test execution
25.06.2021 21:20, John Snow wrote: Move environment setup into main(), leaving pure test execution behind in run_linters(). Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH 0/4] hw/nvme: fix controller hotplugging
From: Klaus Jensen Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We discussed a bit back and forth and I mentioned that the core issue was an artifact of the parent/child relationship stemming from the qdev setup we have with namespaces attaching to the controller through a qdev bus. The gist of this series is the fourth patch "hw/nvme: fix controller hot unplugging" which basically causes namespaces to be reassigned to a bus owned by the subsystem if the parent controller is linked to one. This fixes `device_del/add nvme` in such settings. Note that in the case where there is no subsystem involved, nvme devices can be removed from the system with `device_del`, but this *will* cause the namespaces to be removed as well since there is no place (i.e. no subsystem) for them to "linger". And since this series does not add support for hotplugging nvme-ns devices, while an nvme device can be re-added, no namespaces can. Support for hotplugging nvme-ns devices is present in [1], but I'd rather not add that since I think '-device nvme-ns' is already a bad design choice. Now, I do realize that it is not "pretty" to explicitly change the parent bus, so I do have an RFC patch queued that replaces the subsystem and namespace devices with objects, but keeps -device shims available for backwards compatibility. This approach will solve the problems properly and should be a better model. However, I don't believe it will make it for 6.1 and I'd really like to at least fix the unplugging for 6.1, and this gets the job done. [1]: 20210511073511.32511-1-h...@suse.de Klaus Jensen (4): hw/nvme: remove NvmeCtrl parameter from ns setup/check functions hw/nvme: mark nvme-subsys non-hotpluggable hw/nvme: unregister controller with subsystem at exit hw/nvme: fix controller hot unplugging hw/nvme/nvme.h | 21 --- hw/nvme/ctrl.c | 14 +++--- hw/nvme/ns.c | 67 ++-- hw/nvme/subsys.c | 10 4 files changed, 74 insertions(+), 38 deletions(-) -- 2.32.0
[PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable
From: Klaus Jensen We currently lack the infrastructure to handle subsystem hotplugging, so disable it. Signed-off-by: Klaus Jensen --- hw/nvme/subsys.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c index 192223d17ca1..dc7a96862f37 100644 --- a/hw/nvme/subsys.c +++ b/hw/nvme/subsys.c @@ -61,6 +61,7 @@ static void nvme_subsys_class_init(ObjectClass *oc, void *data) dc->realize = nvme_subsys_realize; dc->desc = "Virtual NVMe subsystem"; +dc->hotpluggable = false; device_class_set_props(dc, nvme_subsystem_props); } -- 2.32.0
[PATCH 3/4] hw/nvme: unregister controller with subsystem at exit
From: Klaus Jensen Make sure the controller is unregistered from the subsystem when the device is removed. Signed-off-by: Klaus Jensen --- hw/nvme/nvme.h | 1 + hw/nvme/ctrl.c | 4 hw/nvme/subsys.c | 5 + 3 files changed, 10 insertions(+) diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 0868359a1e86..c4065467d877 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -50,6 +50,7 @@ typedef struct NvmeSubsystem { } NvmeSubsystem; int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); +void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n); static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys, uint32_t cntlid) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index dd1801510032..90e3ee2b70ee 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -6523,6 +6523,10 @@ static void nvme_exit(PCIDevice *pci_dev) nvme_ns_cleanup(ns); } +if (n->subsys) { +nvme_subsys_unregister_ctrl(n->subsys, n); +} + g_free(n->cq); g_free(n->sq); g_free(n->aer_reqs); diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c index dc7a96862f37..92caa604a280 100644 --- a/hw/nvme/subsys.c +++ b/hw/nvme/subsys.c @@ -32,6 +32,11 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) return cntlid; } +void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n) +{ +subsys->ctrls[n->cntlid] = NULL; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { const char *nqn = subsys->params.nqn ? -- 2.32.0
[PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
From: Klaus Jensen The nvme_ns_setup and nvme_ns_check_constraints should not depend on the controller state. Refactor and remove it. Signed-off-by: Klaus Jensen --- hw/nvme/nvme.h | 2 +- hw/nvme/ctrl.c | 2 +- hw/nvme/ns.c | 37 ++--- 3 files changed, 20 insertions(+), 21 deletions(-) diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 56f8eceed2ad..0868359a1e86 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -246,7 +246,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns) } void nvme_ns_init_format(NvmeNamespace *ns); -int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); +int nvme_ns_setup(NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_shutdown(NvmeNamespace *ns); void nvme_ns_cleanup(NvmeNamespace *ns); diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 629b0d38c2a2..dd1801510032 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -6498,7 +6498,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) ns = &n->namespace; ns->params.nsid = 1; -if (nvme_ns_setup(n, ns, errp)) { +if (nvme_ns_setup(ns, errp)) { return; } diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 4275c3db6301..3c4f5b8c714a 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -346,8 +346,7 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns) assert(ns->nr_open_zones == 0); } -static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, - Error **errp) +static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { if (!ns->blkconf.blk) { error_setg(errp, "block backend not configured"); @@ -366,20 +365,6 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, return -1; } -if (!n->subsys) { -if (ns->params.detached) { -error_setg(errp, "detached requires that the nvme device is " - "linked to an nvme-subsys device"); -return -1; -} - -if (ns->params.shared) { -error_setg(errp, "shared requires that the nvme device is " - "linked to an nvme-subsys device"); -return -1; -} -} - if (ns->params.zoned) 
{ if (ns->params.max_active_zones) { if (ns->params.max_open_zones > ns->params.max_active_zones) { @@ -411,9 +396,9 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, return 0; } -int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) +int nvme_ns_setup(NvmeNamespace *ns, Error **errp) { -if (nvme_ns_check_constraints(n, ns, errp)) { +if (nvme_ns_check_constraints(ns, errp)) { return -1; } @@ -465,7 +450,21 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) uint32_t nsid = ns->params.nsid; int i; -if (nvme_ns_setup(n, ns, errp)) { +if (!n->subsys) { +if (ns->params.detached) { +error_setg(errp, "detached requires that the nvme device is " + "linked to an nvme-subsys device"); +return; +} + +if (ns->params.shared) { +error_setg(errp, "shared requires that the nvme device is " + "linked to an nvme-subsys device"); +return; +} +} + +if (nvme_ns_setup(ns, errp)) { return; } -- 2.32.0
Re: [PATCH 04/10] iotests/297: Create main() function
25.06.2021 21:20, John Snow wrote: Instead of running "run_linters" directly, create a main() function that will be responsible for environment setup, leaving run_linters() responsible only for execution of the linters. (That environment setup will be moved over in forthcoming commits.) Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH 4/4] hw/nvme: fix controller hot unplugging
From: Klaus Jensen Prior to this patch the nvme-ns devices are always children of the NvmeBus owned by the NvmeCtrl. This causes the namespaces to be unrealized when the parent device is removed. However, when subsystems are involved, this is not what we want since the namespaces may be attached to other controllers as well. This patch adds an additional NvmeBus on the subsystem device. When nvme-ns devices are realized, if the parent controller device is linked to a subsystem, the parent bus is set to the subsystem one instead. This makes sure that namespaces are kept alive and not unrealized. Signed-off-by: Klaus Jensen --- hw/nvme/nvme.h | 18 ++ hw/nvme/ctrl.c | 8 +--- hw/nvme/ns.c | 32 +--- hw/nvme/subsys.c | 4 4 files changed, 44 insertions(+), 18 deletions(-) diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index c4065467d877..9401e212f9f7 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1); typedef struct NvmeCtrl NvmeCtrl; typedef struct NvmeNamespace NvmeNamespace; +#define TYPE_NVME_BUS "nvme-bus" +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS) + +typedef struct NvmeBus { +BusState parent_bus; +bool is_subsys; +} NvmeBus; + #define TYPE_NVME_SUBSYS "nvme-subsys" #define NVME_SUBSYS(obj) \ OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) typedef struct NvmeSubsystem { DeviceState parent_obj; +NvmeBus bus; uint8_t subnqn[256]; NvmeCtrl *ctrls[NVME_MAX_CONTROLLERS]; @@ -365,13 +374,6 @@ typedef struct NvmeCQueue { QTAILQ_HEAD(, NvmeRequest) req_list; } NvmeCQueue; -#define TYPE_NVME_BUS "nvme-bus" -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS) - -typedef struct NvmeBus { -BusState parent_bus; -} NvmeBus; - #define TYPE_NVME "nvme" #define NVME(obj) \ OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME) @@ -463,7 +465,7 @@ typedef struct NvmeCtrl { static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid) { -if (!nsid || nsid > NVME_MAX_NAMESPACES) { +if (!n || 
!nsid || nsid > NVME_MAX_NAMESPACES) { return NULL; } diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 90e3ee2b70ee..7c8fca36d9a5 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev) for (i = 1; i <= NVME_MAX_NAMESPACES; i++) { ns = nvme_ns(n, i); -if (!ns) { -continue; +if (ns) { +ns->attached--; } +} -nvme_ns_cleanup(ns); +if (n->subsys) { +nvme_subsys_unregister_ctrl(n->subsys, n); } if (n->subsys) { diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 3c4f5b8c714a..612a2786d75d 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns) static void nvme_ns_realize(DeviceState *dev, Error **errp) { NvmeNamespace *ns = NVME_NS(dev); -BusState *s = qdev_get_parent_bus(dev); -NvmeCtrl *n = NVME(s->parent); -NvmeSubsystem *subsys = n->subsys; +BusState *qbus = qdev_get_parent_bus(dev); +NvmeBus *bus = NVME_BUS(qbus); +NvmeCtrl *n = NULL; +NvmeSubsystem *subsys = NULL; uint32_t nsid = ns->params.nsid; int i; -if (!n->subsys) { +if (bus->is_subsys) { +subsys = NVME_SUBSYS(qbus->parent); +} else { +n = NVME(qbus->parent); +subsys = n->subsys; +} + +if (subsys) { +/* + * If this namespace belongs to a subsystem (through a link on the + * controller device), reparent the device. 
+ */ +if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) { +return; +} +} else { if (ns->params.detached) { error_setg(errp, "detached requires that the nvme device is " "linked to an nvme-subsys device"); @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) if (!nsid) { for (i = 1; i <= NVME_MAX_NAMESPACES; i++) { -if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) { +if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) { continue; } @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } } else { -if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) { +if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) { error_setg(errp, "namespace id '%d' already allocated", nsid); return; } @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) } } -nvme_attach_ns(n, ns); +if (n) { +nvme_attach_ns(n, ns); +} } static Property nvme_ns_props[] = { diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c index 92caa604a280..fb7c3a7c55fc 100644
Re: [PATCH 03/10] iotests/297: Don't rely on distro-specific linter binaries
25.06.2021 21:20, John Snow wrote: 'pylint-3' is another Fedora-ism. Use "python3 -m pylint" or "python3 -m mypy" to access these scripts instead. This style of invocation will prefer the "correct" tool when run in a virtual environment. Note that we still check for "pylint-3" before the test begins -- this check is now "overly strict", but shouldn't cause anything that was already running correctly to start failing. Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH 02/10] iotests/297: Add get_files() function
25.06.2021 21:20, John Snow wrote: Split out file discovery into its own method to begin separating out the "environment setup" and "test execution" phases. Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/297 | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297 index 493dda17fb..0bc1195805 100755 --- a/tests/qemu-iotests/297 +++ b/tests/qemu-iotests/297 @@ -21,6 +21,7 @@ import re import shutil import subprocess import sys +from typing import List import iotests @@ -56,9 +57,15 @@ def is_python_file(filename: str, directory: str = '.') -> bool: return False +def get_test_files(directory: str = '.') -> List[str]: +return [ +f for f in (set(os.listdir(directory)) - set(SKIP_FILES)) +if is_python_file(f, directory) +] + + def run_linters(): -files = [filename for filename in (set(os.listdir('.')) - set(SKIP_FILES)) - if is_python_file(filename)] +files = get_test_files() Hmm. It looks like files in tests/qemu-iotests/tests are ignored now.. That's bad iotests.logger.debug('Files to be checked:') iotests.logger.debug(', '.join(sorted(files))) -- Best regards, Vladimir
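A possible shape for the fix Vladimir is asking for, i.e. also scanning the `tests/` subdirectory, might look like the sketch below. Note the simplifications: `SKIP_FILES` here is a made-up placeholder, and `is_python_file` is reduced to a suffix check (the real iotests helper also inspects the shebang of extensionless tests):

```python
import os
import tempfile
from typing import List

# Hypothetical skip list, standing in for the real SKIP_FILES in 297.
SKIP_FILES = ('118', '124')

def is_python_file(filename: str, directory: str = '.') -> bool:
    """Simplified stand-in for the iotests helper: suffix check only."""
    return (os.path.isfile(os.path.join(directory, filename))
            and filename.endswith('.py'))

def get_test_files(directory: str = '.') -> List[str]:
    """Collect Python tests from `directory` and its tests/ subdirectory."""
    found = [
        f for f in (set(os.listdir(directory)) - set(SKIP_FILES))
        if is_python_file(f, directory)
    ]
    subdir = os.path.join(directory, 'tests')
    if os.path.isdir(subdir):
        found.extend(
            os.path.join('tests', f)
            for f in (set(os.listdir(subdir)) - set(SKIP_FILES))
            if is_python_file(f, subdir)
        )
    return sorted(found)

# Quick demonstration against a throwaway directory layout:
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, 'tests'))
    open(os.path.join(d, 'check_foo.py'), 'w').close()
    open(os.path.join(d, 'tests', 'check_bar.py'), 'w').close()
    print(get_test_files(d))
```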
Re: [PATCH 01/10] iotests/297: modify is_python_file to work from any CWD
25.06.2021 21:20, John Snow wrote: Add a directory argument to is_python_file to allow it to work correctly no matter what CWD we happen to run it from. This is done in anticipation of running the iotests from another directory (./python/). Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH] hw/ide: Fix crash when plugging a piix3-ide device into the x-remote machine
On 7/6/21 10:24 AM, Thomas Huth wrote: > On 16/04/2021 14.52, Thomas Huth wrote: >> QEMU currently crashes when the user tries to do something like: >> >> qemu-system-x86_64 -M x-remote -device piix3-ide > > It's now several months later already, and this crash has still not been > fixed yet. The next softfreeze is around the corner and the > "device-crash-test" still stumbles across this problem. > If the other solutions are not mergeable yet (what's the status here?), See this big thread about ISA vs PCI IDE modelling / design: https://www.mail-archive.com/qemu-devel@nongnu.org/msg809678.html TL;DR: short term, we are screwed; long term, not worth it. Stefan, IIRC the multi-process conclusion was we have to reject PCI devices bridging another (non-PCI) bus, such as ISA / I2C / USB / SD / ... because QEMU registers the bus type globally and the command line machinery resolves it to plug user-creatable devices, so we cannot share such buses. Is that correct? > could we please include my patch for QEMU v6.1 instead, so that we get > this silenced in the device-crash-test script? Yes please. Regards, Phil. >> This happens because the "isabus" variable is not initialized with >> the x-remote machine yet. Add a proper check for this condition >> and propagate the error to the caller, so we can fail there gracefully. >> >> Signed-off-by: Thomas Huth >> --- >> hw/ide/ioport.c | 16 ++-- >> hw/ide/piix.c | 22 +- >> hw/isa/isa-bus.c | 14 ++ >> include/hw/ide/internal.h | 2 +- >> include/hw/isa/isa.h | 13 - >> 5 files changed, 46 insertions(+), 21 deletions(-)
Re: [PATCH] hw/ide: Fix crash when plugging a piix3-ide device into the x-remote machine
On 16/04/2021 14.52, Thomas Huth wrote: QEMU currently crashes when the user tries to do something like: qemu-system-x86_64 -M x-remote -device piix3-ide It's now several months later already, and this crash has still not been fixed yet. The next softfreeze is around the corner and the "device-crash-test" still stumbles across this problem. If the other solutions are not mergeable yet (what's the status here?), could we please include my patch for QEMU v6.1 instead, so that we get this silenced in the device-crash-test script? Thanks, Thomas This happens because the "isabus" variable is not initialized with the x-remote machine yet. Add a proper check for this condition and propagate the error to the caller, so we can fail there gracefully. Signed-off-by: Thomas Huth --- hw/ide/ioport.c | 16 ++-- hw/ide/piix.c | 22 +- hw/isa/isa-bus.c | 14 ++ include/hw/ide/internal.h | 2 +- include/hw/isa/isa.h | 13 - 5 files changed, 46 insertions(+), 21 deletions(-) diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c index b613ff3bba..e6caa537fa 100644 --- a/hw/ide/ioport.c +++ b/hw/ide/ioport.c @@ -50,15 +50,19 @@ static const MemoryRegionPortio ide_portio2_list[] = { PORTIO_END_OF_LIST(), }; -void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2) +int ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2) { +int ret; + /* ??? Assume only ISA and PCI configurations, and that the PCI-ISA bridge has been setup properly to always register with ISA.
*/ -isa_register_portio_list(dev, &bus->portio_list, - iobase, ide_portio_list, bus, "ide"); +ret = isa_register_portio_list(dev, &bus->portio_list, + iobase, ide_portio_list, bus, "ide"); -if (iobase2) { -isa_register_portio_list(dev, &bus->portio2_list, - iobase2, ide_portio2_list, bus, "ide"); +if (ret == 0 && iobase2) { +ret = isa_register_portio_list(dev, &bus->portio2_list, + iobase2, ide_portio2_list, bus, "ide"); } + +return ret; } diff --git a/hw/ide/piix.c b/hw/ide/piix.c index b9860e35a5..d3e738320b 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -26,6 +26,7 @@ #include "qemu/osdep.h" #include "hw/pci/pci.h" #include "migration/vmstate.h" +#include "qapi/error.h" #include "qemu/module.h" #include "sysemu/block-backend.h" #include "sysemu/blockdev.h" @@ -123,7 +124,8 @@ static void piix_ide_reset(DeviceState *dev) pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */ } -static void pci_piix_init_ports(PCIIDEState *d) { +static int pci_piix_init_ports(PCIIDEState *d) +{ static const struct { int iobase; int iobase2; @@ -132,24 +134,30 @@ static void pci_piix_init_ports(PCIIDEState *d) { {0x1f0, 0x3f6, 14}, {0x170, 0x376, 15}, }; -int i; +int i, ret; for (i = 0; i < 2; i++) { ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2); -ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase, -port_info[i].iobase2); +ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase, + port_info[i].iobase2); +if (ret) { +return ret; +} ide_init2(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq)); bmdma_init(&d->bus[i], &d->bmdma[i], d); d->bmdma[i].bus = &d->bus[i]; ide_register_restart_cb(&d->bus[i]); } + +return 0; } static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) { PCIIDEState *d = PCI_IDE(dev); uint8_t *pci_conf = dev->config; +int rc; pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode @@ -158,7 +166,11 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d); -pci_piix_init_ports(d); +rc 
= pci_piix_init_ports(d); +if (rc) { +error_setg_errno(errp, -rc, "Failed to realize %s", + object_get_typename(OBJECT(dev))); +} } int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux) diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c index 7820068e6e..cffaa35e9c 100644 --- a/hw/isa/isa-bus.c +++ b/hw/isa/isa-bus.c @@ -131,13 +131,17 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start) isa_init_ioport(dev, start); } -void isa_register_portio_list(ISADevice *dev, - PortioList *piolist, uint16_t start, - const MemoryRegionPortio *pio_start, - void *opaque, const char *name) +int isa_register_portio_list(ISADevice *dev, + PortioList *piolist, uint1
Re: [ovirt-users] Re: Any way to terminate stuck export task
On Mon, Jul 5, 2021 at 5:06 PM Nir Soffer wrote: > > qemu-img is busy in posix_fallocate(), writing one byte to every 4k block. > > If you add -tt -T (as I suggested), we can see how much time each write > takes, > which may explain why this takes so much time. > > strace -f -p 14342 -tt -T > > It seems I missed part of your suggestion... I didn't get the "-tt -T" (or I didn't see it...) With it I get this during the export (the host console shows about 4 Mbit/s of network traffic): # strace -f -p 25243 -tt -T strace: Process 25243 attached with 2 threads [pid 25243] 09:17:32.503907 ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 [pid 25244] 09:17:32.694207 pwrite64(12, "\0", 1, 3773509631) = 1 <0.59> [pid 25244] 09:17:32.694412 pwrite64(12, "\0", 1, 3773513727) = 1 <0.78> [pid 25244] 09:17:32.694608 pwrite64(12, "\0", 1, 3773517823) = 1 <0.56> [pid 25244] 09:17:32.694729 pwrite64(12, "\0", 1, 3773521919) = 1 <0.24> [pid 25244] 09:17:32.694796 pwrite64(12, "\0", 1, 3773526015) = 1 <0.20> [pid 25244] 09:17:32.694855 pwrite64(12, "\0", 1, 3773530111) = 1 <0.15> [pid 25244] 09:17:32.694908 pwrite64(12, "\0", 1, 3773534207) = 1 <0.14> [pid 25244] 09:17:32.694950 pwrite64(12, "\0", 1, 3773538303) = 1 <0.16> [pid 25244] 09:17:32.694993 pwrite64(12, "\0", 1, 3773542399) = 1 <0.200032> [pid 25244] 09:17:32.895140 pwrite64(12, "\0", 1, 3773546495) = 1 <0.34> [pid 25244] 09:17:32.895227 pwrite64(12, "\0", 1, 3773550591) = 1 <0.29> [pid 25244] 09:17:32.895296 pwrite64(12, "\0", 1, 3773554687) = 1 <0.24> [pid 25244] 09:17:32.895353 pwrite64(12, "\0", 1, 3773558783) = 1 <0.16> [pid 25244] 09:17:32.895400 pwrite64(12, "\0", 1, 3773562879) = 1 <0.15> [pid 25244] 09:17:32.895443 pwrite64(12, "\0", 1, 3773566975) = 1 <0.15> [pid 25244] 09:17:32.895485 pwrite64(12, "\0", 1, 3773571071) = 1 <0.15> [pid 25244] 09:17:32.895527 pwrite64(12, "\0", 1, 3773575167) = 1 <0.17> [pid 25244] 09:17:32.895570 pwrite64(12, "\0", 1, 3773579263) = 1 <0.199493> [pid 25244] 09:17:33.095147 pwrite64(12,
"\0", 1, 3773583359) = 1 <0.31> [pid 25244] 09:17:33.095262 pwrite64(12, "\0", 1, 3773587455) = 1 <0.61> [pid 25244] 09:17:33.095378 pwrite64(12, "\0", 1, 3773591551) = 1 <0.27> [pid 25244] 09:17:33.095445 pwrite64(12, "\0", 1, 3773595647) = 1 <0.21> [pid 25244] 09:17:33.095498 pwrite64(12, "\0", 1, 3773599743) = 1 <0.16> [pid 25244] 09:17:33.095542 pwrite64(12, "\0", 1, 3773603839) = 1 <0.14> . . . BTW: it seems my NAS appliance doesn't support 4.2 version of NFS, because if I force it, I then get an error in mount and in engine.log this error for both nodes as they try to mount: 2021-07-05 17:01:56,082+02 ERROR [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] (EE-ManagedThreadFactory-engine-Thread-2554190) [642eb6be] The connection with details '172.16.1.137:/nas/EXPORT-DOMAIN' failed because of error code '477' and error message is: problem while trying to mount target and in vdsm.log: MountError: (32, ';mount.nfs: Protocol not supported\n') With NFSv3 I get apparently the same command: vdsm 19702 3036 7 17:15 ?00:00:02 /usr/bin/qemu-img convert -p -t none -T none -f raw /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379 -O raw -o preallocation=falloc /rhev/data-center/mnt/172.16.1.137: _nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379 The file size seems bigger but anyway very low throughput as with NFS v4. Gianluca