Re: [Qemu-block] [PATCH v2 0/1] qemu: fix the bug reported by qemu-iotests case 055
Sorry, I missed Cc'ing qemu-stable on this patch.

On 2016-11-26 13:46, QingFeng Hao wrote:

Hi all,
v2 includes changes due to review comments by Kevin Wolf (thanks, Kevin).

v2:
* Add endian conversion for the lba field in vmdk_write_extent.
Based on master's commit 00227fefd205: Update version for v2.8.0-rc1 release

v1:
* Add a patch to fix the bug reported by qemu-iotests case 055. Jing Liu and I
found the cause in vmdk; the fix is in the patch that follows.
Based on master's commit 00227fefd205: Update version for v2.8.0-rc1 release

Upstream master's qemu-iotests case 055 reports the following error:
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

Thanks!

QingFeng Hao (1):
  block/vmdk: Fix the endian problem of buf_len and lba

 block/vmdk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
QingFeng Hao (Robin)
[Qemu-block] [PATCH v2 1/1] block/vmdk: Fix the endian problem of buf_len and lba
The problem was triggered by qemu-iotests case 055. It failed when it was
comparing the compressed vmdk image with the original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before it was stored to disk, but vmdk_read_extent later read
it back and converted it from little-endian to CPU endianness. On a
big-endian CPU such as s390, the data length read by vmdk_read_extent
therefore becomes invalid. The fix is to add the conversion in
vmdk_write_extent and, at the same time, fix the endianness of the lba
field, which must also be converted to little-endian before being stored
to disk.

Cc: qemu-sta...@nongnu.org
Signed-off-by: QingFeng Hao
Signed-off-by: Jing Liu
Signed-off-by: Kevin Wolf
---
 block/vmdk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..26e5f95 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1354,8 +1354,8 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
         goto out;
     }
 
-    data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->lba = cpu_to_le64(offset >> BDRV_SECTOR_BITS);
+    data->size = cpu_to_le32(buf_len);
 
     n_bytes = buf_len + sizeof(VmdkGrainMarker);
     iov = (struct iovec) {
--
2.8.4
[Qemu-block] [PATCH v2 0/1] qemu: fix the bug reported by qemu-iotests case 055
Hi all,
v2 includes changes due to review comments by Kevin Wolf (thanks, Kevin).

v2:
* Add endian conversion for the lba field in vmdk_write_extent.
Based on master's commit 00227fefd205: Update version for v2.8.0-rc1 release

v1:
* Add a patch to fix the bug reported by qemu-iotests case 055. Jing Liu and I
found the cause in vmdk; the fix is in the patch that follows.
Based on master's commit 00227fefd205: Update version for v2.8.0-rc1 release

Upstream master's qemu-iotests case 055 reports the following error:
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

Thanks!

QingFeng Hao (1):
  block/vmdk: Fix the endian problem of buf_len and lba

 block/vmdk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.8.4
Re: [Qemu-block] [Qemu-devel] [PATCH v10 07/10] hbitmap: serialization
John Snow writes:

> From: Vladimir Sementsov-Ogievskiy
>
> Functions to serialize / deserialize (restore) HBitmap. HBitmap should be
> saved to a linear sequence of bits independently of endianness and bitmap
> array element (unsigned long) size. Therefore Little Endian is chosen.
>
> These functions are appropriate for dirty bitmap migration; restoring
> the bitmap in several steps is available. To save performance, every
> step writes only the last level of the bitmap. All other levels are
> restored by hbitmap_deserialize_finish() as a last step of restoring.
> So, HBitmap is inconsistent while restoring.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> [Fix left shift operand to 1UL; add "finish" parameter. - Fam]
> Signed-off-by: Fam Zheng
> Signed-off-by: John Snow

[...]

> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index f303975..5d1a21c 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -397,6 +397,143 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>      return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>  }
>
> +uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
> +{
> +    /* Require at least 64 bit granularity to be safe on both 64 bit and
> +     * 32 bit hosts. */
> +    return 64 << hb->granularity;
> +}

Coverity reports:

*** CID 1365378: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
/util/hbitmap.c: 404 in hbitmap_serialization_granularity()
398     }
399
400     uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
401     {
402         /* Require at least 64 bit granularity to be safe on both 64 bit
403          * and 32 bit hosts. */
>>> CID 1365378: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
>>> Potentially overflowing expression "0x40 << hb->granularity" with type
>>> "int" (32 bits, signed) is evaluated using 32-bit arithmetic, and then
>>> used in a context that expects an expression of type "uint64_t"
>>> (64 bits, unsigned).
404         return 64 << hb->granularity;
405     }
406
407     /* Start should be aligned to serialization granularity, chunk size
408      * should be aligned to serialization granularity too, except for the
409      * last chunk. */

Use (uint64_t)64 << hb->granularity for an unsigned 64 bit shift.

[...]
Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len
On 25.11.2016 at 11:48, Hao QingFeng wrote:
> On 2016-11-25 18:21, Kevin Wolf wrote:
> > [ Cc: Fam, qemu-stable ]
> >
> > On 25.11.2016 at 11:06, QingFeng Hao wrote:
> > > The problem was triggered by qemu-iotests case 055. It failed when it
> > > was comparing the compressed vmdk image with original test.img.
> > >
> > > The cause is that buf_len in vmdk_write_extent wasn't converted to
> > > little-endian before it was stored to disk. But later vmdk_read_extent
> > > read it and converted it from little-endian to cpu endian.
> > > If the cpu is big-endian like s390, the problem will happen and
> > > the data length read by vmdk_read_extent will become invalid!
> > > The fix is to add the conversion in vmdk_write_extent.
> > >
> > > Signed-off-by: QingFeng Hao
> > > Signed-off-by: Jing Liu
> >
> > Sounds like something that should still be fixed for 2.8 and in the
> > stable branches.
>
> I didn't find the stable branch upstream; the latest is 2.6. Maybe
> it's a private one? :-)

I think it may only be created once work on the 2.7.1 release begins.
Anyway, this is not your problem; just make sure to CC the qemu-stable
mailing list for bug fixes that affect previous versions as well.

It's also helpful if you put a line like this immediately before the
signoffs:

Cc: qemu-sta...@nongnu.org

This makes sure that the stable release maintainers see the patch, and
everything else will be taken care of by them.

> > > diff --git a/block/vmdk.c b/block/vmdk.c
> > > index a11c27a..bf6667f 100644
> > > --- a/block/vmdk.c
> > > +++ b/block/vmdk.c
> > > @@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
> > >      }
> > >
> > >      data->lba = offset >> BDRV_SECTOR_BITS;
> > > -    data->size = buf_len;
> > > +    data->size = cpu_to_le32(buf_len);
> >
> > At least data->lba needs to be fixed, too, both here and in
> > vmdk_read_extent(). Host endianness in an image file is always wrong.
>
> Good catch!

Are you going to send a v2 that adds this fix?
> > Maybe we should audit the whole driver for endianness problems.
>
> Good idea! Maybe we can encapsulate a suite of endianness functions
> for the structures, to avoid missing some of their fields, like lba.
> That would also be better for reuse. When I check the endianness calls
> in vmdk_create_extent, I'm just afraid of missing one. :-)

Sounds like a good move in general, but let's make sure to keep it
separate from the fixes. The reason is that the fixes can still be
included in the 2.8 release, but code refactoring would only be for 2.9.

You can work on both at the same time, of course, but organise things so
that we have a patch series where all the bug fixes come first (for 2.8)
and only then the type-safe refactoring (for 2.9).

Kevin
[Qemu-block] [PATCH for-2.8 0/4] Allow 'cache-clean-interval' in Linux only
Hi all,

The cache-clean-interval setting of qcow2 frees the memory of the L2
cache tables that haven't been used after a certain interval of time.
QEMU uses madvise() with MADV_DONTNEED for this. After that call, the
data in the specified cache tables is discarded by the kernel.

The problem with this behavior is that it is Linux-specific. madvise()
itself is not a standard system call and while other implementations
(e.g. FreeBSD) also have MADV_DONTNEED, they don't share the same
semantics. POSIX defines posix_madvise(), which has POSIX_MADV_DONTNEED,
and that's what QEMU uses on systems that don't implement madvise().
However, POSIX_MADV_DONTNEED also has different semantics and cannot be
used for our purposes. As a matter of fact, in glibc it is a no-op:

https://github.molgen.mpg.de/git-mirror/glibc/blob/glibc-2.23/sysdeps/unix/sysv/linux/posix_madvise.c

So while all of this is mentioned in the QEMU documentation, there's
nothing preventing users of other systems from trying to use this
feature. On non-Linux systems it is worse than a no-op: it invalidates
perfectly valid cache tables for no reason without freeing their memory.

This series makes Linux a hard requirement for cache-clean-interval and
prints an error message on other systems.

Regards,

Berto

Alberto Garcia (4):
  qcow2: Make qcow2_cache_table_release() work only in Linux
  qcow2: Allow 'cache-clean-interval' in Linux only
  qcow2: Remove stale comment
  docs: Specify that cache-clean-interval is only supported in Linux

 block/qcow2-cache.c  | 6 +++---
 block/qcow2.c        | 8 ++++++++
 docs/qcow2-cache.txt | 5 +++--
 3 files changed, 14 insertions(+), 5 deletions(-)

--
2.10.2
[Qemu-block] [PATCH 3/4] qcow2: Remove stale comment
We haven't been using CONFIG_MADVISE since commit 02d0e095031b7fda77de8b.

Signed-off-by: Alberto Garcia
---
 block/qcow2-cache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ab8ee2d..1d25147 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,7 +22,6 @@
  * THE SOFTWARE.
  */
 
-/* Needed for CONFIG_MADVISE */
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu-common.h"
--
2.10.2
[Qemu-block] [PATCH 4/4] docs: Specify that cache-clean-interval is only supported in Linux
Make it clear that having Linux is a hard requirement for this feature.

Signed-off-by: Alberto Garcia
---
 docs/qcow2-cache.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 5bb0607..1fdd6f9 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -160,5 +160,6 @@ If unset, the default value for this parameter is 0 and it disables this
 feature.
 
 Note that this functionality currently relies on the MADV_DONTNEED
-argument for madvise() to actually free the memory, so it is not
-useful in systems that don't follow that behavior.
+argument for madvise() to actually free the memory. This is a
+Linux-specific feature, so cache-clean-interval is not supported in
+other systems.
--
2.10.2
[Qemu-block] [PATCH 2/4] qcow2: Allow 'cache-clean-interval' in Linux only
The cache-clean-interval option of qcow2 only works on Linux. However,
we allow setting it on other systems regardless of whether it works or
not. On those systems this option is not simply a no-op: it actually
invalidates perfectly valid cache tables for no good reason without
freeing their memory. This patch forbids using that option on non-Linux
systems.

Signed-off-by: Alberto Garcia
---
 block/qcow2.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7cfcd84..ed9e0f3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -668,6 +668,14 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
     r->cache_clean_interval =
         qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL,
                             s->cache_clean_interval);
+#ifndef CONFIG_LINUX
+    if (r->cache_clean_interval != 0) {
+        error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL
+                   " not supported on this host");
+        ret = -EINVAL;
+        goto fail;
+    }
+#endif
     if (r->cache_clean_interval > UINT_MAX) {
         error_setg(errp, "Cache clean interval too big");
         ret = -EINVAL;
--
2.10.2
[Qemu-block] [PATCH 1/4] qcow2: Make qcow2_cache_table_release() work only in Linux
We are using QEMU_MADV_DONTNEED to discard the memory of individual L2
cache tables. The problem with this is that those semantics are specific
to the Linux madvise() system call. Other implementations of madvise()
(including the very Linux implementation of posix_madvise()) don't do
that, so we cannot use them for the same purpose.

This patch makes the code Linux-specific and uses madvise() directly,
since there's no point in going through qemu_madvise() for this.

Signed-off-by: Alberto Garcia
---
 block/qcow2-cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 6eaefed..ab8ee2d 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -66,7 +66,8 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
 static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
                                       int i, int num_tables)
 {
-#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+/* Using MADV_DONTNEED to discard memory is a Linux-specific feature */
+#ifdef CONFIG_LINUX
     BDRVQcow2State *s = bs->opaque;
     void *t = qcow2_cache_get_table_addr(bs, c, i);
     int align = getpagesize();
@@ -74,7 +75,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
     size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
     size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
 
     if (length > 0) {
-        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
     }
 #endif
 }
--
2.10.2
Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len
On 2016-11-25 18:21, Kevin Wolf wrote:

[ Cc: Fam, qemu-stable ]

On 25.11.2016 at 11:06, QingFeng Hao wrote:

The problem was triggered by qemu-iotests case 055. It failed when it
was comparing the compressed vmdk image with original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before it was stored to disk. But later vmdk_read_extent
read it and converted it from little-endian to cpu endian.
If the cpu is big-endian like s390, the problem will happen and
the data length read by vmdk_read_extent will become invalid!
The fix is to add the conversion in vmdk_write_extent.

Signed-off-by: QingFeng Hao
Signed-off-by: Jing Liu

Sounds like something that should still be fixed for 2.8 and in the
stable branches.

I didn't find the stable branch upstream; the latest is 2.6. Maybe it's
a private one? :-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..bf6667f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     }
 
     data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->size = cpu_to_le32(buf_len);

At least data->lba needs to be fixed, too, both here and in
vmdk_read_extent(). Host endianness in an image file is always wrong.

Good catch!

Maybe we should audit the whole driver for endianness problems.

Good idea! Maybe we can encapsulate a suite of endianness functions for
the structures, to avoid missing some of their fields, like lba. That
would also be better for reuse. When I check the endianness calls in
vmdk_create_extent, I'm just afraid of missing one. :-)

Thanks!

Kevin

--
QingFeng Hao (Robin)
Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len
[ Cc: Fam, qemu-stable ]

On 25.11.2016 at 11:06, QingFeng Hao wrote:
> The problem was triggered by qemu-iotests case 055. It failed when it
> was comparing the compressed vmdk image with original test.img.
>
> The cause is that buf_len in vmdk_write_extent wasn't converted to
> little-endian before it was stored to disk. But later vmdk_read_extent
> read it and converted it from little-endian to cpu endian.
> If the cpu is big-endian like s390, the problem will happen and
> the data length read by vmdk_read_extent will become invalid!
> The fix is to add the conversion in vmdk_write_extent.
>
> Signed-off-by: QingFeng Hao
> Signed-off-by: Jing Liu

Sounds like something that should still be fixed for 2.8 and in the
stable branches.

> diff --git a/block/vmdk.c b/block/vmdk.c
> index a11c27a..bf6667f 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>      }
>
>      data->lba = offset >> BDRV_SECTOR_BITS;
> -    data->size = buf_len;
> +    data->size = cpu_to_le32(buf_len);

At least data->lba needs to be fixed, too, both here and in
vmdk_read_extent(). Host endianness in an image file is always wrong.

Maybe we should audit the whole driver for endianness problems.

Kevin
Re: [Qemu-block] [PATCH v1 05/18] tests/test-rbcache: add test cases
On 25.11.2016 at 10:58, Pavel Butsykin wrote:
> On 24.11.2016 15:20, Kevin Wolf wrote:
> > Visualised, we test these requests:
> >
> > 1: *
> > 2: **
> > 3: *
> > 4: ***
> > 5:
> >
> > You test inserting the only element, inserting after the last element,
> > inserting in the middle and inserting something that overlaps two other
> > requests at its start and end.
> >
> > That's a good start, but it might be worth testing more scenarios:
> >
> > - Inserting a new first element to a non-empty cache
>
> What do you mean? To insert an element with zero offset when the cache
> already contains other nodes?

Yes, that would be one way to do it. Maybe just swap requests 1 and 2.

> > - Overlapping only at the start
> > - Overlapping only at the end
> > - Overlapping in the middle (i.e. including existing ranges as a
> >   subset)
> >   * With only one node
> >   * With multiple nodes (like adding offset=2, size=16kb here)
>
> Ok.

Kevin
[Qemu-block] [Qemu-devel] [PATCH v1 0/1] qemu: fix the bug reported by qemu-iotests case 055
Hi all,

This patch fixes the bug reported by qemu-iotests case 055 and is based
on upstream master's commit 00227fefd205: Update version for v2.8.0-rc1
release. Jing Liu and I found the cause in vmdk; the fix is in the patch
that follows. Thanks!

Upstream master's qemu-iotests case 055 reports the following error:
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

QingFeng Hao (1):
  block/vmdk: Fix the endian problem of buf_len

 block/vmdk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.8.4
[Qemu-block] [Qemu-devel] [PATCH v1 1/1] block/vmdk: Fix the endian problem of buf_len
The problem was triggered by qemu-iotests case 055. It failed when it
was comparing the compressed vmdk image with original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before it was stored to disk. But later vmdk_read_extent
read it and converted it from little-endian to cpu endian. If the cpu
is big-endian like s390, the problem will happen and the data length
read by vmdk_read_extent will become invalid! The fix is to add the
conversion in vmdk_write_extent.

Signed-off-by: QingFeng Hao
Signed-off-by: Jing Liu
---
 block/vmdk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..bf6667f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     }
 
     data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->size = cpu_to_le32(buf_len);
 
     n_bytes = buf_len + sizeof(VmdkGrainMarker);
     iov = (struct iovec) {
--
2.8.4
Re: [Qemu-block] [PATCH v1 05/18] tests/test-rbcache: add test cases
On 24.11.2016 15:20, Kevin Wolf wrote:

On 15.11.2016 at 07:37, Pavel Butsykin wrote:

Signed-off-by: Pavel Butsykin
---
 tests/Makefile.include |   3 +
 tests/test-rbcache.c   | 308 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 311 insertions(+)
 create mode 100644 tests/test-rbcache.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 8162f6f..36bb472 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -54,6 +54,8 @@ check-unit-y += tests/test-hbitmap$(EXESUF)
 gcov-files-test-hbitmap-y = blockjob.c
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
+gcov-files-test-rbcache-y = util/rbcache.c
+check-unit-y += tests/test-rbcache$(EXESUF)
 check-unit-y += tests/test-x86-cpuid$(EXESUF)
 # all code tested by test-x86-cpuid is inside topology.h
 gcov-files-test-x86-cpuid-y =
@@ -481,6 +483,7 @@ tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y)
+tests/test-rbcache$(EXESUF): tests/test-rbcache.o $(test-util-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
 tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o page_cache.o $(test-util-obj-y)
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
diff --git a/tests/test-rbcache.c b/tests/test-rbcache.c
new file mode 100644
index 000..1c72bfa
--- /dev/null
+++ b/tests/test-rbcache.c
@@ -0,0 +1,308 @@
+/*
+ * QEMU Range-Based Cache core
+ *
+ * Copyright (C) 2015-2016 Parallels IP Holdings GmbH. All rights reserved.
+ *
+ * Author: Pavel Butsykin
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/rbcache.h"
+
+typedef struct TestRBCacheData {
+    RBCache *cache;
+} TestRBCacheData;
+
+typedef struct TestRBCacheConfig {
+    uint64_t limit_size;
+    int eviction_type;
+    RBNodeAlloc *alloc;
+    RBNodeFree *free;
+    void *opaque;
+} TestRBCacheConfig;
+
+#define KB(_n) ((_n) << 10)
+#define MB(_n) ((_n) << 20)
+
+#define OFFSET1 0
+#define SIZE1 KB(1)
+
+#define OFFSET2 KB(1)
+#define SIZE2 KB(2)
+
+#define OFFSET3 KB(15)
+#define SIZE3 KB(1)
+
+#define OFFSET4 KB(7)
+#define SIZE4 KB(7)
+
+#define OFFSET5 KB(2)
+#define SIZE5 KB(8)

Visualised, we test these requests:

1: *
2: **
3: *
4: ***
5:

You test inserting the only element, inserting after the last element,
inserting in the middle and inserting something that overlaps two other
requests at its start and end.

That's a good start, but it might be worth testing more scenarios:

- Inserting a new first element to a non-empty cache

What do you mean? To insert an element with zero offset when the cache
already contains other nodes?

- Overlapping only at the start
- Overlapping only at the end
- Overlapping in the middle (i.e. including existing ranges as a subset)
  * With only one node
  * With multiple nodes (like adding offset=2, size=16kb here)

Ok.
+
+static void test_rbcache_init(TestRBCacheData *data, const void *ctx)
+{
+    g_assert_nonnull(data->cache);
+}
+
+static void test_rbcache_insert(TestRBCacheData *data, const void *ctx)
+{
+    RBCacheNode *node1 = rbcache_node_alloc(data->cache, OFFSET1, SIZE1);
+    RBCacheNode *node2 = rbcache_node_alloc(data->cache, OFFSET2, SIZE2);
+    RBCacheNode *node3 = rbcache_node_alloc(data->cache, OFFSET3, SIZE3);
+    RBCacheNode *node4 = rbcache_node_alloc(data->cache, OFFSET4, SIZE4);
+    RBCacheNode *node5 = rbcache_node_alloc(data->cache, OFFSET5, SIZE5);
+    RBCacheNode *node;
+
+    node = rbcache_insert(data->cache, node1);
+    g_assert_true(node == node1);
+
+    node = rbcache_insert(data->cache, node2);
+    g_assert_true(node == node2);
+
+    node = rbcache_insert(data->cache, node3);
+    g_assert_true(node == node3);
+
+    node = rbcache_insert(data->cache, node4);
+    g_assert_true(node == node4);
+
+    node = rbcache_insert(data->cache, node5);
+    g_assert_true(node != node5);

You can test for the right result instead:

    g_assert_true(node == node2);

+    rbcache_node_free(data->cache, node5);
+}
+
+static void test_rbcache_search(TestRBCacheData *data, const void *ctx)
+{
+    RBCacheNode *node;
+
+    test_rbcache_insert(data, ctx);
+
+    node = rbcache_search(data->cache, OFFSET1, SIZE1);
+    g_assert_nonnull(node);
+    g_assert_cmpuint(node->offset, ==, OFFSET1);
+    g_assert_cmpuint(node->bytes, ==, SIZE1);
+
+    node = rbcache_search(data->cache, OFFSET2 + KB(1), SIZE1);

Intentional that you use SIZE1 here even though we're looking at node 2?

No, here we need
Re: [Qemu-block] [PATCH v1 01/18] block/io: add bdrv_aio_{preadv, pwritev}
On 24.11.2016 15:36, Kevin Wolf wrote:

On 24.11.2016 at 11:58, Pavel Butsykin wrote:

On 23.11.2016 17:28, Kevin Wolf wrote:

On 15.11.2016 at 07:36, Pavel Butsykin wrote:

It's just byte-based wrappers over bdrv_co_aio_prw_vector(), which
provide a byte-based interface for AIO read/write.

Signed-off-by: Pavel Butsykin

I'm in the process of phasing out the last users of bdrv_aio_*() so that
this set of interfaces can be removed. I'm doing this because it's an
unnecessary redundancy; we have too many wrapper functions that expose
the same functionality with different syntax. So let's not add new
users.

At first sight, you don't even seem to use bdrv_aio_preadv() for actual
parallelism, but you often have a pattern like this:

    void foo_cb(void *opaque)
    {
        ...
        qemu_coroutine_enter(acb->co);
    }

    void caller()
    {
        ...
        acb = bdrv_aio_preadv(...);
        qemu_coroutine_yield();
    }

The code will actually become a lot simpler if you use bdrv_co_preadv()
instead, because you don't have to have a callback, but you get pure
sequential code.

The part that actually has some parallelism, pcache_readahead_request(),
already creates its own coroutine, so it runs in the background without
using callback-style interfaces.

I used bdrv_co_preadv() because it conveniently solves the partial
cache hit. To solve the partial cache hit, we need to split a request
into smaller parts, make asynchronous requests and wait for all
requests in one place. Do you propose to create a coroutine for each
part of the request? It seemed to me that bdrv_co_preadv() is a wrapper
that allows us to get rid of duplicated code.

It's actually the other way round: bdrv_co_preadv() is the "native"
block layer API, and bdrv_aio_*() are wrappers providing an alternative
interface.

Sorry, I mixed up bdrv_co_preadv() and bdrv_aio_preadv(). :) Also I
forgot that in this version pcache can't split one request into many
small requests (as in the RFC version).

I'm looking at pcache_co_readahead(), for example.
It looks like this:

    bdrv_aio_preadv(bs->file, node->common.offset, _acb.qiov,
                    node->common.bytes, pcache_aio_readahead_cb,
                    _acb);
    qemu_coroutine_yield();

And then we have pcache_aio_readahead_cb(), which ends in:

    qemu_coroutine_enter(acb->co);

So here the callback style doesn't buy you anything, it just rips the
code apart into two functions. There is no parallelism here anyway;
pcache_co_readahead() doesn't do anything until the callback reenters
it. This is a very obvious example where bdrv_co_preadv() will simplify
the code.

I agree.

It's similar with the other bdrv_aio_preadv() calls, which are in
pcache_co_preadv():

    if (bytes > s->max_aio_size) {
        bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                        pcache_aio_read_cb, );
        goto out;
    }

    update_req_stats(s->req_stats, offset, bytes);

    status = pcache_lookup_data();
    if (status == CACHE_MISS) {
        bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                        pcache_aio_read_cb, );
    } else if (status == PARTIAL_CACHE_HIT) {
        assert(acb.part.qiov.niov != 0);
        bdrv_aio_preadv(bs->file, acb.part.offset, , acb.part.bytes,
                        pcache_aio_read_cb, );
    }

    pcache_readahead_request();

    if (status == CACHE_HIT && --acb.ref == 0) {
        return 0;
    }

out:
    qemu_coroutine_yield();

Here you have mainly the pcache_readahead_request() call between
bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
works in the background, so I think you can move it to before the reads,
and then the reads can trivially become bdrv_co_preadv() and the
callback can again be inlined instead of ripping the function in two
parts.

The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
thing, and using the coroutine version results in obvious code
improvements. And I think these are all the uses of bdrv_aio_*() in the
pcache driver, so converting it to use bdrv_co_*() instead isn't only
possible, but will improve the legibility of your code, too. It's a
clear win in all three places.
Yes, you're right, this scheme was acceptable when the caller was
bdrv_aio_*(), but now it's just confusing. Thanks for the explanation!

Kevin