Re: [Qemu-block] [PATCH v2 0/1] qemu: fix the bug reported by qemu-iotests case 055

2016-11-25 Thread Hao QingFeng

Sorry, I missed Cc'ing qemu-stable on this patch.


On 2016-11-26 at 13:46, QingFeng Hao wrote:

Hi all,
v2 includes changes based on review comments from Kevin Wolf (thanks, Kevin).

v2:
* Add endian conversion for lba field in vmdk_write_extent.
   Based on master's commit:
   00227fefd205: Update version for v2.8.0-rc1 release

v1:
* Add patch to fix the bug reported by qemu-iotests case 055.
   Jing Liu and I found the cause was in vmdk; the fix is in the
   following patch.
   Based on master's commit:
   00227fefd205: Update version for v2.8.0-rc1 release

   Upstream master's qemu-iotests case 055 reports the following error:
   +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
   +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
   +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
   +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

Thanks!

QingFeng Hao (1):
   block/vmdk: Fix the endian problem of buf_len and lba

  block/vmdk.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)



--
QingFeng Hao (Robin)




[Qemu-block] [PATCH v2 1/1] block/vmdk: Fix the endian problem of buf_len and lba

2016-11-25 Thread QingFeng Hao
The problem was triggered by qemu-iotests case 055. It failed when
comparing the compressed vmdk image with the original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before being stored to disk, while vmdk_read_extent later
reads it back and converts it from little-endian to CPU endianness.
On a big-endian CPU such as s390, the data length read by
vmdk_read_extent therefore becomes invalid.
The fix is to add the conversion in vmdk_write_extent and, at the same
time, to repair the endianness of the lba field, which must likewise be
converted to little-endian before being stored to disk.

Cc: qemu-sta...@nongnu.org
Signed-off-by: QingFeng Hao 
Signed-off-by: Jing Liu 
Signed-off-by: Kevin Wolf 
---
 block/vmdk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..26e5f95 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1354,8 +1354,8 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
         goto out;
     }
 
-    data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->lba = cpu_to_le64(offset >> BDRV_SECTOR_BITS);
+    data->size = cpu_to_le32(buf_len);
 
     n_bytes = buf_len + sizeof(VmdkGrainMarker);
     iov = (struct iovec) {
-- 
2.8.4




[Qemu-block] [PATCH v2 0/1] qemu: fix the bug reported by qemu-iotests case 055

2016-11-25 Thread QingFeng Hao
Hi all,
v2 includes changes based on review comments from Kevin Wolf (thanks, Kevin).

v2:
* Add endian conversion for lba field in vmdk_write_extent.
  Based on master's commit:
  00227fefd205: Update version for v2.8.0-rc1 release

v1:
* Add patch to fix the bug reported by qemu-iotests case 055.
  Jing Liu and I found the cause was in vmdk; the fix is in the
  following patch.
  Based on master's commit:
  00227fefd205: Update version for v2.8.0-rc1 release

  Upstream master's qemu-iotests case 055 reports the following error:
  +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
  +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
  +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
  +qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

Thanks!

QingFeng Hao (1):
  block/vmdk: Fix the endian problem of buf_len and lba

 block/vmdk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.8.4




Re: [Qemu-block] [Qemu-devel] [PATCH v10 07/10] hbitmap: serialization

2016-11-25 Thread Markus Armbruster
John Snow  writes:

> From: Vladimir Sementsov-Ogievskiy 
>
> Functions to serialize / deserialize (restore) an HBitmap. The HBitmap
> should be saved as a linear sequence of bits independent of host
> endianness and of the bitmap array element (unsigned long) size, so
> little-endian is chosen.
>
> These functions are suitable for dirty bitmap migration; the bitmap can
> be restored in several steps. For performance, every step writes only
> the last level of the bitmap. All other levels are restored by
> hbitmap_deserialize_finish() as the last step, so the HBitmap is
> inconsistent while it is being restored.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> [Fix left shift operand to 1UL; add "finish" parameter. - Fam]
> Signed-off-by: Fam Zheng 
>
> Signed-off-by: John Snow 
[...]
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index f303975..5d1a21c 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -397,6 +397,143 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>      return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>  }
>  
> +uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
> +{
> +    /* Require at least 64 bit granularity to be safe on both 64 bit
> +     * and 32 bit hosts. */
> +    return 64 << hb->granularity;
> +}

Coverity reports:

*** CID 1365378:  Integer handling issues  (OVERFLOW_BEFORE_WIDEN)
/util/hbitmap.c: 404 in hbitmap_serialization_granularity()
398 }
399 
400 uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
401 {
402 /* Require at least 64 bit granularity to be safe on both 64 bit and 32 bit
403  * hosts. */
>>> CID 1365378:  Integer handling issues  (OVERFLOW_BEFORE_WIDEN)
>>> Potentially overflowing expression "0x40 << hb->granularity" with
>>> type "int" (32 bits, signed) is evaluated using 32-bit arithmetic,
>>> and then used in a context that expects an expression of type
>>> "uint64_t" (64 bits, unsigned).
404 return 64 << hb->granularity;
405 }
406 
407 /* Start should be aligned to serialization granularity, chunk size should be
408  * aligned to serialization granularity too, except for last chunk.
409  */

Use (uint64_t)64 << hb->granularity for an unsigned 64 bit shift.
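
Applied, the function would read like this (a sketch of the suggested
one-line fix, not an applied patch):

    uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
    {
        /* Require at least 64 bit granularity to be safe on both 64 bit
         * and 32 bit hosts. The cast makes the shift happen in 64-bit
         * arithmetic, avoiding the 32-bit overflow Coverity reports. */
        return (uint64_t)64 << hb->granularity;
    }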

[...]



Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len

2016-11-25 Thread Kevin Wolf
On 25.11.2016 at 11:48, Hao QingFeng wrote:
> On 2016-11-25 at 18:21, Kevin Wolf wrote:
> >[ Cc: Fam, qemu-stable ]
> >
> >On 25.11.2016 at 11:06, QingFeng Hao wrote:
> >>The problem was triggered by qemu-iotests case 055. It failed when
> >>comparing the compressed vmdk image with the original test.img.
> >>
> >>The cause is that buf_len in vmdk_write_extent wasn't converted to
> >>little-endian before being stored to disk, while vmdk_read_extent later
> >>reads it back and converts it from little-endian to CPU endianness.
> >>On a big-endian CPU such as s390, the data length read by
> >>vmdk_read_extent therefore becomes invalid.
> >>The fix is to add the conversion in vmdk_write_extent.
> >>
> >>Signed-off-by: QingFeng Hao 
> >>Signed-off-by: Jing Liu 
> >Sounds like something that should still be fixed for 2.8 and in the
> >stable branches.
> I didn't find the stable branch upstream; the latest is 2.6. Maybe
> it's a private one? :-)

I think it's only created once work on the 2.7.1 release begins.

Anyway, this is not your problem, just make sure to CC the qemu-stable
mailing list for bug fixes that affect previous versions as well. It's
also helpful if you put a line like this immediately before the
signoffs:

Cc: qemu-sta...@nongnu.org

This makes sure that the stable release maintainers see the patch,
and everything else will be taken care of by them.

> >>diff --git a/block/vmdk.c b/block/vmdk.c
> >>index a11c27a..bf6667f 100644
> >>--- a/block/vmdk.c
> >>+++ b/block/vmdk.c
> >>@@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
> >>      }
> >>  
> >>      data->lba = offset >> BDRV_SECTOR_BITS;
> >>-    data->size = buf_len;
> >>+    data->size = cpu_to_le32(buf_len);
> >At least data->lba needs to be fixed, too, both here and in
> >vmdk_read_extent(). Host endianness in an image file is always wrong.
> Good catch!

Are you going to send a v2 that adds this fix?

> >Maybe we should audit the whole driver for endianness problems.
> Good point! Maybe we can encapsulate a suite of endianness conversion
> functions for these structures, so that individual fields like lba
> cannot be missed? That would also be better for reuse. While checking
> the endianness calls in vmdk_create_extent, I am afraid of missing one. :-)

Sounds like a good move in general, but let's make sure to keep it
separate from the fixes. The reason is that the fixes can still be
included in the 2.8 release, but code refactoring would only be for 2.9.

You can work on both at the same time, of course, but organise things so
that we have a patch series where all the bug fixes come first (for 2.8)
and only then the type-safe refactoring (for 2.9).

Kevin
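
For illustration, such a helper could be as small as this (a
hypothetical sketch, not from any posted patch; it uses only the
VmdkGrainMarker fields visible in the diff above):

    /* Hypothetical helper: convert every field of the on-disk grain
     * marker in one place, so that no single field can be forgotten. */
    static void vmdk_grain_marker_to_le(VmdkGrainMarker *marker)
    {
        marker->lba = cpu_to_le64(marker->lba);
        marker->size = cpu_to_le32(marker->size);
    }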



[Qemu-block] [PATCH for-2.8 0/4] Allow 'cache-clean-interval' in Linux only

2016-11-25 Thread Alberto Garcia
Hi all,

The cache-clean-interval setting of qcow2 frees the memory of the L2
cache tables that haven't been used after a certain interval of time.

QEMU uses madvise() with MADV_DONTNEED for this. After that call, the
data in the specified cache tables is discarded by the kernel. The
problem with this behavior is that it is Linux-specific. madvise()
itself is not a standard system call and while other implementations
(e.g. FreeBSD) also have MADV_DONTNEED, they don't share the same
semantics.

POSIX defines posix_madvise(), which has POSIX_MADV_DONTNEED, and
that's what QEMU uses in systems that don't implement madvise().
However POSIX_MADV_DONTNEED also has different semantics and cannot be
used for our purposes. As a matter of fact, in glibc it is a no-op:

https://github.molgen.mpg.de/git-mirror/glibc/blob/glibc-2.23/sysdeps/unix/sysv/linux/posix_madvise.c

So while this all is mentioned in the QEMU documentation, there's
nothing preventing users of other systems from trying to use this
feature. In non-Linux systems it is worse than a no-op: it invalidates
perfectly valid cache tables for no reason without freeing their
memory.
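
For illustration, the semantic difference boils down to this (a sketch,
not code from this series; buf and len stand for a page-aligned
anonymous mapping and its length):

    #include <sys/mman.h>

    /* Linux: actually discards the pages; subsequent reads of an
     * anonymous mapping return zero-filled memory. */
    madvise(buf, len, MADV_DONTNEED);

    /* POSIX: merely an advisory hint; glibc implements this particular
     * advice as a no-op, so no memory is freed. */
    posix_madvise(buf, len, POSIX_MADV_DONTNEED);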

This series makes Linux a hard requirement for cache-clean-interval
and prints an error message in other systems.

Regards,

Berto

Alberto Garcia (4):
  qcow2: Make qcow2_cache_table_release() work only in Linux
  qcow2: Allow 'cache-clean-interval' in Linux only
  qcow2: Remove stale comment
  docs: Specify that cache-clean-interval is only supported in Linux

 block/qcow2-cache.c  | 6 +++---
 block/qcow2.c| 8 
 docs/qcow2-cache.txt | 5 +++--
 3 files changed, 14 insertions(+), 5 deletions(-)

-- 
2.10.2




[Qemu-block] [PATCH 3/4] qcow2: Remove stale comment

2016-11-25 Thread Alberto Garcia
We haven't been using CONFIG_MADVISE since commit 02d0e095031b7fda77de8b.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ab8ee2d..1d25147 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,7 +22,6 @@
  * THE SOFTWARE.
  */
 
-/* Needed for CONFIG_MADVISE */
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu-common.h"
-- 
2.10.2




[Qemu-block] [PATCH 4/4] docs: Specify that cache-clean-interval is only supported in Linux

2016-11-25 Thread Alberto Garcia
Make it clear that having Linux is a hard requirement for this
feature.

Signed-off-by: Alberto Garcia 
---
 docs/qcow2-cache.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 5bb0607..1fdd6f9 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -160,5 +160,6 @@ If unset, the default value for this parameter is 0 and it disables
 this feature.
 
 Note that this functionality currently relies on the MADV_DONTNEED
-argument for madvise() to actually free the memory, so it is not
-useful in systems that don't follow that behavior.
+argument for madvise() to actually free the memory. This is a
+Linux-specific feature, so cache-clean-interval is not supported in
+other systems.
-- 
2.10.2




[Qemu-block] [PATCH 2/4] qcow2: Allow 'cache-clean-interval' in Linux only

2016-11-25 Thread Alberto Garcia
The cache-clean-interval option of qcow2 only works on Linux. However
we allow setting it in other systems regardless of whether it works or
not.

In those systems this option is not simply a no-op: it actually
invalidates perfectly valid cache tables for no good reason without
freeing their memory.

This patch forbids using that option in non-Linux systems.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7cfcd84..ed9e0f3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -668,6 +668,14 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
     r->cache_clean_interval =
         qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL,
                             s->cache_clean_interval);
+#ifndef CONFIG_LINUX
+    if (r->cache_clean_interval != 0) {
+        error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL
+                   " not supported on this host");
+        ret = -EINVAL;
+        goto fail;
+    }
+#endif
     if (r->cache_clean_interval > UINT_MAX) {
         error_setg(errp, "Cache clean interval too big");
         ret = -EINVAL;
-- 
2.10.2




[Qemu-block] [PATCH 1/4] qcow2: Make qcow2_cache_table_release() work only in Linux

2016-11-25 Thread Alberto Garcia
We are using QEMU_MADV_DONTNEED to discard the memory of individual L2
cache tables. The problem with this is that those semantics are
specific to the Linux madvise() system call. Other implementations of
madvise() (including the Linux implementation of posix_madvise())
don't do that, so we cannot use them for the same purpose.

This patch makes the code Linux-specific and uses madvise() directly
since there's no point in going through qemu_madvise() for this.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 6eaefed..ab8ee2d 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -66,7 +66,8 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
 static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
                                       int i, int num_tables)
 {
-#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+/* Using MADV_DONTNEED to discard memory is a Linux-specific feature */
+#ifdef CONFIG_LINUX
     BDRVQcow2State *s = bs->opaque;
     void *t = qcow2_cache_get_table_addr(bs, c, i);
     int align = getpagesize();
@@ -74,7 +75,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
     size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
     size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
     if (length > 0) {
-        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
     }
 #endif
 }
-- 
2.10.2




Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len

2016-11-25 Thread Hao QingFeng



On 2016-11-25 at 18:21, Kevin Wolf wrote:

[ Cc: Fam, qemu-stable ]

On 25.11.2016 at 11:06, QingFeng Hao wrote:

The problem was triggered by qemu-iotests case 055. It failed when
comparing the compressed vmdk image with the original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before being stored to disk, while vmdk_read_extent later
reads it back and converts it from little-endian to CPU endianness.
On a big-endian CPU such as s390, the data length read by
vmdk_read_extent therefore becomes invalid.
The fix is to add the conversion in vmdk_write_extent.

Signed-off-by: QingFeng Hao 
Signed-off-by: Jing Liu 

Sounds like something that should still be fixed for 2.8 and in the
stable branches.
I didn't find the stable branch upstream; the latest is 2.6. Maybe
it's a private one? :-)



diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..bf6667f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     }
 
     data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->size = cpu_to_le32(buf_len);

At least data->lba needs to be fixed, too, both here and in
vmdk_read_extent(). Host endianness in an image file is always wrong.

Good catch!


Maybe we should audit the whole driver for endianness problems.

Good point! Maybe we can encapsulate a suite of endianness conversion
functions for these structures, so that individual fields like lba
cannot be missed? That would also be better for reuse. While checking
the endianness calls in vmdk_create_extent, I am afraid of missing one. :-)
Thanks!


Kevin



--
QingFeng Hao (Robin)




Re: [Qemu-block] [PATCH for-2.8 v1 1/1] block/vmdk: Fix the endian problem of buf_len

2016-11-25 Thread Kevin Wolf
[ Cc: Fam, qemu-stable ]

On 25.11.2016 at 11:06, QingFeng Hao wrote:
> The problem was triggered by qemu-iotests case 055. It failed when
> comparing the compressed vmdk image with the original test.img.
> 
> The cause is that buf_len in vmdk_write_extent wasn't converted to
> little-endian before being stored to disk, while vmdk_read_extent later
> reads it back and converts it from little-endian to CPU endianness.
> On a big-endian CPU such as s390, the data length read by
> vmdk_read_extent therefore becomes invalid.
> The fix is to add the conversion in vmdk_write_extent.
> 
> Signed-off-by: QingFeng Hao 
> Signed-off-by: Jing Liu 

Sounds like something that should still be fixed for 2.8 and in the
stable branches.

> diff --git a/block/vmdk.c b/block/vmdk.c
> index a11c27a..bf6667f 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>      }
>  
>      data->lba = offset >> BDRV_SECTOR_BITS;
> -    data->size = buf_len;
> +    data->size = cpu_to_le32(buf_len);

At least data->lba needs to be fixed, too, both here and in
vmdk_read_extent(). Host endianness in an image file is always wrong.

Maybe we should audit the whole driver for endianness problems.

Kevin



Re: [Qemu-block] [PATCH v1 05/18] tests/test-rbcache: add test cases

2016-11-25 Thread Kevin Wolf
On 25.11.2016 at 10:58, Pavel Butsykin wrote:
> On 24.11.2016 15:20, Kevin Wolf wrote:
> >Visualised, we test these requests:
> >
> >1: *
> >2:  **
> >3:                *
> >4:        *******
> >5:   ********
> >
> >You test inserting the only element, inserting after the last element,
> >inserting in the middle and inserting something that overlaps two other
> >requests at its start and end.
> >
> >That's a good start, but it might be worth testing more scenarios:
> >
> >- Inserting a new first element to a non-empty cache
> 
> What do you mean? Insert an element with zero offset when the cache
> already contains other nodes?

Yes, that would be one way to do it. Maybe just swap requests 1 and 2.

> >- Overlapping only at the start
> >- Overlapping only at the end
> >- Overlapping in the middle (i.e. including existing ranges as a
> >   subset)
> > * With only one node
> > * With multiple nodes (like adding offset=2, size=16kb here)
> >
> 
> Ok.

Kevin
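
For illustration, one of the suggested scenarios, overlap only at the
start, might look like this (a sketch against the RBCache test API from
the patch; the offsets are invented, and the expected result assumes
the insert-returns-existing-node semantics shown in test_rbcache_insert):

    /* Sketch: the cache holds a node at [1K, 3K); inserting [0, 2K)
     * overlaps it only at the start, so rbcache_insert() is expected
     * to return the existing node rather than the new one. */
    static void test_rbcache_insert_overlap_start(TestRBCacheData *data,
                                                  const void *ctx)
    {
        RBCacheNode *existing = rbcache_node_alloc(data->cache, KB(1), KB(2));
        RBCacheNode *node = rbcache_node_alloc(data->cache, 0, KB(2));
        RBCacheNode *ret;

        ret = rbcache_insert(data->cache, existing);
        g_assert_true(ret == existing);

        ret = rbcache_insert(data->cache, node);
        g_assert_true(ret == existing);

        rbcache_node_free(data->cache, node);
    }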



[Qemu-block] [Qemu-devel] [PATCH v1 0/1] qemu: fix the bug reported by qemu-iotests case 055

2016-11-25 Thread QingFeng Hao
Hi all,
This patch is to fix the bug reported by qemu-iotests case 055
and based on upstream master's commit:
00227fefd205: Update version for v2.8.0-rc1 release
Jing Liu and I found the cause was in vmdk; the fix is in the following patch.
Thanks!

Upstream master's qemu-iotests case 055 reports the following error:
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument
+qemu-img: Error while reading offset 0 of tests/qemu-iotests/scratch/blockdev-target.img: Invalid argument

QingFeng Hao (1):
  block/vmdk: Fix the endian problem of buf_len

 block/vmdk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.8.4




[Qemu-block] [Qemu-devel] [PATCH v1 1/1] block/vmdk: Fix the endian problem of buf_len

2016-11-25 Thread QingFeng Hao
The problem was triggered by qemu-iotests case 055. It failed when
comparing the compressed vmdk image with the original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before being stored to disk, while vmdk_read_extent later
reads it back and converts it from little-endian to CPU endianness.
On a big-endian CPU such as s390, the data length read by
vmdk_read_extent therefore becomes invalid.
The fix is to add the conversion in vmdk_write_extent.

Signed-off-by: QingFeng Hao 
Signed-off-by: Jing Liu 
---
 block/vmdk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..bf6667f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1355,7 +1355,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     }
 
     data->lba = offset >> BDRV_SECTOR_BITS;
-    data->size = buf_len;
+    data->size = cpu_to_le32(buf_len);
 
     n_bytes = buf_len + sizeof(VmdkGrainMarker);
     iov = (struct iovec) {
-- 
2.8.4




Re: [Qemu-block] [PATCH v1 05/18] tests/test-rbcache: add test cases

2016-11-25 Thread Pavel Butsykin

On 24.11.2016 15:20, Kevin Wolf wrote:

On 15.11.2016 at 07:37, Pavel Butsykin wrote:

Signed-off-by: Pavel Butsykin 
---
  tests/Makefile.include |   3 +
  tests/test-rbcache.c   | 308 +
  2 files changed, 311 insertions(+)
  create mode 100644 tests/test-rbcache.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 8162f6f..36bb472 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -54,6 +54,8 @@ check-unit-y += tests/test-hbitmap$(EXESUF)
  gcov-files-test-hbitmap-y = blockjob.c
  check-unit-y += tests/test-blockjob$(EXESUF)
  check-unit-y += tests/test-blockjob-txn$(EXESUF)
+gcov-files-test-rbcache-y = util/rbcache.c
+check-unit-y += tests/test-rbcache$(EXESUF)
  check-unit-y += tests/test-x86-cpuid$(EXESUF)
  # all code tested by test-x86-cpuid is inside topology.h
  gcov-files-test-x86-cpuid-y =
@@ -481,6 +483,7 @@ tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y)
  tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
  tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
  tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y)
+tests/test-rbcache$(EXESUF): tests/test-rbcache.o $(test-util-obj-y)
  tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
  tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o page_cache.o $(test-util-obj-y)
  tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
diff --git a/tests/test-rbcache.c b/tests/test-rbcache.c
new file mode 100644
index 000..1c72bfa
--- /dev/null
+++ b/tests/test-rbcache.c
@@ -0,0 +1,308 @@
+/*
+ * QEMU Range-Based Cache core
+ *
+ * Copyright (C) 2015-2016 Parallels IP Holdings GmbH. All rights reserved.
+ *
+ * Author: Pavel Butsykin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/rbcache.h"
+
+typedef struct TestRBCacheData {
+    RBCache *cache;
+} TestRBCacheData;
+
+typedef struct TestRBCacheConfig {
+    uint64_t limit_size;
+    int eviction_type;
+    RBNodeAlloc *alloc;
+    RBNodeFree  *free;
+    void *opaque;
+} TestRBCacheConfig;
+
+#define KB(_n) ((_n) << 10)
+#define MB(_n) ((_n) << 20)
+
+#define OFFSET1 0
+#define SIZE1   KB(1)
+
+#define OFFSET2 KB(1)
+#define SIZE2   KB(2)
+
+#define OFFSET3 KB(15)
+#define SIZE3   KB(1)
+
+#define OFFSET4 KB(7)
+#define SIZE4   KB(7)
+
+#define OFFSET5 KB(2)
+#define SIZE5   KB(8)


Visualised, we test these requests:

1: *
2:  **
3:                *
4:        *******
5:   ********

You test inserting the only element, inserting after the last element,
inserting in the middle and inserting something that overlaps two other
requests at its start and end.

That's a good start, but it might be worth testing more scenarios:

- Inserting a new first element to a non-empty cache


What do you mean? Insert an element with zero offset when the cache
already contains other nodes?


- Overlapping only at the start
- Overlapping only at the end
- Overlapping in the middle (i.e. including existing ranges as a
   subset)
 * With only one node
 * With multiple nodes (like adding offset=2, size=16kb here)



Ok.


+
+static void test_rbcache_init(TestRBCacheData *data, const void *ctx)
+{
+    g_assert_nonnull(data->cache);
+}
+
+static void test_rbcache_insert(TestRBCacheData *data, const void *ctx)
+{
+    RBCacheNode *node1 = rbcache_node_alloc(data->cache, OFFSET1, SIZE1);
+    RBCacheNode *node2 = rbcache_node_alloc(data->cache, OFFSET2, SIZE2);
+    RBCacheNode *node3 = rbcache_node_alloc(data->cache, OFFSET3, SIZE3);
+    RBCacheNode *node4 = rbcache_node_alloc(data->cache, OFFSET4, SIZE4);
+    RBCacheNode *node5 = rbcache_node_alloc(data->cache, OFFSET5, SIZE5);
+    RBCacheNode *node;
+
+    node = rbcache_insert(data->cache, node1);
+    g_assert_true(node == node1);
+
+    node = rbcache_insert(data->cache, node2);
+    g_assert_true(node == node2);
+
+    node = rbcache_insert(data->cache, node3);
+    g_assert_true(node == node3);
+
+    node = rbcache_insert(data->cache, node4);
+    g_assert_true(node == node4);
+
+    node = rbcache_insert(data->cache, node5);
+    g_assert_true(node != node5);


You can test for the right result instead:

 g_assert_true(node == node2);



+    rbcache_node_free(data->cache, node5);
+}
+
+static void test_rbcache_search(TestRBCacheData *data, const void *ctx)
+{
+    RBCacheNode *node;
+
+    test_rbcache_insert(data, ctx);
+
+    node = rbcache_search(data->cache, OFFSET1, SIZE1);
+    g_assert_nonnull(node);
+    g_assert_cmpuint(node->offset, ==, OFFSET1);
+    g_assert_cmpuint(node->bytes, ==, SIZE1);
+
+    node = rbcache_search(data->cache, OFFSET2 + KB(1), SIZE1);


Intentional that you use SIZE1 here even though we're looking at node 2?


No, here we need 

Re: [Qemu-block] [PATCH v1 01/18] block/io: add bdrv_aio_{preadv, pwritev}

2016-11-25 Thread Pavel Butsykin

On 24.11.2016 15:36, Kevin Wolf wrote:

On 24.11.2016 at 11:58, Pavel Butsykin wrote:

On 23.11.2016 17:28, Kevin Wolf wrote:

On 15.11.2016 at 07:36, Pavel Butsykin wrote:

These are just byte-based wrappers over bdrv_co_aio_prw_vector(), which
provide a byte-based interface for AIO read/write.

Signed-off-by: Pavel Butsykin 


I'm in the process of phasing out the last users of bdrv_aio_*() so that
this set of interfaces can be removed. I'm doing this because it's an
unnecessary redundancy: we have too many wrapper functions that expose
the same functionality with different syntax. So let's not add new
users.

At first sight, you don't even seem to use bdrv_aio_preadv() for actual
parallelism, but you often have a pattern like this:

 void foo_cb(void *opaque)
 {
     ...
     qemu_coroutine_enter(acb->co);
 }

 void caller()
 {
     ...
     acb = bdrv_aio_preadv(...);
     qemu_coroutine_yield();
 }

The code will actually become a lot simpler if you use bdrv_co_preadv()
instead because you don't have to have a callback, but you get pure
sequential code.

The part that actually has some parallelism, pcache_readahead_request(),
already creates its own coroutine, so it runs in the background without
using callback-style interfaces.


I used bdrv_co_preadv(), because it conveniently solves the partial
cache hit. To solve the partial cache hit, we need to split a request
into smaller parts, make asynchronous requests and wait for all
requests in one place.

Do you propose to create a coroutine for each part of the request? It
seemed to me that bdrv_co_preadv() is a wrapper that lets us avoid
duplicating the same code.


It's actually the other way round, bdrv_co_preadv() is the "native"
block layer API, and bdrv_aio_*() are wrappers providing an alternative
interface.
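
For reference, the two interfaces side by side (signatures sketched
from their use in this thread; the bdrv_aio_preadv() one comes from the
patch under review and may differ in detail):

    /* Coroutine version: sequential code, the result is returned directly. */
    int coroutine_fn bdrv_co_preadv(BdrvChild *child, int64_t offset,
                                    unsigned int bytes, QEMUIOVector *qiov,
                                    BdrvRequestFlags flags);

    /* AIO wrapper: returns immediately, completion arrives via callback. */
    BlockAIOCB *bdrv_aio_preadv(BdrvChild *child, int64_t offset,
                                QEMUIOVector *qiov, unsigned int bytes,
                                BlockCompletionFunc *cb, void *opaque);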



Sorry, I mixed up bdrv_co_preadv() and bdrv_aio_preadv() :)

Also, I forgot that in this version pcache can't split one request into
many small requests (as it did in the RFC version).



I'm looking at pcache_co_readahead(), for example. It looks like this:

     bdrv_aio_preadv(bs->file, node->common.offset, _acb.qiov,
                     node->common.bytes, pcache_aio_readahead_cb,
                     _acb);
     qemu_coroutine_yield();

And then we have pcache_aio_readahead_cb(), which ends in:

 qemu_coroutine_enter(acb->co);

So here the callback style doesn't buy you anything, it just rips the
code apart in two functions. There is no parallelism here anyway:
pcache_co_readahead() doesn't do anything until the callback reenters
it. This is a very obvious example where bdrv_co_preadv() will simplify
the code.
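
In other words, the readahead path could collapse into something like
this (a rough sketch only; the PCacheNode fields and the qiov setup are
assumptions based on the quoted code):

    static void coroutine_fn pcache_co_readahead(BlockDriverState *bs,
                                                 PCacheNode *node)
    {
        QEMUIOVector qiov;   /* assumed to be initialised as before */
        int ret;

        /* bdrv_co_preadv() returns only once the read has completed, so
         * the body of pcache_aio_readahead_cb() can follow inline here:
         * no callback, no explicit yield. */
        ret = bdrv_co_preadv(bs->file, node->common.offset,
                             node->common.bytes, &qiov, 0);
        if (ret < 0) {
            /* hypothetical error handling, e.g. dropping the node */
        }
    }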



I agree.


It's similar with the other bdrv_aio_preadv() calls, which are in
pcache_co_preadv():

     if (bytes > s->max_aio_size) {
         bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                         pcache_aio_read_cb, &acb);
         goto out;
     }

     update_req_stats(s->req_stats, offset, bytes);

     status = pcache_lookup_data();
     if (status == CACHE_MISS) {
         bdrv_aio_preadv(bs->file, offset, qiov, bytes,
                         pcache_aio_read_cb, &acb);
     } else if (status == PARTIAL_CACHE_HIT) {
         assert(acb.part.qiov.niov != 0);
         bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
                         acb.part.bytes, pcache_aio_read_cb, &acb);
     }

     pcache_readahead_request();

     if (status == CACHE_HIT && --acb.ref == 0) {
         return 0;
     }

 out:
     qemu_coroutine_yield();

Here you have mainly the pcache_readahead_request() call between
bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
works in the background, so I think you can move it to before the reads
and then the reads can trivially become bdrv_co_preadv() and the
callback can again be inlined instead of ripping the function in two
parts.
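
Roughly like this (a sketch only; the helper names are taken from the
quoted code, while the state type and return paths are assumptions):

    static int coroutine_fn pcache_co_preadv(BlockDriverState *bs,
                                             uint64_t offset, uint64_t bytes,
                                             QEMUIOVector *qiov)
    {
        BDRVPCacheState *s = bs->opaque;   /* assumed state type */
        int status;

        if (bytes > s->max_aio_size) {
            return bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
        }

        update_req_stats(s->req_stats, offset, bytes);

        /* Spawn the background readahead coroutine first ... */
        pcache_readahead_request();

        /* ... then serve the request itself with plain coroutine reads. */
        status = pcache_lookup_data();
        if (status == CACHE_HIT) {
            return 0;
        }
        /* For PARTIAL_CACHE_HIT only the missing part would be read;
         * details elided in this sketch. */
        return bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
    }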


The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
thing and using the coroutine version results in obvious code
improvements.


And I think this are all uses of bdrv_aio_*() in the pcache driver, so
converting it to use bdrv_co_*() instead isn't only possible, but will
improve the legibility of your code, too. It's a clear win in all three
places.



Yes, you're right, this scheme was acceptable when the caller used
bdrv_aio_*(), but now it is just confusing. Thanks for the explanation!


Kevin