Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-08 Thread Peter Lieven
Am 06.01.22 um 17:01 schrieb Ilya Dryomov:
> On Thu, Jan 6, 2022 at 4:27 PM Peter Lieven  wrote:
>> Am 05.10.21 um 10:36 schrieb Ilya Dryomov:
>>> On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:
 Am 05.10.21 um 09:54 schrieb Ilya Dryomov:
> On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
>> the qemu rbd driver currently lacks support for bdrv_co_block_status.
>> This results mainly in incorrect progress during block operations (e.g.
>> qemu-img convert with an rbd image as source).
>>
>> This patch utilizes the rbd_diff_iterate2 call from librbd to detect
>> allocated and unallocated (all zero areas).
>>
>> To avoid querying the ceph OSDs for the answer this is only done if
>> the image has the fast-diff feature which depends on the object-map and
>> exclusive-lock features. In this case it is guaranteed that the 
>> information
>> is present in memory in the librbd client and thus very fast.
>>
>> If fast-diff is not available all areas are reported to be allocated
>> which is the current behaviour if bdrv_co_block_status is not 
>> implemented.
>>
>> Signed-off-by: Peter Lieven 
>> ---
>> V2->V3:
>> - check rbd_flags every time (they can change during runtime) [Ilya]
>> - also check for fast-diff invalid flag [Ilya]
>> - *map and *file cant be NULL [Ilya]
>> - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
>> unallocated area [Ilya]
>> - typo: catched -> caught [Ilya]
>> - changed wording about fast-diff, object-map and exclusive lock in
>> commit msg [Ilya]
>>
>> V1->V2:
>> - add commit comment [Stefano]
>> - use failed_post_open [Stefano]
>> - remove redundant assert [Stefano]
>> - add macro+comment for the magic -9000 value [Stefano]
>> - always set *file if its non NULL [Stefano]
>>
>>block/rbd.c | 126 
>>1 file changed, 126 insertions(+)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index dcf82b15b8..3cb24f9981 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
>> *qemu_rbd_get_specific_info(BlockDriverState *bs,
>>return spec_info;
>>}
>>
>> +typedef struct rbd_diff_req {
>> +uint64_t offs;
>> +uint64_t bytes;
>> +int exists;
> Hi Peter,
>
> Nit: make exists a bool.  The one in the callback has to be an int
> because of the callback signature but let's not spread that.
>
>> +} rbd_diff_req;
>> +
>> +/*
>> + * rbd_diff_iterate2 allows to interrupt the exection by returning a 
>> negative
>> + * value in the callback routine. Choose a value that does not conflict 
>> with
>> + * an existing exitcode and return it if we want to prematurely stop the
>> + * execution because we detected a change in the allocation status.
>> + */
>> +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
>> +
>> +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
>> +   int exists, void *opaque)
>> +{
>> +struct rbd_diff_req *req = opaque;
>> +
>> +assert(req->offs + req->bytes <= offs);
>> +
>> +if (req->exists && offs > req->offs + req->bytes) {
>> +/*
>> + * we started in an allocated area and jumped over an 
>> unallocated area,
>> + * req->bytes contains the length of the allocated area before 
>> the
>> + * unallocated area. stop further processing.
>> + */
>> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
>> +}
>> +if (req->exists && !exists) {
>> +/*
>> + * we started in an allocated area and reached a hole. 
>> req->bytes
>> + * contains the length of the allocated area before the hole.
>> + * stop further processing.
>> + */
>> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> Do you have a test case for when this branch is taken?
 That would happen if you diff from a snapshot, the question is if it can 
 also happen if the image is a clone from a snapshot?


>> +}
>> +if (!req->exists && exists && offs > req->offs) {
>> +/*
>> + * we started in an unallocated area and hit the first allocated
>> + * block. req->bytes must be set to the length of the 
>> unallocated area
>> + * before the allocated area. stop further processing.
>> + */
>> +req->bytes = offs - req->offs;
>> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
>> +}
>> +
>> +/*
>> + * assert that we caught all cases above and allocation state has 
>> not
>> + * changed during callbacks.
>> + */
>> +assert(ex

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven
Am 06.01.22 um 18:47 schrieb Ilya Dryomov:
> On Thu, Jan 6, 2022 at 5:33 PM Peter Lieven  wrote:
>> Am 06.01.22 um 17:01 schrieb Ilya Dryomov:
>>> On Thu, Jan 6, 2022 at 4:27 PM Peter Lieven  wrote:
 Am 05.10.21 um 10:36 schrieb Ilya Dryomov:
> On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:
>> Am 05.10.21 um 09:54 schrieb Ilya Dryomov:
>>> On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
 the qemu rbd driver currently lacks support for bdrv_co_block_status.
 This results mainly in incorrect progress during block operations (e.g.
 qemu-img convert with an rbd image as source).

 This patch utilizes the rbd_diff_iterate2 call from librbd to detect
 allocated and unallocated (all zero areas).

 To avoid querying the ceph OSDs for the answer this is only done if
 the image has the fast-diff feature which depends on the object-map and
 exclusive-lock features. In this case it is guaranteed that the 
 information
 is present in memory in the librbd client and thus very fast.

 If fast-diff is not available all areas are reported to be allocated
 which is the current behaviour if bdrv_co_block_status is not 
 implemented.

 Signed-off-by: Peter Lieven 
 ---
 V2->V3:
 - check rbd_flags every time (they can change during runtime) [Ilya]
 - also check for fast-diff invalid flag [Ilya]
 - *map and *file cant be NULL [Ilya]
 - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
  unallocated area [Ilya]
 - typo: catched -> caught [Ilya]
 - changed wording about fast-diff, object-map and exclusive lock in
  commit msg [Ilya]

 V1->V2:
 - add commit comment [Stefano]
 - use failed_post_open [Stefano]
 - remove redundant assert [Stefano]
 - add macro+comment for the magic -9000 value [Stefano]
 - always set *file if its non NULL [Stefano]

 block/rbd.c | 126 
 
 1 file changed, 126 insertions(+)

 diff --git a/block/rbd.c b/block/rbd.c
 index dcf82b15b8..3cb24f9981 100644
 --- a/block/rbd.c
 +++ b/block/rbd.c
 @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
 *qemu_rbd_get_specific_info(BlockDriverState *bs,
 return spec_info;
 }

 +typedef struct rbd_diff_req {
 +uint64_t offs;
 +uint64_t bytes;
 +int exists;
>>> Hi Peter,
>>>
>>> Nit: make exists a bool.  The one in the callback has to be an int
>>> because of the callback signature but let's not spread that.
>>>
 +} rbd_diff_req;
 +
 +/*
 + * rbd_diff_iterate2 allows to interrupt the exection by returning a 
 negative
 + * value in the callback routine. Choose a value that does not 
 conflict with
 + * an existing exitcode and return it if we want to prematurely stop 
 the
 + * execution because we detected a change in the allocation status.
 + */
 +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
 +
 +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
 +   int exists, void *opaque)
 +{
 +struct rbd_diff_req *req = opaque;
 +
 +assert(req->offs + req->bytes <= offs);
 +
 +if (req->exists && offs > req->offs + req->bytes) {
 +/*
 + * we started in an allocated area and jumped over an 
 unallocated area,
 + * req->bytes contains the length of the allocated area 
 before the
 + * unallocated area. stop further processing.
 + */
 +return QEMU_RBD_EXIT_DIFF_ITERATE2;
 +}
 +if (req->exists && !exists) {
 +/*
 + * we started in an allocated area and reached a hole. 
 req->bytes
 + * contains the length of the allocated area before the hole.
 + * stop further processing.
 + */
 +return QEMU_RBD_EXIT_DIFF_ITERATE2;
>>> Do you have a test case for when this branch is taken?
>> That would happen if you diff from a snapshot, the question is if it can 
>> also happen if the image is a clone from a snapshot?
>>
>>
 +}
 +if (!req->exists && exists && offs > req->offs) {
 +/*
 + * we started in an unallocated area and hit the first 
 allocated
 + * block. req->bytes must be set to the length of the 
 unallocated area
 + * before the allocate

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Ilya Dryomov
On Thu, Jan 6, 2022 at 5:33 PM Peter Lieven  wrote:
>
> Am 06.01.22 um 17:01 schrieb Ilya Dryomov:
> > On Thu, Jan 6, 2022 at 4:27 PM Peter Lieven  wrote:
> >> Am 05.10.21 um 10:36 schrieb Ilya Dryomov:
> >>> On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:
>  Am 05.10.21 um 09:54 schrieb Ilya Dryomov:
> > On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
> >> the qemu rbd driver currently lacks support for bdrv_co_block_status.
> >> This results mainly in incorrect progress during block operations (e.g.
> >> qemu-img convert with an rbd image as source).
> >>
> >> This patch utilizes the rbd_diff_iterate2 call from librbd to detect
> >> allocated and unallocated (all zero areas).
> >>
> >> To avoid querying the ceph OSDs for the answer this is only done if
> >> the image has the fast-diff feature which depends on the object-map and
> >> exclusive-lock features. In this case it is guaranteed that the 
> >> information
> >> is present in memory in the librbd client and thus very fast.
> >>
> >> If fast-diff is not available all areas are reported to be allocated
> >> which is the current behaviour if bdrv_co_block_status is not 
> >> implemented.
> >>
> >> Signed-off-by: Peter Lieven 
> >> ---
> >> V2->V3:
> >> - check rbd_flags every time (they can change during runtime) [Ilya]
> >> - also check for fast-diff invalid flag [Ilya]
> >> - *map and *file cant be NULL [Ilya]
> >> - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
> >>  unallocated area [Ilya]
> >> - typo: catched -> caught [Ilya]
> >> - changed wording about fast-diff, object-map and exclusive lock in
> >>  commit msg [Ilya]
> >>
> >> V1->V2:
> >> - add commit comment [Stefano]
> >> - use failed_post_open [Stefano]
> >> - remove redundant assert [Stefano]
> >> - add macro+comment for the magic -9000 value [Stefano]
> >> - always set *file if its non NULL [Stefano]
> >>
> >> block/rbd.c | 126 
> >> 
> >> 1 file changed, 126 insertions(+)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index dcf82b15b8..3cb24f9981 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
> >> *qemu_rbd_get_specific_info(BlockDriverState *bs,
> >> return spec_info;
> >> }
> >>
> >> +typedef struct rbd_diff_req {
> >> +uint64_t offs;
> >> +uint64_t bytes;
> >> +int exists;
> > Hi Peter,
> >
> > Nit: make exists a bool.  The one in the callback has to be an int
> > because of the callback signature but let's not spread that.
> >
> >> +} rbd_diff_req;
> >> +
> >> +/*
> >> + * rbd_diff_iterate2 allows to interrupt the exection by returning a 
> >> negative
> >> + * value in the callback routine. Choose a value that does not 
> >> conflict with
> >> + * an existing exitcode and return it if we want to prematurely stop 
> >> the
> >> + * execution because we detected a change in the allocation status.
> >> + */
> >> +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
> >> +
> >> +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
> >> +   int exists, void *opaque)
> >> +{
> >> +struct rbd_diff_req *req = opaque;
> >> +
> >> +assert(req->offs + req->bytes <= offs);
> >> +
> >> +if (req->exists && offs > req->offs + req->bytes) {
> >> +/*
> >> + * we started in an allocated area and jumped over an 
> >> unallocated area,
> >> + * req->bytes contains the length of the allocated area 
> >> before the
> >> + * unallocated area. stop further processing.
> >> + */
> >> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> >> +}
> >> +if (req->exists && !exists) {
> >> +/*
> >> + * we started in an allocated area and reached a hole. 
> >> req->bytes
> >> + * contains the length of the allocated area before the hole.
> >> + * stop further processing.
> >> + */
> >> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> > Do you have a test case for when this branch is taken?
>  That would happen if you diff from a snapshot, the question is if it can 
>  also happen if the image is a clone from a snapshot?
> 
> 
> >> +}
> >> +if (!req->exists && exists && offs > req->offs) {
> >> +/*
> >> + * we started in an unallocated area and hit the first 
> >> allocated
> >> + * block. req->bytes must be set to the length of the 
> >> unallocated area
> >> + * before the allocated area. stop further processing.
> >> + 

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven

Am 06.01.22 um 17:01 schrieb Ilya Dryomov:

On Thu, Jan 6, 2022 at 4:27 PM Peter Lieven  wrote:

Am 05.10.21 um 10:36 schrieb Ilya Dryomov:

On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:

Am 05.10.21 um 09:54 schrieb Ilya Dryomov:

On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
 unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
 commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

block/rbd.c | 126 
1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
return spec_info;
}

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.


+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?

That would happen if you diff from a snapshot, the question is if it can also 
happen if the image is a clone from a snapshot?



+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or somethi

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Ilya Dryomov
On Thu, Jan 6, 2022 at 4:27 PM Peter Lieven  wrote:
>
> Am 05.10.21 um 10:36 schrieb Ilya Dryomov:
> > On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:
> >> Am 05.10.21 um 09:54 schrieb Ilya Dryomov:
> >>> On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
>  the qemu rbd driver currently lacks support for bdrv_co_block_status.
>  This results mainly in incorrect progress during block operations (e.g.
>  qemu-img convert with an rbd image as source).
> 
>  This patch utilizes the rbd_diff_iterate2 call from librbd to detect
>  allocated and unallocated (all zero areas).
> 
>  To avoid querying the ceph OSDs for the answer this is only done if
>  the image has the fast-diff feature which depends on the object-map and
>  exclusive-lock features. In this case it is guaranteed that the 
>  information
>  is present in memory in the librbd client and thus very fast.
> 
>  If fast-diff is not available all areas are reported to be allocated
>  which is the current behaviour if bdrv_co_block_status is not 
>  implemented.
> 
>  Signed-off-by: Peter Lieven 
>  ---
>  V2->V3:
>  - check rbd_flags every time (they can change during runtime) [Ilya]
>  - also check for fast-diff invalid flag [Ilya]
>  - *map and *file cant be NULL [Ilya]
>  - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
>  unallocated area [Ilya]
>  - typo: catched -> caught [Ilya]
>  - changed wording about fast-diff, object-map and exclusive lock in
>  commit msg [Ilya]
> 
>  V1->V2:
>  - add commit comment [Stefano]
>  - use failed_post_open [Stefano]
>  - remove redundant assert [Stefano]
>  - add macro+comment for the magic -9000 value [Stefano]
>  - always set *file if its non NULL [Stefano]
> 
> block/rbd.c | 126 
> 1 file changed, 126 insertions(+)
> 
>  diff --git a/block/rbd.c b/block/rbd.c
>  index dcf82b15b8..3cb24f9981 100644
>  --- a/block/rbd.c
>  +++ b/block/rbd.c
>  @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
>  *qemu_rbd_get_specific_info(BlockDriverState *bs,
> return spec_info;
> }
> 
>  +typedef struct rbd_diff_req {
>  +uint64_t offs;
>  +uint64_t bytes;
>  +int exists;
> >>> Hi Peter,
> >>>
> >>> Nit: make exists a bool.  The one in the callback has to be an int
> >>> because of the callback signature but let's not spread that.
> >>>
>  +} rbd_diff_req;
>  +
>  +/*
>  + * rbd_diff_iterate2 allows to interrupt the exection by returning a 
>  negative
>  + * value in the callback routine. Choose a value that does not conflict 
>  with
>  + * an existing exitcode and return it if we want to prematurely stop the
>  + * execution because we detected a change in the allocation status.
>  + */
>  +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
>  +
>  +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
>  +   int exists, void *opaque)
>  +{
>  +struct rbd_diff_req *req = opaque;
>  +
>  +assert(req->offs + req->bytes <= offs);
>  +
>  +if (req->exists && offs > req->offs + req->bytes) {
>  +/*
>  + * we started in an allocated area and jumped over an 
>  unallocated area,
>  + * req->bytes contains the length of the allocated area before 
>  the
>  + * unallocated area. stop further processing.
>  + */
>  +return QEMU_RBD_EXIT_DIFF_ITERATE2;
>  +}
>  +if (req->exists && !exists) {
>  +/*
>  + * we started in an allocated area and reached a hole. 
>  req->bytes
>  + * contains the length of the allocated area before the hole.
>  + * stop further processing.
>  + */
>  +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> >>> Do you have a test case for when this branch is taken?
> >>
> >> That would happen if you diff from a snapshot, the question is if it can 
> >> also happen if the image is a clone from a snapshot?
> >>
> >>
>  +}
>  +if (!req->exists && exists && offs > req->offs) {
>  +/*
>  + * we started in an unallocated area and hit the first allocated
>  + * block. req->bytes must be set to the length of the 
>  unallocated area
>  + * before the allocated area. stop further processing.
>  + */
>  +req->bytes = offs - req->offs;
>  +return QEMU_RBD_EXIT_DIFF_ITERATE2;
>  +}
>  +
>  +/*
>  + * assert that we caught all cases above and allocation state has 
>  not
>  + * changed during callbacks.
>  + */
>  +assert(exists == req->exists || !req->bytes);
> 

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven

Am 05.10.21 um 10:36 schrieb Ilya Dryomov:

On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:

Am 05.10.21 um 09:54 schrieb Ilya Dryomov:

On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

   block/rbd.c | 126 
   1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
   return spec_info;
   }

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.


+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?


That would happen if you diff from a snapshot, the question is if it can also 
happen if the image is a clone from a snapshot?



+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, dro

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2021-10-07 Thread Peter Lieven

Am 05.10.21 um 10:36 schrieb Ilya Dryomov:

On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:

Am 05.10.21 um 09:54 schrieb Ilya Dryomov:

On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

   block/rbd.c | 126 
   1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
   return spec_info;
   }

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.


+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?


That would happen if you diff from a snapshot, the question is if it can also 
happen if the image is a clone from a snapshot?



+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, dro

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2021-10-05 Thread Peter Lieven

Am 05.10.21 um 10:36 schrieb Ilya Dryomov:

On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:

Am 05.10.21 um 09:54 schrieb Ilya Dryomov:

On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

   block/rbd.c | 126 
   1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
   return spec_info;
   }

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.


+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?


That would happen if you diff from a snapshot, the question is if it can also 
happen if the image is a clone from a snapshot?



+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, dro

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2021-10-05 Thread Ilya Dryomov
On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven  wrote:
>
> Am 05.10.21 um 09:54 schrieb Ilya Dryomov:
> > On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
> >> the qemu rbd driver currently lacks support for bdrv_co_block_status.
> >> This results mainly in incorrect progress during block operations (e.g.
> >> qemu-img convert with an rbd image as source).
> >>
> >> This patch utilizes the rbd_diff_iterate2 call from librbd to detect
> >> allocated and unallocated (all zero areas).
> >>
> >> To avoid querying the ceph OSDs for the answer this is only done if
> >> the image has the fast-diff feature which depends on the object-map and
> >> exclusive-lock features. In this case it is guaranteed that the information
> >> is present in memory in the librbd client and thus very fast.
> >>
> >> If fast-diff is not available all areas are reported to be allocated
> >> which is the current behaviour if bdrv_co_block_status is not implemented.
> >>
> >> Signed-off-by: Peter Lieven 
> >> ---
> >> V2->V3:
> >> - check rbd_flags every time (they can change during runtime) [Ilya]
> >> - also check for fast-diff invalid flag [Ilya]
> >> - *map and *file cant be NULL [Ilya]
> >> - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
> >>unallocated area [Ilya]
> >> - typo: catched -> caught [Ilya]
> >> - changed wording about fast-diff, object-map and exclusive lock in
> >>commit msg [Ilya]
> >>
> >> V1->V2:
> >> - add commit comment [Stefano]
> >> - use failed_post_open [Stefano]
> >> - remove redundant assert [Stefano]
> >> - add macro+comment for the magic -9000 value [Stefano]
> >> - always set *file if its non NULL [Stefano]
> >>
> >>   block/rbd.c | 126 
> >>   1 file changed, 126 insertions(+)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index dcf82b15b8..3cb24f9981 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
> >> *qemu_rbd_get_specific_info(BlockDriverState *bs,
> >>   return spec_info;
> >>   }
> >>
> >> +typedef struct rbd_diff_req {
> >> +uint64_t offs;
> >> +uint64_t bytes;
> >> +int exists;
> > Hi Peter,
> >
> > Nit: make exists a bool.  The one in the callback has to be an int
> > because of the callback signature but let's not spread that.
> >
> >> +} rbd_diff_req;
> >> +
> >> +/*
> >> + * rbd_diff_iterate2 allows to interrupt the exection by returning a 
> >> negative
> >> + * value in the callback routine. Choose a value that does not conflict 
> >> with
> >> + * an existing exitcode and return it if we want to prematurely stop the
> >> + * execution because we detected a change in the allocation status.
> >> + */
> >> +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
> >> +
> >> +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
> >> +   int exists, void *opaque)
> >> +{
> >> +struct rbd_diff_req *req = opaque;
> >> +
> >> +assert(req->offs + req->bytes <= offs);
> >> +
> >> +if (req->exists && offs > req->offs + req->bytes) {
> >> +/*
> >> + * we started in an allocated area and jumped over an unallocated 
> >> area,
> >> + * req->bytes contains the length of the allocated area before the
> >> + * unallocated area. stop further processing.
> >> + */
> >> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> >> +}
> >> +if (req->exists && !exists) {
> >> +/*
> >> + * we started in an allocated area and reached a hole. req->bytes
> >> + * contains the length of the allocated area before the hole.
> >> + * stop further processing.
> >> + */
> >> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> > Do you have a test case for when this branch is taken?
>
>
> That would happen if you diff from a snapshot, the question is if it can also 
> happen if the image is a clone from a snapshot?
>
>
> >
> >> +}
> >> +if (!req->exists && exists && offs > req->offs) {
> >> +/*
> >> + * we started in an unallocated area and hit the first allocated
> >> + * block. req->bytes must be set to the length of the unallocated 
> >> area
> >> + * before the allocated area. stop further processing.
> >> + */
> >> +req->bytes = offs - req->offs;
> >> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> >> +}
> >> +
> >> +/*
> >> + * assert that we caught all cases above and allocation state has not
> >> + * changed during callbacks.
> >> + */
> >> +assert(exists == req->exists || !req->bytes);
> >> +req->exists = exists;
> >> +
> >> +/*
> >> + * assert that we either return an unallocated block or have got 
> >> callbacks
> >> + * for all allocated blocks present.
> >> + */
> >> +assert(!req->exists || offs == req->offs + req->bytes);
> >> +req->bytes = offs + len - req->offs;
> >> +
> >> +return 0;
> >> +}
> >> +
> >> +static int cor

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2021-10-05 Thread Peter Lieven

Am 05.10.21 um 09:54 schrieb Ilya Dryomov:

On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
   unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
   commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

  block/rbd.c | 126 
  1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
  return spec_info;
  }

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.


+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?



That would happen if you diff from a snapshot, the question is if it can also 
happen if the image is a clone from a snapshot?





+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, drop it entirely and return one of the two bitmasks
directly.


+struct rbd_diff_req req = { .offs =

Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2021-10-05 Thread Ilya Dryomov
On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven  wrote:
>
> the qemu rbd driver currently lacks support for bdrv_co_block_status.
> This results mainly in incorrect progress during block operations (e.g.
> qemu-img convert with an rbd image as source).
>
> This patch utilizes the rbd_diff_iterate2 call from librbd to detect
> allocated and unallocated (all zero areas).
>
> To avoid querying the ceph OSDs for the answer this is only done if
> the image has the fast-diff feature which depends on the object-map and
> exclusive-lock features. In this case it is guaranteed that the information
> is present in memory in the librbd client and thus very fast.
>
> If fast-diff is not available all areas are reported to be allocated
> which is the current behaviour if bdrv_co_block_status is not implemented.
>
> Signed-off-by: Peter Lieven 
> ---
> V2->V3:
> - check rbd_flags every time (they can change during runtime) [Ilya]
> - also check for fast-diff invalid flag [Ilya]
> - *map and *file cant be NULL [Ilya]
> - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
>   unallocated area [Ilya]
> - typo: catched -> caught [Ilya]
> - changed wording about fast-diff, object-map and exclusive lock in
>   commit msg [Ilya]
>
> V1->V2:
> - add commit comment [Stefano]
> - use failed_post_open [Stefano]
> - remove redundant assert [Stefano]
> - add macro+comment for the magic -9000 value [Stefano]
> - always set *file if its non NULL [Stefano]
>
>  block/rbd.c | 126 
>  1 file changed, 126 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index dcf82b15b8..3cb24f9981 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
> *qemu_rbd_get_specific_info(BlockDriverState *bs,
>  return spec_info;
>  }
>
> +typedef struct rbd_diff_req {
> +uint64_t offs;
> +uint64_t bytes;
> +int exists;

Hi Peter,

Nit: make exists a bool.  The one in the callback has to be an int
because of the callback signature but let's not spread that.

> +} rbd_diff_req;
> +
> +/*
> + * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
> + * value in the callback routine. Choose a value that does not conflict with
> + * an existing exitcode and return it if we want to prematurely stop the
> + * execution because we detected a change in the allocation status.
> + */
> +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
> +
> +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
> +   int exists, void *opaque)
> +{
> +struct rbd_diff_req *req = opaque;
> +
> +assert(req->offs + req->bytes <= offs);
> +
> +if (req->exists && offs > req->offs + req->bytes) {
> +/*
> + * we started in an allocated area and jumped over an unallocated 
> area,
> + * req->bytes contains the length of the allocated area before the
> + * unallocated area. stop further processing.
> + */
> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> +}
> +if (req->exists && !exists) {
> +/*
> + * we started in an allocated area and reached a hole. req->bytes
> + * contains the length of the allocated area before the hole.
> + * stop further processing.
> + */
> +return QEMU_RBD_EXIT_DIFF_ITERATE2;

Do you have a test case for when this branch is taken?

> +}
> +if (!req->exists && exists && offs > req->offs) {
> +/*
> + * we started in an unallocated area and hit the first allocated
> + * block. req->bytes must be set to the length of the unallocated 
> area
> + * before the allocated area. stop further processing.
> + */
> +req->bytes = offs - req->offs;
> +return QEMU_RBD_EXIT_DIFF_ITERATE2;
> +}
> +
> +/*
> + * assert that we caught all cases above and allocation state has not
> + * changed during callbacks.
> + */
> +assert(exists == req->exists || !req->bytes);
> +req->exists = exists;
> +
> +/*
> + * assert that we either return an unallocated block or have got 
> callbacks
> + * for all allocated blocks present.
> + */
> +assert(!req->exists || offs == req->offs + req->bytes);
> +req->bytes = offs + len - req->offs;
> +
> +return 0;
> +}
> +
> +static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
> + bool want_zero, int64_t 
> offset,
> + int64_t bytes, int64_t 
> *pnum,
> + int64_t *map,
> + BlockDriverState **file)
> +{
> +BDRVRBDState *s = bs->opaque;
> +int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, drop it entirely and return one of the two bitm

[PATCH V3] block/rbd: implement bdrv_co_block_status

2021-09-16 Thread Peter Lieven
the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V2->V3:
- check rbd_flags every time (they can change during runtime) [Ilya]
- also check for fast-diff invalid flag [Ilya]
- *map and *file cant be NULL [Ilya]
- set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an
  unallocated area [Ilya]
- typo: catched -> caught [Ilya]
- changed wording about fast-diff, object-map and exclusive lock in
  commit msg [Ilya]

V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

 block/rbd.c | 126 
 1 file changed, 126 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..3cb24f9981 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1259,6 +1259,131 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
 return spec_info;
 }
 
+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;
+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we caught all cases above and allocation state has not
+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;
+struct rbd_diff_req req = { .offs = offset };
+uint64_t features, flags;
+
+assert(offset + bytes <= s->image_size);
+
+/* default to all sectors allocated */
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*map = offset;
+*file = bs;
+*pnum = bytes;
+
+/* check if RBD image supports fast-diff */
+r = rbd_get_features(s->image, &features);
+if (r < 0) {
+goto out;
+}
+if (!(features & RBD_FEATURE_FAST_DIFF)) {
+goto out;
+}
+
+/* check if RBD fast-diff result is valid */
+r = rbd_get_flags(s->image, &flags);
+if (r < 0) {
+goto out;
+}
+if (flags & RBD_FLAG_FAST_DIFF_INVALID