Re: [PATCH v2 6/9] hw/scsi: Rename SCSIRequest::resid as 'residual'

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé

The 'resid' field is slightly confusing and could be
interpreted as some ID. Rename it as 'residual' which
is clearer to review. No logical change.

Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/scsi/scsi.h |  4 ++--
  hw/scsi/megasas.c  | 42 +-
  hw/scsi/scsi-bus.c | 10 +-
  hw/scsi/scsi-disk.c|  4 ++--
  softmmu/dma-helpers.c  | 26 +-
  5 files changed, 47 insertions(+), 39 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 5/9] hw/rdma/rdma_utils: Rename rdma_pci_dma_map 'len' argument

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé

Various APIs use 'pval' naming for 'pointer to val'.
rdma_pci_dma_map() uses 'plen' for 'PCI length', but since
'PCI' is already explicit in the function name, simplify
and rename the argument 'len'. No logical change.

Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
---
  hw/rdma/rdma_utils.h |  2 +-
  hw/rdma/rdma_utils.c | 14 +++---
  2 files changed, 8 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 7/9] hw/dma: Fix format string issues using dma_addr_t

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/ide/ahci.c| 2 +-
  hw/rdma/trace-events | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 4/9] hw/dma: Remove CONFIG_USER_ONLY check

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé

DMA API should not be included in user-mode emulation.
If so, build should fail. Remove the CONFIG_USER_ONLY check.

Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
---
  include/sysemu/dma.h | 3 ---
  1 file changed, 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 3/9] hw/pci: Document pci_dma_map()

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/pci/pci.h | 12 
  1 file changed, 12 insertions(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 5b36334a28a..07f08aa0626 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -876,6 +876,18 @@ PCI_DMA_DEFINE_LDST(q_be, q_be, 64);
  
  #undef PCI_DMA_DEFINE_LDST
  
+/**

+ * pci_dma_map: Map device PCI address space range into host virtual address
+ *
+ * May map a subset of the requested range, given by and returned in @plen.
+ * May return %NULL and set *@plen to zero(0), if resources needed to perform
+ * the mapping are exhausted.
+ *
+ * @dev: #PCIDevice to be accessed
+ * @addr: address within that device's address space
+ * @plen: pointer to length of buffer; updated on return
+ * @dir: indicates the transfer direction
+ */


As Peter recently mentioned, the ** format is off.
The description goes below the arguments.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 2/9] hw/pci: Restrict pci-bus stub to sysemu

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé

Neither tools nor user-mode emulation require the PCI bus stub.

Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
---
  stubs/meson.build | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 1/9] hw/nvram: Restrict stub to sysemu and tools

2022-01-06 Thread Richard Henderson

On 1/4/22 12:54 AM, Philippe Mathieu-Daudé wrote:

+if have_system or have_tools

...

  if have_system
+  stub_ss.add(files('fw_cfg.c'))


Disconnect in tests?


r~



Re: [PATCH v2 1/2] qemu-storage-daemon: Add vhost-user-blk help

2022-01-06 Thread Eric Blake
On Thu, Dec 23, 2021 at 11:14:25AM +0100, Philippe Mathieu-Daudé wrote:
> Add missing vhost-user-blk help:
> 
>   $ qemu-storage-daemon -h
>   ...
> --export [type=]vhost-user-blk,id=,node-name=,
>  addr.type=unix,addr.path=[,writable=on|off]
>  [,logical-block-size=][,num-queues=]
>export the specified block node as a
>vhosts-user-blk device over UNIX domain socket

Why 'vhosts-' here instead of 'vhost'?

> --export [type=]vhost-user-blk,id=,node-name=,
>  fd,addr.str=[,writable=on|off]
>  [,logical-block-size=][,num-queues=]
>export the specified block node as a
>vhosts-user-blk device over file descriptor

here too.

>   ...
> 
> Fixes: 90fc91d50b7 ("convert vhost-user-blk server to block export API")
> Reported-by: Qing Wang 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  storage-daemon/qemu-storage-daemon.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/storage-daemon/qemu-storage-daemon.c 
> b/storage-daemon/qemu-storage-daemon.c
> index 52cf17e8ace..0c19e128e3f 100644
> --- a/storage-daemon/qemu-storage-daemon.c
> +++ b/storage-daemon/qemu-storage-daemon.c
> @@ -104,6 +104,19 @@ static void help(void)
>  " export the specified block node over FUSE\n"
>  "\n"
>  #endif /* CONFIG_FUSE */
> +#ifdef CONFIG_VHOST_USER_BLK_SERVER
> +"  --export [type=]vhost-user-blk,id=,node-name=,\n"
> +"   addr.type=unix,addr.path=[,writable=on|off]\n"
> +"   [,logical-block-size=][,num-queues=]\n"
> +" export the specified block node as a\n"
> +" vhosts-user-blk device over UNIX domain socket\n"

This...

> +"  --export [type=]vhost-user-blk,id=,node-name=,\n"
> +"   fd,addr.str=[,writable=on|off]\n"
> +"   [,logical-block-size=][,num-queues=]\n"
> +" export the specified block node as a\n"
> +" vhosts-user-blk device over file descriptor\n"

...and this line would need the same tweak.

If you agree that it's a typo, then let's fix it, and you can have

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven
t;>>>> Nit: I would rename ret to status or something like that to make
>>>>>>> it clear(er) that it is an actual value and never an error.  Or,
>>>>>>> even better, drop it entirely and return one of the two bitmasks
>>>>>>> directly.
>>>>>>>
>>>>>>>> +struct rbd_diff_req req = { .offs = offset };
>>>>>>>> +uint64_t features, flags;
>>>>>>>> +
>>>>>>>> +assert(offset + bytes <= s->image_size);
>>>>>>>> +
>>>>>>>> +/* default to all sectors allocated */
>>>>>>>> +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
>>>>>>>> +*map = offset;
>>>>>>>> +*file = bs;
>>>>>>>> +*pnum = bytes;
>>>>>>>> +
>>>>>>>> +/* check if RBD image supports fast-diff */
>>>>>>>> +r = rbd_get_features(s->image, );
>>>>>>>> +if (r < 0) {
>>>>>>>> +goto out;
>>>>>>>> +}
>>>>>>>> +if (!(features & RBD_FEATURE_FAST_DIFF)) {
>>>>>>>> +goto out;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* check if RBD fast-diff result is valid */
>>>>>>>> +r = rbd_get_flags(s->image, );
>>>>>>>> +if (r < 0) {
>>>>>>>> +goto out;
>>>>>>>> +}
>>>>>>>> +if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
>>>>>>>> +goto out;
>>>>>>> Nit: I'm not a fan of labels that cover just the return statement.
>>>>>>> Feel free to ignore this one but I would get rid of it and replace
>>>>>>> these gotos with returns.
>>>>>> That would be return with the bitmask directly coded in if I also
>>>>>>
>>>>>> drop the ret variable. I can change that, no problem.
>>>>>>
>>>>>>
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>>>>>>>> +  qemu_rbd_co_block_status_cb, );
>>>>>>>> +if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>>>>>>>> +goto out;
>>>>>>>> +}
>>>>>>>> +assert(req.bytes <= bytes);
>>>>>>>> +if (!req.exists) {
>>>>>>>> +if (r == 0 && !req.bytes) {
>>>>>>>> +/*
>>>>>>>> + * rbd_diff_iterate2 does not invoke callbacks for 
>>>>>>>> unallocated areas
>>>>>>>> + * except for the case where an overlay has a hole where 
>>>>>>>> the parent
>>>>>>>> + * has not. This here catches the case where no callback 
>>>>>>>> was
>>>>>>>> + * invoked at all.
>>>>>>>> + */
>>>>>>> The above is true in the case of diffing against a snapshot, i.e. when
>>>>>>> the "from" snapshot has some data where the "to" revision (whether HEAD
>>>>>>> or another snapshot) has a hole.  I don't think it is true for child vs
>>>>>>> parent (but it has been a while since I looked at the diff code).  As
>>>>>>> long as NULL is passed for fromsnapname, I would expect the callback to
>>>>>>> be invoked only for allocated areas.  If I'm right, we could simplify
>>>>>>> qemu_rbd_co_block_status_cb() considerably.
>>>>>> See my comment in the callback. Can you look at the diff code or give
>>>>>> me at least a pointer where I can find it. I am not that familiar with
>>>>>> the librbd code yet.
>>>>> I assumed that you added !exists handling because it came up in your
>>>>> testing.  If you don't have a test case then let's proceed under the
>>>>> assumption that it doesn't happen for clones.
>>>> Hi Ilya,
>>>>
>>>>
>>>> it seems that our assumption was false. I have an image which shows holes 
>>>> without diffing against
>>>>
>>>> a snapshot. Do you have an idea why?
>>>>
>>>>
>>>> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven info 
>>>> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
>>>> rbd image 'c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw':
>>>>   size 20 GiB in 2 objects
>>>>   order 20 (1 MiB objects)
>>>>   snapshot_count: 2
>>>>   id: 3d6daa102e4d9f
>>>>   block_name_prefix: rbd_data.3d6daa102e4d9f
>>>>   format: 2
>>>>   features: layering, exclusive-lock, object-map, fast-diff, 
>>>> deep-flatten
>>>>   op_features:
>>>>   flags:
>>>>   create_timestamp: Tue Sep 21 14:16:56 2021
>>>>   access_timestamp: Thu Jan  6 15:24:46 2022
>>>>   modify_timestamp: Thu Jan  6 15:45:42 2022
>>>>
>>>>
>>>> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven snap ls 
>>>> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
>>>> SNAPID  NAME SIZEPROTECTED TIMESTAMP
>>>>12297  dlp-20210921-144509  20 GiB Tue Sep 21 14:45:09 2021
>>>>17745  dlp-20220106-04  20 GiB Thu Jan  6 04:00:01 2022
>>>>
>>>>
>>>> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven diff --whole-object 
>>>> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
>>>>  | grep zero
>>>> 2044723201048576  zero
>>>> 1114636288   1048576  zero
>>>> 1115684864   1048576  zero
>>>> 1116733440   1048576  zero
>>>> 1117782016   1048576  zero
>>>> 1218445312   1048576  zero
>>>> 1219493888   1048576  zero
>>>> 1220542464   1048576  zero
>>> Hi Peter,
>>>
>>> Yes, I stumbled upon this just yesterday while looking into another
>>> librbd issue surfaced by this patch [1] and was going to email you and
>>> the list after wrapping my head around this behavior.  I see where it
>>> is coming from but I'm not sure what the right fix is.  I would prefer
>>> to patch librbd but that may turn out to be not feasible.  Let me get
>>> back on this next week.
>>>
>>> [1] https://tracker.ceph.com/issues/53784
>>
>> Looks like it was a good idea to add all these asserts...
>>
>>
>> The issue I see does occur with a simple qemu-img convert on the given image 
>> as source. It triggers the assert(exists) in the cb.
> Right, "qemu-img convert" issues a block status request.  Does your
> image have snapshots?  This behavior should be caused by a snapshot
> that contains a discarded object:
>
> $ rbd create --size 16M foo
> $ sudo rbd map foo
> /dev/rbd0
> $ sudo xfs_io -d -c 'pwrite -b 4M 0 16M' /dev/rbd0
> wrote 16777216/16777216 bytes at offset 0
> 16 MiB, 4 ops; 0.4607 sec (34.726 MiB/sec and 8.6814 ops/sec)
> $ rbd snap create foo@snap
> Creating snap: 100% complete...done.
> $ rbd diff foo
> OffsetLength   Type
> 0 4194304  data
> 4194304   4194304  data
> 8388608   4194304  data
> 12582912  4194304  data
>
> $ blkdiscard -o 4M -l 8M /dev/rbd0
> $ rbd diff foo
> OffsetLength   Type
> 0 4194304  data
> 4194304   4194304  zero
> 8388608   4194304  zero
> 12582912  4194304  data
>
> If you remove that snapshot, these "holes" should disappear from the
> report:
>
> $ rbd snap rm foo@snap
> Removing snap: 100% complete...done.
> $ rbd diff foo
> OffsetLength   Type
> 0 4194304  data
> 12582912  4194304  data
>
> Thanks,
>
> Ilya


So we have 2 different issues:


1) For the holes that we did not expect we should use the original version of 
my patch which was able to handle holes properly.

2) For issue #53784 the right would be to fix the issue in librbd. We could 
discuss if we try to "detect" this issue, report a warning once and

then disable the get_block_status functionality for the session. The question 
is, is it possible to "detect" the issue? If thats not possible

should we patch qemu to report unaligned requests as allocated for affected 
librbd versions?


Best,

Peter










Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Ilya Dryomov
+
> >>>>>> +assert(offset + bytes <= s->image_size);
> >>>>>> +
> >>>>>> +/* default to all sectors allocated */
> >>>>>> +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> >>>>>> +*map = offset;
> >>>>>> +*file = bs;
> >>>>>> +*pnum = bytes;
> >>>>>> +
> >>>>>> +/* check if RBD image supports fast-diff */
> >>>>>> +r = rbd_get_features(s->image, );
> >>>>>> +if (r < 0) {
> >>>>>> +goto out;
> >>>>>> +}
> >>>>>> +if (!(features & RBD_FEATURE_FAST_DIFF)) {
> >>>>>> +goto out;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/* check if RBD fast-diff result is valid */
> >>>>>> +r = rbd_get_flags(s->image, );
> >>>>>> +if (r < 0) {
> >>>>>> +goto out;
> >>>>>> +}
> >>>>>> +if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
> >>>>>> +goto out;
> >>>>> Nit: I'm not a fan of labels that cover just the return statement.
> >>>>> Feel free to ignore this one but I would get rid of it and replace
> >>>>> these gotos with returns.
> >>>> That would be return with the bitmask directly coded in if I also
> >>>>
> >>>> drop the ret variable. I can change that, no problem.
> >>>>
> >>>>
> >>>>>> +}
> >>>>>> +
> >>>>>> +r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
> >>>>>> +  qemu_rbd_co_block_status_cb, );
> >>>>>> +if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
> >>>>>> +goto out;
> >>>>>> +}
> >>>>>> +assert(req.bytes <= bytes);
> >>>>>> +if (!req.exists) {
> >>>>>> +if (r == 0 && !req.bytes) {
> >>>>>> +/*
> >>>>>> + * rbd_diff_iterate2 does not invoke callbacks for 
> >>>>>> unallocated areas
> >>>>>> + * except for the case where an overlay has a hole where 
> >>>>>> the parent
> >>>>>> + * has not. This here catches the case where no callback 
> >>>>>> was
> >>>>>> + * invoked at all.
> >>>>>> + */
> >>>>> The above is true in the case of diffing against a snapshot, i.e. when
> >>>>> the "from" snapshot has some data where the "to" revision (whether HEAD
> >>>>> or another snapshot) has a hole.  I don't think it is true for child vs
> >>>>> parent (but it has been a while since I looked at the diff code).  As
> >>>>> long as NULL is passed for fromsnapname, I would expect the callback to
> >>>>> be invoked only for allocated areas.  If I'm right, we could simplify
> >>>>> qemu_rbd_co_block_status_cb() considerably.
> >>>> See my comment in the callback. Can you look at the diff code or give
> >>>> me at least a pointer where I can find it. I am not that familiar with
> >>>> the librbd code yet.
> >>> I assumed that you added !exists handling because it came up in your
> >>> testing.  If you don't have a test case then let's proceed under the
> >>> assumption that it doesn't happen for clones.
> >>
> >> Hi Ilya,
> >>
> >>
> >> it seems that our assumption was false. I have an image which shows holes 
> >> without diffing against
> >>
> >> a snapshot. Do you have an idea why?
> >>
> >>
> >> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven info 
> >> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
> >> rbd image 'c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw':
> >>   size 20 GiB in 2 objects
> >>   order 20 (1 MiB objects)
> >>   snapshot_count: 2
> >>   id: 3d6daa102e4d9f
> >>   block_name_prefix: rbd_data.3d6daa102e4d9f
> >>   format: 2
> >>   features: layering, exclusive-lock, object-map, fast-diff, 
> >> deep-flatten
> >>   op_features:
> >>   flags:
> >>   create_timestamp: Tue Sep 21 14:16:56 2021
> >>   access_timestamp: Thu Jan  6 15:24:46 2022
> >>   modify_timestamp: Thu Jan  6 15:45:42 2022
> >>
> >>
> >> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven snap ls 
> >> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
> >> SNAPID  NAME SIZEPROTECTED TIMESTAMP
> >>12297  dlp-20210921-144509  20 GiB Tue Sep 21 14:45:09 2021
> >>17745  dlp-20220106-04  20 GiB Thu Jan  6 04:00:01 2022
> >>
> >>
> >> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven diff --whole-object 
> >> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
> >>  | grep zero
> >> 2044723201048576  zero
> >> 1114636288   1048576  zero
> >> 1115684864   1048576  zero
> >> 1116733440   1048576  zero
> >> 1117782016   1048576  zero
> >> 1218445312   1048576  zero
> >> 1219493888   1048576  zero
> >> 1220542464   1048576  zero
> > Hi Peter,
> >
> > Yes, I stumbled upon this just yesterday while looking into another
> > librbd issue surfaced by this patch [1] and was going to email you and
> > the list after wrapping my head around this behavior.  I see where it
> > is coming from but I'm not sure what the right fix is.  I would prefer
> > to patch librbd but that may turn out to be not feasible.  Let me get
> > back on this next week.
> >
> > [1] https://tracker.ceph.com/issues/53784
>
>
> Looks like it was a good idea to add all these asserts...
>
>
> The issue I see does occur with a simple qemu-img convert on the given image 
> as source. It triggers the assert(exists) in the cb.

Right, "qemu-img convert" issues a block status request.  Does your
image have snapshots?  This behavior should be caused by a snapshot
that contains a discarded object:

$ rbd create --size 16M foo
$ sudo rbd map foo
/dev/rbd0
$ sudo xfs_io -d -c 'pwrite -b 4M 0 16M' /dev/rbd0
wrote 16777216/16777216 bytes at offset 0
16 MiB, 4 ops; 0.4607 sec (34.726 MiB/sec and 8.6814 ops/sec)
$ rbd snap create foo@snap
Creating snap: 100% complete...done.
$ rbd diff foo
OffsetLength   Type
0 4194304  data
4194304   4194304  data
8388608   4194304  data
12582912  4194304  data

$ blkdiscard -o 4M -l 8M /dev/rbd0
$ rbd diff foo
OffsetLength   Type
0 4194304  data
4194304   4194304  zero
8388608   4194304  zero
12582912  4194304  data

If you remove that snapshot, these "holes" should disappear from the
report:

$ rbd snap rm foo@snap
Removing snap: 100% complete...done.
$ rbd diff foo
OffsetLength   Type
0 4194304  data
12582912  4194304  data

Thanks,

Ilya



Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven
lockDriverState **file)
+{
+BDRVRBDState *s = bs->opaque;
+int ret, r;

Nit: I would rename ret to status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, drop it entirely and return one of the two bitmasks
directly.


+struct rbd_diff_req req = { .offs = offset };
+uint64_t features, flags;
+
+assert(offset + bytes <= s->image_size);
+
+/* default to all sectors allocated */
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*map = offset;
+*file = bs;
+*pnum = bytes;
+
+/* check if RBD image supports fast-diff */
+r = rbd_get_features(s->image, );
+if (r < 0) {
+goto out;
+}
+if (!(features & RBD_FEATURE_FAST_DIFF)) {
+goto out;
+}
+
+/* check if RBD fast-diff result is valid */
+r = rbd_get_flags(s->image, );
+if (r < 0) {
+goto out;
+}
+if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
+goto out;

Nit: I'm not a fan of labels that cover just the return statement.
Feel free to ignore this one but I would get rid of it and replace
these gotos with returns.

That would be return with the bitmask directly coded in if I also

drop the ret variable. I can change that, no problem.



+}
+
+r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
+  qemu_rbd_co_block_status_cb, );
+if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
+goto out;
+}
+assert(req.bytes <= bytes);
+if (!req.exists) {
+if (r == 0 && !req.bytes) {
+/*
+ * rbd_diff_iterate2 does not invoke callbacks for unallocated 
areas
+ * except for the case where an overlay has a hole where the parent
+ * has not. This here catches the case where no callback was
+ * invoked at all.
+ */

The above is true in the case of diffing against a snapshot, i.e. when
the "from" snapshot has some data where the "to" revision (whether HEAD
or another snapshot) has a hole.  I don't think it is true for child vs
parent (but it has been a while since I looked at the diff code).  As
long as NULL is passed for fromsnapname, I would expect the callback to
be invoked only for allocated areas.  If I'm right, we could simplify
qemu_rbd_co_block_status_cb() considerably.

See my comment in the callback. Can you look at the diff code or give
me at least a pointer where I can find it. I am not that familiar with
the librbd code yet.

I assumed that you added !exists handling because it came up in your
testing.  If you don't have a test case then let's proceed under the
assumption that it doesn't happen for clones.


Hi Ilya,


it seems that our assumption was false. I have an image which shows holes 
without diffing against

a snapshot. Do you have an idea why?


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven info 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
rbd image 'c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw':
  size 20 GiB in 2 objects
  order 20 (1 MiB objects)
  snapshot_count: 2
  id: 3d6daa102e4d9f
  block_name_prefix: rbd_data.3d6daa102e4d9f
  format: 2
  features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
  op_features:
  flags:
  create_timestamp: Tue Sep 21 14:16:56 2021
  access_timestamp: Thu Jan  6 15:24:46 2022
  modify_timestamp: Thu Jan  6 15:45:42 2022


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven snap ls 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
SNAPID  NAME SIZEPROTECTED TIMESTAMP
   12297  dlp-20210921-144509  20 GiB Tue Sep 21 14:45:09 2021
   17745  dlp-20220106-04  20 GiB Thu Jan  6 04:00:01 2022


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven diff --whole-object 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
 | grep zero
2044723201048576  zero
1114636288   1048576  zero
1115684864   1048576  zero
1116733440   1048576  zero
1117782016   1048576  zero
1218445312   1048576  zero
1219493888   1048576  zero
1220542464   1048576  zero

Hi Peter,

Yes, I stumbled upon this just yesterday while looking into another
librbd issue surfaced by this patch [1] and was going to email you and
the list after wrapping my head around this behavior.  I see where it
is coming from but I'm not sure what the right fix is.  I would prefer
to patch librbd but that may turn out to be not feasible.  Let me get
back on this next week.

[1] https://tracker.ceph.com/issues/53784



Looks like it was a good idea to add all these asserts...


The issue I see does occur with a simple qemu-img convert on the given image as 
source. It triggers the assert(exists) in the cb.

My dev cluster is running on Nautilus 14.2.22.


Best,

Peter






Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Ilya Dryomov
th the bitmask directly coded in if I also
> >>
> >> drop the ret variable. I can change that, no problem.
> >>
> >>
> >>>> +}
> >>>> +
> >>>> +r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
> >>>> +  qemu_rbd_co_block_status_cb, );
> >>>> +if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
> >>>> +goto out;
> >>>> +}
> >>>> +assert(req.bytes <= bytes);
> >>>> +if (!req.exists) {
> >>>> +if (r == 0 && !req.bytes) {
> >>>> +/*
> >>>> + * rbd_diff_iterate2 does not invoke callbacks for 
> >>>> unallocated areas
> >>>> + * except for the case where an overlay has a hole where 
> >>>> the parent
> >>>> + * has not. This here catches the case where no callback was
> >>>> + * invoked at all.
> >>>> + */
> >>> The above is true in the case of diffing against a snapshot, i.e. when
> >>> the "from" snapshot has some data where the "to" revision (whether HEAD
> >>> or another snapshot) has a hole.  I don't think it is true for child vs
> >>> parent (but it has been a while since I looked at the diff code).  As
> >>> long as NULL is passed for fromsnapname, I would expect the callback to
> >>> be invoked only for allocated areas.  If I'm right, we could simplify
> >>> qemu_rbd_co_block_status_cb() considerably.
> >> See my comment in the callback. Can you look at the diff code or give
> >> me at least a pointer where I can find it. I am not that familiar with
> >> the librbd code yet.
> > I assumed that you added !exists handling because it came up in your
> > testing.  If you don't have a test case then let's proceed under the
> > assumption that it doesn't happen for clones.
>
>
> Hi Ilya,
>
>
> it seems that our assumption was false. I have an image which shows holes 
> without diffing against
>
> a snapshot. Do you have an idea why?
>
>
> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven info 
> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
> rbd image 'c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw':
>  size 20 GiB in 2 objects
>  order 20 (1 MiB objects)
>  snapshot_count: 2
>  id: 3d6daa102e4d9f
>  block_name_prefix: rbd_data.3d6daa102e4d9f
>  format: 2
>  features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>  op_features:
>  flags:
>  create_timestamp: Tue Sep 21 14:16:56 2021
>  access_timestamp: Thu Jan  6 15:24:46 2022
>  modify_timestamp: Thu Jan  6 15:45:42 2022
>
>
> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven snap ls 
> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
> SNAPID  NAME SIZEPROTECTED TIMESTAMP
>   12297  dlp-20210921-144509  20 GiB Tue Sep 21 14:45:09 2021
>   17745  dlp-20220106-04  20 GiB Thu Jan  6 04:00:01 2022
>
>
> $ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven diff --whole-object 
> dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
>  | grep zero
> 2044723201048576  zero
> 1114636288   1048576  zero
> 1115684864   1048576  zero
> 1116733440   1048576  zero
> 1117782016   1048576  zero
> 1218445312   1048576  zero
> 1219493888   1048576  zero
> 1220542464   1048576  zero

Hi Peter,

Yes, I stumbled upon this just yesterday while looking into another
librbd issue surfaced by this patch [1] and was going to email you and
the list after wrapping my head around this behavior.  I see where it
is coming from but I'm not sure what the right fix is.  I would prefer
to patch librbd but that may turn out to be not feasible.  Let me get
back on this next week.

[1] https://tracker.ceph.com/issues/53784

Thanks,

Ilya



Re: [PATCH V3] block/rbd: implement bdrv_co_block_status

2022-01-06 Thread Peter Lieven
o status or something like that to make
it clear(er) that it is an actual value and never an error.  Or,
even better, drop it entirely and return one of the two bitmasks
directly.


+struct rbd_diff_req req = { .offs = offset };
+uint64_t features, flags;
+
+assert(offset + bytes <= s->image_size);
+
+/* default to all sectors allocated */
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*map = offset;
+*file = bs;
+*pnum = bytes;
+
+/* check if RBD image supports fast-diff */
+r = rbd_get_features(s->image, );
+if (r < 0) {
+goto out;
+}
+if (!(features & RBD_FEATURE_FAST_DIFF)) {
+goto out;
+}
+
+/* check if RBD fast-diff result is valid */
+r = rbd_get_flags(s->image, );
+if (r < 0) {
+goto out;
+}
+if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
+goto out;

Nit: I'm not a fan of labels that cover just the return statement.
Feel free to ignore this one but I would get rid of it and replace
these gotos with returns.


That would be return with the bitmask directly coded in if I also

drop the ret variable. I can change that, no problem.



+}
+
+r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
+  qemu_rbd_co_block_status_cb, );
+if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
+goto out;
+}
+assert(req.bytes <= bytes);
+if (!req.exists) {
+if (r == 0 && !req.bytes) {
+/*
+ * rbd_diff_iterate2 does not invoke callbacks for unallocated 
areas
+ * except for the case where an overlay has a hole where the parent
+ * has not. This here catches the case where no callback was
+ * invoked at all.
+ */

The above is true in the case of diffing against a snapshot, i.e. when
the "from" snapshot has some data where the "to" revision (whether HEAD
or another snapshot) has a hole.  I don't think it is true for child vs
parent (but it has been a while since I looked at the diff code).  As
long as NULL is passed for fromsnapname, I would expect the callback to
be invoked only for allocated areas.  If I'm right, we could simplify
qemu_rbd_co_block_status_cb() considerably.

See my comment in the callback. Can you look at the diff code or give
me at least a pointer where I can find it. I am not that familiar with
the librbd code yet.

I assumed that you added !exists handling because it came up in your
testing.  If you don't have a test case then let's proceed under the
assumption that it doesn't happen for clones.



Hi Ilya,


it seems that our assumption was false. I have an image which shows holes 
without diffing against

a snapshot. Do you have an idea why?


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven info 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
rbd image 'c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw':
    size 20 GiB in 2 objects
    order 20 (1 MiB objects)
    snapshot_count: 2
    id: 3d6daa102e4d9f
    block_name_prefix: rbd_data.3d6daa102e4d9f
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Tue Sep 21 14:16:56 2021
    access_timestamp: Thu Jan  6 15:24:46 2022
    modify_timestamp: Thu Jan  6 15:45:42 2022


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven snap ls 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
SNAPID  NAME SIZE    PROTECTED TIMESTAMP
 12297  dlp-20210921-144509  20 GiB Tue Sep 21 14:45:09 2021
 17745  dlp-20220106-04  20 GiB Thu Jan  6 04:00:01 2022


$ rbd --conf /etc/ceph/ceph-dev01.conf --id lieven diff --whole-object 
dhp-standard/c4ca7ee9-36ce-4fc9-9d3b-ece8a4f8b83e/c1ad11d0-4f6a-4cc1-8aa3-ff3c413c1471.raw
 | grep zero
204472320    1048576  zero
1114636288   1048576  zero
1115684864   1048576  zero
1116733440   1048576  zero
1117782016   1048576  zero
1218445312   1048576  zero
1219493888   1048576  zero
1220542464   1048576  zero




Best,

Peter






[PULL 28/52] hw/vhost-user-blk: turn on VIRTIO_BLK_F_SIZE_MAX feature for virtio blk device

2022-01-06 Thread Michael S. Tsirkin
From: Andy Pei 

Turn on pre-defined feature VIRTIO_BLK_F_SIZE_MAX for virtio blk device to
avoid guest DMA request sizes which are too large for hardware spec.

Signed-off-by: Andy Pei 
Message-Id: <1641202092-149677-1-git-send-email-andy@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index ab11ce8252..1a42ae9187 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -252,6 +252,7 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
*vdev,
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
 
 /* Turn on pre-defined features */
+virtio_add_feature(, VIRTIO_BLK_F_SIZE_MAX);
 virtio_add_feature(, VIRTIO_BLK_F_SEG_MAX);
 virtio_add_feature(, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(, VIRTIO_BLK_F_TOPOLOGY);
-- 
MST




[PULL 22/52] vhost-user-blk: propagate error return from generic vhost

2022-01-06 Thread Michael S. Tsirkin
From: Roman Kagan 

Fix the only callsite that doesn't propagate the error code from the
generic vhost code.

Signed-off-by: Roman Kagan 
Message-Id: <2021153354.18807-11-rvka...@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index f9b17f6813..ab11ce8252 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -100,7 +100,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
_err);
 if (ret < 0) {
 error_report_err(local_err);
-return -1;
+return ret;
 }
 
 /* valid for resize only */
-- 
MST




[PULL 14/52] vhost-user-blk: reconnect on any error during realize

2022-01-06 Thread Michael S. Tsirkin
From: Roman Kagan 

vhost-user-blk realize only attempts to reconnect if the previous
connection attempt failed on "a problem with the connection and not an
error related to the content (which would fail again the same way in the
next attempt)".

However this distinction is very subtle, and may be inadvertently broken
if the code changes somewhere deep down the stack and a new error gets
propagated up to here.

OTOH now that the number of reconnection attempts is limited it seems
harmless to try reconnecting on any error.

So relax the condition of whether to retry connecting to check for any
error.

This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
during realize".

Signed-off-by: Roman Kagan 
Message-Id: <2021153354.18807-2-rvka...@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index ba13cb87e5..f9b17f6813 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -511,7 +511,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 *errp = NULL;
 }
 ret = vhost_user_blk_realize_connect(s, errp);
-} while (ret == -EPROTO && retries--);
+} while (ret < 0 && retries--);
 
 if (ret < 0) {
 goto virtio_err;
-- 
MST




Re: [PULL 0/2] SD/MMC patches for 2022-01-04

2022-01-06 Thread Philippe Mathieu-Daudé

On 4/1/22 19:42, Richard Henderson wrote:

On 1/3/22 11:54 PM, Philippe Mathieu-Daudé wrote:
The following changes since commit 
b5a3d8bc9146ba22a25116cb748c97341bf99737:


   Merge tag 'pull-misc-20220103' of https://gitlab.com/rth7680/qemu 
into staging (2022-01-03 09:34:41 -0800)


are available in the Git repository at:

   https://github.com/philmd/qemu.git tags/sdmmc-20220104

for you to fetch changes up to d666c7b529c503381a714b97d2e174848b5aad8d:

   hw/sd: Add SDHC support for SD card SPI-mode (2022-01-04 08:50:27 
+0100)



SD/MMC patches queue

- Add SDHC support for SD card SPI-mode (Frank Chang)



Frank Chang (1):
   hw/sd: Add SDHC support for SD card SPI-mode

Philippe Mathieu-Daudé (1):
   hw/sd/sdcard: Rename Write Protect Group variables

  hw/sd/sd.c | 48 +---
  1 file changed, 29 insertions(+), 19 deletions(-)



Lots of failures of the form

../hw/sd/sd.c:842:33: error: 'SDState' {aka 'struct SDState'} has no 
member named 'wp_groups'; did you mean 'wp_group_bits'?

   842 | if (test_bit(wpnum, sd->wp_groups)) {
   | ^
   | wp_group_bits



I am really sorry for having wasted your time this way... I was in the
middle of migrating my workstation and posted the non-rebased branch :(

I'll take extra care this doesn't happen again.

Regards,

Phil.



Re: [PATCH v2 0/9] hw/dma: Use dma_addr_t type definition when relevant

2022-01-06 Thread Michael S. Tsirkin
On Tue, Jan 04, 2022 at 09:54:22AM +0100, Philippe Mathieu-Daudé wrote:
> Since v1:
> - Addressed David review comment (stick to dma_addr_t type)
> - Addressed Peter review comment (incorrect doc string)


PCI things:

Reviewed-by: Michael S. Tsirkin 

who's merging all this? Yourself?

> Hi,
> 
> This series aims to clarify a bit the DMA API, in particular the
> 'size' argument which is not clear enough (as we use multiple types
> for it). This helps avoiding build failures on 32-bit host [*] (and
> likely overflows calculation too IMO).
> 
> Some units using the DMA API are first removed from user-mode
> emulation to avoid build failure (they shouldn't be there in
> the first place).
> 
> Then some variables are renamed for clarity (no functional change).
> 
> Finally we replace misuses with dma_addr_t typedef. The previous
> patch which failed on 32-bit host applied on top (not failing anymore).
> 
> Regards,
> 
> Phil.
> 
> [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg858825.html
> 
> Supersedes: <20211231114901.976937-1-phi...@redhat.com>
> 
> Philippe Mathieu-Daudé (9):
>   hw/nvram: Restrict stub to sysemu and tools
>   hw/pci: Restrict pci-bus stub to sysemu
>   hw/pci: Document pci_dma_map()
>   hw/dma: Remove CONFIG_USER_ONLY check
>   hw/rdma/rdma_utils: Rename rdma_pci_dma_map 'len' argument
>   hw/scsi: Rename SCSIRequest::resid as 'residual'
>   hw/dma: Fix format string issues using dma_addr_t
>   hw/dma: Use dma_addr_t type definition when relevant
>   hw/dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult
> 
>  hw/rdma/rdma_utils.h   |  2 +-
>  include/hw/pci/pci.h   | 12 ++
>  include/hw/scsi/scsi.h |  4 +-
>  include/sysemu/dma.h   | 31 ---
>  hw/ide/ahci.c  | 10 ++---
>  hw/nvme/ctrl.c |  6 +--
>  hw/rdma/rdma_utils.c   | 14 +++
>  hw/scsi/megasas.c  | 85 +-
>  hw/scsi/scsi-bus.c | 12 +++---
>  hw/scsi/scsi-disk.c|  4 +-
>  softmmu/dma-helpers.c  | 34 +++--
>  hw/nvram/meson.build   |  6 ++-
>  hw/rdma/trace-events   |  2 +-
>  stubs/meson.build  |  4 +-
>  14 files changed, 134 insertions(+), 92 deletions(-)
> 
> -- 
> 2.33.1
> 




Re: [PATCH v3 01/15] pcie: Add support for Single Root I/O Virtualization (SR/IOV)

2022-01-06 Thread Michael S. Tsirkin
On Tue, Dec 21, 2021 at 03:32:32PM +0100, Lukasz Maniak wrote:
> From: Knut Omang 
> 
> This patch provides the building blocks for creating an SR/IOV
> PCIe Extended Capability header and register/unregister
> SR/IOV Virtual Functions.
> 
> Signed-off-by: Knut Omang 
> ---
>  hw/pci/meson.build  |   1 +
>  hw/pci/pci.c|  97 +---
>  hw/pci/pcie.c   |   5 +
>  hw/pci/pcie_sriov.c | 287 
>  hw/pci/trace-events |   5 +
>  include/hw/pci/pci.h|  12 +-
>  include/hw/pci/pcie.h   |   6 +
>  include/hw/pci/pcie_sriov.h |  67 +
>  include/qemu/typedefs.h |   2 +
>  9 files changed, 456 insertions(+), 26 deletions(-)
>  create mode 100644 hw/pci/pcie_sriov.c
>  create mode 100644 include/hw/pci/pcie_sriov.h
> 
> diff --git a/hw/pci/meson.build b/hw/pci/meson.build
> index 5c4bbac817..bcc9c75919 100644
> --- a/hw/pci/meson.build
> +++ b/hw/pci/meson.build
> @@ -5,6 +5,7 @@ pci_ss.add(files(
>'pci.c',
>'pci_bridge.c',
>'pci_host.c',
> +  'pcie_sriov.c',
>'shpc.c',
>'slotid_cap.c'
>  ))
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e5993c1ef5..1892a7e74c 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -239,6 +239,9 @@ int pci_bar(PCIDevice *d, int reg)
>  {
>  uint8_t type;
>  
> +/* PCIe virtual functions do not have their own BARs */
> +assert(!pci_is_vf(d));
> +
>  if (reg != PCI_ROM_SLOT)
>  return PCI_BASE_ADDRESS_0 + reg * 4;
>  
> @@ -304,10 +307,30 @@ void pci_device_deassert_intx(PCIDevice *dev)
>  }
>  }
>  
> -static void pci_do_device_reset(PCIDevice *dev)
> +static void pci_reset_regions(PCIDevice *dev)
>  {
>  int r;
> +if (pci_is_vf(dev)) {
> +return;
> +}
> +
> +for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> +PCIIORegion *region = >io_regions[r];
> +if (!region->size) {
> +continue;
> +}
> +
> +if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
> +region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> +pci_set_quad(dev->config + pci_bar(dev, r), region->type);
> +} else {
> +pci_set_long(dev->config + pci_bar(dev, r), region->type);
> +}
> +}
> +}
>  
> +static void pci_do_device_reset(PCIDevice *dev)
> +{
>  pci_device_deassert_intx(dev);
>  assert(dev->irq_state == 0);
>  
> @@ -323,19 +346,7 @@ static void pci_do_device_reset(PCIDevice *dev)
>pci_get_word(dev->wmask + PCI_INTERRUPT_LINE) |
>pci_get_word(dev->w1cmask + 
> PCI_INTERRUPT_LINE));
>  dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> -for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> -PCIIORegion *region = >io_regions[r];
> -if (!region->size) {
> -continue;
> -}
> -
> -if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
> -region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> -pci_set_quad(dev->config + pci_bar(dev, r), region->type);
> -} else {
> -pci_set_long(dev->config + pci_bar(dev, r), region->type);
> -}
> -}
> +pci_reset_regions(dev);
>  pci_update_mappings(dev);
>  
>  msi_reset(dev);
> @@ -884,6 +895,15 @@ static void pci_init_multifunction(PCIBus *bus, 
> PCIDevice *dev, Error **errp)
>  dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
>  }
>  
> +/* With SR/IOV and ARI, a device at function 0 need not be a 
> multifunction
> + * device, as it may just be a VF that ended up with function 0 in
> + * the legacy PCI interpretation. Avoid failing in such cases:
> + */
> +if (pci_is_vf(dev) &&
> +dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
> +return;
> +}
> +
>  /*
>   * multifunction bit is interpreted in two ways as follows.
>   *   - all functions must set the bit to 1.
> @@ -1083,6 +1103,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev,
> bus->devices[devfn]->name);
>  return NULL;
>  } else if (dev->hotplugged &&
> +   !pci_is_vf(pci_dev) &&
> pci_get_function_0(pci_dev)) {
>  error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
> " new func %s cannot be exposed to guest.",
> @@ -1191,6 +1212,7 @@ void pci_register_bar(PCIDevice *pci_dev, int 
> region_num,
>  pcibus_t size = memory_region_size(memory);
>  uint8_t hdr_type;
>  
> +assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar 
> */
>  assert(region_num >= 0);
>  assert(region_num < PCI_NUM_REGIONS);
>  assert(is_power_of_2(size));
> @@ -1294,11 +1316,43 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int 
> region_num)
>  return pci_dev->io_regions[region_num].addr;
>  }
>  
> -static pcibus_t pci_bar_address(PCIDevice *d,
> - 

Re: [PATCH 00/10] vhost: stick to -errno error return convention

2022-01-06 Thread Michael S. Tsirkin
On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> Error propagation between the generic vhost code and the specific backends is
> not quite consistent: some places follow "return -1 and set errno" convention,
> while others assume "return negated errno".  Furthermore, not enough care is
> taken not to clobber errno.
> 
> As a result, on certain code paths the errno resulting from a failure may get
> overridden by another function call, and then that zero errno inidicating
> success is propagated up the stack, leading to failures being lost.  In
> particular, we've seen errors in the communication with a vhost-user-blk slave
> not trigger an immediate connection drop and reconnection, leaving it in a
> broken state.
> 
> Rework error propagation to always return negated errno on errors and
> correctly pass it up the stack.
> 
> Roman Kagan (10):
>   vhost-user-blk: reconnect on any error during realize
>   chardev/char-socket: tcp_chr_recv: don't clobber errno
>   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
>   chardev/char-fe: don't allow EAGAIN from blocking read

So I dropped this one. If you are so inclined, pls work on
this separately.

>   vhost-backend: avoid overflow on memslots_limit
>   vhost-backend: stick to -errno error return convention
>   vhost-vdpa: stick to -errno error return convention
>   vhost-user: stick to -errno error return convention
>   vhost: stick to -errno error return convention
>   vhost-user-blk: propagate error return from generic vhost
> 
>  chardev/char-fe.c |   7 +-
>  chardev/char-socket.c |  17 +-
>  hw/block/vhost-user-blk.c |   4 +-
>  hw/virtio/vhost-backend.c |   4 +-
>  hw/virtio/vhost-user.c| 401 +-
>  hw/virtio/vhost-vdpa.c|  37 ++--
>  hw/virtio/vhost.c |  98 +-
>  7 files changed, 307 insertions(+), 261 deletions(-)
> 
> -- 
> 2.33.1
>