Re: [PATCH 0/3] Support for Zoned UFS

2023-12-11 Thread Jeuk Kim

I've already done all the ufs related review.


If the SCSI maintainers approve this patchset, I'll put it in my tree 
and create a pull request.



Thank you,

Jeuk


On 12/8/2023 3:09 PM, Daejun Park wrote:

This patch enables zoned support for UFS devices.
By applying this patch, a QEMU run can use parameters to configure whether
each LU of each UFS is zoned, and the capacity, size, and max open zones.
Zoned UFS is implemented by referencing ZBC2.
(https://www.t10.org/members/w_zbc2.htm)

Daejun Park (3):
   hw/ufs: Support for Zoned UFS
   hw/scsi: add mode sense support for zbc device
   tests/qtest: Add tests for Zoned UFS

  hw/scsi/scsi-disk.c|  13 +-
  hw/ufs/lu.c| 616 +
  hw/ufs/ufs.c   |   6 +-
  hw/ufs/ufs.h   |  32 +++
  include/block/ufs.h|  31 +++
  tests/qtest/ufs-test.c | 178 
  6 files changed, 870 insertions(+), 6 deletions(-)





Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

2023-12-11 Thread Jason Wang
On Mon, Dec 11, 2023 at 4:29 PM Akihiko Odaki  wrote:
>
> On 2023/12/11 16:26, Jason Wang wrote:
> > On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/12/11 11:52, Jason Wang wrote:
> >>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki  
> >>> wrote:
> 
>  Introduction
>  
> 
>  This series is based on the RFC series submitted by Yui Washizu[1].
>  See also [2] for the context.
> 
>  This series enables SR-IOV emulation for virtio-net. It is useful
>  to test SR-IOV support on the guest, or to expose several vDPA devices
>  in a VM. vDPA devices can also provide L2 switching feature for
>  offloading though it is out of scope to allow the guest to configure
>  such a feature.
> 
>  The PF side code resides in virtio-pci. The VF side code resides in
>  the PCI common infrastructure, but it is restricted to work only for
>  virtio-net-pci because of lack of validation.
> 
>  User Interface
>  --
> 
>  A user can configure a SR-IOV capable virtio-net device by adding
>  virtio-net-pci functions to a bus. Below is a command line example:
>  -netdev user,id=n -netdev user,id=o
>  -netdev user,id=p -netdev user,id=q
>  -device pcie-root-port,id=b
>  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> 
>  The VFs specify the paired PF with "sriov-pf" property. The PF must be
>  added after all VFs. It is user's responsibility to ensure that VFs have
>  function numbers larger than one of the PF, and the function numbers
>  have a consistent stride.
> >>>
> >>> This seems not user friendly. Any reason we can't just allow user to
> >>> specify the stride here?
> >>
> >> It should be possible to assign addr automatically without requiring
> >> user to specify the stride. I'll try that in the next version.
> >>
> >>>
> >>> Btw, I vaguely remember qemu allows the params to be accepted as a
> >>> list. If this is true, we can accept a list of netdev here?
> >>
> >> Yes, rocker does that. But the problem is not just about getting
> >> parameters needed for VFs, which I forgot to mention in the cover letter
> >> and will explain below.
> >>
> >>>
> 
>  Keeping VF instances
>  
> 
>  A problem with SR-IOV emulation is that it needs to hotplug the VFs as
>  the guest requests. Previously, this behavior was implemented by
>  realizing and unrealizing VFs at runtime. However, this strategy does
>  not work well for the proposed virtio-net emulation; in this proposal,
>  device options passed in the command line must be maintained as VFs
>  are hotplugged, but they are consumed when the machine starts and not
>  available after that, which makes realizing VFs at runtime impossible.
> >>>
> >>> Could we store the device options in the PF?
> >>
> >> I wrote it's to store the device options, but the problem is actually
> >> more about realizing VFs at runtime instead of at the initialization time.
> >>
> >> Realizing VFs at runtime have two major problems. One is that it delays
> >> the validations of options; invalid options will be noticed when the
> >> guest requests to realize VFs.
> >
> > If PCI spec allows the failure when creating VF, then it should not be
> > a problem.
>
> I doubt the spec cares such a failure at all. VF enablement should
> always work for a real hardware. It's neither user-friendly to tell
> configuration errors at runtime.

I'm not sure which options we should care about? Did you mean netdev
options or the virtio-net specific ones?

If VF stick to the same options as PF (except for the SRIOV), it
should be validated during the PF initialization.

>
> >
> >> netdevs also warn that they are not used
> >> at initialization time, not knowing that they will be used by VFs later.
> >
> > We could invent things to calm down this false positive.
> >
> >> References to other QEMU objects in the option may also die before VFs
> >> are realized.
> >
> > Is there any other thing than netdev we need to consider?
>
> You will also want to set a distinct mac for each VF. Other properties
> does not matter much in my opinion.

Qemu doesn't check mac duplication now. So it's up to the mgmt layer.

>
> >
> >>
> >> The other problem is that QEMU cannot interact with the unrealized VFs.
> >> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
> >> in HMP, you will expect "device_del vf" works, but it's hard to
> >> implement such behaviors with unrealized VFs.
> >
> > I think hotplug can only be done at PF level if we do that.
>
> Assuming you mean to let netdev and mac accept arrays, yes.
>
> >
> >>
> >> I was first going to 

Re: [PULL 20/20] tracing: install trace events file only if necessary

2023-12-11 Thread Carlos Santos
On Mon, Dec 11, 2023 at 4:58 PM Stefan Hajnoczi  wrote:
>
> On Wed, Dec 06, 2023 at 07:26:01AM -0300, Carlos Santos wrote:
> > On Thu, Apr 20, 2023 at 9:10 AM Stefan Hajnoczi  wrote:
> > >
> > > From: Carlos Santos 
> > >
> > > It is not useful when configuring with --enable-trace-backends=nop.
> > >
> > > Signed-off-by: Carlos Santos 
> > > Signed-off-by: Stefan Hajnoczi 
> > > Message-Id: <20230408010410.281263-1-casan...@redhat.com>
> > > ---
> > >  trace/meson.build | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/trace/meson.build b/trace/meson.build
> > > index 8e80be895c..30b1d942eb 100644
> > > --- a/trace/meson.build
> > > +++ b/trace/meson.build
> > > @@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all',
> > >   input: trace_events_files,
> > >   command: [ 'cat', '@INPUT@' ],
> > >   capture: true,
> > > - install: true,
> > > + install: get_option('trace_backends') 
> > > != [ 'nop' ],
> > >   install_dir: qemu_datadir)
> > >
> > >  if 'ust' in get_option('trace_backends')
> > > --
> > > 2.39.2
> > >
> >
> > Hello,
> >
> > I still don't see this in the master branch. Is there something
> > preventing it from being applied?
>
> Hi Carlos,
> Apologies, I dropped this patch when respinning the pull request after
> the CI test failures caused by the zoned patches.
>
> Your patch has been merged on my tracing tree again and will make it
> into qemu.git/master when the development window opens again after the
> QEMU 8.2.0 release (hopefully next Tuesday).
>
> Stefan

Great. Thanks!

-- 
Carlos Santos
Senior Software Maintenance Engineer
Red Hat
casan...@redhat.comT: +55-11-3534-6186




Re: [PULL 20/20] tracing: install trace events file only if necessary

2023-12-11 Thread Stefan Hajnoczi
On Wed, Dec 06, 2023 at 07:26:01AM -0300, Carlos Santos wrote:
> On Thu, Apr 20, 2023 at 9:10 AM Stefan Hajnoczi  wrote:
> >
> > From: Carlos Santos 
> >
> > It is not useful when configuring with --enable-trace-backends=nop.
> >
> > Signed-off-by: Carlos Santos 
> > Signed-off-by: Stefan Hajnoczi 
> > Message-Id: <20230408010410.281263-1-casan...@redhat.com>
> > ---
> >  trace/meson.build | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/trace/meson.build b/trace/meson.build
> > index 8e80be895c..30b1d942eb 100644
> > --- a/trace/meson.build
> > +++ b/trace/meson.build
> > @@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all',
> >   input: trace_events_files,
> >   command: [ 'cat', '@INPUT@' ],
> >   capture: true,
> > - install: true,
> > + install: get_option('trace_backends') != 
> > [ 'nop' ],
> >   install_dir: qemu_datadir)
> >
> >  if 'ust' in get_option('trace_backends')
> > --
> > 2.39.2
> >
> 
> Hello,
> 
> I still don't see this in the master branch. Is there something
> preventing it from being applied?

Hi Carlos,
Apologies, I dropped this patch when respinning the pull request after
the CI test failures caused by the zoned patches.

Your patch has been merged on my tracing tree again and will make it
into qemu.git/master when the development window opens again after the
QEMU 8.2.0 release (hopefully next Tuesday).

Stefan

> 
> -- 
> Carlos Santos
> Senior Software Maintenance Engineer
> Red Hat
> casan...@redhat.comT: +55-11-3534-6186
> 


signature.asc
Description: PGP signature


Re: [PATCH v2 1/2] qdev: add IOThreadVirtQueueMappingList property type

2023-12-11 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 18.09.2023 um 18:16 hat Stefan Hajnoczi geschrieben:
>> virtio-blk and virtio-scsi devices will need a way to specify the
>> mapping between IOThreads and virtqueues. At the moment all virtqueues
>> are assigned to a single IOThread or the main loop. This single thread
>> can be a CPU bottleneck, so it is necessary to allow finer-grained
>> assignment to spread the load.
>> 
>> Introduce DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST() so devices can take a
>> parameter that maps virtqueues to IOThreads. The command-line syntax for
>> this new property is as follows:
>> 
>>   --device 
>> '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]}'
>> 
>> IOThreads are specified by name and virtqueues are specified by 0-based
>> index.
>> 
>> It will be common to simply assign virtqueues round-robin across a set
>> of IOThreads. A convenient syntax that does not require specifying
>> individual virtqueue indices is available:
>> 
>>   --device 
>> '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]}'
>> 
>> Signed-off-by: Stefan Hajnoczi 
>
> When testing this, Qing Wang noticed that "info qtree" crashes. This is
> because the string output visitor doesn't support structs. I suppose
> IOThreadVirtQueueMapping is the first struct type that is used in a qdev
> property type.
>
> So we'll probably have to add some kind of struct support to the string
> output visitor before we can apply this. Even if it's as stupid as just
> printing "" without actually displaying
> the value.

The string visitors have been nothing but trouble.

For input, we can now use keyval_parse() and the QObject input visitor
instead.  Comes with restrictions, but I'd argue it's a more solid base
than the string input visitor.

Perhaps we can do something similar for output: create a suitable
formatter for use it with the QObject output visitor, replacing the
string output visitor.




Re: [PATCH for-8.2] block: Fix AioContext locking in qmp_block_resize()

2023-12-11 Thread Stefan Hajnoczi
On Fri, 8 Dec 2023 at 07:44, Kevin Wolf  wrote:
>
> The AioContext must be unlocked before calling blk_co_unref(), because
> it takes the AioContext lock internally in blk_unref_bh(), which is
> scheduled in the main thread. If we don't unlock, the AioContext is
> locked twice and nested event loops such as in bdrv_graph_wrlock() will
> deadlock.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: https://issues.redhat.com/browse/RHEL-15965
> Fixes: 0c7d204f50c382c6baac8c94bd57af4a022b3888
> Signed-off-by: Kevin Wolf 
> ---
>  blockdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

>From IRC:

09:40 < stefanha> kwolf: "[PATCH for-8.2] block: Fix AioContext
locking in qmp_block_resize()" fixes QEMU 8.1 bug and is not a
regression?
09:41 < stefanha> I'm trying to understand the nature of the issue and
whether to roll an -rc4 tomorrow and delay the QEMU 8.2 release by a
week.
09:41 < kwolf> stefanha: Looks like it, yes
09:41 < kwolf> stefanha: Probably not worth an -rc4 on its own if
there are no other fixes
09:42 < stefanha> Okay, thanks. If nothing else comes up by tomorrow I
will tag v8.2.0 (final) and we can merge this immediately when the
development window and -stable tree opens.

>
> diff --git a/blockdev.c b/blockdev.c
> index 4c1177e8db..c91f49e7b6 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2400,8 +2400,9 @@ void coroutine_fn qmp_block_resize(const char *device, 
> const char *node_name,
>
>  bdrv_co_lock(bs);
>  bdrv_drained_end(bs);
> -blk_co_unref(blk);
>  bdrv_co_unlock(bs);
> +
> +blk_co_unref(blk);
>  }
>
>  void qmp_block_stream(const char *job_id, const char *device,
> --
> 2.43.0
>
>



Re: [PATCH v2 1/2] qdev: add IOThreadVirtQueueMappingList property type

2023-12-11 Thread Kevin Wolf
Am 18.09.2023 um 18:16 hat Stefan Hajnoczi geschrieben:
> virtio-blk and virtio-scsi devices will need a way to specify the
> mapping between IOThreads and virtqueues. At the moment all virtqueues
> are assigned to a single IOThread or the main loop. This single thread
> can be a CPU bottleneck, so it is necessary to allow finer-grained
> assignment to spread the load.
> 
> Introduce DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST() so devices can take a
> parameter that maps virtqueues to IOThreads. The command-line syntax for
> this new property is as follows:
> 
>   --device 
> '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]}'
> 
> IOThreads are specified by name and virtqueues are specified by 0-based
> index.
> 
> It will be common to simply assign virtqueues round-robin across a set
> of IOThreads. A convenient syntax that does not require specifying
> individual virtqueue indices is available:
> 
>   --device 
> '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]}'
> 
> Signed-off-by: Stefan Hajnoczi 

When testing this, Qing Wang noticed that "info qtree" crashes. This is
because the string output visitor doesn't support structs. I suppose
IOThreadVirtQueueMapping is the first struct type that is used in a qdev
property type.

So we'll probably have to add some kind of struct support to the string
output visitor before we can apply this. Even if it's as stupid as just
printing "" without actually displaying
the value.

Kevin




[PATCH] iotests: don't run tests requiring cached writes in '-nocache' mode

2023-12-11 Thread Andrey Drobyshev
There're tests whose logic implies running without O_DIRECT set,
otherwise they fail when running iotests in '-nocache' mode.  For these
tests let's add _require_no_o_direct() helper which can be put in the
preabmle and which makes sure '-nocache' isn't set.  Use it to skip
running the following tests:

  * 271: creates files with unaligned sizes, thus producing multiple
errors like:

qemu-io: can't open device /path/to/t.qcow2.raw: Cannot get 'write'
permission without 'resize': Image size is not a multiple of request alignment

  * 308, file-io-error: use fuse exports.  Though fuse does have
'direct-io' mode (see https://docs.kernel.org/filesystems/fuse-io.html)
we aren't using it yet, thus getting errors like:

qemu-io: can't open device /path/to/t.qcow2.fuse: Could not open
'/path/to/t.qcow2.fuse': filesystem does not support O_DIRECT

Signed-off-by: Andrey Drobyshev 
---
 tests/qemu-iotests/271 | 1 +
 tests/qemu-iotests/308 | 2 ++
 tests/qemu-iotests/common.rc   | 7 +++
 tests/qemu-iotests/tests/file-io-error | 1 +
 4 files changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index 59a6fafa2f..1424b6954d 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -44,6 +44,7 @@ _supported_fmt qcow2
 _supported_proto file nfs
 _supported_os Linux
 _unsupported_imgopts extended_l2 compat=0.10 cluster_size data_file 
refcount_bits=1[^0-9]
+_require_no_o_direct
 
 l2_offset=$((0x4))
 
diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
index de12b2b1b9..535455e5b1 100755
--- a/tests/qemu-iotests/308
+++ b/tests/qemu-iotests/308
@@ -52,6 +52,8 @@ _unsupported_fmt vpc
 _supported_proto file # We create the FUSE export manually
 _supported_os Linux # We need /dev/urandom
 
+_require_no_o_direct
+
 # $1: Export ID
 # $2: Options (beyond the node-name and ID)
 # $3: Expected return value (defaults to 'return')
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 95c12577dd..f61eae73b4 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -857,6 +857,13 @@ _check_o_direct()
 [[ "$out" != *"O_DIRECT"* ]]
 }
 
+_require_no_o_direct()
+{
+if [ $CACHEMODE == "none" ] || [ $CACHEMODE == "directsync" ]; then
+_notrun "not suitable for cache mode: $CACHEMODE (implies O_DIRECT)"
+fi
+}
+
 _require_o_direct()
 {
 if ! _check_o_direct; then
diff --git a/tests/qemu-iotests/tests/file-io-error 
b/tests/qemu-iotests/tests/file-io-error
index 88ee5f670c..2b8dc7f009 100755
--- a/tests/qemu-iotests/tests/file-io-error
+++ b/tests/qemu-iotests/tests/file-io-error
@@ -40,6 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Format-agnostic (we do not use any), but we do test the file protocol
 _supported_proto file
 _require_drivers blkdebug null-co
+_require_no_o_direct
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 # We need `$QEMU_IO -f file` to work; IMGOPTSSYNTAX uses --image-opts,
-- 
2.39.3




Re: [PATCH] block: allocate aligned write buffer for 'truncate -m full'

2023-12-11 Thread Denis V. Lunev

On 12/11/23 11:55, Andrey Drobyshev wrote:

In case we're truncating an image opened with O_DIRECT, we might get
-EINVAL on write with unaligned buffer.  In particular, when running
iotests/298 with '-nocache' we get:

qemu-io: Failed to resize underlying file: Could not write zeros for
preallocation: Invalid argument

Let's just allocate the buffer using qemu_blockalign0() instead.

Signed-off-by: Andrey Drobyshev 
---
  block/file-posix.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b862406c71..cee8de510b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2354,7 +2354,7 @@ static int handle_aiocb_truncate(void *opaque)
  goto out;
  }
  
-buf = g_malloc0(65536);

+buf = qemu_blockalign0(aiocb->bs, 65536);
  
  seek_result = lseek(fd, current_length, SEEK_SET);

  if (seek_result < 0) {
@@ -2413,7 +2413,7 @@ out:
  }
  }
  
-g_free(buf);

+qemu_vfree(buf);
  return result;
  }
  

Reviewed-by: Denis V. Lunev 



[PATCH] block: allocate aligned write buffer for 'truncate -m full'

2023-12-11 Thread Andrey Drobyshev
In case we're truncating an image opened with O_DIRECT, we might get
-EINVAL on write with unaligned buffer.  In particular, when running
iotests/298 with '-nocache' we get:

qemu-io: Failed to resize underlying file: Could not write zeros for
preallocation: Invalid argument

Let's just allocate the buffer using qemu_blockalign0() instead.

Signed-off-by: Andrey Drobyshev 
---
 block/file-posix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b862406c71..cee8de510b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2354,7 +2354,7 @@ static int handle_aiocb_truncate(void *opaque)
 goto out;
 }
 
-buf = g_malloc0(65536);
+buf = qemu_blockalign0(aiocb->bs, 65536);
 
 seek_result = lseek(fd, current_length, SEEK_SET);
 if (seek_result < 0) {
@@ -2413,7 +2413,7 @@ out:
 }
 }
 
-g_free(buf);
+qemu_vfree(buf);
 return result;
 }
 
-- 
2.39.3




Re: [PULL 29/32] virtio-blk: implement BlockDevOps->drained_begin()

2023-12-11 Thread Fiona Ebner
Am 08.12.23 um 09:32 schrieb Kevin Wolf:
> 
> I'm not involved in it myself, but the kind of theme reminds me of this
> downstream bug that Hanna analysed recently:
> 
> https://issues.redhat.com/browse/RHEL-3934
> 
> Does it look like the same root cause to you?
> 

Thank you for the reference! Yes, that does sound like the same root
cause. And the workaround I ended up with is also very similar, but it
was missing kicking the virt queue.

Best Regards,
Fiona




Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

2023-12-11 Thread Akihiko Odaki

On 2023/12/11 16:26, Jason Wang wrote:

On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki  wrote:


On 2023/12/11 11:52, Jason Wang wrote:

On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki  wrote:


Introduction


This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
-netdev user,id=n -netdev user,id=o
-netdev user,id=p -netdev user,id=q
-device pcie-root-port,id=b
-device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
-device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
-device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
-device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs have
function numbers larger than one of the PF, and the function numbers
have a consistent stride.


This seems not user friendly. Any reason we can't just allow user to
specify the stride here?


It should be possible to assign addr automatically without requiring
user to specify the stride. I'll try that in the next version.



Btw, I vaguely remember qemu allows the params to be accepted as a
list. If this is true, we can accept a list of netdev here?


Yes, rocker does that. But the problem is not just about getting
parameters needed for VFs, which I forgot to mention in the cover letter
and will explain below.





Keeping VF instances


A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation; in this proposal,
device options passed in the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and not
available after that, which makes realizing VFs at runtime impossible.


Could we store the device options in the PF?


I wrote it's to store the device options, but the problem is actually
more about realizing VFs at runtime instead of at the initialization time.

Realizing VFs at runtime have two major problems. One is that it delays
the validations of options; invalid options will be noticed when the
guest requests to realize VFs.


If PCI spec allows the failure when creating VF, then it should not be
a problem.


I doubt the spec cares such a failure at all. VF enablement should 
always work for a real hardware. It's neither user-friendly to tell 
configuration errors at runtime.





netdevs also warn that they are not used
at initialization time, not knowing that they will be used by VFs later.


We could invent things to calm down this false positive.


References to other QEMU objects in the option may also die before VFs
are realized.


Is there any other thing than netdev we need to consider?


You will also want to set a distinct mac for each VF. Other properties 
does not matter much in my opinion.






The other problem is that QEMU cannot interact with the unrealized VFs.
For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
in HMP, you will expect "device_del vf" works, but it's hard to
implement such behaviors with unrealized VFs.


I think hotplug can only be done at PF level if we do that.


Assuming you mean to let netdev and mac accept arrays, yes.





I was first going to compromise and allow such quirky behaviors, but I
realized such a compromise is unnecessary if we reuse the PCI power down
logic so I wrote v2.


Haven't checked the code, but anything related to the PM here?


You mean power management? We don't have to care about PCI power down 
for VFs because powering down a SR-IOV PCI device will reset it and thus 
disable its VFs.


Regards,
Akihiko Odaki