Re: [PATCH v4 1/3] hw/block/fdc: Extract blk_create_empty_drive()

2021-11-25 Thread Hanna Reitz

On 24.11.21 17:15, Philippe Mathieu-Daudé wrote:

We are going to re-use this code in the next commit,
so extract it as a new blk_create_empty_drive() function.

Inspired-by: Hanna Reitz 


:)


Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/block/fdc.c | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH v4 3/3] tests/qtest/fdc-test: Add a regression test for CVE-2021-20196

2021-11-25 Thread Hanna Reitz

On 24.11.21 17:15, Philippe Mathieu-Daudé wrote:

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

   AddressSanitizer:DEADLYSIGNAL
   =================================================================
   ==287878==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000344
   ==287878==The signal is caused by a WRITE memory access.
   ==287878==Hint: address points to the zero page.
   #0 0x564b2e5bac27 in blk_inc_in_flight block/block-backend.c:1346:5
   #1 0x564b2e5bb228 in blk_pwritev_part block/block-backend.c:1317:5
   #2 0x564b2e5bcd57 in blk_pwrite block/block-backend.c:1498:11
   #3 0x564b2ca1cdd3 in fdctrl_write_data hw/block/fdc.c:2221:17
   #4 0x564b2ca1b2f7 in fdctrl_write hw/block/fdc.c:829:9
   #5 0x564b2dc49503 in portio_write softmmu/ioport.c:201:9

Add the reproducer for CVE-2021-20196.

Suggested-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
Signed-off-by: Philippe Mathieu-Daudé 
---
  tests/qtest/fdc-test.c | 38 ++++++++++++++++++++++++++++++++++++++
  1 file changed, 38 insertions(+)

diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index 26b69f7c5cd..8f6eee84a47 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -32,6 +32,9 @@
  /* TODO actually test the results and get rid of this */
  #define qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
  
+#define DRIVE_FLOPPY_BLANK \
+    "-drive if=floppy,file=null-co://,file.read-zeroes=on,format=raw,size=1440k"
+
  #define TEST_IMAGE_SIZE 1440 * 1024
  
  #define FLOPPY_BASE 0x3f0

@@ -546,6 +549,40 @@ static void fuzz_registers(void)
  }
  }
  
+static bool qtest_check_clang_sanitizer(void)
+{
+#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
+    return true;
+#else
+    g_test_skip("QEMU not configured using --enable-sanitizers");
+    return false;
+#endif
+}
+static void test_cve_2021_20196(void)
+{
+    QTestState *s;
+
+    if (!qtest_check_clang_sanitizer()) {
+        return;
+    }
+
+    s = qtest_initf("-nographic -m 32M -nodefaults " DRIVE_FLOPPY_BLANK);
+
+    qtest_outw(s, 0x3f4, 0x0500);
+    qtest_outb(s, 0x3f5, 0x00);
+    qtest_outb(s, 0x3f5, 0x00);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outb(s, 0x3f5, 0x00);
+    qtest_outw(s, 0x3f1, 0x0400);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outb(s, 0x3f5, 0x00);
+    qtest_outb(s, 0x3f5, 0x01);
+    qtest_outw(s, 0x3f1, 0x0500);
+    qtest_outb(s, 0x3f5, 0x00);
+    qtest_quit(s);
+}
+


Now this works as a reproducer for me, but... this is a completely 
different I/O sequence now, right?


I can’t complain, though; I didn’t understand the previous one, so I 
can’t claim I need to understand this one or why they’re different.


All the rest looks good to me, so all in all:

Reviewed-by: Hanna Reitz 




Re: [PATCH v2 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-25 Thread Łukasz Gieryk
On Wed, Nov 24, 2021 at 03:26:30PM +0100, Łukasz Gieryk wrote:
> On Wed, Nov 24, 2021 at 09:04:31AM +0100, Klaus Jensen wrote:
> > On Nov 16 16:34, Łukasz Gieryk wrote:
> > > With four new properties:
> > >  - sriov_v{i,q}_flexible,
> > >  - sriov_max_v{i,q}_per_vf,
> > > one can configure the number of available flexible resources, as well as
> > > the limits. The primary and secondary controller capability structures
> > > are initialized accordingly.
> > > 
> > > Since the number of available queues (interrupts) now varies between
> > > VF/PF, BAR size calculation is also adjusted.
> > > 
> > > Signed-off-by: Łukasz Gieryk 
> > > ---
> > >  hw/nvme/ctrl.c   | 138 ---
> > >  hw/nvme/nvme.h   |   4 ++
> > >  include/block/nvme.h |   5 ++
> > >  3 files changed, 140 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > index f8f5dfe204..f589ffde59 100644
> > > --- a/hw/nvme/ctrl.c
> > > +++ b/hw/nvme/ctrl.c
> > > @@ -6358,13 +6444,40 @@ static void nvme_init_state(NvmeCtrl *n)
> > >      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> > >      n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
> > >  
> > > -    list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
> > > -    for (i = 0; i < n->params.sriov_max_vfs; i++) {
> > > +    list->numcntl = cpu_to_le16(max_vfs);
> > > +    for (i = 0; i < max_vfs; i++) {
> > >          sctrl = &list->sec[i];
> > >          sctrl->pcid = cpu_to_le16(n->cntlid);
> > >      }
> > >  
> > >      cap->cntlid = cpu_to_le16(n->cntlid);
> > > +    cap->crt = NVME_CRT_VQ | NVME_CRT_VI;
> > > +
> > > +    if (pci_is_vf(&n->parent_obj)) {
> > > +        cap->vqprt = cpu_to_le16(1 + n->conf_ioqpairs);
> > > +    } else {
> > > +        cap->vqprt = cpu_to_le16(1 + n->params.max_ioqpairs -
> > > +                                 n->params.sriov_vq_flexible);
> > > +        cap->vqfrt = cpu_to_le32(n->params.sriov_vq_flexible);
> > > +        cap->vqrfap = cap->vqfrt;
> > > +        cap->vqgran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
> > > +        cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
> > > +            cpu_to_le16(n->params.sriov_max_vq_per_vf) :
> > > +            cap->vqprt;
> > 
> > That this defaults to VQPRT doesn't seem right. It should default to
> > VQFRT. It does not make sense to report a maximum number of assignable
> > flexible resources that is bigger than the number of flexible resources
> > available.
> 
> I’ve explained in one of the v1 threads why I think using the current
> default is better than VQPRT.
> 
> What you’ve noticed is indeed an inconvenience, but it’s (at least in
> my opinion) part of the design. What matters is the current number of
> unassigned flexible resources. It may be lower than VQFRSM for
> multiple reasons:
>  1) resources are bound to the PF,
>  2) resources are bound to other VFs,
>  3) resources simply don’t exist (not baked into silicon: VQFRT < VQFRSM).
> 
> If 1) and 2) are allowed to happen, and the user must be aware of that,
> then why shouldn’t 3) be?
> 

I’ve done some more thinking, and now I’m happy neither with my
version nor with the suggested VQPRT.

How about using this formula instead:

v{q,i}frsm = sriov_max_v{q,i}_per_vf ? sriov_max_v{q,i}_per_vf :
             floor(sriov_v{q,i}_flexible / sriov_max_vfs)

v{q,i}frsm would end up with values similar/proportional to those
reported by an actual SR-IOV-capable device available on the market.
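
For illustration, a rough sketch of how that default could look in
nvme_init_state(), reusing the field names from the hunk quoted above
(just a sketch of the proposal, not final code; the vifrsm lines mirror
the vqfrsm hunk and are my extrapolation; the C integer division
provides the floor):

    cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
        cpu_to_le16(n->params.sriov_max_vq_per_vf) :
        cpu_to_le16(n->params.sriov_vq_flexible / n->params.sriov_max_vfs);
    cap->vifrsm = n->params.sriov_max_vi_per_vf ?
        cpu_to_le16(n->params.sriov_max_vi_per_vf) :
        cpu_to_le16(n->params.sriov_vi_flexible / n->params.sriov_max_vfs);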




Re: [PATCH v4 3/3] tests/qtest/fdc-test: Add a regression test for CVE-2021-20196

2021-11-25 Thread Philippe Mathieu-Daudé
On 11/25/21 12:57, Hanna Reitz wrote:
> On 24.11.21 17:15, Philippe Mathieu-Daudé wrote:
>> Without the previous commit, when running 'make check-qtest-i386'
>> with QEMU configured with '--enable-sanitizers' we get:
>>
>>    AddressSanitizer:DEADLYSIGNAL
>>    =================================================================
>>    ==287878==ERROR: AddressSanitizer: SEGV on unknown address
>> 0x000000000344
>>    ==287878==The signal is caused by a WRITE memory access.
>>    ==287878==Hint: address points to the zero page.
>>    #0 0x564b2e5bac27 in blk_inc_in_flight
>> block/block-backend.c:1346:5
>>    #1 0x564b2e5bb228 in blk_pwritev_part block/block-backend.c:1317:5
>>    #2 0x564b2e5bcd57 in blk_pwrite block/block-backend.c:1498:11
>>    #3 0x564b2ca1cdd3 in fdctrl_write_data hw/block/fdc.c:2221:17
>>    #4 0x564b2ca1b2f7 in fdctrl_write hw/block/fdc.c:829:9
>>    #5 0x564b2dc49503 in portio_write softmmu/ioport.c:201:9
>>
>> Add the reproducer for CVE-2021-20196.
>>
>> Suggested-by: Alexander Bulekov 
>> Reviewed-by: Darren Kenny 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>   tests/qtest/fdc-test.c | 38 ++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 38 insertions(+)
>>
>> diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
>> index 26b69f7c5cd..8f6eee84a47 100644
>> --- a/tests/qtest/fdc-test.c
>> +++ b/tests/qtest/fdc-test.c
>> @@ -32,6 +32,9 @@
>>   /* TODO actually test the results and get rid of this */
>>   #define qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
>>   +#define DRIVE_FLOPPY_BLANK \
>> +    "-drive
>> if=floppy,file=null-co://,file.read-zeroes=on,format=raw,size=1440k"
>> +
>>   #define TEST_IMAGE_SIZE 1440 * 1024
>>     #define FLOPPY_BASE 0x3f0
>> @@ -546,6 +549,40 @@ static void fuzz_registers(void)
>>   }
>>   }
>>   +static bool qtest_check_clang_sanitizer(void)
>> +{
>> +#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
>> +    return true;
>> +#else
>> +    g_test_skip("QEMU not configured using --enable-sanitizers");
>> +    return false;
>> +#endif
>> +}
>> +static void test_cve_2021_20196(void)
>> +{
>> +    QTestState *s;
>> +
>> +    if (!qtest_check_clang_sanitizer()) {
>> +    return;
>> +    }
>> +
>> +    s = qtest_initf("-nographic -m 32M -nodefaults "
>> DRIVE_FLOPPY_BLANK);
>> +
>> +    qtest_outw(s, 0x3f4, 0x0500);
>> +    qtest_outb(s, 0x3f5, 0x00);
>> +    qtest_outb(s, 0x3f5, 0x00);
>> +    qtest_outw(s, 0x3f4, 0x0000);
>> +    qtest_outb(s, 0x3f5, 0x00);
>> +    qtest_outw(s, 0x3f1, 0x0400);
>> +    qtest_outw(s, 0x3f4, 0x0000);
>> +    qtest_outw(s, 0x3f4, 0x0000);
>> +    qtest_outb(s, 0x3f5, 0x00);
>> +    qtest_outb(s, 0x3f5, 0x01);
>> +    qtest_outw(s, 0x3f1, 0x0500);
>> +    qtest_outb(s, 0x3f5, 0x00);
>> +    qtest_quit(s);
>> +}
>> +
> 
> Now this works as a reproducer for me, but... this is a completely
> different I/O sequence now, right?

The patch Alexander sent [1] was indeed not working, but I could
reproduce the issue manually. I then figured out that while the I/O
sequence in the commit *description* was working, the one in the patch
*content* was not accurate. This patch uses the sequence from the
commit description.

[1] https://www.mail-archive.com/qemu-block@nongnu.org/msg82825.html

> I can’t complain, though; I didn’t understand the previous one, so I
> can’t claim I need to understand this one or why they’re different.

Same here =)

> All the rest looks good to me, so all in all:
> 
> Reviewed-by: Hanna Reitz 

Thank you!




Re: Questions about losing the write lock of raw-format disks after migration

2021-11-25 Thread Hanna Reitz

On 24.11.21 13:56, Peng Liang wrote:

Hi folks,

When we tested migration with a raw-format disk, we found that the QEMU
process on the destination loses the write lock after migration.
However, the destination QEMU process still holds the write lock for a
qcow2-format disk.

After reading some of the block layer's code, I found that the first
blk_set_perm() call in blk_root_activate() sets blk->shared_perm to
BLK_PERM_ALL (i.e., shares all permissions).  Then in
blk_vm_state_changed(), blk_set_perm() sets the shared permissions to
blk->shared_perm, which is now BLK_PERM_ALL.  And that prevents
raw_handle_perm_lock() from taking the write lock.

So I tried the following patch, and it fixes the problem:
diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea17..96518fd1f0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -197,13 +197,6 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
 
     blk->disable_perm = false;
 
-    blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk->disable_perm = true;
-        return;
-    }
-
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* Activation can happen when migration process is still active, for
          * example when nbd_server_add is called during non-shared storage

I'm new to the block layer and I'm not sure this is the right fix for
the problem.  Do you have any ideas about the problem and the patch?


Hi Peng,

Thanks for your report!  I can reproduce this problem.

It appears to me the problem is that blk_set_perm(), well, sets 
blk->perm and blk->shared_perm.  So by once calling it with 
BLK_PERM_ALL, we override blk->shared_perm to be BLK_PERM_ALL from then 
on, even though the guest device has not set that at all.  We later 
call blk_set_perm(blk->perm, blk->shared_perm) (in 
blk_vm_state_changed()); however, blk->shared_perm is now BLK_PERM_ALL, 
so this is a no-op.  That means that any restrictions the guest device 
has imposed (like the default share-rw=off) are not reflected in the 
block device’s permissions.
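
In a condensed form, the sequence is (a simplified illustration; the
real calls live in blk_root_activate() and blk_vm_state_changed()):

    /* blk_root_activate(): share everything for now...              */
    blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
    /* ...but this also sets blk->shared_perm = BLK_PERM_ALL         */

    /* blk_vm_state_changed(): meant to restrict the sharing again   */
    blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
    /* no-op: blk->shared_perm is already BLK_PERM_ALL by this point */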


This is not apparent with qcow2, because the qcow2 format imposes its 
own restrictions in addition to those of the guest device.


I think the right way to fix this is to save blk->shared_perm somewhere 
and then restore it after the blk_set_perm(BLK_PERM_ALL) call.  I’ll 
send a patch (with a test case).


Hanna




[PATCH 0/2] block-backend: Retain permissions after migration

2021-11-25 Thread Hanna Reitz
Hi,

Peng Liang has reported an issue regarding migration of raw images here:
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00673.html

It turns out that after migrating, all permissions are shared when they
weren’t before.  The cause of the problem is that we deliberately delay
restricting the shared permissions until migration is really done (until
the runstate is no longer INMIGRATE) and first share all permissions;
but this causes us to lose the original shared permission mask,
overwriting it with BLK_PERM_ALL, so once we do try to restrict the
shared permissions, we only share them all again.

Fix this by saving the set of shared permissions across the first
blk_set_perm() call that shares all of them; and add a regression test.


I don’t believe we have to fix this in 6.2, because I think this bug has
existed for four years now.  (I.e. it isn’t critical, and it’s no
regression.)


Hanna Reitz (2):
  block-backend: Retain permissions after migration
  iotests/migration-permissions: New test

 block/block-backend.c |  11 ++
 .../qemu-iotests/tests/migration-permissions  | 101 ++
 .../tests/migration-permissions.out   |   5 +
 3 files changed, 117 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/migration-permissions
 create mode 100644 tests/qemu-iotests/tests/migration-permissions.out

-- 
2.33.1




[PATCH 1/2] block-backend: Retain permissions after migration

2021-11-25 Thread Hanna Reitz
After migration, the permissions the guest device wants to impose on its
BlockBackend are stored in blk->perm and blk->shared_perm.  In
blk_root_activate(), we take our permissions, but keep all shared
permissions open by calling `blk_set_perm(blk->perm, BLK_PERM_ALL)`.

Only afterwards (immediately or later, depending on the runstate) do we
restrict the shared permissions by calling
`blk_set_perm(blk->perm, blk->shared_perm)`.  Unfortunately, our first
call with shared_perm=BLK_PERM_ALL has overwritten blk->shared_perm to
be BLK_PERM_ALL, so this is a no-op and the set of shared permissions is
not restricted.

Fix this bug by saving the set of shared permissions before invoking
blk_set_perm() with BLK_PERM_ALL and restoring it afterwards.

Fixes: 5f7772c4d0cf32f4e779fcd5a69ae4dae24aeebf
   ("block-backend: Defer shared_perm tightening migration
   completion")
Reported-by: Peng Liang 
Signed-off-by: Hanna Reitz 
---
 block/block-backend.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea17..41e388fe1f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -190,6 +190,7 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
 {
     BlockBackend *blk = child->opaque;
     Error *local_err = NULL;
+    uint64_t saved_shared_perm;
 
     if (!blk->disable_perm) {
         return;
@@ -197,12 +198,22 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
 
     blk->disable_perm = false;
 
+    /*
+     * blk->shared_perm contains the permissions we want to share once
+     * migration is really completely done.  For now, we need to share
+     * all; but we also need to retain blk->shared_perm, which is
+     * overwritten by a successful blk_set_perm() call.  Save it and
+     * restore it below.
+     */
+    saved_shared_perm = blk->shared_perm;
+
     blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         blk->disable_perm = true;
         return;
     }
+    blk->shared_perm = saved_shared_perm;
 
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* Activation can happen when migration process is still active, for
-- 
2.33.1




[PATCH 2/2] iotests/migration-permissions: New test

2021-11-25 Thread Hanna Reitz
This test checks that a raw image in use by a virtio-blk device does not
share the WRITE permission both before and after migration.

Signed-off-by: Hanna Reitz 
---
 .../qemu-iotests/tests/migration-permissions  | 101 ++
 .../tests/migration-permissions.out   |   5 +
 2 files changed, 106 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/migration-permissions
 create mode 100644 tests/qemu-iotests/tests/migration-permissions.out

diff --git a/tests/qemu-iotests/tests/migration-permissions b/tests/qemu-iotests/tests/migration-permissions
new file mode 100755
index 0000000000..6be02581c7
--- /dev/null
+++ b/tests/qemu-iotests/tests/migration-permissions
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+# group: migration
+#
+# Copyright (C) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, qemu_io
+
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+mig_sock = os.path.join(iotests.sock_dir, 'mig.sock')
+
+
+class TestMigrationPermissions(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img_create('-f', imgfmt, test_img, '1M')
+
+        # Set up two VMs (source and destination) accessing the same raw
+        # image file with a virtio-blk device; prepare the destination for
+        # migration with .add_incoming() and enable migration events
+        vms = [None, None]
+        for i in range(2):
+            vms[i] = iotests.VM(path_suffix=f'{i}')
+            vms[i].add_blockdev(f'file,node-name=prot,filename={test_img}')
+            vms[i].add_blockdev(f'{imgfmt},node-name=fmt,file=prot')
+            vms[i].add_device('virtio-blk,drive=fmt')
+
+            if i == 1:
+                vms[i].add_incoming(f'unix:{mig_sock}')
+
+            vms[i].launch()
+
+            result = vms[i].qmp('migrate-set-capabilities',
+                                capabilities=[
+                                    {'capability': 'events', 'state': True}
+                                ])
+            self.assert_qmp(result, 'return', {})
+
+        self.vm_s = vms[0]
+        self.vm_d = vms[1]
+
+    def tearDown(self):
+        self.vm_s.shutdown()
+        self.vm_d.shutdown()
+        try:
+            os.remove(mig_sock)
+        except FileNotFoundError:
+            pass
+        os.remove(test_img)
+
+    # Migrate an image in use by a virtio-blk device to another VM and
+    # verify that the WRITE permission is unshared both before and after
+    # migration
+    def test_post_migration_permissions(self):
+        # Try to access the image R/W, which should fail because virtio-blk
+        # has not been configured with share-rw=on
+        log = qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+        if not log.strip():
+            print('ERROR (pre-migration): qemu-io should not be able to '
+                  'access this image, but it reported no error')
+        else:
+            # This is the expected output
+            assert 'Is another process using the image' in log
+
+        # Now migrate the VM
+        self.vm_s.qmp('migrate', uri=f'unix:{mig_sock}')
+        assert self.vm_s.wait_migration(None)
+        assert self.vm_d.wait_migration(None)
+
+        # Try the same qemu-io access again, verifying that the WRITE
+        # permission remains unshared
+        log = qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+        if not log.strip():
+            print('ERROR (post-migration): qemu-io should not be able to '
+                  'access this image, but it reported no error')
+        else:
+            # This is the expected output
+            assert 'Is another process using the image' in log
+
+
+if __name__ == '__main__':
+    # Only works with raw images because we are testing the
+    # BlockBackend permissions; image format drivers may additionally
+    # unshare permissions and thus tamper with the result
+    iotests.main(supported_fmts=['raw'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/migration-permissions.out b/tests/qemu-iotests/tests/migration-permissions.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/migration-permissions.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
2.33.1




Re: [PATCH 1/2] block-backend: Retain permissions after migration

2021-11-25 Thread Philippe Mathieu-Daudé
On 11/25/21 14:53, Hanna Reitz wrote:
> After migration, the permissions the guest device wants to impose on its
> BlockBackend are stored in blk->perm and blk->shared_perm.  In
> blk_root_activate(), we take our permissions, but keep all shared
> permissions open by calling `blk_set_perm(blk->perm, BLK_PERM_ALL)`.
> 
> Only afterwards (immediately or later, depending on the runstate) do we
> restrict the shared permissions by calling
> `blk_set_perm(blk->perm, blk->shared_perm)`.  Unfortunately, our first
> call with shared_perm=BLK_PERM_ALL has overwritten blk->shared_perm to
> be BLK_PERM_ALL, so this is a no-op and the set of shared permissions is
> not restricted.
> 
> Fix this bug by saving the set of shared permissions before invoking
> blk_set_perm() with BLK_PERM_ALL and restoring it afterwards.
> 
> Fixes: 5f7772c4d0cf32f4e779fcd5a69ae4dae24aeebf
>("block-backend: Defer shared_perm tightening migration
>completion")
> Reported-by: Peng Liang 
> Signed-off-by: Hanna Reitz 
> ---
>  block/block-backend.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 00/15] hw/nvme: SR-IOV with Virtualization Enhancements

2021-11-25 Thread Łukasz Gieryk
On Wed, Nov 24, 2021 at 09:03:06AM +0100, Klaus Jensen wrote:
> Hi Lukasz,
> 
> I've been through this. I have a couple of review comments, but overall
> looks good for inclusion in nvme-next. Would be nice to get this in
> early in the cycle so it can mature there for 7.0.

We (I’m speaking on behalf of the other Lukasz) are really happy to
read that. We will do our best to make it happen.

> 
> I'd like that we mark this support experimental, so we can easily do
> some changes to how parameters work since I'm not sure we completely
> agree on that yet.
> 
> By the way, in the future, please add me and Keith as CCs on the entire
> series so we get CC'ed on replies to the cover-letter ;)
> 

> > List of known gaps and nice-to-haves:
> > 
> > 1) Interaction of secondary controllers with namespaces is not 100%
> > following the spec
> > 
> > The limitation: a VF has to be visible on the PCI bus first, and only
> > then can such a VF have a namespace attached.
> > 
> 
> Looking at the spec I'm not even sure what the expected behavior is
> supposed to be; can you elaborate? I rebased this on latest, and with
> Hannes' changes, shared namespaces will be attached by default, which
> seems to be reasonable.

An example flow:

# Release flexible resources from PF (assuming it’s /dev/nvme0)
nvme virt-mgmt -c 0 -r 0 -n 0 -a 1 /dev/nvme0
nvme virt-mgmt -c 0 -r 1 -n 0 -a 1 /dev/nvme0
echo 1 > /sys/class/nvme/nvme0/reset_controller
# Bind sane minimums to VF1 (cntlid=1) and set it online
nvme virt-mgmt -c 1 -r 0 -n 5 -a 8 /dev/nvme0
nvme virt-mgmt -c 1 -r 1 -n 5 -a 8 /dev/nvme0
nvme virt-mgmt -c 1 -a 9 /dev/nvme0
# Enable 2 VFs
echo 2 > /sys/bus/pci/devices//sriov_numvfs
# PF, VF1 and VF2 are visible on PCI
lspci | grep Non-Volatile
# The NVMe driver is bound to PF and VF1 (the only online VF)
nvme list -v
# VFs shall eventually not support Ns Management/Attachment commands,
# and namespaces should be attached to VFs (i.e., their secondary
# controllers) through the PF.
# A namespace can be attached to VF1, VF2
nvme attach-ns /dev/nvme0 -c 1 -n X
nvme attach-ns /dev/nvme0 -c 2 -n X
# According to the spec this should also succeed, but today it won’t
nvme attach-ns /dev/nvme0 -c 3 -n X

VF3’s NvmeCtrl object is not yet allocated, so today there’s nothing
for nvme_subsys_ctrl() to return for cntlid=3, besides NULL (the
current behavior) or SUBSYS_SLOT_RSVD.
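
For reference, this is roughly what the lookup does today (a condensed
sketch from memory, not the exact code in hw/nvme/nvme.h):

    static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
                                             uint32_t cntlid)
    {
        if (!subsys || cntlid >= ARRAY_SIZE(subsys->ctrls)) {
            return NULL;
        }
        /* NULL until the VF's NvmeCtrl is allocated, i.e. until the
         * VF is enabled on the PCI bus */
        return subsys->ctrls[cntlid];
    }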

The relevant use cases:
 - an admin can configure disabled VFs,
 - information about attached namespaces persists when VFs are disabled,
are not that critical, but of course it’s a discrepancy from what a
real device can handle.

In my opinion, to handle these cases correctly, the information about
attached namespaces could be moved to the subsystem. Could you share
your thoughts on whether such an approach would make sense?