On Wed, Jun 22, 2022 at 6:38 AM Dan Williams <[email protected]> wrote:
>
> Jason Wang wrote:
> > The NVDIMM region could be available before the virtio_device_ready()
> > that is called by virtio_dev_probe(). This means the driver tries to
> > use device before DRIVER_OK which violates the spec, fixing this by
> > set device ready before the nvdimm_pmem_region_create().
>
> Can you clarify the failure path. What race is virtio_device_ready()
> losing?

So it's something like this:

1) virtio_device_ready() will set DRIVER_OK to the device.
2) virtio spec disallow device to process a virtqueue without DRIVER_OK to set

But the nd_region is available to user after nd_region_create(), and a
flush could be issued before virtio_device_ready().

This means the hypervisor gets a kick on the virtqueue before
DRIVER_OK. The hypervisor should choose not to respond to that kick
according to the spec. This will result in infinite wait in
virtio_pmem_flush().

Fortunately, qemu doesn't check DRIVER_OK and can process the
virtqueue without DRIVER_OK (which is kind of a spec violation), so we
survive for the past few years. But there's no guarantee it can work
for other hypervisor.

So we need to set DRIVER_OK before nd_region_create() to make sure flush works.

>
> >
> > Note that this means the virtio_pmem_host_ack() could be triggered
> > before the creation of the nd region, this is safe since the
> > virtio_pmem_host_ack() since pmem_lock has been initialized and we
> > check if we've added any buffer before trying to proceed.
>
> I got a little bit lost with the usage of "we" here. Can you clarify
> which function / context is making which guarantee?

By "we" I meant the callback for the req_vq that is virtio_pmem_host_ack().

If we do virtio_device_ready() before nd_region_create(). A buggy or
malicious hypervisor can raise the notification before
nd_region_create(). We need to make sure virtio_pmem_host_ack() can
survive from this. And since we've checked whether we've submitted any
request before, so in the case where guest memory is protected from
the host, we're safe here.

>
> >
> > Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Jason Wang <[email protected]>
> > ---
> >  drivers/nvdimm/virtio_pmem.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > index 48f8327d0431..173f2f5adaea 100644
> > --- a/drivers/nvdimm/virtio_pmem.c
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -84,6 +84,17 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >       ndr_desc.provider_data = vdev;
> >       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> >       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +     /*
> > +      * The NVDIMM region could be available before the
> > +      * virtio_device_ready() that is called by
> > +      * virtio_dev_probe(), so we set device ready here.
> > +      *
> > +      * The callback - virtio_pmem_host_ack() is safe to be called
> > +      * before the nvdimm_pmem_region_create() since the pmem_lock
> > +      * has been initialized and legality of a used buffer is
> > +      * validated before moving forward.
>
> This comment feels like changelog material.

I had this in the changelog:

> > Note that this means the virtio_pmem_host_ack() could be triggered
> > before the creation of the nd region, this is safe since the
> > virtio_pmem_host_ack() since pmem_lock has been initialized and we
> > check if we've added any buffer before trying to proceed.

> Just document why
> virtio_device_ready() must be called before device_add() of the
> nd_region.

This comment wants to explain the side effect of having
virtio_device_ready() before nvdimm_pmem_region_create() and why we
can survive from that.

Thanks

>
> > +      */
> > +     virtio_device_ready(vdev);
> >       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> >       if (!nd_region) {
> >               dev_err(&vdev->dev, "failed to create nvdimm region\n");
> > @@ -92,6 +103,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >       }
> >       return 0;
> >  out_nd:
> > +     virtio_reset_device(vdev);
> >       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> >  out_vq:
> >       vdev->config->del_vqs(vdev);
> > --
> > 2.25.1
> >
>


Reply via email to