Re: [PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
On Wed, Jan 09, 2019 at 11:35:52AM +0100, Miguel Ojeda wrote:
> On Tue, Jan 8, 2019 at 6:44 PM Nick Desaulniers wrote:
> >
> > Also for more context, see:
> > commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against
> > dead store elimination")
>
> By the way, shouldn't that barrier_data() be directly in compiler.h
> too, since it is for both gcc & clang?
>
> > Reviewed-by: Nick Desaulniers
> >
> > + Miguel
> >
> > Miguel, would you mind taking this into your compiler-attributes tree?
>
> Sure, at least we get quickly some linux-next time.

BTW why linux-next? Shouldn't this go into 5.0 and stable? It's a bugfix, after all.

> Note it would be nice to separate the patch into two (one for the
> comments, another for OPTIMIZER_HIDE_VAR), and also possibly another
> for barrier_data().
>
> Cheers,
> Miguel

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
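The two primitives under discussion can be illustrated with a small userspace sketch. These are simplified stand-ins modelled on the kernel macros (the asm constraint trick is the one used by commit 7829fb09a2b4), not the kernel's exact definitions:

```c
#include <assert.h>
#include <string.h>

/* OPTIMIZER_HIDE_VAR launders a value through an empty asm statement so
 * the optimizer can no longer track its provenance; barrier_data() tells
 * the compiler that the memory behind a pointer is "used", so a preceding
 * memset cannot be eliminated as a dead store. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
#define barrier_data(ptr) __asm__ __volatile__ ("" : : "r" (ptr) : "memory")

/* Userspace analogue of the kernel's memzero_explicit(): zero a buffer
 * (e.g. key material) in a way that survives dead-store elimination. */
void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	barrier_data(s);	/* keep the memset even at -O2 */
}
```

Compiling with gcc or clang at -O2, the memset is retained even though the buffer is never read afterwards; without the barrier both compilers are free to delete it.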
Re: [PATCH v3 0/5] kvm "virtio pmem" device
On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> This patch series has implementation for "virtio pmem".
> "virtio pmem" is fake persistent memory(nvdimm) in guest
> which allows to bypass the guest page cache. This also
> implements a VIRTIO based asynchronous flush mechanism.

Hmmm. Sharing the host page cache directly into the guest VM. Sounds
like a good idea, but...

This means the guest VM can now run timing attacks to observe host
side page cache residency, and depending on the implementation I'm
guessing that the guest will be able to control host side page cache
eviction, too (e.g. via discard or hole punch operations).

Which means this functionality looks to me like a new vector for
information leakage into and out of the guest VM via guest controlled
host page cache manipulation.

https://arxiv.org/pdf/1901.01161

I might be wrong, but if I'm not we're going to have to be very
careful about how guest VMs can access and manipulate host side
resources like the page cache.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] virtio-ccw: wire up ->bus_name callback
On Thu, 3 Jan 2019 15:11:45 +0100 Cornelia Huck wrote:
> Return the bus id of the ccw proxy device. This makes 'ethtool -i'
> show a more useful value than 'virtio' in the bus-info field.
>
> Signed-off-by: Cornelia Huck
> ---
>  drivers/s390/virtio/virtio_ccw.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Queued.
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On Wed, Jan 09, 2019 at 07:22:50PM +0100, Christian Borntraeger wrote:
> On 09.01.2019 15:52, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
> >> On 09.01.2019 11:35, Wei Wang wrote:
> >>> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> >>>> On 08.01.2019 06:35, Wei Wang wrote:
> >>>>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
> >>>>>> On 07.01.2019 08:01, Wei Wang wrote:
> >>>>>>> virtio-ccw has deadlock issues with reading the config space inside the
> >>>>>>> interrupt context, so we tweak the virtballoon_changed implementation
> >>>>>>> by moving the config read operations into the related workqueue
> >>>>>>> contexts.
> >>>>>>> The config_read_bitmap is used as a flag to the workqueue callbacks
> >>>>>>> about the related config fields that need to be read.
> >>>>>>>
> >>>>>>> The cmd_id_received is also renamed to cmd_id_received_cache, and
> >>>>>>> the value should be obtained via virtio_balloon_cmd_id_received.
> >>>>>>>
> >>>>>>> Reported-by: Christian Borntraeger
> >>>>>>> Signed-off-by: Wei Wang
> >>>>>>> Reviewed-by: Cornelia Huck
> >>>>>>> Reviewed-by: Halil Pasic
> >>>>>> Together with
> >>>>>> virtio_pci: use queue idx instead of array idx to set up the vq
> >>>>>> virtio: don't allocate vqs when names[i] = NULL
> >>>>>>
> >>>>>> Tested-by: Christian Borntraeger
> >>>>> OK. I don't plan to send a new version of the above patches as no
> >>>>> changes needed so far.
> >>>>>
> >>>>> Michael, if the above two patches look good to you, please help add
> >>>>> the related tested-by and reviewed-by tags. Thanks.
> >>>> Can we also make sure that
> >>>>
> >>>> virtio_pci: use queue idx instead of array idx to set up the vq
> >>>> virtio: don't allocate vqs when names[i] = NULL
> >>>>
> >>>> also land in stable?
> >>>>
> >>> You could also send the request to stable after it gets merged to Linus'
> >>> tree. The stable review committee will decide whether to take it.
> >>>
> >>> Please see Option 2:
> >>>
> >>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> >>>
> >> Those patches are not upstream yet, Correct?
> >>
> >> Michael,
> >>
> >> can you add the stable tag before submitting? If not, can you give me a
> >> heads up when doing the pull request so that I can ping the stable folks.
> >
> > Can you reply to patches that you feel are needed on stable with just
> >
> > Cc: sta...@vger.kernel.org
> >
> > in the message body?
> >
> > Then it's all automatically handled by ack attaching scripts.
>
> Done. But this only works if those patches are not already part of a tree.
> I guess they have to go via your tree, correct?

Yes. It works because I rebase my tree.

-- 
MST
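The deadlock avoidance described in the quoted patch — the interrupt handler only records *which* config fields need reading in a bitmap, and the actual config space access happens later in workqueue context — can be sketched in userspace C. Names and bit numbers here are illustrative stand-ins, not the driver's actual ones:

```c
#include <stdatomic.h>

/* Hypothetical bit number for "the free-page command id needs re-reading". */
enum { CONFIG_READ_CMD_ID = 0 };

atomic_ulong config_read_bitmap;
unsigned long cmd_id_received_cache;	/* written only by the worker */

/* Interrupt context: must not touch the virtio config space (deadlocks on
 * virtio-ccw), so just set the pending bit and kick the worker. */
void virtballoon_changed(void)
{
	atomic_fetch_or(&config_read_bitmap, 1UL << CONFIG_READ_CMD_ID);
	/* the real driver would queue_work(...) here */
}

/* Workqueue context: safe to read the config space; consume the pending
 * bits and perform the deferred reads. */
void config_read_worker(unsigned long config_value)
{
	unsigned long pending = atomic_exchange(&config_read_bitmap, 0);

	if (pending & (1UL << CONFIG_READ_CMD_ID))
		cmd_id_received_cache = config_value;	/* the deferred read */
}
```

The bitmap doubles as the "work pending" flag: a worker invocation with no bits set performs no config access at all.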
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On 09.01.2019 15:52, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
>> On 09.01.2019 11:35, Wei Wang wrote:
>>> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
>>>> On 08.01.2019 06:35, Wei Wang wrote:
>>>>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
>>>>>> On 07.01.2019 08:01, Wei Wang wrote:
>>>>>>> virtio-ccw has deadlock issues with reading the config space inside the
>>>>>>> interrupt context, so we tweak the virtballoon_changed implementation
>>>>>>> by moving the config read operations into the related workqueue
>>>>>>> contexts.
>>>>>>> The config_read_bitmap is used as a flag to the workqueue callbacks
>>>>>>> about the related config fields that need to be read.
>>>>>>>
>>>>>>> The cmd_id_received is also renamed to cmd_id_received_cache, and
>>>>>>> the value should be obtained via virtio_balloon_cmd_id_received.
>>>>>>>
>>>>>>> Reported-by: Christian Borntraeger
>>>>>>> Signed-off-by: Wei Wang
>>>>>>> Reviewed-by: Cornelia Huck
>>>>>>> Reviewed-by: Halil Pasic
>>>>>> Together with
>>>>>> virtio_pci: use queue idx instead of array idx to set up the vq
>>>>>> virtio: don't allocate vqs when names[i] = NULL
>>>>>>
>>>>>> Tested-by: Christian Borntraeger
>>>>> OK. I don't plan to send a new version of the above patches as no
>>>>> changes needed so far.
>>>>>
>>>>> Michael, if the above two patches look good to you, please help add the
>>>>> related tested-by and reviewed-by tags. Thanks.
>>>> Can we also make sure that
>>>>
>>>> virtio_pci: use queue idx instead of array idx to set up the vq
>>>> virtio: don't allocate vqs when names[i] = NULL
>>>>
>>>> also land in stable?
>>>>
>>> You could also send the request to stable after it gets merged to Linus'
>>> tree. The stable review committee will decide whether to take it.
>>>
>>> Please see Option 2:
>>>
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>
>> Those patches are not upstream yet, Correct?
>>
>> Michael,
>>
>> can you add the stable tag before submitting? If not, can you give me a
>> heads up when doing the pull request so that I can ping the stable folks.
>
> Can you reply to patches that you feel are needed on stable with just
>
> Cc: sta...@vger.kernel.org
>
> in the message body?
>
> Then it's all automatically handled by ack attaching scripts.

Done. But this only works if those patches are not already part of a tree.
I guess they have to go via your tree, correct?
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
On Wed, Jan 9, 2019 at 12:36 PM Daniel Vetter wrote:
>
> On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann wrote:
> >
> > On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > > The buffer object must be reserved before calling
> > > > ttm_bo_validate for pinning/unpinning.
> > > >
> > > > Signed-off-by: Gerd Hoffmann
> > >
> > > Seems a bit a bisect fumble in your series here: legacy kms code reserved
> > > the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
> > > think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
> > > avoid bisect fail I think you need to have these temporarily in your
> > > cleanup/prepare_plane functions too.
> >
> > I think I've sorted that. Have some other changes too, will probably
> > send v3 tomorrow.
> >
> > > Looked through the entire series, this here is the only issue I think
> > > should be fixed before merging (making atomic_enable optional can be done
> > > as a follow-up if you feel like it). With that addressed on the series:
> > >
> > > Acked-by: Daniel Vetter
> >
> > Thanks.
> >
> > While being at it: I'm also looking at dma-buf export and import
> > support for the qemu drivers.
> >
> > Right now both qxl and virtio have gem_prime_get_sg_table and
> > gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> > an error.
> >
> > If I understand things correctly it is valid to set all import/export
> > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > indicate the other prime callbacks are supported (so generic fbdev
> > emulation can use gem_prime_vmap etc). Is that correct?
>
> I'm not sure how much that's a good idea ... Never thought about it
> tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> inconsistencies still, so I guess we can't make it much worse really.
>
> > On exporting:
> >
> > TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> > feed that into drm_prime_pages_to_sg. Didn't try yet though. Is that
> > approach correct?
> >
> > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > a pci memory bar)? If so, how?
>
> Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> and then knows the internals so it can do a proper pci peer2peer
> mapping. Or at least there's been lots of patches floating around to
> make that happen.

Here's Christian's WIP stuff for adding device memory support to dma-buf:
https://cgit.freedesktop.org/~deathsimple/linux/log/?h=p2p

Alex

> I think other drivers migrate the bo out of VRAM.
>
> > On importing:
> >
> > Importing into TTM_PL_TT object looks easy again, at least when the
> > object is actually stored in RAM. What if not?
>
> They are all supposed to be stored in RAM. Note that all current ttm
> importers totally break the abstraction, by taking the sg list,
> throwing the dma mapping away and assuming there's a struct page
> backing it. Would be good if we could stop spreading that abuse - the
> dma-buf interfaces have been modelled after the ttm bo interfaces, so
> shouldn't be too hard to wire this up correctly.
>
> > Importing into TTM_PL_VRAM: Impossible I think, without copying over
> > the data. Should that be done? If so, how? Or is it better to just
> > not support import then?
>
> Hm, since you ask about TTM concepts and not what this means in terms
> of dma-buf: As long as you upcast to the ttm_bo you can do whatever
> you want to really. But with plain dma-buf this doesn't work right now
> (least because ttm assumes it gets system RAM on import, in theory you
> could put the peer2peer dma mapping into the sg list and it should
> work).
> -Daniel
>
> > thanks,
> >   Gerd
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag
> >
> > This patch adds 'DAXDEV_BUFFERED' flag which is set
> > for virtio pmem corresponding nd_region. This later
> > is used to disable MAP_SYNC functionality for ext4
> > & xfs filesystem.
> >
> > Signed-off-by: Pankaj Gupta
> > ---
> >  drivers/dax/super.c          | 17 +++++++++++++++++
> >  drivers/nvdimm/pmem.c        |  3 +++
> >  drivers/nvdimm/region_devs.c |  7 +++++++
> >  drivers/virtio/pmem.c        |  1 +
> >  include/linux/dax.h          |  9 +++++++++
> >  include/linux/libnvdimm.h    |  6 ++++++
> >  6 files changed, 43 insertions(+)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 6e928f3..9128740 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -167,6 +167,8 @@ enum dax_device_flags {
> >  	DAXDEV_ALIVE,
> >  	/* gate whether dax_flush() calls the low level flush routine */
> >  	DAXDEV_WRITE_CACHE,
> > +	/* flag to disable MAP_SYNC for virtio based host page cache flush */
> > +	DAXDEV_BUFFERED,
> >  };
> >
> >  /**
> > @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
> >  }
> >  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
> >
> > +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
> > +{
> > +	if (wc)
> > +		set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +	else
> > +		clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
>
> The "write_cache" property was structured this way because it can
> conceivably change at runtime. The MAP_SYNC capability should be
> static and never changed after init.

o.k. Will change.

> > +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
> > +{
> > +	return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
>
> Echoing Darrick and Jan, this should be a generic property of a
> dax_device and not specific to virtio. I don't like the "buffered"
> designation as that's not accurate. There may be hardware reasons why
> a dax_device is not synchronous, like a requirement to flush a
> write-pending queue or otherwise notify the device of new writes.

Agree.

> I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I
> would also modify alloc_dax() to take a flags argument so that the
> capability can be instantiated when the dax_device is allocated.

o.k. Will make the change.

Thanks,
Pankaj
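The reviewer's suggestion can be sketched as follows. This is an illustrative userspace mock-up of the proposed shape, not the eventual kernel API: synchronicity becomes a static property fixed at allocation time via a flags argument, queried through a generic helper instead of a virtio-specific one:

```c
#include <stdbool.h>
#include <stdlib.h>

#define DAXDEV_SYNC	(1UL << 0)	/* hypothetical flag bit */

/* Toy stand-in for the kernel's dax_device. */
struct dax_device {
	unsigned long flags;
};

/* alloc_dax() grows a flags argument: the capability is instantiated
 * once, when the device is allocated, and never toggled afterwards. */
struct dax_device *alloc_dax(unsigned long flags)
{
	struct dax_device *dax_dev = calloc(1, sizeof(*dax_dev));

	if (dax_dev)
		dax_dev->flags = flags;	/* fixed for the device's lifetime */
	return dax_dev;
}

/* Generic query: true for real nvdimms, false for e.g. virtio pmem. */
bool dax_synchronous(const struct dax_device *dax_dev)
{
	return dax_dev->flags & DAXDEV_SYNC;
}
```

With this shape the filesystems' MAP_SYNC checks need no knowledge of virtio at all; they just refuse VM_SYNC when `dax_synchronous()` is false.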
Re: [PATCH v1 2/2] virtio: don't allocate vqs when names[i] = NULL
Cc: sta...@vger.kernel.org
Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

On 28.12.2018 03:26, Wei Wang wrote:
> Some vqs may not need to be allocated when their related feature bits
> are disabled. So callers may pass in such vqs with "names = NULL".
> Then we skip such vq allocations.
>
> Signed-off-by: Wei Wang
> ---
>  drivers/misc/mic/vop/vop_main.c        |  9 +++++++--
>  drivers/remoteproc/remoteproc_virtio.c |  9 +++++++--
>  drivers/s390/virtio/virtio_ccw.c       | 12 +++++++++---
>  drivers/virtio/virtio_mmio.c           |  9 +++++++--
>  4 files changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
> index 6b212c8..2bfa3a9 100644
> --- a/drivers/misc/mic/vop/vop_main.c
> +++ b/drivers/misc/mic/vop/vop_main.c
> @@ -394,16 +394,21 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned nvqs,
>  	struct _vop_vdev *vdev = to_vopvdev(dev);
>  	struct vop_device *vpdev = vdev->vpdev;
>  	struct mic_device_ctrl __iomem *dc = vdev->dc;
> -	int i, err, retry;
> +	int i, err, retry, queue_idx = 0;
>
>  	/* We must have this many virtqueues. */
>  	if (nvqs > ioread8(&vdev->desc->num_vq))
>  		return -ENOENT;
>
>  	for (i = 0; i < nvqs; ++i) {
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		}
> +
>  		dev_dbg(_vop_dev(vdev), "%s: %d: %s\n",
>  			__func__, i, names[i]);
> -		vqs[i] = vop_find_vq(dev, i, callbacks[i], names[i],
> +		vqs[i] = vop_find_vq(dev, queue_idx++, callbacks[i], names[i],
>  				     ctx ? ctx[i] : false);
>  		if (IS_ERR(vqs[i])) {
>  			err = PTR_ERR(vqs[i]);
> diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
> index 183fc42..2d7cd344 100644
> --- a/drivers/remoteproc/remoteproc_virtio.c
> +++ b/drivers/remoteproc/remoteproc_virtio.c
> @@ -153,10 +153,15 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>  				 const bool * ctx,
>  				 struct irq_affinity *desc)
>  {
> -	int i, ret;
> +	int i, ret, queue_idx = 0;
>
>  	for (i = 0; i < nvqs; ++i) {
> -		vqs[i] = rp_find_vq(vdev, i, callbacks[i], names[i],
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		}
> +
> +		vqs[i] = rp_find_vq(vdev, queue_idx++, callbacks[i], names[i],
>  				    ctx ? ctx[i] : false);
>  		if (IS_ERR(vqs[i])) {
>  			ret = PTR_ERR(vqs[i]);
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index fc9dbad..ae1d56d 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -635,7 +635,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  {
>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
>  	unsigned long *indicatorp = NULL;
> -	int ret, i;
> +	int ret, i, queue_idx = 0;
>  	struct ccw1 *ccw;
>
>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> @@ -643,8 +643,14 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  		return -ENOMEM;
>
>  	for (i = 0; i < nvqs; ++i) {
> -		vqs[i] = virtio_ccw_setup_vq(vdev, i, callbacks[i], names[i],
> -					     ctx ? ctx[i] : false, ccw);
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		}
> +
> +		vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, callbacks[i],
> +					     names[i], ctx ? ctx[i] : false,
> +					     ccw);
>  		if (IS_ERR(vqs[i])) {
>  			ret = PTR_ERR(vqs[i]);
>  			vqs[i] = NULL;
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 4cd9ea5..d9dd0f78 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -468,7 +468,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>  	unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
> -	int i, err;
> +	int i, err, queue_idx = 0;
>
>  	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
>  			  dev_name(&vdev->dev), vm_dev);
> @@ -476,7 +476,12 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  		return err;
>
>  	for (i = 0; i < nvqs; ++i) {
> -		vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> +		if (!names[i]) {
> +
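The indexing fix that all four quoted hunks share can be demonstrated in a few lines of plain C. The caller's array index `i` (which may have NULL-name holes) must not be reused as the device-side queue number; a separate `queue_idx` counter only advances for vqs that are actually allocated (the vq names below are just illustrative):

```c
/* Sketch of the queue_idx bookkeeping: queue_ids[i] receives the
 * device-side queue number for names[i], or -1 where names[i] is NULL
 * (the driver sets vqs[i] = NULL there and skips allocation). */
int find_vqs(int nvqs, const char *names[], int queue_ids[])
{
	int queue_idx = 0;

	for (int i = 0; i < nvqs; ++i) {
		if (!names[i]) {
			queue_ids[i] = -1;	/* skipped: feature bit off */
			continue;
		}
		queue_ids[i] = queue_idx++;	/* contiguous device numbering */
	}
	return queue_idx;	/* number of vqs actually set up */
}
```

With the old code a hole at `names[1]` would shift every later vq onto the wrong device queue index; with the counter the device numbering stays contiguous.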
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann wrote:
>
> On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > The buffer object must be reserved before calling
> > > ttm_bo_validate for pinning/unpinning.
> > >
> > > Signed-off-by: Gerd Hoffmann
> >
> > Seems a bit a bisect fumble in your series here: legacy kms code reserved
> > the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
> > think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
> > avoid bisect fail I think you need to have these temporarily in your
> > cleanup/prepare_plane functions too.
>
> I think I've sorted that. Have some other changes too, will probably
> send v3 tomorrow.
>
> > Looked through the entire series, this here is the only issue I think
> > should be fixed before merging (making atomic_enable optional can be done
> > as a follow-up if you feel like it). With that addressed on the series:
> >
> > Acked-by: Daniel Vetter
>
> Thanks.
>
> While being at it: I'm also looking at dma-buf export and import
> support for the qemu drivers.
>
> Right now both qxl and virtio have gem_prime_get_sg_table and
> gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> an error.
>
> If I understand things correctly it is valid to set all import/export
> callbacks (prime_handle_to_fd, prime_fd_to_handle,
> gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> supporting dma-buf import/export and still advertise DRIVER_PRIME to
> indicate the other prime callbacks are supported (so generic fbdev
> emulation can use gem_prime_vmap etc). Is that correct?

I'm not sure how much that's a good idea ... Never thought about it
tbh. All the fbdev/dma-buf stuff has plenty of hacks and
inconsistencies still, so I guess we can't make it much worse really.

> On exporting:
>
> TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> feed that into drm_prime_pages_to_sg. Didn't try yet though. Is that
> approach correct?
>
> Is it possible to export TTM_PL_VRAM objects (with backing storage being
> a pci memory bar)? If so, how?

Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
and then knows the internals so it can do a proper pci peer2peer
mapping. Or at least there's been lots of patches floating around to
make that happen.

I think other drivers migrate the bo out of VRAM.

> On importing:
>
> Importing into TTM_PL_TT object looks easy again, at least when the
> object is actually stored in RAM. What if not?

They are all supposed to be stored in RAM. Note that all current ttm
importers totally break the abstraction, by taking the sg list,
throwing the dma mapping away and assuming there's a struct page
backing it. Would be good if we could stop spreading that abuse - the
dma-buf interfaces have been modelled after the ttm bo interfaces, so
shouldn't be too hard to wire this up correctly.

> Importing into TTM_PL_VRAM: Impossible I think, without copying over
> the data. Should that be done? If so, how? Or is it better to just
> not support import then?

Hm, since you ask about TTM concepts and not what this means in terms
of dma-buf: As long as you upcast to the ttm_bo you can do whatever
you want to really. But with plain dma-buf this doesn't work right now
(least because ttm assumes it gets system RAM on import, in theory you
could put the peer2peer dma mapping into the sg list and it should
work).
-Daniel

> thanks,
>   Gerd

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
On Wed, Jan 09, 2019 at 11:35:52AM +0100, Miguel Ojeda wrote:
> On Tue, Jan 8, 2019 at 6:44 PM Nick Desaulniers wrote:
> >
> > Also for more context, see:
> > commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against
> > dead store elimination")
>
> By the way, shouldn't that barrier_data() be directly in compiler.h
> too, since it is for both gcc & clang?
>
> > Reviewed-by: Nick Desaulniers
> >
> > + Miguel
> >
> > Miguel, would you mind taking this into your compiler-attributes tree?
>
> Sure, at least we get quickly some linux-next time.
>
> Note it would be nice to separate the patch into two (one for the
> comments, another for OPTIMIZER_HIDE_VAR), and also possibly another
> for barrier_data().
>
> Cheers,
> Miguel

Okay, I will try.

-- 
MST
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
> On 09.01.2019 11:35, Wei Wang wrote:
> > On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> >>
> >> On 08.01.2019 06:35, Wei Wang wrote:
> >>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
> >>>> On 07.01.2019 08:01, Wei Wang wrote:
> >>>>> virtio-ccw has deadlock issues with reading the config space inside the
> >>>>> interrupt context, so we tweak the virtballoon_changed implementation
> >>>>> by moving the config read operations into the related workqueue
> >>>>> contexts.
> >>>>> The config_read_bitmap is used as a flag to the workqueue callbacks
> >>>>> about the related config fields that need to be read.
> >>>>>
> >>>>> The cmd_id_received is also renamed to cmd_id_received_cache, and
> >>>>> the value should be obtained via virtio_balloon_cmd_id_received.
> >>>>>
> >>>>> Reported-by: Christian Borntraeger
> >>>>> Signed-off-by: Wei Wang
> >>>>> Reviewed-by: Cornelia Huck
> >>>>> Reviewed-by: Halil Pasic
> >>>> Together with
> >>>> virtio_pci: use queue idx instead of array idx to set up the vq
> >>>> virtio: don't allocate vqs when names[i] = NULL
> >>>>
> >>>> Tested-by: Christian Borntraeger
> >>> OK. I don't plan to send a new version of the above patches as no
> >>> changes needed so far.
> >>>
> >>> Michael, if the above two patches look good to you, please help add the
> >>> related tested-by and reviewed-by tags. Thanks.
> >> Can we also make sure that
> >>
> >> virtio_pci: use queue idx instead of array idx to set up the vq
> >> virtio: don't allocate vqs when names[i] = NULL
> >>
> >> also land in stable?
> >>
> > You could also send the request to stable after it gets merged to Linus'
> > tree. The stable review committee will decide whether to take it.
> >
> > Please see Option 2:
> >
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> >
> Those patches are not upstream yet, Correct?
>
> Michael,
>
> can you add the stable tag before submitting? If not, can you give me a
> heads up when doing the pull request so that I can ping the stable folks.

Can you reply to patches that you feel are needed on stable with just

Cc: sta...@vger.kernel.org

in the message body?

Then it's all automatically handled by ack attaching scripts.

-- 
MST
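The workflow Michael describes — a bare `Cc:` trailer in a reply, which the ack-collecting scripts fold into the commit message — looks like this in practice (the reply framing here is illustrative; the `Fixes:` line is the one Christian posted on the patch itself, and stable@vger.kernel.org is the address documented in stable-kernel-rules):

```
Re: [PATCH v1 2/2] virtio: don't allocate vqs when names[i] = NULL

Cc: stable@vger.kernel.org
Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
```

Once the maintainer's tree is rebased with these trailers attached, the stable team picks the commit up automatically after it lands in Linus' tree.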
Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem
>
> On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> > Virtio pmem provides asynchronous host page cache flush
> > mechanism. We don't support 'MAP_SYNC' with virtio pmem
> > and ext4.
> >
> > Signed-off-by: Pankaj Gupta
> ...
> > @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
> >  	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
> >  		return -EOPNOTSUPP;
> >
> > +	/* We don't support synchronous mappings with guest direct access
> > +	 * and virtio based host page cache flush mechanism.
> > +	 */
> > +	if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> > +			&& (vma->vm_flags & VM_SYNC))
> > +		return -EOPNOTSUPP;
> > +
>
> Shouldn't there rather be some generic way of doing this? Having
> virtio_pmem_host_cache_enabled() check in filesystem code just looks like
> filesystem sniffing into details it should not care about... Maybe just
> naming this (or having a wrapper) dax_dev_map_sync_supported()?

Thanks for the feedback. Just wanted to avoid 'dax' in the function name
to differentiate this from real dax. But yes, can add a wrapper:
dax_dev_map_sync_supported()

Thanks,
Pankaj
[PATCH v3 5/5] xfs: disable map_sync for virtio pmem
Virtio pmem provides asynchronous host page cache flush mechanism.
We don't support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta
---
 fs/xfs/xfs_file.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
 	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
 		return -EOPNOTSUPP;
 
+	/* We don't support synchronous mappings with guest direct access
+	 * and virtio based host page cache mechanism.
+	 */
+	if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+			xfs_find_daxdev_for_inode(file_inode(filp))) &&
+			(vma->vm_flags & VM_SYNC))
+		return -EOPNOTSUPP;
+
 	file_accessed(filp);
 	vma->vm_ops = &xfs_file_vm_ops;
 	if (IS_DAX(file_inode(filp)))
-- 
2.9.3
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On Wed, Jan 09, 2019 at 06:35:01PM +0800, Wei Wang wrote:
> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> >
> > On 08.01.2019 06:35, Wei Wang wrote:
> > > On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
> > > > On 07.01.2019 08:01, Wei Wang wrote:
> > > > > virtio-ccw has deadlock issues with reading the config space inside the
> > > > > interrupt context, so we tweak the virtballoon_changed implementation
> > > > > by moving the config read operations into the related workqueue
> > > > > contexts.
> > > > > The config_read_bitmap is used as a flag to the workqueue callbacks
> > > > > about the related config fields that need to be read.
> > > > >
> > > > > The cmd_id_received is also renamed to cmd_id_received_cache, and
> > > > > the value should be obtained via virtio_balloon_cmd_id_received.
> > > > >
> > > > > Reported-by: Christian Borntraeger
> > > > > Signed-off-by: Wei Wang
> > > > > Reviewed-by: Cornelia Huck
> > > > > Reviewed-by: Halil Pasic
> > > > Together with
> > > > virtio_pci: use queue idx instead of array idx to set up the vq
> > > > virtio: don't allocate vqs when names[i] = NULL
> > > >
> > > > Tested-by: Christian Borntraeger
> > > OK. I don't plan to send a new version of the above patches as no
> > > changes needed so far.
> > >
> > > Michael, if the above two patches look good to you, please help add the
> > > related tested-by and reviewed-by tags. Thanks.
> > Can we also make sure that
> >
> > virtio_pci: use queue idx instead of array idx to set up the vq
> > virtio: don't allocate vqs when names[i] = NULL
> >
> > also land in stable?
> >
>
> You could also send the request to stable after it gets merged to Linus'
> tree. The stable review committee will decide whether to take it.
>
> Please see Option 2:
>
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>
> Best,
> Wei

That's mostly for 3rd party reporters. Why not Cc stable directly in the
patches?

-- 
MST
[PATCH v3 4/5] ext4: disable map_sync for virtio pmem
Virtio pmem provides asynchronous host page cache flush mechanism.
We don't support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta
---
 fs/ext4/file.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
 		return -EOPNOTSUPP;
 
+	/* We don't support synchronous mappings with guest direct access
+	 * and virtio based host page cache flush mechanism.
+	 */
+	if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+			&& (vma->vm_flags & VM_SYNC))
+		return -EOPNOTSUPP;
+
 	file_accessed(file);
 	if (IS_DAX(file_inode(file))) {
 		vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.9.3
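From userspace, both the ext4 and xfs checks surface the same way: an mmap() with MAP_SYNC fails with EOPNOTSUPP when the file cannot give synchronous persistence guarantees (non-DAX files today; DAX on virtio pmem with this series). A Linux-specific caller sketch, with the usual fallback (the helper name and flag fallbacks are illustrative):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03	/* Linux, since 4.15 */
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000	/* Linux value on most architectures */
#endif

/* Try a synchronous DAX mapping first; on EOPNOTSUPP (or EINVAL from
 * older kernels that reject MAP_SHARED_VALIDATE) fall back to a plain
 * shared mapping, where persistence requires explicit fsync()/msync(). */
void *map_maybe_sync(int fd, size_t len, int *is_sync)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

	if (p != MAP_FAILED) {
		*is_sync = 1;	/* CPU stores are durable after a flush */
		return p;
	}
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return MAP_FAILED;

	*is_sync = 0;	/* e.g. virtio pmem: flush via the async mechanism */
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

On a regular (non-DAX) file the first mmap() is refused and the fallback path is taken, which is exactly the behavior the two hunks above extend to virtio pmem DAX files.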
[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag
This patch adds 'DAXDEV_BUFFERED' flag which is set for virtio pmem corresponding nd_region. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem. Signed-off-by: Pankaj Gupta --- drivers/dax/super.c | 17 + drivers/nvdimm/pmem.c| 3 +++ drivers/nvdimm/region_devs.c | 7 +++ drivers/virtio/pmem.c| 1 + include/linux/dax.h | 9 + include/linux/libnvdimm.h| 6 ++ 6 files changed, 43 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 6e928f3..9128740 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -167,6 +167,8 @@ enum dax_device_flags { DAXDEV_ALIVE, /* gate whether dax_flush() calls the low level flush routine */ DAXDEV_WRITE_CACHE, + /* flag to disable MAP_SYNC for virtio based host page cache flush */ + DAXDEV_BUFFERED, }; /** @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(dax_write_cache_enabled); +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc) +{ + if (wc) + set_bit(DAXDEV_BUFFERED, &dax_dev->flags); + else + clear_bit(DAXDEV_BUFFERED, &dax_dev->flags); +} +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache); + +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev) +{ + return test_bit(DAXDEV_BUFFERED, &dax_dev->flags); +} +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled); + bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(&dax_srcu); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index fe1217b..8d190a3 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev, return -ENOMEM; } dax_write_cache(dax_dev, nvdimm_has_cache(nd_region)); + + /* Set buffered bit in 'dax_dev' for virtio pmem */ + virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region)); pmem->dax_dev = dax_dev; gendev = disk_to_dev(disk); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index f8218b4..1f8b2be 100644 --- a/drivers/nvdimm/region_devs.c 
+++ b/drivers/nvdimm/region_devs.c @@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start, return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict); } +int nvdimm_is_buffered(struct nd_region *nd_region) +{ + return is_nd_pmem(&nd_region->dev) && + test_bit(ND_REGION_BUFFERED, &nd_region->flags); +} +EXPORT_SYMBOL_GPL(nvdimm_is_buffered); + void __exit nd_region_devs_exit(void) { ida_destroy(®ion_ida); diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c index 51f5349..901767b 100644 --- a/drivers/virtio/pmem.c +++ b/drivers/virtio/pmem.c @@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev) ndr_desc.numa_node = nid; ndr_desc.flush = virtio_pmem_flush; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); + set_bit(ND_REGION_BUFFERED, &ndr_desc.flags); nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc); nd_region->provider_data = dev_to_virtio (nd_region->dev.parent->parent); diff --git a/include/linux/dax.h b/include/linux/dax.h index 0dd316a..d16e03e 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc); +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev); #else static inline struct dax_device *dax_get_by_host(const char *host) { @@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev) { return false; } +static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc) +{ +} +static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev) +{ + return false; +} #endif struct writeback_control; diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index ca8bc07..94616f1 100644 --- a/include/linux/libnvdimm.h +++ 
b/include/linux/libnvdimm.h @@ -64,6 +64,11 @@ enum { */ ND_REGION_PERSIST_MEMCTRL = 2, + /* provides virtio based asynchronous flush mechanism for buffered +* host page cache. +*/ + ND_REGION_BUFFERED = 3, + /* mark newly adjusted resources as requiring a label update */ DPA_RESOURCE_ADJUSTED = 1 << 0, }; @@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region); int nvdimm_has_flush(struct nd_region *nd_region); int nvdimm_has_cache(struct nd_region *nd_region); int nvdimm_in_overwrite(struct nvdimm *nvdimm)
[PATCH v3 0/5] kvm "virtio pmem" device
This patch series has implementation for "virtio pmem". "virtio pmem" is fake persistent memory(nvdimm) in guest which allows to bypass the guest page cache. This also implements a VIRTIO based asynchronous flush mechanism. Sharing guest kernel driver in this patchset with the changes suggested in v2. Tested with Qemu side device emulation for virtio-pmem [6]. Details of project idea for 'virtio pmem' flushing interface is shared [3] & [4]. Implementation is divided into two parts: New virtio pmem guest driver and qemu code changes for new virtio pmem paravirtualized device. 1. Guest virtio-pmem kernel driver - - Reads persistent memory range from paravirt device and registers with 'nvdimm_bus'. - 'nvdimm/pmem' driver uses this information to allocate persistent memory region and setup filesystem operations to the allocated memory. - virtio pmem driver implements asynchronous flushing interface to flush from guest to host. 2. Qemu virtio-pmem device - - Creates virtio pmem device and exposes a memory range to KVM guest. - At host side this is file backed memory which acts as persistent memory. - Qemu side flush uses aio thread pool API's and virtio for asynchronous guest multi request handling. David Hildenbrand CCed also posted a modified version[6] of qemu virtio-pmem code based on updated Qemu memory device API. Virtio-pmem errors handling: Checked behaviour of virtio-pmem for below types of errors Need suggestions on expected behaviour for handling these errors? - Hardware Errors: Uncorrectable recoverable Errors: a] virtio-pmem: - As per current logic if error page belongs to Qemu process, host MCE handler isolates(hwpoison) that page and send SIGBUS. Qemu SIGBUS handler injects exception to KVM guest. - KVM guest then isolates the page and send SIGBUS to guest userspace process which has mapped the page. b] Existing implementation for ACPI pmem driver: - Handles such errors with MCE notifier and creates a list of bad blocks. 
Read/direct access DAX operation return EIO if accessed memory page fall in bad block list. - It also starts backgound scrubbing. - Similar functionality can be reused in virtio-pmem with MCE notifier but without scrubbing(no ACPI/ARS)? Need inputs to confirm if this behaviour is ok or needs any change? Changes from PATCH v2: [1] - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] - Use name 'virtio pmem' in place of 'fake dax' Changes from PATCH v1: [2] - 0-day build test for build dependency on libnvdimm Changes suggested by - [Dan Williams] - Split the driver into two parts virtio & pmem - Move queuing of async block request to block layer - Add "sync" parameter in nvdimm_flush function - Use indirect call for nvdimm_flush - Don’t move declarations to common global header e.g nd.h - nvdimm_flush() return 0 or -EIO if it fails - Teach nsio_rw_bytes() that the flush can fail - Rename nvdimm_flush() to generic_nvdimm_flush() - Use 'nd_region->provider_data' for long dereferencing - Remove virtio_pmem_freeze/restore functions - Remove BSD license text with SPDX license text - Add might_sleep() in virtio_pmem_flush - [Luiz] - Make spin_lock_irqsave() narrow Changes from RFC v3 - Rebase to latest upstream - Luiz - Call ndregion->flush in place of nvdimm_flush- Luiz - kmalloc return check - Luiz - virtqueue full handling - Stefan - Don't map entire virtio_pmem_req to device - Stefan - request leak, correct sizeof req- Stefan - Move declaration to virtio_pmem.c Changes from RFC v2: - Add flush function in the nd_region in place of switching on a flag - Dan & Stefan - Add flush completion function with proper locking and wait for host side flush completion - Stefan & Dan - Keep userspace API in uapi header file - Stefan, MST - Use LE fields & New device id - MST - Indentation & spacing suggestions - MST & Eric - Remove extra header files & add licensing - Stefan Changes from RFC v1: - Reuse existing 'pmem' code for registering persistent memory and other operations 
instead of creating an entirely new block driver. - Use VIRTIO driver to register memory information with nvdimm_bus and create region_type accordingly. - Call VIRTIO flush from existing pmem driver. Pankaj Gupta (5): libnvdimm: nd_region flush callback support virtio-pmem: Add virtio-pmem guest driver libnvdimm: add nd_region buffered dax_dev flag ext4: disable map_sync for virtio pmem xfs: disable map_sync for virtio pmem [2] https://lkml.org/lkml/2018/8/31/407 [3] https://www.spinics.net/lists/kvm/msg149761.html [4] https://www.spinics.net/lists/kvm/msg153095.html [5] https://lkml.org/lkml/2018/8/31/413 [6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2 drivers/acpi/nfit
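On the host side, the asynchronous flush described in the cover letter above ultimately boils down to making the file that backs the guest's pmem range durable. A minimal sketch of that essence — the function name is invented for illustration; real QEMU code dispatches this through its aio thread pool so the vcpu is not blocked while the flush is in flight:

```c
#include <fcntl.h>
#include <unistd.h>

/* Host-side essence of a guest FLUSH request: persist the backing
 * file, then the result (0 or -errno-style failure) is returned to
 * the guest over the virtqueue. */
int handle_guest_flush(int backing_fd)
{
    return fsync(backing_fd);
}
```

Reporting the fsync return value back to the guest is what lets guest userspace see host-side flush failures, as patch 1/5 wires up.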
[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver
This patch adds virtio-pmem driver for KVM guest. Guest reads the persistent memory range information from Qemu over VIRTIO and registers it on nvdimm_bus. It also creates a nd_region object with the persistent memory range information so that existing 'nvdimm/pmem' driver can reserve this into system memory map. This way 'virtio-pmem' driver uses existing functionality of pmem driver to register persistent memory compatible for DAX capable filesystems. This also provides function to perform guest flush over VIRTIO from 'pmem' driver when userspace performs flush on DAX memory range. Signed-off-by: Pankaj Gupta --- drivers/nvdimm/virtio_pmem.c | 84 ++ drivers/virtio/Kconfig | 10 drivers/virtio/Makefile | 1 + drivers/virtio/pmem.c| 124 +++ include/linux/virtio_pmem.h | 60 +++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_pmem.h | 10 7 files changed, 290 insertions(+) create mode 100644 drivers/nvdimm/virtio_pmem.c create mode 100644 drivers/virtio/pmem.c create mode 100644 include/linux/virtio_pmem.h create mode 100644 include/uapi/linux/virtio_pmem.h diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c new file mode 100644 index 000..2a1b1ba --- /dev/null +++ b/drivers/nvdimm/virtio_pmem.c @@ -0,0 +1,84 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * virtio_pmem.c: Virtio pmem Driver + * + * Discovers persistent memory range information + * from host and provides a virtio based flushing + * interface. 
+ */ +#include +#include "nd.h" + + /* The interrupt handler */ +void host_ack(struct virtqueue *vq) +{ + unsigned int len; + unsigned long flags; + struct virtio_pmem_request *req, *req_buf; + struct virtio_pmem *vpmem = vq->vdev->priv; + + spin_lock_irqsave(&vpmem->pmem_lock, flags); + while ((req = virtqueue_get_buf(vq, &len)) != NULL) { + req->done = true; + wake_up(&req->host_acked); + + if (!list_empty(&vpmem->req_list)) { + req_buf = list_first_entry(&vpmem->req_list, + struct virtio_pmem_request, list); + list_del(&vpmem->req_list); + req_buf->wq_buf_avail = true; + wake_up(&req_buf->wq_buf); + } + } + spin_unlock_irqrestore(&vpmem->pmem_lock, flags); +} +EXPORT_SYMBOL_GPL(host_ack); + + /* The request submission function */ +int virtio_pmem_flush(struct nd_region *nd_region) +{ + int err; + unsigned long flags; + struct scatterlist *sgs[2], sg, ret; + struct virtio_device *vdev = nd_region->provider_data; + struct virtio_pmem *vpmem = vdev->priv; + struct virtio_pmem_request *req; + + might_sleep(); + req = kmalloc(sizeof(*req), GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->done = req->wq_buf_avail = false; + strcpy(req->name, "FLUSH"); + init_waitqueue_head(&req->host_acked); + init_waitqueue_head(&req->wq_buf); + sg_init_one(&sg, req->name, strlen(req->name)); + sgs[0] = &sg; + sg_init_one(&ret, &req->ret, sizeof(req->ret)); + sgs[1] = &ret; + + spin_lock_irqsave(&vpmem->pmem_lock, flags); + err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC); + if (err) { + dev_err(&vdev->dev, "failed to send command to virtio pmem device\n"); + + list_add_tail(&vpmem->req_list, &req->list); + spin_unlock_irqrestore(&vpmem->pmem_lock, flags); + + /* When host has read buffer, this completes via host_ack */ + wait_event(req->wq_buf, req->wq_buf_avail); + spin_lock_irqsave(&vpmem->pmem_lock, flags); + } + virtqueue_kick(vpmem->req_vq); + spin_unlock_irqrestore(&vpmem->pmem_lock, flags); + + /* When host has read buffer, this completes via host_ack */ 
+ wait_event(req->host_acked, req->done); + err = req->ret; + kfree(req); + + return err; +}; +EXPORT_SYMBOL_GPL(virtio_pmem_flush); +MODULE_LICENSE("GPL"); diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 3589764..9f634a2 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY If unsure, say Y. +config VIRTIO_PMEM + tristate "Support for virtio pmem driver" + depends on VIRTIO + depends on LIBNVDIMM + help + This driver provides support for virtio based flushing interface + for persistent memory range. + + If unsure, say M. + config VIRTIO_BALLOON tristate "Virtio balloon driver" depends on VIRTIO diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 3a2b5c5..143ce91 100644 --- a/drivers/virtio/Makefile +++ b/driv
[PATCH v3 1/5] libnvdimm: nd_region flush callback support
This patch adds functionality to perform flush from guest to host over VIRTIO. We are registering a callback based on 'nd_region' type. virtio_pmem driver requires this special flush function. For rest of the region types we are registering existing flush function. Report error returned by host fsync failure to userspace. This also handles asynchronous flush requests from the block layer by creating a child bio and chaining it with parent bio. Signed-off-by: Pankaj Gupta --- drivers/acpi/nfit/core.c | 4 ++-- drivers/nvdimm/claim.c | 6 -- drivers/nvdimm/nd.h | 1 + drivers/nvdimm/pmem.c| 12 drivers/nvdimm/region_devs.c | 38 -- include/linux/libnvdimm.h| 5 - 6 files changed, 55 insertions(+), 11 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index b072cfc..f154852 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw, offset = to_interleave_offset(offset, mmio); writeq(cmd, mmio->addr.base + offset); - nvdimm_flush(nfit_blk->nd_region); + nvdimm_flush(nfit_blk->nd_region, NULL, false); if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH) readq(mmio->addr.base + offset); @@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk, } if (rw) - nvdimm_flush(nfit_blk->nd_region); + nvdimm_flush(nfit_blk->nd_region, NULL, false); rc = read_blk_stat(nfit_blk, lane) ? 
-EIO : 0; return rc; diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index fb667bf..a1dfa06 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512); sector_t sector = offset >> 9; - int rc = 0; + int rc = 0, ret = 0; if (unlikely(!size)) return 0; @@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, } memcpy_flushcache(nsio->addr + offset, buf, size); - nvdimm_flush(to_nd_region(ndns->dev.parent)); + ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false); + if (ret) + rc = ret; return rc; } diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index 98317e7..d53a2d1 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -160,6 +160,7 @@ struct nd_region { struct nd_interleave_set *nd_set; struct nd_percpu_lane __percpu *lane; struct nd_mapping mapping[0]; + int (*flush)(struct nd_region *nd_region); }; struct nd_blk_region { diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 6071e29..5d6a4a1 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page, static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) { + int ret = 0; blk_status_t rc = 0; bool do_acct; unsigned long start; @@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) struct nd_region *nd_region = to_region(pmem); if (bio->bi_opf & REQ_PREFLUSH) - nvdimm_flush(nd_region); + ret = nvdimm_flush(nd_region, bio, true); do_acct = nd_iostat_start(bio, &start); bio_for_each_segment(bvec, bio, iter) { @@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) nd_iostat_end(bio, start); if (bio->bi_opf & REQ_FUA) - 
nvdimm_flush(nd_region); + ret = nvdimm_flush(nd_region, bio, true); + + if (ret) + bio->bi_status = errno_to_blk_status(ret); bio_endio(bio); return BLK_QC_T_NONE; @@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev) sysfs_put(pmem->bb_state); pmem->bb_state = NULL; } - nvdimm_flush(to_nd_region(dev->parent)); + nvdimm_flush(to_nd_region(dev->parent), NULL, false); return 0; } static void nd_pmem_shutdown(struct device *dev) { - nvdimm_flush(to_nd_region(dev->parent)); + nvdimm_flush(to_nd_region(dev->parent), NULL, false); } static void nd_pmem_notify(struct device *dev, enum nvdimm_event event) diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index fa37afc..5508727 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att return rc;
Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device
Please ignore this series as my network went down while sending this. I will send this series again. Thanks, Pankaj > > This patch series has implementation for "virtio pmem". > "virtio pmem" is fake persistent memory(nvdimm) in guest > which allows to bypass the guest page cache. This also > implements a VIRTIO based asynchronous flush mechanism. > > Sharing guest kernel driver in this patchset with the > changes suggested in v2. Tested with Qemu side device > emulation for virtio-pmem [6]. > > Details of project idea for 'virtio pmem' flushing interface > is shared [3] & [4]. > > Implementation is divided into two parts: > New virtio pmem guest driver and qemu code changes for new > virtio pmem paravirtualized device. > > 1. Guest virtio-pmem kernel driver > - >- Reads persistent memory range from paravirt device and > registers with 'nvdimm_bus'. >- 'nvdimm/pmem' driver uses this information to allocate > persistent memory region and setup filesystem operations > to the allocated memory. >- virtio pmem driver implements asynchronous flushing > interface to flush from guest to host. > > 2. Qemu virtio-pmem device > - >- Creates virtio pmem device and exposes a memory range to > KVM guest. >- At host side this is file backed memory which acts as > persistent memory. >- Qemu side flush uses aio thread pool API's and virtio > for asynchronous guest multi request handling. > >David Hildenbrand CCed also posted a modified version[6] of >qemu virtio-pmem code based on updated Qemu memory device API. > > Virtio-pmem errors handling: > > Checked behaviour of virtio-pmem for below types of errors > Need suggestions on expected behaviour for handling these errors? > > - Hardware Errors: Uncorrectable recoverable Errors: > a] virtio-pmem: > - As per current logic if error page belongs to Qemu process, > host MCE handler isolates(hwpoison) that page and send SIGBUS. > Qemu SIGBUS handler injects exception to KVM guest. 
> - KVM guest then isolates the page and send SIGBUS to guest > userspace process which has mapped the page. > > b] Existing implementation for ACPI pmem driver: > - Handles such errors with MCE notifier and creates a list > of bad blocks. Read/direct access DAX operation return EIO > if accessed memory page fall in bad block list. > - It also starts backgound scrubbing. > - Similar functionality can be reused in virtio-pmem with MCE > notifier but without scrubbing(no ACPI/ARS)? Need inputs to > confirm if this behaviour is ok or needs any change? > > Changes from PATCH v2: [1] > - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] > - Use name 'virtio pmem' in place of 'fake dax' > > Changes from PATCH v1: [2] > - 0-day build test for build dependency on libnvdimm > > Changes suggested by - [Dan Williams] > - Split the driver into two parts virtio & pmem > - Move queuing of async block request to block layer > - Add "sync" parameter in nvdimm_flush function > - Use indirect call for nvdimm_flush > - Don’t move declarations to common global header e.g nd.h > - nvdimm_flush() return 0 or -EIO if it fails > - Teach nsio_rw_bytes() that the flush can fail > - Rename nvdimm_flush() to generic_nvdimm_flush() > - Use 'nd_region->provider_data' for long dereferencing > - Remove virtio_pmem_freeze/restore functions > - Remove BSD license text with SPDX license text > > - Add might_sleep() in virtio_pmem_flush - [Luiz] > - Make spin_lock_irqsave() narrow > > Changes from RFC v3 > - Rebase to latest upstream - Luiz > - Call ndregion->flush in place of nvdimm_flush- Luiz > - kmalloc return check - Luiz > - virtqueue full handling - Stefan > - Don't map entire virtio_pmem_req to device - Stefan > - request leak, correct sizeof req- Stefan > - Move declaration to virtio_pmem.c > > Changes from RFC v2: > - Add flush function in the nd_region in place of switching > on a flag - Dan & Stefan > - Add flush completion function with proper locking and wait > for host side flush 
completion - Stefan & Dan > - Keep userspace API in uapi header file - Stefan, MST > - Use LE fields & New device id - MST > - Indentation & spacing suggestions - MST & Eric > - Remove extra header files & add licensing - Stefan > > Changes from RFC v1: > - Reuse existing 'pmem' code for registering persistent > memory and other operations instead of creating an entirely > new block driver. > - Use VIRTIO driver to register memory information with > nvdimm_bus and create region_type accordingly. > - Call VIRTIO flush from existing pmem driver. > > Pankaj Gupta (5): >libnvdimm: nd_region flush callback support >virtio-pmem: Add virtio-pmem guest driver >libnvdimm: add nd_region buffered dax_dev flag >ext4: disable map_sync for virtio pmem
Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem
On Wed 09-01-19 19:26:05, Pankaj Gupta wrote: > Virtio pmem provides asynchronous host page cache flush > mechanism. We don't support 'MAP_SYNC' with virtio pmem > and ext4. > > Signed-off-by: Pankaj Gupta ... > @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct > vm_area_struct *vma) > if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) > return -EOPNOTSUPP; > > + /* We don't support synchronous mappings with guest direct access > + * and virtio based host page cache flush mechanism. > + */ > + if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev) > + && (vma->vm_flags & VM_SYNC)) > + return -EOPNOTSUPP; > + Shouldn't there rather be some generic way of doing this? Having virtio_pmem_host_cache_enabled() check in filesystem code just looks like filesystem sniffing into details it should not care about... Maybe just naming this (or having a wrapper) dax_dev_map_sync_supported()? Honza -- Jan Kara SUSE Labs, CR
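A hypothetical rendering of the wrapper Jan suggests, with made-up flag and field names, just to show the shape of the generic predicate filesystems would call instead of testing virtio-pmem internals directly:

```c
#include <stdbool.h>

/* Stand-in flag bit; a real implementation would live next to
 * DAXDEV_WRITE_CACHE in drivers/dax/super.c. */
#define DAXDEV_SYNC_UNSUPPORTED (1ul << 3)

struct dax_device {
    unsigned long flags;
};

/* Generic predicate: any dax provider that cannot honor MAP_SYNC
 * (virtio-pmem among them) sets the flag; filesystems only ask
 * the question, never who set it or why. */
bool dax_dev_map_sync_supported(const struct dax_device *dax_dev)
{
    return !(dax_dev->flags & DAXDEV_SYNC_UNSUPPORTED);
}
```

With this, the ext4 and xfs hunks would each shrink to a single filesystem-agnostic check.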
Re: [PATCH] virtio:linux:kernel:NULL check after kmalloc is needed
On Tue, Jan 08, 2019 at 01:40:33PM +0800, Yi Wang wrote: > From: "huang.zijiang" > > NULL check is needed because kmalloc maybe return NULL. > > Signed-off-by: huang.zijiang Can't hurt I will queue it. > --- > tools/virtio/linux/kernel.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h > index 7ef45a4..2afcad8 100644 > --- a/tools/virtio/linux/kernel.h > +++ b/tools/virtio/linux/kernel.h > @@ -65,6 +65,8 @@ static inline void *kzalloc(size_t s, gfp_t gfp) > { > void *p = kmalloc(s, gfp); > > + if (!p) > + return -ENOMEM; > memset(p, 0, s); > return p; > } > -- > 1.8.3.1
Re: [PATCH net-next V2 1/3] virtio: introduce in order feature bit
On Wed, Jan 09, 2019 at 04:05:28PM +0800, Jason Wang wrote: > Signed-off-by: Jason Wang > --- > include/uapi/linux/virtio_config.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/include/uapi/linux/virtio_config.h > b/include/uapi/linux/virtio_config.h > index 1196e1c1d4f6..2698e069ed9e 100644 > --- a/include/uapi/linux/virtio_config.h > +++ b/include/uapi/linux/virtio_config.h > @@ -78,6 +78,12 @@ > /* This feature indicates support for the packed virtqueue layout. */ > #define VIRTIO_F_RING_PACKED 34 > > +/* > + * Device uses buffers in the same order in which they have been > + * available. been *made* available > + */ > +#define VIRTIO_F_IN_ORDER 35 > + > /* > * Does the device support Single Root I/O Virtualization? > */ > -- > 2.17.1
[PATCH v3 5/5] xfs: disable map_sync for virtio pmem
Virtio pmem provides asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and xfs. Signed-off-by: Pankaj Gupta --- fs/xfs/xfs_file.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index e474250..eae4aa4 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1190,6 +1190,14 @@ xfs_file_mmap( if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP; + /* We don't support synchronous mappings with guest direct access +* and virtio based host page cache mechanism. +*/ + if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled( + xfs_find_daxdev_for_inode(file_inode(filp))) && + (vma->vm_flags & VM_SYNC)) + return -EOPNOTSUPP; + file_accessed(filp); vma->vm_ops = &xfs_file_vm_ops; if (IS_DAX(file_inode(filp))) -- 2.9.3
Re: [PATCH net V2] vhost: log dirty page correctly
On Wed, Jan 09, 2019 at 03:29:47PM +0800, Jason Wang wrote: > Vhost dirty page logging API is designed to sync through GPA. But we > try to log GIOVA when device IOTLB is enabled. This is wrong and may > lead to missing data after migration. > > To solve this issue, when logging with device IOTLB enabled, we will: > > 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to >get HVA, for writable descriptor, get HVA through iovec. For used >ring update, translate its GIOVA to HVA > 2) traverse the GPA->HVA mapping to get the possible GPA and log >through GPA. Pay attention this reverse mapping is not guaranteed >to be unique, so we should log each possible GPA in this case. > > This fix the failure of scp to guest during migration. In -next, we > will probably support passing GIOVA->GPA instead of GIOVA->HVA. > > Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") > Reported-by: Jintack Lim > Cc: Jintack Lim > Signed-off-by: Jason Wang > --- > The patch is needed for stable. 
> Changes from V1: > - return error instead of warn > --- > drivers/vhost/net.c | 3 +- > drivers/vhost/vhost.c | 82 +++ > drivers/vhost/vhost.h | 3 +- > 3 files changed, 72 insertions(+), 16 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index 36f3d0f49e60..bca86bf7189f 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -1236,7 +1236,8 @@ static void handle_rx(struct vhost_net *net) > if (nvq->done_idx > VHOST_NET_BATCH) > vhost_net_signal_used(nvq); > if (unlikely(vq_log)) > - vhost_log_write(vq, vq_log, log, vhost_len); > + vhost_log_write(vq, vq_log, log, vhost_len, > + vq->iov, in); > total_len += vhost_len; > if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) { > vhost_poll_queue(&vq->poll); > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 9f7942cbcbb2..ee095f08ffd4 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -1733,11 +1733,70 @@ static int log_write(void __user *log_base, > return r; > } > > +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len) > +{ > + struct vhost_umem *umem = vq->umem; > + struct vhost_umem_node *u; > + u64 gpa; > + int r; > + bool hit = false; > + > + list_for_each_entry(u, &umem->umem_list, link) { > + if (u->userspace_addr < hva && > + u->userspace_addr + u->size >= > + hva + len) { So this tries to see that the GPA range is completely within the GVA region. Does this have to be the case? And if yes why not return 0 below instead of hit = true? I'm also a bit concerned about overflow when addr + len is on a 64 bit boundary. Why not check add + size - 1 and hva + len - 1 instead? 
> + gpa = u->start + hva - u->userspace_addr; > + r = log_write(vq->log_base, gpa, len); > + if (r < 0) > + return r; > + hit = true; > + } > + } > + > + if (!hit) > + return -EFAULT; > + > + return 0; > +} > + > +static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len) > +{ > + struct iovec iov[64]; > + int i, ret; > + > + if (!vq->iotlb) > + return log_write(vq->log_base, vq->log_addr + used_offset, len); > + > + ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset, > + len, iov, 64, VHOST_ACCESS_WO); We don't need the cast to u64 here do we? > + if (ret) > + return ret; > + > + for (i = 0; i < ret; i++) { > + ret = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base, We don't need the cast to u64 here do we? > + iov[i].iov_len); > + if (ret) > + return ret; > + } > + > + return 0; > +} > + > int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, > - unsigned int log_num, u64 len) > + unsigned int log_num, u64 len, struct iovec *iov, int count) > { > int i, r; > > + if (vq->iotlb) { > + for (i = 0; i < count; i++) { > + r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base, > + iov[i].iov_len); We don't need the cast to u64 here do we? > + if (r < 0) > + return r; > + } > + return 0; > + } > + > /* Make sure data written is seen before log. */ > smp_wmb(); Shouldn't the wmb be before log_write_hva too? > for (i = 0; i < log_num; ++i) { > @@ -1769,9 +1828,8 @@ static int vhost_update_used_flags(struct > vhost_virtqueue *vq) >
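The overflow concern raised in the review above — an end address that lands exactly on the 64-bit boundary wraps to zero — can be avoided by comparing `- 1` offsets instead of ever computing end addresses. A standalone sketch of such a containment check (assumes size >= 1 and len >= 1, as both are byte counts of non-empty ranges):

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-safe version of the containment test in log_write_hva():
 * true iff [hva, hva + len) lies entirely inside [start, start + size).
 * No end address is ever formed, so ranges touching UINT64_MAX
 * do not wrap. */
bool range_contains(uint64_t start, uint64_t size,
                    uint64_t hva, uint64_t len)
{
    if (hva < start)
        return false;
    uint64_t off = hva - start;
    /* off <= size - 1: first byte inside; remaining bytes must fit. */
    return off <= size - 1 && len - 1 <= (size - 1) - off;
}
```

The same shape works for the GPA-side check in the umem walk.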
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
On Wed, Jan 9, 2019 at 9:52 PM Gerd Hoffmann wrote: > > Hi, > > > > If I understand things correctly it is valid to set all import/export > > > callbacks (prime_handle_to_fd, prime_fd_to_handle, > > > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not > > > supporting dma-buf import/export and still advertise DRIVER_PRIME to > > > indicate the other prime callbacks are supported (so generic fbdev > > > emulation can use gem_prime_vmap etc). Is that correct? > > > > I'm not sure how much that's a good idea ... Never thought about it > > tbh. All the fbdev/dma-buf stuff has plenty of hacks and > > inconsistencies still, so I guess we can't make it much worse really. > > Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect > that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace. > > Which looks better to me than telling userspace we support it then throw > errors unconditionally when userspace tries to use that. > > > > Is it possible to export TTM_PL_VRAM objects (with backing storage being > > > a pci memory bar)? If so, how? > > > > Not really in general. amdgpu upcasts to amdgpu_bo (if it's amgpu BO) > > and then knows the internals so it can do a proper pci peer2peer > > mapping. Or at least there's been lots of patches floating around to > > make that happen. > > That is limited to bo sharing between two amdgpu devices, correct? > > > I think other drivers migrate the bo out of VRAM. > > Well, that doesn't look too useful. bochs and qxl virtual hardware > can't access buffers outside VRAM. So, while I could migrate the > buffers to RAM (via memcpy) when exporting they would at the same time > become unusable for the GPU ... > > > > On importing: > > > > > > Importing into TTM_PL_TT object looks easy again, at least when the > > > object is actually stored in RAM. What if not? > > > > They are all supposed to be stored in RAM. 
Note that all current ttm > > importers totally break the abstraction, by taking the sg list, > > throwing the dma mapping away and assuming there's a struct page > > backing it. Would be good if we could stop spreading that abuse - the > > dma-buf interfaces have been modelled after the ttm bo interfaces, so > > shouldn't be too hard to wire this up correctly. > > Ok. With virtio-gpu (where objects are backed by RAM pages anyway) > wiring this up should be easy. > > But given there is no correct sample code I can look at it would be cool > if you could give some more hints how this is supposed to work. The > gem_prime_import_sg_table() callback gets a sg list passed in after all, > so I probably would have tried to take the sg list too ... I'm not a fan of that helper either, that's really the broken part imo. i915 doesn't use it. It's a midlayer so that the nvidia blob can avoid directly touching the EXPORT_SYMBOL_GPL dma-buf symbols, afaiui there's really no other solid reason for it. What the new gem cma helpers does is imo much better (it still uses the import_sg_table midlayer, but oh well). For ttm you'd need to make sure that all the various ttm cpu side access functions also all go through the relevant dma-buf interfaces, and not through the struct page list fished out of the sgt. That was at least the idea, long ago. > > > Importing into TTM_PL_VRAM: Impossible I think, without copying over > > > the data. Should that be done? If so, how? Or is it better to just > > > not support import then? > > > > Hm, since you ask about TTM concepts and not what this means in terms > > of dma-buf: > > Ok, more details on the quesion: > > dma-buf: whatever the driver gets passed into the > gem_prime_import_sg_table() callback. > > import into TTM_PL_VRAM: qemu driver which supports VRAM storage only > (bochs, qxl), so the buffer has to be stored there if we want do > something with it (like scanning out to a crtc). 
> > > As long as you upcast to the ttm_bo you can do whatever > > you want to really. > > Well, if the dma-buf comes from another device (say export vgem bo, then > try import into bochs/qxl/virtio) I can't upcast. In that case you'll in practice only get system RAM, and you're not allowed to move it (dma-buf is meant to be zero-copy after all). If your hw can't scan these out directly, then userspace needs to arrange for a buffer copy into a native buffer somehow (that's how Xorg prime works at least I think). No idea whether your virtual gpus can make use of that directly. You might also get some pci peer2peer range in the future, but it's strictly opt-in (because there's too many dma-buf importers that just blindly assume there's a struct page behind the sgt). > When the dma-buf comes from the same device drm_gem_prime_import_dev() > will notice and take a shortcut (skip import, just increase refcount > instead), so I don't have to worry about that case in the > gem_prime_import_sg_table() callback. You can also upcast if it's from the same driver, not just same device. -Daniel > > But with plain dma-
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
Hi, > > If I understand things correctly it is valid to set all import/export > > callbacks (prime_handle_to_fd, prime_fd_to_handle, > > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not > > supporting dma-buf import/export and still advertise DRIVER_PRIME to > > indicate the other prime callbacks are supported (so generic fbdev > > emulation can use gem_prime_vmap etc). Is that correct? > > I'm not sure how much that's a good idea ... Never thought about it > tbh. All the fbdev/dma-buf stuff has plenty of hacks and > inconsistencies still, so I guess we can't make it much worse really. Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace. Which looks better to me than telling userspace we support it then throw errors unconditionally when userspace tries to use that. > > Is it possible to export TTM_PL_VRAM objects (with backing storage being > > a pci memory bar)? If so, how? > > Not really in general. amdgpu upcasts to amdgpu_bo (if it's amgpu BO) > and then knows the internals so it can do a proper pci peer2peer > mapping. Or at least there's been lots of patches floating around to > make that happen. That is limited to bo sharing between two amdgpu devices, correct? > I think other drivers migrate the bo out of VRAM. Well, that doesn't look too useful. bochs and qxl virtual hardware can't access buffers outside VRAM. So, while I could migrate the buffers to RAM (via memcpy) when exporting they would at the same time become unusable for the GPU ... > > On importing: > > > > Importing into TTM_PL_TT object looks easy again, at least when the > > object is actually stored in RAM. What if not? > > They are all supposed to be stored in RAM. Note that all current ttm > importers totally break the abstraction, by taking the sg list, > throwing the dma mapping away and assuming there's a struct page > backing it. 
Would be good if we could stop spreading that abuse - the > dma-buf interfaces have been modelled after the ttm bo interfaces, so > shouldn't be too hard to wire this up correctly. Ok. With virtio-gpu (where objects are backed by RAM pages anyway) wiring this up should be easy. But given there is no correct sample code I can look at, it would be cool if you could give some more hints on how this is supposed to work. The gem_prime_import_sg_table() callback gets a sg list passed in after all, so I probably would have tried to take the sg list too ... > > Importing into TTM_PL_VRAM: Impossible I think, without copying over > > the data. Should that be done? If so, how? Or is it better to just > > not support import then? > > Hm, since you ask about TTM concepts and not what this means in terms > of dma-buf: Ok, more details on the question: dma-buf: whatever the driver gets passed into the gem_prime_import_sg_table() callback. import into TTM_PL_VRAM: qemu driver which supports VRAM storage only (bochs, qxl), so the buffer has to be stored there if we want to do something with it (like scanning out to a crtc). > As long as you upcast to the ttm_bo you can do whatever > you want to really. Well, if the dma-buf comes from another device (say export vgem bo, then try import into bochs/qxl/virtio) I can't upcast. When the dma-buf comes from the same device drm_gem_prime_import_dev() will notice and take a shortcut (skip import, just increase refcount instead), so I don't have to worry about that case in the gem_prime_import_sg_table() callback. > But with plain dma-buf this doesn't work right now > (least because ttm assumes it gets system RAM on import, in theory you > could put the peer2peer dma mapping into the sg list and it should > work). Well, qemu display devices don't have peer2peer dma support. So I guess the answer is "doesn't work". 
cheers, Gerd
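The same-device re-import shortcut discussed in this thread (drm_gem_prime_import_dev() noticing one of its own dma-bufs and just taking a reference, versus a foreign buffer that cannot be upcast) can be sketched abstractly. This is a hypothetical userspace model: all the types and the `prime_import()` helper are invented stand-ins, not the real DRM code.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-ins for the DRM structures involved. */
struct drm_device { int id; };
struct drm_gem_object { struct drm_device *dev; int refcount; };
struct dma_buf {
    void *priv;                   /* exporter's private object   */
    struct drm_device *owner_dev; /* assumption: who exported it */
};

/* If the dma-buf was exported by our own device, skip the sg-table
 * import entirely and just take another reference on the underlying
 * GEM object; otherwise the buffer must be treated as foreign system
 * RAM (no upcasting to a driver-specific BO type). */
static struct drm_gem_object *
prime_import(struct drm_device *dev, struct dma_buf *buf)
{
    if (buf->owner_dev == dev) {
        struct drm_gem_object *obj = buf->priv;
        obj->refcount++;          /* re-import == extra reference */
        return obj;
    }
    return NULL;  /* real code would call gem_prime_import_sg_table() */
}
```

The design point from the thread: only the "own buffer" branch may make assumptions about the object's backing storage; the foreign branch has to go through the generic dma-buf interfaces.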
Re: [PATCH net-next V2 0/3] basic in order support for vhost_net
On 2019/1/9 at 4:05 PM, Jason Wang wrote: Hi: This series implements basic in-order feature support for vhost_net. This feature requires both driver and device to use descriptors in order, which can simplify the implementation and optimization on both sides. The series also implements a simple optimization that avoids reading the available ring. Tests show a 10% performance improvement at most. More optimizations could be done on top. Changes from V1: - no code changes - add result of SMAP off Just noticed the patch works only for some specific cases, e.g. a buffer that only contains one descriptor. Will respin. Thanks Jason Wang (3): virtio: introduce in order feature bit vhost_net: support in order feature vhost: don't touch avail ring if in_order is negotiated drivers/vhost/net.c| 6 -- drivers/vhost/vhost.c | 19 --- include/uapi/linux/virtio_config.h | 6 ++ 3 files changed, 22 insertions(+), 9 deletions(-)
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On 09.01.2019 11:35, Wei Wang wrote: > On 01/08/2019 04:46 PM, Christian Borntraeger wrote: >> >> On 08.01.2019 06:35, Wei Wang wrote: >>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote: On 07.01.2019 08:01, Wei Wang wrote: > virtio-ccw has deadlock issues with reading the config space inside the > interrupt context, so we tweak the virtballoon_changed implementation > by moving the config read operations into the related workqueue contexts. > The config_read_bitmap is used as a flag to the workqueue callbacks > about the related config fields that need to be read. > > The cmd_id_received is also renamed to cmd_id_received_cache, and > the value should be obtained via virtio_balloon_cmd_id_received. > > Reported-by: Christian Borntraeger > Signed-off-by: Wei Wang > Reviewed-by: Cornelia Huck > Reviewed-by: Halil Pasic Together with virtio_pci: use queue idx instead of array idx to set up the vq virtio: don't allocate vqs when names[i] = NULL Tested-by: Christian Borntraeger >>> OK. I don't plan to send a new version of the above patches as no changes >>> needed so far. >>> >>> Michael, if the above two patches look good to you, please help add the >>> related tested-by >>> and reviewed-by tags. Thanks. >> Can we also make sure that >> >> virtio_pci: use queue idx instead of array idx to set up the vq >> virtio: don't allocate vqs when names[i] = NULL >> >> also land in stable? >> > > You could also send the request to stable after it gets merged to Linus' tree. > The stable review committee will decide whether to take it. > > Please see Option 2: > > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html > Those patches are not upstream yet, Correct? Michael, can you add the stable tag before submitting? If not, can you give me a heads up when doing the pull request so that I can ping the stable folks.
Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation
On 01/08/2019 04:46 PM, Christian Borntraeger wrote: On 08.01.2019 06:35, Wei Wang wrote: On 01/07/2019 09:49 PM, Christian Borntraeger wrote: On 07.01.2019 08:01, Wei Wang wrote: virtio-ccw has deadlock issues with reading the config space inside the interrupt context, so we tweak the virtballoon_changed implementation by moving the config read operations into the related workqueue contexts. The config_read_bitmap is used as a flag to the workqueue callbacks about the related config fields that need to be read. The cmd_id_received is also renamed to cmd_id_received_cache, and the value should be obtained via virtio_balloon_cmd_id_received. Reported-by: Christian Borntraeger Signed-off-by: Wei Wang Reviewed-by: Cornelia Huck Reviewed-by: Halil Pasic Together with virtio_pci: use queue idx instead of array idx to set up the vq virtio: don't allocate vqs when names[i] = NULL Tested-by: Christian Borntraeger OK. I don't plan to send a new version of the above patches as no changes needed so far. Michael, if the above two patches look good to you, please help add the related tested-by and reviewed-by tags. Thanks. Can we also make sure that virtio_pci: use queue idx instead of array idx to set up the vq virtio: don't allocate vqs when names[i] = NULL also land in stable? You could also send the request to stable after it gets merged to Linus' tree. The stable review committee will decide whether to take it. Please see Option 2: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html Best, Wei
Re: [PATCH v3 5/5] xfs: disable map_sync for virtio pmem
> > Virtio pmem provides asynchronous host page cache flush > > mechanism. we don't support 'MAP_SYNC' with virtio pmem > > and xfs. > > > > Signed-off-by: Pankaj Gupta > > --- > > fs/xfs/xfs_file.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index e474250..eae4aa4 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -1190,6 +1190,14 @@ xfs_file_mmap( > > if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) > > return -EOPNOTSUPP; > > > > + /* We don't support synchronous mappings with guest direct access > > +* and virtio based host page cache mechanism. > > +*/ > > + if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled( > > Echoing what Jan said, this ought to be some sort of generic function > that tells us whether or not memory mapped from the dax device will > always still be accessible even after a crash (i.e. supports MAP_SYNC). O.k. > > What if the underlying file on the host is itself on pmem and can be > MAP_SYNC'd? Shouldn't the guest be able to use MAP_SYNC as well? Guest MAP_SYNC on actual host pmem will sync guest metadata, as guest writes are persistent on actual pmem. Host-side Qemu MAP_SYNC enabling work for pmem is in progress. It will make sure metadata at the host side is in a consistent state after a crash or any other metadata-corrupting operation. For virtio-pmem, we are emulating a pmem device over regular storage on the host side. The guest needs to follow a write with fsync to make sure guest metadata is in a consistent state (or journalled). Host backing file data & metadata will also be persistent after a guest-to-host fsync. Thanks, Pankaj
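From an application's point of view, the -EOPNOTSUPP in the patch above means an mmap(MAP_SHARED_VALIDATE | MAP_SYNC) attempt fails cleanly, and the program must fall back to explicit flushing. A minimal userspace sketch of that fallback (the `map_maybe_sync()` helper is invented for illustration; the fallback flag values are the standard Linux ones, defined only if the libc headers lack them):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Ask for a synchronous (MAP_SYNC) mapping; if the kernel refuses --
 * EOPNOTSUPP on a non-DAX filesystem, or on a virtio-pmem backed DAX
 * device once this patch is applied -- fall back to a normal shared
 * mapping where the application must call msync()/fsync() itself. */
static void *map_maybe_sync(int fd, size_t len, int *synchronous)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *synchronous = 1;   /* stores persist without explicit sync */
        return p;
    }
    *synchronous = 0;       /* must msync() after writes */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

MAP_SHARED_VALIDATE is what makes the kernel reject unknown or unsupported flags instead of silently ignoring them, which is exactly why the filesystem-side check in the patch is visible to userspace at all.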
Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag
On Wed, Jan 9, 2019 at 5:53 AM Pankaj Gupta wrote: > > This patch adds 'DAXDEV_BUFFERED' flag which is set > for virtio pmem corresponding nd_region. This later > is used to disable MAP_SYNC functionality for ext4 > & xfs filesystem. > > Signed-off-by: Pankaj Gupta > --- > drivers/dax/super.c | 17 + > drivers/nvdimm/pmem.c| 3 +++ > drivers/nvdimm/region_devs.c | 7 +++ > drivers/virtio/pmem.c| 1 + > include/linux/dax.h | 9 + > include/linux/libnvdimm.h| 6 ++ > 6 files changed, 43 insertions(+) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 6e928f3..9128740 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -167,6 +167,8 @@ enum dax_device_flags { > DAXDEV_ALIVE, > /* gate whether dax_flush() calls the low level flush routine */ > DAXDEV_WRITE_CACHE, > + /* flag to disable MAP_SYNC for virtio based host page cache flush */ > + DAXDEV_BUFFERED, > }; > > /** > @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev) > } > EXPORT_SYMBOL_GPL(dax_write_cache_enabled); > > +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc) > +{ > + if (wc) > + set_bit(DAXDEV_BUFFERED, &dax_dev->flags); > + else > + clear_bit(DAXDEV_BUFFERED, &dax_dev->flags); > +} > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache); The "write_cache" property was structured this way because it can conceivably change at runtime. The MAP_SYNC capability should be static and never changed after init. > +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev) > +{ > + return test_bit(DAXDEV_BUFFERED, &dax_dev->flags); > +} > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled); Echoing Darrick and Jan this is should be a generic property of a dax_device and not specific to virtio. I don't like the "buffered" designation as that's not accurate. There may be hardware reasons why a dax_device is not synchronous, like a requirement to flush a write-pending queue or otherwise notify the device of new writes. 
I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I would also modify alloc_dax() to take a flags argument so that the capability can be instantiated when the dax_device is allocated.
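The suggestion above, as a hypothetical sketch in plain C (a userspace model with names following the proposal, not actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of the proposed shape: a DAXDEV_SYNC capability fixed when
 * the dax_device is allocated, instead of a runtime-mutable
 * "buffered" bit. */
enum dax_device_flags {
    DAXDEV_ALIVE       = 1 << 0,
    DAXDEV_WRITE_CACHE = 1 << 1,
    DAXDEV_SYNC        = 1 << 2,  /* safe to grant MAP_SYNC mappings */
};

struct dax_device { unsigned long flags; };

/* alloc_dax() grows a flags argument so the capability is set once
 * at allocation and never toggled afterwards. */
static struct dax_device *alloc_dax(unsigned long flags)
{
    struct dax_device *d = calloc(1, sizeof(*d));
    if (d)
        d->flags = flags | DAXDEV_ALIVE;
    return d;
}

static bool dax_synchronous(const struct dax_device *d)
{
    return d->flags & DAXDEV_SYNC;
}
```

A filesystem could then gate MAP_SYNC generically on `dax_synchronous()` rather than calling a virtio-specific predicate, which addresses the "this should be a generic dax_device property" objection.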
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote: > The buffer object must be reserved before calling > ttm_bo_validate for pinning/unpinning. > > Signed-off-by: Gerd Hoffmann Seems a bit a bisect fumble in your series here: legacy kms code reserved the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to avoid bisect fail I think you need to have these temporarily in your cleanup/prepare_plane functions too. Looked through the entire series, this here is the only issue I think should be fixed before merging (making atomic_enable optional can be done as a follow-up if you feel like it). With that addressed on the series: Acked-by: Daniel Vetter > --- > drivers/gpu/drm/bochs/bochs_mm.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/bochs/bochs_mm.c > b/drivers/gpu/drm/bochs/bochs_mm.c > index cfe061c25f..970a591908 100644 > --- a/drivers/gpu/drm/bochs/bochs_mm.c > +++ b/drivers/gpu/drm/bochs/bochs_mm.c > @@ -223,7 +223,11 @@ int bochs_bo_pin(struct bochs_bo *bo, u32 pl_flag) > bochs_ttm_placement(bo, pl_flag); > for (i = 0; i < bo->placement.num_placement; i++) > bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT; > + ret = ttm_bo_reserve(&bo->bo, true, false, NULL); > + if (ret) > + return ret; > ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx); > + ttm_bo_unreserve(&bo->bo); > if (ret) > return ret; > > @@ -247,7 +251,11 @@ int bochs_bo_unpin(struct bochs_bo *bo) > > for (i = 0; i < bo->placement.num_placement; i++) > bo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT; > + ret = ttm_bo_reserve(&bo->bo, true, false, NULL); > + if (ret) > + return ret; > ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx); > + ttm_bo_unreserve(&bo->bo); > if (ret) > return ret; > > -- > 2.9.3 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel 
Corporation http://blog.ffwll.ch
Re: [PATCH v1 1/2] virtio_pci: use queue idx instead of array idx to set up the vq
Cc: sta...@vger.kernel.org Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") On 28.12.2018 03:26, Wei Wang wrote: > When find_vqs, there will be no vq[i] allocation if its corresponding > names[i] is NULL. For example, the caller may pass in names[i] (i=4) > with names[2] being NULL because the related feature bit is turned off, > so technically there are 3 queues on the device, and name[4] should > correspond to the 3rd queue on the device. > > So we use queue_idx as the queue index, which is increased only when the > queue exists. > > Signed-off-by: Wei Wang > --- > drivers/virtio/virtio_pci_common.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/virtio/virtio_pci_common.c > b/drivers/virtio/virtio_pci_common.c > index 465a6f5..d0584c0 100644 > --- a/drivers/virtio/virtio_pci_common.c > +++ b/drivers/virtio/virtio_pci_common.c > @@ -285,7 +285,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > unsigned nvqs, > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > u16 msix_vec; > - int i, err, nvectors, allocated_vectors; > + int i, err, nvectors, allocated_vectors, queue_idx = 0; > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > if (!vp_dev->vqs) > @@ -321,7 +321,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > unsigned nvqs, > msix_vec = allocated_vectors++; > else > msix_vec = VP_MSIX_VQ_VECTOR; > - vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], > + vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i], >ctx ? 
ctx[i] : false, >msix_vec); > if (IS_ERR(vqs[i])) { > @@ -356,7 +356,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > unsigned nvqs, > const char * const names[], const bool *ctx) > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > - int i, err; > + int i, err, queue_idx = 0; > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > if (!vp_dev->vqs) > @@ -374,7 +374,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > unsigned nvqs, > vqs[i] = NULL; > continue; > } > - vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], > + vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i], >ctx ? ctx[i] : false, >VIRTIO_MSI_NO_VECTOR); > if (IS_ERR(vqs[i])) { >
Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin
On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote: > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote: > > The buffer object must be reserved before calling > > ttm_bo_validate for pinning/unpinning. > > > > Signed-off-by: Gerd Hoffmann > > Seems a bit a bisect fumble in your series here: legacy kms code reserved > the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I > think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to > avoid bisect fail I think you need to have these temporarily in your > cleanup/prepare_plane functions too. I think I've sorted that. Have some other changes too, will probably send v3 tomorrow. > Looked through the entire series, this here is the only issue I think > should be fixed before merging (making atomic_enable optional can be done > as a follow-up if you feel like it). With that addressed on the series: > > Acked-by: Daniel Vetter Thanks. While being at it: I'm also looking at dma-buf export and import support for the qemu drivers. Right now both qxl and virtio have gem_prime_get_sg_table and gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return an error. If I understand things correctly it is valid to set all import/export callbacks (prime_handle_to_fd, prime_fd_to_handle, gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not supporting dma-buf import/export and still advertise DRIVER_PRIME to indicate the other prime callbacks are supported (so generic fbdev emulation can use gem_prime_vmap etc). Is that correct? On exporting: TTM_PL_TT should be easy, just pin the buffer, grab the pages list and feed that into drm_prime_pages_to_sg. Didn't try yet though. Is that approach correct? Is it possible to export TTM_PL_VRAM objects (with backing storage being a pci memory bar)? If so, how? On importing: Importing into TTM_PL_TT object looks easy again, at least when the object is actually stored in RAM. What if not? 
Importing into TTM_PL_VRAM: Impossible I think, without copying over the data. Should that be done? If so, how? Or is it better to just not support import then? thanks, Gerd
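The export path Gerd proposes for TTM_PL_TT objects ("just pin the buffer, grab the pages list and feed that into drm_prime_pages_to_sg") can be modeled in miniature. This is a toy userspace sketch with invented stand-in types; the kernel would use struct page, the TTM BO type, and drm_prime_pages_to_sg() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Invented stand-ins for the kernel structures involved. */
struct page  { int pfn; };
struct ttm_bo {
    struct page **pages;
    size_t        num_pages;
    int           pin_count;
};
struct sg_entry { struct page *page; size_t len; };

/* Export path for a TT (system RAM) object: pin the BO so its
 * backing pages cannot move, then flatten the page array into a
 * scatter list for the dma-buf exporter.  Caller frees the list. */
static struct sg_entry *export_tt_bo(struct ttm_bo *bo, size_t page_size)
{
    struct sg_entry *sg = calloc(bo->num_pages, sizeof(*sg));
    size_t i;

    if (!sg)
        return NULL;
    bo->pin_count++;              /* keep pages resident while shared */
    for (i = 0; i < bo->num_pages; i++) {
        sg[i].page = bo->pages[i];
        sg[i].len  = page_size;
    }
    return sg;
}
```

This also makes the VRAM problem concrete: a TTM_PL_VRAM object has no page array to flatten, only a PCI BAR aperture, which is why export needs either migration to RAM or peer-to-peer DMA support.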
[PATCH v3 4/5] ext4: disable map_sync for virtio pmem
Virtio pmem provides asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and ext4. Signed-off-by: Pankaj Gupta --- fs/ext4/file.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 69d65d4..e54f48b 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops = { static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file->f_mapping->host; + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + struct dax_device *dax_dev = sbi->s_daxdev; - if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb + if (unlikely(ext4_forced_shutdown(sbi))) return -EIO; /* @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP; + /* We don't support synchronous mappings with guest direct access +* and virtio based host page cache flush mechanism. +*/ + if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev) + && (vma->vm_flags & VM_SYNC)) + return -EOPNOTSUPP; + file_accessed(file); if (IS_DAX(file_inode(file))) { vma->vm_ops = &ext4_dax_vm_ops; -- 2.9.3
[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag
This patch adds 'DAXDEV_BUFFERED' flag which is set for virtio pmem corresponding nd_region. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem. Signed-off-by: Pankaj Gupta --- drivers/dax/super.c | 17 + drivers/nvdimm/pmem.c| 3 +++ drivers/nvdimm/region_devs.c | 7 +++ drivers/virtio/pmem.c| 1 + include/linux/dax.h | 9 + include/linux/libnvdimm.h| 6 ++ 6 files changed, 43 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 6e928f3..9128740 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -167,6 +167,8 @@ enum dax_device_flags { DAXDEV_ALIVE, /* gate whether dax_flush() calls the low level flush routine */ DAXDEV_WRITE_CACHE, + /* flag to disable MAP_SYNC for virtio based host page cache flush */ + DAXDEV_BUFFERED, }; /** @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(dax_write_cache_enabled); +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc) +{ + if (wc) + set_bit(DAXDEV_BUFFERED, &dax_dev->flags); + else + clear_bit(DAXDEV_BUFFERED, &dax_dev->flags); +} +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache); + +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev) +{ + return test_bit(DAXDEV_BUFFERED, &dax_dev->flags); +} +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled); + bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(&dax_srcu); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index fe1217b..8d190a3 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev, return -ENOMEM; } dax_write_cache(dax_dev, nvdimm_has_cache(nd_region)); + + /* Set buffered bit in 'dax_dev' for virtio pmem */ + virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region)); pmem->dax_dev = dax_dev; gendev = disk_to_dev(disk); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index f8218b4..1f8b2be 100644 --- a/drivers/nvdimm/region_devs.c 
+++ b/drivers/nvdimm/region_devs.c @@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start, return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict); } +int nvdimm_is_buffered(struct nd_region *nd_region) +{ + return is_nd_pmem(&nd_region->dev) && + test_bit(ND_REGION_BUFFERED, &nd_region->flags); +} +EXPORT_SYMBOL_GPL(nvdimm_is_buffered); + void __exit nd_region_devs_exit(void) { ida_destroy(®ion_ida); diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c index 51f5349..901767b 100644 --- a/drivers/virtio/pmem.c +++ b/drivers/virtio/pmem.c @@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev) ndr_desc.numa_node = nid; ndr_desc.flush = virtio_pmem_flush; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); + set_bit(ND_REGION_BUFFERED, &ndr_desc.flags); nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc); nd_region->provider_data = dev_to_virtio (nd_region->dev.parent->parent); diff --git a/include/linux/dax.h b/include/linux/dax.h index 0dd316a..d16e03e 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc); +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev); #else static inline struct dax_device *dax_get_by_host(const char *host) { @@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev) { return false; } +static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc) +{ +} +static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev) +{ + return false; +} #endif struct writeback_control; diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index ca8bc07..94616f1 100644 --- a/include/linux/libnvdimm.h +++ 
b/include/linux/libnvdimm.h @@ -64,6 +64,11 @@ enum { */ ND_REGION_PERSIST_MEMCTRL = 2, + /* provides virtio based asynchronous flush mechanism for buffered +* host page cache. +*/ + ND_REGION_BUFFERED = 3, + /* mark newly adjusted resources as requiring a label update */ DPA_RESOURCE_ADJUSTED = 1 << 0, }; @@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region); int nvdimm_has_flush(struct nd_region *nd_region); int nvdimm_has_cache(struct nd_region *nd_region); int nvdimm_in_overwrite(struct nvdimm *nvdimm)
[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver
This patch adds virtio-pmem driver for KVM guest. Guest reads the persistent memory range information from Qemu over VIRTIO and registers it on nvdimm_bus. It also creates a nd_region object with the persistent memory range information so that existing 'nvdimm/pmem' driver can reserve this into system memory map. This way 'virtio-pmem' driver uses existing functionality of pmem driver to register persistent memory compatible for DAX capable filesystems. This also provides function to perform guest flush over VIRTIO from 'pmem' driver when userspace performs flush on DAX memory range. Signed-off-by: Pankaj Gupta --- drivers/nvdimm/virtio_pmem.c | 84 ++ drivers/virtio/Kconfig | 10 drivers/virtio/Makefile | 1 + drivers/virtio/pmem.c| 124 +++ include/linux/virtio_pmem.h | 60 +++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_pmem.h | 10 7 files changed, 290 insertions(+) create mode 100644 drivers/nvdimm/virtio_pmem.c create mode 100644 drivers/virtio/pmem.c create mode 100644 include/linux/virtio_pmem.h create mode 100644 include/uapi/linux/virtio_pmem.h diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c new file mode 100644 index 000..2a1b1ba --- /dev/null +++ b/drivers/nvdimm/virtio_pmem.c @@ -0,0 +1,84 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * virtio_pmem.c: Virtio pmem Driver + * + * Discovers persistent memory range information + * from host and provides a virtio based flushing + * interface. 
+ */
+#include
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+	kfree(req);
+
+	return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	  This driver provides support for virtio based flushing interface
+	  for persistent memory range.
+
+	  If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/driv
[PATCH v3 1/5] libnvdimm: nd_region flush callback support
This patch adds functionality to perform a flush from guest to host over
VIRTIO. We register a flush callback based on the 'nd_region' type: the
virtio_pmem driver requires its special flush function, while the rest of
the region types keep the existing flush function. Errors returned by a
host-side fsync failure are reported to userspace. This also handles
asynchronous flush requests from the block layer by creating a child bio
and chaining it with the parent bio.

Signed-off-by: Pankaj Gupta
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 --
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h    |  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
 	struct nd_mapping mapping[0];
+	int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio, true);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio, true);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
[PATCH v3 0/5] kvm "virtio pmem" device
This patch series has the implementation for "virtio pmem". "virtio pmem"
is fake persistent memory (nvdimm) in the guest which allows bypassing
the guest page cache. It also implements a VIRTIO based asynchronous
flush mechanism.

Sharing the guest kernel driver in this patchset with the changes
suggested in v2. Tested with Qemu side device emulation for
virtio-pmem [6]. Details of the project idea for the 'virtio pmem'
flushing interface are shared at [3] & [4].

The implementation is divided into two parts: a new virtio pmem guest
driver, and qemu code changes for the new virtio pmem paravirtualized
device.

1. Guest virtio-pmem kernel driver
   - Reads the persistent memory range from the paravirt device and
     registers it with 'nvdimm_bus'.
   - The 'nvdimm/pmem' driver uses this information to allocate the
     persistent memory region and set up filesystem operations to the
     allocated memory.
   - The virtio pmem driver implements an asynchronous flushing
     interface to flush from guest to host.

2. Qemu virtio-pmem device
   - Creates a virtio pmem device and exposes a memory range to the KVM
     guest.
   - At the host side this is file-backed memory which acts as
     persistent memory.
   - The Qemu side flush uses aio thread pool APIs and virtio for
     asynchronous guest multi-request handling.

David Hildenbrand (CCed) also posted a modified version [6] of the qemu
virtio-pmem code based on the updated Qemu memory device API.

Virtio-pmem error handling: checked the behaviour of virtio-pmem for the
below types of errors. Need suggestions on the expected behaviour for
handling these errors:

- Hardware Errors: Uncorrectable recoverable Errors:
  a] virtio-pmem:
     - As per current logic, if the error page belongs to the Qemu
       process, the host MCE handler isolates (hwpoisons) that page and
       sends SIGBUS. The Qemu SIGBUS handler injects an exception into
       the KVM guest.
     - The KVM guest then isolates the page and sends SIGBUS to the
       guest userspace process which has mapped the page.
  b] Existing implementation for the ACPI pmem driver:
     - Handles such errors with an MCE notifier and creates a list of
       bad blocks. Read/direct-access DAX operations return EIO if the
       accessed memory page falls in the bad block list.
     - It also starts background scrubbing.
     - Similar functionality can be reused in virtio-pmem with an MCE
       notifier but without scrubbing (no ACPI/ARS)? Need inputs to
       confirm if this behaviour is ok or needs any change.

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
- Use name 'virtio pmem' in place of 'fake dax'

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm

Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don't move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching on a flag -
  Dan & Stefan
- Add flush completion function with proper locking and wait for host
  side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & new device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent memory and
  other operations instead of creating an entirely new block driver.
- Use VIRTIO driver to register memory information with nvdimm_bus and
  create region_type accordingly.
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
  libnvdimm: nd_region flush callback support
  virtio-pmem: Add virtio-pmem guest driver
  libnvdimm: add nd_region buffered dax_dev flag
  ext4: disable map_sync for virtio pmem
  xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 drivers/acpi/nfit
Re: [PATCH v2 03/15] drm/bochs: atomic: add atomic_flush+atomic_enable callbacks.
On Tue, Jan 08, 2019 at 12:25:07PM +0100, Gerd Hoffmann wrote:
> Conversion to atomic modesetting, step one.
> Add atomic crtc helper callbacks.
>
> Signed-off-by: Gerd Hoffmann
> ---
>  drivers/gpu/drm/bochs/bochs_kms.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
> index f7e6d1a9b3..2cbd406b1f 100644
> --- a/drivers/gpu/drm/bochs/bochs_kms.c
> +++ b/drivers/gpu/drm/bochs/bochs_kms.c
> @@ -115,6 +115,29 @@ static int bochs_crtc_page_flip(struct drm_crtc *crtc,
>  	return 0;
>  }
>
> +static void bochs_crtc_atomic_enable(struct drm_crtc *crtc,
> +				     struct drm_crtc_state *old_crtc_state)
> +{
> +}

A patch to make this optional in the helpers would be neat.
-Daniel

> +
> +static void bochs_crtc_atomic_flush(struct drm_crtc *crtc,
> +				    struct drm_crtc_state *old_crtc_state)
> +{
> +	struct drm_device *dev = crtc->dev;
> +	struct drm_pending_vblank_event *event;
> +
> +	if (crtc->state && crtc->state->event) {
> +		unsigned long irqflags;
> +
> +		spin_lock_irqsave(&dev->event_lock, irqflags);
> +		event = crtc->state->event;
> +		crtc->state->event = NULL;
> +		drm_crtc_send_vblank_event(crtc, event);
> +		spin_unlock_irqrestore(&dev->event_lock, irqflags);
> +	}
> +}
> +
> +
> /* These provide the minimum set of functions required to handle a CRTC */
> static const struct drm_crtc_funcs bochs_crtc_funcs = {
> 	.set_config = drm_crtc_helper_set_config,
> @@ -128,6 +151,8 @@ static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
> 	.mode_set_base = bochs_crtc_mode_set_base,
> 	.prepare = bochs_crtc_prepare,
> 	.commit = bochs_crtc_commit,
> +	.atomic_enable = bochs_crtc_atomic_enable,
> +	.atomic_flush = bochs_crtc_atomic_flush,
> };
>
> static const uint32_t bochs_formats[] = {
> --
> 2.9.3
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] qxl: Use struct_size() in kzalloc()
On Tue, Jan 08, 2019 at 10:21:52AM -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
>
> struct foo {
>     int stuff;
>     void *entry[];
> };
>
> instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
>
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
>
> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
>
> This code was detected with the help of Coccinelle.

Patch queued up.

thanks,
  Gerd
Re: [PATCH v2] drm/virtio: Drop deprecated load/unload initialization
On Tue, Jan 08, 2019 at 11:59:30AM -0300, Ezequiel Garcia wrote:
> Move the code around so the driver is probed from the bus .probe and
> removed from the bus .remove callbacks. This commit is just a cleanup
> and shouldn't affect functionality.

That one works.

thanks,
  Gerd
Re: [PATCH] vhost/vsock: fix vhost vsock cid hashing inconsistent
Adding Peter, is this the same problem that you reported to me today? Can
you test Zha Bin's patch?

Christian

On 08.01.2019 09:07, Zha Bin wrote:
> The vsock core only supports 32bit CIDs, but the virtio-vsock spec
> defines the CID (dst_cid and src_cid) as u64, with the upper 32 bits
> reserved as zero. This inconsistency causes a bug in the vhost vsock
> driver. The scenario is:
>
> 1. A hash table (vhost_vsock_hash) is used to map a CID to a vsock
>    object, and hash_min() is used to compute the hash key. hash_min()
>    is defined as:
>        (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
>    That means the hash algorithm depends on the size of the macro
>    argument 'val'.
> 2. In vhost_vsock_set_cid(), a 64bit CID is passed to hash_min() to
>    compute the hash key when inserting a vsock object into the hash
>    table.
> 3. In vhost_vsock_get(), a 32bit CID is passed to hash_min() to
>    compute the hash key when looking up a vsock for a CID.
>
> Because of the different sizes of the CID, hash_min() returns different
> hash keys, and thus fails to look up the vsock object for a CID.
>
> To fix this bug, we keep the CID as u64 in the IOCTLs and virtio
> message headers, but explicitly convert u64 to u32 when dealing with
> the hash table and vsock core.
>
> Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers")
> Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex
> Signed-off-by: Zha Bin
> Reviewed-by: Liu Jiang
> ---
>  drivers/vhost/vsock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bc42d38ae031..3fbc068eaa9b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -642,7 +642,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
>  	hash_del_rcu(&vsock->hash);
>
>  	vsock->guest_cid = guest_cid;
> -	hash_add_rcu(vhost_vsock_hash, &vsock->hash, guest_cid);
> +	hash_add_rcu(vhost_vsock_hash, &vsock->hash, vsock->guest_cid);
>  	mutex_unlock(&vhost_vsock_mutex);
>
>  	return 0;
[PATCH net-next V2 3/3] vhost: don't touch avail ring if in_order is negotiated
Device uses the descriptor table in order, so there's no need to read the
index from the available ring. This eliminates the cache contention on
the available ring completely.

Virtio-user + vhost_kernel + XDP_DROP on 2.60GHz Broadwell:

          Before     After
SMAP on:  4.8Mpps    5.3Mpps (+10%)
SMAP off: 6.6Mpps    7.0Mpps (+6%)

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 55e5aa662ad5..ab0d05262235 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2012,6 +2012,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	__virtio16 avail_idx;
 	__virtio16 ring_head;
 	int ret, access;
+	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
 	last_avail_idx = vq->last_avail_idx;
@@ -2044,15 +2045,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Grab the next descriptor number they're advertising, and increment
 	 * the index we've seen. */
-	if (unlikely(vhost_get_avail(vq, ring_head,
-		     &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
-		vq_err(vq, "Failed to read head: idx %d address %p\n",
-		       last_avail_idx,
-		       &vq->avail->ring[last_avail_idx % vq->num]);
-		return -EFAULT;
+	if (!in_order) {
+		if (unlikely(vhost_get_avail(vq, ring_head,
+			     &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
+			vq_err(vq, "Failed to read head: idx %d address %p\n",
+			       last_avail_idx,
+			       &vq->avail->ring[last_avail_idx % vq->num]);
+			return -EFAULT;
+		}
+		head = vhost16_to_cpu(vq, ring_head);
+	} else {
+		head = last_avail_idx & (vq->num - 1);
 	}
-	head = vhost16_to_cpu(vq, ring_head);
 
 	/* If their number is silly, that's an error. */
 	if (unlikely(head >= vq->num)) {
-- 
2.17.1
[PATCH net-next V2 2/3] vhost_net: support in order feature
This makes vhost_net support the in order feature. This is as simple as
using the datacopy path when it was negotiated. An alternative is to not
advertise in order when zerocopy is enabled, which tends to be suboptimal
considering zerocopy may suffer from e.g. HOL issues.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 36f3d0f49e60..0870f51a1c76 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -74,7 +74,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_IOMMU_PLATFORM)
+			 (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+			 (1ULL << VIRTIO_F_IN_ORDER)
 };
 
 enum {
@@ -977,7 +978,8 @@ static void handle_tx(struct vhost_net *net)
 	vhost_disable_notify(&net->dev, vq);
 	vhost_net_disable_vq(net, vq);
 
-	if (vhost_sock_zcopy(sock))
+	if (vhost_sock_zcopy(sock) &&
+	    !vhost_has_feature(vq, VIRTIO_F_IN_ORDER))
 		handle_tx_zerocopy(net, sock);
 	else
 		handle_tx_copy(net, sock);
-- 
2.17.1
[PATCH net-next V2 1/3] virtio: introduce in order feature bit
Signed-off-by: Jason Wang
---
 include/uapi/linux/virtio_config.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 1196e1c1d4f6..2698e069ed9e 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -78,6 +78,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED	34
 
+/*
+ * Device uses buffers in the same order in which they have been
+ * available.
+ */
+#define VIRTIO_F_IN_ORDER	35
+
 /*
  * Does the device support Single Root I/O Virtualization?
  */
-- 
2.17.1
[PATCH net-next V2 0/3] basic in order support for vhost_net
Hi:

This series implements basic in order feature support for vhost_net.
This feature requires both driver and device to use descriptors in
order, which can simplify the implementation and optimization for both
sides. The series also implements a simple optimization that avoids
reading the available ring. Tests show a 10% performance improvement at
most.

More optimizations could be done on top.

Changes from V1:
- no code changes
- add result of SMAP off

Jason Wang (3):
  virtio: introduce in order feature bit
  vhost_net: support in order feature
  vhost: don't touch avail ring if in_order is negotiated

 drivers/vhost/net.c                |  6 --
 drivers/vhost/vhost.c              | 19 ---
 include/uapi/linux/virtio_config.h |  6 ++
 3 files changed, 22 insertions(+), 9 deletions(-)

-- 
2.17.1