Re: [PATCH] fuse: avoid unnecessary spinlock bump

2022-04-07 Thread JeffleXu



On 4/7/22 10:10 PM, Vivek Goyal wrote:
> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>> Move dmap free worker kicker inside the critical region, so that extra
>> spinlock lock/unlock could be avoided.
>>
>> Suggested-by: Liu Jiang 
>> Signed-off-by: Jeffle Xu 
> 
> Looks good to me. Have you done any testing to make sure nothing is
> broken.

xfstests -g quick shows no regression. The tested virtiofs is mounted
with "dax=always".

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Alex Williamson
On Thu, 7 Apr 2022 12:23:31 -0300
Jason Gunthorpe  wrote:

> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> 
> > For the specific case of overriding PCIe No Snoop (which is more problematic
> > from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> > solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> > the endpoint's PCIe capability?  
> 
> Ideally.
> 
> That was rediscussed recently, apparently there are non-compliant
> devices and drivers that just ignore the bit. 
> 
> Presumably this is why x86 had to move to an IOMMU enforced feature..

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device specific means to effect the change anyway.  GPUs
especially have various back channels to config space.  Thanks,

Alex

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
>> My take is that the drivers using this API are doing it to make sure
>> their HW blocks are setup in a way that is consistent with the DMA API
>> they are also using, and run in constrained embedded-style
>> environments that know the firmware support is present.
>>
>> So in the end it does not seem suitable right now for linking to
>> IOMMU_CACHE..
>
> That seems a pretty good summary - I think they're basically all "firmware 
> told Linux I'm coherent so I'd better act coherent" cases, but that still 
> doesn't necessarily mean that they're *forced* to respect that.

Yes. And the interface is horribly misnamed for that.  I'll see what
I can do to clean this up as I've noticed various other not very
nice things in that area.

> One of the 
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force 
> DMA cache maintenance for coherent devices, primarily to hook up in 
> Panfrost (where there is a bit of a performance to claw back on the 
> coherent AmLogic SoCs by leaving certain buffers non-cacheable).

This has been an explicit request from the amdgpu folks and thus been
on my TODO list for quite a while as well.  Note that I don't think it
should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages
as the drivers that want non-snoop generally also want to actually
be able to deal with pages.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 26/27] block: uncouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:05:15AM +0200, Christoph Hellwig wrote:
> Secure erase is a very different operation from discard in that it is
> a data integrity operation vs hint.  Fully split the limits and helper
> infrastructure to make the separation more clear.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-core.c|  2 +-
>  block/blk-lib.c | 64 -
>  block/blk-mq-debugfs.c  |  1 -
>  block/blk-settings.c| 16 +++-
>  block/fops.c|  2 +-
>  block/ioctl.c   | 43 +++
>  drivers/block/drbd/drbd_receiver.c  |  5 ++-
>  drivers/block/rnbd/rnbd-clt.c   |  4 +-
>  drivers/block/rnbd/rnbd-srv-dev.h   |  2 +-
>  drivers/block/xen-blkback/blkback.c | 15 +++
>  drivers/block/xen-blkback/xenbus.c  |  5 +--
>  drivers/block/xen-blkfront.c|  5 ++-
>  drivers/md/bcache/alloc.c   |  2 +-
>  drivers/md/dm-table.c   |  8 ++--
>  drivers/md/dm-thin.c|  4 +-
>  drivers/md/md.c |  2 +-
>  drivers/md/raid5-cache.c|  6 +--
>  drivers/mmc/core/queue.c|  2 +-
>  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
>  drivers/target/target_core_file.c   |  2 +-
>  drivers/target/target_core_iblock.c |  2 +-

For

>  fs/btrfs/extent-tree.c  |  4 +-

Acked-by: David Sterba 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 24/27] block: add a bdev_discard_granularity helper

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:05:13AM +0200, Christoph Hellwig wrote:
> Abstract away implementation details from file systems by providing a
> block_device based helper to retreive the discard granularity.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-lib.c |  5 ++---
>  drivers/block/drbd/drbd_nl.c|  9 +
>  drivers/block/drbd/drbd_receiver.c  |  3 +--
>  drivers/block/loop.c|  2 +-
>  drivers/target/target_core_device.c |  3 +--

For

>  fs/btrfs/ioctl.c| 12 

Acked-by: David Sterba 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 07/27] btrfs: use bdev_max_active_zones instead of open coding it

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 05:20:49PM +0200, David Sterba wrote:
> On Wed, Apr 06, 2022 at 08:04:56AM +0200, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig 
> 
> As it's a standalone patch I can take it (possibly with other similar
> prep btrfs patches) in current development cycle to relieve the
> inter-tree dependencies.

Unless there's a conflict in other btrfs patches it would probably be
easiest to merge everything through the block tree.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 23/27] block: add a bdev_max_discard_sectors helper

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:05:12AM +0200, Christoph Hellwig wrote:
> Add a helper to query the number of sectors support per each discard bio
> based on the block device and use this helper to stop various places from
> poking into the request_queue to see if discard is supported and if so how
> much.  This mirrors what is done e.g. for write zeroes as well.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-core.c|  2 +-
>  block/blk-lib.c |  2 +-
>  block/ioctl.c   |  3 +--
>  drivers/block/drbd/drbd_main.c  |  2 +-
>  drivers/block/drbd/drbd_nl.c| 12 +++-
>  drivers/block/drbd/drbd_receiver.c  |  5 ++---
>  drivers/block/loop.c|  9 +++--
>  drivers/block/rnbd/rnbd-srv-dev.h   |  6 +-
>  drivers/block/xen-blkback/xenbus.c  |  2 +-
>  drivers/md/bcache/request.c |  4 ++--
>  drivers/md/bcache/super.c   |  2 +-
>  drivers/md/bcache/sysfs.c   |  2 +-
>  drivers/md/dm-cache-target.c|  9 +
>  drivers/md/dm-clone-target.c|  9 +
>  drivers/md/dm-io.c  |  2 +-
>  drivers/md/dm-log-writes.c  |  3 +--
>  drivers/md/dm-raid.c|  9 ++---
>  drivers/md/dm-table.c   |  4 +---
>  drivers/md/dm-thin.c|  9 +
>  drivers/md/dm.c |  2 +-
>  drivers/md/md-linear.c  |  4 ++--
>  drivers/md/raid0.c  |  2 +-
>  drivers/md/raid1.c  |  6 +++---
>  drivers/md/raid10.c |  8 
>  drivers/md/raid5-cache.c|  2 +-
>  drivers/target/target_core_device.c |  8 +++-

For

>  fs/btrfs/extent-tree.c  |  4 ++--
>  fs/btrfs/ioctl.c|  2 +-

Acked-by: David Sterba 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 11/27] block: add a bdev_write_cache helper

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:05:00AM +0200, Christoph Hellwig wrote:
> Add a helper to check the write cache flag based on the block_device
> instead of having to poke into the block layer internal request_queue.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/rnbd/rnbd-srv.c   | 2 +-
>  drivers/block/xen-blkback/xenbus.c  | 2 +-
>  drivers/target/target_core_iblock.c | 8 ++--

For

>  fs/btrfs/disk-io.c  | 3 +--

Acked-by: David Sterba 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 10/27] block: add a bdev_nonrot helper

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:04:59AM +0200, Christoph Hellwig wrote:
> Add a helper to check the nonrot flag based on the block_device instead
> of having to poke into the block layer internal request_queue.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/ioctl.c   | 2 +-
>  drivers/block/loop.c| 2 +-
>  drivers/md/dm-table.c   | 4 +---
>  drivers/md/md.c | 3 +--
>  drivers/md/raid1.c  | 2 +-
>  drivers/md/raid10.c | 2 +-
>  drivers/md/raid5.c  | 2 +-
>  drivers/target/target_core_file.c   | 3 +--
>  drivers/target/target_core_iblock.c | 2 +-

For

>  fs/btrfs/volumes.c  | 4 ++--

Acked-by: David Sterba 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 07/27] btrfs: use bdev_max_active_zones instead of open coding it

2022-04-07 Thread David Sterba
On Wed, Apr 06, 2022 at 08:04:56AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

As it's a standalone patch I can take it (possibly with other similar
prep btrfs patches) in current development cycle to relieve the
inter-tree dependencies.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Robin Murphy

On 2022-04-07 14:59, Jason Gunthorpe wrote:

On Thu, Apr 07, 2022 at 07:18:48AM +, Tian, Kevin wrote:

From: Jason Gunthorpe 
Sent: Thursday, April 7, 2022 1:17 AM

On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:

On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:

On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:

On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:

Oh, I didn't know about device_get_dma_attr()..


Which is completely broken for any non-OF, non-ACPI plaform.


I saw that, but I spent some time searching and could not find an
iommu driver that would load independently of OF or ACPI. ie no IOMMU
platform drivers are created by board files. Things like Intel/AMD
discover only from ACPI, etc.


Intel discovers IOMMUs (and optionally ACPI namespace devices) from
ACPI, but there is no ACPI description for PCI devices i.e. the current
logic of device_get_dma_attr() cannot be used on PCI devices.


Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
DEV_DMA_NOT_SUPPORTED?


I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA 
method was actually present to say otherwise), based on 
acpi_init_coherency(), but I only know for sure what happens on arm64.



I think I should give up on this and just redefine the existing iommu
cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.


TBH I don't see any issue with current name, but I'd certainly be happy 
to nail down a specific definition for it, along the lines of "this 
means that IOMMU_CACHE mappings are generally coherent". That works for 
things like Arm's S2FWB making it OK to assign an otherwise-non-coherent 
device without extra hassle.


For the specific case of overriding PCIe No Snoop (which is more 
problematic from an Arm SMMU PoV) when assigning to a VM, would that not 
be easier solved by just having vfio-pci clear the "Enable No Snoop" 
control bit in the endpoint's PCIe capability?



We could alternatively use existing device_get_dma_attr() as a default
with an iommu wrapper and push the exception down through the iommu
driver and s390 can override it.



if going this way probably device_get_dma_attr() should be renamed to
device_fwnode_get_dma_attr() instead to make it clearer?


I'm looking at the few users:

drivers/ata/ahci_ceva.c
drivers/ata/ahci_qoriq.c
  - These are ARM only drivers. They are trying to copy the dma-coherent
property from its DT/ACPI definition to internal register settings
which look like they tune how the AXI bus transactions are created.

I'm guessing the SATA IP block's AXI interface can be configured to
generate coherent or non-coherent requests and it has to be set
in a way that is consistent with the SOC architecture and match
what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c
  - Only used on ARM64 and also programs a HW register similar to the
sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c
  - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c
  - Robin's commit comment here is good, and one of the things this
controls is if the coherent_walk is set for the io-pgtable-arm.c
code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c
  - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.

My take is that the drivers using this API are doing it to make sure
their HW blocks are setup in a way that is consistent with the DMA API
they are also using, and run in constrained embedded-style
environments that know the firmware support is present.

So in the end it does not seem suitable right now for linking to
IOMMU_CACHE..


That seems a pretty good summary - I think they're basically all 
"firmware told Linux I'm coherent so I'd better act coherent" cases, but 
that still doesn't necessarily mean that they're *forced* to respect 
that. One of the things on my to-do list is to try adding a 
DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent 
devices, primarily to hook up in Panfrost (where there is a bit of a 
performance to claw back on the coherent AmLogic SoCs by leaving certain 
buffers non-cacheable).


Cheers,
Robin.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] fuse: avoid unnecessary spinlock bump

2022-04-07 Thread Vivek Goyal
On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
> 
> Suggested-by: Liu Jiang 
> Signed-off-by: Jeffle Xu 

Looks good to me. Have you done any testing to make sure nothing is
broken.

Reviewed-by: Vivek Goyal 

Vivek

> ---
>  fs/fuse/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index d7d3a7f06862..b9f8795d52c4 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -138,9 +138,9 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct 
> fuse_conn_dax *fcd)
>   WARN_ON(fcd->nr_free_ranges <= 0);
>   fcd->nr_free_ranges--;
>   }
> + __kick_dmap_free_worker(fcd, 0);
>   spin_unlock(>lock);
>  
> - kick_dmap_free_worker(fcd, 0);
>   return dmap;
>  }
>  
> -- 
> 2.27.0
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-07 Thread Jason Wang
On Thu, Apr 7, 2022 at 3:53 PM Cornelia Huck  wrote:
>
> On Thu, Apr 07 2022, Jason Wang  wrote:
>
> > 在 2022/4/6 下午11:31, Michael S. Tsirkin 写道:
> >> On Wed, Apr 06, 2022 at 03:04:32PM +0200, Cornelia Huck wrote:
> >>> On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
> >>>
>  On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> > This patch implements PCI version of synchronize_vqs().
> >
> > Cc: Thomas Gleixner 
> > Cc: Peter Zijlstra 
> > Cc: "Paul E. McKenney" 
> > Cc: Marc Zyngier 
> > Signed-off-by: Jason Wang 
>  Please add implementations at least for ccw and mmio.
> >>> I'm not sure what (if anything) can/should be done for ccw...
> >>>
> > ---
> >   drivers/virtio/virtio_pci_common.c | 14 ++
> >   drivers/virtio/virtio_pci_common.h |  2 ++
> >   drivers/virtio/virtio_pci_legacy.c |  1 +
> >   drivers/virtio/virtio_pci_modern.c |  2 ++
> >   4 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index d724f676608b..b78c8bc93a97 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device 
> > *vdev)
> >   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >   }
> >
> > +void vp_synchronize_vqs(struct virtio_device *vdev)
> > +{
> > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > + int i;
> > +
> > + if (vp_dev->intx_enabled) {
> > + synchronize_irq(vp_dev->pci_dev->irq);
> > + return;
> > + }
> > +
> > + for (i = 0; i < vp_dev->msix_vectors; ++i)
> > + synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +}
> > +
> >>> ...given that this seems to synchronize threaded interrupt handlers?
> >> No, any handlers at all. The point is to make sure any memory changes
> >> made prior to this op are visible to callbacks.
> >>
> >> Jason, maybe add that to the documentation?
> >
> >
> > Sure.
> >
> >
> >>
> >>> Halil, do you think ccw needs to do anything? (AFAICS, we only have one
> >>> 'irq' for channel devices anyway, and the handler just calls the
> >>> relevant callbacks directly.)
> >> Then you need to synchronize with that.
> >
> >
> > Have a quick glance at the codes, it looks to me we can synchronize with
> > the IO_INTERRUPT. (Assuming all callbacks are triggered via
> > ccw_device_irq()).
>
> Not quite, adapter interrupts are device-independent, but they are
> relevant for vring interrupts.
>
> That would mean that we would need to synchronize _all_ channel I/O
> interrupts, which looks like a huge hammer. But do we really need that
> at all?

We don't, we only need to synchronize with the vring callbacks, to make sure:

1) the memory changes that is done before this op is visible to the
callbacks that came after this op
2) the memory changes that is done after this op is not visible to
callbacks that came before this op

>
> The last patch in this series seems to be concerned with the "no vring
> interrupts if the device is not ready" case, so it needs to synchronize
> vring interrupts with device reset and setting the device status to
> ready. For virtio-ccw, both reset and setting the status are channel I/O
> operations, i.e. starting a program and waiting for the final device
> interrupt for it, so synchronization (on a device level) is already
> happening in a way. So I'm not sure if any extra synchronization is
> actually needed in this case, but maybe I'm misunderstanding.
>
> Do you have further use cases in mind?

Its goal is to prevent the buggy or malicus device/host from attacking
the driver/guest. One use case is the confidential computing where
guest memory is encrypted and the guest doesn't trust the hypervisor.

In that case, the device can try to raise the interrupt after
request_irq but before DRIVER_OK, which tries to trigger the vq
callbacks when the device is not fully initialized:

request_irq()
virtio_specific_setup() // vq callbacks to be triggered in the middle
virtio_device_ready()

Thanks

>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-07 Thread Cornelia Huck
On Thu, Apr 07 2022, Jason Wang  wrote:

> 在 2022/4/6 下午11:31, Michael S. Tsirkin 写道:
>> On Wed, Apr 06, 2022 at 03:04:32PM +0200, Cornelia Huck wrote:
>>> On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
>>>
 On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> This patch implements PCI version of synchronize_vqs().
>
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
 Please add implementations at least for ccw and mmio.
>>> I'm not sure what (if anything) can/should be done for ccw...
>>>
> ---
>   drivers/virtio/virtio_pci_common.c | 14 ++
>   drivers/virtio/virtio_pci_common.h |  2 ++
>   drivers/virtio/virtio_pci_legacy.c |  1 +
>   drivers/virtio/virtio_pci_modern.c |  2 ++
>   4 files changed, 19 insertions(+)
>
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index d724f676608b..b78c8bc93a97 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>   }
>   
> +void vp_synchronize_vqs(struct virtio_device *vdev)
> +{
> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + int i;
> +
> + if (vp_dev->intx_enabled) {
> + synchronize_irq(vp_dev->pci_dev->irq);
> + return;
> + }
> +
> + for (i = 0; i < vp_dev->msix_vectors; ++i)
> + synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
>>> ...given that this seems to synchronize threaded interrupt handlers?
>> No, any handlers at all. The point is to make sure any memory changes
>> made prior to this op are visible to callbacks.
>>
>> Jason, maybe add that to the documentation?
>
>
> Sure.
>
>
>>
>>> Halil, do you think ccw needs to do anything? (AFAICS, we only have one
>>> 'irq' for channel devices anyway, and the handler just calls the
>>> relevant callbacks directly.)
>> Then you need to synchronize with that.
>
>
> Have a quick glance at the codes, it looks to me we can synchronize with 
> the IO_INTERRUPT. (Assuming all callbacks are triggered via 
> ccw_device_irq()).

Not quite, adapter interrupts are device-independent, but they are
relevant for vring interrupts.

That would mean that we would need to synchronize _all_ channel I/O
interrupts, which looks like a huge hammer. But do we really need that
at all?

The last patch in this series seems to be concerned with the "no vring
interrupts if the device is not ready" case, so it needs to synchronize
vring interrupts with device reset and setting the device status to
ready. For virtio-ccw, both reset and setting the status are channel I/O
operations, i.e. starting a program and waiting for the final device
interrupt for it, so synchronization (on a device level) is already
happening in a way. So I'm not sure if any extra synchronization is
actually needed in this case, but maybe I'm misunderstanding.

Do you have further use cases in mind?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 7, 2022 1:17 AM
> 
> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > >
> > > > Which is completely broken for any non-OF, non-ACPI plaform.
> > >
> > > I saw that, but I spent some time searching and could not find an
> > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > platform drivers are created by board files. Things like Intel/AMD
> > > discover only from ACPI, etc.

Intel discovers IOMMUs (and optionally ACPI namespace devices) from
ACPI, but there is no ACPI description for PCI devices i.e. the current
logic of device_get_dma_attr() cannot be used on PCI devices. 

> >
> > s390?
> 
> Ah, I missed looking in s390, hyperv and virtio..
> 
> hyperv is not creating iommu_domains, just IRQ remapping
> 
> virtio is using OF
> 
> And s390 indeed doesn't obviously have OF or ACPI parts..
> 
> This seems like it would be consistent with other things:
> 
> enum dev_dma_attr device_get_dma_attr(struct device *dev)
> {
>   const struct fwnode_handle *fwnode = dev_fwnode(dev);
>   struct acpi_device *adev = to_acpi_device_node(fwnode);
> 
>   if (is_of_node(fwnode)) {
>   if (of_dma_is_coherent(to_of_node(fwnode)))
>   return DEV_DMA_COHERENT;
>   return DEV_DMA_NON_COHERENT;
>   } else if (adev) {
>   return acpi_get_dma_attr(adev);
>   }
> 
>   /* Platform is always DMA coherent */
>   if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
>   !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
>   !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
>   device_iommu_mapped(dev))
>   return DEV_DMA_COHERENT;
>   return DEV_DMA_NOT_SUPPORTED;
> }
> EXPORT_SYMBOL_GPL(device_get_dma_attr);
> 
> ie s390 has no of or acpi but the entire platform is known DMA
> coherent at config time so allow it. Not sure we need the
> device_iommu_mapped() or not.

Probably not. If adding an iommu may change the coherency it would
come from specific IOPTE attributes for a device. The fact whether the
device is mapped by iommu doesn't tell that information.

> 
> We could alternatively use existing device_get_dma_attr() as a default
> with an iommu wrapper and push the exception down through the iommu
> driver and s390 can override it.
> 

if going this way probably device_get_dma_attr() should be renamed to
device_fwnode_get_dma_attr() instead to make it clearer?

Thanks
Kevin
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 5/5] virtio: harden vring IRQ

2022-04-07 Thread Jason Wang


在 2022/4/6 下午8:04, Michael S. Tsirkin 写道:

On Wed, Apr 06, 2022 at 04:35:38PM +0800, Jason Wang wrote:

This is a rework on the previous IRQ hardening that is done for
virtio-pci where several drawbacks were found and were reverted:

1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
that is used by some device such as virtio-blk
2) done only for PCI transport

In this patch, we tries to borrow the idea from the INTX IRQ hardening
in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
by introducing a global device_ready variable for each
virtio_device. Then we can to toggle it during
virtio_reset_device()/virtio_device_ready(). A
virtio_synchornize_vqs() is used in both virtio_device_ready() and
virtio_reset_device() to synchronize with the vring callbacks. With
this, vring_interrupt() can return check and early if driver_ready is
false.

Note that the hardening is only done for vring interrupt since the
config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). But the method that is
used by config interrupt can't be reused by the vring interrupt
handler because it uses spinlock to do the synchronization which is
expensive.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
  drivers/virtio/virtio.c   | 11 +++
  drivers/virtio/virtio_ring.c  |  9 -
  include/linux/virtio.h|  2 ++
  include/linux/virtio_config.h |  8 
  4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 8dde44ea044a..2f3a6f8e3d9c 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -220,6 +220,17 @@ static int virtio_features_ok(struct virtio_device *dev)
   * */
  void virtio_reset_device(struct virtio_device *dev)
  {
+   if (READ_ONCE(dev->driver_ready)) {
+   /*
+* The below virtio_synchronize_vqs() guarantees that any
+* interrupt for this line arriving after
+* virtio_synchronize_vqs() has completed is guaranteed to see
+* driver_ready == false.
+*/
+   WRITE_ONCE(dev->driver_ready, false);
+   virtio_synchronize_vqs(dev);
+   }
+
dev->config->reset(dev);
  }
  EXPORT_SYMBOL_GPL(virtio_reset_device);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238e..a4592e55c9f8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2127,10 +2127,17 @@ static inline bool more_used(const struct 
vring_virtqueue *vq)
return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
  }
  
-irqreturn_t vring_interrupt(int irq, void *_vq)

+irqreturn_t vring_interrupt(int irq, void *v)
  {
+   struct virtqueue *_vq = v;
+   struct virtio_device *vdev = _vq->vdev;
struct vring_virtqueue *vq = to_vvq(_vq);
  
+	if (!READ_ONCE(vdev->driver_ready)) {


I am not sure why we need READ_ONCE here, it's done under lock.



I may miss something but which lock did you mean here? (Note the irq 
handler doesn't hold the irq descriptor lock).


Thanks





Accrdingly, same thing above for READ_ONCE and WRITE_ONCE.



+   dev_warn_once(>dev, "virtio vring IRQ raised before 
DRIVER_OK");
+   return IRQ_NONE;
+   }
+
if (!more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5464f398912a..dfa2638a293e 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -95,6 +95,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
   * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
   * @config_enabled: configuration change reporting enabled
   * @config_change_pending: configuration change reported while disabled
+ * @driver_ready: whehter the driver is ready (e.g for vring callbacks)
   * @config_lock: protects configuration change reporting
   * @dev: underlying device.
   * @id: the device type identification (used to match it with a driver).
@@ -109,6 +110,7 @@ struct virtio_device {
bool failed;
bool config_enabled;
bool config_change_pending;
+   bool driver_ready;
spinlock_t config_lock;
spinlock_t vqs_list_lock; /* Protects VQs list access */
struct device dev;
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 08b73d9bbff2..c9e207bf2c9c 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -246,6 +246,14 @@ void virtio_device_ready(struct virtio_device *dev)
  {
unsigned status = dev->config->get_status(dev);
  
+	virtio_synchronize_vqs(dev);

+/*
+ * The above virtio_synchronize_vqs() make sure


makes sure


+ 

Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-07 Thread Jason Wang


在 2022/4/6 下午11:31, Michael S. Tsirkin 写道:

On Wed, Apr 06, 2022 at 03:04:32PM +0200, Cornelia Huck wrote:

On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:


On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:

This patch implements PCI version of synchronize_vqs().

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 

Please add implementations at least for ccw and mmio.

I'm not sure what (if anything) can/should be done for ccw...


---
  drivers/virtio/virtio_pci_common.c | 14 ++
  drivers/virtio/virtio_pci_common.h |  2 ++
  drivers/virtio/virtio_pci_legacy.c |  1 +
  drivers/virtio/virtio_pci_modern.c |  2 ++
  4 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index d724f676608b..b78c8bc93a97 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
  }
  
+void vp_synchronize_vqs(struct virtio_device *vdev)

+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   if (vp_dev->intx_enabled) {
+   synchronize_irq(vp_dev->pci_dev->irq);
+   return;
+   }
+
+   for (i = 0; i < vp_dev->msix_vectors; ++i)
+   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
+}
+

...given that this seems to synchronize threaded interrupt handlers?

No, any handlers at all. The point is to make sure any memory changes
made prior to this op are visible to callbacks.

Jason, maybe add that to the documentation?



Sure.





Halil, do you think ccw needs to do anything? (AFAICS, we only have one
'irq' for channel devices anyway, and the handler just calls the
relevant callbacks directly.)

Then you need to synchronize with that.



Have a quick glance at the codes, it looks to me we can synchronize with 
the IO_INTERRUPT. (Assuming all callbacks are triggered via 
ccw_device_irq()).


Thanks





  /* the notify function used when creating a virt queue */
  bool vp_notify(struct virtqueue *vq)
  {


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 3/5] virtio: introduce config op to synchronize vring callbacks

2022-04-07 Thread Jason Wang


在 2022/4/6 下午7:59, Michael S. Tsirkin 写道:

On Wed, Apr 06, 2022 at 04:35:36PM +0800, Jason Wang wrote:

This patch introduce

introduces


a new

new


virtio config ops to vring
callbacks. Transport specific method is required to call
synchornize_irq() on the IRQs. For the transport that doesn't provide
synchronize_vqs(), use synchornize_rcu() as a fallback.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
  include/linux/virtio_config.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b341dd62aa4d..08b73d9bbff2 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -57,6 +57,8 @@ struct virtio_shm_region {
   *include a NULL entry for vqs unused by driver
   *Returns 0 on success or error status
   * @del_vqs: free virtqueues found by find_vqs().
+ * @synchronize_vqs: synchronize with the virtqueue callbacks.
+ * vdev: the virtio_device

I think I prefer synchronize_callbacks



Ok, I will rename it.





   * @get_features: get the array of feature bits for this device.
   *vdev: the virtio_device
   *Returns the first 64 feature bits (all we currently need).
@@ -89,6 +91,7 @@ struct virtio_config_ops {
const char * const names[], const bool *ctx,
struct irq_affinity *desc);
void (*del_vqs)(struct virtio_device *);
+   void (*synchronize_vqs)(struct virtio_device *);
u64 (*get_features)(struct virtio_device *vdev);
int (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
@@ -217,6 +220,19 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, 
unsigned nvqs,
  desc);
  }
  
+/**

+ * virtio_synchronize_vqs - synchronize with virtqueue callbacks
+ * @vdev: the device
+ */
+static inline
+void virtio_synchronize_vqs(struct virtio_device *dev)
+{
+   if (dev->config->synchronize_vqs)
+   dev->config->synchronize_vqs(dev);
+   else
+   synchronize_rcu();

I am not sure about this fallback and the latency impact.



Unless each transport can implement their own synchronization routine, 
we need something that can do best effort as fallback here.




Maybe synchronize_rcu_expedited is better here.



Not sure, it might lead IPIs and according to the 
Documentation/RCU/checklist.rst:



"""

    The expedited forms of these primitives have the same semantics
    as the non-expedited forms, but expediting is both expensive and
    (with the exception of synchronize_srcu_expedited()) unfriendly
    to real-time workloads.  Use of the expedited primitives should
    be restricted to rare configuration-change operations that would
    not normally be undertaken while a real-time workload is running.
    However, real-time workloads can use rcupdate.rcu_normal kernel
    boot parameter to completely disable expedited grace periods,
    though this might have performance implications.

"""

It will be expensive for real time workloads.

Thanks





+}
+
  /**
   * virtio_device_ready - enable vq use in probe function
   * @vdev: the device
--
2.25.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 1/5] virtio: use virtio_device_ready() in virtio_device_restore()

2022-04-07 Thread Jason Wang


在 2022/4/6 下午7:44, Michael S. Tsirkin 写道:

patch had wrong mime type. I managed to extra it but pls fix.



From: Stefano Garzarella 

It will allows us

will allow us


to do extension on virtio_device_ready() without
duplicating codes.

code


Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Stefano Garzarella 
Signed-off-by: Jason Wang 
---
  drivers/virtio/virtio.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 22f15f444f75..75c8d560bbd3 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
goto err;
}
  
-	/* Finally, tell the device we're all set */

-   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+   /* If restore didn't do it, mark device DRIVER_OK ourselves. */
+   if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
+   virtio_device_ready(dev);
  
  	virtio_config_enable(dev);

it's unfortunate that this adds an extra vmexit since virtio_device_ready
calls get_status too.

We now have:

static inline
void virtio_device_ready(struct virtio_device *dev)
{
 unsigned status = dev->config->get_status(dev);
 
 BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);

 dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}


I propose adding a helper and putting common code there.



Ok, let me fix it.

Thanks




  
--

2.25.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 0/5] rework on the IRQ hardening of virtio

2022-04-07 Thread Jason Wang


在 2022/4/6 下午7:36, Michael S. Tsirkin 写道:

On Wed, Apr 06, 2022 at 04:35:33PM +0800, Jason Wang wrote:

Hi All:

This is a rework on the IRQ hardening for virtio which is done
previously by the following commits are reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.

In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by introducing a boolean for
virtqueue callback enabling and toggle it in virtio_device_ready()
and virtio_reset_device(). Then vring_interrupt() can simply check and
return early if the driver is not ready.


All of a sudden all patches are having a wrong mime type.

It is application/octet-stream; should be text/plain

Pls fix and repost, thanks!



So the patches are generated via git-format-patch and git-send-email in 
one run.


I can see many upstream patches were converted to 
application/octet-stream if From: tag is different from the sender.


Maxime told me they've also noticed the issue and it looks like a issue 
of mimecast.


Thanks





Please review.

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditonally
- Tweak the barrier/ordering facilities used in the code
- Reanme irq_soft_enabled to driver_ready
- Avoid unnecssary IRQ synchornization (e.g during boot)

Jason Wang (4):
   virtio: use virtio_reset_device() when possible
   virtio: introduce config op to synchronize vring callbacks
   virtio-pci: implement synchronize_vqs()
   virtio: harden vring IRQ

Stefano Garzarella (1):
   virtio: use virtio_device_ready() in virtio_device_restore()

  drivers/virtio/virtio.c| 20 
  drivers/virtio/virtio_pci_common.c | 14 ++
  drivers/virtio/virtio_pci_common.h |  2 ++
  drivers/virtio/virtio_pci_legacy.c |  1 +
  drivers/virtio/virtio_pci_modern.c |  2 ++
  drivers/virtio/virtio_ring.c   |  9 -
  include/linux/virtio.h |  2 ++
  include/linux/virtio_config.h  | 24 
  8 files changed, 69 insertions(+), 5 deletions(-)

--
2.25.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization