[PATCH 1/1] iommu/vt-d: Remove unused iovad from dmar_domain

2022-05-26 Thread Lu Baolu
Not used anywhere. Cleanup it to avoid dead code.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 0f9df5a19ef7..a22adfbdf870 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -543,7 +543,6 @@ struct dmar_domain {
u8 set_pte_snp:1;
 
struct list_head devices;   /* all devices' list */
-   struct iova_domain iovad;   /* iova's that belong to this domain */
 
struct dma_pte  *pgd;   /* virtual address */
int gaw;/* max guest address width */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/9] Add dynamic iommu backed bounce buffers

2022-05-26 Thread David Stevens
On Tue, May 24, 2022 at 9:27 PM Niklas Schnelle  wrote:
>
> On Fri, 2021-08-06 at 19:34 +0900, David Stevens wrote:
> > From: David Stevens 
> >
> > This patch series adds support for per-domain dynamic pools of iommu
> > bounce buffers to the dma-iommu API. This allows iommu mappings to be
> > reused while still maintaining strict iommu protection.
> >
> > This bounce buffer support is used to add a new config option that, when
> > enabled, causes all non-direct streaming mappings below a configurable
> > size to go through the bounce buffers. This serves as an optimization on
> > systems where manipulating iommu mappings is very expensive. For
> > example, virtio-iommu operations in a guest on a linux host require a
> > vmexit, involvement the VMM, and a VFIO syscall. For relatively small
> > DMA operations, memcpy can be significantly faster.
> >
> > As a performance comparison, on a device with an i5-10210U, I ran fio
> > with a VFIO passthrough NVMe drive and virtio-iommu with '--direct=1
> > --rw=read --ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k,
> > and 128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
> > spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
> > 94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
> > by >99%, as bounce buffers don't require syncing here in the read case.
> > Running with multiple jobs doesn't serve as a useful performance
> > comparison because virtio-iommu and vfio_iommu_type1 both have big
> > locks that significantly limit mulithreaded DMA performance.
> >
> > These pooled bounce buffers are also used for subgranule mappings with
> > untrusted devices, replacing the single use bounce buffers used
> > currently. The biggest difference here is that the new implementation
> > maps a whole sglist using a single bounce buffer. The new implementation
> > does not support using bounce buffers for only some segments of the
> > sglist, so it may require more copying. However, the current
> > implementation requires per-segment iommu map/unmap operations for all
> > untrusted sglist mappings (fully aligned sglists included). On a
> > i5-10210U laptop with the internal NVMe drive made to appear untrusted,
> > fio --direct=1 --rw=read --ioengine=libaio --iodepth=64 --bs=64k showed
> > a statistically significant decrease in CPU load from 2.28% -> 2.17%
> > with the new iommu bounce buffer optimization enabled.
> >
> > Each domain's buffer pool is split into multiple power-of-2 size
> > classes. Each class allocates a fixed number of buffer slot metadata. A
> > large iova range is allocated, and each slot is assigned an iova from
> > the range. This allows the iova to be easily mapped back to the slot,
> > and allows the critical section of most pool operations to be constant
> > time. The one exception is finding a cached buffer to reuse. These are
> > only separated according to R/W permissions - the use of other
> > permissions such as IOMMU_PRIV may require a linear search through the
> > cache. However, these other permissions are rare and likely exhibit high
> > locality, so the should not be a bottleneck in practice.
> >
> > Since untrusted devices may require bounce buffers, each domain has a
> > fallback rbtree to manage single use buffers. This may be necessary if a
> > very large number of DMA operations are simultaneously in-flight, or for
> > very large individual DMA operations.
> >
> > This patch set does not use swiotlb. There are two primary ways in which
> > swiotlb isn't compatible with per-domain buffer pools. First, swiotlb
> > allocates buffers to be compatible with a single device, whereas
> > per-domain buffer pools don't handle that during buffer allocation as a
> > single buffer may end up being used by multiple devices. Second, swiotlb
> > allocation establishes the original to bounce buffer mapping, which
> > again doesn't work if buffers can be reused. Effectively the only code
> > that can be shared between the two use cases is allocating slots from
> > the swiotlb's memory. However, given that we're going to be allocating
> > memory for use with an iommu, allocating memory from a block of memory
> > explicitly set aside to deal with a lack of iommu seems kind of
> > contradictory. At best there might be a small performance improvement if
> > wiotlb allocation is faster than regular page allocation, but buffer
> > allocation isn't on the hot path anyway.
> >
> > Not using the swiotlb has the benefit that memory doesn't have to be
> > preallocated. Instead, bounce buffers consume memory only for in-flight
> > dma transactions (ignoring temporarily cached buffers), which is the
> > smallest amount possible. This makes it easier to use bounce buffers as
> > an optimization on systems with large numbers of devices or in
> > situations where devices are unknown, since it is not necessary to try
> > to tune how much memory needs to be set aside to achieve good
> > performance without 

Re: [PATCH v2 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-26 Thread Damien Le Moal via iommu
On 2022/05/26 19:28, John Garry wrote:
> Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
> allows the drivers to know the optimal mapping limit and thus limit the
> requested IOVA lengths.
> 
> This value is based on the IOVA rcache range limit, as IOVAs allocated
> above this limit must always be newly allocated, which may be quite slow.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/dma-iommu.c | 6 ++
>  drivers/iommu/iova.c  | 5 +
>  include/linux/iova.h  | 2 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..f619e41b9172 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1442,6 +1442,11 @@ static unsigned long 
> iommu_dma_get_merge_boundary(struct device *dev)
>   return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
>  }
>  
> +static size_t iommu_dma_opt_mapping_size(void)
> +{
> + return iova_rcache_range();
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>   .alloc  = iommu_dma_alloc,
>   .free   = iommu_dma_free,
> @@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = {
>   .map_resource   = iommu_dma_map_resource,
>   .unmap_resource = iommu_dma_unmap_resource,
>   .get_merge_boundary = iommu_dma_get_merge_boundary,
> + .opt_mapping_size   = iommu_dma_opt_mapping_size,
>  };
>  
>  /*
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..9f00b58d546e 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain 
> *iovad,
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
> *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> +unsigned long iova_rcache_range(void)
> +{
> + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
> +}
> +
>  static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
>  {
>   struct iova_domain *iovad;
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..c6ba6d95d79c 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
> *iovad, dma_addr_t iova)
>  int iova_cache_get(void);
>  void iova_cache_put(void);
>  
> +unsigned long iova_rcache_range(void);
> +
>  void free_iova(struct iova_domain *iovad, unsigned long pfn);
>  void __free_iova(struct iova_domain *iovad, struct iova *iova);
>  struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-26 Thread Dexuan Cui via iommu
> From: Tianyu Lan 
> Sent: Thursday, May 26, 2022 5:01 AM
> ...
> @@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct
> *w)
>   nvdev->max_chn = 1;
>   nvdev->num_chn = 1;
>   }
> +
> + /* Allocate boucne buffer.*/
> + swiotlb_device_allocate(>device, nvdev->num_chn,
> + 10 * IO_TLB_BLOCK_UNIT);
>   }

Looks like swiotlb_device_allocate() is not called if the netvsc device
has only 1 primary channel and no sub-schannel, e.g. in the case of
single-vCPU VM?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH V3 1/2] swiotlb: Add Child IO TLB mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among device's queues. Each device may allocate IO tlb mem and setup
child IO TLB mem according to queue number. Swiotlb code allocates
bounce buffer among child IO tlb mem iterately.

Introduce IO TLB Block unit(2MB) concepts to allocate big bounce buffer
from default pool for devices. IO TLB segment(256k) is too small for
device bounce buffer.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 304 ++--
 2 files changed, 329 insertions(+), 13 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..a48a9d64e3c3 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows device allocates bounce buffer from default io
+ * tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,6 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @num_child:  The child io tlb mem number in the pool.
+ * @child_nslot:The number of IO TLB slot in the child IO TLB mem.
+ * @child_nblock:The number of IO TLB block in the child IO TLB mem.
+ * @child_start:The child index to start searching in the next round.
+ * @block_start:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +115,16 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_nblock;
+   unsigned int child_start;
+   unsigned int block_index;
+   struct io_tlb_mem *child;
+   struct io_tlb_mem *parent;
+   struct io_tlb_block {
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +153,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +189,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+void swiotlb_device_free(struct device *dev)
+{
+}
+
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..7ca22a5a1886 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
mem->nslabs = nslabs;
mem->start = start;
@@ -207,7 +208,36 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->force_bounce = true;
 
spin_lock_init(>lock);
-   for (i = 0; i < mem->nslabs; i++) {
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_nblock = block_num / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize child IO TLB mem, divide IO TLB pool
+* into child number. Reuse parent mem->slot in the
+* child mem->slot.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+   

[RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Netvsc driver allocates device io tlb mem via calling swiotlb_device_
allocate() and set child io tlb mem number according to device queue
number. Child io tlb mem may reduce overhead of single spin lock in
device io tlb mem among multi device queues.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/netvsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 9442f751ad3a..26a8f8f84fc4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
@@ -98,6 +99,7 @@ static void netvsc_subchan_work(struct work_struct *w)
struct netvsc_device *nvdev =
container_of(w, struct netvsc_device, subchan_work);
struct rndis_device *rdev;
+   struct hv_device *hdev;
int i, ret;
 
/* Avoid deadlock with device removal already under RTNL */
@@ -108,6 +110,9 @@ static void netvsc_subchan_work(struct work_struct *w)
 
rdev = nvdev->extension;
if (rdev) {
+   hdev = ((struct net_device_context *)
+   netdev_priv(rdev->ndev))->device_ctx;
+
ret = rndis_set_subchannel(rdev->ndev, nvdev, NULL);
if (ret == 0) {
netif_device_attach(rdev->ndev);
@@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
nvdev->max_chn = 1;
nvdev->num_chn = 1;
}
+
+   /* Allocate boucne buffer.*/
+   swiotlb_device_allocate(>device, nvdev->num_chn,
+   10 * IO_TLB_BLOCK_UNIT);
}
 
rtnl_unlock();
@@ -769,6 +778,7 @@ void netvsc_device_remove(struct hv_device *device)
 
/* Release all resources */
free_netvsc_device_rcu(net_device);
+   swiotlb_device_free(>device);
 }
 
 #define RING_AVAIL_PERCENT_HIWATER 20
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH V3 0/2] swiotlb: Add child io tlb mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among device's queues. Each device may allocate IO tlb mem and setup
child IO TLB mem according to queue number. The number child IO tlb
mem maybe set up equal with device queue number and this helps to resolve
swiotlb spinlock overhead among devices and queues.

introduces IO TLB Block concepts and swiotlb_device_allocate()
API to allocate per-device swiotlb bounce buffer. The new API Accepts
queue number as the number of child IO TLB mem to set up device's IO
TLB mem.

Patch 2 calls new allocation function in the netvsc driver to resolve
global spin lock issue.

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

 drivers/net/hyperv/netvsc.c |  10 ++
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 299 ++--
 3 files changed, 334 insertions(+), 13 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/4] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-05-26 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according to shost limits, which includes host DMA mapping limits.

Cap the ata_device max_sectors according to shost->max_sectors to respect
this shost limit.

Signed-off-by: John Garry 
Acked-by: Damien Le Moal 
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 06c9d90238d9..25fe89791641 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1036,6 +1036,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct 
ata_device *dev)
dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
/* configure max sectors */
+   dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
blk_queue_max_hw_sectors(q, dev->max_sectors);
 
if (dev->class == ATA_DEV_ATAPI) {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-26 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and free an IOVA for each mapping, which may
affect performance.

For performance reasons set the request_queue max_sectors from
dma_opt_mapping_size(), which knows this mapping limit.

In addition, the shost->max_sectors is repeatedly set for each sdev in
__scsi_init_queue(). This is unnecessary, so set once when adding the
host.

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
Reviewed-by: Martin K. Petersen 
---
 drivers/scsi/hosts.c| 5 +
 drivers/scsi/scsi_lib.c | 4 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..9563c0ac567a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -240,6 +240,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
 
shost->dma_dev = dma_dev;
 
+   if (dma_dev->dma_mask) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }
+
/*
 * Increase usage count temporarily here so that calling
 * scsi_autopm_put_host() will trigger runtime idle if there is
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8d18cc7e510e..2d43bb8799bd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1884,10 +1884,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct 
request_queue *q)
blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
}
 
-   if (dev->dma_mask) {
-   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
-   dma_max_mapping_size(dev) >> SECTOR_SHIFT);
-   }
blk_queue_max_hw_sectors(q, shost->max_sectors);
blk_queue_segment_boundary(q, shost->dma_boundary);
dma_set_seg_boundary(dev, shost->dma_boundary);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 6 ++
 drivers/iommu/iova.c  | 5 +
 include/linux/iova.h  | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..f619e41b9172 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1442,6 +1442,11 @@ static unsigned long iommu_dma_get_merge_boundary(struct 
device *dev)
return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
 }
 
+static size_t iommu_dma_opt_mapping_size(void)
+{
+   return iova_rcache_range();
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc  = iommu_dma_alloc,
.free   = iommu_dma_free,
@@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
+   .opt_mapping_size   = iommu_dma_opt_mapping_size,
 };
 
 /*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..9f00b58d546e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
+unsigned long iova_rcache_range(void)
+{
+   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+
 static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
 {
struct iova_domain *iovad;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..c6ba6d95d79c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
*iovad, dma_addr_t iova)
 int iova_cache_get(void);
 void iova_cache_put(void);
 
+unsigned long iova_rcache_range(void);
+
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
 void __free_iova(struct iova_domain *iovad, struct iova *iova);
 struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/4] DMA mapping changes for SCSI core

2022-05-26 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.

This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.

Robin didn't like using dma_max_mapping_size() for this [1].

The SCSI core code is modified to use this limit.

I also added a patch for libata-scsi as it does not currently honour the
shost max_sectors limit.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/f5b78c9c-312e-70ab-ecbb-f14623a4b...@arm.com/

Changes since v1:
- Relocate scsi_add_host_with_dma() dma_dev check (Reported by Dan)
- Add tags from Damien and Martin (thanks)
  - note: I only added Martin's tag to the SCSI patch

John Garry (4):
  dma-mapping: Add dma_opt_mapping_size()
  dma-iommu: Add iommu_dma_opt_mapping_size()
  scsi: core: Cap shost max_sectors according to DMA optimum mapping
limits
  libata-scsi: Cap ata_device->max_sectors according to
shost->max_sectors

 Documentation/core-api/dma-api.rst |  9 +
 drivers/ata/libata-scsi.c  |  1 +
 drivers/iommu/dma-iommu.c  |  6 ++
 drivers/iommu/iova.c   |  5 +
 drivers/scsi/hosts.c   |  5 +
 drivers/scsi/scsi_lib.c|  4 
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 include/linux/iova.h   |  2 ++
 kernel/dma/mapping.c   | 12 
 10 files changed, 46 insertions(+), 4 deletions(-)

-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/4] dma-mapping: Add dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mapping which don't exceed it.

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
---
 Documentation/core-api/dma-api.rst |  9 +
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 kernel/dma/mapping.c   | 12 
 4 files changed, 27 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 6d6d0edd2d27..b3cd9763d28b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -204,6 +204,15 @@ Returns the maximum size of a mapping for the device. The 
size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
 others should not be larger than the returned value.
 
+::
+
+   size_t
+   dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device. Mapping large
+buffers may take longer so device drivers are advised to limit total DMA
+streaming mappings length to the returned value.
+
 ::
 
bool
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..98ceba6fa848 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -69,6 +69,7 @@ struct dma_map_ops {
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
+   size_t (*opt_mapping_size)(void);
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..fe3849434b2a 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,7 @@ int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
+size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
 unsigned long dma_get_merge_boundary(struct device *dev);
 struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
@@ -266,6 +267,10 @@ static inline size_t dma_max_mapping_size(struct device 
*dev)
 {
return 0;
 }
+static inline size_t dma_opt_mapping_size(struct device *dev)
+{
+   return 0;
+}
 static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b74..1bfe11b1edb6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -773,6 +773,18 @@ size_t dma_max_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_max_mapping_size);
 
+size_t dma_opt_mapping_size(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   size_t size = SIZE_MAX;
+
+   if (ops && ops->opt_mapping_size)
+   size = ops->opt_mapping_size();
+
+   return min(dma_max_mapping_size(dev), size);
+}
+EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
+
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/7] iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> On some SoCs (of which only MT8195 is supported at the time of
> writing),
> the "R" and "W" (I/O) enable bits for the IOMMUs are in the
> pericfg_ao
> register space and not in the IOMMU space: as it happened already
> with
> infracfg, it is expected that this list will grow.
> 
> Instead of specifying pericfg compatibles on a per-SoC basis,
> following
> what was done with infracfg, let's lookup the syscon by phandle
> instead.
> Also following the previous infracfg change, add a warning for
> outdated
> devicetrees, in hope that the user will take action.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index d16b95e71ded..090cf6e15f85 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -138,6 +138,8 @@
>  /* PM and clock always on. e.g. infra iommu */
>  #define PM_CLK_AOBIT(15)
>  #define IFA_IOMMU_PCIE_SUPPORT   BIT(16)
> +/* IOMMU I/O (r/w) is enabled using PERICFG_IOMMU_1 register */
> +#define HAS_PERI_IOMMU1_REG  BIT(17)
>  
>  #define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask) \
>   pdata)->flags) & (mask)) == (_x))
> @@ -187,7 +189,6 @@ struct mtk_iommu_plat_data {
>   u32 flags;
>   u32 inv_sel_reg;
>  
> - char*pericfg_comp_str;
>   struct list_head*hw_list;
>   unsigned intiova_region_nr;
>   const struct mtk_iommu_iova_region  *iova_region;
> @@ -1214,14 +1215,19 @@ static int mtk_iommu_probe(struct
> platform_device *pdev)
>   goto out_runtime_disable;
>   }
>   } else if (MTK_IOMMU_IS_TYPE(data->plat_data,
> MTK_IOMMU_TYPE_INFRA) &&
> -data->plat_data->pericfg_comp_str) {
> - infracfg = syscon_regmap_lookup_by_compatible(data-
> >plat_data->pericfg_comp_str);
> - if (IS_ERR(infracfg)) {
> - ret = PTR_ERR(infracfg);
> - goto out_runtime_disable;
> - }
> +MTK_IOMMU_HAS_FLAG(data->plat_data,
> HAS_PERI_IOMMU1_REG)) {
> + data->pericfg = syscon_regmap_lookup_by_phandle(dev-
> >of_node, "mediatek,pericfg");

I'm not keen to add this property. Currently only mt8195 use this
setting. In the lastest SoC, we move this setting to ATF. thus I think
we could keep the current way, no need add a new DT property only for
mt8195.

> + if (IS_ERR(data->pericfg)) {
> + dev_info(dev, "Cannot find phandle to
> mediatek,pericfg:"
> +   " Please update your
> devicetree.\n");
>  
> - data->pericfg = infracfg;
> + p = "mediatek,mt8195-pericfg_ao";
> + data->pericfg =
> syscon_regmap_lookup_by_compatible(p);
> + if (IS_ERR(data->pericfg)) {
> + ret = PTR_ERR(data->pericfg);
> + goto out_runtime_disable;
> + }
> + }
>   }
>  
>   platform_set_drvdata(pdev, data);
> @@ -1480,8 +1486,8 @@ static const struct mtk_iommu_plat_data
> mt8192_data = {
>  static const struct mtk_iommu_plat_data mt8195_data_infra = {
>   .m4u_plat = M4U_MT8195,
>   .flags= WR_THROT_EN | DCM_DISABLE | STD_AXI_MODE |
> PM_CLK_AO |
> - MTK_IOMMU_TYPE_INFRA |
> IFA_IOMMU_PCIE_SUPPORT,
> - .pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
> + HAS_PERI_IOMMU1_REG | MTK_IOMMU_TYPE_INFRA
> |
> + IFA_IOMMU_PCIE_SUPPORT,
>   .inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
>   .banks_num= 5,
>   .banks_enable = {true, false, false, false, true},

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/7] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> This driver will get support for more SoCs and the list of infracfg
> compatibles is expected to grow: in order to prevent getting this
> situation out of control and see a long list of compatible strings,
> add support to retrieve a handle to infracfg's regmap through a
> new "mediatek,infracfg" phandle.
> 
> In order to keep retrocompatibility with older devicetrees, the old
> way is kept in place, but also a dev_warn() was added to advertise
> this change in hope that the user will see it and eventually update
> the devicetree if this is possible.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 40 +--
> 
>  1 file changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 71b2ace74cd6..d16b95e71ded 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct
> platform_device *pdev)
>   data->protect_base = ALIGN(virt_to_phys(protect),
> MTK_PROTECT_PA_ALIGN);
>  
>   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) {
> - switch (data->plat_data->m4u_plat) {
> - case M4U_MT2712:
> - p = "mediatek,mt2712-infracfg";
> - break;
> - case M4U_MT8173:
> - p = "mediatek,mt8173-infracfg";
> - break;
> - default:
> - p = NULL;
> + infracfg = syscon_regmap_lookup_by_phandle(dev-
> >of_node, "mediatek,infracfg");
> + if (IS_ERR(infracfg)) {
> + dev_info(dev, "Cannot find phandle to
> mediatek,infracfg:"
> +   " Please update your
> devicetree.\n");

Remove the log from Robin?

> + /*
> +  * Legacy devicetrees will not specify a
> phandle to
> +  * mediatek,infracfg: in that case, we use the
> older
> +  * way to retrieve a syscon to infra.
> +  *
> +  * This is for retrocompatibility purposes
> only, hence
> +  * no more compatibles shall be added to this.
> +  */
> + switch (data->plat_data->m4u_plat) {
> + case M4U_MT2712:
> + p = "mediatek,mt2712-infracfg";
> + break;
> + case M4U_MT8173:
> + p = "mediatek,mt8173-infracfg";
> + break;
> + default:
> + p = NULL;
> + }

We already use "mediatek,infracfg" property for commonizing. For the
previous SoC, I also prefer to put the string into the platform data.

After this, "->m4u_plat" could be removed. Of course, this is not the
main purpose of this patchset. it also is ok currently.

> +
> + infracfg =
> syscon_regmap_lookup_by_compatible(p);
> + if (IS_ERR(infracfg))
> + return PTR_ERR(infracfg);
>   }
>  
> - infracfg = syscon_regmap_lookup_by_compatible(p);
> -
> - if (IS_ERR(infracfg))
> - return PTR_ERR(infracfg);
> -
>   ret = regmap_read(infracfg, REG_INFRA_MISC, );
>   if (ret)
>   return ret;

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/7] dt-bindings: iommu: mediatek: Add phandles for mediatek infra/pericfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> Add properties "mediatek,infracfg" and "mediatek,pericfg" to let the
> mtk_iommu driver retrieve phandles to the infracfg and pericfg
> syscon(s)
> instead of performing a per-soc compatible lookup.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  .../devicetree/bindings/iommu/mediatek,iommu.yaml | 8
> 
>  1 file changed, 8 insertions(+)
> 
> diff --git
> a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> index 2ae3bbad7f1a..c4af41947593 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> @@ -101,6 +101,10 @@ properties:
>  items:
>- const: bclk
>  
> +  mediatek,infracfg:
> +$ref: /schemas/types.yaml#/definitions/phandle
> +description: The phandle to the mediatek infracfg syscon
> +

Just curious, why not put this "mediatek,infracfg" and its required
segment[6/7] into one patch?

>mediatek,larbs:
>  $ref: /schemas/types.yaml#/definitions/phandle-array
>  minItems: 1
> @@ -112,6 +116,10 @@ properties:
>Refer to bindings/memory-controllers/mediatek,smi-larb.yaml.
> It must sort
>according to the local arbiter index, like larb0, larb1,
> larb2...
>  
> +  mediatek,pericfg:
> +$ref: /schemas/types.yaml#/definitions/phandle
> +description: The phandle to the mediatek pericfg syscon
> +
>'#iommu-cells':
>  const: 1
>  description: |

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:18 +0200, AngeloGioacchino Del Regno wrote:
> Add support for the M4Us found in the MT6795 Helio X10 SoC.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 090cf6e15f85..97ff30ed2d0f 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -159,6 +159,7 @@
>  enum mtk_iommu_plat {
>   M4U_MT2712,
>   M4U_MT6779,
> + M4U_MT6795,
>   M4U_MT8167,
>   M4U_MT8173,
>   M4U_MT8183,
> @@ -954,7 +955,8 @@ static int mtk_iommu_hw_init(const struct
> mtk_iommu_data *data, unsigned int ban
>* Global control settings are in bank0. May re-init these
> global registers
>* since no sure if there is bank0 consumers.
>*/
> - if (data->plat_data->m4u_plat == M4U_MT8173) {
> + if (data->plat_data->m4u_plat == M4U_MT6795 ||
> + data->plat_data->m4u_plat == M4U_MT8173) {

Add a new flag for this. This setting difference is that the offset for
TF_PROT_TO_PROGRAM_ADDR is 5 in mt8173 while the others' offset is 4.
thus, we could rename the flag like TF_PORT_TO_ADDR_MT8173 or
TF_PORT_TO_ADDR_OFFSET_IS_5.

>   regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
>F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
>   } else {
> @@ -1422,6 +1424,18 @@ static const struct mtk_iommu_plat_data
> mt6779_data = {
>   .larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
>  };
>  
> +static const struct mtk_iommu_plat_data mt6795_data = {
> + .m4u_plat = M4U_MT6795,
> + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
> + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
> + .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
> + .banks_num= 1,
> + .banks_enable = {true},
> + .iova_region  = single_domain,
> + .iova_region_nr = ARRAY_SIZE(single_domain),
> + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping.
> */
> +};
> +
>  static const struct mtk_iommu_plat_data mt8167_data = {
>   .m4u_plat = M4U_MT8167,
>   .flags= RESET_AXI | HAS_LEGACY_IVRP_PADDR |
> MTK_IOMMU_TYPE_MM,
> @@ -1533,6 +1547,7 @@ static const struct mtk_iommu_plat_data
> mt8195_data_vpp = {
>  static const struct of_device_id mtk_iommu_of_ids[] = {
>   { .compatible = "mediatek,mt2712-m4u", .data = _data},
>   { .compatible = "mediatek,mt6779-m4u", .data = _data},
> + { .compatible = "mediatek,mt6795-m4u", .data = _data},
>   { .compatible = "mediatek,mt8167-m4u", .data = _data},
>   { .compatible = "mediatek,mt8173-m4u", .data = _data},
>   { .compatible = "mediatek,mt8183-m4u", .data = _data},

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 9/9] driver core: Delete driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
The function is no longer used. So delete it.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 30 --
 include/linux/device/driver.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index af8138d44e6c..789b0871dc45 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str)
 }
 __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);
 
-/**
- * driver_deferred_probe_check_state() - Check deferred probe state
- * @dev: device to check
- *
- * Return:
- * * -ENODEV if initcalls have completed and modules are disabled.
- * * -ETIMEDOUT if the deferred probe timeout was set and has expired
- *   and modules are enabled.
- * * -EPROBE_DEFER in other cases.
- *
- * Drivers or subsystems can opt-in to calling this function instead of 
directly
- * returning -EPROBE_DEFER.
- */
-int driver_deferred_probe_check_state(struct device *dev)
-{
-   if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) {
-   dev_warn(dev, "ignoring dependency for device, assuming no 
driver\n");
-   return -ENODEV;
-   }
-
-   if (!driver_deferred_probe_timeout && initcalls_done) {
-   dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
-   return -ETIMEDOUT;
-   }
-
-   return -EPROBE_DEFER;
-}
-EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state);
-
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
struct device_private *p;
 
fw_devlink_drivers_done();
 
-   driver_deferred_probe_timeout = 0;
driver_deferred_probe_trigger();
flush_work(_probe_work);
 
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 700453017e1c..7c245d269feb 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -241,7 +241,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, 
const void *adev)
 
 extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
-int driver_deferred_probe_check_state(struct device *dev);
 void driver_init(void);
 
 /**
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 8/9] net: ipconfig: Force fw_devlink to unblock any devices that might probe

2022-05-26 Thread Saravana Kannan via iommu
If there are network devices that could probe without some of their
suppliers probing and those network devices are needed for IP auto
config to work, then fw_devlink=on might break that usecase by blocking
the network devices from probing by the time IP auto config starts.

So, when IP auto config is enabled, make sure fw_devlink doesn't block
the probing of any device that has a driver by the time we get to IP
auto config.

Signed-off-by: Saravana Kannan 
---
 net/ipv4/ipconfig.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 9d41d5d5cd1e..aa7b8ba68ca6 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1435,6 +1435,8 @@ static int __init wait_for_devices(void)
 {
int i;
 
+   fw_devlink_unblock_may_probe();
+
for (i = 0; i < DEVICE_WAIT_MAX; i++) {
struct net_device *dev;
int found = 0;
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 7/9] driver core: Add fw_devlink_unblock_may_probe() helper function

2022-05-26 Thread Saravana Kannan via iommu
This function can be used during the kernel boot sequence to forcefully
override fw_devlink=on and unblock the probing of all devices that have
a driver.

It's mainly meant to be called from late_initcall() or
late_initcall_sync() where a device needs to probe before the kernel can
mount rootfs.

Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h|  1 +
 drivers/base/core.c| 58 ++
 drivers/base/dd.c  |  2 +-
 include/linux/fwnode.h |  2 ++
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index ab71403d102f..b3a43a164dcd 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
 extern void deferred_probe_extend_timeout(void);
+extern void driver_deferred_probe_trigger(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7672f23231c1..7ff7fbb00643 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1655,6 +1655,64 @@ void fw_devlink_drivers_done(void)
device_links_write_unlock();
 }
 
+static int fw_devlink_may_probe(struct device *dev, void *data)
+{
+   struct device_link *link = to_devlink(dev);
+
+   if (!link->supplier->can_match && link->consumer->can_match)
+   fw_devlink_relax_link(link);
+
+   return 0;
+}
+
+/**
+ * fw_devlink_unblock_may_probe - Force unblock any device that has a driver
+ *
+ * This function is more of a sledge hammer than a scalpel. Use this very
+ * sparingly.
+ *
+ * Some devices might need to be probed and bound successfully before the 
kernel
+ * boot sequence can finish and move on to init/userspace. For example, a
+ * network interface might need to be bound to be able to mount a NFS rootfs.
+ *
+ * With fw_devlink=on by default, some of these devices might be blocked from
+ * probing because they are waiting on a optional supplier that doesn't have a
+ * driver. While fw_devlink will eventually identify such devices and unblock
+ * the probing automatically, it might be too late by the time it unblocks the
+ * probing of devices. For example, the IP4 autoconfig might timeout before
+ * fw_devlink unblocks probing of the network interface. This function is
+ * available to unblock the probing of such devices.
+ *
+ * Since there's no easy way to know which unprobed device needs to probe for
+ * boot to succeed, this function makes sure fw_devlink doesn't block any 
device
+ * that has a driver at the point in time this function is called.
+ *
+ * It does this by relaxing (fw_devlink=permissive behavior) all the device
+ * links created by fw_devlink where the consumer has a driver and the supplier
+ * doesn't have a driver.
+ *
+ * It's extremely unlikely that a proper use of this function will be outside 
of
+ * an initcall. So, until a case is made for that, this function is
+ * intentionally marked with __init.
+ */
+void __init fw_devlink_unblock_may_probe(void)
+{
+   struct device_link *link, *ln;
+
+   if (!fw_devlink_flags || fw_devlink_is_permissive())
+   return;
+
+   /* Wait for current probes to finish to limit impact. */
+   wait_for_device_probe();
+
+   device_links_write_lock();
+   class_for_each_device(_class, NULL, NULL,
+ fw_devlink_may_probe);
+   device_links_write_unlock();
+
+   driver_deferred_probe_trigger();
+}
+
 static void fw_devlink_unblock_consumers(struct device *dev)
 {
struct device_link *link;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f963d9010d7f..af8138d44e6c 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -172,7 +172,7 @@ static bool driver_deferred_probe_enable;
  * changes in the midst of a probe, then deferred processing should be 
triggered
  * again.
  */
-static void driver_deferred_probe_trigger(void)
+void driver_deferred_probe_trigger(void)
 {
if (!driver_deferred_probe_enable)
return;
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9a81c4410b9f..0770edda7068 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct fwnode_operations;
 struct device;
@@ -199,5 +200,6 @@ extern bool fw_devlink_is_strict(void);
 int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
 void fwnode_links_purge(struct fwnode_handle *fwnode);
 void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
+void __init fw_devlink_unblock_may_probe(void);
 
 #endif
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 6/9] iommu/of: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on and fw_devlink.strict=1 by default and fw_devlink
supports iommu DT properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..41f4eb005219 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev,
 * a proper probe-ordering dependency mechanism in future.
 */
if (!ops)
-   return driver_deferred_probe_check_state(dev);
+   return -ENODEV;
 
if (!try_module_get(ops->owner))
return -ENODEV;
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 5/9] driver core: Set fw_devlink.strict=1 by default

2022-05-26 Thread Saravana Kannan via iommu
Now that deferred_probe_timeout is non-zero by default, fw_devlink will
never permanently block the probing of devices. It'll try its best to
probe the devices in the right order and then finally let devices probe
even if their suppliers don't have any drivers.

Signed-off-by: Saravana Kannan 
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7cd789c4985d..7672f23231c1 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1581,7 +1581,7 @@ static int __init fw_devlink_setup(char *arg)
 }
 early_param("fw_devlink", fw_devlink_setup);
 
-static bool fw_devlink_strict;
+static bool fw_devlink_strict = true;
 static int __init fw_devlink_strict_setup(char *arg)
 {
return strtobool(arg, _devlink_strict);
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 4/9] Revert "driver core: Set default deferred_probe_timeout back to 0."

2022-05-26 Thread Saravana Kannan via iommu
This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 11b0fb6414d3..f963d9010d7f 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void 
*data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports interrupt
properties, the execution will never get to the point where
driver_deferred_probe_check_state() is called before the supplier has
probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/net/mdio/fwnode_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1c1584fca632..3e79c2c51929 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
 * just fall back to poll mode
 */
if (rc == -EPROBE_DEFER)
-   rc = driver_deferred_probe_check_state(>mdio.dev);
-   if (rc == -EPROBE_DEFER)
-   return rc;
+   rc = -ENODEV;
 
if (rc > 0) {
phy->irq = rc;
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"pinctrl-[0-8]" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/pinctrl/devicetree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 3fb238714718..ef898ee8ca6b 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p,
np_pctldev = of_get_next_parent(np_pctldev);
if (!np_pctldev || of_node_is_root(np_pctldev)) {
of_node_put(np_pctldev);
-   ret = driver_deferred_probe_check_state(p->dev);
+   ret = -ENODEV;
/* keep deferring if modules are enabled */
if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret 
< 0)
ret = -EPROBE_DEFER;
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, 
struct device *base_dev,
mutex_unlock(_list_lock);
dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
__func__, PTR_ERR(pd));
-   return driver_deferred_probe_check_state(base_dev);
+   return -ENODEV;
}
 
dev_dbg(dev, "adding to PM domain %s\n", pd->name);
-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v1 0/9] deferred_probe_timeout logic clean up

2022-05-26 Thread Saravana Kannan via iommu
This series is based on linux-next + these 2 small patches applies on top:
https://lore.kernel.org/lkml/20220526034609.480766-1-sarava...@google.com/

A lot of the deferred_probe_timeout logic is redundant with
fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
a few cases.

This series tries to delete the redundant logic, simplify the frameworks
that use driver_deferred_probe_check_state(), enable
deferred_probe_timeout=10 by default, and fixes the nfsroot failure
case.

Patches 1 to 3 are fairly straightforward and can probably be applied
right away.

Patches 4 to 9 are related and are the complicated bits of this series.

Patch 8 is where someone with more knowledge of the IP auto config code
can help rewrite the patch to limit the scope of the workaround by
running the work around only if IP auto config fails the first time
around. But it's also something that can be optimized in the future
because it's already limited to the case where IP auto config is enabled
using the kernel commandline.

Yoshihiro/Geert,

If you can test this patch series and confirm that the NFS root case
works, I'd really appreciate that.

Cc: Mark Brown 
Cc: Rob Herring 
Cc: Geert Uytterhoeven 
Cc: Yoshihiro Shimoda 
Cc: John Stultz 
Cc: Nathan Chancellor 
Cc: Sebastian Andrzej Siewior 

Saravana Kannan (9):
  PM: domains: Delete usage of driver_deferred_probe_check_state()
  pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
  net: mdio: Delete usage of driver_deferred_probe_check_state()
  Revert "driver core: Set default deferred_probe_timeout back to 0."
  driver core: Set fw_devlink.strict=1 by default
  iommu/of: Delete usage of driver_deferred_probe_check_state()
  driver core: Add fw_devlink_unblock_may_probe() helper function
  net: ipconfig: Force fw_devlink to unblock any devices that might probe
  driver core: Delete driver_deferred_probe_check_state()

 drivers/base/base.h|  1 +
 drivers/base/core.c| 60 +-
 drivers/base/dd.c  | 37 -
 drivers/base/power/domain.c|  2 +-
 drivers/iommu/of_iommu.c   |  2 +-
 drivers/net/mdio/fwnode_mdio.c |  4 +--
 drivers/pinctrl/devicetree.c   |  2 +-
 include/linux/device/driver.h  |  1 -
 include/linux/fwnode.h |  2 ++
 net/ipv4/ipconfig.c|  2 ++
 10 files changed, 74 insertions(+), 39 deletions(-)

-- 
2.36.1.124.g0e6072fb45-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu