Re: No check of the size passed to unmap_single in swiotlb

2017-11-24 Thread Konrad Rzeszutek Wilk
..snip..
> > There doesn't seem to be much good reason for SWIOTLB to be more special
> > than other DMA API backends, and not all of them have enough internal state 
> > to
> > be able to make such a check. It's also not necessarily possible to "prevent
> > damage" anyway - if a driver does pass a bogus size for 
> > dma_unmap_single(...,
> > DMA_FROM_DEVICE), SWIOTLB might be able to keep itself internally 
> > consistent,
> > but it still can't prevent the arch code in the middle from invalidating 
> > the wrong
> > cache lines and potentially corrupting adjacent memory.
> > 
> > In short, trying to work around broken drivers is a much worse idea than 
> > just
> > fixing those drivers, and that's what we already have dma-debug for.
> > 
> > Robin.
> 
> Hi Robin,
> 
> I agree that hacking kernel to fix broken drivers is not acceptable, actually 
> we spent days to fight driver supplier with  this, they do not want to change 
> their code and want fix it directly in kernel. 

You could upstream the driver? That way it will be fixed.

I am not sure if you are aware but there is a staging directory
as well in Linux.
> 
> I tried Dma-debug yesterday, it works very well,  but I think only the size 
> mismatch check may not be enough for the map entry corrupt situation, some 
> run-time warning may be better when the real corruption happen.
> 
> For most of the dma-api backend, the size mismatch may do no harm at all, and 
> even in SWIOTLB itself when the bounce buffer is not used, the size mismatch 
> do no harm either. In our case, the same buggy driver works well when board 
> has 2G DDR, but panic frequently in 4G DDR because of the use of bounce 
> buffer and these corrupted map entries. it is hard to catch this kind of 
> bugs, for when the corruption happen, the kernel has all kind of reasons to 
> panic, but not even one may directly point to the real source.  
> 
> Add the warning messages is a big convenience for figure this kind of issues, 
> at least to me and the AP driver supplier, such warnings may save weeks of 
> hard debug time.

I would prefer that all of those warning messages go in the DMA debug
API - as you can have multiple DMA different backends.

In other words, instead of being in one implementation it would make
much more sense to be in the generic API layer.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-11-24 Thread Jean-Philippe Brucker
Hi,

On 17/11/17 18:55, Jacob Pan wrote:
> From: "Liu, Yi L" 
> 
> When an SVM capable device is assigned to a guest, the first level page
> tables are owned by the guest and the guest PASID table pointer is
> linked to the device context entry of the physical IOMMU.
> 
> Host IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation activities are passed down to the host. The
> primary usage is derived from emulated IOMMU in the guest, where QEMU
> can trap invalidation activities before passing them down to the
> host/physical IOMMU.
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
[...]
>  #endif /* __LINUX_IOMMU_H */
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 651ad5d..039ba36 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -36,4 +36,66 @@ struct pasid_table_config {
>   };
>  };
>  
> +enum iommu_inv_granularity {
> + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */
> + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */
> + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a
> +  * device ID
> +  */

I thought you were planning on removing these? If we do need global
invalidation, for example the guest clears the whole PASID table and
doesn't want to send individual GRANU_ALL_PASID invalidations, maybe keep
only GRANU_DOMAIN?

> + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */
> + IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given PASID */
> + IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate specified PASID */

GRANU_PASID_SEL seems redundant, don't you already get it by default with
GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with IOMMU_INVALIDATE_PASID_TAGGED
flag)?

> +
> + IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within all PASIDs */
> + IOMMU_INV_GRANU_NG_PASID,   /* non-global within a PASIDs */

Don't you get the "NG" behavior by not passing the
IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below?

> + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */

And don't you get this with GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED?

> + IOMMU_INV_NR_GRANU,
> +};
> +
> +enum iommu_inv_type {
> + IOMMU_INV_TYPE_DTLB,/* device IOTLB */
> + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache */
> + IOMMU_INV_TYPE_PASID,   /* PASID cache */
> + IOMMU_INV_TYPE_CONTEXT, /* device context entry cache */
> + IOMMU_INV_NR_TYPE
> +};

When the guest removes a PASID entry, it would have to send DTLB, TLB and
PASID invalidations separately? Could we define this inv_type as
cumulative, to avoid redundant invalidation requests:

* TYPE_DTLB only invalidates ATC entries.
* TYPE_TLB invalidates both ATC and IOTLB entries.
* TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and also
the PASID cache entry.
* TYPE_CONTEXT invalidates all. Although is it needed by userspace or just
here fore completeness? "CONTEXT" is specific to VT-d (doesn't exist on
AMD and has a different meaning on SMMU), how about "DEVICE" instead?

This is important because invalidation will probably become the
bottleneck. The guest shouldn't have to send DTLB and TLB invalidation
separately after each unmapping.

> +/**
> + * Translation cache invalidation header that contains mandatory meta data.
> + * @version: info format version, expecting future extesions
> + * @type:type of translation cache to be invalidated
> + */
> +struct tlb_invalidate_hdr {
> + __u32 version;
> +#define TLB_INV_HDR_VERSION_1 1
> + enum iommu_inv_type type;
> +};
> +
> +/**
> + * Translation cache invalidation information, contains generic IOMMU
> + * data which can be parsed based on model ID by model specific drivers.
> + *
> + * @granularity: requested invalidation granularity, type dependent
> + * @size:2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.

Having only power of two invalidation seems too restrictive for a software
interface. You might have the same problem as above, where the guest or
userspace needs to send lots of invalidation requests, They could be
multiplexed by passing an arbitrary range instead. How about making @size
a __u64?

> + * @pasid:   processor address space ID value per PCI spec.
> + * @addr:page address to be invalidated
> + * @flagsIOMMU_INVALIDATE_PASID_TAGGED: DMA with PASID tagged,
> + *   @pasid validity can be
> + *   deduced from @granul

Re: [PATCH v3 08/16] iommu: introduce device fault data

2017-11-24 Thread Jean-Philippe Brucker
Hi Jacob,

On 17/11/17 18:55, Jacob Pan wrote:
> Device faults detected by IOMMU can be reported outside IOMMU
> subsystem for further processing. This patch intends to provide
> a generic device fault data such that device drivers can be
> communicated with IOMMU faults without model specific knowledge.
> 
> The proposed format is the result of discussion at:
> https://lkml.org/lkml/2017/11/10/291
> Part of the code is based on Jean-Philippe Brucker's patchset
> (https://patchwork.kernel.org/patch/9989315/).
> 
> The assumption is that model specific IOMMU driver can filter and
> handle most of the internal faults if the cause is within IOMMU driver
> control. Therefore, the fault reasons can be reported are grouped
> and generalized based common specifications such as PCI ATS.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 

This looks good from my point of view. And since it's not UAPI, we can
always update it if it turns out that device drivers need more information.

[...]
> +/**
> + * struct iommu_fault_event - Generic per device fault data
> + *
> + * - PCI and non-PCI devices
> + * - Recoverable faults (e.g. page request), information based on PCI ATS
> + * and PASID spec.
> + * - Un-recoverable faults of device interest
> + * - DMA remapping and IRQ remapping faults
> +
> + * @type contains fault type.
> + * @reason fault reasons if relevant outside IOMMU driver, IOMMU driver 
> internal
> + * faults are not reported
> + * @addr: tells the offending page address
> + * @pasid: contains process address space ID, used in shared virtual 
> memory(SVM)
> + * @rid: requestor ID

This comment can be removed

Thanks,
Jean

> + * @page_req_group_id: page request group index
> + * @last_req: last request in a page request group
> + * @pasid_valid: indicates if the PRQ has a valid PASID
> + * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, 
> IOMMU_FAULT_WRITE
> + * @device_private: if present, uniquely identify device-specific
> + *  private data for an individual page request.
> + * @iommu_private: used by the IOMMU driver for storing fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + enum iommu_fault_type type;
> + enum iommu_fault_reason reason;
> + u64 addr;
> + u32 pasid;
> + u32 page_req_group_id : 9;
> + u32 last_req : 1;
> + u32 pasid_valid : 1;
> + u32 prot;
> + u64 device_private;
> + u64 iommu_private;
> +};
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 01/16] iommu: introduce bind_pasid_table API function

2017-11-24 Thread Jean-Philippe Brucker
On 17/11/17 18:54, Jacob Pan wrote:
> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM)
> use in the guest:
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> 
> As part of the proposed architecture, when an SVM capable PCI
> device is assigned to a guest, nested mode is turned on. Guest owns the
> first level page tables (request with PASID) which performs GVA->GPA
> translation. Second level page tables are owned by the host for GPA->HPA
> translation for both request with and without PASID.
> 
> A new IOMMU driver interface is therefore needed to perform tasks as
> follows:
> * Enable nested translation and appropriate translation type
> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> 
> This patch introduces new API functions to perform bind/unbind guest PASID
> tables. Based on common data, model specific IOMMU drivers can be extended
> to perform the specific steps for binding pasid table of assigned devices.
> 
[...]
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -187,6 +188,8 @@ struct iommu_resv_region {
>   * @domain_get_windows: Return the number of windows for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @bind_pasid_table: bind pasid table pointer for guest SVM
> + * @unbind_pasid_table: unbind pasid table pointer and restore defaults

I was wondering, are you planning on using the IOMMU_DOMAIN_NESTING
attribute? It differentiates a domain that supports
bind/unbind_pasid_table and map/unmap GPA (virt SVM), from the domain that
supports bind/unbind individual PASIDs and map/unmap IOVA (host SVM)?

Users can set this attribute by using the VFIO_TYPE1_NESTING_IOMMU type
instead of VFIO_TYPE1v2_IOMMU, which seems ideal for what we're trying to do.

[...]
> +/**
> + * PASID table data used to bind guest PASID table to the host IOMMU. This 
> will
> + * enable guest managed first level page tables.
> + * @version: for future extensions and identification of the data format
> + * @bytes: size of this structure
> + * @base_ptr:PASID table pointer
> + * @pasid_bits:  number of bits supported in the guest PASID table, must 
> be less
> + *   or equal than the host supported PASID size.

Why remove the @model parameter?

> + */
> +struct pasid_table_config {
> + __u32 version;
> +#define PASID_TABLE_CFG_VERSION 1
> + __u32 bytes;
> + __u64 base_ptr;
> + __u8 pasid_bits;
> + /* reserved for extension of vendor specific config */
> + union {
> + struct {
> + /* ARM specific fields */
> + bool pasid0_dma_no_pasid;
> + } arm;

I think @model is still required for sanity check, but could you remove
the whole union for the moment? Other parameters will be needed and I'm
still thinking about it, so I'll add the arm struct back in a future patch.

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 15/16] iommu: introduce page response function

2017-11-24 Thread Jean-Philippe Brucker
On 17/11/17 18:55, Jacob Pan wrote:
> When nested translation is turned on and guest owns the
> first level page tables, device page request can be forwared
> to the guest for handling faults. As the page response returns
> by the guest, IOMMU driver on the host need to process the
> response which informs the device and completes the page request
> transaction.
> 
> This patch introduces generic API function for page response
> passing from the guest or other in-kernel users. The definitions of
> the generic data is based on PCI ATS specification not limited to
> any vendor.>
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/iommu.c | 14 ++
>  include/linux/iommu.h | 42 ++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 97b7990..7aefb40 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1416,6 +1416,20 @@ int iommu_sva_invalidate(struct iommu_domain *domain,
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_invalidate);
>  
> +int iommu_page_response(struct iommu_domain *domain, struct device *dev,
> + struct page_response_msg *msg)

I think it's simpler, both for IOMMU and device drivers, to pass the exact
structure received in the fault handler back to the IOMMU driver, along
with a separate response status. So maybe

int iommu_page_response(struct iommu_domain *domain, struct device *dev,
struct iommu_fault_event *event, u32 response)

And then you'd just need to define the IOMMU_PAGE_RESPONSE_* values.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] iommu/ipmmu-vmsa: Add r8a779(70|95) DT bindings

2017-11-24 Thread Simon Horman
Update the IPMMU DT binding documentation to include the
r8a77970 (R-Car V3M) and r8a77995 (R-Car D3) compat strings.

Based on work for r8a7796 by Magnus Damm.

Signed-off-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
---
v2
* Correct typo: R8A7770 -> R8A77970
---
 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
index 98ae900a7a76..8e4e89c1bd48 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
@@ -17,6 +17,8 @@ Required Properties:
 - "renesas,ipmmu-r8a7794" for the R8A7794 (R-Car E2) IPMMU.
 - "renesas,ipmmu-r8a7795" for the R8A7795 (R-Car H3) IPMMU.
 - "renesas,ipmmu-r8a7796" for the R8A7796 (R-Car M3-W) IPMMU.
+- "renesas,ipmmu-r8a77970" for the R8A77970 (R-Car V3M) IPMMU.
+- "renesas,ipmmu-r8a77995" for the R8A77995 (R-Car D3) IPMMU.
 - "renesas,ipmmu-vmsa" for generic R-Car Gen2 VMSA-compatible IPMMU.
 
   - reg: Base address and size of the IPMMU registers.
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] iommu/ipmmu-vmsa: Add r8a7796 DT binding

2017-11-24 Thread Simon Horman
From: Magnus Damm 

Update the IPMMU DT binding documentation to include the r8a7796 compat
string for R-Car M3-W.

Signed-off-by: Magnus Damm 
Acked-by: Laurent Pinchart 
Acked-by: Rob Herring 
Acked-by: Simon Horman 
Acked-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
index 3ed027cfca95..98ae900a7a76 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
@@ -16,6 +16,7 @@ Required Properties:
 - "renesas,ipmmu-r8a7793" for the R8A7793 (R-Car M2-N) IPMMU.
 - "renesas,ipmmu-r8a7794" for the R8A7794 (R-Car E2) IPMMU.
 - "renesas,ipmmu-r8a7795" for the R8A7795 (R-Car H3) IPMMU.
+- "renesas,ipmmu-r8a7796" for the R8A7796 (R-Car M3-W) IPMMU.
 - "renesas,ipmmu-vmsa" for generic R-Car Gen2 VMSA-compatible IPMMU.
 
   - reg: Base address and size of the IPMMU registers.
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] iommu/ipmmu-vmsa: Add r8a779(6|70|95) DT bindings

2017-11-24 Thread Simon Horman
Update the IPMMU DT binding documentation to include r8a7796 (R-Car M3-W),
r8a77970 (R-Car V3M) and r8a77995 (R-Car D3) compat strings.

This patch-set is comprised of patches previously included in
* [PATCH v4 0/3] iommu/ipmmu-vmsa: r8a7796 support V4
* [PATCH 0/2] iommu/ipmmu-vmsa: r8a779(70|95) support

Those series also include driver updates to implement the bindings,
however, it while it is felt that the bindings themselves are ready
to be merged we feel that the driver updates are not at that stage yet.

Changes since v1:
* Correct typo in r8a779(70|95) patch: R8A7770 -> R8A77970


Magnus Damm (1):
  iommu/ipmmu-vmsa: Add r8a7796 DT binding

Simon Horman (1):
  iommu/ipmmu-vmsa: Add r8a779(70|95) DT bindings

 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory

2017-11-24 Thread Jean-Philippe Brucker
On 24/11/17 08:23, Bob Liu wrote:
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a
>> bond between a container and a process address space, identified by a
>> device-specific ID named PASID. This allows the device to target DMA
>> transactions at the process virtual addresses without a need for mapping
>> and unmapping buffers explicitly in the IOMMU. The process page tables are
>> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
>> handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a
>> PASID.
>> 
> 
> How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)?
> e.g always bind to current process in SET_IOMMU.
> 
> Not sure about the real use case.

I guess you could introduce a new VFIO IOMMU type for this. I think this
would be useful for SVA without PASID: if the device supports I/O page
faults, use SET_IOMMU with a VFIO_SVA_IOMMU type (for example) and the
process is bound automatically to the default translation context of the
device. This requires a new IOMMU type because the MAP/UNMAP ioctl won't
work anymore.

I'm not keen on introducing loads of new features in the APIs at the
moment, because I only have the IOMMU point of view, not many endpoint users.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory

2017-11-24 Thread David Laight
From: Jaewon Kim
> Sent: 24 November 2017 05:59
> 
> dma-coherent uses bitmap APIs which internally consider align based on the
> requested size. If most of allocations are small size like KBs, using
> alignment scheme seems to be good for anti-fragmentation. But if large
> allocation are commonly used, then an allocation could be failed because
> of the alignment. To avoid the allocation failure, we had to increase total
> size.
> 
> This is a example, total size is 30MB, only few memory at front is being
> used, and 9MB is being requsted. Then 9MB will be aligned to 16MB. The
> first try on offset 0MB will be failed because others already are using
> them. The second try on offset 16MB will be failed because of ouf of bound.
> 
> So if the alignment is not necessary on a specific dma-coherent memory
> region, we can set no-align property. Then dma-coherent will ignore the
> alignment only for the memory region.

ISTM that the alignment needs to be a property of the request, not of the
device. Certainly the device driver code is most likely to know the specific
alignment requirements of any specific allocation.

We've some hardware that would need large allocations to be 16k aligned.
We actually use multiple 16k allocations because any large buffers are
accessed directly from userspace (mmap and vm_iomap_memory) and the
card has its own page tables (with 16k pages).

David

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory

2017-11-24 Thread Bob Liu
On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables are
> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
> handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a
> PASID.
> 

How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)? 
e.g always bind to current process in SET_IOMMU.

Not sure about the real use case.

> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 243 
> +++-
>  include/uapi/linux/vfio.h   |  69 
>  2 files changed, 311 insertions(+), 1 deletion(-)
> 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu