Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Lu Baolu
Hi,

On 05/15/2018 04:55 AM, Jacob Pan wrote:
> On Mon, 14 May 2018 14:01:06 +0800
> Lu Baolu  wrote:
>
>> Hi,
>>
>> On 05/12/2018 04:54 AM, Jacob Pan wrote:
>>> Traditionally, device specific faults are detected and handled
>>> within their own device drivers. When IOMMU is enabled, faults such
>>> as DMA related transactions are detected by IOMMU. There is no
>>> generic reporting mechanism to report faults back to the in-kernel
>>> device driver or the guest OS in case of assigned devices.
>>>
>>> Faults detected by IOMMU is based on the transaction's source ID
>>> which can be reported at per device basis, regardless of the device
>>> type is a PCI device or not.
>>>
>>> The fault types include recoverable (e.g. page request) and
>>> unrecoverable faults(e.g. access error). In most cases, faults can
>>> be handled by IOMMU drivers internally. The primary use cases are as
>>> follows:
>>> 1. page request fault originated from an SVM capable device that is
>>> assigned to guest via vIOMMU. In this case, the first level page
>>> tables are owned by the guest. Page request must be propagated to
>>> the guest to let guest OS fault in the pages then send page
>>> response. In this mechanism, the direct receiver of IOMMU fault
>>> notification is VFIO, which can relay notification events to QEMU
>>> or other user space software.
>>>
>>> 2. faults need more subtle handling by device drivers. Other than
>>> simply invoke reset function, there are needs to let device driver
>>> handle the fault with a smaller impact.
>>>
>>> This patchset is intended to create a generic fault report API such
>>> that it can scale as follows:
>>> - all IOMMU types
>>> - PCI and non-PCI devices
>>> - recoverable and unrecoverable faults
>>> - VFIO and other other in kernel users
>>> - DMA & IRQ remapping (TBD)
>>> The original idea was brought up by David Woodhouse and discussions
>>> summarized at https://lwn.net/Articles/608914/.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Jean-Philippe Brucker 
>>> ---
>>>  drivers/iommu/iommu.c | 149
>>> +-
>>> include/linux/iommu.h |  35 +++- 2 files changed, 181
>>> insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 3a49b96..b3f9daf 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) goto err_free_name;
>>> }
>>>  
>>> +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
>>> GFP_KERNEL);
>>> +   if (!dev->iommu_param) {
>>> +   ret = -ENOMEM;
>>> +   goto err_free_name;
>>> +   }
>>> +   mutex_init(&dev->iommu_param->lock);
>>> +
>>> kobject_get(group->devices_kobj);
>>>  
>>> dev->iommu_group = group;
>>> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) mutex_unlock(&group->mutex);
>>> dev->iommu_group = NULL;
>>> kobject_put(group->devices_kobj);
>>> +   kfree(dev->iommu_param);
>>>  err_free_name:
>>> kfree(device->name);
>>>  err_remove_link:
>>> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
>>> *dev) sysfs_remove_link(&dev->kobj, "iommu_group");
>>>  
>>> trace_remove_device_from_group(group->id, dev);
>>> -
>>> +   kfree(dev->iommu_param);
>>> kfree(device->name);
>>> kfree(device);
>>> dev->iommu_group = NULL;
>>> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
>>> iommu_group *group,
>>> EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
>>>  /**
>>> + * iommu_register_device_fault_handler() - Register a device fault
>>> handler
>>> + * @dev: the device
>>> + * @handler: the fault handler
>>> + * @data: private data passed as argument to the handler
>>> + *
>>> + * When an IOMMU fault event is received, call this handler with
>>> the fault event
>>> + * and data as argument. The handler should return 0 on success.
>>> If the fault is
>>> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
>>> complete
>>> + * the fault by calling iommu_page_response() with one of the
>>> following
>>> + * response code:
>>> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
>>> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
>>> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
>>> reporting
>>> + *   page faults if possible.
>>> + *
>>> + * Return 0 if the fault handler was installed successfully, or an
>>> error.
>>> + */
>>> +int iommu_register_device_fault_handler(struct device *dev,
>>> +   iommu_dev_fault_handler_t
>>> handler,
>>> +   void *data)
>>> +{
>>> +   struct iommu_param *param = dev->iommu_param;
>>> +   int ret = 0;
>>> +
>>> +   /*
>>> +* Device iommu_param should have been allocated when
>>> device is
>>> +* added to its iommu_group.
>>> +*/
>>> +   if (!param)
>>> +   

Re: [Linux-c6x-dev] [PATCH 05/20] c6x: use generic dma_noncoherent_ops

2018-05-14 Thread Mark Salter
On Fri, 2018-05-11 at 09:59 +0200, Christoph Hellwig wrote:
> Switch to the generic noncoherent direct mapping implementation.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/c6x/Kconfig   |   3 +
>  arch/c6x/include/asm/Kbuild|   1 +
>  arch/c6x/include/asm/dma-mapping.h |  28 --
>  arch/c6x/include/asm/setup.h   |   2 +
>  arch/c6x/kernel/Makefile   |   2 +-
>  arch/c6x/kernel/dma.c  | 138 -
>  arch/c6x/mm/dma-coherent.c |  40 -
>  7 files changed, 44 insertions(+), 170 deletions(-)
>  delete mode 100644 arch/c6x/include/asm/dma-mapping.h
>  delete mode 100644 arch/c6x/kernel/dma.c
> 
> diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
> index 8c088b96e372..bf59855628ac 100644
> --- a/arch/c6x/Kconfig
> +++ b/arch/c6x/Kconfig
> @@ -6,7 +6,10 @@
>  
>  config C6X
>   def_bool y
> + select ARCH_HAS_SYNC_DMA_FOR_CPU
> + select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>   select CLKDEV_LOOKUP
> + select DMA_NONCOHERENT_OPS
>   select GENERIC_ATOMIC64
>   select GENERIC_IRQ_SHOW
>   select HAVE_ARCH_TRACEHOOK
> diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
> index fd4c840de837..434600e47662 100644
> --- a/arch/c6x/include/asm/Kbuild
> +++ b/arch/c6x/include/asm/Kbuild
> @@ -5,6 +5,7 @@ generic-y += current.h
>  generic-y += device.h
>  generic-y += div64.h
>  generic-y += dma.h
> +generic-y += dma-mapping.h
>  generic-y += emergency-restart.h
>  generic-y += exec.h
>  generic-y += extable.h
> diff --git a/arch/c6x/include/asm/dma-mapping.h 
> b/arch/c6x/include/asm/dma-mapping.h
> deleted file mode 100644
> index 05daf1038111..
> --- a/arch/c6x/include/asm/dma-mapping.h
> +++ /dev/null
> @@ -1,28 +0,0 @@
> -/*
> - *  Port on Texas Instruments TMS320C6x architecture
> - *
> - *  Copyright (C) 2004, 2009, 2010, 2011 Texas Instruments Incorporated
> - *  Author: Aurelien Jacquiot 
> - *
> - *  This program is free software; you can redistribute it and/or modify
> - *  it under the terms of the GNU General Public License version 2 as
> - *  published by the Free Software Foundation.
> - *
> - */
> -#ifndef _ASM_C6X_DMA_MAPPING_H
> -#define _ASM_C6X_DMA_MAPPING_H
> -
> -extern const struct dma_map_ops c6x_dma_ops;
> -
> -static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type 
> *bus)
> -{
> - return &c6x_dma_ops;
> -}
> -
> -extern void coherent_mem_init(u32 start, u32 size);
> -void *c6x_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> - gfp_t gfp, unsigned long attrs);
> -void c6x_dma_free(struct device *dev, size_t size, void *vaddr,
> - dma_addr_t dma_handle, unsigned long attrs);
> -
> -#endif   /* _ASM_C6X_DMA_MAPPING_H */
> diff --git a/arch/c6x/include/asm/setup.h b/arch/c6x/include/asm/setup.h
> index 852afb209afb..350f34debb19 100644
> --- a/arch/c6x/include/asm/setup.h
> +++ b/arch/c6x/include/asm/setup.h
> @@ -28,5 +28,7 @@ extern unsigned char c6x_fuse_mac[6];
>  extern void machine_init(unsigned long dt_ptr);
>  extern void time_init(void);
>  
> +extern void coherent_mem_init(u32 start, u32 size);
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _ASM_C6X_SETUP_H */
> diff --git a/arch/c6x/kernel/Makefile b/arch/c6x/kernel/Makefile
> index 02f340d7b8fe..fbe74174de87 100644
> --- a/arch/c6x/kernel/Makefile
> +++ b/arch/c6x/kernel/Makefile
> @@ -8,6 +8,6 @@ extra-y := head.o vmlinux.lds
>  obj-y := process.o traps.o irq.o signal.o ptrace.o
>  obj-y += setup.o sys_c6x.o time.o devicetree.o
>  obj-y += switch_to.o entry.o vectors.o c6x_ksyms.o
> -obj-y += soc.o dma.o
> +obj-y += soc.o
>  
>  obj-$(CONFIG_MODULES)   += module.o
> diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
> deleted file mode 100644
> index 31e1a9ec3a9c..
> --- a/arch/c6x/kernel/dma.c
> +++ /dev/null
> @@ -1,138 +0,0 @@
> -/*
> - *  Copyright (C) 2011 Texas Instruments Incorporated
> - *  Author: Mark Salter 
> - *
> - *  This program is free software; you can redistribute it and/or modify
> - *  it under the terms of the GNU General Public License version 2 as
> - *  published by the Free Software Foundation.
> - */
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -
> -static void c6x_dma_sync(dma_addr_t handle, size_t size,
> -  enum dma_data_direction dir)
> -{
> - unsigned long paddr = handle;
> -
> - BUG_ON(!valid_dma_direction(dir));
> -
> - switch (dir) {
> - case DMA_FROM_DEVICE:
> - L2_cache_block_invalidate(paddr, paddr + size);
> - break;
> - case DMA_TO_DEVICE:
> - L2_cache_block_writeback(paddr, paddr + size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - L2_cache_block_writeback_invalidate(paddr, paddr + size);
> - break;
> - default:
> - break;
> - }
> -}
> -
> -static dma_addr_t c6x_dma_map_page(struct devi

Re: [PATCH v7 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-05-14 Thread Randy Dunlap
On 05/14/2018 01:00 PM, Gary R Hook wrote:
> On 05/14/2018 12:50 PM, Randy Dunlap wrote:
>> On 05/14/2018 10:20 AM, Gary R Hook wrote:
>>> Implement a skeleton framework for debugfs support in the
>>> AMD IOMMU.
>>>
>>> Signed-off-by: Gary R Hook 
>>> ---
>>>   drivers/iommu/Makefile    |    5 +
>>>   drivers/iommu/amd_iommu_debugfs.c |   39 
>>> +
>>>   drivers/iommu/amd_iommu_init.c    |    6 --
>>>   drivers/iommu/amd_iommu_proto.h   |    6 ++
>>>   drivers/iommu/amd_iommu_types.h   |    3 +++
>>>   5 files changed, 57 insertions(+), 2 deletions(-)
>>>   create mode 100644 drivers/iommu/amd_iommu_debugfs.c
>>>
>>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>>> index 74cfbc392862..dd980f7dd8b6 100644
>>> --- a/drivers/iommu/Makefile
>>> +++ b/drivers/iommu/Makefile
>>> @@ -30,3 +30,8 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>>>   obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>>>   obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>>>   obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>>> +
>>> +# This ensures that only the required files are compiled
>>> +ifeq ($(CONFIG_IOMMU_DEBUGFS), y)
>>
>> Most Makefiles don't use a space before the 'y', but since you tested it,
>> I guess either way works.
> 
> Pretty sure whitespace isn't used as a delimiter in this construct. I could 
> be mistaken. But yes, it's perfectly serviceable.
> 
>> But why do this in the Makefile at all?  Why not just add another Kconfig
>> symbol and simplify the Makefile?
>>
>>> +obj-$(CONFIG_AMD_IOMMU) += amd_iommu_debugfs.o
>>> +endif
> 
> 
> This was brought up a few weeks ago in, I believe, version 3 of this patch. 
> That question was discussed (because that's what I did the first time out), 
> and _someone_ _else_ asked about why I didn't just do it the way I've done it 
> here.

Sorry I missed it.  I would have been glad to chime in at that time.

> Everyone has a preference.
> 
> I chose to simplify the choices and avoid multiple symbols, instead opting 
> for two switches: choose your device, and decide on Debug FS enablement for 
> it. IMO Very simple.
> 
> I can't fathom a scenario where this wouldn't work. Is there one?

Probably not.

But we used to have ugly, messy, convoluted makefiles with very little config 
help.
Then we got better kconfig tools (well, arguably :) and then the makefiles were
cleaned up and simplified a lot (or A LOT!).  We shouldn't want to go back 
there.

-- 
~Randy
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 14:01:06 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> >
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> >
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> >
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> >
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> >
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 149
> > +-
> > include/linux/iommu.h |  35 +++- 2 files changed, 181
> > insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 3a49b96..b3f9daf 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) goto err_free_name;
> > }
> >  
> > +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_param) {
> > +   ret = -ENOMEM;
> > +   goto err_free_name;
> > +   }
> > +   mutex_init(&dev->iommu_param->lock);
> > +
> > kobject_get(group->devices_kobj);
> >  
> > dev->iommu_group = group;
> > @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) mutex_unlock(&group->mutex);
> > dev->iommu_group = NULL;
> > kobject_put(group->devices_kobj);
> > +   kfree(dev->iommu_param);
> >  err_free_name:
> > kfree(device->name);
> >  err_remove_link:
> > @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
> > *dev) sysfs_remove_link(&dev->kobj, "iommu_group");
> >  
> > trace_remove_device_from_group(group->id, dev);
> > -
> > +   kfree(dev->iommu_param);
> > kfree(device->name);
> > kfree(device);
> > dev->iommu_group = NULL;
> > @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> > iommu_group *group,
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
> >  /**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler,
> > +   void *data)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   /*
> > +* Device iommu_param should have been allocated when
> > device is
> > +* added to its iommu_group.
> > +*/
> > +   if (!param)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(¶m->

Re: [PATCH v5 11/23] driver core: add per device iommu param

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 13:27:13 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > DMA faults can be detected by IOMMU at device level. Adding a
> > pointer to struct device allows IOMMU subsystem to report relevant
> > faults back to the device driver for further handling.
> > For direct assigned device (or user space drivers), guest OS holds
> > responsibility to handle and respond per device IOMMU fault.
> > Therefore we need fault reporting mechanism to propagate faults
> > beyond IOMMU subsystem.
> >
> > There are two other IOMMU data pointers under struct device today,
> > here we introduce iommu_param as a parent pointer such that all
> > device IOMMU data can be consolidated here. The idea was suggested
> > here by Greg KH and Joerg. The name iommu_param is chosen here
> > since iommu_data has been used.  
> 
> This doesn't match what you've done in the patch. Maybe you
> forgot to cleanup? :-)
> 
No, I was trying to explain the thought process behind naming
iommu_param. I meant to say iommu_data is a probably a better name but
taken already.
> The idea is to create a parent pointer under device struct and
> move previous iommu_group and iommu_fwspec together with
> the iommu fault related data into it.
> 
> Best regards,
> Lu Baolu
> 
>  [...]  
> 

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 09/23] iommu/vt-d: add svm/sva invalidate function

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 11:35:05 +0800
Lu Baolu  wrote:

> > +   switch (inv_info->hdr.type) {
> > +   case IOMMU_INV_TYPE_TLB:
> > +   if (inv_info->size &&
> > +   (inv_info->addr & ((1 << (VTD_PAGE_SHIFT +
> > inv_info->size)) - 1))) {
> > +   pr_err("Addr out of range, addr 0x%llx,
> > size order %d\n",
> > +   inv_info->addr, inv_info->size);
> > +   ret = -ERANGE;
> > +   goto out_unlock;
> > +   }
> > +
> > +   qi_flush_eiotlb(iommu, did,
> > mm_to_dma_pfn(inv_info->addr),
> > +   inv_info->pasid,
> > +   inv_info->size, granu,
> > +   inv_info->flags &
> > IOMMU_INVALIDATE_GLOBAL_PAGE);
> > +   /**
> > +* Always flush device IOTLB if ATS is enabled
> > since guest
> > +* vIOMMU exposes CM = 1, no device IOTLB flush
> > will be passed
> > +* down.
> > +*/
> > +   info = iommu_support_dev_iotlb(dmar_domain, iommu,
> > bus, devfn);
> > +   if (info && info->ats_enabled) {
> > +   qi_flush_dev_eiotlb(iommu, sid,
> > +   inv_info->pasid,
> > info->ats_qdep,
> > +   inv_info->addr,
> > inv_info->size,
> > +   granu);
> > +   }
> > +   break;
> > +   case IOMMU_INV_TYPE_PASID:
> > +   qi_flush_pasid(iommu, did, granu, inv_info->pasid);
> > +
> > +   break;
> > +   default:
> > +   dev_err(dev, "Unknown IOMMU invalidation type
> > %d\n",
> > +   inv_info->hdr.type);  
> 
> There are three types of invalidation:
> 
> enum iommu_inv_type {
> IOMMU_INV_TYPE_DTLB,
> IOMMU_INV_TYPE_TLB,
> IOMMU_INV_TYPE_PASID,
> IOMMU_INV_NR_TYPE
> };
> 
> So "unsupported" looks better than "unknown" in the message.
> 
agreed, makes more sense.
>  [...]  

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 08/23] iommu/vt-d: support flushing more translation cache types

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 10:18:44 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > extended IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> >
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dmar.c| 44
> > 
> > include/linux/intel-iommu.h | 21 +++-- 2 files
> > changed, 63 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 7852678..0b5b052 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1339,6 +1339,18 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +void qi_flush_eiotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > +   unsigned int size_order, u64 granu, bool global)  
> 
> Alignment should match open parenthesis.
> 
> > +{
> > +   struct qi_desc desc;
> > +
> > +   desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > +   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > +   desc.high = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_GL(global) |
> > +   QI_EIOTLB_IH(0) | QI_EIOTLB_AM(size_order);
> > +   qi_submit_sync(&desc, iommu);
> > +}
> > +
> >  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> >  {
> > @@ -1360,6 +1372,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +void qi_flush_dev_eiotlb(struct intel_iommu *iommu, u16 sid,
> > +   u32 pasid,  u16 qdep, u64 addr, unsigned size, u64
> > granu)  
> 
> Ditto.
> 
> > +{
> > +   struct qi_desc desc;
> > +
> > +   desc.low = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > +   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE;  
> 
> Have you forgotten PFSID, or I missed anything here?
you are right, missed pfsid in this case.
> 
>  [...]  
> 
> Best regards,
> Lu Baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 07/23] iommu/vt-d: fix dev iotlb pfsid use

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 09:52:04 +0800
Lu Baolu  wrote:

> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 4623294..732a10f 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1459,6 +1459,19 @@ static void iommu_enable_dev_iotlb(struct
> > device_domain_info *info) return;
> >  
> > pdev = to_pci_dev(info->dev);
> > +   /* For IOMMU that supports device IOTLB throttling (DIT),
> > we assign
> > +* PFSID to the invalidation desc of a VF such that IOMMU
> > HW can gauge
> > +* queue depth at PF level. If DIT is not set, PFSID will
> > be treated as
> > +* reserved, which should be set to 0.
> > +*/
> > +   if (!ecap_dit(info->iommu->ecap))
> > +   info->pfsid = 0;
> > +   else if (pdev && pdev->is_virtfn) {
> > +   if (ecap_(info->iommu->ecap))
> > +   dev_warn(&pdev->dev, "SRIOV VF device
> > IOTLB enabled without flow control\n");  
> 
> I can't understand these two lines.
> 
> Isn't the condition always true? What does the error message mean?

you are right, there is no need to check ecap_dit again. thanks!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 06/23] iommu/vt-d: add definitions for PFSID

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 09:36:08 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:53 AM, Jacob Pan wrote:
> > When SRIOV VF device IOTLB is invalidated, we need to provide
> > the PF source ID such that IOMMU hardware can gauge the depth
> > of invalidation queue which is shared among VFs. This is needed
> > when device invalidation throttle (DIT) capability is supported.
> >
> > This patch adds bit definitions for checking and tracking PFSID.  
> 
> Patch 6 and 7 could be posted in a separated patch series.
> 
Thought about that also, but we need to include PFSID passdown
invalidation in this patchset, that is why i prefer to include
this fix, otherwise this patchset will continue to be wrong.
> >
> > Signed-off-by: Jacob Pan 
> > ---
> >  include/linux/intel-iommu.h | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index ddc7d79..dfacd49 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -114,6 +114,7 @@
> >   * Extended Capability Register
> >   */
> >  
> > +#define ecap_dit(e)((e >> 41) & 0x1)
> >  #define ecap_pasid(e)  ((e >> 40) & 0x1)
> >  #define ecap_pss(e)((e >> 35) & 0x1f)
> >  #define ecap_eafs(e)   ((e >> 34) & 0x1)
> > @@ -284,6 +285,7 @@ enum {
> >  #define QI_DEV_IOTLB_SID(sid)  ((u64)((sid) & 0x) << 32)
> >  #define QI_DEV_IOTLB_QDEP(qdep)(((qdep) & 0x1f) << 16)
> >  #define QI_DEV_IOTLB_ADDR(addr)((u64)(addr) &
> > VTD_PAGE_MASK) +#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid &
> > 0xf) << 12) | ((u64)(pfsid & 0xff0) << 48)) #define
> > QI_DEV_IOTLB_SIZE   1 #define QI_DEV_IOTLB_MAX_INVS 32
> >  
> > @@ -308,6 +310,7 @@ enum {
> >  #define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
> >  #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
> >  #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
> > +#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) |
> > ((u64)(pfsid & 0xff0) << 48))  
> 
> PFSID[15:4] are stored in Descriptor [63:52], hence it should look
> like:
> 
> +#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) |
> ((u64)(pfsid & 0xfff0) << 48))
> 
> 
good catch! thanks.
> 
> >  #define QI_DEV_EIOTLB_MAX_INVS 32
> >  
> >  #define QI_PGRP_IDX(idx)   (((u64)(idx)) << 55)
> > @@ -467,6 +470,7 @@ struct device_domain_info {
> > struct list_head global; /* link to global list */
> > u8 bus; /* PCI bus number */
> > u8 devfn;   /* PCI devfn number */
> > +   u16 pfsid;  /* SRIOV physical function
> > source ID */ u8 pasid_supported:3;
> > u8 pasid_enabled:1;
> > u8 pri_supported:1;  
> 
> Best regards,
> Lu Baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 04/23] iommu/vt-d: add bind_pasid_table function

2018-05-14 Thread Jacob Pan
On Sun, 13 May 2018 17:29:47 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:53 AM, Jacob Pan wrote:
> > Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> > functions.
> >
> > The primary use case is for direct assignment of SVM capable
> > device. Originated from emulated IOMMU in the guest, the request
> > goes through many layers (e.g. VFIO). Upon calling host IOMMU
> > driver, caller passes guest PASID table pointer (GPA) and size.
> >
> > Device context table entry is modified by Intel IOMMU specific
> > bind_pasid_table function. This will turn on nesting mode and
> > matching translation type.
> >
> > The unbind operation restores default context mapping.
> >
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Ashok Raj 
> > ---
> >  drivers/iommu/intel-iommu.c   | 122
> > ++
> > include/linux/dma_remapping.h |   1 + 2 files changed, 123
> > insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index a0f81a4..4623294 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -2409,6 +2409,7 @@ static struct dmar_domain
> > *dmar_insert_one_dev_info(struct intel_iommu *iommu,
> > info->ats_supported = info->pasid_supported = info->pri_supported =
> > 0; info->ats_enabled = info->pasid_enabled = info->pri_enabled = 0;
> > info->ats_qdep = 0;
> > +   info->pasid_table_bound = 0;
> > info->dev = dev;
> > info->domain = domain;
> > info->iommu = iommu;
> > @@ -5132,6 +5133,7 @@ static void
> > intel_iommu_put_resv_regions(struct device *dev, 
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> >  #define MAX_NR_PASID_BITS (20)
> > +#define MIN_NR_PASID_BITS (5)
> >  static inline unsigned long intel_iommu_get_pts(struct intel_iommu
> > *iommu) {
> > /*
> > @@ -5258,6 +5260,122 @@ struct intel_iommu
> > *intel_svm_device_to_iommu(struct device *dev) 
> > return iommu;
> >  }
> > +
> > +static int intel_iommu_bind_pasid_table(struct iommu_domain
> > *domain,
> > +   struct device *dev, struct pasid_table_config
> > *pasidt_binfo) +{
> > +   struct intel_iommu *iommu;
> > +   struct context_entry *context;
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct device_domain_info *info;
> > +   struct pci_dev *pdev;
> > +   u8 bus, devfn, host_table_pasid_bits;
> > +   u16 did, sid;
> > +   int ret = 0;
> > +   unsigned long flags;
> > +   u64 ctx_lo;  
> 
> I personally prefer to have this in order.
> 
> struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> u8 bus, devfn, host_table_pasid_bits;
> struct device_domain_info *info;
> struct context_entry *context;
> struct intel_iommu *iommu;
> struct pci_dev *pdev;
> unsigned long flags;
> u16 did, sid;
> int ret = 0;
> u64 ctx_lo;
> 
looks better.
> > +
> > +   if ((pasidt_binfo->version != PASID_TABLE_CFG_VERSION_1)
> > ||  
> 
> Unnecessary parentheses.
> 
here for readability.
> > +   pasidt_binfo->bytes != sizeof(*pasidt_binfo))  
> 
> Alignment should match open parenthesis.
> 
> > +   return -EINVAL;
> > +   iommu = device_to_iommu(dev, &bus, &devfn);
> > +   if (!iommu)
> > +   return -ENODEV;
> > +   /* VT-d spec section 9.4 says pasid table size is encoded
> > as 2^(x+5) */
> > +   host_table_pasid_bits = intel_iommu_get_pts(iommu) +
> > MIN_NR_PASID_BITS;
> > +   if (!pasidt_binfo || pasidt_binfo->pasid_bits >
> > host_table_pasid_bits ||  
> 
> "!pasidt_binfo" checking should be moved up to the version checking.
> 
good point!
> > +   pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
> > +   pr_err("Invalid gPASID bits %d, host range %d -
> > %d\n",  
> 
> How about dev_err()? 
> 
the error is not exactly specific to the device but rather the guest.
> > +   pasidt_binfo->pasid_bits,
> > +   MIN_NR_PASID_BITS, host_table_pasid_bits);
> > +   return -ERANGE;
> > +   }
> > +   if (!ecap_nest(iommu->ecap)) {
> > +   dev_err(dev, "Cannot bind PASID table, no nested
> > translation\n");
> > +   ret = -ENODEV;
> > +   goto out;  
> 
> How about
> +return -ENODEV;
> ?
> 
> > +   }
> > +   pdev = to_pci_dev(dev);  
> 
> We can't always assume that it is a PCI device, right?
> 
for vt-d, I don't think we expect any non-pci device.
> > +   sid = PCI_DEVID(bus, devfn);
> > +   info = dev->archdata.iommu;
> > +
> > +   if (!info) {
> > +   dev_err(dev, "Invalid device domain info\n");
> > +   ret = -EINVAL;
> > +   goto out;
> > +   }
> > +   if (info->pasid_table_bound) {  
> 
> We should do this checking with lock hold.
> 
agreed. will hold the device_domain_lock.
> Otherwise,
> 
> Thread A on CPUxThread B on CPUy
> ===
> check pasid_table_boundcheck pasid_table_bound
> 
> mutex_lock()
> Setup 

Re: [PATCH v7 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-05-14 Thread Gary R Hook

On 05/14/2018 12:50 PM, Randy Dunlap wrote:

On 05/14/2018 10:20 AM, Gary R Hook wrote:

Implement a skeleton framework for debugfs support in the
AMD IOMMU.

Signed-off-by: Gary R Hook 
---
  drivers/iommu/Makefile|5 +
  drivers/iommu/amd_iommu_debugfs.c |   39 +
  drivers/iommu/amd_iommu_init.c|6 --
  drivers/iommu/amd_iommu_proto.h   |6 ++
  drivers/iommu/amd_iommu_types.h   |3 +++
  5 files changed, 57 insertions(+), 2 deletions(-)
  create mode 100644 drivers/iommu/amd_iommu_debugfs.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 74cfbc392862..dd980f7dd8b6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,3 +30,8 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+
+# This ensures that only the required files are compiled
+ifeq ($(CONFIG_IOMMU_DEBUGFS), y)


Most Makefiles don't use a space before the 'y', but since you tested it,
I guess either way works.


Pretty sure whitespace isn't used as a delimiter in this construct. I 
could be mistaken. But yes, it's perfectly serviceable.



But why do this in the Makefile at all?  Why not just add another Kconfig
symbol and simplify the Makefile?


+obj-$(CONFIG_AMD_IOMMU) += amd_iommu_debugfs.o
+endif



This was brought up a few weeks ago in, I believe, version 3 of this 
patch. That question was discussed (because that's what I did the first 
time out), and _someone_ _else_ asked about why I didn't just do it the 
way I've done it here.


Everyone has a preference.

I chose to simplify the choices and avoid multiple symbols, instead 
opting for two switches: choose your device, and decide on Debug FS 
enablement for it. IMO Very simple.


I can't fathom a scenario where this wouldn't work. Is there one?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


iommu/amd: flush IOTLB for specific domains only (v2)

2018-05-14 Thread Joseph Salisbury
Hello Arindam,

There is a bug report[0] that you created a patch[1] for a while back. 
However, the patch never landed in mainline.  There is a bug reporter in
Ubuntu[2] that is affected by this bug and is willing to test the
patch.  I attempted to build a test kernel with the patch, but it does
not apply to currently mainline cleanly.  Do you still think this patch
may resolve this bug?  If so, is there a version of your patch available
that will apply to current mainline?

Thanks,

Joe

[0] https://bugs.freedesktop.org/show_bug.cgi?id=101029
[1] https://patchwork.freedesktop.org/patch/157327/
[2] http://pad.lv/1747463

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status

2018-05-14 Thread Jacob Pan
On Sun, 13 May 2018 16:01:50 +0800
Lu Baolu  wrote:

> > +   u8 pasid_table_bound:1;  
> 
> Can you please add some comments here? So that, people can
> understand the purpose for this bit exactly.
will do. I will add:
u8 pasid_table_bound:1; /* PASID table is bound to a guest */
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status

2018-05-14 Thread Jacob Pan
On Sun, 13 May 2018 15:33:23 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:53 AM, Jacob Pan wrote:
> > Adding a flag in device domain into to track whether a guest or  
> typo:   ^^info
> 
good catch, will fix. thanks
> Best regards,
> Lu Baolu
> 
>  [...]  
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 17/22] iommu/intel-svm: report device page request

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 06:56:11 +
"Liu, Yi L"  wrote:

> Same comment with the one to patch 16, pci_get_bus_and_slot() is
> deprecated, may update accordingly.

yes, it has been updated in v5, could you review v5?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices

2018-05-14 Thread Robin Murphy

On 11/05/18 21:05, Dmitry Osipenko wrote:

On 11.05.2018 15:32, Robin Murphy wrote:

On 08/05/18 19:16, Dmitry Osipenko wrote:

GART aperture is shared by all devices, hence there is a single IOMMU
domain and group shared by these devices. Allocation of a group per
device only wastes resources and allowance of having more than one domain
is simply wrong because IOMMU mappings made by the users of "different"
domains will stomp on each other.


Strictly, that reasoning is a bit backwards - allocating multiple groups is the
conceptually-wrong thing if the GART cannot differentiate between different
devices, whereas having multiple domains *exist* is no real problem, it's merely
that only one can be active at any point in time (which will inherently become
the case once all devices are grouped together).


IIUC, the IOMMU domain represents the address space. There is only one address
space in a case of GART, the GART's aperture. So GART not only isn't
differentiating between different devices, but also between different domains.


Right, but that's the same as many other IOMMUs (exynos, rockchip, mtk, 
etc.) - the point is that an IOMMU domain represents *an* address space, 
but if nothing is attached to that domain, it's just a set of logical 
mappings which doesn't need to be backed by real hardware. It's 
specifically *because* these IOMMUs also can't differentiate between 
devices that things work out neatly - there's only one group, which can 
only be attached to a single domain at once, so there is never a time 
when more than one domain needs to be backed by hardware. Think of the 
IOMMU+devices as a CPU and the domains as processes ;)



Signed-off-by: Dmitry Osipenko 
---
   drivers/iommu/tegra-gart.c | 107 +
   1 file changed, 24 insertions(+), 83 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 5b2d27620350..ebc105c201bd 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -19,7 +19,6 @@
     #include 
   #include 
-#include 
   #include 
   #include 
   #include 
@@ -44,22 +43,17 @@
   #define GART_PAGE_MASK    \
   (~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
   -struct gart_client {
-    struct device    *dev;
-    struct list_head    list;
-};
-
   struct gart_device {
   void __iomem    *regs;
   u32    *savedata;
   u32    page_count;    /* total remappable size */
   dma_addr_t    iovmm_base;    /* offset to vmm_area */
   spinlock_t    pte_lock;    /* for pagetable */
-    struct list_head    client;
-    spinlock_t    client_lock;    /* for client list */
   struct device    *dev;
     struct iommu_device    iommu;    /* IOMMU Core handle */
+    struct iommu_group    *group;    /* Common IOMMU group */
+    struct gart_domain    *domain;    /* Unique IOMMU domain */
     struct tegra_mc_gart_handle mc_gart_handle;
   };
@@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct
gart_device *gart,
   static int gart_iommu_attach_dev(struct iommu_domain *domain,
    struct device *dev)
   {
-    struct gart_domain *gart_domain = to_gart_domain(domain);
-    struct gart_device *gart = gart_domain->gart;
-    struct gart_client *client, *c;
-    int err = 0;
-
-    client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
-    if (!client)
-    return -ENOMEM;
-    client->dev = dev;
-
-    spin_lock(&gart->client_lock);
-    list_for_each_entry(c, &gart->client, list) {
-    if (c->dev == dev) {
-    dev_err(gart->dev,
-    "%s is already attached\n", dev_name(dev));
-    err = -EINVAL;
-    goto fail;
-    }
-    }
-    list_add(&client->list, &gart->client);
-    spin_unlock(&gart->client_lock);
-    dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
   return 0;
-
-fail:
-    devm_kfree(gart->dev, client);
-    spin_unlock(&gart->client_lock);
-    return err;
   }
     static void gart_iommu_detach_dev(struct iommu_domain *domain,
     struct device *dev)
   {
-    struct gart_domain *gart_domain = to_gart_domain(domain);
-    struct gart_device *gart = gart_domain->gart;
-    struct gart_client *c;
-
-    spin_lock(&gart->client_lock);
-
-    list_for_each_entry(c, &gart->client, list) {
-    if (c->dev == dev) {
-    list_del(&c->list);
-    devm_kfree(gart->dev, c);
-    dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
-    goto out;
-    }
-    }
-    dev_err(gart->dev, "Couldn't find\n");
-out:
-    spin_unlock(&gart->client_lock);
   }


The .detach_dev callback is optional in the core API now, so you can just remove
the whole thing.


Good catch, thanks!




   static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
   {
-    struct gart_domain *gart_domain;
-    struct gart_device *gart;
-
-    if (type != IOMMU_DOMAIN_UNMA

Re: [PATCH v7 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-05-14 Thread Randy Dunlap
On 05/14/2018 10:20 AM, Gary R Hook wrote:
> Implement a skeleton framework for debugfs support in the
> AMD IOMMU.
> 
> Signed-off-by: Gary R Hook 
> ---
>  drivers/iommu/Makefile|5 +
>  drivers/iommu/amd_iommu_debugfs.c |   39 
> +
>  drivers/iommu/amd_iommu_init.c|6 --
>  drivers/iommu/amd_iommu_proto.h   |6 ++
>  drivers/iommu/amd_iommu_types.h   |3 +++
>  5 files changed, 57 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/iommu/amd_iommu_debugfs.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 74cfbc392862..dd980f7dd8b6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -30,3 +30,8 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +
> +# This ensures that only the required files are compiled
> +ifeq ($(CONFIG_IOMMU_DEBUGFS), y)

Most Makefiles don't use a space before the 'y', but since you tested it,
I guess either way works.

But why do this in the Makefile at all?  Why not just add another Kconfig
symbol and simplify the Makefile?

> +obj-$(CONFIG_AMD_IOMMU) += amd_iommu_debugfs.o
> +endif

-- 
~Randy
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-05-14 Thread Gary R Hook
Implement a skeleton framework for debugfs support in the
AMD IOMMU.

Signed-off-by: Gary R Hook 
---
 drivers/iommu/Makefile|5 +
 drivers/iommu/amd_iommu_debugfs.c |   39 +
 drivers/iommu/amd_iommu_init.c|6 --
 drivers/iommu/amd_iommu_proto.h   |6 ++
 drivers/iommu/amd_iommu_types.h   |3 +++
 5 files changed, 57 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/amd_iommu_debugfs.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 74cfbc392862..dd980f7dd8b6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,3 +30,8 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+
+# This ensures that only the required files are compiled
+ifeq ($(CONFIG_IOMMU_DEBUGFS), y)
+obj-$(CONFIG_AMD_IOMMU) += amd_iommu_debugfs.o
+endif
diff --git a/drivers/iommu/amd_iommu_debugfs.c 
b/drivers/iommu/amd_iommu_debugfs.c
new file mode 100644
index ..6dff98552969
--- /dev/null
+++ b/drivers/iommu/amd_iommu_debugfs.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD IOMMU driver
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ */
+
+#include 
+#include 
+#include 
+#include "amd_iommu_proto.h"
+#include "amd_iommu_types.h"
+
+static struct dentry *amd_iommu_debugfs;
+static DEFINE_MUTEX(amd_iommu_debugfs_lock);
+
+#defineMAX_NAME_LEN20
+
+void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
+{
+   char name[MAX_NAME_LEN + 1];
+
+   mutex_lock(&amd_iommu_debugfs_lock);
+   if (!amd_iommu_debugfs)
+   amd_iommu_debugfs = iommu_debugfs_new_driver_dir("amd");
+   mutex_unlock(&amd_iommu_debugfs_lock);
+
+   if (amd_iommu_debugfs) {
+   snprintf(name, MAX_NAME_LEN, "iommu%02d", iommu->index);
+   iommu->debugfs = debugfs_create_dir(name,
+   amd_iommu_debugfs);
+   if (!iommu->debugfs) {
+   debugfs_remove_recursive(amd_iommu_debugfs);
+   amd_iommu_debugfs = NULL;
+   }
+   }
+}
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575d1677..031e6dbb8345 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2721,6 +2721,7 @@ int __init amd_iommu_enable_faulting(void)
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int ret;
 
ret = iommu_go_to_state(IOMMU_INITIALIZED);
@@ -2730,14 +2731,15 @@ static int __init amd_iommu_init(void)
disable_iommus();
free_iommu_resources();
} else {
-   struct amd_iommu *iommu;
-
uninit_device_table_dma();
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
}
}
 
+   for_each_iommu(iommu)
+   amd_iommu_debugfs_setup(iommu);
+
return ret;
 }
 
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 640c286a0ab9..39053f11dda3 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -33,6 +33,12 @@ extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
 
+#ifdef CONFIG_IOMMU_DEBUGFS
+void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
+#else
+static inline void amd_iommu_debugfs_setup(struct amd_iommu *iommu) {}
+#endif
+
 /* Needed for interrupt remapping */
 extern int amd_iommu_prepare(void);
 extern int amd_iommu_enable(void);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 1c9b080276c9..2ca0959ae9e6 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -593,6 +593,9 @@ struct amd_iommu {
 
u32 flags;
volatile u64 __aligned(8) cmd_sem;
+
+   /* DebugFS Info */
+   struct dentry *debugfs;
 };
 
 static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 1/2] iommu - Enable debugfs exposure of IOMMU driver internals

2018-05-14 Thread Gary R Hook
Provide base enablement for using debugfs to expose internal data of an
IOMMU driver. When called, create the /sys/kernel/debug/iommu directory.

Emit a strong warning at boot time to indicate that this feature is
enabled.

This function is called from iommu_init, and creates the initial DebugFS
directory. Drivers may then call iommu_debugfs_new_driver_dir() to
instantiate a device-specific directory to expose internal data.
It will return a pointer to the new dentry structure created in
/sys/kernel/debug/iommu, or NULL in the event of a failure.

Since the IOMMU driver can not be removed from the running system, there
is no need for an "off" function.

Signed-off-by: Gary R Hook 
---
 drivers/iommu/Kconfig |   10 ++
 drivers/iommu/Makefile|1 +
 drivers/iommu/iommu-debugfs.c |   72 +
 drivers/iommu/iommu.c |2 +
 include/linux/iommu.h |   11 ++
 5 files changed, 96 insertions(+)
 create mode 100644 drivers/iommu/iommu-debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..2eab6a849a0a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,16 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+config IOMMU_DEBUGFS
+   bool "Export IOMMU internals in DebugFS"
+   depends on DEBUG_FS
+   help
+ Allows exposure of IOMMU device internals. This option enables
+ the use of debugfs by IOMMU drivers as required. Devices can,
+ at initialization time, cause the IOMMU code to create a top-level
+ debug/iommu directory, and then populate a subdirectory with
+ entries as required.
+
 config IOMMU_IOVA
tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..74cfbc392862 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/iommu-debugfs.c b/drivers/iommu/iommu-debugfs.c
new file mode 100644
index ..bb1bf2d0ac51
--- /dev/null
+++ b/drivers/iommu/iommu-debugfs.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU debugfs core infrastructure
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ */
+
+#include 
+#include 
+#include 
+
+static struct dentry *iommu_debugfs_dir;
+
+/**
+ * iommu_debugfs_setup - create the top-level iommu directory in debugfs
+ *
+ * Provide base enablement for using debugfs to expose internal data of an
+ * IOMMU driver. When called, this function creates the
+ * /sys/kernel/debug/iommu directory.
+ *
+ * Emit a strong warning at boot time to indicate that this feature is
+ * enabled.
+ *
+ * This function is called from iommu_init; drivers may then call
+ * iommu_debugfs_new_driver_dir() to instantiate a vendor-specific
+ * directory to be used to expose internal data.
+ */
+void iommu_debugfs_setup(void)
+{
+   if (!iommu_debugfs_dir) {
+   iommu_debugfs_dir = debugfs_create_dir("iommu", NULL);
+   if (iommu_debugfs_dir) {
+   pr_warn("\n");
+   
pr_warn("*\n");
+   pr_warn("** NOTICE NOTICE NOTICE NOTICE NOTICE 
NOTICE NOTICE**\n");
+   pr_warn("** 
**\n");
+   pr_warn("**  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN 
THIS KERNEL  **\n");
+   pr_warn("** 
**\n");
+   pr_warn("** This means that this kernel is built to 
expose internal **\n");
+   pr_warn("** IOMMU data structures, which may compromise 
security on **\n");
+   pr_warn("** your system.
**\n");
+   pr_warn("** 
**\n");
+   pr_warn("** If you see this message and you are not 
debugging the   **\n");
+   pr_warn("** kernel, report this immediately to your 
vendor! **\n");
+   pr_warn("** 
**\n");
+   pr_warn("** NOTICE NOTICE NOTICE NOTICE NOTICE 
NOTICE NOTICE**\n");
+   
pr_warn("*\n");
+   }
+   }
+}
+
+/**
+ * iommu_debugfs_new_driver_dir - create a vendor directory under debugfs/iommu
+ * @vendor: name of the vendor-sp

[PATCH v7 0/2] Base enablement of IOMMU debugfs support

2018-05-14 Thread Gary R Hook
These patches create a top-level function, called at IOMMU
initialization, to create a debugfs directory for the IOMMU. Under
this directory drivers may create and populate-specific directories
for their device internals.

Patch 1: general IOMMU enablement
Patch 2: basic AMD enablement to demonstrate linkage with patch 1

Introduce a new Kconfig parameter IOMMU_DEBUGFS to globally
allow/disallow debugfs code to be built.

The Makefile structure is intended to allow the use of a single
switch for turning on DebugFS.

Changes since v6:
 - Rely on default Kconfig value for a bool
 - comment/doc fixes
 - use const where appropriate
 - fix inline declaration

Changes since v5:
 - Added parameters names in declarations/definitions
 - Reformatted an inline definition

Changes since v4:
 - Guard vendor-specific debugfs files in the Makefile
 - Call top-level routine from iommu_init()
 - Add function for instantiating a driver-specific directory
 - Change AMD driver code to use this new format

Changes since v3:
 - Remove superfluous calls to debugfs_initialized()
 - Emit a warning exactly one time
 - Change the Kconfig name to IOMMU_DEBUGFS
 - Change the way debugfs modules are made

Changes since v2:
 - Move a declaration to outside an ifdef
 - Remove a spurious blank line

Changes since v1:
 - Remove debug cruft
 - Remove cruft produced by design change
 - Change the lock to a mutex
 - Coding style fixes
 - Add a comment to document the framework

---

Gary R Hook (2):
  iommu - Enable debugfs exposure of IOMMU driver internals
  iommu/amd: Add basic debugfs infrastructure for AMD IOMMU


 drivers/iommu/Kconfig |   10 +
 drivers/iommu/Makefile|6 +++
 drivers/iommu/amd_iommu_debugfs.c |   39 
 drivers/iommu/amd_iommu_init.c|6 ++-
 drivers/iommu/amd_iommu_proto.h   |6 +++
 drivers/iommu/amd_iommu_types.h   |3 ++
 drivers/iommu/iommu-debugfs.c |   72 +
 drivers/iommu/iommu.c |2 +
 include/linux/iommu.h |   11 ++
 9 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/amd_iommu_debugfs.c
 create mode 100644 drivers/iommu/iommu-debugfs.c

--
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v1] iommu: Remove extra NULL check when call strtobool()

2018-05-14 Thread Andy Shevchenko
strtobool() does check for NULL parameter already. No need to repeat.

While here, switch to kstrtobool() and unshadow actual error code
(which is still -EINVAL).

No functional change intended.

Signed-off-by: Andy Shevchenko 
---
 drivers/iommu/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d2aa23202bb9..7f61b142263e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -116,9 +116,11 @@ static void __iommu_detach_group(struct iommu_domain 
*domain,
 static int __init iommu_set_def_domain_type(char *str)
 {
bool pt;
+   int ret;
 
-   if (!str || strtobool(str, &pt))
-   return -EINVAL;
+   ret = kstrtobool(str, &pt);
+   if (ret)
+   return ret;
 
iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA;
return 0;
-- 
2.17.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/3] arm64: Force swiotlb bounce buffering for non-coherent DMA with large CWG

2018-05-14 Thread Catalin Marinas
On Sat, May 12, 2018 at 02:38:29PM +0200, Christoph Hellwig wrote:
> On Fri, May 11, 2018 at 02:55:47PM +0100, Catalin Marinas wrote:
> > On systems with a Cache Writeback Granule (CTR_EL0.CWG) greater than
> > ARCH_DMA_MINALIGN, DMA cache maintenance on sub-CWG ranges is not safe,
> > leading to data corruption. If such configuration is detected, the
> > kernel will force swiotlb bounce buffering for all non-coherent devices.
> 
> Per the previous discussion I understand that so far this is a
> purely theoretical condition. 

That's what we think, at least for publicly available hardware.

> Given that I'd rather avoid commiting this patch and just refuse too
> boot in this case.

I'll keep it to a WARN_TAINT() for now. Given that the warn triggers
only when cache_line_size() > ARCH_DMA_MINALIGN and we keep this
constant unchanged (128), it shouldn't be much different from our
current assumptions and no-one complained of DMA corruption so far.

> In a merge window or two I plan to have a noncoherent flag in struct
> device, at which point we can handle this entirely in common code.

Sounds ok, looking forward to this.

Thanks.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/6 v3] bus: fsl-mc: supoprt dma configure for devices on fsl-mc bus

2018-05-14 Thread Laurentiu Tudor
Hi Nipun,

On 04/27/2018 01:27 PM, Nipun Gupta wrote:
> Signed-off-by: Nipun Gupta 
> ---
>   drivers/bus/fsl-mc/fsl-mc-bus.c | 16 
>   1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index 5d8266c..624828b 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct 
> kobj_uevent_env *env)
>   return 0;
>   }
>
> +static int fsl_mc_dma_configure(struct device *dev)
> +{
> + struct device *dma_dev = dev;
> +
> + while (dev_is_fsl_mc(dma_dev))
> + dma_dev = dma_dev->parent;
> +
> + return of_dma_configure(dev, dma_dev->of_node, 0);
> +}
> +
>   static ssize_t modalias_show(struct device *dev, struct device_attribute 
> *attr,
>char *buf)
>   {
> @@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = {
>   .name = "fsl-mc",
>   .match = fsl_mc_bus_match,
>   .uevent = fsl_mc_bus_uevent,
> + .dma_configure  = fsl_mc_dma_configure,
>   .dev_groups = fsl_mc_dev_groups,
>   };
>   EXPORT_SYMBOL_GPL(fsl_mc_bus_type);
> @@ -616,6 +627,7 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
>   mc_dev->icid = parent_mc_dev->icid;
>   mc_dev->dma_mask = FSL_MC_DEFAULT_DMA_MASK;
>   mc_dev->dev.dma_mask = &mc_dev->dma_mask;
> + mc_dev->dev.coherent_dma_mask = mc_dev->dma_mask;

This change seems a bit unrelated to the patch subject. I wonder if it 
makes sense to split it it in a distinct patch.

---
Best Regards, Laurentiu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] dma-debug: Check scatterlist segments

2018-05-14 Thread Robin Murphy
Drivers/subsystems creating scatterlists for DMA should be taking care
to respect the scatter-gather limitations of the appropriate device, as
described by dma_parms. A DMA API implementation cannot feasibly split
a scatterlist into *more* entries than originally passed, so it is not
well defined what they should do when given a segment larger than the
limit they are also required to respect.

Conversely, devices which are less limited than the rather conservative
defaults, or indeed have no limitations at all (e.g. GPUs with their own
internal MMU), should be encouraged to set appropriate dma_parms, as
they may get more efficient DMA mapping performance out of it.

Signed-off-by: Robin Murphy 
---

v2: Wrap it a separate config option - default y/n is open for debate

 lib/Kconfig.debug | 17 +
 lib/dma-debug.c   | 28 
 2 files changed, 45 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..5fc326d84ae7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1651,6 +1651,23 @@ config DMA_API_DEBUG
 
  If unsure, say N.
 
+config DMA_API_DEBUG_SG
+   bool "Debug DMA scatter-gather usage"
+   default y
+   depends on DMA_API_DEBUG
+   help
+ Perform extra checking that callers of dma_map_sg() have respected the
+ appropriate segment length/boundary limits for the given device when
+ preparing DMA scatterlists.
+
+ This is particularly likely to have been overlooked in cases where the
+ dma_map_sg() API is used for general bulk mapping of pages rather than
+ preparing literal scatter-gather descriptors, where there is a risk of
+ unexpected behaviour from DMA API implementations if the scatterlist 
is
+ technically out-of-spec.
+
+ If unsure, say N.
+
 menuconfig RUNTIME_TESTING_MENU
bool "Runtime Testing"
def_bool y
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 7f5cdc1e6b29..c522f3ef80a6 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -1293,6 +1293,32 @@ static void check_sync(struct device *dev,
put_hash_bucket(bucket, &flags);
 }
 
+static void check_sg_segment(struct device *dev, struct scatterlist *sg)
+{
+#ifdef CONFIG_DMA_API_DEBUG_SG
+   unsigned int max_seg = dma_get_max_seg_size(dev);
+   dma_addr_t start, end, boundary = dma_get_seg_boundary(dev);
+
+   /*
+* Either the driver forgot to set dma_parms appropriately, or
+* whoever generated the list forgot to check them.
+*/
+   if (sg->length > max_seg)
+   err_printk(dev, NULL, "DMA-API: mapping sg segment longer than 
device claims to support [len=%u] [max=%u]\n",
+  sg->length, max_seg);
+   /*
+* In some cases this could potentially be the DMA API
+* implementation's fault, but it would usually imply that
+* the scatterlist was built inappropriately to begin with.
+*/
+   start = sg_dma_address(sg);
+   end = start + sg_dma_len(sg) - 1;
+   if ((start ^ end) & ~boundary)
+   err_printk(dev, NULL, "DMA-API: mapping sg segment across 
boundary [start=0x%016llx] [end=0x%016llx] [boundary=0x%016llx]\n",
+  start, end, boundary);
+#endif
+}
+
 void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
size_t size, int direction, dma_addr_t dma_addr,
bool map_single)
@@ -1423,6 +1449,8 @@ void debug_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
}
 
+   check_sg_segment(dev, s);
+
add_dma_entry(entry);
}
 }
-- 
2.17.0.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/6] bus: fsl-mc: supoprt dma configure for devices on fsl-mc bus

2018-05-14 Thread Bjorn Helgaas
When you add the changleog, please also fix the subject typo:

- bus: fsl-mc: supoprt dma configure for devices on fsl-mc bus
   ^^^
+ bus: fsl-mc: support dma configure for devices on fsl-mc bus

On Mon, Apr 30, 2018 at 11:57:20AM +0530, Nipun Gupta wrote:
> Signed-off-by: Nipun Gupta 
> ---
>  drivers/bus/fsl-mc/fsl-mc-bus.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index 5d8266c..624828b 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct 
> kobj_uevent_env *env)
>   return 0;
>  }
>  
> +static int fsl_mc_dma_configure(struct device *dev)
> +{
> + struct device *dma_dev = dev;
> +
> + while (dev_is_fsl_mc(dma_dev))
> + dma_dev = dma_dev->parent;
> +
> + return of_dma_configure(dev, dma_dev->of_node, 0);
> +}
> +
>  static ssize_t modalias_show(struct device *dev, struct device_attribute 
> *attr,
>char *buf)
>  {
> @@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = {
>   .name = "fsl-mc",
>   .match = fsl_mc_bus_match,
>   .uevent = fsl_mc_bus_uevent,
> + .dma_configure  = fsl_mc_dma_configure,
>   .dev_groups = fsl_mc_dev_groups,
>  };
>  EXPORT_SYMBOL_GPL(fsl_mc_bus_type);
> @@ -616,6 +627,7 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
>   mc_dev->icid = parent_mc_dev->icid;
>   mc_dev->dma_mask = FSL_MC_DEFAULT_DMA_MASK;
>   mc_dev->dev.dma_mask = &mc_dev->dma_mask;
> + mc_dev->dev.coherent_dma_mask = mc_dev->dma_mask;
>   dev_set_msi_domain(&mc_dev->dev,
>  dev_get_msi_domain(&parent_mc_dev->dev));
>   }
> @@ -633,10 +645,6 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
>   goto error_cleanup_dev;
>   }
>  
> - /* Objects are coherent, unless 'no shareability' flag set. */
> - if (!(obj_desc->flags & FSL_MC_OBJ_FLAG_NO_MEM_SHAREABILITY))
> - arch_setup_dma_ops(&mc_dev->dev, 0, 0, NULL, true);
> -
>   /*
>* The device-specific probe callback will get invoked by device_add()
>*/
> -- 
> 1.9.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/6] bus: fsl-mc: supoprt dma configure for devices on fsl-mc bus

2018-05-14 Thread Greg KH
On Mon, Apr 30, 2018 at 11:57:20AM +0530, Nipun Gupta wrote:
> Signed-off-by: Nipun Gupta 
> ---

I can't take patches without any changelog text at all, sorry.  Please
fix up and resend.

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Difference between IOVA and bus address when SMMU is enabled

2018-05-14 Thread Russell King - ARM Linux
On Sat, May 12, 2018 at 06:25:13PM +0530, valmiki wrote:
> Hi All,
> 
> What is the difference between IOVA address and bus address
> when SMMU is enabled ?
> 
> Is IOVA address term used only when hypervisor is present ?

IOVA = IO virtual address.  IOVA is the term normally used to describe
the address used on the _device_ side of an IOMMU.

For any general setup:

RAM - MMU - DEVICE
  ^ ^
  physical   virtual
  addressaddress

where "device" can be an IO device or a CPU, the terms still apply.

If you have something like this:

RAM - PCI bridge - MMU - DEVICE
  ^^ ^
   physical   bus virtual
   address  address   address

You could also have (eg, in the case of a system MMU):

RAM - MMU - PCI bridge - DEVICE
  ^ ^^
   physical  virtualbus
   address   address  address
   (this can also be
considered a bus
address!)

In both of the above two cases, the PCI bridge may perform some address
translation, meaning that the bus address is different from the address
seen on the other side of the bridge.

So, the terms used depend exactly on the overall bus topology.

In the case of a system MMU, where the system MMU sits between peripheral
devices and RAM, then the bus addresses are the same as the
_IOVA of the system MMU_.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Difference between IOVA and bus address when SMMU is enabled

2018-05-14 Thread Jean-Philippe Brucker
Hi Valmiki,

On 12/05/18 13:55, valmiki wrote:
> Hi All,
> 
> What is the difference between IOVA address and bus address
> when SMMU is enabled ?

They are the same. You'll use one term or the other depending on what
system component you're talking about. "IOVA" only means something when
talking about IOMMUs, where it represents the input address. If you're
discussing bus transactions without caring whether an SMMU is enabled or
not, then "bus address" makes more sense.

We distinguish "IOVA" from "VA", which represents the input address of
the CPU's MMU (e.g. any userspace pointer). The distinction is useful
because the SMMU page tables are usually separate from the CPU page
tables. In this case if you want to share a buffer between application
and device, you'll have to allocate and map both a VA on the CPU side,
and an IOVA on the device side. When sharing MMU page tables with the
SMMU (see the SVA work), then we tend to talk about VA instead of IOVA,
because they are identical.

> Is IOVA address term used only when hypervisor is present ?

No, the term is used in bare-metal setups as well.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 17/23] iommu/vt-d: report non-recoverable faults to device

2018-05-14 Thread Lu Baolu
Hi,

On 05/12/2018 04:54 AM, Jacob Pan wrote:
> Currently, dmar fault IRQ handler does nothing more than rate
> limited printk, no critical hardware handling need to be done
> in IRQ context.

Not exactly. dmar_fault() needs to clear all the faults so that
the subsequent faults could be logged.

> For some use case such as vIOMMU, it might be useful to report
> non-recoverable faults outside host IOMMU subsystem. DMAR fault
> can come from both DMA and interrupt remapping which has to be
> set up early before threaded IRQ is available.
> This patch adds an option and a workqueue such that when faults
> are requested, DMAR fault IRQ handler can use the IOMMU fault
> reporting API to report.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/dmar.c| 159 
> ++--
>  drivers/iommu/intel-iommu.c |   6 +-
>  include/linux/dmar.h|   2 +-
>  include/linux/intel-iommu.h |   1 +
>  4 files changed, 159 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 0b5b052..ef846e3 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1110,6 +1110,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
>   return err;
>  }
>  
> +static inline void dmar_free_fault_wq(struct intel_iommu *iommu)
> +{
> + if (iommu->fault_wq)
> + destroy_workqueue(iommu->fault_wq);
> +}
> +
>  static void free_iommu(struct intel_iommu *iommu)
>  {
>   if (intel_iommu_enabled) {
> @@ -1126,6 +1132,7 @@ static void free_iommu(struct intel_iommu *iommu)
>   free_irq(iommu->irq, iommu);
>   dmar_free_hwirq(iommu->irq);
>   iommu->irq = 0;
> + dmar_free_fault_wq(iommu);
>   }
>  
>   if (iommu->qi) {
> @@ -1554,6 +1561,31 @@ static const char *irq_remap_fault_reasons[] =
>   "Blocked an interrupt request due to source-id verification failure",
>  };
>  
> +/* fault data and status */
> +enum intel_iommu_fault_reason {
> + INTEL_IOMMU_FAULT_REASON_SW,
> + INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT,
> + INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT,
> + INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID,
> + INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH,
> + INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS,
> + INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS,
> + INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID,
> + INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID,
> + INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID,
> + INTEL_IOMMU_FAULT_REASON_NONE_ZERO_RTP,
> + INTEL_IOMMU_FAULT_REASON_NONE_ZERO_CTP,
> + INTEL_IOMMU_FAULT_REASON_NONE_ZERO_PTE,
> + NR_INTEL_IOMMU_FAULT_REASON,
> +};
> +
> +/* fault reasons that are allowed to be reported outside IOMMU subsystem */
> +#define INTEL_IOMMU_FAULT_REASON_ALLOWED \
> + ((1ULL << INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH) | \
> + (1ULL << INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS) |   \
> + (1ULL << INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS))
> +
> +
>  static const char *dmar_get_fault_reason(u8 fault_reason, int *fault_type)
>  {
>   if (fault_reason >= 0x20 && (fault_reason - 0x20 <
> @@ -1634,11 +1666,91 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
>   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
>  
> +static enum iommu_fault_reason to_iommu_fault_reason(u8 reason)
> +{
> + if (reason >= NR_INTEL_IOMMU_FAULT_REASON) {
> + pr_warn("unknown DMAR fault reason %d\n", reason);
> + return IOMMU_FAULT_REASON_UNKNOWN;
> + }
> + switch (reason) {
> + case INTEL_IOMMU_FAULT_REASON_SW:
> + case INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT:
> + case INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT:
> + case INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID:
> + case INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH:
> + case INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID:
> + case INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID:
> + return IOMMU_FAULT_REASON_INTERNAL;
> + case INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID:
> + case INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS:
> + case INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS:
> + return IOMMU_FAULT_REASON_PERMISSION;
> + default:
> + return IOMMU_FAULT_REASON_UNKNOWN;
> + }
> +}
> +
> +struct dmar_fault_work {
> + struct work_struct fault_work;
> + struct intel_iommu *iommu;
> + u64 addr;
> + int type;
> + int fault_type;
> + enum intel_iommu_fault_reason reason;
> + u16 sid;
> +};
> +
> +static void report_fault_to_device(struct work_struct *work)
> +{
> + struct dmar_fault_work *dfw = container_of(work, struct dmar_fault_work,
> + fault_work);
> + struct iommu_fault_event event;
> + struct pci_dev *pdev;
> + u8 bus, devfn;
> +
> + memset(&event, 

Re: [PATCH v5 15/23] iommu: handle page response timeout

2018-05-14 Thread Lu Baolu
Hi,

On 05/12/2018 04:54 AM, Jacob Pan wrote:
> When IO page faults are reported outside IOMMU subsystem, the page
> request handler may fail for various reasons. E.g. a guest received
> page requests but did not have a chance to run for a long time. The
> irresponsive behavior could hold off limited resources on the pending
> device.
> There can be hardware or credit based software solutions as suggested
> in the PCI ATS Ch-4. To provide a basic safty net this patch
> introduces a per device deferrable timer which monitors the longest
> pending page fault that requires a response. Proper action such as
> sending failure response code could be taken when timer expires but not
> included in this patch. We need to consider the life cycle of page
> groupd ID to prevent confusion with reused group ID by a device.
> For now, a warning message provides clue of such failure.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/iommu.c | 53 
> +++
>  include/linux/iommu.h |  4 
>  2 files changed, 57 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 02fed3e..1f2f49e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -827,6 +827,37 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +static void iommu_dev_fault_timer_fn(struct timer_list *t)
> +{
> + struct iommu_fault_param *fparam = from_timer(fparam, t, timer);
> + struct iommu_fault_event *evt;
> +
> + u64 now;
> +
> + now = get_jiffies_64();
> +
> + /* The goal is to ensure driver or guest page fault handler(via vfio)
> +  * send page response on time. Otherwise, limited queue resources
> +  * may be occupied by some irresponsive guests or drivers.
> +  * When per device pending fault list is not empty, we periodically 
> checks
> +  * if any anticipated page response time has expired.
> +  *
> +  * TODO:
> +  * We could do the following if response time expires:
> +  * 1. send page response code FAILURE to all pending PRQ
> +  * 2. inform device driver or vfio
> +  * 3. drain in-flight page requests and responses for this device
> +  * 4. clear pending fault list such that driver can unregister fault
> +  *handler(otherwise blocked when pending faults are present).
> +  */
> + list_for_each_entry(evt, &fparam->faults, list) {
> + if (time_after64(now, evt->expire))
> + pr_err("Page response time expired!, pasid %d gid %d 
> exp %llu now %llu\n",
> + evt->pasid, evt->page_req_group_id, 
> evt->expire, now);
> + }
> + mod_timer(t, now + prq_timeout);
> +}
> +

This timer scheme is very rough.

The timer expires every 10 seconds (by default).

0   10 2030 40  
  
+---+---+---+---+
^  ^   ^  ^^
 |   | || |
F0 F1  F2 F3   (F1,F2,F3 will not be handled until here!)

F0, F1, F2, F3 are four page faults happens during [0, 10s) time
window. F1, F2, F3 timeout won't be handled until the timer expires
again at 20s. That means a fault might be pending there until about
(2 * prq_timeout) seconds later.

Out of curiosity, Why not adding a timer in iommu_fault_event, starting it in
iommu_report_device_fault() and removing it in iommu_page_response()?

Best regards,
Lu Baolu


>  /**
>   * iommu_register_device_fault_handler() - Register a device fault handler
>   * @dev: the device
> @@ -879,6 +910,9 @@ int iommu_register_device_fault_handler(struct device 
> *dev,
>   param->fault_param->data = data;
>   INIT_LIST_HEAD(¶m->fault_param->faults);
>  
> + if (prq_timeout)
> + timer_setup(¶m->fault_param->timer, 
> iommu_dev_fault_timer_fn,
> + TIMER_DEFERRABLE);
>  done_unlock:
>   mutex_unlock(¶m->lock);
>  
> @@ -935,6 +969,8 @@ int iommu_report_device_fault(struct device *dev, struct 
> iommu_fault_event *evt)
>  {
>   int ret = 0;
>   struct iommu_fault_event *evt_pending;
> + struct timer_list *tmr;
> + u64 exp;
>   struct iommu_fault_param *fparam;
>  
>   /* iommu_param is allocated when device is added to group */
> @@ -955,7 +991,17 @@ int iommu_report_device_fault(struct device *dev, struct 
> iommu_fault_event *evt)
>   ret = -ENOMEM;
>   goto done_unlock;
>   }
> + /* Keep track of response expiration time */
> + exp = get_jiffies_64() + prq_timeout;
> + evt_pending->expire = exp;
>   mutex_lock(&fparam->lock);
> + if (list_empty(&fparam->faults)) {
> + /* First pending event, start timer */
> +