Re: [PATCH 0/5] iommu/vt-d: Consolidate various cache flush ops

2019-12-02 Thread Lu Baolu

Hi Jacob,

On 12/3/19 4:02 AM, Jacob Pan wrote:

On Fri, 22 Nov 2019 11:04:44 +0800
Lu Baolu  wrote:


Intel VT-d 3.0 introduces more caches and interfaces for software to
flush when it runs in the scalable mode. Currently various cache flush
helpers are scattered around. This consolidates them by putting them
in the existing iommu_flush structure.

/* struct iommu_flush - Intel IOMMU cache invalidation ops
 *
 * @cc_inv: invalidate context cache
 * @iotlb_inv: invalidate IOTLB and paging structure caches when software
 *             has changed second-level tables
 * @p_iotlb_inv: invalidate IOTLB and paging structure caches when software
 *               has changed first-level tables
 * @pc_inv: invalidate pasid cache
 * @dev_tlb_inv: invalidate cached mappings used by requests-without-PASID
 *               from the Device-TLB on an endpoint device
 * @p_dev_tlb_inv: invalidate cached mappings used by requests-with-PASID
 *                 from the Device-TLB on an endpoint device
 */
struct iommu_flush {
	void (*cc_inv)(struct intel_iommu *iommu, u16 did,
		       u16 sid, u8 fm, u64 type);
	void (*iotlb_inv)(struct intel_iommu *iommu, u16 did, u64 addr,
			  unsigned int size_order, u64 type);
	void (*p_iotlb_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
			    u64 addr, unsigned long npages, bool ih);
	void (*pc_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
		       u64 granu);
	void (*dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
			    u16 qdep, u64 addr, unsigned int mask);
	void (*p_dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
			      u32 pasid, u16 qdep, u64 addr,
			      unsigned long npages);
};

The name of each cache flush op is taken from Section 6.5 of the spec so
that it is easy to look up there.
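As a rough illustration (not from the patches themselves), one of these
consolidated ops might be invoked through the existing iommu->flush member,
assuming the series keeps struct iommu_flush embedded in struct intel_iommu
as it is today:

/* Hedged sketch: invalidate the PASID cache for one PASID in a domain. */
static void example_flush_pasid_cache(struct intel_iommu *iommu,
				      u16 did, u32 pasid, u64 granu)
{
	if (iommu->flush.pc_inv)
		iommu->flush.pc_inv(iommu, did, pasid, granu);
}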


Nice consolidation. For nested SVM, I also introduced cache flush
helpers as needed.
https://lkml.org/lkml/2019/10/24/857

Should I wait for yours to be merged, or do you want to extend this
consolidation after the SVA/SVM cache flush work? I expect to send my v8 shortly.



Please base your v8 patches on this series, so that it gets more chances
for testing.

I will queue this patch series for internal testing after 5.5-rc1 and, if
everything goes well, forward it to Joerg around rc4 for linux-
next.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/8] iommu/vt-d: Add first level page table interfaces

2019-12-02 Thread Lu Baolu

Hi,

On 12/3/19 7:27 AM, Jacob Pan wrote:

On Thu, 28 Nov 2019 10:25:47 +0800
Lu Baolu  wrote:


This adds functions to manipulate first level page tables
which could be used by a scalable mode capable IOMMU unit.


FL and SL page tables are very similar, and I presume we are not using
all the flag bits in FL paging structures for DMA mapping. Are there
enough relevant differences to warrant a new set of helper functions
for FL, or can we merge them into one?



I have thought about this before and I am still open to this suggestion.

We did a quick comparison of the two page tables. The only concern is the
read/write/present encoding: the present bit in the first level implies read
permission, while the second level page table has an explicit READ bit.
(Recalled from memory, correct me if I'm wrong. :-))
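For reference, the difference being recalled here can be sketched as below
(bit positions per the x86 paging format and the VT-d second-level PTE
format; the macro and helper names are made up for illustration only):

/* First-level PTEs use the CPU (x86) page table format: bit 0 is Present
 * and read permission is implied by presence; bit 1 is Read/Write.
 * Second-level PTEs carry explicit Read (bit 0) and Write (bit 1) bits.
 */
#define FL_PTE_PRESENT	BIT_ULL(0)	/* present implies readable */
#define FL_PTE_WRITE	BIT_ULL(1)
#define SL_PTE_READ	BIT_ULL(0)
#define SL_PTE_WRITE	BIT_ULL(1)

static inline bool fl_pte_readable(u64 pte) { return pte & FL_PTE_PRESENT; }
static inline bool sl_pte_readable(u64 pte) { return pte & SL_PTE_READ; }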

Anyway, let's listen to more opinions.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/8] Use 1st-level for DMA remapping

2019-12-02 Thread Lu Baolu

Hi Jacob,

Thanks for reviewing it.

On 12/3/19 4:19 AM, Jacob Pan wrote:

On Thu, 28 Nov 2019 10:25:42 +0800
Lu Baolu  wrote:


Intel VT-d in scalable mode supports two types of page talbes

tables


Got it, thanks!


for DMA translation: the first level page table and the second
level page table. The first level page table uses the same
format as the CPU page table, while the second level page table
remains compatible with previous formats. The software is able
to choose any one of them for DMA remapping according to the use
case.

This patchset aims to move IOVA (I/O Virtual Address) translation

move guest IOVA only, right?


No. In v1 it was only for guest IOVA. This has changed in v2 according
to comments received during the v1 review period: v2 uses the first level for
both host and guest unless the domain is in nested mode.


to 1st-level page table in scalable mode. This will simplify vIOMMU
(IOMMU simulated by VM hypervisor) design by using the two-stage
translation, a.k.a. nested mode translation.

As Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is now implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out GIOVA->GPA mappings
and write them to a shadowed page table, which will be used by the
physical IOMMU. Each time mappings are created or destroyed in the
vIOMMU, the simulation software has to intervene. In this way, changes
to the GIOVA->GPA mappings can be shadowed to the host.


     .-----------.
     |  vIOMMU   |
     |-----------|                   .------------------.
     |           | IOTLB flush trap  |       QEMU       |
     .-----------.   (map/unmap)     |  .------------.  |
     |GIOVA->GPA |------------------>|  | GIOVA->HPA |  |
     '-----------'                   |  '------------'  |
     |           |                   '------------------'
     '-----------'                             |
                                               |
             <---------------------------------'
             |
             v  VFIO/IOMMU API
     .-----------.
     |  pIOMMU   |
     |-----------|
     |           |
     .-----------.
     |GIOVA->HPA |
     '-----------'
     |           |
     '-----------'

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over the 1st-level page table to store GIOVA->GPA mappings in the vIOMMU,
2) binding the vIOMMU 1st-level page table to the pIOMMU, 3) using the pIOMMU
second level for GPA->HPA translation, and 4) enabling nested (a.k.a.
dual-stage) translation in the host. Compared with the current shadow GIOVA
support, the new approach makes the vIOMMU design simpler and more
efficient, as we only need to flush the pIOMMU IOTLB and possibly the
device-IOTLB when an IOVA mapping in the vIOMMU is torn down.

     .-----------.
     |  vIOMMU   |
     |-----------|                   .-----------.
     |           | IOTLB flush trap  |   QEMU    |
     .-----------.     (unmap)       |-----------|
     |GIOVA->GPA |------------------>|           |
     '-----------'                   '-----------'
     |           |                         |
     '-----------'                         |
             <-----------------------------'
             |  VFIO/IOMMU
             |  cache invalidation and
             |  guest gpd bind interfaces
             v
     .-----------.
     |  pIOMMU   |
     |-----------|
     .-----------.
     |GIOVA->GPA |<---First level
     '-----------'
     | GPA->HPA  |<---Second level
     '-----------'
     '-----------'

This patch set includes two parts. The former part implements the
per-domain page table abstraction, which makes the page table
difference transparent to various map/unmap APIs. The later part

s/later/latter/

applies the first level page table for IOVA translation unless the
DOMAIN_ATTR_NESTING domain attribution has been set, which indicates
nested mode in use.


Maybe I am reading this wrong, but shouldn't it be the opposite?
i.e. Use FL page table for IOVA if it is a nesting domain?


My description seems to be a bit confusing. If DOMAIN_ATTR_NESTING is set
for a domain, the second level will be used to map gPA (guest physical
address) to hPA (host physical address), and the mappings between gVA (guest
virtual address) and gPA will be maintained by the guest, with the guest
page table address bound to the host's first level. Otherwise, the first level
will be used for the mappings between gPA and hPA, or between IOVA and physical address.
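Put as code, the selection Baolu describes would look roughly like the
sketch below; the flag name is invented here only to express the idea and is
not taken from the series:

/* Nested domains keep the second level for gPA->hPA (the guest-owned
 * first level is bound separately); all other domains map through the
 * first level.
 */
static bool domain_use_first_level(struct dmar_domain *domain)
{
	return !(domain->flags & DOMAIN_FLAG_NESTING_MODE);	/* hypothetical flag */
}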

Best regards,
baolu




Based-on-idea-by: Ashok Raj 
Based-on-idea-by: Kevin Tian 
Based-on-idea-by: Liu Yi L 
Based-on-idea-by: Jacob Pan 
Based-on-idea-by: Sanjay Kumar 
Based-on-idea-by: Lu Baolu 

Change log:

  v1->v2
  - The first series was posted here
https://lkml.org/lkml/2019/9/23/297
  - Use 

Re: dmar pte read access not set error messages on hp dl388 gen8 systems

2019-12-02 Thread Lu Baolu

Hi,

On 12/3/19 12:13 AM, Jerry Snitselaar wrote:

On Mon Dec 02 19, Jerry Snitselaar wrote:

On Mon Dec 02 19, Lu Baolu wrote:

Hi,

On 12/2/19 2:34 PM, Jerry Snitselaar wrote:

We are seeing DMAR PTE read access not set errors when booting a
kernel with default passthrough, both with a test kernel and with
a 5.4.0 kernel. Previously we would see a number of identity mappings
being set up for the RMRRs; now they aren't, and we get the DMAR PTE
errors as devices touch those regions. From what I can tell,
commit df4f3c603aeb ("iommu/vt-d: Remove static identity map code")
removed the bit of code in init_dmars() that used to set up those
mappings:

-   /*
-    * For each rmrr
-    *   for each dev attached to rmrr
-    *   do
-    * locate drhd for dev, alloc domain for dev
-    * allocate free domain
-    * allocate page table entries for rmrr
-    * if context not allocated for bus
-    *   allocate and init context
-    *   set present in root table for this bus
-    * init context with domain, translation etc
-    *    endfor
-    * endfor
-    */
-   pr_info("Setting RMRR:\n");
-   for_each_rmrr_units(rmrr) {
-   /* some BIOS lists non-exist devices in DMAR table. */
-   for_each_active_dev_scope(rmrr->devices, 
rmrr->devices_cnt,

- i, dev) {
-   ret = iommu_prepare_rmrr_dev(rmrr, dev);
-   if (ret)
-   pr_err("Mapping reserved region 
failed\n");

-   }
-   }

si_domain_init now has code that sets identity maps for devices in rmrrs,
but only for certain devices.


On which device, are you seeing this error? Is it a rmrr locked device?

Best regards,
baolu



Almost all of the messages are for the iLO, but there is also a message for
the Smart Array RAID bus controller.



Also seeing it with a dl380 gen9 system, where the raid bus controller
is getting the error.


Does it help if you remove

if (device_is_rmrr_locked(dev))
continue;

in si_domain_init()?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host

2019-12-02 Thread Alex Williamson
On Mon, 25 Nov 2019 07:45:18 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> Thanks for the review. Here I'd like to summarize the major open issues in
> this thread and see if we can reach some agreement before preparing a new version.
> 
> a) IOCTLs for BIND_GPASID and BIND_PROCESS, share a single IOCTL or two
>separate IOCTLs?
>Yi: It may be helpful to have separate IOCTLs. The bind data conveyed
>for BIND_GPASID and BIND_PROCESS are totally different, and the struct
>iommu_gpasid_bind_data has vendor specific data and may even have more
>versions in future. To better maintain it, I guess separate IOCTLs for
>the two bind types would be better. The structure for BIND_GPASID is
>as below:
> 
> struct vfio_iommu_type1_bind {
> __u32   argsz;
> struct iommu_gpasid_bind_data   bind_data;
> };


We've been rather successful at extending ioctls in vfio and I'm
generally opposed to rampant ioctl proliferation.  If we added @flags
to the above struct (as pretty much the standard for vfio ioctls), then
we could use it to describe the type of binding to perform and
therefore the type of data provided.  I think my major complaint here
was that we were defining PROCESS but not implementing it.  We can
design the ioctl to enable it, but not define it until it's implemented.
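For illustration only, a @flags-based layout might look like the sketch
below; the flag names are invented here to show how the payload type could
be selected and are not part of the proposal:

struct vfio_iommu_type1_bind {
	__u32	argsz;
	__u32	flags;
#define VFIO_IOMMU_BIND_GUEST_PGTBL	(1 << 0)	/* payload: struct iommu_gpasid_bind_data */
#define VFIO_IOMMU_BIND_PROCESS		(1 << 1)	/* reserved until implemented */
	__u8	data[];		/* payload, interpreted according to flags */
};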
 
> b) how kernel-space learns the number of bytes to be copied (a.k.a. the
>usage of @version field and @format field of struct
>iommu_gpasid_bind_data)
>Yi: Jean has an excellent recap in a prior reply on the plan for future
>extensions with regard to the @version and @format fields. Based on that
>plan, kernel space needs to parse the @version and @format fields
>to get the length of the current BIND_GPASID request. The kernel also needs
>to maintain both new and old structure versions, and follow a specific
>deprecation policy in the future.

Yes, it seems reasonable, so from the struct above (plus @flags) we
could determine we have struct iommu_gpasid_bind_data as the payload
and read that using @version and @format as outlined.
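A rough sketch of that kernel-side copy, assuming @version and @format stay
at the start of the structure so the copy size can be derived before the
rest is parsed (the helper is illustrative, not actual code from the series):

static int example_copy_bind_data(void __user *uptr,
				  struct iommu_gpasid_bind_data *data)
{
	u32 minsz = offsetofend(struct iommu_gpasid_bind_data, format);

	/* Copy the fixed header first and validate the version. */
	if (copy_from_user(data, uptr, minsz))
		return -EFAULT;
	if (data->version != 1)
		return -EINVAL;
	/* Size is known from (version, format); for v1, the whole struct. */
	if (copy_from_user(data, uptr, sizeof(*data)))
		return -EFAULT;
	return 0;
}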

> c) how can vIOMMU emulator know that the vfio interface supports to config
>dual stage translation for vIOMMU?
>Yi: may do it via VFIO_IOMMU_GET_INFO.

Yes please.
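One way to surface this through VFIO_IOMMU_GET_INFO would be a capability
chain entry; the sketch below is hypothetical (only struct
vfio_info_cap_header is existing UAPI, the ID and payload are invented for
illustration):

#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING	4	/* hypothetical capability ID */

struct vfio_iommu_type1_info_cap_nesting {
	struct vfio_info_cap_header	header;
	__u32				formats;	/* supported iommu_gpasid_bind_data formats */
	__u32				version;	/* supported bind data version */
};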

> d) how can vIOMMU emulator know what @version and @format should be set
>in struct iommu_gpasid_bind_data?
>Yi: currently, we have two options. First, we may do it via
>VFIO_IOMMU_GET_INFO. This is a natural idea since @version and @format
>are used in the vfio APIs; it makes sense to let vfio provide the related
>info to the vIOMMU emulator after checking with the vendor-specific iommu
>driver. There is also an idea to do it via sysfs (/sys/class/iommu/dmar#),
>as we plan to do IOMMU capability sync between vIOMMU and pIOMMU via sysfs.
>I have a concern with this option: the current iommu sysfs only exposes
>vendor-specific hardware info, and I'm not sure it is good to expose info
>defined in the generic IOMMU layer via iommu sysfs. If this concern is not
>a big deal, I'm fine with both options.

This seems like the same issue we had with IOMMU reserved regions, I'd
prefer that a user can figure out how to interact with the vfio
interface through the vfio interface.  Forcing the user to poke around
in sysfs requires the user to have read permissions to sysfs in places
they otherwise wouldn't need.  Thanks,

Alex

> > From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> > Sent: Wednesday, November 13, 2019 6:29 PM
> > To: Liu, Yi L 
> > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) 
> > to host
> > 
> > On Wed, Nov 13, 2019 at 07:43:43AM +, Liu, Yi L wrote:  
> > > > From: Alex Williamson 
> > > > Sent: Wednesday, November 13, 2019 1:26 AM
> > > > To: Liu, Yi L 
> > > > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page 
> > > > tables) to host
> > > >
> > > > On Tue, 12 Nov 2019 11:21:40 +
> > > > "Liu, Yi L"  wrote:
> > > >  
> > > > > > From: Alex Williamson < alex.william...@redhat.com >
> > > > > > Sent: Friday, November 8, 2019 7:21 AM
> > > > > > To: Liu, Yi L 
> > > > > > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page 
> > > > > > tables) to  
> > host  
> > > > > >
> > > > > > On Thu, 24 Oct 2019 08:26:23 -0400
> > > > > > Liu Yi L  wrote:
> > > > > >  
> > > > > > > This patch adds vfio support to bind guest translation structure
> > > > > > > to host iommu. VFIO exposes iommu programming capability to user-
> > > > > > > space. Guest is a user-space application in host under KVM 
> > > > > > > solution.
> > > > > > > For SVA usage in Virtual Machine, guest owns GVA->GPA translation
> > > > > > > structure. And this part should be passdown to host to enable 
> > > > > > > nested
> > > > > > > translation (or say two stage translation). This patch reuses the
> > > > > > > VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker, and 

Re: [PATCH v2 5/8] iommu/vt-d: Add first level page table interfaces

2019-12-02 Thread Jacob Pan
On Thu, 28 Nov 2019 10:25:47 +0800
Lu Baolu  wrote:

> This adds functions to manipulate first level page tables
> which could be used by a scalable mode capable IOMMU unit.
> 
FL and SL page tables are very similar, and I presume we are not using
all the flag bits in FL paging structures for DMA mapping. Are there
enough relevant differences to warrant a new set of helper functions
for FL, or can we merge them into one?

> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Cc: Yi Sun 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/Makefile |   2 +-
>  drivers/iommu/intel-iommu.c        |  33 +++
>  drivers/iommu/intel-pgtable.c      | 376 +
>  include/linux/intel-iommu.h        |  33 ++-
>  include/trace/events/intel_iommu.h |  60 +
>  5 files changed, 502 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/iommu/intel-pgtable.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 35d17094fe3b..aa04f4c3ae26 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -18,7 +18,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
> -obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
> +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o
>  obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
>  obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 66f76f6df2c2..a314892ee72b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1670,6 +1670,37 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>  #endif
>  }
>  
> +/* First level 5-level paging support */
> +static bool first_lvl_5lp_support(void)
> +{
> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> + static int first_level_5lp_supported = -1;
> +
> + if (likely(first_level_5lp_supported != -1))
> + return first_level_5lp_supported;
> +
> + first_level_5lp_supported = 1;
> +#ifdef CONFIG_X86
> + /* Match IOMMU first level and CPU paging mode */
> + if (!cpu_feature_enabled(X86_FEATURE_LA57)) {
> + first_level_5lp_supported = 0;
> + return first_level_5lp_supported;
> + }
> +#endif /* #ifdef CONFIG_X86 */
> +
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd) {
> + if (!cap_5lp_support(iommu->cap)) {
> + first_level_5lp_supported = 0;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> + return first_level_5lp_supported;
> +}
> +
>  static struct dmar_domain *alloc_domain(int flags)
>  {
>   struct dmar_domain *domain;
> @@ -1683,6 +1714,8 @@ static struct dmar_domain *alloc_domain(int flags)
>  	domain->flags = flags;
>  	domain->has_iotlb_device = false;
>  	domain->ops = _lvl_pgtable_ops;
> +	domain->first_lvl_5lp = first_lvl_5lp_support();
> +	spin_lock_init(&domain->page_table_lock);
>  	INIT_LIST_HEAD(&domain->devices);
>  
>   return domain;
> diff --git a/drivers/iommu/intel-pgtable.c
> b/drivers/iommu/intel-pgtable.c new file mode 100644
> index ..4a26d08a7570
> --- /dev/null
> +++ b/drivers/iommu/intel-pgtable.c
> @@ -0,0 +1,376 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * intel-pgtable.c - Intel IOMMU page table manipulation library
> + *
> + * Copyright (C) 2019 Intel Corporation
> + *
> + * Author: Lu Baolu 
> + */
> +
> +#define pr_fmt(fmt) "DMAR: " fmt
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * first_lvl_map: Map a range of IO virtual address to physical addresses.
> + */
> +#ifdef CONFIG_X86
> +#define pgtable_populate(domain, nm)					\
> +do {									\
> +	void *__new = alloc_pgtable_page(domain->nid);			\
> +	if (!__new)							\
> +		return -ENOMEM;						\
> +	smp_wmb();							\
> +	spin_lock(&(domain)->page_table_lock);				\
> +	if (nm ## _present(*nm)) {					\
> +		free_pgtable_page(__new);				\
> +	} else {							\
> +		set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));	\
> +		domain_flush_cache(domain, nm, sizeof(nm##_t));		\
> +	}								\
> +	spin_unlock(&(domain)->page_table_lock);			\
> +} while (0)
> +
> +static int
> +first_lvl_map_pte_range(struct dmar_domain *domain, pmd_t *pmd,
> + unsigned long addr, unsigned long end,
> 

Re: [git pull] IOMMU Updates for Linux v5.5

2019-12-02 Thread pr-tracker-bot
The pull request you sent on Fri, 29 Nov 2019 13:02:51 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-updates-v5.5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1daa56bcfd8b329447e0c1b1e91c3925d08489b7

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 8/8] iommu/vt-d: Misc macro clean up for SVM

2019-12-02 Thread Joe Perches
On Mon, 2019-12-02 at 11:58 -0800, Jacob Pan wrote:
> Use combined macros for_each_svm_dev() to simplify SVM device iteration
> and error checking.
[]
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
[]
> @@ -427,40 +430,36 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
[]
> + for_each_svm_dev(sdev, svm, dev) {
> + ret = 0;
> + sdev->users--;
> + if (!sdev->users) {

This might be better by reducing indentation here too

if (sdev->users)
break;

> +			list_del_rcu(&sdev->list);

to reduce indentation 1 level below this

> + /* Flush the PASID cache and IOTLB for this device.
> +  * Note that we do depend on the hardware *not* using
> +  * the PASID any more. Just as we depend on other
> +  * devices never using PASIDs that they have no right
> +  * to use. We have a *shared* PASID table, because it's
> +  * large and has to be physically contiguous. So it's
> +  * hard to be as defensive as we might like. */
> + intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> + intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> + kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				ioasid_free(svm->pasid);
> +				if (svm->mm)
> +					mmu_notifier_unregister(&svm->notifier, svm->mm);
> +				list_del(&svm->list);
> +				/* We mandate that no page faults may be outstanding
> +				 * for the PASID when intel_svm_unbind_mm() is called.
> +				 * If that is not obeyed, subtle errors will happen.
> +				 * Let's make them less subtle... */
> + memset(svm, 0x6b, sizeof(*svm));
> + kfree(svm);
>   }
> - break;
>   }
> + break;
>   }
>   out:
>  	mutex_unlock(&pasid_mutex);
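
A minimal sketch of the early-break shape Joe is suggesting, based only on
the code quoted above (not the form that was eventually committed):

	for_each_svm_dev(sdev, svm, dev) {
		ret = 0;
		sdev->users--;
		if (sdev->users)
			break;		/* still in use, nothing to tear down */

		list_del_rcu(&sdev->list);
		intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
		kfree_rcu(sdev, rcu);

		if (list_empty(&svm->devs)) {
			ioasid_free(svm->pasid);
			if (svm->mm)
				mmu_notifier_unregister(&svm->notifier, svm->mm);
			list_del(&svm->list);
			memset(svm, 0x6b, sizeof(*svm));
			kfree(svm);
		}
		break;
	}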

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/8] Use 1st-level for DMA remapping

2019-12-02 Thread Jacob Pan
On Thu, 28 Nov 2019 10:25:42 +0800
Lu Baolu  wrote:

> Intel VT-d in scalable mode supports two types of page talbes
tables
> for DMA translation: the first level page table and the second
> level page table. The first level page table uses the same
> format as the CPU page table, while the second level page table
> remains compatible with previous formats. The software is able
> to choose any one of them for DMA remapping according to the use
> case.
> 
> This patchset aims to move IOVA (I/O Virtual Address) translation
move guest IOVA only, right?
> to 1st-level page table in scalable mode. This will simplify vIOMMU
> (IOMMU simulated by VM hypervisor) design by using the two-stage
> translation, a.k.a. nested mode translation.
> 
> As Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
> support is now implemented in a shadow page manner. The device
> simulation software, like QEMU, has to figure out GIOVA->GPA mappings
> and write them to a shadowed page table, which will be used by the
> physical IOMMU. Each time mappings are created or destroyed in the
> vIOMMU, the simulation software has to intervene. In this way, changes
> to the GIOVA->GPA mappings can be shadowed to the host.
> 
> 
>      .-----------.
>      |  vIOMMU   |
>      |-----------|                   .------------------.
>      |           | IOTLB flush trap  |       QEMU       |
>      .-----------.   (map/unmap)     |  .------------.  |
>      |GIOVA->GPA |------------------>|  | GIOVA->HPA |  |
>      '-----------'                   |  '------------'  |
>      |           |                   '------------------'
>      '-----------'                             |
>                                                |
>              <---------------------------------'
>              |
>              v  VFIO/IOMMU API
>      .-----------.
>      |  pIOMMU   |
>      |-----------|
>      |           |
>      .-----------.
>      |GIOVA->HPA |
>      '-----------'
>      |           |
>      '-----------'
> 
> In VT-d 3.0, scalable mode is introduced, which offers two-level
> translation page tables and nested translation mode. With regard to
> GIOVA support, it can be simplified by 1) moving the GIOVA support
> over the 1st-level page table to store GIOVA->GPA mappings in the vIOMMU,
> 2) binding the vIOMMU 1st-level page table to the pIOMMU, 3) using the pIOMMU
> second level for GPA->HPA translation, and 4) enabling nested (a.k.a.
> dual-stage) translation in the host. Compared with the current shadow GIOVA
> support, the new approach makes the vIOMMU design simpler and more
> efficient, as we only need to flush the pIOMMU IOTLB and possibly the
> device-IOTLB when an IOVA mapping in the vIOMMU is torn down.
> 
>      .-----------.
>      |  vIOMMU   |
>      |-----------|                   .-----------.
>      |           | IOTLB flush trap  |   QEMU    |
>      .-----------.     (unmap)       |-----------|
>      |GIOVA->GPA |------------------>|           |
>      '-----------'                   '-----------'
>      |           |                         |
>      '-----------'                         |
>              <-----------------------------'
>              |  VFIO/IOMMU
>              |  cache invalidation and
>              |  guest gpd bind interfaces
>              v
>      .-----------.
>      |  pIOMMU   |
>      |-----------|
>      .-----------.
>      |GIOVA->GPA |<---First level
>      '-----------'
>      | GPA->HPA  |<---Second level
>      '-----------'
>      '-----------'
> 
> This patch set includes two parts. The former part implements the
> per-domain page table abstraction, which makes the page table
> difference transparent to various map/unmap APIs. The later part
s/later/latter/
> applies the first level page table for IOVA translation unless the
> DOMAIN_ATTR_NESTING domain attribution has been set, which indicates
> nested mode in use.
> 
Maybe I am reading this wrong, but shouldn't it be the opposite?
i.e. Use FL page table for IOVA if it is a nesting domain?

> Based-on-idea-by: Ashok Raj 
> Based-on-idea-by: Kevin Tian 
> Based-on-idea-by: Liu Yi L 
> Based-on-idea-by: Jacob Pan 
> Based-on-idea-by: Sanjay Kumar 
> Based-on-idea-by: Lu Baolu 
> 
> Change log:
> 
>  v1->v2
>  - The first series was posted here
>https://lkml.org/lkml/2019/9/23/297
>  - Use per domain page table ops to handle different page tables.
>  - Use first level for DMA remapping by default on both bare metal
>and vm guest.
>  - Code refine according to code review comments for v1.
> 
> Lu Baolu (8):
>   iommu/vt-d: Add per domain page table ops
>   iommu/vt-d: Move domain_flush_cache helper into header
>   iommu/vt-d: Implement second level page table ops
>   iommu/vt-d: Apply per domain second level page table ops
>   iommu/vt-d: Add first level page table interfaces
>   iommu/vt-d: Implement first level page table ops
>   iommu/vt-d: 

Re: [PATCH 0/5] iommu/vt-d: Consolidate various cache flush ops

2019-12-02 Thread Jacob Pan
On Fri, 22 Nov 2019 11:04:44 +0800
Lu Baolu  wrote:

> Intel VT-d 3.0 introduces more caches and interfaces for software to
> flush when it runs in the scalable mode. Currently various cache flush
> helpers are scattered around. This consolidates them by putting them
> in the existing iommu_flush structure.
> 
> /* struct iommu_flush - Intel IOMMU cache invalidation ops
>  *
>  * @cc_inv: invalidate context cache
>  * @iotlb_inv: invalidate IOTLB and paging structure caches when software
>  *             has changed second-level tables
>  * @p_iotlb_inv: invalidate IOTLB and paging structure caches when software
>  *               has changed first-level tables
>  * @pc_inv: invalidate pasid cache
>  * @dev_tlb_inv: invalidate cached mappings used by requests-without-PASID
>  *               from the Device-TLB on an endpoint device
>  * @p_dev_tlb_inv: invalidate cached mappings used by requests-with-PASID
>  *                 from the Device-TLB on an endpoint device
>  */
> struct iommu_flush {
> 	void (*cc_inv)(struct intel_iommu *iommu, u16 did,
> 		       u16 sid, u8 fm, u64 type);
> 	void (*iotlb_inv)(struct intel_iommu *iommu, u16 did, u64 addr,
> 			  unsigned int size_order, u64 type);
> 	void (*p_iotlb_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
> 			    u64 addr, unsigned long npages, bool ih);
> 	void (*pc_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
> 		       u64 granu);
> 	void (*dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> 			    u16 qdep, u64 addr, unsigned int mask);
> 	void (*p_dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> 			      u32 pasid, u16 qdep, u64 addr,
> 			      unsigned long npages);
> };
> 
> The name of each cache flush op is taken from Section 6.5 of the spec so
> that it is easy to look up there.
> 
Nice consolidation. For nested SVM, I also introduced cache flush
helpers as needed.
https://lkml.org/lkml/2019/10/24/857

Should I wait for yours to be merged, or do you want to extend this
consolidation after the SVA/SVM cache flush work? I expect to send my v8 shortly.

> Best regards,
> Lu Baolu
> 
> Lu Baolu (5):
>   iommu/vt-d: Extend iommu_flush for scalable mode
>   iommu/vt-d: Consolidate pasid cache invalidation
>   iommu/vt-d: Consolidate device tlb invalidation
>   iommu/vt-d: Consolidate pasid-based tlb invalidation
>   iommu/vt-d: Consolidate pasid-based device tlb invalidation
> 
>  drivers/iommu/dmar.c        |  61 -
>  drivers/iommu/intel-iommu.c | 246 +---
>  drivers/iommu/intel-pasid.c |  39 +-
>  drivers/iommu/intel-svm.c   |  60 ++---
>  include/linux/intel-iommu.h |  39 --
>  5 files changed, 244 insertions(+), 201 deletions(-)
> 

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/8] VT-d Native Shared virtual memory cleanup and fixes

2019-12-02 Thread Jacob Pan
Mostly extracted from nested SVA/SVM series based on review comments of v7.
https://lkml.org/lkml/2019/10/24/852

This series also adds a few important fixes for native use of SVA. New
nested SVA code will be submitted separately as a smaller set. Based on the
core branch of IOMMU tree staged for v5.5, where common APIs for vSVA were
applied.
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core

Changelog:
v5  - Regrouped patch 6 and 8, added comments suggested by Joe Perches
v4  - Commit message fix

V3
- Squashed 1/10 & 2/10
- Deleted "8/10 Fix PASID cache flush" from this series
- Addressed reviews from Eric Auger and Baolu
V2
- Coding style fixes based on Baolu's input, no functional change
- Added Acked-by tags.

Thanks,

Jacob


*** BLURB HERE ***

Jacob Pan (8):
  iommu/vt-d: Fix CPU and IOMMU SVM feature matching checks
  iommu/vt-d: Match CPU and IOMMU paging mode
  iommu/vt-d: Reject SVM bind for failed capability check
  iommu/vt-d: Avoid duplicated code for PASID setup
  iommu/vt-d: Fix off-by-one in PASID allocation
  iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  iommu/vt-d: Avoid sending invalid page response
  iommu/vt-d: Misc macro clean up for SVM

 drivers/iommu/Kconfig   |   1 +
 drivers/iommu/intel-iommu.c |  23 +++
 drivers/iommu/intel-pasid.c |  96 --
 drivers/iommu/intel-svm.c   | 163 +---
 include/linux/intel-iommu.h |   5 +-
 5 files changed, 135 insertions(+), 153 deletions(-)

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/8] iommu/vt-d: Match CPU and IOMMU paging mode

2019-12-02 Thread Jacob Pan
When setting up first level page tables for sharing with the CPU, we need
to ensure the IOMMU can support no fewer paging levels than the CPU does.

It is not adequate, as in the current code, to set up 5-level paging
in the PASID entry First Level Paging Mode (FLPM) field based solely on
the CPU.

Currently, intel_pasid_setup_first_level() is only used by native SVM
code, which already checks that the paging modes match. However, future
users of this helper function may not be limited to native SVM.
https://lkml.org/lkml/2019/11/18/1037

Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table
interface")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-pasid.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 040a445be300..e7cb0b8a7332 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -499,8 +499,16 @@ int intel_pasid_setup_first_level(struct intel_iommu 
*iommu,
}
 
 #ifdef CONFIG_X86
-   if (cpu_feature_enabled(X86_FEATURE_LA57))
-   pasid_set_flpm(pte, 1);
+   /* Both CPU and IOMMU paging mode need to match */
+   if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+   if (cap_5lp_support(iommu->cap)) {
+   pasid_set_flpm(pte, 1);
+   } else {
+   pr_err("VT-d has no 5-level paging support for CPU\n");
+   pasid_clear_entry(pte);
+   return -EINVAL;
+   }
+   }
 #endif /* CONFIG_X86 */
 
pasid_set_domain_id(pte, did);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 6/8] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

2019-12-02 Thread Jacob Pan
Make use of generic IOASID code to manage PASID allocation,
free, and lookup. Replace Intel specific code.

Signed-off-by: Jacob Pan 
Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/Kconfig   |  1 +
 drivers/iommu/intel-iommu.c | 13 +++--
 drivers/iommu/intel-pasid.c | 36 
 drivers/iommu/intel-svm.c   | 36 ++--
 4 files changed, 30 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index fd50ddbf..43ce450a40d3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -212,6 +212,7 @@ config INTEL_IOMMU_SVM
depends on INTEL_IOMMU && X86
select PCI_PASID
select MMU_NOTIFIER
+   select IOASID
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d598168e410d..a84f0caa33a0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5238,7 +5238,7 @@ static void auxiliary_unlink_device(struct dmar_domain 
*domain,
domain->auxd_refcnt--;
 
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   intel_pasid_free_id(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5256,10 +5256,11 @@ static int aux_domain_add_dev(struct dmar_domain 
*domain,
if (domain->default_pasid <= 0) {
int pasid;
 
-   pasid = intel_pasid_alloc_id(domain, PASID_MIN,
-pci_max_pasids(to_pci_dev(dev)),
-GFP_KERNEL);
-   if (pasid <= 0) {
+   /* No private data needed for the default pasid */
+   pasid = ioasid_alloc(NULL, PASID_MIN,
+pci_max_pasids(to_pci_dev(dev)) - 1,
+NULL);
+   if (pasid == INVALID_IOASID) {
pr_err("Can't allocate default pasid\n");
return -ENODEV;
}
@@ -5295,7 +5296,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
	spin_unlock(&iommu->lock);
	spin_unlock_irqrestore(&device_domain_lock, flags);
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   intel_pasid_free_id(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 
return ret;
 }
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 732bfee228df..3cb569e76642 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -26,42 +26,6 @@
  */
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
-static DEFINE_IDR(pasid_idr);
-
-int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
-{
-   int ret, min, max;
-
-   min = max_t(int, start, PASID_MIN);
-   max = min_t(int, end, intel_pasid_max_id);
-
-   WARN_ON(in_interrupt());
-   idr_preload(gfp);
-	spin_lock(&pasid_lock);
-	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
-	spin_unlock(&pasid_lock);
-   idr_preload_end();
-
-   return ret;
-}
-
-void intel_pasid_free_id(int pasid)
-{
-	spin_lock(&pasid_lock);
-	idr_remove(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-}
-
-void *intel_pasid_lookup_id(int pasid)
-{
-   void *p;
-
-	spin_lock(&pasid_lock);
-	p = idr_find(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-
-   return p;
-}
 
 /*
  * Per device pasid table management:
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e90d0b914afe..62f815e0ae57 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "intel-pasid.h"
@@ -335,16 +336,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
 
-   /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
-   ret = intel_pasid_alloc_id(svm,
-  !!cap_caching_mode(iommu->cap),
-  pasid_max, GFP_KERNEL);
-   if (ret < 0) {
+   /* Do not use PASID 0, reserved for RID to PASID */
+   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
+ pasid_max - 1, svm);
+   if (svm->pasid == INVALID_IOASID) {
kfree(svm);
kfree(sdev);
+   ret = -ENOSPC;
goto out;
}
-   svm->pasid = ret;
	svm->notifier.ops = &intel_mmuops;

[PATCH v5 3/8] iommu/vt-d: Reject SVM bind for failed capability check

2019-12-02 Thread Jacob Pan
Add a check during SVM bind to ensure CPU and IOMMU hardware capabilities
are met.

Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 716c543488f6..74df10a39dfc 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -238,6 +238,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
if (!iommu || dmar_disabled)
return -EINVAL;
 
+   if (!intel_svm_capable(iommu))
+   return -ENOTSUPP;
+
if (dev_is_pci(dev)) {
pasid_max = pci_max_pasids(to_pci_dev(dev));
if (pasid_max < 0)
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 8/8] iommu/vt-d: Misc macro clean up for SVM

2019-12-02 Thread Jacob Pan
Use combined macros for_each_svm_dev() to simplify SVM device iteration
and error checking.

Suggested-by: Andy Shevchenko 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 79 +++
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 4ed7426473c1..0fcbe631cd5f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -226,6 +226,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
 static DEFINE_MUTEX(pasid_mutex);
 static LIST_HEAD(global_svm_list);
 
+#define for_each_svm_dev(sdev, svm, d) \
+   list_for_each_entry((sdev), &(svm)->devs, list) \
+   if ((d) != (sdev)->dev) {} else
+
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
svm_dev_ops *ops)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -274,15 +278,14 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
goto out;
}
 
-	list_for_each_entry(sdev, &svm->devs, list) {
-   if (dev == sdev->dev) {
-   if (sdev->ops != ops) {
-   ret = -EBUSY;
-   goto out;
-   }
-   sdev->users++;
-   goto success;
+   /* Find the matching device in svm list */
+   for_each_svm_dev(sdev, svm, dev) {
+   if (sdev->ops != ops) {
+   ret = -EBUSY;
+   goto out;
}
+   sdev->users++;
+   goto success;
}
 
break;
@@ -427,40 +430,36 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
goto out;
}
 
-	list_for_each_entry(sdev, &svm->devs, list) {
-   if (dev == sdev->dev) {
-   ret = 0;
-   sdev->users--;
-   if (!sdev->users) {
-			list_del_rcu(&sdev->list);
-			/* Flush the PASID cache and IOTLB for this device.
-			 * Note that we do depend on the hardware *not* using
-			 * the PASID any more. Just as we depend on other
-			 * devices never using PASIDs that they have no right
-			 * to use. We have a *shared* PASID table, because it's
-			 * large and has to be physically contiguous. So it's
-			 * hard to be as defensive as we might like. */
-			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-   kfree_rcu(sdev, rcu);
-
-				if (list_empty(&svm->devs)) {
-   ioasid_free(svm->pasid);
-					if (svm->mm)
-						mmu_notifier_unregister(&svm->notifier, svm->mm);
-
-					list_del(&svm->list);
-
-					/* We mandate that no page faults may be outstanding
-					 * for the PASID when intel_svm_unbind_mm() is called.
-					 * If that is not obeyed, subtle errors will happen.
-					 * Let's make them less subtle... */
-   memset(svm, 0x6b, sizeof(*svm));
-   kfree(svm);
-   }
+   for_each_svm_dev(sdev, svm, dev) {
+   ret = 0;
+   sdev->users--;
+   if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+   /* Flush the PASID cache and IOTLB for this device.
+* Note that we do depend on the hardware *not* using
+* the PASID any more. Just as we depend on other
+* devices never using PASIDs that they have no right
+* to use. We have a *shared* PASID table, because it's
+* large and has to be physically contiguous. So it's
+* hard to be as defensive as we might like. */
+   intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
+ 

[PATCH v5 7/8] iommu/vt-d: Avoid sending invalid page response

2019-12-02 Thread Jacob Pan
Page responses should only be sent when last page in group (LPIG) or
private data is present in the page request. This patch avoids sending
invalid descriptors.

Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor
support")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 62f815e0ae57..4ed7426473c1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -683,11 +683,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (req->priv_data_present)
				memcpy(&resp.qw2, req->priv_data,
   sizeof(req->priv_data));
+   resp.qw2 = 0;
+   resp.qw3 = 0;
+			qi_submit_sync(&resp, iommu);
}
-   resp.qw2 = 0;
-   resp.qw3 = 0;
-		qi_submit_sync(&resp, iommu);
-
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/8] iommu/vt-d: Avoid duplicated code for PASID setup

2019-12-02 Thread Jacob Pan
After each PASID entry setup, the related translation caches must be
flushed. We can combine the duplicated code into one function, which is
less error prone.

Signed-off-by: Jacob Pan 
Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-pasid.c | 48 +
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index e7cb0b8a7332..732bfee228df 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -465,6 +465,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
 
+static void pasid_flush_caches(struct intel_iommu *iommu,
+   struct pasid_entry *pte,
+   int pasid, u16 did)
+{
+   if (!ecap_coherent(iommu->ecap))
+   clflush_cache_range(pte, sizeof(*pte));
+
+   if (cap_caching_mode(iommu->cap)) {
+   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+   iotlb_invalidation_with_pasid(iommu, did, pasid);
+   } else {
+   iommu_flush_write_buffer(iommu);
+   }
+}
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -518,16 +533,7 @@ int intel_pasid_setup_first_level(struct intel_iommu 
*iommu,
/* Setup Present and PASID Granular Transfer Type: */
pasid_set_translation_type(pte, 1);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
@@ -591,16 +597,7 @@ int intel_pasid_setup_second_level(struct intel_iommu 
*iommu,
 */
pasid_set_sre(pte);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
@@ -634,16 +631,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
*iommu,
 */
pasid_set_sre(pte);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 5/8] iommu/vt-d: Fix off-by-one in PASID allocation

2019-12-02 Thread Jacob Pan
PASID allocator uses IDR which is exclusive for the end of the
allocation range. There is no need to decrement pasid_max.
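For reference, the relevant idr_alloc() behaviour can be sketched like this
(names taken from the code being patched; the wrapper itself is illustrative):

static DEFINE_IDR(pasid_idr);

static int example_alloc(void *svm, int pasid_max)
{
	/* idr_alloc() returns an ID in [start, end), so 'end' itself is
	 * never handed out: passing pasid_max already caps allocations at
	 * pasid_max - 1, and subtracting one first needlessly shrinks the range.
	 */
	return idr_alloc(&pasid_idr, svm, PASID_MIN, pasid_max, GFP_KERNEL);
}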

Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
Reported-by: Eric Auger 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 74df10a39dfc..e90d0b914afe 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -338,7 +338,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
ret = intel_pasid_alloc_id(svm,
   !!cap_caching_mode(iommu->cap),
-  pasid_max - 1, GFP_KERNEL);
+  pasid_max, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
kfree(sdev);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/8] iommu/vt-d: Fix CPU and IOMMU SVM feature matching checks

2019-12-02 Thread Jacob Pan
Shared Virtual Memory (SVM) is based on a collective set of hardware
features detected at runtime. There are requirements for matching CPU
and IOMMU capabilities.

The current code checks the CPU and IOMMU feature sets for SVM support but
the result is never stored nor used. Therefore, SVM can still be used
even when these checks fail. The consequences can be:
1. The CPU uses 5-level paging mode for 57-bit virtual addresses, but the
IOMMU can only support 4-level paging mode with 48-bit addresses for DMA.
2. 1GB page size is used by the CPU but the IOMMU does not support it. VT-d
unrecoverable faults may be generated.

The best solution to fix these problems is to prevent them in the first
place.

This patch consolidates the code for checking PASID and CPU vs. IOMMU paging
mode compatibility, and provides a specific error message for each failed
check. On sane hardware configurations, these error messages shall never
appear in the kernel log.

Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Acked-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 10 ++
 drivers/iommu/intel-svm.c   | 40 +++-
 include/linux/intel-iommu.h |  5 -
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3f974919d3bd..d598168e410d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3289,10 +3289,7 @@ static int __init init_dmars(void)
 
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
-#ifdef CONFIG_INTEL_IOMMU_SVM
-   if (pasid_supported(iommu))
-   intel_svm_init(iommu);
-#endif
+   intel_svm_check(iommu);
}
 
/*
@@ -4471,10 +4468,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
if (ret)
goto out;
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
-   if (pasid_supported(iommu))
-   intel_svm_init(iommu);
-#endif
+   intel_svm_check(iommu);
 
if (dmaru->ignored) {
/*
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9b159132405d..716c543488f6 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -23,19 +23,6 @@
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 
-int intel_svm_init(struct intel_iommu *iommu)
-{
-   if (cpu_feature_enabled(X86_FEATURE_GBPAGES) &&
-   !cap_fl1gp_support(iommu->cap))
-   return -EINVAL;
-
-   if (cpu_feature_enabled(X86_FEATURE_LA57) &&
-   !cap_5lp_support(iommu->cap))
-   return -EINVAL;
-
-   return 0;
-}
-
 #define PRQ_ORDER 0
 
 int intel_svm_enable_prq(struct intel_iommu *iommu)
@@ -99,6 +86,33 @@ int intel_svm_finish_prq(struct intel_iommu *iommu)
return 0;
 }
 
+static inline bool intel_svm_capable(struct intel_iommu *iommu)
+{
+   return iommu->flags & VTD_FLAG_SVM_CAPABLE;
+}
+
+void intel_svm_check(struct intel_iommu *iommu)
+{
+   if (!pasid_supported(iommu))
+   return;
+
+   if (cpu_feature_enabled(X86_FEATURE_GBPAGES) &&
+   !cap_fl1gp_support(iommu->cap)) {
+   pr_err("%s SVM disabled, incompatible 1GB page capability\n",
+  iommu->name);
+   return;
+   }
+
+   if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+   !cap_5lp_support(iommu->cap)) {
+   pr_err("%s SVM disabled, incompatible paging mode\n",
+  iommu->name);
+   return;
+   }
+
+   iommu->flags |= VTD_FLAG_SVM_CAPABLE;
+}
+
 static void intel_flush_svm_range_dev (struct intel_svm *svm, struct 
intel_svm_dev *sdev,
unsigned long address, unsigned long pages, int 
ih)
 {
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..7dcfa1c4a844 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -433,6 +433,7 @@ enum {
 
 #define VTD_FLAG_TRANS_PRE_ENABLED (1 << 0)
 #define VTD_FLAG_IRQ_REMAP_PRE_ENABLED (1 << 1)
+#define VTD_FLAG_SVM_CAPABLE   (1 << 2)
 
 extern int intel_iommu_sm;
 
@@ -656,7 +657,7 @@ void iommu_flush_write_buffer(struct intel_iommu *iommu);
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
-int intel_svm_init(struct intel_iommu *iommu);
+extern void intel_svm_check(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 
@@ -684,6 +685,8 @@ struct intel_svm {
 };
 
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
+#else
+static inline void intel_svm_check(struct intel_iommu *iommu) {}
 #endif
 
 #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [PATCH v4 8/8] iommu/vt-d: Misc macro clean up for SVM

2019-12-02 Thread Jacob Pan
On Mon, 02 Dec 2019 10:22:13 -0800
Joe Perches  wrote:

> On Mon, 2019-12-02 at 10:15 -0800, Jacob Pan wrote:
> > On Thu, 21 Nov 2019 13:37:10 -0800
> > Joe Perches  wrote:
> >   
> > > On Thu, 2019-11-21 at 13:26 -0800, Jacob Pan wrote:  
> > > > Use combined macros for_each_svm_dev() to simplify SVM device
> > > > iteration and error checking.
> > > []  
> > > > diff --git a/drivers/iommu/intel-svm.c
> > > > b/drivers/iommu/intel-svm.c
> > > []  
> > > > +#define for_each_svm_dev(sdev, svm, d) \
> > > > +   list_for_each_entry((sdev), &(svm)->devs, list)
> > > > \
> > > > +   if ((d) != (sdev)->dev) {} else
> > > > +
> > > >  int intel_svm_bind_mm(struct device *dev, int *pasid, int
> > > > flags, struct svm_dev_ops *ops) {
> > > > struct intel_iommu *iommu =
> > > > intel_svm_device_to_iommu(dev); @@ -274,15 +278,13 @@ int
> > > > intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > > > struct svm_dev_ goto out; }
> > > >  
> > > > -	list_for_each_entry(sdev, &svm->devs, list) {
> > > > -   if (dev == sdev->dev) {
> > > > -   if (sdev->ops != ops) {
> > > > -   ret = -EBUSY;
> > > > -   goto out;
> > > > -   }
> > > > -   sdev->users++;
> > > > -   goto success;
> > > > +   for_each_svm_dev(sdev, svm, dev) {
> > > > +   if (sdev->ops != ops) {
> > > > +   ret = -EBUSY;
> > > > +   goto out;
> > > > }
> > > > +   sdev->users++;
> > > > +   goto success;
> > > > }
> > > 
> > > I think this does not read better as this is now a
> > > for_each loop that exits the loop on the first match.
> > >   
> > I think one of the benefits is reduced indentation. What do you
> > recommend?  
> 
> Making the code intelligible for a reader.
> 
> At least add a comment describing why there is only
> a single possible match.
> 
> Given the for_each name, it's odd code that only the
> first match has an action.
> 
I will add a comment to explain we are trying to find the matching
device on the list.

Thanks
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 8/8] iommu/vt-d: Misc macro clean up for SVM

2019-12-02 Thread Joe Perches
On Mon, 2019-12-02 at 10:15 -0800, Jacob Pan wrote:
> On Thu, 21 Nov 2019 13:37:10 -0800
> Joe Perches  wrote:
> 
> > On Thu, 2019-11-21 at 13:26 -0800, Jacob Pan wrote:
> > > Use combined macros for_each_svm_dev() to simplify SVM device
> > > iteration and error checking.  
> > []
> > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c  
> > []
> > > +#define for_each_svm_dev(sdev, svm, d)   \
> > > + list_for_each_entry((sdev), &(svm)->devs, list) \
> > > + if ((d) != (sdev)->dev) {} else
> > > +
> > >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > > struct svm_dev_ops *ops) {
> > >   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > > @@ -274,15 +278,13 @@ int intel_svm_bind_mm(struct device *dev, int
> > > *pasid, int flags, struct svm_dev_ goto out;
> > >   }
> > >  
> > > -	list_for_each_entry(sdev, &svm->devs, list) {
> > > - if (dev == sdev->dev) {
> > > - if (sdev->ops != ops) {
> > > - ret = -EBUSY;
> > > - goto out;
> > > - }
> > > - sdev->users++;
> > > - goto success;
> > > + for_each_svm_dev(sdev, svm, dev) {
> > > + if (sdev->ops != ops) {
> > > + ret = -EBUSY;
> > > + goto out;
> > >   }
> > > + sdev->users++;
> > > + goto success;
> > >   }  
> > 
> > I think this does not read better as this is now a
> > for_each loop that exits the loop on the first match.
> > 
> I think one of the benefits is reduced indentation. What do you
> recommend?

Making the code intelligible for a reader.

At least add a comment describing why there is only
a single possible match.

Given the for_each name, it's odd code that only the
first match has an action.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 8/8] iommu/vt-d: Misc macro clean up for SVM

2019-12-02 Thread Jacob Pan
On Thu, 21 Nov 2019 13:37:10 -0800
Joe Perches  wrote:

> On Thu, 2019-11-21 at 13:26 -0800, Jacob Pan wrote:
> > Use combined macros for_each_svm_dev() to simplify SVM device
> > iteration and error checking.  
> []
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c  
> []
> > +#define for_each_svm_dev(sdev, svm, d) \
> > +   list_for_each_entry((sdev), &(svm)->devs, list) \
> > +   if ((d) != (sdev)->dev) {} else
> > +
> >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > struct svm_dev_ops *ops) {
> > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > @@ -274,15 +278,13 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ goto out;
> > }
> >  
> > -	list_for_each_entry(sdev, &svm->devs, list) {
> > -   if (dev == sdev->dev) {
> > -   if (sdev->ops != ops) {
> > -   ret = -EBUSY;
> > -   goto out;
> > -   }
> > -   sdev->users++;
> > -   goto success;
> > +   for_each_svm_dev(sdev, svm, dev) {
> > +   if (sdev->ops != ops) {
> > +   ret = -EBUSY;
> > +   goto out;
> > }
> > +   sdev->users++;
> > +   goto success;
> > }  
> 
> I think this does not read better as this is now a
> for_each loop that exits the loop on the first match.
> 
I think one of the benefits is reduced indentation. What do you
recommend?

> >  
> > break;
> > @@ -427,43 +429,36 @@ int intel_svm_unbind_mm(struct device *dev,
> > int pasid) goto out;
> > }
> >  
> > -   if (!svm)
> > -   goto out;
> > -
> > > -	list_for_each_entry(sdev, &svm->devs, list) {
> []
> > +   for_each_svm_dev(sdev, svm, dev) {  
> 
> I think this should not remove the !svm test above.
> 
Yeah, !svm test should have been part of 6/8. I will fix that.

Thanks,

Jacob 
> 

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/3] PCI: Add parameter nr_devfns to pci_add_dma_alias

2019-12-02 Thread Christoph Hellwig
On Fri, Nov 29, 2019 at 05:56:55PM +, James Sewart wrote:
> pci_add_dma_alias can now be used to create a dma alias for a range of
> devfns.
> 
> Signed-off-by: James Sewart 
> ---
>  drivers/pci/pci.c| 23 ++-
>  drivers/pci/quirks.c | 14 +++---
>  include/linux/pci.h  |  2 +-
>  3 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0a4449a30ace..f9800a610ca1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5857,7 +5857,8 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>  /**
>   * pci_add_dma_alias - Add a DMA devfn alias for a device
>   * @dev: the PCI device for which alias is added
> - * @devfn: alias slot and function
> + * @devfn_from: alias slot and function
> + * @nr_devfns: Number of subsequent devfns to alias
>   *
>   * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
>   * which is used to program permissible bus-devfn source addresses for DMA
> @@ -5873,8 +5874,14 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>   * cannot be left as a userspace activity).  DMA aliases should therefore
>   * be configured via quirks, such as the PCI fixup header quirk.
>   */
> -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
> +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned 
> nr_devfns)
>  {
> + int devfn_to;
> +
> + if (nr_devfns > U8_MAX+1)
> + nr_devfns = U8_MAX+1;

Missing whitespaces here as well.  Also this could use min() and I
think you want a documented constant for MAX_NR_DEVFNS that documents
this "not off by one".
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 1/3] PCI: Fix off by one in dma_alias_mask allocation size

2019-12-02 Thread Christoph Hellwig
On Fri, Nov 29, 2019 at 05:56:19PM +, James Sewart wrote:
> The number of possible devfns is 256 so the size of the bitmap for
> allocations should be U8_MAX+1.
> 
> Signed-off-by: James Sewart 
> ---
>  drivers/pci/pci.c| 2 +-
>  drivers/pci/search.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a97e2571a527..0a4449a30ace 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5876,7 +5876,7 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>  void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
>  {
>   if (!dev->dma_alias_mask)
> - dev->dma_alias_mask = bitmap_zalloc(U8_MAX, GFP_KERNEL);
> + dev->dma_alias_mask = bitmap_zalloc(U8_MAX+1, GFP_KERNEL);

Missing whitespaces around the "+".

> - for_each_set_bit(devfn, pdev->dma_alias_mask, U8_MAX) {
> + for_each_set_bit(devfn, pdev->dma_alias_mask, U8_MAX+1) {

Same here.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-12-02 Thread Christoph Hellwig
On Fri, Nov 29, 2019 at 10:21:54PM +0800, Kai-Heng Feng wrote:
> Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> 
> According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do
> the same here to avoid screen flickering on 4K monitor.

Disabling the IOMMU entirely seems pretty severe.  Isn't it enough to
identity map the GPU device?
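
Something like the below, for example (completely untested sketch; whether
a PCI fixup runs early enough and whether iommu_request_dm_for_dev() is
acceptable for this would need checking):

/*
 * Put only the Stoney Ridge GPU into a direct-mapped (identity) domain
 * instead of turning the IOMMU off for the whole system.  1002:98e4 is
 * the ID used in the patch above.
 */
static void quirk_stoney_gfx_identity(struct pci_dev *pdev)
{
        if (iommu_request_dm_for_dev(&pdev->dev))
                pci_info(pdev, "failed to switch to a direct mapping\n");
}
DECLARE_PCI_FIXUP_FINAL(0x1002, 0x98e4, quirk_stoney_gfx_identity);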
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-02 Thread Christoph Hellwig
> + return (mag && mag->size == IOVA_MAG_SIZE);

> + return (!mag || mag->size == 0);

No need for the braces in both cases.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v2 1/3] iommu: match the original algorithm

2019-12-02 Thread Christoph Hellwig
I think the subject line should describe what you change, not that
it matches an original algorithm.  The fact that the fix matches
the original algorithm can go somewhere in the commit log,
preferably with a reference to the actual paper.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-12-02 Thread Deucher, Alexander
> -Original Message-
> From: Lucas Stach 
> Sent: Sunday, December 1, 2019 7:43 AM
> To: Kai-Heng Feng ; j...@8bytes.org
> Cc: Deucher, Alexander ;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
> systems
> 
> Am Freitag, den 29.11.2019, 22:21 +0800 schrieb Kai-Heng Feng:
> > Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> >
> > According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do
> > the same here to avoid screen flickering on 4K monitor.
> 
> This doesn't seem like a good solution, especially if there isn't a method for
> the user to opt-out.  Some users might prefer having the IOMMU support to
> 4K display output.
> 
> But before using the big hammer of disabling or breaking one of those
> features, we should take a look at what's the issue here. Screen flickering
> caused by the IOMMU being active hints to the IOMMU not being able to
> sustain the translation bandwidth required by the high- bandwidth
> isochronous transfers caused by 4K scanout, most likely due to insufficient
> TLB space.
> 
> As far as I know the framebuffer memory for the display buffers is located in
> stolen RAM, and thus contiguous in memory. I don't know the details of the
> GPU integration on those APUs, but maybe there even is a way to bypass the
> IOMMU for the stolen VRAM regions?
> 
> If there isn't and all GPU traffic passes through the IOMMU when active, we
> should check if the stolen RAM is mapped with hugepages on the IOMMU
> side. All the stolen RAM can most likely be mapped with a few hugepage
> mappings, which should reduce IOMMU TLB demand by a large margin.

There is no issue when we scan out of the carve out region.  The issue occurs 
when we scan out of regular system memory (scatter/gather).  Many newer laptops 
have very small carve out regions (e.g., 32 MB), so we have to use regular 
system pages to support multiple high resolution displays.  The problem is, the 
latency gets too high at some point when the IOMMU is involved.  Huge pages 
would probably help in this case, but I'm not sure if there is any way to 
guarantee that we get huge pages for system memory.  I guess we could use CMA 
or something like that.
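
Very roughly, something like this (sketch only, not amdgpu code; the helper
name is made up, and whether the dma_ops path would then actually end up
with large-page IOMMU mappings is exactly what would need to be verified):

#include <linux/dma-contiguous.h>
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

/*
 * Back a scanout buffer with one physically contiguous CMA chunk,
 * 2 MiB aligned, so that a large-page IOMMU mapping is at least possible.
 */
static dma_addr_t map_contig_scanout(struct device *dev, size_t size)
{
        size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page *page;
        dma_addr_t addr;

        page = dma_alloc_from_contiguous(dev, count, get_order(SZ_2M), false);
        if (!page)
                return DMA_MAPPING_ERROR;

        addr = dma_map_page(dev, page, 0, count << PAGE_SHIFT,
                            DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, addr))
                dma_release_from_contiguous(dev, page, count);

        return addr;
}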

Alex

> 
> Regards,
> Lucas
> 
> > Cc: Alex Deucher 
> > Bug: https://gitlab.freedesktop.org/drm/amd/issues/961
> > Signed-off-by: Kai-Heng Feng 
> > ---
> > v2:
> > - Find Stoney graphics instead of host bridge.
> >
> >  drivers/iommu/amd_iommu_init.c | 13 -
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c
> > b/drivers/iommu/amd_iommu_init.c index 568c52317757..139aa6fdadda
> > 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2516,6 +2516,7 @@ static int __init early_amd_iommu_init(void)
> > struct acpi_table_header *ivrs_base;
> > acpi_status status;
> > int i, remap_cache_sz, ret = 0;
> > +   u32 pci_id;
> >
> > if (!amd_iommu_detected)
> > return -ENODEV;
> > @@ -2603,6 +2604,16 @@ static int __init early_amd_iommu_init(void)
> > if (ret)
> > goto out;
> >
> > +   /* Disable IOMMU if there's Stoney Ridge graphics */
> > +   for (i = 0; i < 32; i++) {
> > +   pci_id = read_pci_config(0, i, 0, 0);
> > +   if ((pci_id & 0x) == 0x1002 && (pci_id >> 16) == 0x98e4) {
> > +   pr_info("Disable IOMMU on Stoney Ridge\n");
> > +   amd_iommu_disabled = true;
> > +   break;
> > +   }
> > +   }
> > +
> > /* Disable any previously enabled IOMMUs */
> > if (!is_kdump_kernel() || amd_iommu_disabled)
> > disable_iommus();
> > @@ -2711,7 +2722,7 @@ static int __init state_next(void)
> > ret = early_amd_iommu_init();
> > init_state = ret ? IOMMU_INIT_ERROR :
> IOMMU_ACPI_FINISHED;
> > if (init_state == IOMMU_ACPI_FINISHED &&
> amd_iommu_disabled) {
> > -   pr_info("AMD IOMMU disabled on kernel command-
> line\n");
> > +   pr_info("AMD IOMMU disabled\n");
> > init_state = IOMMU_CMDLINE_DISABLED;
> > ret = -EINVAL;
> > }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: dmar pte read access not set error messages on hp dl388 gen8 systems

2019-12-02 Thread Jerry Snitselaar

On Mon Dec 02 19, Jerry Snitselaar wrote:

On Mon Dec 02 19, Lu Baolu wrote:

Hi,

On 12/2/19 2:34 PM, Jerry Snitselaar wrote:

We are seeing DMAR PTE read access not set errors when booting a
kernel with default passthrough, both with a test kernel and with
a 5.4.0 kernel. Previously we would see a number of identity mappings
being set related to the rmrrs, and now they aren't seen and we get
the dmar pte errors as devices touch those regions. From what I can tell
currently df4f3c603aeb ("iommu/vt-d: Remove static identity map code")
removed the bit of code in init_dmars that used to set up those
mappings:

-   /*
-    * For each rmrr
-    *   for each dev attached to rmrr
-    *   do
-    * locate drhd for dev, alloc domain for dev
-    * allocate free domain
-    * allocate page table entries for rmrr
-    * if context not allocated for bus
-    *   allocate and init context
-    *   set present in root table for this bus
-    * init context with domain, translation etc
-    *    endfor
-    * endfor
-    */
-   pr_info("Setting RMRR:\n");
-   for_each_rmrr_units(rmrr) {
-   /* some BIOS lists non-exist devices in DMAR table. */
-   for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
- i, dev) {
-   ret = iommu_prepare_rmrr_dev(rmrr, dev);
-   if (ret)
-   pr_err("Mapping reserved region failed\n");
-   }
-   }

si_domain_init now has code that sets identity maps for devices in rmrrs, but only for certain devices.


On which device, are you seeing this error? Is it a rmrr locked device?

Best regards,
baolu



Almost all of the messages are for the ilo, but there also is a message for
the smart array raid bus controller.



Also seeing it with a dl380 gen9 system, where the raid bus controller
is getting the error.



With iommu=nopt, the system boots up without issue.





___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v3 0/4] dma-mapping: introduce new dma unmap and sync variants

2019-12-02 Thread Madalin Bucur
> -Original Message-
> From: Christoph Hellwig 
> To: David Miller 
> Subject: Re: [PATCH v3 0/4] dma-mapping: introduce new dma unmap and sync
> variants
> 
> On Wed, Nov 13, 2019 at 12:11:32PM -0800, David Miller wrote:
> > > This series introduces a few new dma unmap and sync api variants
> that,
> > > on top of what the originals do, return the virtual address
> > > corresponding to the input dma address. In order to do that a new dma
> > > map op is added, .get_virt_addr that takes the input dma address and
> > > returns the virtual address backing it up.
> > > The second patch adds an implementation for this new dma map op in
> the
> > > generic iommu dma glue code and wires it in.
> > > The third patch updates the dpaa2-eth driver to use the new apis.
> >
> > The driver should store the mapping in its private software state if
> > it needs this kind of conversion.
> 
> I think the problem for this driver (and a few others) is that they
> somehow manage to split I/O completions at a different boundary
> than submissions.  For me with my block I/O background this seems
> weird, but I'll need networking folks to double check the theory.
> 
> > This is huge precendence for this, and there is therefore no need to
> > add even more complication and methods and burdon to architecture code
> > for the sake of this.
> 
> Unfortunately there are various drivers that abuse iommu_iova_to_phys
> to get a struct page to unmap.  Two of theose are network drivers
> that went in through you (dpaa2 and thunder), additonally the
> caam crypto driver (which I think is the same SOC family as dpaa,
> but I'm not sure) and the AMD GPU driver.
> 
> We also have drivers that just don't unmap and this don't work with
> iommus or dma-debug (IBM EMAC another net driver).
> 
> That being said I hate these new APIs, but I still think they are less
> bad than these IOMMU API abuses people do now.  If experienced
> networking folks know a way to get rid of both I'm all for it.

Hi,

will this API be included during the v5.5 kernel development cycle or is
there an alternative solution?

Thank you,
Madalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v2 1/3] iommu: match the original algorithm

2019-12-02 Thread John Garry

On 30/11/2019 05:58, Cong Wang wrote:

On Fri, Nov 29, 2019 at 6:43 AM John Garry  wrote:


On 29/11/2019 00:48, Cong Wang wrote:

The IOVA cache algorithm implemented in IOMMU code does not
exactly match the original algorithm described in the paper.



which paper?


It's in drivers/iommu/iova.c, from line 769:

  769 /*
  770  * Magazine caches for IOVA ranges.  For an introduction to magazines,
  771  * see the USENIX 2001 paper "Magazines and Vmem: Extending the Slab
  772  * Allocator to Many CPUs and Arbitrary Resources" by Bonwick and Adams.
  773  * For simplicity, we use a static magazine size and don't implement the
  774  * dynamic size tuning described in the paper.
  775  */





Particularly, it doesn't need to free the loaded empty magazine
when trying to put it back to global depot. To make it work, we
have to pre-allocate magazines in the depot and only recycle them
when all of them are full.

Before this patch, rcache->depot[] contains either full or
freed entries, after this patch, it contains either full or
empty (but allocated) entries.


I *quickly* tested this patch and got a small performance gain.


Thanks for testing! It requires a different workload to see bigger gain,
in our case, 24 memcache.parallel servers with 120 clients.



So in fact I was getting a ~10% throughput boost for my storage test. 
Seems more than I would expect/hope for. I would need to test more.








Cc: Joerg Roedel 
Signed-off-by: Cong Wang 
---
   drivers/iommu/iova.c | 45 +++-
   1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..cb473ddce4cf 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -862,12 +862,16 @@ static void init_iova_rcaches(struct iova_domain *iovad)
   struct iova_cpu_rcache *cpu_rcache;
   struct iova_rcache *rcache;
   unsigned int cpu;
- int i;
+ int i, j;

   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
   rcache = &iovad->rcaches[i];
   spin_lock_init(&rcache->lock);
   rcache->depot_size = 0;
+ for (j = 0; j < MAX_GLOBAL_MAGS; ++j) {
+ rcache->depot[j] = iova_magazine_alloc(GFP_KERNEL);
+ WARN_ON(!rcache->depot[j]);
+ }
   rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), 
cache_line_size());
   if (WARN_ON(!rcache->cpu_rcaches))
   continue;
@@ -900,24 +904,30 @@ static bool __iova_rcache_insert(struct iova_domain 
*iovad,

   if (!iova_magazine_full(cpu_rcache->loaded)) {
   can_insert = true;
- } else if (!iova_magazine_full(cpu_rcache->prev)) {
+ } else if (iova_magazine_empty(cpu_rcache->prev)) {


is this change strictly necessary?


Yes, because that is what is described in the paper. But it should be
functionally the same because cpu_rcache->prev is either full or empty.
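
For anyone without the paper to hand, the insert path being aimed for here
is roughly the following (self-contained sketch, names made up, not the
kernel code):

#include <stdbool.h>

#define MAG_SIZE   128
#define DEPOT_SIZE 32

struct magazine  { unsigned long pfns[MAG_SIZE]; int size; };
struct cpu_cache { struct magazine *loaded, *prev; };
/* every depot slot is pre-allocated and starts out empty */
struct depot     { struct magazine *mags[DEPOT_SIZE]; int size; };

static bool mag_full(struct magazine *m)  { return m->size == MAG_SIZE; }
static bool mag_empty(struct magazine *m) { return m->size == 0; }

/* Returns true if pfn was cached without falling back to the rbtree. */
static bool cache_insert(struct cpu_cache *cc, struct depot *d, unsigned long pfn)
{
        struct magazine *tmp;

        if (!mag_full(cc->loaded)) {
                /* fast path: room left in the loaded magazine */
        } else if (mag_empty(cc->prev)) {
                /* reuse the empty prev magazine */
                tmp = cc->prev; cc->prev = cc->loaded; cc->loaded = tmp;
        } else if (d->size < DEPOT_SIZE) {
                /*
                 * Both are full: trade the full prev magazine for a
                 * pre-allocated empty one from the depot -- no free/alloc.
                 */
                tmp = d->mags[d->size];
                d->mags[d->size++] = cc->prev;
                cc->prev = cc->loaded;
                cc->loaded = tmp;
        } else {
                return false;   /* depot full: caller flushes to the rbtree */
        }

        cc->loaded->pfns[cc->loaded->size++] = pfn;
        return true;
}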


That was what I was getting at.








   swap(cpu_rcache->prev, cpu_rcache->loaded);
   can_insert = true;
   } else {
- struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);


Apart from this change, did anyone ever consider kmem cache for the 
magazines?



+ spin_lock(&rcache->lock);
+ if (rcache->depot_size < MAX_GLOBAL_MAGS) {
+ swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
+ swap(cpu_rcache->prev, cpu_rcache->loaded);
+ rcache->depot_size++;
+ can_insert = true;
+ } else {
+ mag_to_free = cpu_rcache->loaded;
+ }
+ spin_unlock(&rcache->lock);
+
+ if (mag_to_free) {
+ struct iova_magazine *new_mag = 
iova_magazine_alloc(GFP_ATOMIC);

- if (new_mag) {
- spin_lock(&rcache->lock);
- if (rcache->depot_size < MAX_GLOBAL_MAGS) {
- rcache->depot[rcache->depot_size++] =
- cpu_rcache->loaded;
+ if (new_mag) {
+ cpu_rcache->loaded = new_mag;
+ can_insert = true;
   } else {
- mag_to_free = cpu_rcache->loaded;
+ mag_to_free = NULL;
   }
- spin_unlock(&rcache->lock);
-
- cpu_rcache->loaded = new_mag;
- can_insert = true;
   }
   }

@@ -963,14 +973,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache 
*rcache,

   if (!iova_magazine_empty(cpu_rcache->loaded)) {
   has_pfn = true;
- } else if (!iova_magazine_empty(cpu_rcache->prev)) {
+ } else if (iova_magazine_full(cpu_rcache->prev)) {
   swap(cpu_rcache->prev, cpu_rcache->loaded);
   has_pfn = true;
 

Re: [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-02 Thread John Garry

On 30/11/2019 06:02, Cong Wang wrote:

On Fri, Nov 29, 2019 at 5:24 AM John Garry  wrote:


On 29/11/2019 00:48, Cong Wang wrote:

If the maganize is empty, iova_magazine_free_pfns() should


magazine


Good catch!




be a nop, however it misses the case of mag->size==0. So we
should just call iova_magazine_empty().

This should reduce the contention on iovad->iova_rbtree_lock
a little bit.

Cc: Joerg Roedel 
Signed-off-by: Cong Wang 
---
   drivers/iommu/iova.c | 22 +++---
   1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index cb473ddce4cf..184d4c0e20b5 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -797,13 +797,23 @@ static void iova_magazine_free(struct iova_magazine *mag)
   kfree(mag);
   }

+static bool iova_magazine_full(struct iova_magazine *mag)
+{
+ return (mag && mag->size == IOVA_MAG_SIZE);
+}
+
+static bool iova_magazine_empty(struct iova_magazine *mag)
+{
+ return (!mag || mag->size == 0);
+}
+
   static void
   iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
   {
   unsigned long flags;
   int i;

- if (!mag)
+ if (iova_magazine_empty(mag))


The only hot path where we hit this call is
__iova_rcache_insert()->iova_magazine_free_pfns(mag_to_free), and
mag_to_free is full in this case, so I am not sure how the additional check
helps, right?


This is what I mean by "a little bit" in changelog, did you miss it or
misunderstand it? :)


I was concerned that in the fastpath we actually make things very 
marginally slower by adding a check which will fail.


Thanks,
John



Thanks.
.



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu