Re: [PATCH v16 4/5] dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2

2018-09-05 Thread Vivek Gautam
Hi Rob,

On Thu, Aug 30, 2018 at 8:16 PM Vivek Gautam wrote:
>
> Add bindings doc for Qcom's smmu-v2 implementation.
>
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Tomasz Figa 
> Tested-by: Srinivas Kandagatla 
> ---

I removed your Reviewed-by for this particular patch.
Could you please consider giving your review again if you find the changes okay now?
Thanks.

Best regards
Vivek

>  .../devicetree/bindings/iommu/arm,smmu.txt | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
> b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> index 8a6ffce12af5..a6504b37cc21 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -17,10 +17,16 @@ conditions.
>  "arm,mmu-401"
>  "arm,mmu-500"
>  "cavium,smmu-v2"
> +"qcom,smmu-v2"
>
>depending on the particular implementation and/or the
>version of the architecture implemented.
>
> +  Qcom SoCs must contain, as below, SoC-specific compatibles
> +  along with "qcom,smmu-v2":
> +  "qcom,msm8996-smmu-v2", "qcom,smmu-v2",
> +  "qcom,sdm845-smmu-v2", "qcom,smmu-v2".
> +
>  - reg   : Base address and size of the SMMU.
>
>  - #global-interrupts : The number of global interrupts exposed by the
> @@ -71,6 +77,22 @@ conditions.
>or using stream matching with #iommu-cells = <2>, and
>may be ignored if present in such cases.
>
> +- clock-names:  List of the names of clocks input to the device. The
> +  required list depends on particular implementation and
> +  is as follows:
> +  - for "qcom,smmu-v2":
> +- "bus": clock required for downstream bus access and
> + for the smmu ptw,
> +- "iface": clock required to access smmu's registers
> +   through the TCU's programming interface.
> +  - unspecified for other implementations.
> +
> +- clocks: Specifiers for all clocks listed in the clock-names property,
> +  as per generic clock bindings.
> +
> +- power-domains:  Specifiers for power domains required to be powered on for
> +  the SMMU to operate, as per generic power domain bindings.
> +
>  ** Deprecated properties:
>
>  - mmu-masters (deprecated in favour of the generic "iommus" binding) :
> @@ -137,3 +159,20 @@ conditions.
>  iommu-map = <0  0 0x400>;
>  ...
>  };
> +
> +   /* Qcom's arm,smmu-v2 implementation */
> +   smmu4: iommu@d0 {
> +   compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> +   reg = <0xd0 0x1>;
> +
> +   #global-interrupts = <1>;
> +   interrupts = ,
> +,
> +;
> +   #iommu-cells = <1>;
> +   power-domains = <&mmcc MDSS_GDSC>;
> +
> +   clocks = <&mmcc SMMU_MDP_AXI_CLK>,
> +<&mmcc SMMU_MDP_AHB_CLK>;
> +   clock-names = "bus", "iface";
> +   };
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu





RE: [PATCH v2 08/12] iommu/vt-d: Pass pasid table to context mapping

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> So that the pasid related info, such as the pasid table and the
> maximum pasid, can be used while setting up the scalable mode
> context.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel-iommu.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c3bf2ccf094d..33642dd3d6ba 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1942,6 +1942,7 @@ static void domain_exit(struct dmar_domain
> *domain)
> 
>  static int domain_context_mapping_one(struct dmar_domain *domain,
> struct intel_iommu *iommu,
> +   struct pasid_table *table,
> u8 bus, u8 devfn)
>  {
>   u16 did = domain->iommu_did[iommu->seq_id];
> @@ -2064,6 +2065,7 @@ static int domain_context_mapping_one(struct
> dmar_domain *domain,
>  struct domain_context_mapping_data {
>   struct dmar_domain *domain;
>   struct intel_iommu *iommu;
> + struct pasid_table *table;
>  };
> 
>  static int domain_context_mapping_cb(struct pci_dev *pdev,
> @@ -2072,25 +2074,31 @@ static int domain_context_mapping_cb(struct
> pci_dev *pdev,
>   struct domain_context_mapping_data *data = opaque;
> 
>   return domain_context_mapping_one(data->domain, data->iommu,
> -   PCI_BUS_NUM(alias), alias & 0xff);
> +   data->table, PCI_BUS_NUM(alias),
> +   alias & 0xff);
>  }
> 
>  static int
>  domain_context_mapping(struct dmar_domain *domain, struct device
> *dev)
>  {
> + struct domain_context_mapping_data data;
> + struct pasid_table *table;
>   struct intel_iommu *iommu;
>   u8 bus, devfn;
> - struct domain_context_mapping_data data;
> 
>   iommu = device_to_iommu(dev, &bus, &devfn);
>   if (!iommu)
>   return -ENODEV;
> 
> + table = intel_pasid_get_table(dev);
> +
>   if (!dev_is_pci(dev))
> - return domain_context_mapping_one(domain, iommu, bus, devfn);
> + return domain_context_mapping_one(domain, iommu, table,
> +   bus, devfn);
> 
>   data.domain = domain;
>   data.iommu = iommu;
> + data.table = table;
> 
>   return pci_for_each_dma_alias(to_pci_dev(dev),
> &domain_context_mapping_cb, &data);
> --
> 2.17.1



RE: [PATCH v2 06/12] iommu/vt-d: Add second level page table interface

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> This adds the interfaces to set up or tear down the structures
> for second level page table translations. This includes types
> of second level only translation and pass through.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 
> ---
>  drivers/iommu/intel-iommu.c |   2 +-
>  drivers/iommu/intel-pasid.c | 246
> 
>  drivers/iommu/intel-pasid.h |   7 +
>  include/linux/intel-iommu.h |   3 +
>  4 files changed, 257 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 562da10bf93e..de6b909bb47a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1232,7 +1232,7 @@ static void iommu_set_root_entry(struct
> intel_iommu *iommu)
>   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
> 
> -static void iommu_flush_write_buffer(struct intel_iommu *iommu)
> +void iommu_flush_write_buffer(struct intel_iommu *iommu)
>  {
>   u32 val;
>   unsigned long flag;
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d6e90cd5b062..edcea1d8b9fc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -9,6 +9,7 @@
> 
>  #define pr_fmt(fmt)  "DMAR: " fmt
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -291,3 +292,248 @@ void intel_pasid_clear_entry(struct device *dev,
> int pasid)
> 
>   pasid_clear_entry(pe);
>  }
> +
> +static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
> +{
> + u64 old;
> +
> + old = READ_ONCE(*ptr);
> + WRITE_ONCE(*ptr, (old & ~mask) | bits);
> +}
> +
> +/*
> + * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
> + * PASID entry.
> + */
> +static inline void
> +pasid_set_domain_id(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
> +}
> +
> +/*
> + * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_address_root(struct pasid_entry *pe, u64 value)

is address_root too general? especially when the entry could contain both
1st level and 2nd level pointers.

> +{
> + pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
> +}
> +
> +/*
> + * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
> + * entry.
> + */
> +static inline void
> +pasid_set_address_width(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
> +}
> +
> +/*
> + * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_translation_type(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
> +}
> +
> +/*
> + * Enable fault processing by clearing the FPD(Fault Processing
> + * Disable) field (Bit 1) of a scalable mode PASID entry.
> + */
> +static inline void pasid_set_fault_enable(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[0], 1 << 1, 0);
> +}
> +
> +/*
> + * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
> + * scalable mode PASID entry.
> + */
> +static inline void pasid_set_sre(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 0, 1);
> +}
> +
> +/*
> + * Setup the P(Present) field (Bit 0) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_present(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[0], 1 << 0, 1);
> +}

it's a long list and there could be more in the future. What about
defining some macro to simplify LOC, e.g.

#define PASID_SET(name, i, m, b)\
static inline void pasid_set_##name(struct pasid_entry *pe)  \
{   \
pasid_set_bits(&pe->val[i], m, b);  \
}

PASID_SET(present, 0, 1<<0, 1);
PASID_SET(sre, 2, 1<<0, 1);
...

> +
> +/*
> + * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 23, value);
> +}
> +
> +static void
> +pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
> +  int did, int pasid)

pasid_cache_invalidation_with_pasid

> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +static void
> +pasid_based_iotlb_cache_invalidation(struct intel_iommu *iommu,
> +  u16 did, u32 pasid)

iotlb_invalidation_with_pasid

> +{
> + struct 

Re: [PATCH v2 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 10:52 AM, Tian, Kevin wrote:

From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, September 6, 2018 10:46 AM


[...]

@@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
return -ENOMEM;
INIT_LIST_HEAD(&pasid_table->dev);

-   size = sizeof(struct pasid_entry);
+   size = sizeof(struct pasid_dir_entry);
count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
intel_pasid_max_id);
+   count >>= PASID_PDE_SHIFT;
order = get_order(size * count);
pages = alloc_pages_node(info->iommu->node,
 GFP_ATOMIC | __GFP_ZERO,
@@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)

pasid_table->table = page_address(pages);
pasid_table->order = order;
-   pasid_table->max_pasid = count;
+   pasid_table->max_pasid = count << PASID_PDE_SHIFT;


are you sure that count is PDE_SHIFT aligned? otherwise >>
then << would lose some bits. If sure, then better add some check.


I am making the max_pasid PDE_SHIFT aligned as the result of shift
operations.



earlier:

count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
intel_pasid_max_id);


so you decided to truncate count to be PDE_SHIFT aligned. Is PASID
value user configurable? if not, then it's fine.


Here @count is the count of PASID directory entries, so it must be
truncated from the original max_pasid. PASID value is not configurable
anyway.







   attach_out:
device_attach_pasid_table(info, pasid_table);
@@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
return 0;
   }

+/* Get PRESENT bit of a PASID directory entry. */
+static inline bool
+pasid_pde_is_present(struct pasid_dir_entry *pde)
+{
+   return READ_ONCE(pde->val) & PASID_PTE_PRESENT;


curious why adding READ_ONCE specifically for PASID structure,
but not used for any other existing vtd structures? Is it to address
some specific requirement on PASID structure as defined in spec?


READ/WRITE_ONCE are used in pasid entry read/write to prevent the
compiler from merging, refetching or reordering successive instances of
read/write.



that's fine. I'm just curious why this is the first user of such macros
in intel-iommu driver. Even before with ecs we have PASID table too.



Best regards,
Lu Baolu


RE: [PATCH v2 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, September 6, 2018 10:46 AM
>
[...] 
> >> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
> >>return -ENOMEM;
> >>INIT_LIST_HEAD(&pasid_table->dev);
> >>
> >> -  size = sizeof(struct pasid_entry);
> >> +  size = sizeof(struct pasid_dir_entry);
> >>count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> >> intel_pasid_max_id);
> >> +  count >>= PASID_PDE_SHIFT;
> >>order = get_order(size * count);
> >>pages = alloc_pages_node(info->iommu->node,
> >> GFP_ATOMIC | __GFP_ZERO,
> >> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
> >>
> >>pasid_table->table = page_address(pages);
> >>pasid_table->order = order;
> >> -  pasid_table->max_pasid = count;
> >> +  pasid_table->max_pasid = count << PASID_PDE_SHIFT;
> >
> > are you sure that count is PDE_SHIFT aligned? otherwise >>
> > then << would lose some bits. If sure, then better add some check.
> 
> I am making the max_pasid PDE_SHIFT aligned as the result of shift
> operations.
> 

earlier:
> >>count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> >> intel_pasid_max_id);

so you decided to truncate count to be PDE_SHIFT aligned. Is PASID
value user configurable? if not, then it's fine.

> >
> >>
> >>   attach_out:
> >>device_attach_pasid_table(info, pasid_table);
> >> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
> >>return 0;
> >>   }
> >>
> >> +/* Get PRESENT bit of a PASID directory entry. */
> >> +static inline bool
> >> +pasid_pde_is_present(struct pasid_dir_entry *pde)
> >> +{
> >> +  return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
> >
> > curious why adding READ_ONCE specifically for PASID structure,
> > but not used for any other existing vtd structures? Is it to address
> > some specific requirement on PASID structure as defined in spec?
> 
> READ/WRITE_ONCE are used in pasid entry read/write to prevent the
> compiler from merging, refetching or reordering successive instances of
> read/write.
> 

that's fine. I'm just curious why this is the first user of such macros
in intel-iommu driver. Even before with ecs we have PASID table too.

Thanks
Kevin


Re: [PATCH v2 03/12] iommu/vt-d: Move page table helpers into header

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 10:15 AM, Tian, Kevin wrote:

From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, August 30, 2018 9:35 AM

So that they could also be used in other source files.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 


Reviewed-by: Kevin Tian 


Thank you, Kevin.

Best regards,
Lu Baolu




---
  drivers/iommu/intel-iommu.c | 43 -
  include/linux/intel-iommu.h | 43
+
  2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b0da4f765274..93cde957adc7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,49 +315,6 @@ static inline void context_clear_entry(struct
context_entry *context)
context->hi = 0;
  }

-/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-10: available
- * 11: snoop behavior
- * 12-63: Host physical address
- */
-struct dma_pte {
-   u64 val;
-};
-
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
-   pte->val = 0;
-}
-
-static inline u64 dma_pte_addr(struct dma_pte *pte)
-{
-#ifdef CONFIG_64BIT
-   return pte->val & VTD_PAGE_MASK;
-#else
-   /* Must have a full atomic 64-bit read */
-   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-#endif
-}
-
-static inline bool dma_pte_present(struct dma_pte *pte)
-{
-   return (pte->val & 3) != 0;
-}
-
-static inline bool dma_pte_superpage(struct dma_pte *pte)
-{
-   return (pte->val & DMA_PTE_LARGE_PAGE);
-}
-
-static inline int first_pte_in_page(struct dma_pte *pte)
-{
-   return !((unsigned long)pte & ~VTD_PAGE_MASK);
-}
-
  /*
   * This domain is a statically identity mapping domain.
   *1. This domain creates a static 1:1 mapping to all usable memory.
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2173ae35f1dc..41791903a5e3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
  }

+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-10: available
+ * 11: snoop behavior
+ * 12-63: Host physical address
+ */
+struct dma_pte {
+   u64 val;
+};
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+   pte->val = 0;
+}
+
+static inline u64 dma_pte_addr(struct dma_pte *pte)
+{
+#ifdef CONFIG_64BIT
+   return pte->val & VTD_PAGE_MASK;
+#else
+   /* Must have a full atomic 64-bit read */
+   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
+#endif
+}
+
+static inline bool dma_pte_present(struct dma_pte *pte)
+{
+   return (pte->val & 3) != 0;
+}
+
+static inline bool dma_pte_superpage(struct dma_pte *pte)
+{
+   return (pte->val & DMA_PTE_LARGE_PAGE);
+}
+
+static inline int first_pte_in_page(struct dma_pte *pte)
+{
+   return !((unsigned long)pte & ~VTD_PAGE_MASK);
+}
+
  extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct
pci_dev *dev);
  extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);

--
2.17.1






Re: [PATCH v2 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 10:14 AM, Tian, Kevin wrote:

From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, August 30, 2018 9:35 AM

In scalable mode, pasid structure is a two level table with
a pasid directory table and a pasid table. Any pasid entry
can be identified by a pasid value in the following way.

      19                 6 5        0
       .------------------.----------.
       |    PASID[19:6]   |PASID[5:0]|
       '------------------'----------'
                |               |
                | (directory    | (table
                |  index)       |  index)
                v               v
  .---------.   .-----------.      .-------------.
  | Context |-->| DIR Entry |----->| PASID Entry |
  |  Entry  |   '-----------'      '-------------'
  '---------'   (PASID directory)  (PASID table)

This changes the pasid table APIs to support scalable mode
PASID directory and PASID table. It also adds a helper to
get the PASID table entry according to the pasid value.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
  drivers/iommu/intel-iommu.c |  2 +-
  drivers/iommu/intel-pasid.c | 72 
-
  drivers/iommu/intel-pasid.h | 10 +-
  drivers/iommu/intel-svm.c   |  6 +---
  4 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5845edf4dcf9..b0da4f765274 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2507,7 +2507,7 @@ static struct dmar_domain
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev)
dev->archdata.iommu = info;

-   if (dev && dev_is_pci(dev) && info->pasid_supported) {
+   if (dev && dev_is_pci(dev) && sm_supported(iommu)) {


worthy of a comment here that PASID table now is mandatory in
scalable mode, instead of optional for 1st level usage before.


Fair enough. Will add in the next version.




ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index fe95c9bd4d33..d6e90cd5b062 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
int ret, order;

info = dev->archdata.iommu;
-   if (WARN_ON(!info || !dev_is_pci(dev) ||
-   !info->pasid_supported || info->pasid_table))
+   if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
return -EINVAL;


following same logic should you check sm_supported here?


If not sm_supported, info->pasid_table should be NULL. Checking
info->pasid_table is better since even sm_supported, the pasid
table pointer could still possible to be empty.





/* DMA alias device already has a pasid table, use it: */
@@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
return -ENOMEM;
INIT_LIST_HEAD(&pasid_table->dev);

-   size = sizeof(struct pasid_entry);
+   size = sizeof(struct pasid_dir_entry);
count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
intel_pasid_max_id);
+   count >>= PASID_PDE_SHIFT;
order = get_order(size * count);
pages = alloc_pages_node(info->iommu->node,
 GFP_ATOMIC | __GFP_ZERO,
@@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)

pasid_table->table = page_address(pages);
pasid_table->order = order;
-   pasid_table->max_pasid = count;
+   pasid_table->max_pasid = count << PASID_PDE_SHIFT;


are you sure that count is PDE_SHIFT aligned? otherwise >>
then << would lose some bits. If sure, then better add some check.


I am making the max_pasid PDE_SHIFT aligned as the result of shift
operations.





  attach_out:
device_attach_pasid_table(info, pasid_table);
@@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
return 0;
  }

+/* Get PRESENT bit of a PASID directory entry. */
+static inline bool
+pasid_pde_is_present(struct pasid_dir_entry *pde)
+{
+   return READ_ONCE(pde->val) & PASID_PTE_PRESENT;


curious why adding READ_ONCE specifically for PASID structure,
but not used for any other existing vtd structures? Is it to address
some specific requirement on PASID structure as defined in spec?


READ/WRITE_ONCE are used in pasid entry read/write to prevent the
compiler from merging, refetching or 

RE: [PATCH v2 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> Intel vt-d spec rev3.0 requires software to use 256-bit
> descriptors in invalidation queue. As the spec reads in
> section 6.5.2:
> 
> Remapping hardware supporting Scalable Mode Translations
> (ECAP_REG.SMTS=1) allow software to additionally program
> the width of the descriptors (128-bits or 256-bits) that
> will be written into the Queue. Software should setup the
> Invalidation Queue for 256-bit descriptors before
> programming remapping hardware for scalable-mode translation as
> 128-bit descriptors are treated as invalid descriptors
> (see Table 21 in Section 6.5.2.10) in scalable-mode.
> 
> This patch adds 256-bit invalidation descriptor support
> if the hardware presents scalable mode capability.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/dmar.c| 83 +++--
>  drivers/iommu/intel-svm.c   | 76 --
>  drivers/iommu/intel_irq_remapping.c |  6 ++-
>  include/linux/intel-iommu.h |  7 ++-
>  4 files changed, 113 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9c748b6f9e4..b1429fa2cf29 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>   int head, tail;
>   struct q_inval *qi = iommu->qi;
>   int wait_index = (index + 1) % QI_LENGTH;
> + int shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);

could add a new macro: qi_shift()

> 
>   if (qi->desc_status[wait_index] == QI_ABORT)
>   return -EAGAIN;
> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>*/
>   if (fault & DMA_FSTS_IQE) {
>   head = readl(iommu->reg + DMAR_IQH_REG);
> - if ((head >> DMAR_IQ_SHIFT) == index) {
> + if ((head >> shift) == index) {

could be another macro: qi_index(head)

> + struct qi_desc *desc = qi->desc + head;
> +
>   pr_err("VT-d detected invalid descriptor: "
>   "low=%llx, high=%llx\n",
> - (unsigned long long)qi->desc[index].low,
> - (unsigned long long)qi->desc[index].high);
> - memcpy(&qi->desc[index], &qi->desc[wait_index],
> - sizeof(struct qi_desc));
> + (unsigned long long)desc->qw0,
> + (unsigned long long)desc->qw1);

what about qw2 and qw3 in 256-bit case?

> + memcpy(desc, qi->desc + (wait_index << shift),
> +1 << shift);
>   writel(DMA_FSTS_IQE, iommu->reg +
> DMAR_FSTS_REG);
>   return -EINVAL;
>   }
> @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>*/
>   if (fault & DMA_FSTS_ITE) {
>   head = readl(iommu->reg + DMAR_IQH_REG);
> - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
> QI_LENGTH;
> + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>   head |= 1;
>   tail = readl(iommu->reg + DMAR_IQT_REG);
> - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
> QI_LENGTH;
> + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> 
>   writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
> 
> @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
>  {
>   int rc;
>   struct q_inval *qi = iommu->qi;
> - struct qi_desc *hw, wait_desc;
> + int offset, shift, length;
> + struct qi_desc wait_desc;
>   int wait_index, index;
>   unsigned long flags;
> 
>   if (!qi)
>   return 0;
> 
> - hw = qi->desc;
> -
>  restart:
>   rc = 0;
> 
> @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
> 
>   index = qi->free_head;
>   wait_index = (index + 1) % QI_LENGTH;
> + shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
> + length = 1 << shift;
> 
>   qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
> 
> - hw[index] = *desc;
> -
> - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
> + offset = index << shift;
> + memcpy(qi->desc + offset, desc, length);
> + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>   QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> - wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw2 = 0;
> + wait_desc.qw3 = 0;
> 
> - hw[wait_index] = wait_desc;
> + offset = wait_index << shift;
> 

Re: [PATCH v2 01/12] iommu/vt-d: Enumerate the scalable mode capability

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 09:55 AM, Tian, Kevin wrote:

From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, August 30, 2018 9:35 AM

The Intel vt-d spec rev3.0 introduces a new translation
mode called scalable mode, which enables PASID-granular
translations for first level, second level, nested and
pass-through modes. At the same time, the previous
Extended Context (ECS) mode is deprecated (no production
ever implements ECS).

This patch adds enumeration for Scalable Mode and removes
the deprecated ECS enumeration. It provides a boot time
option to disable scalable mode even if hardware claims to
support it.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 


Reviewed-by: Kevin Tian 



Thank you, Kevin.

Best regards,
Lu Baolu


RE: [PATCH v2 03/12] iommu/vt-d: Move page table helpers into header

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> So that they could also be used in other source files.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel-iommu.c | 43 -
>  include/linux/intel-iommu.h | 43
> +
>  2 files changed, 43 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b0da4f765274..93cde957adc7 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -315,49 +315,6 @@ static inline void context_clear_entry(struct
> context_entry *context)
>   context->hi = 0;
>  }
> 
> -/*
> - * 0: readable
> - * 1: writable
> - * 2-6: reserved
> - * 7: super page
> - * 8-10: available
> - * 11: snoop behavior
> - * 12-63: Host physical address
> - */
> -struct dma_pte {
> - u64 val;
> -};
> -
> -static inline void dma_clear_pte(struct dma_pte *pte)
> -{
> - pte->val = 0;
> -}
> -
> -static inline u64 dma_pte_addr(struct dma_pte *pte)
> -{
> -#ifdef CONFIG_64BIT
> - return pte->val & VTD_PAGE_MASK;
> -#else
> - /* Must have a full atomic 64-bit read */
> - return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> -#endif
> -}
> -
> -static inline bool dma_pte_present(struct dma_pte *pte)
> -{
> - return (pte->val & 3) != 0;
> -}
> -
> -static inline bool dma_pte_superpage(struct dma_pte *pte)
> -{
> - return (pte->val & DMA_PTE_LARGE_PAGE);
> -}
> -
> -static inline int first_pte_in_page(struct dma_pte *pte)
> -{
> - return !((unsigned long)pte & ~VTD_PAGE_MASK);
> -}
> -
>  /*
>   * This domain is a statically identity mapping domain.
>   *   1. This domain creates a static 1:1 mapping to all usable memory.
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2173ae35f1dc..41791903a5e3 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
>   clflush_cache_range(addr, size);
>  }
> 
> +/*
> + * 0: readable
> + * 1: writable
> + * 2-6: reserved
> + * 7: super page
> + * 8-10: available
> + * 11: snoop behavior
> + * 12-63: Host physical address
> + */
> +struct dma_pte {
> + u64 val;
> +};
> +
> +static inline void dma_clear_pte(struct dma_pte *pte)
> +{
> + pte->val = 0;
> +}
> +
> +static inline u64 dma_pte_addr(struct dma_pte *pte)
> +{
> +#ifdef CONFIG_64BIT
> + return pte->val & VTD_PAGE_MASK;
> +#else
> + /* Must have a full atomic 64-bit read */
> + return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> +#endif
> +}
> +
> +static inline bool dma_pte_present(struct dma_pte *pte)
> +{
> + return (pte->val & 3) != 0;
> +}
> +
> +static inline bool dma_pte_superpage(struct dma_pte *pte)
> +{
> + return (pte->val & DMA_PTE_LARGE_PAGE);
> +}
> +
> +static inline int first_pte_in_page(struct dma_pte *pte)
> +{
> + return !((unsigned long)pte & ~VTD_PAGE_MASK);
> +}
> +
>  extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct
> pci_dev *dev);
>  extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
> 
> --
> 2.17.1



RE: [PATCH v2 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> In scalable mode, pasid structure is a two level table with
> a pasid directory table and a pasid table. Any pasid entry
> can be identified by a pasid value in the following way.
> 
>       19                 6 5        0
>        .------------------.----------.
>        |    PASID[19:6]   |PASID[5:0]|
>        '------------------'----------'
>                 |               |
>                 | (directory    | (table
>                 |  index)       |  index)
>                 v               v
>   .---------.   .-----------.      .-------------.
>   | Context |-->| DIR Entry |----->| PASID Entry |
>   |  Entry  |   '-----------'      '-------------'
>   '---------'   (PASID directory)  (PASID table)
> 
> This changes the pasid table APIs to support scalable mode
> PASID directory and PASID table. It also adds a helper to
> get the PASID table entry according to the pasid value.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 
> ---
>  drivers/iommu/intel-iommu.c |  2 +-
>  drivers/iommu/intel-pasid.c | 72 
> -
>  drivers/iommu/intel-pasid.h | 10 +-
>  drivers/iommu/intel-svm.c   |  6 +---
>  4 files changed, 74 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5845edf4dcf9..b0da4f765274 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2507,7 +2507,7 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   if (dev)
>   dev->archdata.iommu = info;
> 
> - if (dev && dev_is_pci(dev) && info->pasid_supported) {
> + if (dev && dev_is_pci(dev) && sm_supported(iommu)) {

It's worth adding a comment here that the PASID table is now mandatory in
scalable mode, instead of optional for first-level usage as before.

>   ret = intel_pasid_alloc_table(dev);
>   if (ret) {
>   __dmar_remove_one_dev_info(info);
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index fe95c9bd4d33..d6e90cd5b062 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
>   int ret, order;
> 
>   info = dev->archdata.iommu;
> - if (WARN_ON(!info || !dev_is_pci(dev) ||
> - !info->pasid_supported || info->pasid_table))
> + if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
>   return -EINVAL;

Following the same logic, should you check sm_supported() here?

> 
>   /* DMA alias device already has a pasid table, use it: */
> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
>   return -ENOMEM;
> + INIT_LIST_HEAD(&pasid_table->dev);
> 
> - size = sizeof(struct pasid_entry);
> + size = sizeof(struct pasid_dir_entry);
>   count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> intel_pasid_max_id);
> + count >>= PASID_PDE_SHIFT;
>   order = get_order(size * count);
>   pages = alloc_pages_node(info->iommu->node,
>GFP_ATOMIC | __GFP_ZERO,
> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
> 
>   pasid_table->table = page_address(pages);
>   pasid_table->order = order;
> - pasid_table->max_pasid = count;
> + pasid_table->max_pasid = count << PASID_PDE_SHIFT;

Are you sure that count is PASID_PDE_SHIFT aligned? Otherwise the >>
followed by << would lose some bits. If so, better to add a check.

> 
>  attach_out:
>   device_attach_pasid_table(info, pasid_table);
> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
>   return 0;
>  }
> 
> +/* Get PRESENT bit of a PASID directory entry. */
> +static inline bool
> +pasid_pde_is_present(struct pasid_dir_entry *pde)
> +{
> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT;

Curious why READ_ONCE is added specifically for the PASID structures
but not for any other existing VT-d structures. Is it to address
some specific requirement on the PASID structure as defined in the spec?

> +}
> +
> +/* Get PASID table from a PASID directory entry. */
> +static inline struct pasid_entry *
> +get_pasid_table_from_pde(struct pasid_dir_entry *pde)
> +{
> + if (!pasid_pde_is_present(pde))
> + return NULL;
> +
> + return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
> +}
> +
>  void intel_pasid_free_table(struct 

RE: [PATCH v2 01/12] iommu/vt-d: Enumerate the scalable mode capability

2018-09-05 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
> 
> The Intel vt-d spec rev3.0 introduces a new translation
> mode called scalable mode, which enables PASID-granular
> translations for first level, second level, nested and
> pass-through modes. At the same time, the previous
> Extended Context (ECS) mode is deprecated (no production
> hardware ever implemented ECS).
> 
> This patch adds enumeration for Scalable Mode and removes
> the deprecated ECS enumeration. It provides a boot time
> option to disable scalable mode even if the hardware claims
> to support it.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 

Reviewed-by: Kevin Tian 


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 03:15 AM, Alex Williamson wrote:

On Wed, 5 Sep 2018 03:01:39 +
"Tian, Kevin"  wrote:


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, August 30, 2018 12:09 PM
   

[...]


In order to distinguish the IOMMU-capable mediated devices from those
which still need to rely on parent devices, this patch set adds a
domain type attribute to each mdev.

enum mdev_domain_type {
DOMAIN_TYPE_NO_IOMMU,   /* Don't need any IOMMU support.
 * All isolation and protection
 * are handled by the parent
 * device driver with a device
 * specific mechanism.
 */
DOMAIN_TYPE_ATTACH_PARENT, /* IOMMU can isolate and
protect
* the mdev, and the isolation
* domain should be attached with
* the parent device.
*/
};
   


ATTACH_PARENT is not like a good counterpart to NO_IOMMU.


Please do not use NO_IOMMU, we already have a thing called
vfio-noiommu, enabled through CONFIG_VFIO_NOIOMMU and module parameter
enable_unsafe_noiommu_mode.  This is much, much too similar and will
generate confusion.


Sure. Will remove this confusion.




what about DOMAIN_TYPE_NO_IOMMU/DOMAIN_TYPE_IOMMU? whether
to attach parent device is just internal logic.

Alternatively DOMAIN_TYPE_SOFTWARE/DOMAIN_TYPE_HARDWARE,
where software means iommu_domain is managed by software while
the other means managed by hardware.


I haven't gotten deep enough into the series to see how it's used, but
my gut reaction is that we don't need an enum, we just need some sort
of pointer on the mdev that points to an iommu_parent, which indicates
the root of our IOMMU based isolation, or is NULL, which indicates we
use vendor defined isolation as we have now.


It works as long as we can distinguish between IOMMU-based isolation
and vendor-defined isolation.

How about making iommu_parent point to the device structure that
created the mdev? If this pointer is not NULL, we bind the domain
to the device it points to; otherwise, we handle it in the
vendor-defined way.

Best regards,
Lu Baolu




One side note to Alex - with multiple domain extension in IOMMU layer,
this version combines IOMMU-capable usages in VFIO: PASID-based (as
in scalable iov) and RID-based (as the usage of mdev wrapper on any
device). Both cases share the common path - just binding the domain to the
parent device of mdev. IOMMU layer will handle two cases differently later.


Good, I'm glad you've considered the regular (RID) IOMMU domain and not
just the new aux domain.  Thanks,

Alex




Re: [RFC PATCH v2 03/10] iommu/amd: Add default branch in amd_iommu_capable()

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 03:37 AM, Alex Williamson wrote:

On Thu, 30 Aug 2018 12:09:15 +0800
Lu Baolu  wrote:


Otherwise, there will be a build warning:

drivers/iommu/amd_iommu.c:3083:2: warning: enumeration value
'IOMMU_CAP_AUX_DOMAIN' not handled in switch [-Wswitch]

There is no functional change.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/amd_iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4e04fff23977..237ae6db4cfd 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3077,6 +3077,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
return (irq_remapping_enabled == 1);
case IOMMU_CAP_NOEXEC:
return false;
+   default:
+   break;
}
  
  	return false;


Seems like a bug fix that doesn't need to be part of this RFC, send it
separately.  Thanks,


Sure.

Best regards,
Lu Baolu


Re: [RFC PATCH v2 02/10] iommu/vt-d: Add multiple domains per device query

2018-09-05 Thread Lu Baolu

Hi,

On 09/06/2018 03:35 AM, Alex Williamson wrote:

On Thu, 30 Aug 2018 12:09:14 +0800
Lu Baolu  wrote:


Add the response to IOMMU_CAP_AUX_DOMAIN capability query
through iommu_capable(). Return true if IOMMUs support the
scalable mode, return false otherwise.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 31 +--
  1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3e49d4029058..891ae70e7bf2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5193,12 +5193,39 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
iommu_domain *domain,
return phys;
  }
  
+static inline bool scalable_mode_support(void)

+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = true;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!sm_supported(iommu)) {
+   ret = false;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return ret;
+}
+
  static bool intel_iommu_capable(enum iommu_cap cap)
  {
-   if (cap == IOMMU_CAP_CACHE_COHERENCY)
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
return domain_update_iommu_snooping(NULL) == 1;
-   if (cap == IOMMU_CAP_INTR_REMAP)
+   case IOMMU_CAP_INTR_REMAP:
return irq_remapping_enabled == 1;
+   case IOMMU_CAP_AUX_DOMAIN:
+   return scalable_mode_support();
+   case IOMMU_CAP_NOEXEC:
+   /* PASSTHROUGH */
+   default:
+   pr_info("Unsupported capability query %d\n", cap);
+   break;


Please don't do this, there's no reason to be noisy about a query of a
capability that VT-d doesn't know about.  We implement capabilities
exactly so that relevant drivers can expose a feature and others can
happily (and quietly) ignore them.  Thanks,


Fair enough. I will remove it in the next version.

Best regards,
Lu Baolu


[PATCH 19/21] iommu: fsl_pamu: use for_each_of_cpu_node iterator

2018-09-05 Thread Rob Herring
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".

Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Rob Herring 
---
Please ack and I will take via the DT tree. This is dependent on the
first 2 patches.

 drivers/iommu/fsl_pamu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index 8540625796a1..1b955aea44dd 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -543,7 +543,7 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)
return ~(u32)0;
}

-   for_each_node_by_type(node, "cpu") {
+   for_each_of_cpu_node(node) {
prop = of_get_property(node, "reg", &len);
for (i = 0; i < len / sizeof(u32); i++) {
if (be32_to_cpup(&prop[i]) == vcpu) {
--
2.17.1


[PATCH 00/21] DT cpu node iterator

2018-09-05 Thread Rob Herring
This series adds an iterator for cpu nodes and converts users over to use
it or of_get_cpu_node in some cases. This allows us to remove the
dependency on device_type property for cpu nodes though removing that
from DTS files will have to wait for some time. In some cases, this makes
the DT search more strict by only looking in /cpus child nodes rather
than any node with the device_type == cpu. The iterator also honors the
status property which is often forgotten.

I've only tested on ARM under QEMU and compiled powerpc.

Rob

Rob Herring (21):
  of: Add cpu node iterator for_each_of_cpu_node()
  of: Support matching cpu nodes with no 'reg' property
  ARM: use for_each_of_cpu_node iterator
  ARM: topology: remove unneeded check for /cpus node
  ARM: shmobile: use for_each_of_cpu_node iterator
  arm64: use for_each_of_cpu_node iterator
  c6x: use for_each_of_cpu_node iterator
  microblaze: get cpu node with of_get_cpu_node
  nios2: get cpu node with of_get_cpu_node
  openrisc: use for_each_of_cpu_node iterator
  powerpc: use for_each_of_cpu_node iterator
  powerpc: 4xx: get cpu node with of_get_cpu_node
  powerpc: 8xx: get cpu node with of_get_cpu_node
  riscv: use for_each_of_cpu_node iterator
  SH: use for_each_of_cpu_node iterator
  x86: DT: use for_each_of_cpu_node iterator
  clk: mvebu: use for_each_of_cpu_node iterator
  edac: cpc925: use for_each_of_cpu_node iterator
  iommu: fsl_pamu: use for_each_of_cpu_node iterator
  of: use for_each_of_cpu_node iterator
  fbdev: fsl-diu: get cpu node with of_get_cpu_node

 arch/arm/kernel/devtree.c |  5 +--
 arch/arm/kernel/topology.c|  6 ---
 arch/arm/mach-shmobile/pm-rcar-gen2.c |  8 +---
 arch/arm/mach-shmobile/pm-rmobile.c   |  2 +-
 arch/arm/mach-shmobile/timer.c| 10 +
 arch/arm64/kernel/smp.c   |  2 +-
 arch/c6x/kernel/setup.c   | 11 ++---
 arch/microblaze/kernel/cpu/cpuinfo.c  |  4 +-
 arch/nios2/kernel/cpuinfo.c   |  4 +-
 arch/openrisc/kernel/setup.c  |  3 +-
 arch/powerpc/platforms/4xx/soc.c  |  2 +-
 arch/powerpc/platforms/8xx/m8xx_setup.c   |  5 ++-
 arch/powerpc/platforms/powermac/feature.c | 51 ---
 arch/powerpc/platforms/powermac/setup.c   | 15 +++
 arch/riscv/kernel/smpboot.c   |  2 +-
 arch/sh/boards/of-generic.c   |  2 +-
 arch/x86/kernel/devicetree.c  |  2 +-
 drivers/clk/mvebu/clk-cpu.c   |  4 +-
 drivers/edac/cpc925_edac.c| 20 +
 drivers/iommu/fsl_pamu.c  |  2 +-
 drivers/of/base.c | 43 ++-
 drivers/of/of_numa.c  | 15 +--
 drivers/video/fbdev/fsl-diu-fb.c  |  2 +-
 include/linux/of.h| 11 +
 24 files changed, 111 insertions(+), 120 deletions(-)

--
2.17.1


Re: [RFC PATCH v2 03/10] iommu/amd: Add default branch in amd_iommu_capable()

2018-09-05 Thread Alex Williamson
On Thu, 30 Aug 2018 12:09:15 +0800
Lu Baolu  wrote:

> Otherwise, there will be a build warning:
> 
> drivers/iommu/amd_iommu.c:3083:2: warning: enumeration value
> 'IOMMU_CAP_AUX_DOMAIN' not handled in switch [-Wswitch]
> 
> There is no functional change.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/amd_iommu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 4e04fff23977..237ae6db4cfd 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3077,6 +3077,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
>   return (irq_remapping_enabled == 1);
>   case IOMMU_CAP_NOEXEC:
>   return false;
> + default:
> + break;
>   }
>  
>   return false;

Seems like a bug fix that doesn't need to be part of this RFC, send it
separately.  Thanks,

Alex


Re: [RFC PATCH v2 02/10] iommu/vt-d: Add multiple domains per device query

2018-09-05 Thread Alex Williamson
On Thu, 30 Aug 2018 12:09:14 +0800
Lu Baolu  wrote:

> Add the response to IOMMU_CAP_AUX_DOMAIN capability query
> through iommu_capable(). Return true if IOMMUs support the
> scalable mode, return false otherwise.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 31 +--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3e49d4029058..891ae70e7bf2 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5193,12 +5193,39 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
> iommu_domain *domain,
>   return phys;
>  }
>  
> +static inline bool scalable_mode_support(void)
> +{
> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> + bool ret = true;
> +
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd) {
> + if (!sm_supported(iommu)) {
> + ret = false;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
>  static bool intel_iommu_capable(enum iommu_cap cap)
>  {
> - if (cap == IOMMU_CAP_CACHE_COHERENCY)
> + switch (cap) {
> + case IOMMU_CAP_CACHE_COHERENCY:
>   return domain_update_iommu_snooping(NULL) == 1;
> - if (cap == IOMMU_CAP_INTR_REMAP)
> + case IOMMU_CAP_INTR_REMAP:
>   return irq_remapping_enabled == 1;
> + case IOMMU_CAP_AUX_DOMAIN:
> + return scalable_mode_support();
> + case IOMMU_CAP_NOEXEC:
> + /* PASSTHROUGH */
> + default:
> + pr_info("Unsupported capability query %d\n", cap);
> + break;

Please don't do this, there's no reason to be noisy about a query of a
capability that VT-d doesn't know about.  We implement capabilities
exactly so that relevant drivers can expose a feature and others can
happily (and quietly) ignore them.  Thanks,

Alex


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-05 Thread Alex Williamson
On Wed, 5 Sep 2018 03:01:39 +
"Tian, Kevin"  wrote:

> > From: Lu Baolu [mailto:baolu...@linux.intel.com]
> > Sent: Thursday, August 30, 2018 12:09 PM
> >   
> [...]
> > 
> > In order to distinguish the IOMMU-capable mediated devices from those
> > which still need to rely on parent devices, this patch set adds a
> > domain type attribute to each mdev.
> > 
> > enum mdev_domain_type {
> > DOMAIN_TYPE_NO_IOMMU,   /* Don't need any IOMMU support.
> >  * All isolation and protection
> >  * are handled by the parent
> >  * device driver with a device
> >  * specific mechanism.
> >  */
> > DOMAIN_TYPE_ATTACH_PARENT, /* IOMMU can isolate and
> > protect
> > * the mdev, and the isolation
> > * domain should be attached with
> > * the parent device.
> > */
> > };
> >   
> 
> ATTACH_PARENT is not like a good counterpart to NO_IOMMU.

Please do not use NO_IOMMU, we already have a thing called
vfio-noiommu, enabled through CONFIG_VFIO_NOIOMMU and module parameter
enable_unsafe_noiommu_mode.  This is much, much too similar and will
generate confusion.

> what about DOMAIN_TYPE_NO_IOMMU/DOMAIN_TYPE_IOMMU? whether
> to attach parent device is just internal logic.
> 
> Alternatively DOMAIN_TYPE_SOFTWARE/DOMAIN_TYPE_HARDWARE,
> where software means iommu_domain is managed by software while
> the other means managed by hardware.

I haven't gotten deep enough into the series to see how it's used, but
my gut reaction is that we don't need an enum, we just need some sort
of pointer on the mdev that points to an iommu_parent, which indicates
the root of our IOMMU based isolation, or is NULL, which indicates we
use vendor defined isolation as we have now.

> One side note to Alex - with multiple domain extension in IOMMU layer,
> this version combines IOMMU-capable usages in VFIO: PASID-based (as 
> in scalable iov) and RID-based (as the usage of mdev wrapper on any 
> device). Both cases share the common path - just binding the domain to the
> parent device of mdev. IOMMU layer will handle two cases differently later. 

Good, I'm glad you've considered the regular (RID) IOMMU domain and not
just the new aux domain.  Thanks,

Alex


Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces

2018-09-05 Thread Jacob Pan
On Wed, 5 Sep 2018 14:14:12 +0200
Auger Eric  wrote:

> > + *
> > + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used
> > to hold
> > + * non-PASID translations. In this case PASID 0 is reserved and
> > entry 0 points
> > + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base
> > would be held in
> > + * the device table and PASID 0 would be available to the
> > allocator.
> > + */  
> very nice explanation
With the new VT-d 3.0 spec, the second-level IO page table base is no longer
held in the device context table. Instead it is held in the PASID table
entry pointed to by the RID_PASID field in the device context entry. If
RID_PASID = 0, then it is the same as the ARM and AMD IOMMUs.
You can refer to ch. 3.4.3 of the VT-d spec.


Re: [PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask

2018-09-05 Thread lijiang
On 2018-09-05 14:46, Dave Young wrote:
> [snip]
>>
>> As previously mentioned, there are also many differences between kexec and 
>> kdump. In general,
>> kexec needs to look at all of available physical memory, but kdump doesn't 
>> need.
>>
>> For kexec, kexec-tools will read /sys/firmware/memmap and recreate the e820 
>> ranges for the 2nd
>> kernel. If it fails, will use /proc/iomem.
>>
>> For kdump, kexec-tools will read /proc/iomem and recreate the e820 ranges 
>> for kdump kernel.
>> BTW: we can not get the range of persistent memory from /proc/iomem. So e820 
>> ranges don't contain
>> the persistent memory in kdump kernel, this is the real reason why i need to 
>> strengthen the logic
>> of adjusting memory encryption mask.
> 
> "persistent memory" is different, I think you meant about some reserved
> memory instead
> 
>>
>> If kexec-tools also use /sys/firmware/memmap for kdump(like kexec), kdump 
>> kernel can also work
>> without a fix, but the kexec-tools will have to be modified. Are you sure 
>> that you want me to
>> fix kexec-tools instead of kernel?
> 
> Yes, please fix kexec-tools to pass reserved ranges in e820, you will
> not need this patch then.
> 

This might be a kexec-tools bug; I have posted a patch for kexec-tools (please
check ke...@lists.infradead.org).
Thanks.

> Thanks
> Dave
> 

Re: [PATCH v2 04/40] iommu/sva: Add a mm_exit callback for device drivers

2018-09-05 Thread Auger Eric
Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> When an mm exits, devices that were bound to it must stop performing DMA
> on its PASID. Let device drivers register a callback to be notified on mm
> exit. Add the callback to the sva_param structure attached to struct
> device.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2: use iommu_sva_device_init instead of a separate function
> ---
>  drivers/iommu/iommu-sva.c | 11 ++-
>  include/linux/iommu.h |  9 +++--
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 6ac679c48f3c..0700893c679d 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -303,6 +303,7 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>   * @dev: the device
>   * @features: bitmask of features that need to be initialized
>   * @max_pasid: max PASID value supported by the device
> + * @mm_exit: callback to notify the device driver of an mm exiting
>   *
>   * Users of the bind()/unbind() API must call this function to initialize all
>   * features required for SVA.
> @@ -313,13 +314,20 @@ static void io_mm_detach_locked(struct iommu_bond *bond)
>   * description. Setting @max_pasid to a non-zero value smaller than this 
> limit
>   * overrides it.
>   *
> + * If the driver intends to share process address spaces, it should pass a 
> valid
> + * @mm_exit handler. Otherwise @mm_exit can be NULL.
I don't get the case where mm_exit is allowed to be NULL.

Thanks

Eric
 After @mm_exit returns, the
> + * device must not issue any more transaction with the PASID given as 
> argument.
> + * The handler gets an opaque pointer corresponding to the drvdata passed as
> + * argument of bind().
> + *
>   * The device should not be performing any DMA while this function is 
> running,
>   * otherwise the behavior is undefined.
>   *
>   * Return 0 if initialization succeeded, or an error.
>   */
>  int iommu_sva_device_init(struct device *dev, unsigned long features,
> -   unsigned int max_pasid)
> +   unsigned int max_pasid,
> +   iommu_mm_exit_handler_t mm_exit)
>  {
>   int ret;
>   struct iommu_sva_param *param;
> @@ -337,6 +345,7 @@ int iommu_sva_device_init(struct device *dev, unsigned 
> long features,
>  
>   param->features = features;
>   param->max_pasid= max_pasid;
> + param->mm_exit  = mm_exit;
> + INIT_LIST_HEAD(&param->mm_list);
>  
>   /*
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d5f21719a5a0..439c8fffd836 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -61,6 +61,7 @@ struct iommu_fault_event;
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
>  typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
> +typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void 
> *);
>  
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be mapped*/
> @@ -231,6 +232,7 @@ struct iommu_sva_param {
>   unsigned int min_pasid;
>   unsigned int max_pasid;
>   struct list_head mm_list;
> + iommu_mm_exit_handler_t mm_exit;
>  };
>  
>  /**
> @@ -980,17 +982,20 @@ static inline int iommu_sva_unbind_device(struct device 
> *dev, int pasid)
>  
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long features,
> -  unsigned int max_pasid);
> +  unsigned int max_pasid,
> +  iommu_mm_exit_handler_t mm_exit);
>  extern int iommu_sva_device_shutdown(struct device *dev);
>  extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
>  int *pasid, unsigned long flags,
>  void *drvdata);
>  extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
>  extern void __iommu_sva_unbind_dev_all(struct device *dev);
> +
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>   unsigned long features,
> - unsigned int max_pasid)
> + unsigned int max_pasid,
> + iommu_mm_exit_handler_t mm_exit)
>  {
>   return -ENODEV;
>  }
> 


Re: [PATCH v2 03/40] iommu/sva: Manage process address spaces

2018-09-05 Thread Auger Eric
Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Allocate IOMMU mm structures and binding them to devices. Four operations
s/binding/bind
> are added to IOMMU drivers:
> 
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
> 
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
> 
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
> 
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Other can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
> 
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
> 
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
> 
> mm_attach and mm_detach operations are serialized with a spinlock. When
> trying to optimize this code, we should at least prevent concurrent
> attach()/detach() on the same domain (so multi-level PASID table code can
> allocate tables lazily). mm_alloc() can sleep, but mm_free must not
> (because we'll have to call it from call_srcu later on).
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
> 
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2: sanity-check of flags
> ---
>  drivers/iommu/iommu-sva.c | 380 +-
>  drivers/iommu/iommu.c |   1 +
>  include/linux/iommu.h |  28 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8d98f9c09864..6ac679c48f3c 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -5,8 +5,298 @@
>   * Copyright (C) 2018 ARM Ltd.
>   */
>  
> +#include 
>  #include 
> +#include 
>  #include 
> +#include 
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and 
> IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm 
> and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *  ___
> + * |  IOMMU domain A   |
> + * |   |
> + * | |  IOMMU group   |+--- io_pgtables
> + * | |||
> + * | |   dev 00:00.0 +--- bond --- io_mm X
> + * | ||   \|
> + * |   '- bond ---.
> + * |___|   \
> + *  ___ \
> + * |  IOMMU domain B   |   io_mm Y
> + * |   |   / /
> + * | |  IOMMU group   ||  / /
> + * | ||| / /
> + * | |   dev 00:01.0  bond -' /
> + * | |   dev 00:01.1  bond --'
> + * | |||
> + * |   +--- io_pgtables
> + * |___|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in 
> domain
> + * B. All devices within the same domain access the same address spaces. 
> Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an 
> mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + * iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + * 

Re: [PATCH v2 02/40] iommu/sva: Bind process address spaces to devices

2018-09-05 Thread Auger Eric
Hi Jean-Philippe,

On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Add bind() and unbind() operations to the IOMMU API. Bind() returns a
> PASID that drivers can program in hardware, to let their devices access an
> mm. This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> IOMMU groups with more than one device aren't supported for SVA at the
> moment. There may be P2P traffic between devices within a group, which
> cannot be seen by an IOMMU (note that supporting PASID doesn't add any
> form of isolation with regard to P2P). Supporting groups would require
> calling bind() for all bound processes every time a device is added to a
> group, to perform sanity checks (e.g. ensure that new devices support
> PASIDs at least as big as those already allocated in the group). It also
> means making sure that reserved ranges (IOMMU_RESV_*) of all devices are
> carved out of processes. This is already tricky with single devices, but
> becomes very difficult with groups. Since SVA-capable devices are expected
> to be cleanly isolated, and since we don't have any way to test groups or
> hot-plug, we only allow singular groups for now.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2: remove iommu_sva_bind/unbind_group
> ---
>  drivers/iommu/iommu-sva.c | 27 +
>  drivers/iommu/iommu.c | 83 +++
>  include/linux/iommu.h | 37 +
>  3 files changed, 147 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 8b4afb7c63ae..8d98f9c09864 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -93,6 +93,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>   if (!domain)
>   return -ENODEV;
>  
> + __iommu_sva_unbind_dev_all(dev);
> +
> + mutex_lock(&dev->iommu_param->lock);
>   param = dev->iommu_param->sva_param;
>   dev->iommu_param->sva_param = NULL;
> @@ -108,3 +110,28 @@ int iommu_sva_device_shutdown(struct device *dev)
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> + int *pasid, unsigned long flags, void *drvdata)
> +{
> + return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
> +
> +int __iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> + return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
> +
> +/**
> + * __iommu_sva_unbind_dev_all() - Detach all address spaces from this device
> + * @dev: the device
> + *
> + * When detaching @device from a domain, IOMMU drivers should use this 
> helper.
> + */
> +void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> + /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9e28d88c8074..bd2819deae5b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2261,3 +2261,86 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, 
> int num_ids)
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access the 
> mm
> + * using the returned PASID. If unbind() isn't called first, a subsequent 
> bind()
> + * for the same device and mm fails with -EEXIST.
> + *
> + * iommu_sva_device_init() must be called first, to initialize the required 
> SVA
> + * features. @flags is a subset of these features.
@flags must be a subset of these features?
> + *
> + * The caller must pin down using get_user_pages*() all mappings shared with 
> the
> + * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
> + * (e.g. copy-on-write).
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an 
> error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int 
> *pasid,
> +   unsigned long flags, void *drvdata)
> +{
> + int ret = -EINVAL;
> + struct iommu_group *group;
> +
> + if (!pasid)
> + return -EINVAL;
> +
> + group = iommu_group_get(dev);
> + if (!group)
> + return -ENODEV;
> +
> + /* Ensure device count and domain don't change while we're binding */
> + mutex_lock(&group->mutex);
> + if (iommu_group_device_count(group) != 1)
> + goto out_unlock;
don't you want to check flags is a subset of
dev->iommu_param->sva_param->features?
> +
> + ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-05 Thread Auger Eric
Hi Jean-Philippe,
On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
> process address spaces to devices. This requires the IOMMU to support page
> table format and features compatible with the CPUs, and usually requires
> the system to support I/O Page Faults (IOPF) and Process Address Space ID
> (PASID). When all of these are available, DMA can access virtual addresses
> of a process. A PASID is allocated for each process, and the device driver
> programs it into the device in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce two
> IOMMU operations, sva_device_init() and sva_device_shutdown(), that
> prepare the IOMMU driver for SVA. For example allocate PASID tables and
> fault queues. Subsequent patches will implement the bind() and unbind()
> operations.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings. Other feature bits that may be added in the
> future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
> spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
> space to a process.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2:
> * Add sva_param structure to iommu_param
> * CONFIG option is only selectable by IOMMU drivers
> ---
>  drivers/iommu/Kconfig |   4 ++
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva.c | 110 ++
>  include/linux/iommu.h |  32 +++
>  4 files changed, 147 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 7564237f788d..cca8e06903c7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,10 @@ config IOMMU_DMA
>   select IOMMU_IOVA
>   select NEED_SG_DMA_LENGTH
>  
> +config IOMMU_SVA
> + bool
> + select IOMMU_API
> +
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
>   depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index ..8b4afb7c63ae
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include 
> +#include 
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a 
> device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID). By 
> default
> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
> + * the device PASID width defined in the PCI capability or in the firmware
> + * description. Setting @max_pasid to a non-zero value smaller than this 
> limit
> + * overrides it.
> + *
> + * The device should not be performing any DMA while this function is 
> running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +   unsigned int max_pasid)
what about min_pasid?
> +{
> + int ret;
> + struct iommu_sva_param *param;
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> + if (!domain || !domain->ops->sva_device_init)
> + return -ENODEV;
> +
> + if (features)
> + return -EINVAL;
> +
> + param = kzalloc(sizeof(*param), GFP_KERNEL);
> + if (!param)
> + return -ENOMEM;
> +
> + param->features = features;
> + param->max_pasid= max_pasid;
> +
> + /*
> +  * IOMMU driver updates the limits depending on the IOMMU and device
> +  * capabilities.
> +  */
> + ret = domain->ops->sva_device_init(dev, param);
> + if (ret)
> + goto err_free_param;
So you are likely to call sva_device_init even if it was already called
(ie. dev->iommu_param->sva_param is 

Re: [PATCH 0/5] Qcom smmu-500 TLB invalidation errata for sdm845

2018-09-05 Thread Vivek Gautam




On 9/5/2018 3:34 PM, Rob Clark wrote:

On Wed, Sep 5, 2018 at 5:22 AM Vivek Gautam  wrote:


On 8/14/2018 5:54 PM, Vivek Gautam wrote:

Hi Will,


On 8/14/2018 5:10 PM, Will Deacon wrote:

Hi Vivek,

On Tue, Aug 14, 2018 at 04:25:23PM +0530, Vivek Gautam wrote:

Qcom's implementation of arm,mmu-500 on sdm845 has a
functional/performance
errata [1] because of which the TCU cache look ups are stalled during
invalidation cycle. This is mitigated by serializing all the
invalidation
requests coming to the smmu.

How does this implementation differ from the one supported by
qcom_iommu.c?
I notice you're adding firmware hooks here, which we avoided by
having the
extra driver. Please help me understand which devices exist, how they
differ, and which drivers are intended to support them!

IIRC, the qcom_iommu driver was intended to support the static context
bank - SID
mapping, and is very specific to the smmu-v2 version present on
msm8916 soc.
However, this is the qcom's mmu-500 implementation specific errata.
qcom_iommu
will not be able to support mmu-500 configurations.
Rob Clark can add more.
Let you know what you suggest.

Rob, can you please comment about how qcom-smmu driver has different
implementation
from arm-smmu driver?

sorry, I missed this thread earlier.  But yeah, as you mentioned, the
purpose for qcom_iommu.c was to deal with the static context/SID
mapping.

(I guess it is all just software, and we could make qcom_iommu.c
support dynamic mapping as well, but I think then it starts to
duplicate most of arm_smmu.c, so that doesn't seem like the right
direction)


Thanks Rob for the response. I will wait for Will's response on how he would
like this support to be implemented.

Best regards
Vivek


BR,
-R


Will, in case we would want to use arm-smmu driver, what would you
suggest for
having the firmware hooks?
Thanks.

Best regards
Vivek


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-05 Thread Jean-Philippe Brucker
On 05/09/2018 04:15, Xu Zaibo wrote:
>>>   1. While the series are finished well, VFIO-PCI device can be held
>>> by only one process
>>>   through binding IOCTL command without PASID (without PASID
>>> being exposed user space).
>> It could, but isn't supported at the moment. In addition to adding
>> support in the I/O page fault code, we'd also need to update the VFIO
>> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
>> The case you describe isn't compatible with MAP/UNMAP, since the process
>> manages the shared address space with mmap or malloc. We'd probably need
>> to introduce a new VFIO IOMMU type, in which case the bind could be
>> performed implicitly when the process does VFIO_SET_IOMMU. Then the
>> process wouldn't need to send an additional BIND IOCTL.
> ok. got it.  This is the legacy mode, so all the VFIO APIs are kept 
> unchanged?

Yes, existing VFIO semantics are preserved.

>>>   2. While using VFIO-PCI device to support multiple processes with
>>> SVA series, a primary
>>>   process with multiple secondary processes must be deployed just
>>> like DPDK(https://www.dpdk.org/).
>>>   And, the PASID still has to be exposed to user land.
>> Right. A third case, also implemented by this patch (and complete), is
>> the primary process simply doing a BIND for itself, and using the
>> returned PASID to share its own address space with the device.
>>
> ok. But I am worried that the solution of one primary process with
> several secondary ones is a little bit limited. Maybe users don't want to
> depend on the primary process. :)

I don't see a better way for vfio-pci, though. But more importantly, I
don't know of any users :) While the feature is great for testing new
hardware, and I've been using it for all kinds of stress testing, I
haven't received feedback from possible users in production settings
(DPDK etc) and can't speculate about what they'd prefer.

Thanks,
Jean


Re: [PATCH 0/5] Qcom smmu-500 TLB invalidation errata for sdm845

2018-09-05 Thread Rob Clark
On Wed, Sep 5, 2018 at 5:22 AM Vivek Gautam  wrote:
>
>
> On 8/14/2018 5:54 PM, Vivek Gautam wrote:
> > Hi Will,
> >
> >
> > On 8/14/2018 5:10 PM, Will Deacon wrote:
> >> Hi Vivek,
> >>
> >> On Tue, Aug 14, 2018 at 04:25:23PM +0530, Vivek Gautam wrote:
> >>> Qcom's implementation of arm,mmu-500 on sdm845 has a
> >>> functional/performance
> >>> errata [1] because of which the TCU cache look ups are stalled during
> >>> invalidation cycle. This is mitigated by serializing all the
> >>> invalidation
> >>> requests coming to the smmu.
> >> How does this implementation differ from the one supported by
> >> qcom_iommu.c?
> >> I notice you're adding firmware hooks here, which we avoided by
> >> having the
> >> extra driver. Please help me understand which devices exist, how they
> >> differ, and which drivers are intended to support them!
> >
> > IIRC, the qcom_iommu driver was intended to support the static context
> > bank - SID
> > mapping, and is very specific to the smmu-v2 version present on
> > msm8916 soc.
> > However, this is the qcom's mmu-500 implementation specific errata.
> > qcom_iommu
> > will not be able to support mmu-500 configurations.
> > Rob Clark can add more.
> > Let you know what you suggest.
>
> Rob, can you please comment about how qcom-smmu driver has different
> implementation
> from arm-smmu driver?

sorry, I missed this thread earlier.  But yeah, as you mentioned, the
purpose for qcom_iommu.c was to deal with the static context/SID
mapping.

(I guess it is all just software, and we could make qcom_iommu.c
support dynamic mapping as well, but I think then it starts to
duplicate most of arm_smmu.c, so that doesn't seem like the right
direction)

BR,
-R

> Will, in case we would want to use arm-smmu driver, what would you
> suggest for
> having the firmware hooks?
> Thanks.
>
> Best regards
> Vivek


Re: [PATCH 0/5] Qcom smmu-500 TLB invalidation errata for sdm845

2018-09-05 Thread Vivek Gautam



On 8/14/2018 5:54 PM, Vivek Gautam wrote:

Hi Will,


On 8/14/2018 5:10 PM, Will Deacon wrote:

Hi Vivek,

On Tue, Aug 14, 2018 at 04:25:23PM +0530, Vivek Gautam wrote:
Qcom's implementation of arm,mmu-500 on sdm845 has a 
functional/performance

errata [1] because of which the TCU cache look ups are stalled during
invalidation cycle. This is mitigated by serializing all the 
invalidation

requests coming to the smmu.
How does this implementation differ from the one supported by 
qcom_iommu.c?
I notice you're adding firmware hooks here, which we avoided by 
having the

extra driver. Please help me understand which devices exist, how they
differ, and which drivers are intended to support them!


IIRC, the qcom_iommu driver was intended to support the static context 
bank - SID
mapping, and is very specific to the smmu-v2 version present on 
msm8916 soc.
However, this is the qcom's mmu-500 implementation specific errata. 
qcom_iommu

will not be able to support mmu-500 configurations.
Rob Clark can add more.
Let you know what you suggest.


Rob, can you please comment about how qcom-smmu driver has different 
implementation

from arm-smmu driver?
Will, in case we would want to use arm-smmu driver, what would you 
suggest for

having the firmware hooks?
Thanks.

Best regards
Vivek




Also -- you didn't CC all the maintainers for the firmware bits, so 
adding

Andy here for that, and Rob for the previous question.


I added Andy to the series, would you want me to add Rob H also?

Best regards
Vivek



Thanks,

Will






Re: [PATCH 3/7] vfio: add sdmdev support

2018-09-05 Thread Dan Carpenter
Hi Kenneth,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on cryptodev/master]
[also build test WARNING on v4.19-rc2 next-20180905]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kenneth-Lee/A-General-Accelerator-Framework-WarpDrive/20180903-162733
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master

smatch warnings:
drivers/vfio/sdmdev/vfio_sdmdev.c:78 iommu_type_show() error: 'sdmdev' 
dereferencing possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:91 dma_flag_show() error: 'sdmdev' 
dereferencing possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:127 flags_show() error: 'sdmdev' 
dereferencing possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:128 name_show() error: 'sdmdev' dereferencing 
possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:130 device_api_show() error: 'sdmdev' 
dereferencing possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:138 available_instances_show() error: 
'sdmdev' dereferencing possible ERR_PTR()
drivers/vfio/sdmdev/vfio_sdmdev.c:178 vfio_sdmdev_mdev_remove() warn: if();

# 
https://github.com/0day-ci/linux/commit/1e47d5e608652b4a2c813dbeaf5aa6811f6ceaf7
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 1e47d5e608652b4a2c813dbeaf5aa6811f6ceaf7
vim +/sdmdev +78 drivers/vfio/sdmdev/vfio_sdmdev.c

1e47d5e6 Kenneth Lee 2018-09-03   69  
1e47d5e6 Kenneth Lee 2018-09-03   70  static ssize_t iommu_type_show(struct device *dev,
1e47d5e6 Kenneth Lee 2018-09-03   71  		struct device_attribute *attr, char *buf)
1e47d5e6 Kenneth Lee 2018-09-03   72  {
1e47d5e6 Kenneth Lee 2018-09-03   73  	struct vfio_sdmdev *sdmdev = vfio_sdmdev_pdev_sdmdev(dev);
                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Presumably this returns error pointers instead of NULL?

1e47d5e6 Kenneth Lee 2018-09-03   74  
1e47d5e6 Kenneth Lee 2018-09-03   75  	if (!sdmdev)
1e47d5e6 Kenneth Lee 2018-09-03   76  		return -ENODEV;
1e47d5e6 Kenneth Lee 2018-09-03   77  
1e47d5e6 Kenneth Lee 2018-09-03  @78  	return sprintf(buf, "%d\n", sdmdev->iommu_type);
1e47d5e6 Kenneth Lee 2018-09-03   79  }
1e47d5e6 Kenneth Lee 2018-09-03   80  
1e47d5e6 Kenneth Lee 2018-09-03   81  static DEVICE_ATTR_RO(iommu_type);
1e47d5e6 Kenneth Lee 2018-09-03   82  
1e47d5e6 Kenneth Lee 2018-09-03   83  static ssize_t dma_flag_show(struct device *dev,
1e47d5e6 Kenneth Lee 2018-09-03   84  		struct device_attribute *attr, char *buf)
1e47d5e6 Kenneth Lee 2018-09-03   85  {
1e47d5e6 Kenneth Lee 2018-09-03   86  	struct vfio_sdmdev *sdmdev = vfio_sdmdev_pdev_sdmdev(dev);
1e47d5e6 Kenneth Lee 2018-09-03   87  
1e47d5e6 Kenneth Lee 2018-09-03   88  	if (!sdmdev)
1e47d5e6 Kenneth Lee 2018-09-03   89  		return -ENODEV;
1e47d5e6 Kenneth Lee 2018-09-03   90  
1e47d5e6 Kenneth Lee 2018-09-03  @91  	return sprintf(buf, "%d\n", sdmdev->dma_flag);
1e47d5e6 Kenneth Lee 2018-09-03   92  }
1e47d5e6 Kenneth Lee 2018-09-03   93  
1e47d5e6 Kenneth Lee 2018-09-03   94  static DEVICE_ATTR_RO(dma_flag);
1e47d5e6 Kenneth Lee 2018-09-03   95  
1e47d5e6 Kenneth Lee 2018-09-03   96  /* mdev->dev_attr_groups */
1e47d5e6 Kenneth Lee 2018-09-03   97  static struct attribute *vfio_sdmdev_attrs[] = {
1e47d5e6 Kenneth Lee 2018-09-03   98  	&dev_attr_iommu_type.attr,
1e47d5e6 Kenneth Lee 2018-09-03   99  	&dev_attr_dma_flag.attr,
1e47d5e6 Kenneth Lee 2018-09-03  100  	NULL,
1e47d5e6 Kenneth Lee 2018-09-03  101  };
1e47d5e6 Kenneth Lee 2018-09-03  102  static const struct attribute_group vfio_sdmdev_group = {
1e47d5e6 Kenneth Lee 2018-09-03  103  	.name  = VFIO_SDMDEV_PDEV_ATTRS_GRP_NAME,
1e47d5e6 Kenneth Lee 2018-09-03  104  	.attrs = vfio_sdmdev_attrs,
1e47d5e6 Kenneth Lee 2018-09-03  105  };
1e47d5e6 Kenneth Lee 2018-09-03  106  const struct attribute_group *vfio_sdmdev_groups[] = {
1e47d5e6 Kenneth Lee 2018-09-03  107  	&vfio_sdmdev_group,
1e47d5e6 Kenneth Lee 2018-09-03  108  	NULL,
1e47d5e6 Kenneth Lee 2018-09-03  109  };
1e47d5e6 Kenneth Lee 2018-09-03  110  
1e47d5e6 Kenneth Lee 2018-09-03  111  /* default attributes for mdev->supported_type_groups, used by registerer*/
1e47d5e6 Kenneth Lee 2018-09-03  112  #define MDEV_TYPE_ATTR_RO_EXPORT(name) \
1e47d5e6 Kenneth Lee 2018-09-03  113  	MDEV_TYPE_ATTR_RO(name); \
1e47d5e6 Kenneth Lee 2018-09-03  114  	EXPORT_SYMBOL_GPL(mdev_type_attr_##name);
1e47d5e6 Kenneth Lee 2018-09-03  115  
1e47d5e6 Kenneth Lee 2018-09-03  116  #define DEF_SIMPLE_SDMDEV_ATTR(_name, sdmdev_member, format) \
1e47d5e6 Kenneth Lee 2018-09-03  117  static ssize_t _name##_show(struct kobject *kobj, struct device *dev, \
1e47d5e6 Kenneth Lee 2018-09-03  118  	char *buf) \
1e47d5e6 Kenneth Lee 2018-09-03  119  { \
1e47d5e6 Kenne

Re: [PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask

2018-09-05 Thread Dave Young
[snip]
> 
> As previously mentioned, there are also many differences between kexec and
> kdump. In general, kexec needs to look at all of the available physical
> memory, but kdump doesn't need to.
> 
> For kexec, kexec-tools will read /sys/firmware/memmap and recreate the e820 
> ranges for the 2nd
> kernel. If it fails, will use /proc/iomem.
> 
> For kdump, kexec-tools will read /proc/iomem and recreate the e820 ranges for 
> kdump kernel.
> BTW: we cannot get the range of persistent memory from /proc/iomem, so the
> e820 ranges in the kdump kernel don't contain the persistent memory. This is
> the real reason why I need to strengthen the logic of adjusting the memory
> encryption mask.

"Persistent memory" is a different thing; I think you meant some reserved
memory instead.

> 
> If kexec-tools also used /sys/firmware/memmap for kdump (like kexec), the
> kdump kernel could also work without a fix, but kexec-tools would have to be
> modified. Are you sure that you want me to fix kexec-tools instead of the
> kernel?

Yes, please fix kexec-tools to pass reserved ranges in e820, you will
not need this patch then.

Thanks
Dave


Re: [PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask

2018-09-05 Thread lijiang
On 2018-09-04 09:51, Dave Young wrote:
> On 09/04/18 at 09:29am, Dave Young wrote:
>> On 09/04/18 at 08:44am, Dave Young wrote:
>>> On 09/03/18 at 10:06pm, lijiang wrote:
 On 2018-09-03 10:45, Dave Young wrote:
> On 08/31/18 at 04:19pm, Lianbo Jiang wrote:
>> For kdump kernel, when SME is enabled, the acpi table and dmi table will 
>> need
>> to be remapped without the memory encryption mask. So we have to 
>> strengthen
>> the logic in early_memremap_pgprot_adjust(), which makes us have an 
>> opportunity
>> to adjust the memory encryption mask.
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/mm/ioremap.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index e01e6c695add..f9d9a39955f3 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -689,8 +689,15 @@ pgprot_t __init 
>> early_memremap_pgprot_adjust(resource_size_t phys_addr,
>>  encrypted_prot = true;
>>  
>>  if (sme_active()) {
>> +/*
>> + * In kdump kernel, the acpi table and dmi table will 
>> need
>> + * to be remapped without the memory encryption mask. 
>> Here
>> + * we have to strengthen the logic to adjust the memory
>> + * encryption mask.
>
> Assume the acpi/dmi tables are identical for both 1st kernel and kdump
> kernel, I'm not sure what is the difference, why need special handling
> for kdump. Can you add more explanations?
>

 OK, I will use a dmi example to explain this issue.

 There are significant differences in the E820 map between the 1st kernel and
 the kdump kernel. I pasted them at the bottom.

 Firstly, we need to know how they are called.
 __acpi_map_table()\                                    / early_memremap_is_setup_data()
                    |-> early_memremap() -> early_memremap_pgprot_adjust() -> | memremap_is_efi_data()
 dmi_early_remap() /                                    \ memremap_should_map_decrypted() -> e820__get_entry_type()

 Secondly, we also need to understand the memremap_should_map_decrypted(), 
 which is illustrated by the fake code.
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
   unsigned long size)
 {

 /* code ... */

 switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
 case E820_TYPE_RESERVED:
 case E820_TYPE_ACPI:
 case E820_TYPE_NVS:
 case E820_TYPE_UNUSABLE:
 /* For SEV, these areas are encrypted */
 if (sev_active())
 break;
 /* Fallthrough */

 case E820_TYPE_PRAM:
 /* For SME, these areas are decrypted */
 return true;
 default:
 /* these areas are encrypted by default*/
 break;
 }

 return false;
 }

 For the dmi case, the dmi base address is 0x6286b000 in my test machine.

 In the 1st kernel, e820__get_entry_type() can get a valid entry and type for
 the dmi address, and we can also find the dmi base address in the e820 map
 (see the 1st kernel log):
 0x6286b000 ∈ [mem 0x6286b000-0x6286efff]
 So these areas are decrypted according to memremap_should_map_decrypted().

 In the kdump kernel, the dmi base address is still 0x6286b000, but we can no
 longer find it in the e820 map. e820__get_entry_type() cannot get a valid
 entry and type for the dmi base address, so it goes into the default branch.
 That is to say, these areas become encrypted. In fact, these areas are also
 decrypted, so we have to strengthen the logic of adjusting the memory
 encryption mask.


 The 1st kernel log:

 [0.00] BIOS-provided physical RAM map:
 [0.00] BIOS-e820: [mem 0x-0x0008bfff] 
 usable
 [0.00] BIOS-e820: [mem 0x0008c000-0x0009] 
 reserved
 [0.00] BIOS-e820: [mem 0x000e-0x000f] 
 reserved
 [0.00] BIOS-e820: [mem 0x0010-0x29920fff] 
 usable
 [0.00] BIOS-e820: [mem 0x29921000-0x29921fff] 
 reserved
 [0.00] BIOS-e820: [mem 0x29922000-0x62256fff] 
 usable
 [0.00] BIOS-e820: [mem 0x62257000-0x62356fff] 
 reserved
 [0.00] BIOS-e820: [mem 0x62357000-0x6235cfff] ACPI 
 data
 [