Re: [PATCH 3/7] iommu/iova: insert start_pfn boundary of dma32

2017-03-23 Thread Leizhen (ThunderTown)


On 2017/3/23 21:01, Robin Murphy wrote:
> On 22/03/17 06:27, Zhen Lei wrote:
>> Reserve the first granule of memory (starting at start_pfn) as a boundary
>> iova, to make sure that iovad->cached32_node cannot be NULL in the future.
>> Meanwhile, change the assignment of iovad->cached32_node from rb_next to
>> rb_prev of &free->node in function __cached_rbnode_delete_update.
> 
> I'm not sure I follow this. It's a top-down allocator, so cached32_node
> points to the last node allocated (or the node above the last one freed)
> on the assumption that there is likely free space directly below there,
> thus it's a pretty good place for the next allocation to start searching
> from. On the other hand, start_pfn is a hard "do not go below this line"
> limit, so it doesn't seem to make any sense to ever point the former at
> the latter.
This patch just prepares for dma64. We really need to add a boundary between
dma32 and dma64, for two main purposes:

1. To make dma32 iova allocation faster: the last node a dma32 allocation can
see is the boundary, so the search is confined to the dma32 iova space.
Meanwhile, we want dma64 allocation to try the dma64 iova space (iova >= 4G)
first, because the dma32 iova space is at most 4GB, so the dma64 space is
almost always richer.

2. To prevent an allocated iova from crossing the dma32 and dma64 spaces.
Otherwise, this special case would have to be handled when allocating and
freeing iovas.

Once that boundary is added, it's better to also add a boundary at the
start_pfn of dma32, so that both are handled by the same model.

With the two boundaries added, pointing cached32/64_node at the freed iova
node simplifies the code.
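
As a minimal sketch of that end state (this is not the posted patch, which
only reserves the start_pfn boundary; cached64_node is an assumed field name
based on the description above):

static void
insert_iova_boundaries(struct iova_domain *iovad)
{
	struct iova *iova;

	/* dma32 lower boundary: one sentinel pfn at start_pfn */
	iova = reserve_iova(iovad, iovad->start_pfn, iovad->start_pfn);
	BUG_ON(!iova);
	iovad->cached32_node = &iova->node;

	/* dma32/dma64 boundary: one sentinel pfn just above dma_32bit_pfn */
	iova = reserve_iova(iovad, iovad->dma_32bit_pfn + 1,
			    iovad->dma_32bit_pfn + 1);
	BUG_ON(!iova);
	iovad->cached64_node = &iova->node;	/* hypothetical field */
}

With both sentinels in the rbtree, the cached nodes always have a valid node
to point at, and no allocation can cross between the two spaces.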


> 
> I could understand slightly more if we were reserving the PFN *above*
> the cached range, but as-is I don't see what we gain from the change
> here, nor what benefit the cached32_node != NULL assumption gives
> (AFAICS it would be more useful to simply restrict the cases where it
> may be NULL to when the address space is either completely full or
> completely empty, or perhaps both).
> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/iova.c | 63 ++--
>>  1 file changed, 37 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 1c49969..b5a148e 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -32,6 +32,17 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
>>  static void init_iova_rcaches(struct iova_domain *iovad);
>>  static void free_iova_rcaches(struct iova_domain *iovad);
>>  
>> +static void
>> +insert_iova_boundary(struct iova_domain *iovad)
>> +{
>> +struct iova *iova;
>> +unsigned long start_pfn_32bit = iovad->start_pfn;
>> +
>> +iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
>> +BUG_ON(!iova);
>> +iovad->cached32_node = &iova->node;
>> +}
>> +
>>  void
>>  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>  unsigned long start_pfn, unsigned long pfn_32bit)
>> @@ -45,27 +56,38 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>  
>>  spin_lock_init(&iovad->iova_rbtree_lock);
>>  iovad->rbroot = RB_ROOT;
>> -iovad->cached32_node = NULL;
>>  iovad->granule = granule;
>>  iovad->start_pfn = start_pfn;
>>  iovad->dma_32bit_pfn = pfn_32bit;
>>  init_iova_rcaches(iovad);
>> +
>> +/*
>> + * Insert boundary nodes for dma32. So cached32_node can not be NULL in
>> + * future.
>> + */
>> +insert_iova_boundary(iovad);
>>  }
>>  EXPORT_SYMBOL_GPL(init_iova_domain);
>>  
>>  static struct rb_node *
>>  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
>>  {
>> -if ((*limit_pfn > iovad->dma_32bit_pfn) ||
>> -(iovad->cached32_node == NULL))
>> +struct rb_node *cached_node;
>> +struct rb_node *next_node;
>> +
>> +if (*limit_pfn > iovad->dma_32bit_pfn)
>>  return rb_last(&iovad->rbroot);
>> -else {
>> -struct rb_node *prev_node = rb_prev(iovad->cached32_node);
>> -struct iova *curr_iova =
>> -rb_entry(iovad->cached32_node, struct iova, node);
>> -*limit_pfn = curr_iova->pfn_lo - 1;
>> -return prev_node;
>> +else
>> +cached_node = iovad->cached32_node;
>> +
>> +next_node = rb_next(cached_node);
>> +if (next_node) {
>> +struct iova *next_iova = rb_entry(next_node, struct iova, node);
>> +
>> +*limit_pfn = min(*limit_pfn, next_iova->pfn_lo - 1);
>>  }
>> +
>> +return cached_node;
>>  }
>>  
>>  static void
>> @@ -83,20 +105,13 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>>  struct iova *cached_iova;
>>  struct rb_node *curr;
>>  
>> -if (!iovad->cached32_node)
>> -return;
>>  curr = iovad->cached32_node;
>> 

Re: [PATCH 3/9] Docs: dt: document qcom iommu bindings

2017-03-23 Thread Rob Clark
On Thu, Mar 23, 2017 at 6:21 PM, Rob Herring  wrote:
> On Tue, Mar 14, 2017 at 11:18:05AM -0400, Rob Clark wrote:
>> Cc: devicet...@vger.kernel.org
>> Signed-off-by: Rob Clark 
>> ---
>>  .../devicetree/bindings/iommu/qcom,iommu.txt   | 113 +
>>  1 file changed, 113 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
>> b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
>> new file mode 100644
>> index 0000000..fd5b7fa
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
>> @@ -0,0 +1,113 @@
>> +* QCOM IOMMU v1 Implementation
>> +
>> +Qualcomm "B" family devices which are not compatible with arm-smmu have
>> +a similar looking IOMMU but without access to the global register space,
>> +and optionally requiring additional configuration to route context irqs
>> +to non-secure vs secure interrupt line.
>> +
>> +** Required properties:
>> +
>> +- compatible   : Should be one of:
>> +
>> +"qcom,msm8916-iommu"
>> +
>> +- clock-names  : Should be a pair of "iface" (required for IOMMUs
>> + register group access) and "bus" (required for
>> + the IOMMUs underlying bus access).
>> +- clocks   : Phandles for respective clocks described by
>> + clock-names.
>> +- #address-cells   : must be 1.
>> +- #size-cells  : must be 1.
>> +- #iommu-cells : Must be 1.
>> +- ranges   : Base address and size of the iommu context banks.
>> +- qcom,iommu-secure-id  : secure-id.
>> +
>> +- List of sub-nodes, one per translation context bank.  Each sub-node
>> +  has the following required properties:
>> +
>> +  - compatible : Should be one of:
>> +- "qcom,msm-iommu-v1-ns"  : non-secure context bank
>> +- "qcom,msm-iommu-v1-sec" : secure context bank
>> +  - reg: Base address and size of context bank within the iommu
>> +  - interrupts : The context fault irq.
>> +
>> +** Optional properties:
>> +
>> +- reg  : Base address and size of the SMMU local base, should
>> + be only specified if the iommu requires configuration
>> + for routing of context bank irq's to secure vs non-
>> + secure lines.  (Ie. if the iommu contains secure
>> + context banks)
>> +
>> +
>> +** Examples:
>> +
>> + apps_iommu: iommu@1e20000 {
>> + #address-cells = <1>;
>> + #size-cells = <1>;
>> + #iommu-cells = <1>;
>> + compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
>
> You didn't document the fallback above. Maybe just drop it if only a few
> chips have this iommu.

not completely sure I understand what you want..

I think more than a few chips.. I suspect it is more like everything
after the last "a" family devices (snapdragon 600?) and before 820..
(well, more or less at least a few years worth of devices, stuff that
seems likely to be able to run an upstream kernel would be 800, 805,
808, 810.. and I guess there are some cut down 6xx and 4xx variants of
those)

I guess qcom_iommu wouldn't care about all the various 32b devices
(since they aren't going to use 64b page tables).. 808/810, I'm not
100% sure about..

>> + ranges = <0 0x1e20000 0x40000>;
>> + reg = <0x1ef0000 0x3000>;
>
> When you have both reg and ranges, use reg value for the unit-address.

whoops, I thought I fixed that

>> + clocks = <&gcc GCC_SMMU_CFG_CLK>,
>> +  <&gcc GCC_APSS_TCU_CLK>;
>> + clock-names = "iface", "bus";
>> + qcom,iommu-secure-id = <17>;
>> +
>> + // mdp_0:
>> + iommu-ctx@4000 {
>> + compatible = "qcom,msm-iommu-v1-ns";
>> + reg = <0x4000 0x1000>;
>> + interrupts = ;
>> + };
>> +
>> + // venus_ns:
>> + iommu-ctx@5000 {
>> + compatible = "qcom,msm-iommu-v1-sec";
>> + reg = <0x5000 0x1000>;
>> + interrupts = ;
>> + };
>> + };
>> +
>> + gpu_iommu: iommu@1f08000 {
>> + #address-cells = <1>;
>> + #size-cells = <1>;
>> + #iommu-cells = <1>;
>> + compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
>> + ranges = <0 0x1f08000 0x10000>;
>> + clocks = <&gcc GCC_SMMU_CFG_CLK>,
>> +  <&gcc GCC_GFX_TCU_CLK>;
>> + clock-names = "iface", "bus";
>> + qcom,iommu-secure-id = <18>;
>> +
>> + // gfx3d_user:
>> + iommu-ctx@1f09000 {
>
> iommu-ctx@1000

will fix, thx

BR,
-R

>> + compatible = "qcom,msm-iommu-v1-ns";
>> + reg = <0x1000 0x1000>;
>> +   

Re: [PATCH 1/7] iommu/iova: fix incorrect variable types

2017-03-23 Thread Leizhen (ThunderTown)


On 2017/3/23 19:42, Robin Murphy wrote:
> On 22/03/17 06:27, Zhen Lei wrote:
>> Keep these four variables' types consistent with the parameters of function
>> __alloc_and_insert_iova_range and the members of struct iova:
>>
>> 1. static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>  unsigned long size, unsigned long limit_pfn,
>>
>> 2. struct iova {
>>  unsigned long   pfn_hi;
>>  unsigned long   pfn_lo;
>>
>> In fact, limit_pfn is most likely larger than 32 bits on DMA64.
> 
> FWIW if pad_size manages to overflow an int something's probably gone
> horribly wrong, but there's no harm in making it consistent with
> everything else here. However, given that patch #6 makes this irrelevant
> anyway, do we really need to bother?

Because I'm not sure whether patch #6 can be applied or not.

> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/iova.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index b7268a1..8ba8b496 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -104,8 +104,8 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>>   * Computes the padding size required, to make the start address
>>   * naturally aligned on the power-of-two order of its size
>>   */
>> -static unsigned int
>> -iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
>> +static unsigned long
>> +iova_get_pad_size(unsigned long size, unsigned long limit_pfn)
>>  {
>>  return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
>>  }
>> @@ -117,7 +117,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>  struct rb_node *prev, *curr = NULL;
>>  unsigned long flags;
>>  unsigned long saved_pfn;
>> -unsigned int pad_size = 0;
>> +unsigned long pad_size = 0;
>>  
>>  /* Walk the tree backwards */
>>  spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
>>
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 3/9] Docs: dt: document qcom iommu bindings

2017-03-23 Thread Rob Herring
On Tue, Mar 14, 2017 at 11:18:05AM -0400, Rob Clark wrote:
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Clark 
> ---
>  .../devicetree/bindings/iommu/qcom,iommu.txt   | 113 +
>  1 file changed, 113 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> 
> diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
> b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> new file mode 100644
> index 0000000..fd5b7fa
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> @@ -0,0 +1,113 @@
> +* QCOM IOMMU v1 Implementation
> +
> +Qualcomm "B" family devices which are not compatible with arm-smmu have
> +a similar looking IOMMU but without access to the global register space,
> +and optionally requiring additional configuration to route context irqs
> +to non-secure vs secure interrupt line.
> +
> +** Required properties:
> +
> +- compatible   : Should be one of:
> +
> +"qcom,msm8916-iommu"
> +
> +- clock-names  : Should be a pair of "iface" (required for IOMMUs
> + register group access) and "bus" (required for
> + the IOMMUs underlying bus access).
> +- clocks   : Phandles for respective clocks described by
> + clock-names.
> +- #address-cells   : must be 1.
> +- #size-cells  : must be 1.
> +- #iommu-cells : Must be 1.
> +- ranges   : Base address and size of the iommu context banks.
> +- qcom,iommu-secure-id  : secure-id.
> +
> +- List of sub-nodes, one per translation context bank.  Each sub-node
> +  has the following required properties:
> +
> +  - compatible : Should be one of:
> +- "qcom,msm-iommu-v1-ns"  : non-secure context bank
> +- "qcom,msm-iommu-v1-sec" : secure context bank
> +  - reg: Base address and size of context bank within the iommu
> +  - interrupts : The context fault irq.
> +
> +** Optional properties:
> +
> +- reg  : Base address and size of the SMMU local base, should
> + be only specified if the iommu requires configuration
> + for routing of context bank irq's to secure vs non-
> + secure lines.  (Ie. if the iommu contains secure
> + context banks)
> +
> +
> +** Examples:
> +
> + apps_iommu: iommu@1e20000 {
> + #address-cells = <1>;
> + #size-cells = <1>;
> + #iommu-cells = <1>;
> + compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";

You didn't document the fallback above. Maybe just drop it if only a few 
chips have this iommu.

> + ranges = <0 0x1e20000 0x40000>;
> + reg = <0x1ef0000 0x3000>;

When you have both reg and ranges, use reg value for the unit-address.

> + clocks = <&gcc GCC_SMMU_CFG_CLK>,
> +  <&gcc GCC_APSS_TCU_CLK>;
> + clock-names = "iface", "bus";
> + qcom,iommu-secure-id = <17>;
> +
> + // mdp_0:
> + iommu-ctx@4000 {
> + compatible = "qcom,msm-iommu-v1-ns";
> + reg = <0x4000 0x1000>;
> + interrupts = ;
> + };
> +
> + // venus_ns:
> + iommu-ctx@5000 {
> + compatible = "qcom,msm-iommu-v1-sec";
> + reg = <0x5000 0x1000>;
> + interrupts = ;
> + };
> + };
> +
> + gpu_iommu: iommu@1f08000 {
> + #address-cells = <1>;
> + #size-cells = <1>;
> + #iommu-cells = <1>;
> + compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
> + ranges = <0 0x1f08000 0x10000>;
> + clocks = <&gcc GCC_SMMU_CFG_CLK>,
> +  <&gcc GCC_GFX_TCU_CLK>;
> + clock-names = "iface", "bus";
> + qcom,iommu-secure-id = <18>;
> +
> + // gfx3d_user:
> + iommu-ctx@1f09000 {

iommu-ctx@1000

> + compatible = "qcom,msm-iommu-v1-ns";
> + reg = <0x1000 0x1000>;
> + interrupts = ;
> + };
> +
> + // gfx3d_priv:
> + iommu-ctx@1f0a000 {

iommu-ctx@2000

> + compatible = "qcom,msm-iommu-v1-ns";
> + reg = <0x2000 0x1000>;
> + interrupts = ;
> + };
> + };
> +
> + ...
> +
> + venus: video-codec@1d00000 {
> + ...
> + iommus = <&apps_iommu 5>;
> + };
> +
> + mdp: mdp@1a01000 {
> + ...
> + iommus = <&apps_iommu 4>;
> + };
> +
> + gpu@01c00000 {
> + ...
> + iommus = <&gpu_iommu 1>, <&gpu_iommu 2>;
> + };
> -- 
> 2.9.3
> 

Re: [RFC PATCH v4 15/28] Add support to access persistent memory in the clear

2017-03-23 Thread Tom Lendacky

On 3/17/2017 5:58 PM, Elliott, Robert (Persistent Memory) wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Tom Lendacky
Sent: Thursday, February 16, 2017 9:45 AM
Subject: [RFC PATCH v4 15/28] Add support to access persistent memory in the clear

Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots which will result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.


The system might be able to save and restore the correct encryption key for a
region of persistent memory, in which case it does need to be mapped as
encrypted.


If the OS could get some indication that BIOS/UEFI has saved and
restored the encryption key, then it could be mapped encrypted.



This might deserve a new EFI_MEMORY_ENCRYPTED attribute bit so the
system firmware can communicate that information to the OS (in the
UEFI memory map and the ACPI NFIT SPA Range structures).  It wouldn't
likely ever be added to the E820h table - ACPI 6.1 already obsoleted the
Extended Attribute for AddressRangeNonVolatile.


An attribute bit in some form would be a nice way to inform the OS that
the persistent memory can be mapped encrypted.





Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index b0ff6bc..c6cb921 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -498,6 +498,8 @@ static bool memremap_should_map_encrypted(resource_size_t phys_addr,
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   case E820_TYPE_PMEM:
+   case E820_TYPE_PRAM:
return false;
default:
break;


E820_TYPE_RESERVED is also used to report persistent memory in
some systems (patch 16 adds that for other reasons).

You might want to intercept the persistent memory types in the
efi_mem_type(phys_addr) switch statement earlier in the function
as well.  https://lkml.org/lkml/2017/3/13/357 recently mentioned that
"in qemu hotpluggable memory isn't put into E820," with the latest
information only in the UEFI memory map.

Persistent memory can be reported there as:
* EfiReservedMemoryType type with the EFI_MEMORY_NV attribute
* EfiPersistentMemory type with the EFI_MEMORY_NV attribute

Even the UEFI memory map is not authoritative, though.  To really
determine what is in these regions requires parsing the ACPI NFIT
SPA Ranges structures.  Parts of the E820 or UEFI regions could be
reported as volatile there and should thus be encrypted.
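
As an illustrative sketch only (not part of the posted patch), using the
kernel's efi_mem_type()/efi_mem_attributes() helpers and the standard
EFI_PERSISTENT_MEMORY/EFI_RESERVED_TYPE/EFI_MEMORY_NV definitions from
<linux/efi.h>, such a check might look like:

static bool efi_range_is_pmem(unsigned long phys_addr)
{
	switch (efi_mem_type(phys_addr)) {
	case EFI_PERSISTENT_MEMORY:
		/* EfiPersistentMemory is persistent by definition */
		return true;
	case EFI_RESERVED_TYPE:
		/* EfiReservedMemoryType counts only with EFI_MEMORY_NV set */
		return !!(efi_mem_attributes(phys_addr) & EFI_MEMORY_NV);
	default:
		return false;
	}
}

As noted above, even this is not authoritative; the ACPI NFIT SPA Range
structures would still need to be consulted for the final decision.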


Thanks for the details on this. I'll take a closer look at this and
update the checks appropriately.

Thanks,
Tom



---
Robert Elliott, HPE Persistent Memory






[PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively

2017-03-23 Thread Robin Murphy
On relatively slow development platforms and software models, the
inefficiency of our TLB sync loop tends not to show up - for instance on
a Juno r1 board I typically see that the TLBI has completed of its own accord
by the time we get to the sync, such that the latter finishes instantly.

However, on larger systems doing real I/O, it's less realistic for the
TLBs to go idle immediately, and at that point falling into the 1MHz
polling loop turns out to throw away performance drastically. Let's
strike a balance by polling more than once between pauses, such that we
have much more chance of catching normal operations completing before
committing to the fixed delay, but also backing off exponentially, since
if a sync really hasn't completed within one or two "reasonable time"
periods, it becomes increasingly unlikely that it ever will.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f7411109670f..aa17f3d937a0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -162,6 +162,7 @@
 #define ARM_SMMU_GR0_sTLBGSTATUS   0x74
 #define sTLBGSTATUS_GSACTIVE   (1 << 0)
#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Stream mapping registers */
 #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
@@ -574,18 +575,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
void __iomem *sync, void __iomem *status)
 {
-   int count = 0;
+   unsigned int spin_count, delay;
 
writel_relaxed(0, sync);
-   while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
-   cpu_relax();
-   if (++count == TLB_LOOP_TIMEOUT) {
-   dev_err_ratelimited(smmu->dev,
-   "TLB sync timed out -- SMMU may be deadlocked\n");
-   return;
-   }
-   udelay(1);
+   for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
+   for (spin_count = TLB_SPIN_COUNT; spin_count > 0; spin_count--)
+   if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
+   return;
+   udelay(delay);
}
+   dev_err_ratelimited(smmu->dev,
+   "TLB sync timed out -- SMMU may be deadlocked\n");
 }
 
 static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
-- 
2.11.0.dirty



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-23 Thread Jean-Philippe Brucker
On 23/03/17 16:52, Joerg Roedel wrote:
> On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote:
>> On 23/03/17 14:30, Joerg Roedel wrote:
>>> Are you sure about the meaning of the stop-marker? Can you point me to
>>> where it is specified?
>>
>> The concept is introduced in the PCI ECN that adds PASIDs to the ATS
>> specification. I have the following link, which is probably behind a
>> wall:
>>
>> https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf
>>
>> A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
>> PASID that is being decommissioned. In section 4.1.2, the specification
>> details the two device-specific ways of stopping use of a PASID, with or
>> without a Stop Marker. When done with a Stop Marker, the function doesn't
>> have to wait for any outstanding PPR to return, the Stop Marker serves as
>> a PASID barrier.
> 
> Thanks for that, I'll have a closer look. Is that stopper packet visible to
> software (e.g. is an entry created in the queue)?

As far as I understand, it should be added to the queue like a normal PPR,
and software should check the R/W/L flags combination. For SMMU at least,
it is the same. The only difference is that when the PRI queue overflows,
the SMMU would discard a Stop Marker instead of sending an automated
response to the device (as it would do with other PPRs that have the L
flag). Software shouldn't send a response to a Stop Marker either.
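
As a rough sketch of that flag check (the flag encoding and names here are
hypothetical, not from the ECN or the SMMU driver):

/* Hypothetical PPR flag bits, for illustration only */
#define PPR_FLAG_READ	(1 << 0)
#define PPR_FLAG_WRITE	(1 << 1)
#define PPR_FLAG_LAST	(1 << 2)

/* A Stop Marker is Read = Write = 0 with Last = 1 */
static bool ppr_is_stop_marker(u32 flags)
{
	return !(flags & (PPR_FLAG_READ | PPR_FLAG_WRITE)) &&
	       (flags & PPR_FLAG_LAST);
}

A request matching this pattern retires the PASID and must not get a PRI
response.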

Thanks,
Jean-Philippe


Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-23 Thread Joerg Roedel
On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote:
> On 23/03/17 14:30, Joerg Roedel wrote:
> > Are you sure about the meaning of the stop-marker? Can you point me to
> > where it is specified?
> 
> The concept is introduced in the PCI ECN that adds PASIDs to the ATS
> specification. I have the following link, which is probably behind a
> wall:
> 
> https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf
> 
> A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
> PASID that is being decommissioned. In section 4.1.2, the specifications
> details the two device-specific ways of stopping use of a PASID, with or
> without a Stop Marker. When done with a Stop Marker, the function doesn't
> have to wait for any outstanding PPR to return, the Stop Marker serves as
> a PASID barrier.

Thanks for that, I'll have a closer look. Is that stopper packet visible to
software (e.g. is an entry created in the queue)?

I have to check whether the AMD IOMMU makes this visible.



Joerg



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-23 Thread Jean-Philippe Brucker
On 23/03/17 14:30, Joerg Roedel wrote:
> On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote:
>> On 22/03/17 22:53, Joerg Roedel wrote:
>>> That can't happen, when the device driver does its job right. It has to
>>> shut down the context which causes the PPR requests for the PASID on the
>>> device. This includes stopping the context and waiting until all PPR
>>> requests it sent are processed.
>>
>> By "processed", do you mean that they are committed to the IOMMU, or that
>> they came back with a PRG response?
> 
> By processed I mean the device received a response to the request, or is
> at least no longer waiting for responses.
> 
>> The driver might not be able to do the latter, since PCI defines two ways
>> of shutting down a context:
>>
>> * Either wait for all PPR requests to come back with a PRG response,
>> * Or send a Stop Marker PPR request and forget about it.
> 
> Are you sure about the meaning of the stop-marker? Can you point me to
> where it is specified?

The concept is introduced in the PCI ECN that adds PASIDs to the ATS
specification. I have the following link, which is probably behind a
wall:

https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf

A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
PASID that is being decommissioned. In section 4.1.2, the specification
details the two device-specific ways of stopping use of a PASID, with or
without a Stop Marker. When done with a Stop Marker, the function doesn't
have to wait for any outstanding PPR to return, the Stop Marker serves as
a PASID barrier.

Thanks,
Jean-Philippe

> I know about a different marker, which allows a device to issue several
> PPR requests and only get one response for all of them. The last request
> then has that marker set.
> 
> But I am not aware (yet) of a marker with which the device says it no
> longer sends requests with this PASID.

>> My intent with passing flags to unbind() was to handle the case where VFIO
>> is unable to tell us whether PPRs are still being issued by the device.
>> But the issue seems moot to me now that I have a better understanding, as
>> there will be a detach_dev/attach_dev sequence before we start rebinding
>> PASIDs, and we can simply reset the PRI interface there (I'm currently
>> doing that in add_device, but I should move it.)
> 
> VFIO should completely reset the device before it is re-used, so I don't
> think we run into problems there.
> 
> 
> 
>   Joerg
> 



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-23 Thread Joerg Roedel
On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote:
> On 22/03/17 22:53, Joerg Roedel wrote:
> > That can't happen, when the device driver does its job right. It has to
> > shut down the context which causes the PPR requests for the PASID on the
> > device. This includes stopping the context and waiting until all PPR
> > requests it sent are processed.
> 
> By "processed", do you mean that they are committed to the IOMMU, or that
> they came back with a PRG response?

By processed I mean the device received a response to the request, or is
at least no longer waiting for responses.

> The driver might not be able to do the latter, since PCI defines two ways
> of shutting down a context:
> 
> * Either wait for all PPR requests to come back with a PRG response,
> * Or send a Stop Marker PPR request and forget about it.

Are you sure about the meaning of the stop-marker? Can you point me to
where it is specified?

I know about a different marker, which allows a device to issue several
PPR requests and only get one response for all of them. The last request
then has that marker set.

But I am not aware (yet) of a marker with which the device says it no
longer sends requests with this PASID.

> My intent with passing flags to unbind() was to handle the case where VFIO
> is unable to tell us whether PPRs are still being issued by the device.
> But the issue seems moot to me now that I have a better understanding, as
> there will be a detach_dev/attach_dev sequence before we start rebinding
> PASIDs, and we can simply reset the PRI interface there (I'm currently
> doing that in add_device, but I should move it.)

VFIO should completely reset the device before it is re-used, so I don't
think we run into problems there.



Joerg



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-23 Thread Jean-Philippe Brucker
On 22/03/17 22:53, Joerg Roedel wrote:
> On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote:
>> The problem might be too tied to the specifics of the SMMU. As implemented
>> in this series, the normal flow for a PPR with the SMMU is the following:
>>
>> (1) PCI device issues a PPR for PASID 1
>> (2) The PPR is queued by the SMMU in the (hardware) PRI queue
>> (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
>> to a software work queue.
>> (4) The PPR is finally handled and a PRI response is sent to the device.
> 
> There are two ways a PASID could get shut down:
> 
>   1) The device driver calls unbind()
>   2) The mm_struct bound to that PASID is going away
> 
> Case 1) is the easy one: we can safely assume that the device driver did
> everything needed to stop new PPR requests from being created for that PASID. In
> this case we just shut down PPR processing by waiting until everything
> is handled and reply INVALID to any further PPR request before we remove
> the PASID from the per-device IOMMU data structures and flush caches.
> 
> In case 2) we have more work to do. The mm_struct is going away
> (probably because the task segfaulted) and we can't assume that the
> device driver shut everything down already. But for this case we have
> the call-back into the device driver to tell it should clean everything
> up for that PASID and stop the device from creating further requests.
> 
> After that call-back returns it is the same as in case 1), we drain the
> queue and deny any further request that comes in.

I agree on the semantics (I didn't implement case 2) here but I will have
to in the next version.)

>> The case that worries me is if someone unbinds PASID 1 between (2) and
>> (3), while the PPR is still in the hardware queue, and immediately binds
>> it to a new address space.
>>
>> Then (3) and (4) happen, the PPR is handled and the fault is for the new
>> address space. It's certainly undesirable, but I don't know if it could be
>> exploited. We don't kill the task for an unhandled fault at the moment,
>> simply report a failed PPR to the device, so I might be worrying for nothing.
> 
> As I wrote above, when the device driver calls unbind() we should
> assume that the device does not sent any further requests with that
> PASID. If it does, we just answer with INVALID.
> 
>> Having the caller tell us if PPRs might still be pending in the hardware
>> PRI queue ensures that the SMMU driver waits until it's entirely safe:
>>
>> * If the device has no outstanding PPR, PASID can be reallocated
>> * If the device has outstanding PPRs, wait for a Stop Marker, or drain
>>   the PRI queue after a while (if the Stop Marker was lost in a PRI queue
>>   overflow).
> 
> That can't happen, when the device driver does its job right. It has to
> shut down the context which causes the PPR requests for the PASID on the
> device. This includes stopping the context and waiting until all PPR
> requests it sent are processed.

By "processed", do you mean that they are committed to the IOMMU, or that
they came back with a PRG response?

The driver might not be able to do the latter, since PCI defines two ways
of shutting down a context:

* Either wait for all PPR requests to come back with a PRG response,
* Or send a Stop Marker PPR request and forget about it.

The second one is problematic, all the device says is "I've stopped
sending requests, some might still be in flight, but a Stop Marker ends
the flow".

Without having the device driver tell us which of the two models the
device is implementing, draining the hardware queue is our best bet to
ensure that no request is pending.

In any case, this is a property of the device, and passing flags to
unbind() as I suggested is probably excessive. The IOMMU driver could
store that info somewhere and use it whenever it has to unbind(), as an
indication of the minimum amount of work required to clean the context.
Without this hint the default should be to drain both queues.
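
One possible shape for that hint, as a sketch only (names are hypothetical,
not from this series):

/* Per-device record of how the driver said the device stops a PASID */
enum iommu_pasid_stop_model {
	IOMMU_PASID_STOP_UNKNOWN,	/* no hint: drain both queues */
	IOMMU_PASID_STOP_PRG_RESP,	/* waits for all PRG responses */
	IOMMU_PASID_STOP_MARKER,	/* sends a Stop Marker and forgets */
};

unbind() could then look this up instead of taking flags from the caller.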

My intent with passing flags to unbind() was to handle the case where VFIO
is unable to tell us whether PPRs are still being issued by the device.
But the issue seems moot to me now that I have a better understanding, as
there will be a detach_dev/attach_dev sequence before we start rebinding
PASIDs, and we can simply reset the PRI interface there (I'm currently
doing that in add_device, but I should move it.)

> And the device driver has to do this either before it calls unbind() or
> in the call-back it provided. Only after this the PASID should be freed.
>
>> Draining the PRI queue is very costly, we need to block the PRI thread to
>> inspect the queue, risking an overflow. And with these PASID state flags
>> we avoid flushing any queue.
> 
> There is a configurable maximum of PPR requests a device can have
> in-flight. If you take that into account when allocating the PPR queue
> for the SMMU, there can't be any overflows. The AMD driver allocates a
> 

[PATCH] iommu: use sg_phys()

2017-03-23 Thread Geliang Tang
Use sg_phys() instead of open-coding it.

Signed-off-by: Geliang Tang 
---
 drivers/iommu/intel-iommu.c | 2 +-
 drivers/iommu/iommu.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d412a31..9d09a9e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3892,7 +3892,7 @@ static int intel_nontranslate_map_sg(struct device *hddev,
 
for_each_sg(sglist, sg, nelems, i) {
BUG_ON(!sg_page(sg));
-   sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset;
+   sg->dma_address = sg_phys(sg);
sg->dma_length = sg->length;
}
return nelems;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3b67144..26f57b3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1603,7 +1603,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
 
for_each_sg(sg, s, nents, i) {
-   phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+   phys_addr_t phys = sg_phys(s);
 
/*
 * We are mapping on IOMMU page boundaries, so offset within
-- 
2.9.3



Re: [PATCH 3/7] iommu/iova: insert start_pfn boundary of dma32

2017-03-23 Thread Robin Murphy
On 22/03/17 06:27, Zhen Lei wrote:
> Reserve the first granule of memory (starting at start_pfn) as a boundary
> iova, to make sure that iovad->cached32_node cannot be NULL in the future.
> Meanwhile, change the assignment of iovad->cached32_node from rb_next to
> rb_prev of &free->node in function __cached_rbnode_delete_update.

I'm not sure I follow this. It's a top-down allocator, so cached32_node
points to the last node allocated (or the node above the last one freed)
on the assumption that there is likely free space directly below there,
thus it's a pretty good place for the next allocation to start searching
from. On the other hand, start_pfn is a hard "do not go below this line"
limit, so it doesn't seem to make any sense to ever point the former at
the latter.

I could understand slightly more if we were reserving the PFN *above*
the cached range, but as-is I don't see what we gain from the change
here, nor what benefit the cached32_node != NULL assumption gives
(AFAICS it would be more useful to simply restrict the cases where it
may be NULL to when the address space is either completely full or
completely empty, or perhaps both).

Robin.

> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/iova.c | 63 ++--
>  1 file changed, 37 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 1c49969..b5a148e 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -32,6 +32,17 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
>  static void init_iova_rcaches(struct iova_domain *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> +static void
> +insert_iova_boundary(struct iova_domain *iovad)
> +{
> + struct iova *iova;
> + unsigned long start_pfn_32bit = iovad->start_pfn;
> +
> + iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
> + BUG_ON(!iova);
> + iovad->cached32_node = &iova->node;
> +}
> +
>  void
>  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>   unsigned long start_pfn, unsigned long pfn_32bit)
> @@ -45,27 +56,38 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>  
>   spin_lock_init(&iovad->iova_rbtree_lock);
>   iovad->rbroot = RB_ROOT;
> - iovad->cached32_node = NULL;
>   iovad->granule = granule;
>   iovad->start_pfn = start_pfn;
>   iovad->dma_32bit_pfn = pfn_32bit;
>   init_iova_rcaches(iovad);
> +
> + /*
> +  * Insert boundary nodes for dma32. So cached32_node can not be NULL in
> +  * future.
> +  */
> + insert_iova_boundary(iovad);
>  }
>  EXPORT_SYMBOL_GPL(init_iova_domain);
>  
>  static struct rb_node *
>  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
>  {
> - if ((*limit_pfn > iovad->dma_32bit_pfn) ||
> - (iovad->cached32_node == NULL))
> + struct rb_node *cached_node;
> + struct rb_node *next_node;
> +
> + if (*limit_pfn > iovad->dma_32bit_pfn)
>   return rb_last(&iovad->rbroot);
> - else {
> - struct rb_node *prev_node = rb_prev(iovad->cached32_node);
> - struct iova *curr_iova =
> - rb_entry(iovad->cached32_node, struct iova, node);
> - *limit_pfn = curr_iova->pfn_lo - 1;
> - return prev_node;
> + else
> + cached_node = iovad->cached32_node;
> +
> + next_node = rb_next(cached_node);
> + if (next_node) {
> + struct iova *next_iova = rb_entry(next_node, struct iova, node);
> +
> + *limit_pfn = min(*limit_pfn, next_iova->pfn_lo - 1);
>   }
> +
> + return cached_node;
>  }
>  
>  static void
> @@ -83,20 +105,13 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>   struct iova *cached_iova;
>   struct rb_node *curr;
>  
> - if (!iovad->cached32_node)
> - return;
>   curr = iovad->cached32_node;
>   cached_iova = rb_entry(curr, struct iova, node);
>  
>   if (free->pfn_lo >= cached_iova->pfn_lo) {
> - struct rb_node *node = rb_next(&free->node);
> - struct iova *iova = rb_entry(node, struct iova, node);
> -
>   /* only cache if it's below 32bit pfn */
> - if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
> - iovad->cached32_node = node;
> - else
> - iovad->cached32_node = NULL;
> + if (free->pfn_hi <= iovad->dma_32bit_pfn)
> + iovad->cached32_node = rb_prev(&free->node);
>   }
>  }
>  
> @@ -114,7 +129,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   unsigned long size, unsigned long limit_pfn,
>   struct iova *new, bool size_aligned)
>  {
> - struct rb_node *prev, *curr = NULL;
> + struct rb_node *prev, *curr;
>   unsigned long flags;
>   unsigned long 

Re: [PATCH 2/7] iommu/iova: cut down judgement times

2017-03-23 Thread Robin Murphy
On 22/03/17 06:27, Zhen Lei wrote:
> The judgement below can only be satisfied on the last iteration, so it
> produces up to 2N judgements in vain (supposing N iterations fail and 0 or
> 1 succeeds):
> 
> if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
>   return iova;
> }

For me, GCC (6.2.1 AArch64) seems to do a pretty good job of this
function already, so this change only saves two instructions in total
(pfn is compared against pfn_lo only once instead of twice), which I
wouldn't expect to see a noticeable performance effect from.

Given the improvement in readability, though, I don't even care about
any codegen differences :)

Reviewed-by: Robin Murphy 

> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/iova.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 8ba8b496..1c49969 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -312,15 +312,12 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
>   while (node) {
>   struct iova *iova = rb_entry(node, struct iova, node);
>  
> - /* If pfn falls within iova's range, return iova */
> - if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> - return iova;
> - }
> -
>   if (pfn < iova->pfn_lo)
>   node = node->rb_left;
> - else if (pfn > iova->pfn_lo)
> + else if (pfn > iova->pfn_hi)
>   node = node->rb_right;
> + else
> + return iova;/* pfn falls within iova's range */
>   }
>  
>   return NULL;
> 



Re: [PATCH 1/7] iommu/iova: fix incorrect variable types

2017-03-23 Thread Robin Murphy
On 22/03/17 06:27, Zhen Lei wrote:
> Keep these four variables' types consistent with the parameters of function
> __alloc_and_insert_iova_range and the members of struct iova:
> 
> 1. static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   unsigned long size, unsigned long limit_pfn,
> 
> 2. struct iova {
>   unsigned long   pfn_hi;
>   unsigned long   pfn_lo;
> 
> In fact, limit_pfn is most likely larger than 32 bits on DMA64.

FWIW if pad_size manages to overflow an int something's probably gone
horribly wrong, but there's no harm in making it consistent with
everything else here. However, given that patch #6 makes this irrelevant
anyway, do we really need to bother?

Robin.

> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/iova.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b7268a1..8ba8b496 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -104,8 +104,8 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, 
> struct iova *free)
>   * Computes the padding size required, to make the start address
>   * naturally aligned on the power-of-two order of its size
>   */
> -static unsigned int
> -iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
> +static unsigned long
> +iova_get_pad_size(unsigned long size, unsigned long limit_pfn)
>  {
>   return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
>  }
> @@ -117,7 +117,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   struct rb_node *prev, *curr = NULL;
>   unsigned long flags;
>   unsigned long saved_pfn;
> - unsigned int pad_size = 0;
> + unsigned long pad_size = 0;
>  
>   /* Walk the tree backwards */
>   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> 



Re: [PATCH v2 5/5] iommu: Allow default domain type to be set on the kernel command line

2017-03-23 Thread Sricharan R

Hi,

Replying again, as there was some issue with the mailer last time.

On 3/21/2017 11:47 PM, Will Deacon wrote:

On Tue, Mar 21, 2017 at 05:46:29PM +0000, Robin Murphy wrote:

On 21/03/17 17:21, Will Deacon wrote:

On Tue, Mar 21, 2017 at 04:45:27PM +0100, Joerg Roedel wrote:

On Fri, Mar 10, 2017 at 08:49:36PM +0000, Will Deacon wrote:

@@ -1014,8 +1027,8 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 * IOMMU driver.
 */
if (!group->default_domain) {
-   group->default_domain = __iommu_domain_alloc(dev->bus,
-IOMMU_DOMAIN_DMA);
+   group->default_domain =
+   __iommu_domain_alloc(dev->bus, iommu_def_domain_type);


It would be good to have a fall-back here if we are talking to an IOMMU
driver that uses default domains, but does not support identity-mapped
domains (yet). Exynos and Rockchip IOMMU drivers seem to fall into this
category. A dev_warn() also makes sense in case allocating a identity
domain fails.


Sure, something like the diff below?

Will

--->8


diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 42a842e3f95f..f787626a745d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1027,10 +1027,19 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 * IOMMU driver.
 */
if (!group->default_domain) {
-   group->default_domain =
-   __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
+   struct iommu_domain *dom;
+
+   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
+   if (!dom) {
+   dev_warn(dev,
+"failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA",
+iommu_def_domain_type);


Conversely, that's going to be noisy if iommu_def_domain_type was
IOMMU_DOMAIN_DMA to begin with. I think it makes sense to warn if the
user asked for a specific default domain type on the command line and
that didn't work, but maybe not to bother otherwise. Plus, if they asked
for passthrough, then not allocating a default domain at all is probably
closer to the desired result than installing a DMA ops domain would be.


You're right -- I'll hack this about to check if the default type isn't
DOMAIN_DMA before warning about the allocation failure.


If some master devices want 'IDENTITY_DOMAIN' as default (because those
devices do not want any iommu resources to be used or dma_ops to be set)
and some want 'DMA_DOMAIN' as default, then should the default be
'DMA_DOMAIN', with the masters needing IDENTITY_DOMAIN explicitly doing a
detach_dev later? This [1] was adding the support for detach_dev of the
default DMA_DOMAINs.

[1] https://patchwork.codeaurora.org/patch/164933/

Regards,
 Sricharan


RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-23 Thread Liu, Yi L
Hi Jean,

Thanks for the excellent ideas. Please refer to the comments inline.

[...]

> > Hi Jean,
> >
> > I'm working on virtual SVM, and have some comments on the VFIO channel
> > definition.
> 
> Thanks a lot for the comments, this is quite interesting to me. I just
> have some concerns about portability, so I'm proposing a way to be
> slightly more generic below.
> 

Yes, portability is what we need to consider.

[...]

> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index 519eff362c1c..3fe4197a5ea0 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h
> >> @@ -198,6 +198,7 @@ struct vfio_device_info {
> >>  #define VFIO_DEVICE_FLAGS_PCI (1 << 1)/* vfio-pci device */
> >>  #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)   /* vfio-platform device */
> >>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)  /* vfio-amba device */
> >> +#define VFIO_DEVICE_FLAGS_SVM (1 << 4)/* Device supports bind/unbind */
> >>__u32   num_regions;/* Max region index + 1 */
> >>__u32   num_irqs;   /* Max IRQ index + 1 */
> >>  };
> >> @@ -409,6 +410,60 @@ struct vfio_irq_set {
> >>   */
> >>  #define VFIO_DEVICE_RESET _IO(VFIO_TYPE, VFIO_BASE + 11)
> >>
> >> +struct vfio_device_svm {
> >> +  __u32   argsz;
> >> +  __u32   flags;
> >> +#define VFIO_SVM_PASID_RELEASE_FLUSHED(1 << 0)
> >> +#define VFIO_SVM_PASID_RELEASE_CLEAN  (1 << 1)
> >> +  __u32   pasid;
> >> +};
> >
> > For virtual SVM work, the VFIO channel would be used to pass down the
> > guest PASID table PTR and invalidation information, and may have further
> > uses beyond the above.
> >
> > Here is the virtual SVM design doc which illustrates the VFIO usage.
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >
> > For the guest PASID table ptr passdown, I have the following message in pseudo
> > code.
> > struct pasid_table_info {
> > __u64 ptr;
> > __u32 size;
> >  };
> 
> There should probably be a way to specify the table format, so that the
> pIOMMU driver can check that it recognizes the format used by the vIOMMU
> before attaching it. This would allow reusing the structure for other
> IOMMU architectures. If, for instance, the host has an Intel IOMMU and
> someone decides to emulate an ARM SMMU with Qemu (their loss :), it can
> certainly use VFIO for passing-through devices with MAP/UNMAP. But if
> Qemu then attempts to pass down a PASID table in SMMU format, the Intel
> driver should have a way to reject it, as the SMMU format isn't
> compatible.

Exactly, it would be great if we could have the API defined as generically
as MAP/UNMAP. The case you mentioned, emulating an ARM SMMU on an Intel
platform, is representative. For such cases, the problem is that different
vendors may have different PASID table formats and also different page
table formats. In my understanding, these incompatibilities may just result
in failure if users try such emulation. What's your opinion here? Anyhow,
better to listen to different voices.

> 
> I'm tackling a similar problem at the moment, but for passing a single
> page directory instead of a full PASID table to the IOMMU.

For the Intel IOMMU, passing the whole guest PASID table is enough, and it
also avoids too much pgd passing. However, I'm open to this idea. You may
just add a new flag in "struct vfio_device_svm" and pass the single pgd
down to the host.

> 
> So we need some kind of high-level classification that the vIOMMU must
> communicate to the physical one. Each IOMMU flavor would get a unique,
> global identifier, simply to make sure that vIOMMU and pIOMMU speak the
> same language. For example:
> 
> 0x65776886 "AMDV" AMD IOMMU
> 0x73788476 "INTL" Intel IOMMU
> 0x83515748 "S390" s390 IOMMU
> 0x8385 "SMMU" ARM SMMU
> etc.
> 
> It needs to be a global magic number that everyone can recognize. Could be
> as simple as 32-bit numbers allocated from 0. Once we have a global magic
> number, we can use it to differentiate architecture-specific details.

I may need to think more on this part.
 
> struct pasid_table_info {
>   __u64 ptr;
>   __u64 size; /* Is it the number of entries or the size in bytes? */

For the Intel platform, it's encoded, but I can make it in bytes. Here, I'd
like to check with you whether the whole guest PASID info is also needed on
ARM.

> 
>   __u32 model;/* magic number */
>   __u32 variant;  /* version of the IOMMU architecture,
>  maybe? IOMMU-specific. */
>   __u8 opaque[];  /* IOMMU-specific details */
> };
> 
> And then each IOMMU or page-table code can do low-level validation of the
> format, by reading the details in 'opaque'. I assume that for Intel this
> would be empty. But

Yes, for Intel, if the PASID table ptr is in the definition, opaque would be empty.

> for instance on ARM SMMUv3, PASID table can have either one or two levels, and