Hi Will,

On 7/6/17 8:08 AM, Will Deacon wrote:
> Hi Ray,
> 
> Thanks for testing this, and sorry it didn't help.
> 
> On Wed, Jul 05, 2017 at 04:24:22PM -0700, Ray Jui wrote:
>> On 7/5/17 1:41 AM, Will Deacon wrote:
>>> On Tue, Jul 04, 2017 at 06:45:17PM -0700, Ray Jui wrote:
>>>> Has anything functionally changed between PATCH v2 and v1? I'm seeing a
>>>> very different L2 throughput with v2 (in general a lot worse with v2 vs.
>>>> v1); however, I'm currently unable to reproduce the TLB sync timed out
>>>> issue with v2 (without the patch from Will's email).
>>>>
>>>> It could also be something else that has changed in my setup, but so far
>>>> I have not yet been able to spot anything wrong in the setup.
>>>
>>> There were fixes, and that initially involved a DSB that was found to be
>>> expensive. The patches queued in -next should have that addressed, so please
>>> use those (or my for-joerg/arm-smmu/updates branch).
>>>
>>
>> That was my bad yesterday. I was in a rush and the setup was incorrect.
>>
>> I redo my Ethernet performance test with both PATCH v1 and v2 today, and
>> can confirm the performance is consistent between v1 and v2 as expected.
>>
>> I also made sure the following message can still be reproduced with
>> patch set v2:
>>
>> arm-smmu 64000000.mmu: TLB sync timed out -- SMMU may be deadlocked
>>
>> Then I proceeded to apply your patch that attempt to fix the deadlock
>> issue.
> 
> [...]
> 
>> However, with the fix patch, I can still see the deadlock message when I
>> have > 32 iperf TX threads active in the system:
> 
> Damn. We've been going over this today and the only plausible theory seems
> to be that concurrent TLB syncs are causing the completion to be pushed out,
> resulting in timeouts.
> 
> Can you try this patch below, instead of the one I sent before, please?
> 
> Thanks,
> 
> Will
> 
> --->8

Good news! I can confirm that with the new patch below, the error log of
"TLB sync timed out" is now gone. I ran 20 iterations of the iperf
client test with 64 threads spread across 8 CPUs. The error message used
to coming out very quickly with just one iteration. But now it's
completely gone.

At the same time, I do seem to observe a slight impact on performance
when multiple threads are used, but I guess that is expected, given that
a spin lock is added to protect the TLB sync.

Great work and thanks for the fix!

Ray

> 
> From bbf3737c29e3d18f539998a66f42878ac91cde97 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.dea...@arm.com>
> Date: Thu, 6 Jul 2017 15:55:48 +0100
> Subject: [PATCH] iommu/arm-smmu: Reintroduce locking around TLB sync
>  operations
> 
> Commit 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
> removed the locking used to serialise map/unmap calls into the io-pgtable
> code from the ARM SMMU driver. This is good for performance, but opens
> us up to a nasty race with TLB syncs because the TLB sync register is
> shared within a context bank (or even globally for stage-2 on SMMUv1).
> 
> There are two cases to consider:
> 
>   1. A CPU can be spinning on the completion of a TLB sync, take an
>      interrupt which issues a subsequent TLB sync, and then report a
>      timeout on return from the interrupt.
> 
>   2. A CPU can be spinning on the completion of a TLB sync, but other
>      CPUs can continuously issue additional TLB syncs in such a way that
>      the backoff logic reports a timeout.
> 
> Rather than fix this by spinning for completion of prior TLB syncs before
> issuing a new one (which may suffer from fairness issues on large systems),
> instead reintroduce locking around TLB sync operations in the ARM SMMU
> driver.
> 
> Fixes: 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
> Cc: Robin Murphy <robin.mur...@arm.com>
> Reported-by: Ray Jui <ray....@broadcom.com>
> Signed-off-by: Will Deacon <will.dea...@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b446183b3015..770abd247f40 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -400,6 +400,8 @@ struct arm_smmu_device {
>  
>       u32                             cavium_id_base; /* Specific to Cavium */
>  
> +     spinlock_t                      global_sync_lock;
> +
>       /* IOMMU core code handle */
>       struct iommu_device             iommu;
>  };
> @@ -436,7 +438,7 @@ struct arm_smmu_domain {
>       struct arm_smmu_cfg             cfg;
>       enum arm_smmu_domain_stage      stage;
>       struct mutex                    init_mutex; /* Protects smmu pointer */
> -     spinlock_t                      cb_lock; /* Serialises ATS1* ops */
> +     spinlock_t                      cb_lock; /* Serialises ATS1* ops and 
> TLB syncs */
>       struct iommu_domain             domain;
>  };
>  
> @@ -602,9 +604,12 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device 
> *smmu,
>  static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
>  {
>       void __iomem *base = ARM_SMMU_GR0(smmu);
> +     unsigned long flags;
>  
> +     spin_lock_irqsave(&smmu->global_sync_lock, flags);
>       __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC,
>                           base + ARM_SMMU_GR0_sTLBGSTATUS);
> +     spin_unlock_irqrestore(&smmu->global_sync_lock, flags);
>  }
>  
>  static void arm_smmu_tlb_sync_context(void *cookie)
> @@ -612,9 +617,12 @@ static void arm_smmu_tlb_sync_context(void *cookie)
>       struct arm_smmu_domain *smmu_domain = cookie;
>       struct arm_smmu_device *smmu = smmu_domain->smmu;
>       void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
> +     unsigned long flags;
>  
> +     spin_lock_irqsave(&smmu_domain->cb_lock, flags);
>       __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
>                           base + ARM_SMMU_CB_TLBSTATUS);
> +     spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
>  }
>  
>  static void arm_smmu_tlb_sync_vmid(void *cookie)
> @@ -1925,6 +1933,7 @@ static int arm_smmu_device_cfg_probe(struct 
> arm_smmu_device *smmu)
>  
>       smmu->num_mapping_groups = size;
>       mutex_init(&smmu->stream_map_mutex);
> +     spin_lock_init(&smmu->global_sync_lock);
>  
>       if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
>               smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to