Hi Nadav,

On 3/18/21 2:12 AM, Nadav Amit wrote:


On Mar 17, 2021, at 2:35 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) <longpe...@huawei.com> wrote:

Hi Nadav,

-----Original Message-----
From: Nadav Amit [mailto:nadav.a...@gmail.com]
  reproduce the problem with high probability (~50%).

I saw Lu replied, and he is much more knowledgeable than I am (I was just
intrigued by your email).

However, if I were you, I would also try removing some “optimizations” to look
for the root cause (e.g., use domain-specific invalidations instead of
page-specific ones).


Good suggestion! We already tried that over the past few days, using
domain-selective invalidations as follows:
        iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
but it did not resolve the problem.

The first thing that comes to my mind is the invalidation hint (ih) in
iommu_flush_iotlb_psi(). I would remove it to see whether you get the failure
without it.

We also noticed the IH, but it is always zero in our case. As the spec says:
'''
Paging-structure-cache entries caching second-level mappings associated with
the specified domain-id and the second-level-input-address range are
invalidated, if the Invalidation Hint (IH) field is Clear.
'''

It seems the software is fine, so we have no choice but to suspect the
hardware.

Ok, I am pretty much out of ideas. I have two more suggestions, but
they are much less likely to help. Yet, they can further help to rule
out software bugs:

1. dma_clear_pte() seems to be wrong IMHO. It should have used WRITE_ONCE()
to prevent a split write, which might cause an “invalid” (partially
cleared) PTE to be stored in the TLB. Having said that, the subsequent
IOTLB flush should have prevented the problem.

Agreed. The PTE reads and writes should use READ_ONCE()/WRITE_ONCE() instead.


2. Consider ensuring that the problem is not somehow related to queued
invalidations. Try to use __iommu_flush_iotlb() instead of
qi_flush_iotlb().

Regards,
Nadav


Best regards,
baolu