Hi Nadav,
> -----Original Message-----
> From: Nadav Amit [mailto:[email protected]]
> Sent: Wednesday, March 17, 2021 1:46 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <[email protected]>
> Cc: David Woodhouse <[email protected]>; Lu Baolu
> <[email protected]>; Joerg Roedel <[email protected]>; [email protected];
> [email protected]; chenjiashang <[email protected]>;
> [email protected]; Gonglei (Arei) <[email protected]>;
> LKML <[email protected]>
> Subject: Re: A problem of Intel IOMMU hardware ?
>
>
>
> > On Mar 16, 2021, at 8:16 PM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) <[email protected]> wrote:
> >
> > Hi guys,
> >
> > We find the Intel iommu cache (i.e. iotlb) maybe works wrong in a
> > special situation, it would cause DMA fails or get wrong data.
> >
> > The reproducer (based on Alex's vfio testsuite[1]) is in attachment,
> > it can reproduce the problem with high probability (~50%).
>
> I saw Lu replied, and he is much more knowledgable than I am (I was just
> intrigued by your email).
>
> However, if I were you I would try also to remove some “optimizations” to
> look for the root-cause (e.g., use domain specific invalidations instead of
> page-specific).
>
Good suggestion! We have already tried this over the past few days, using
domain-selective invalidations as follows:

    iommu->flush.flush_iotlb(iommu, did, 0, 0,
                             DMA_TLB_DSI_FLUSH);

But this does not resolve the problem.
> The first thing that comes to my mind is the invalidation hint (ih) in
> iommu_flush_iotlb_psi(). I would remove it to see whether you get the failure
> without it.
We also noticed the IH, but the IH is always zero in our case. As the spec says:
'''
Paging-structure-cache entries caching second-level mappings associated with
the specified domain-id and the second-level-input-address range are
invalidated, if the Invalidation Hint (IH) field is Clear.
'''
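
For reference, here is a minimal sketch of where the granularity and IH fields
sit in a queued IOTLB invalidation descriptor. It is not the driver code: the
sk_* names are hypothetical, and the bit positions are an assumption modeled on
the kernel's QI_IOTLB_* macros, so they should be checked against the VT-d spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; layout assumed to mirror the kernel's QI_IOTLB_* macros. */
#define SK_IOTLB_TYPE   0x2ULL  /* descriptor type: IOTLB invalidate */
#define SK_GRAN_GLOBAL  1ULL    /* global invalidation */
#define SK_GRAN_DSI     2ULL    /* domain-selective invalidation */
#define SK_GRAN_PSI     3ULL    /* page-selective-within-domain invalidation */

struct sk_desc {
    uint64_t lo;  /* type, granularity, domain-id */
    uint64_t hi;  /* address, IH, address mask (PSI only) */
};

/*
 * Build a descriptor. addr, am and ih are only encoded for PSI;
 * for global and DSI requests the high qword stays zero.
 */
static struct sk_desc sk_iotlb_desc(uint16_t did, uint64_t gran,
                                    uint64_t addr, unsigned am, unsigned ih)
{
    struct sk_desc d;

    assert(am <= 0x3f);  /* address mask is a 6-bit field */

    d.lo = SK_IOTLB_TYPE | (gran << 4) | ((uint64_t)did << 16);
    d.hi = 0;
    if (gran == SK_GRAN_PSI)
        d.hi = (addr & ~0xfffULL)          /* page-aligned address */
             | ((uint64_t)(ih & 1) << 6)   /* IH=0: paging-structure caches are flushed too */
             | (uint64_t)am;               /* 2^am pages are covered */
    return d;
}
```

With SK_GRAN_DSI the address, mask and IH arguments are ignored, which is why
the DSI experiment above takes IH out of the picture entirely.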
The software seems to be doing everything correctly, so we have no choice but
to suspect the hardware.