RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-08 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, April 8, 2021 12:32 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
> Dept.) wrote:
> > Hi Baolu,
> >
> >> -Original Message-
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Friday, April 2, 2021 12:44 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> ; io...@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org
> >> Cc: baolu...@linux.intel.com; David Woodhouse ;
> >> Nadav Amit ; Alex Williamson
> >> ; Kevin Tian ;
> >> Gonglei (Arei) ; sta...@vger.kernel.org
> >> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
> >> superpage
> >>
> >> Hi Longpeng,
> >>
> >> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >>> index ee09323..cbcb434 100644
> >>> --- a/drivers/iommu/intel/iommu.c
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
> >>>* removed to make room for 
> >>> superpage(s).
> >>>* We're adding new large pages, so 
> >>> make sure
> >>>* we don't remove their parent tables.
> >>> +  *
> >>> +  * We also need to flush the iotlb before creating
> >>> +  * superpage to ensure it does not preserve any
> >>> +  * obsolete info.
> >>>*/
> >>> - dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> >>> -largepage_lvl + 1);
> >>> + if (dma_pte_present(pte)) {
> >>
> >> The dma_pte_free_pagetable() clears a batch of PTEs. So checking
> >> current PTE is insufficient. How about removing this check and always
> >> performing cache invalidation?
> >>
> >
> > Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
> > NOT-present (e.g. creating a totally new superpage mapping), but we only need
> > to call free_pagetable and flush_iotlb in the former case, right?
> 
> But this code covers multiple PTEs and perhaps crosses the page boundary.
> 
> How about moving this code into a separated function and check PTE presence
> there. A sample code could look like below: [compiled but not tested!]
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d334f5b4e382..0e04d450c38a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
>  return level;
>   }
> 
> +/*
> + * Ensure that old small page tables are removed to make room for
> superpage(s).
> + * We're going to add new large pages, so make sure we don't remove
> their parent
> + * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
> + */
> +static void switch_to_super_page(struct dmar_domain *domain,
> +unsigned long start_pfn,
> +unsigned long end_pfn, int level) {

Maybe "switch_to" will lead people to think "remove old and then set up new", so 
how about something like "remove_room_for_super_page" or 
"prepare_for_super_page" ?

> +   unsigned long lvl_pages = lvl_to_nr_pages(level);
> +   struct dma_pte *pte = NULL;
> +   int i;
> +
> +   while (start_pfn <= end_pfn) {

start_pfn < end_pfn ?

> +   if (!pte)
> +   pte = pfn_to_dma_pte(domain, start_pfn, );
> +
> +   if (dma_pte_present(pte)) {
> +   dma_pte_free_pagetable(domain, start_pfn,
> + 

RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-07 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Friday, April 2, 2021 12:44 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index ee09323..cbcb434 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
> >  * removed to make room for superpage(s).
> >  * We're adding new large pages, so make sure
> >  * we don't remove their parent tables.
> > +*
> > +* We also need to flush the iotlb before creating
> > +* superpage to ensure it does not preserve any
> > +* obsolete info.
> >  */
> > -   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> > -  largepage_lvl + 1);
> > +   if (dma_pte_present(pte)) {
> 
> The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE 
> is
> insufficient. How about removing this check and always performing cache
> invalidation?
> 

Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
NOT-present (e.g. creating a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?

> > +   int i;
> > +
> > +   dma_pte_free_pagetable(domain, iov_pfn, 
> > end_pfn,
> > +  largepage_lvl + 
> > 1);
> > +   for_each_domain_iommu(i, domain)
> > +   
> > iommu_flush_iotlb_psi(g_iommus[i], domain,
> > + iov_pfn, 
> > nr_pages, 0, 0);
> > +
> 
> Best regards,
> baolu


Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-01 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

On 2021/4/2 11:06, Lu Baolu wrote:
> Hi Longpeng,
> 
> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
>> The translation caches may preserve obsolete data when the
>> mapping size is changed, suppose the following sequence which
>> can reveal the problem with high probability.
>>
>> 1.mmap(4GB,MAP_HUGETLB)
>> 2.
>>    while (1) {
>>     (a)    DMA MAP   0,0xa
>>     (b)    DMA UNMAP 0,0xa
>>     (c)    DMA MAP   0,0xc000
>>   * DMA read IOVA 0 may fail here (Not present)
>>   * if the problem occurs.
>>     (d)    DMA UNMAP 0,0xc000
>>    }
>>
>> The page table(only focus on IOVA 0) after (a) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x1a30a72003  entry:0x89b39cacb000
>>  PTE: 0x21d200803  entry:0x89b3b0a72000
>>
>> The page table after (b) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x1a30a72003  entry:0x89b39cacb000
>>  PTE: 0x0  entry:0x89b3b0a72000
>>
>> The page table after (c) is:
>>   PML4: 0x19db5c1003   entry:0x899bdcd2f000
>>    PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
>>     PDE: 0x21d200883   entry:0x89b39cacb000 (*)
>>
>> Because the PDE entry after (b) is still present, it won't be
>> flushed even though the iommu driver flushes the cache on unmap,
>> so the obsolete data may be preserved in the cache, which
>> would cause the wrong translation in the end.
>>
>> However, we can see the PDE entry finally switches to a
>> 2M-superpage mapping, but it does not change to
>> 0x21d200883 directly:
>>
>> 1. PDE: 0x1a30a72003
>> 2. __domain_mapping
>>   dma_pte_free_pagetable
>>     Set the PDE entry to ZERO
>>   Set the PDE entry to 0x21d200883
>>
>> So we must flush the cache after the entry is set to ZERO
>> to avoid the obsolete info being preserved.
>>
>> Cc: David Woodhouse 
>> Cc: Lu Baolu 
>> Cc: Nadav Amit 
>> Cc: Alex Williamson 
>> Cc: Kevin Tian 
>> Cc: Gonglei (Arei) 
>>
>> Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating
>> superpage")
>> Cc:  # v3.0+
>> Link:
>> https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
>>
>> Suggested-by: Lu Baolu 
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>   drivers/iommu/intel/iommu.c | 15 +--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index ee09323..cbcb434 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
>>    * removed to make room for superpage(s).
>>    * We're adding new large pages, so make sure
>>    * we don't remove their parent tables.
>> + *
>> + * We also need to flush the iotlb before creating
>> + * superpage to ensure it does not preserve any
>> + * obsolete info.
>>    */
>> -    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
>> -   largepage_lvl + 1);
>> +    if (dma_pte_present(pte)) {
>> +    int i;
>> +
>> +    dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
>> +   largepage_lvl + 1);
>> +    for_each_domain_iommu(i, domain)
>> +    iommu_flush_iotlb_psi(g_iommus[i], domain,
>> +  iov_pfn, nr_pages, 0, 0);
> 
> Thanks for patch!
> 
> How about making the flushed page size accurate? For example,
> 
> @@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned 
> long
> iov_pfn,
>     dma_pte_free_pagetable(domain, 
> iov_pfn,
> end_pfn,
> 
> largepage_lvl + 1);
>     for_each_domain_iommu(i, domain)
> - iommu_flush_iotlb_psi(g_iommus[i], domain,
> - iov_pfn, nr_pages, 0, 0);
> + iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
> + ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);
> 
Yes, makes sense.

Maybe another alternative is 'end_pfn - iov_pfn + 1'; it's readable because we
free the page table with (iov_pfn, end_pfn) above. Which one do you prefer?

> 
>> +    }
>>   } else {
>>   pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
>>   }
>>
> 
> Best regards,
> baolu
> .


RE: A problem of Intel IOMMU hardware ?

2021-03-21 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Nadav,

> -Original Message-
> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Friday, March 19, 2021 12:46 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> 
> Cc: Tian, Kevin ; chenjiashang
> ; David Woodhouse ;
> io...@lists.linux-foundation.org; LKML ;
> alex.william...@redhat.com; Gonglei (Arei) ;
> w...@kernel.org
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> 
> 
> > On Mar 18, 2021, at 2:25 AM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.)  wrote:
> >
> >
> >
> >> -Original Message-
> >> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> >> Sent: Thursday, March 18, 2021 4:56 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> ; Nadav Amit 
> >> Cc: chenjiashang ; David Woodhouse
> >> ; io...@lists.linux-foundation.org; LKML
> >> ; alex.william...@redhat.com; Gonglei
> >> (Arei) ; w...@kernel.org
> >> Subject: RE: A problem of Intel IOMMU hardware ?
> >>
> >>> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>> 
> >>>
> >>>> -Original Message-
> >>>> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> >>>> Sent: Thursday, March 18, 2021 4:27 PM
> >>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>>> ; Nadav Amit 
> >>>> Cc: chenjiashang ; David Woodhouse
> >>>> ; io...@lists.linux-foundation.org; LKML
> >>>> ; alex.william...@redhat.com; Gonglei
> >>> (Arei)
> >>>> ; w...@kernel.org
> >>>> Subject: RE: A problem of Intel IOMMU hardware ?
> >>>>
> >>>>> From: iommu  On Behalf
> >>>>> Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>>>>
> >>>>>> 2. Consider ensuring that the problem is not somehow related to
> >>>>>> queued invalidations. Try to use __iommu_flush_iotlb() instead of
> >>>> qi_flush_iotlb().
> >>>>>>
> >>>>>
> >>>>> I tried to force to use __iommu_flush_iotlb(), but maybe something
> >>>>> wrong, the system crashed, so I prefer to lower the priority of
> >>>>> this
> >>> operation.
> >>>>>
> >>>>
> >>>> The VT-d spec clearly says that register-based invalidation can be
> >>>> used only
> >>> when
> >>>> queued-invalidations are not enabled. Intel-IOMMU driver doesn't
> >>>> provide
> >>> an
> >>>> option to disable queued-invalidation though, when the hardware is
> >>> capable. If you
> >>>> really want to try, tweak the code in intel_iommu_init_qi.
> >>>>
> >>>
> >>> Hi Kevin,
> >>>
> >>> Thanks to point out this. Do you have any ideas about this problem ?
> >>> I tried to descript the problem much clear in my reply to Alex, hope
> >>> you could have a look if you're interested.
> >>>
> >>
> >> btw I saw you used 4.18 kernel in this test. What about latest kernel?
> >>
> >
> > Not test yet. It's hard to upgrade kernel in our environment.
> >
> >> Also one way to separate sw/hw bug is to trace the low level
> >> interface (e.g.,
> >> qi_flush_iotlb) which actually sends invalidation descriptors to the
> >> IOMMU hardware. Check the window between b) and c) and see whether
> >> the software does the right thing as expected there.
> >>
> >
> > We add some log in iommu driver these days, the software seems fine.
> > But we didn't look inside the qi_submit_sync yet, I'll try it tonight.
> 
> So here is my guess:
> 
> Intel probably used as a basis for the IOTLB an implementation of some other
> (regular) TLB design.
> 
> Intel SDM says regarding TLBs (4.10.4.2 “Recommended Invalidation”):
> 
> "Software wishing to prevent this uncertainty should not write to a
> paging-structure entry in a way that would change, for any linear address, 
> both the
> page size and either the page frame, access rights, or other attributes.”
> 
> 
> Now the aforementioned uncertainty is a bit different (multiple
> *valid* translations of a single address). Yet, perhaps this is yet another 
> thing that
> might happen.
> 
> From a brief look on the handling of MMU (not IOMMU) hugepages in Linux, 
> indeed
> the PMD is first 

RE: A problem of Intel IOMMU hardware ?

2021-03-18 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)


> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: Thursday, March 18, 2021 4:56 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; Nadav Amit 
> Cc: chenjiashang ; David Woodhouse
> ; io...@lists.linux-foundation.org; LKML
> ; alex.william...@redhat.com; Gonglei (Arei)
> ; w...@kernel.org
> Subject: RE: A problem of Intel IOMMU hardware ?
> 
> > From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > 
> >
> > > -Original Message-
> > > From: Tian, Kevin [mailto:kevin.t...@intel.com]
> > > Sent: Thursday, March 18, 2021 4:27 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > ; Nadav Amit 
> > > Cc: chenjiashang ; David Woodhouse
> > > ; io...@lists.linux-foundation.org; LKML
> > > ; alex.william...@redhat.com; Gonglei
> > (Arei)
> > > ; w...@kernel.org
> > > Subject: RE: A problem of Intel IOMMU hardware ?
> > >
> > > > From: iommu  On Behalf
> > > > Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > >
> > > > > 2. Consider ensuring that the problem is not somehow related to
> > > > > queued invalidations. Try to use __iommu_flush_iotlb() instead
> > > > > of
> > > qi_flush_iotlb().
> > > > >
> > > >
> > > > I tried to force to use __iommu_flush_iotlb(), but maybe something
> > > > wrong, the system crashed, so I prefer to lower the priority of
> > > > this
> > operation.
> > > >
> > >
> > > The VT-d spec clearly says that register-based invalidation can be
> > > used only
> > when
> > > queued-invalidations are not enabled. Intel-IOMMU driver doesn't
> > > provide
> > an
> > > option to disable queued-invalidation though, when the hardware is
> > capable. If you
> > > really want to try, tweak the code in intel_iommu_init_qi.
> > >
> >
> > Hi Kevin,
> >
> > Thanks to point out this. Do you have any ideas about this problem ? I
> > tried to descript the problem much clear in my reply to Alex, hope you
> > could have a look if you're interested.
> >
> 
> btw I saw you used 4.18 kernel in this test. What about latest kernel?
> 

Not tested yet. It's hard to upgrade the kernel in our environment.

> Also one way to separate sw/hw bug is to trace the low level interface (e.g.,
> qi_flush_iotlb) which actually sends invalidation descriptors to the IOMMU
> hardware. Check the window between b) and c) and see whether the software does
> the right thing as expected there.
> 

We added some logs to the iommu driver these days; the software seems fine. But
we haven't looked inside qi_submit_sync yet, so I'll try it tonight.

> Thanks
> Kevin


RE: A problem of Intel IOMMU hardware ?

2021-03-18 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)


> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: Thursday, March 18, 2021 4:43 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; Nadav Amit 
> Cc: chenjiashang ; David Woodhouse
> ; io...@lists.linux-foundation.org; LKML
> ; alex.william...@redhat.com; Gonglei (Arei)
> ; w...@kernel.org
> Subject: RE: A problem of Intel IOMMU hardware ?
> 
> > From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > 
> >
> >
> > > -Original Message-
> > > From: Tian, Kevin [mailto:kevin.t...@intel.com]
> > > Sent: Thursday, March 18, 2021 4:27 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > ; Nadav Amit 
> > > Cc: chenjiashang ; David Woodhouse
> > > ; io...@lists.linux-foundation.org; LKML
> > > ; alex.william...@redhat.com; Gonglei
> > (Arei)
> > > ; w...@kernel.org
> > > Subject: RE: A problem of Intel IOMMU hardware ?
> > >
> > > > From: iommu  On Behalf
> > > > Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > >
> > > > > 2. Consider ensuring that the problem is not somehow related to
> > > > > queued invalidations. Try to use __iommu_flush_iotlb() instead
> > > > > of
> > > qi_flush_iotlb().
> > > > >
> > > >
> > > > I tried to force to use __iommu_flush_iotlb(), but maybe something
> > > > wrong, the system crashed, so I prefer to lower the priority of
> > > > this
> > operation.
> > > >
> > >
> > > The VT-d spec clearly says that register-based invalidation can be
> > > used only
> > when
> > > queued-invalidations are not enabled. Intel-IOMMU driver doesn't
> > > provide
> > an
> > > option to disable queued-invalidation though, when the hardware is
> > capable. If you
> > > really want to try, tweak the code in intel_iommu_init_qi.
> > >
> >
> > Hi Kevin,
> >
> > Thanks to point out this. Do you have any ideas about this problem ? I
> > tried to descript the problem much clear in my reply to Alex, hope you
> > could have a look if you're interested.
> >
> 
> I agree with Nadav. Looks this implies some stale paging structure cache 
> entry (e.g.
> PMD) is not invalidated properly. It's better if Baolu can reproduce this 
> problem in
> his local environment and then do more debug to identify whether it's a 
> software or
> hardware defect.
> 
> btw what is the device under test? Does it support ATS?
> 

The device is our offload card; it does not support the ATS capability.

> Thanks
> Kevin


RE: A problem of Intel IOMMU hardware ?

2021-03-18 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)


> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: Thursday, March 18, 2021 4:27 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; Nadav Amit 
> Cc: chenjiashang ; David Woodhouse
> ; io...@lists.linux-foundation.org; LKML
> ; alex.william...@redhat.com; Gonglei (Arei)
> ; w...@kernel.org
> Subject: RE: A problem of Intel IOMMU hardware ?
> 
> > From: iommu  On Behalf Of
> > Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >
> > > 2. Consider ensuring that the problem is not somehow related to
> > > queued invalidations. Try to use __iommu_flush_iotlb() instead of
> qi_flush_iotlb().
> > >
> >
> > I tried to force to use __iommu_flush_iotlb(), but maybe something
> > wrong, the system crashed, so I prefer to lower the priority of this 
> > operation.
> >
> 
> The VT-d spec clearly says that register-based invalidation can be used only 
> when
> queued-invalidations are not enabled. Intel-IOMMU driver doesn't provide an
> option to disable queued-invalidation though, when the hardware is capable. 
> If you
> really want to try, tweak the code in intel_iommu_init_qi.
> 

Hi Kevin,

Thanks for pointing this out. Do you have any ideas about this problem? I tried
to describe the problem more clearly in my reply to Alex; I hope you can have a
look if you're interested.

> Thanks
> Kevin


RE: A problem of Intel IOMMU hardware ?

2021-03-18 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Nadav,

> -Original Message-
> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Thursday, March 18, 2021 2:13 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> 
> Cc: David Woodhouse ; Lu Baolu
> ; Joerg Roedel ; w...@kernel.org;
> alex.william...@redhat.com; chenjiashang ;
> io...@lists.linux-foundation.org; Gonglei (Arei) ;
> LKML 
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> 
> 
> > On Mar 17, 2021, at 2:35 AM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.)  wrote:
> >
> > Hi Nadav,
> >
> >> -Original Message-
> >> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> >>>  reproduce the problem with high probability (~50%).
> >>
> >> I saw Lu replied, and he is much more knowledgable than I am (I was
> >> just intrigued by your email).
> >>
> >> However, if I were you I would try also to remove some
> >> “optimizations” to look for the root-cause (e.g., use domain specific
> invalidations instead of page-specific).
> >>
> >
> > Good suggestion! But we did it these days, we tried to use global 
> > invalidations as
> follow:
> > iommu->flush.flush_iotlb(iommu, did, 0, 0,
> > DMA_TLB_DSI_FLUSH);
> > But can not resolve the problem.
> >
> >> The first thing that comes to my mind is the invalidation hint (ih)
> >> in iommu_flush_iotlb_psi(). I would remove it to see whether you get
> >> the failure without it.
> >
> > We also notice the IH, but the IH is always ZERO in our case, as the spec 
> > says:
> > '''
> > Paging-structure-cache entries caching second-level mappings
> > associated with the specified domain-id and the
> > second-level-input-address range are invalidated, if the Invalidation
> > Hint
> > (IH) field is Clear.
> > '''
> >
> > It seems the software is everything fine, so we've no choice but to suspect 
> > the
> hardware.
> 
> Ok, I am pretty much out of ideas. I have two more suggestions, but they are 
> much
> less likely to help. Yet, they can further help to rule out software bugs:
> 
> 1. dma_clear_pte() seems to be wrong IMHO. It should have used WRITE_ONCE()
> to prevent split-write, which might potentially cause “invalid” (partially
> cleared) PTE to be stored in the TLB. Having said that, the subsequent IOTLB 
> flush
> should have prevented the problem.
> 

Yes, using WRITE_ONCE is much safer; however, I just tested the following code
and it didn't resolve my problem.

static inline void dma_clear_pte(struct dma_pte *pte)
{
WRITE_ONCE(pte->val, 0ULL);
}

> 2. Consider ensuring that the problem is not somehow related to queued
> invalidations. Try to use __iommu_flush_iotlb() instead of qi_flush_iotlb().
> 

I tried to force the use of __iommu_flush_iotlb(), but maybe something went
wrong; the system crashed, so I prefer to lower the priority of this operation.

> Regards,
> Nadav


RE: A problem of Intel IOMMU hardware ?

2021-03-17 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi guys,

I provide more information, please see below

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, March 18, 2021 10:59 AM
> To: Alex Williamson 
> Cc: baolu...@linux.intel.com; Longpeng (Mike, Cloud Infrastructure Service 
> Product
> Dept.) ; dw...@infradead.org; j...@8bytes.org;
> w...@kernel.org; io...@lists.linux-foundation.org; LKML
> ; Gonglei (Arei) ;
> chenjiashang 
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> Hi Alex,
> 
> On 3/17/21 11:18 PM, Alex Williamson wrote:
> >>>   {MAP,   0x0, 0xc000}, - (b)
> >>>   use GDB to pause at here, and then DMA read
> >>> IOVA=0,
> >> IOVA 0 seems to be a special one. Have you verified with other
> >> addresses than IOVA 0?
> > It is???  That would be a problem.
> >
> 
> No problem from hardware point of view as far as I can see. Just thought about
> software might handle it specially.
> 

We simplified the reproducer; the following map/unmap sequence can also
reproduce the problem.

1. use 2M hugetlbfs to mmap 4G memory

2. run the while loop:
while (1) {
DMA MAP (0, 0xa) - - - - - - - - - - - - - -(a)
DMA UNMAP (0, 0xa) - - - - - - - - - - - (b)
  Operation-1 : dump DMAR table
DMA MAP (0, 0xc000) - - - - - - - - - - -(c)
  Operation-2 :
 use GDB to pause at here, then DMA read IOVA=0,
 sometimes DMA success (as expected),
 but sometimes DMA error (report not-present).
  Operation-3 : dump DMAR table
  Operation-4 (when DMA error) : please see below
DMA UNMAP (0, 0xc000) - - - - - - - - -(d)
}

The DMAR table of Operation-1 is (only showing the entries for IOVA 0):

PML4: 0x1a34fbb003
 PDPE: 0x1a34fbb003
  PDE: 0x1a34fbf003
   PTE: 0x0

And the table of Operation-3 is:

PML4: 0x1a34fbb003
 PDPE: 0x1a34fbb003
  PDE: 0x15ec00883 <-- 2M superpage

So we can see IOVA 0 is mapped, but the DMA read reports an error:

dmar_fault: 131757 callbacks suppressed
DRHD: handling fault status reg 402
[DMA Read] Request device [86:05.6] fault addr 0 [fault reason 06] PTE Read access is not set
[DMA Read] Request device [86:05.6] fault addr 0 [fault reason 06] PTE Read access is not set
DRHD: handling fault status reg 600
DRHD: handling fault status reg 602
[DMA Read] Request device [86:05.6] fault addr 0 [fault reason 06] PTE Read access is not set
[DMA Read] Request device [86:05.6] fault addr 0 [fault reason 06] PTE Read access is not set
[DMA Read] Request device [86:05.6] fault addr 0 [fault reason 06] PTE Read access is not set

NOTE, the magical thing happens... (*Operation-4*) we write the PTE
of Operation-1 from 0 to 0x3, which means Read/Write allowed, and then
we trigger the DMA read again; it succeeds and returns the data of HPA 0 !!

Why would modifying the old page table make any difference? As we
have discussed previously, the cache flush part of the driver is correct:
it calls flush_iotlb after (b) and there is no need to flush after (c). But the
result of the experiment shows the old page table, or old cache entries
derived from it, are actually still in effect.

Any ideas ?

> Best regards,
> baolu


RE: A problem of Intel IOMMU hardware ?

2021-03-17 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Wednesday, March 17, 2021 1:17 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; dw...@infradead.org; j...@8bytes.org;
> w...@kernel.org; alex.william...@redhat.com
> Cc: baolu...@linux.intel.com; io...@lists.linux-foundation.org; LKML
> ; Gonglei (Arei) ;
> chenjiashang 
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> Hi Longpeng,
> 
> On 3/17/21 11:16 AM, Longpeng (Mike, Cloud Infrastructure Service Product 
> Dept.)
> wrote:
> > Hi guys,
> >
> > We find the Intel iommu cache (i.e. iotlb) maybe works wrong in a
> > special situation, it would cause DMA fails or get wrong data.
> >
> > The reproducer (based on Alex's vfio testsuite[1]) is in attachment,
> > it can reproduce the problem with high probability (~50%).
> >
> > The machine we used is:
> > processor   : 47
> > vendor_id   : GenuineIntel
> > cpu family  : 6
> > model   : 85
> > model name  : Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
> > stepping: 4
> > microcode   : 0x269
> >
> > And the iommu capability reported is:
> > ver 1:0 cap 8d2078c106f0466 ecap f020df (caching mode = 0 ,
> > page-selective invalidation = 1)
> >
> > (The problem is also on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz'
> > and
> > 'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz')
> >
> > We run the reproducer on Linux 4.18 and it works as follow:
> >
> > Step 1. alloc 4G *2M-hugetlb* memory (N.B. no problem with 4K-page
> > mapping)
> 
> I don't understand 2M-hugetlb here means exactly. The IOMMU hardware
> supports both 2M and 1G super page. The mapping physical memory is 4G.
> Why couldn't it use 1G super page?
> 

We use hugetlbfs (it supports both 1G and 2M; we chose 2M in our case) to request
the memory in userspace:
vaddr = (unsigned long)mmap(0, MAP_SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | *MAP_HUGETLB*, 0, 0);

Yes, the IOMMU supports both 2M and 1G superpages; we just haven't tested the 1G
case yet, because our production systems use 2M hugetlbfs pages.

> > Step 2. DMA Map 4G memory
> > Step 3.
> >  while (1) {
> >  {UNMAP, 0x0, 0xa},  (a)
> >  {UNMAP, 0xc, 0xbff4},
> 
> Have these two ranges been mapped before? Does the IOMMU driver complains
> when you trying to unmap a range which has never been mapped? The IOMMU
> driver implicitly assumes that mapping and unmapping are paired.
> 

Of course yes, please see Step 2: we DMA-mapped all the memory (4G) before the
while loop.
The driver never complained during MAP and UNMAP operations.

> >  {MAP,   0x0, 0xc000}, - (b)
> >  use GDB to pause at here, and then DMA read IOVA=0,
> 
> IOVA 0 seems to be a special one. Have you verified with other addresses than
> IOVA 0?
> 

Yes, we also tested IOVA=0x1000, and it has the problem too.

But one of the differences between (0x0, 0xa) and (0x0, 0xc000) is that the
former can only use 4K mappings in the DMA page table while the latter uses
2M mappings. Is it possible that the hardware's cache management goes wrong
in this case?

> >  sometimes DMA success (as expected),
> >  but sometimes DMA error (report not-present).
> >  {UNMAP, 0x0, 0xc000}, - (c)
> >  {MAP,   0x0, 0xa},
> >  {MAP,   0xc, 0xbff4},
> >  }
> >
> > The DMA read operations sholud success between (b) and (c), it should
> > NOT report not-present at least!
> >
> > After analysis the problem, we think maybe it's caused by the Intel iommu 
> > iotlb.
> > It seems the DMA Remapping hardware still uses the IOTLB or other caches of
> (a).
> >
> > When do DMA unmap at (a), the iotlb will be flush:
> >  intel_iommu_unmap
> >  domain_unmap
> >  iommu_flush_iotlb_psi
> >
> > When do DMA map at (b), no need to flush the iotlb according to the
> > capability of this iommu:
> >  intel_iommu_map
> >  domain_pfn_mapping
> >  domain_mapping
> >  __mapping_notify_one
> >  if (cap_caching_mode(iommu->cap)) // FALSE
> >  iommu_flush_iotlb_psi
> 
> That's true. The iotlb flushing is not needed in case of PTE been changed from
> non-present to present unless caching mode.
> 

Yes, I also think the driver code is correct. But it is confusing that the
problem disappears if we force a flush here.

> > But the problem will disappear if we FORCE flush here. So we suspect
> > the iommu hardware.
> >
> > Do you have any suggestion ?
> 
> Best regards,
> baolu


RE: A problem of Intel IOMMU hardware ?

2021-03-17 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Nadav,

> -Original Message-
> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Wednesday, March 17, 2021 1:46 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> 
> Cc: David Woodhouse ; Lu Baolu
> ; Joerg Roedel ; w...@kernel.org;
> alex.william...@redhat.com; chenjiashang ;
> io...@lists.linux-foundation.org; Gonglei (Arei) ;
> LKML 
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> 
> 
> > On Mar 16, 2021, at 8:16 PM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.)  wrote:
> >
> > Hi guys,
> >
> > We find the Intel iommu cache (i.e. iotlb) maybe works wrong in a
> > special situation, it would cause DMA fails or get wrong data.
> >
> > The reproducer (based on Alex's vfio testsuite[1]) is in attachment,
> > it can reproduce the problem with high probability (~50%).
> 
> I saw Lu replied, and he is much more knowledgable than I am (I was just 
> intrigued
> by your email).
> 
> However, if I were you I would try also to remove some “optimizations” to 
> look for
> the root-cause (e.g., use domain specific invalidations instead of 
> page-specific).
> 

Good suggestion! We actually tried that these past days: we switched to
coarser, domain-selective invalidations as follows:
iommu->flush.flush_iotlb(iommu, did, 0, 0,
                         DMA_TLB_DSI_FLUSH);
but that did not resolve the problem either.

> The first thing that comes to my mind is the invalidation hint (ih) in
> iommu_flush_iotlb_psi(). I would remove it to see whether you get the failure
> without it.

We also noticed the IH, but the IH is always ZERO in our case, as the spec says:
'''
Paging-structure-cache entries caching second-level mappings associated with 
the specified
domain-id and the second-level-input-address range are invalidated, if the 
Invalidation Hint
(IH) field is Clear.
'''

It seems the software side is fine, so we have no choice but to suspect the
hardware.


A problem of Intel IOMMU hardware ?

2021-03-16 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi guys,

We find that the Intel IOMMU cache (i.e. the IOTLB) may misbehave in a special
situation, causing DMA failures or wrong data.

The reproducer (based on Alex's vfio testsuite[1]) is in the attachment; it can
reproduce the problem with high probability (~50%).

The machine we used is:
processor   : 47
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
stepping: 4
microcode   : 0x269

And the iommu capability reported is:
ver 1:0 cap 8d2078c106f0466 ecap f020df
(caching mode = 0 , page-selective invalidation = 1)

(The problem is also on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz' and
'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz')

We run the reproducer on Linux 4.18 and it works as follows:

Step 1. alloc 4G *2M-hugetlb* memory (N.B. no problem with 4K-page mapping)
Step 2. DMA Map 4G memory
Step 3.
while (1) {
{UNMAP, 0x0, 0xa},  (a)
{UNMAP, 0xc, 0xbff4},
{MAP,   0x0, 0xc000}, - (b)
use GDB to pause at here, and then DMA read IOVA=0,
sometimes DMA success (as expected),
but sometimes DMA error (report not-present).
{UNMAP, 0x0, 0xc000}, - (c)
{MAP,   0x0, 0xa},
{MAP,   0xc, 0xbff4},
}

The DMA read operations should succeed between (b) and (c); at the very least
they should NOT report not-present!

After analyzing the problem, we think it may be caused by the Intel IOMMU IOTLB.
It seems the DMA remapping hardware still uses the IOTLB or other caches populated at (a).

When we do the DMA unmap at (a), the IOTLB is flushed:
intel_iommu_unmap
domain_unmap
iommu_flush_iotlb_psi

When we do the DMA map at (b), there is no need to flush the IOTLB according
to this IOMMU's capabilities:
intel_iommu_map
domain_pfn_mapping
domain_mapping
__mapping_notify_one
if (cap_caching_mode(iommu->cap)) // FALSE
iommu_flush_iotlb_psi
But the problem disappears if we FORCE a flush here. So we suspect the IOMMU
hardware.

Do you have any suggestion ?







/*
 * VFIO API definition
 *
 * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
 * Author: Alex Williamson 
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#ifndef _UAPIVFIO_H
#define _UAPIVFIO_H

#include <linux/types.h>
#include <linux/ioctl.h>

#define VFIO_API_VERSION	0


/* Kernel & User level defines for VFIO IOCTLs. */

/* Extensions */

#define VFIO_TYPE1_IOMMU	1

/*
 * The IOCTL interface is designed for extensibility by embedding the
 * structure length (argsz) and flags into structures passed between
 * kernel and userspace.  We therefore use the _IO() macro for these
 * defines to avoid implicitly embedding a size into the ioctl request.
 * As structure fields are added, argsz will increase to match and flag
 * bits will be defined to indicate additional fields with valid data.
 * It's *always* the caller's responsibility to indicate the size of
 * the structure passed by setting argsz appropriately.
 */

#define VFIO_TYPE   (';')
#define VFIO_BASE   100

/*  IOCTLs for VFIO file descriptor (/dev/vfio/vfio)  */

/**
 * VFIO_GET_API_VERSION - _IO(VFIO_TYPE, VFIO_BASE + 0)
 *
 * Report the version of the VFIO API.  This allows us to bump the entire
 * API version should we later need to add or change features in incompatible
 * ways.
 * Return: VFIO_API_VERSION
 * Availability: Always
 */
#define VFIO_GET_API_VERSION	_IO(VFIO_TYPE, VFIO_BASE + 0)

/**
 * VFIO_CHECK_EXTENSION - _IOW(VFIO_TYPE, VFIO_BASE + 1, __u32)
 *
 * Check whether an extension is supported.
 * Return: 0 if not supported, 1 (or some other positive integer) if supported.
 * Availability: Always
 */
#define VFIO_CHECK_EXTENSION	_IO(VFIO_TYPE, VFIO_BASE + 1)

/**
 * VFIO_SET_IOMMU - _IOW(VFIO_TYPE, VFIO_BASE + 2, __s32)
 *
 * Set the iommu to the given type.  The type must be supported by an
 * iommu driver as verified by calling CHECK_EXTENSION using the same
 * type.  A group must be set to this file descriptor before this
 * ioctl is available.  The IOMMU interfaces enabled by this call are
 * specific to the value set.
 * Return: 0 on success, -errno on failure
 * Availability: When VFIO group attached
 */
#define VFIO_SET_IOMMU  _IO(VFIO_TYPE, VFIO_BASE + 2)

/*  IOCTLs for GROUP file descriptors (/dev/vfio/$GROUP)  */

/**
 * VFIO_GROUP_GET_STATUS - _IOR(VFIO_TYPE, VFIO_BASE + 3,
 *  struct vfio_group_status)
 *
 * Retrieve information about the group.  Fills in provided
 * struct vfio_group_info.  Caller sets argsz.
 

Re: [PATCH] nitro_enclaves: set master in the procedure of NE probe

2021-01-24 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



在 2021/1/20 18:27, Paraschiv, Andra-Irina 写道:
> 
> 
> On 19/01/2021 05:30, Longpeng(Mike) wrote:
>> According the PCI spec:
>>    Bus Master Enable – Controls the ability of a PCI Express
>>    Endpoint to issue Memory and I/O Read/Write Requests, and
>>    the ability of a Root or Switch Port to forward Memory and
>>    I/O Read/Write Requests in the Upstream direction
>>
>> Set BusMaster to make the driver to be PCI conformant.
> 
> Could update the commit title and message body to reflect more the why and 
> what for the change. One option can be:
> 
> nitro_enclaves: Set Bus Master for the NE PCI device
> 
> Enable Bus Master for the NE PCI device, according to the PCI spec
> for submitting memory or I/O requests:
>   Bus Master Enable ...
> 
> 
> 
> Please include the changelog in the commit message for the next revision(s).
> 
> + Greg in CC, as the patches for the Nitro Enclaves kernel driver are first 
> included in the char misc tree, then in the linux next and finally in the 
> mainline.
> 
Will update the commit message in V2.

>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
>> b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> index b9c1de4..143207e 100644
>> --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> @@ -480,6 +480,8 @@ static int ne_pci_probe(struct pci_dev *pdev, const 
>> struct pci_device_id *id)
>>  goto free_ne_pci_dev;
>>  }
>>
>> +   pci_set_master(pdev);
> 
> I was looking if we need the reverse for this - pci_clear_master() [1] - on 
> the error and remove / shutdown codebase paths, but pci_disable_device() 
> seems to include the bus master disable logic [2][3].
> 
No need to call pci_clear_master.

> Thanks,
> Andra
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n4312
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n2104
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n4242
> 
>> +
>>  rc = pci_request_regions_exclusive(pdev, "nitro_enclaves");
>>  if (rc < 0) {
>>  dev_err(>dev, "Error in pci request regions 
>> [rc=%d]\n", rc);
>> -- 
>> 1.8.3.1
>>
> 
> 
> 
> 
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
> Romania. Registration number J22/2621/2005.


Re: [PATCH v3 1/3] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()

2020-06-10 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2020/6/5 22:10, Sasha Levin wrote:
> <20200123101000.GB24255@Red>
> References: <20200602070501.2023-2-longpe...@huawei.com>
> <20200123101000.GB24255@Red>
> 
> Hi
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag
> fixing commit: dbaf0624ffa5 ("crypto: add virtio-crypto driver").
> 
> The bot has tested the following trees: v5.6.15, v5.4.43, v4.19.125, 
> v4.14.182.
> 
> v5.6.15: Build OK!
> v5.4.43: Failed to apply! Possible dependencies:
> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
> 
> v4.19.125: Failed to apply! Possible dependencies:
> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
> 
> v4.14.182: Failed to apply! Possible dependencies:
> 500e6807ce93 ("crypto: virtio - implement missing support for output IVs")
> 67189375bb3a ("crypto: virtio - convert to new crypto engine API")
> d0d859bb87ac ("crypto: virtio - Register an algo only if it's supported")
> e02b8b43f55a ("crypto: virtio - pr_err() strings should end with 
> newlines")
> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
> 
> 
> NOTE: The patch will not be queued to stable trees until it is upstream.
> 
> How should we proceed with this patch?
> 
I've tried to adapt my patch to these stable trees, but it seems there are some
other bugs, so I think the best way to resolve these conflicts is to apply the
dependent patches that were detected.

If we apply these dependent patches, then the conflicts in the other two patches:
 crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req
 crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req
will also be gone.

---
Regards,
Longpeng(Mike)


Re: [PATCH v2 2/2] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()

2020-06-01 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2020/5/31 17:21, Michael S. Tsirkin wrote:
> On Tue, May 26, 2020 at 02:11:37PM +, Sasha Levin wrote:
>> <20200123101000.GB24255@Red>
>> References: <20200526031956.1897-3-longpe...@huawei.com>
>> <20200123101000.GB24255@Red>
>>
>> Hi
>>
>> [This is an automated email]
>>
>> This commit has been processed because it contains a "Fixes:" tag
>> fixing commit: dbaf0624ffa5 ("crypto: add virtio-crypto driver").
>>
>> The bot has tested the following trees: v5.6.14, v5.4.42, v4.19.124, 
>> v4.14.181.
>>
>> v5.6.14: Build OK!
>> v5.4.42: Failed to apply! Possible dependencies:
>> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
>>
>> v4.19.124: Failed to apply! Possible dependencies:
>> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
>>
>> v4.14.181: Failed to apply! Possible dependencies:
>> 500e6807ce93 ("crypto: virtio - implement missing support for output 
>> IVs")
>> 67189375bb3a ("crypto: virtio - convert to new crypto engine API")
>> d0d859bb87ac ("crypto: virtio - Register an algo only if it's supported")
>> e02b8b43f55a ("crypto: virtio - pr_err() strings should end with 
>> newlines")
>> eee1d6fca0a0 ("crypto: virtio - switch to skcipher API")
>>
>>
>> NOTE: The patch will not be queued to stable trees until it is upstream.
>>
>> How should we proceed with this patch?
> 
> Mike could you comment on backporting?
> 
Hi Michael,

I will send V3, so I will resolve these conflicts later. :)

>> -- 
>> Thanks
>> Sasha
> 
> .
> 
---
Regards,
Longpeng(Mike)


Re: [v2 2/2] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()

2020-05-26 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2020/5/26 17:01, Markus Elfring wrote:
 … Thus release specific resources before
>>>
>>> Is there a need to improve also this information another bit?
>>>
>> You mean the last two paragraph is redundant ?
> 
> No.
> 
> I became curious if you would like to choose a more helpful information
> according to the wording “specific resources”.
> 
> Regards,
> Markus
> 
Hi Markus,

I respect your work, but please let us focus on the code itself. I think
experts in this area will understand what these patches solve after looking at
the code.

I hope the experts in this thread can review the code when they are free, thanks :)

---
Regards,
Longpeng(Mike)


Re: [PATCH v2 2/2] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()

2020-05-26 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Markus,

On 2020/5/26 15:19, Markus Elfring wrote:
>> The system'll crash when the users insmod crypto/tcrypto.ko with mode=155
>> ( testing "authenc(hmac(sha1),cbc(aes))" ). It's caused by reuse the memory
>> of request structure.
> 
> Wording adjustments:
> * … system will crash …
> * … It is caused by reusing the …
> 
> 
>> when these memory will be used again.
> 
> when this memory …
> 
OK.

> 
>> … Thus release specific resources before
> 
> Is there a need to improve also this information another bit?
> 
You mean the last two paragraphs are redundant ?
'''
When the virtio_crypto driver finish skcipher req, it'll call ->complete
callback(in crypto_finalize_skcipher_request) and then free its
resources whose pointers are recorded in 'skcipher parts'.

However, the ->complete is 'crypto_authenc_encrypt_done' in this case,
it will use the 'ahash part' of the request and change its content,
so virtio_crypto driver will get the wrong pointer after ->complete
finish and mistakenly free some other's memory. So the system will crash
when these memory will be used again.

The resources which need to be cleaned up are not used any more. But the
pointers of these resources may be changed in the function
"crypto_finalize_skcipher_request". Thus release specific resources before
calling this function.
'''

How about:
'''
When the virtio_crypto driver finishes the skcipher request, it calls
"crypto_finalize_skcipher_request()" and then frees the resources whose
pointers are stored in the 'skcipher parts', but the pointers of these
resources may be changed in that function. Thus fix it by releasing these
resources before calling "crypto_finalize_skcipher_request()".
'''


> Regards,
> Markus
> 
---
Regards,
Longpeng(Mike)


Re: [PATCH v2 1/2] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()

2020-05-26 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Markus,

On 2020/5/26 15:03, Markus Elfring wrote:
>> Fix it by sg_next() on calculation of src/dst scatterlist.
> 
> Wording adjustment:
> … by calling the function “sg_next” …
> 
OK, thanks.

> 
> …
>> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
>> @@ -350,13 +350,18 @@ __virtio_crypto_skcipher_do_req(struct 
>> virtio_crypto_sym_request *vc_sym_req,
> …
>>  src_nents = sg_nents_for_len(req->src, req->cryptlen);
>> +if (src_nents < 0) {
>> +pr_err("Invalid number of src SG.\n");
>> +return src_nents;
>> +}
>> +
>>  dst_nents = sg_nents(req->dst);
> …
> 
> I suggest to move the addition of such input parameter validation
> to a separate update step.
> 
Um... 'src_nents' will be used as a loop condition, so isn't it necessary to
validate it here?

'''
/* Source data */
-   for (i = 0; i < src_nents; i++)
-   sgs[num_out++] = >src[i];
+   for (sg = req->src; src_nents; sg = sg_next(sg), src_nents--)
+   sgs[num_out++] = sg;
'''

> Regards,
> Markus
> 

-- 
---
Regards,
Longpeng(Mike)


Re: [2/2] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()

2020-05-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2020/5/25 15:36, Markus Elfring wrote:
>> Could you help me to make the sentence better?
> 
> How do you think about a wording variant like the following?
> 
>   So the system will crash when this memory will be used again.
> 
Uh, it's much better, thanks.

> 
>>> * You proposed to move a call of the function 
>>> “crypto_finalize_skcipher_request”.
>>>   How does this change fit to the mentioned position?
>>>
The resources which need to be freed are not used anymore, but the pointers
to these resources may be changed in the function
"crypto_finalize_skcipher_request", so freeing these resources before calling
that function is appropriate.
> 
> Another alternative:
>   The resources which need to be cleaned up are not used any more.
>   But the pointers of these resources may be changed in the
>   function “crypto_finalize_skcipher_request”.
>   Thus release specific resources before calling this function.
> 
Oh great! Thanks.

> Regards,
> Markus
> 

-- 
---
Regards,
Longpeng(Mike)


Re: [PATCH 2/2] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()

2020-05-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Markus,

On 2020/5/25 14:30, Markus Elfring wrote:
>> … So the system will crash
>> at last when this memory be used again.
> 
> I would prefer a wording with less typos here.
> 
Could you help me to make the sentence better?

> 
>> We can free the resources before calling ->complete to fix this issue.
> 
> * An imperative wording can be nicer.
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=9cb1fd0efd195590b828b9b865421ad345a4a145#n151
> 
I'll try.

> * You proposed to move a call of the function 
> “crypto_finalize_skcipher_request”.
>   How does this change fit to the mentioned position?
> 
The resources which need to be freed are not used anymore, but the pointers
to these resources may be changed in the function
"crypto_finalize_skcipher_request", so freeing these resources before calling
that function is appropriate.

> * Would you like to add the tag “Fixes” to the commit message?
>
OK.

> Regards,
> Markus
> 

-- 
---
Regards,
Longpeng(Mike)


Re: [PATCH 1/2] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()

2020-05-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Markus,

On 2020/5/25 14:05, Markus Elfring wrote:
>> The system will crash when we insmod crypto/tcrypt.ko whit mode=38.
> 
> * I suggest to use the word “with” in this sentence.
> 
OK, it's a typo.

> * Will it be helpful to explain the passed mode number?
> 
> 
>> BTW I add a check for sg_nents_for_len() its return value since
>> sg_nents_for_len() function could fail.
> 
> Please reconsider also development consequences for this suggestion.
> Will a separate update step be more appropriate for the addition of
> an input parameter validation?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=9cb1fd0efd195590b828b9b865421ad345a4a145#n138
> 
> Would you like to add the tag “Fixes” to the commit message?
>
Will take all of your suggestions in v2, thanks.

> Regards,
> Markus
> 

-- 
---
Regards,
Longpeng(Mike)


Re: [PATCH 1/2] crypto: virtio: fix src/dst scatterlist calculation

2020-05-24 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Jason,

On 2020/5/25 11:12, Jason Wang wrote:
> 
> On 2020/5/25 上午8:56, Longpeng(Mike) wrote:
>> The system will crash when we insmod crypto/tcrypt.ko whit mode=38.
>>
>> Usually the next entry of one sg will be @sg@ + 1, but if this sg element
>> is part of a chained scatterlist, it could jump to the start of a new
>> scatterlist array. Let's fix it by sg_next() on calculation of src/dst
>> scatterlist.
>>
>> BTW I add a check for sg_nents_for_len() its return value since
>> sg_nents_for_len() function could fail.
>>
>> Cc: Gonglei 
>> Cc: Herbert Xu 
>> Cc: "Michael S. Tsirkin" 
>> Cc: Jason Wang 
>> Cc: "David S. Miller" 
>> Cc: virtualizat...@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Reported-by: LABBE Corentin 
>> Signed-off-by: Gonglei 
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>   drivers/crypto/virtio/virtio_crypto_algs.c | 14 ++
>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
>> b/drivers/crypto/virtio/virtio_crypto_algs.c
>> index 372babb44112..2fa1129f96d6 100644
>> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
>> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
>> @@ -359,8 +359,14 @@ __virtio_crypto_skcipher_do_req(struct
>> virtio_crypto_sym_request *vc_sym_req,
>>   unsigned int num_out = 0, num_in = 0;
>>   int sg_total;
>>   uint8_t *iv;
>> +    struct scatterlist *sg;
>>     src_nents = sg_nents_for_len(req->src, req->cryptlen);
>> +    if (src_nents < 0) {
>> +    pr_err("Invalid number of src SG.\n");
>> +    return src_nents;
>> +    }
>> +
>>   dst_nents = sg_nents(req->dst);
>>     pr_debug("virtio_crypto: Number of sgs (src_nents: %d, dst_nents: 
>> %d)\n",
>> @@ -446,12 +452,12 @@ __virtio_crypto_skcipher_do_req(struct
>> virtio_crypto_sym_request *vc_sym_req,
>>   vc_sym_req->iv = iv;
>>     /* Source data */
>> -    for (i = 0; i < src_nents; i++)
>> -    sgs[num_out++] = >src[i];
>> +    for (sg = req->src, i = 0; sg && i < src_nents; sg = sg_next(sg), i++)
> 
> 
> Any reason sg is checked here?
> 
> I believe it should be checked in sg_nents_for_len().
> 
Do you mean:
for (sg = req->src, i = 0; i < src_nents; sg = sg_next(sg), i++) ?

> 
>> +    sgs[num_out++] = sg;
>>     /* Destination data */
>> -    for (i = 0; i < dst_nents; i++)
>> -    sgs[num_out + num_in++] = >dst[i];
>> +    for (sg = req->dst, i = 0; sg && i < dst_nents; sg = sg_next(sg), i++)
>> +    sgs[num_out + num_in++] = sg;
> 
> 
> I believe sg should be checked in sg_nents().
>
How about
for (sg = req->dst; sg; sg = sg_next(sg)) ?

> Thanks
> 
> 
>>     /* Status */
>>   sg_init_one(_sg, _req->status, sizeof(vc_req->status));
> 
> .
> 

-- 
---
Regards,
Longpeng(Mike)