On Fri, 19 Jul 2019 16:27:04 +0800
Lu Baolu <baolu...@linux.intel.com> wrote:

> Hi Alex,
> 
> On 7/19/19 7:16 AM, Alex Williamson wrote:
> > On Wed, 12 Jun 2019 08:28:51 +0800
> > Lu Baolu <baolu...@linux.intel.com> wrote:
> >   
> >> The domain_init() and md_domain_init() do almost the same job.
> >> Consolidate them to avoid duplication.
> >>
> >> Signed-off-by: Lu Baolu <baolu...@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel-iommu.c | 123 +++++++++++-------------------------
> >>   1 file changed, 36 insertions(+), 87 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> >> index 5215dcd535a1..b8c6cf1d5f90 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -1825,63 +1825,6 @@ static inline int guestwidth_to_adjustwidth(int gaw)
> >>    return agaw;
> >>   }
> >>   
> >> -static int domain_init(struct dmar_domain *domain, struct intel_iommu 
> >> *iommu,
> >> -                 int guest_width)
> >> -{
> >> -  int adjust_width, agaw;
> >> -  unsigned long sagaw;
> >> -  int err;
> >> -
> >> -  init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN);
> >> -
> >> -  err = init_iova_flush_queue(&domain->iovad,
> >> -                              iommu_flush_iova, iova_entry_free);
> >> -  if (err)
> >> -          return err;
> >> -
> >> -  domain_reserve_special_ranges(domain);
> >> -
> >> -  /* calculate AGAW */
> >> -  if (guest_width > cap_mgaw(iommu->cap))
> >> -          guest_width = cap_mgaw(iommu->cap);
> >> -  domain->gaw = guest_width;
> >> -  adjust_width = guestwidth_to_adjustwidth(guest_width);
> >> -  agaw = width_to_agaw(adjust_width);
> >> -  sagaw = cap_sagaw(iommu->cap);
> >> -  if (!test_bit(agaw, &sagaw)) {
> >> -          /* hardware doesn't support it, choose a bigger one */
> >> -          pr_debug("Hardware doesn't support agaw %d\n", agaw);
> >> -          agaw = find_next_bit(&sagaw, 5, agaw);
> >> -          if (agaw >= 5)
> >> -                  return -ENODEV;
> >> -  }
> >> -  domain->agaw = agaw;
> >> -
> >> -  if (ecap_coherent(iommu->ecap))
> >> -          domain->iommu_coherency = 1;
> >> -  else
> >> -          domain->iommu_coherency = 0;
> >> -
> >> -  if (ecap_sc_support(iommu->ecap))
> >> -          domain->iommu_snooping = 1;
> >> -  else
> >> -          domain->iommu_snooping = 0;
> >> -
> >> -  if (intel_iommu_superpage)
> >> -          domain->iommu_superpage = fls(cap_super_page_val(iommu->cap));
> >> -  else
> >> -          domain->iommu_superpage = 0;
> >> -
> >> -  domain->nid = iommu->node;
> >> -
> >> -  /* always allocate the top pgd */
> >> -  domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
> >> -  if (!domain->pgd)
> >> -          return -ENOMEM;
> >> -  __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
> >> -  return 0;
> >> -}
> >> -
> >>   static void domain_exit(struct dmar_domain *domain)
> >>   {
> >>    struct page *freelist;
> >> @@ -2563,6 +2506,31 @@ static int get_last_alias(struct pci_dev *pdev, u16 
> >> alias, void *opaque)
> >>    return 0;
> >>   }
> >>   
> >> +static int domain_init(struct dmar_domain *domain, int guest_width)
> >> +{
> >> +  int adjust_width;
> >> +
> >> +  init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN);
> >> +  domain_reserve_special_ranges(domain);
> >> +
> >> +  /* calculate AGAW */
> >> +  domain->gaw = guest_width;
> >> +  adjust_width = guestwidth_to_adjustwidth(guest_width);
> >> +  domain->agaw = width_to_agaw(adjust_width);  
> > 
> > 
> > How do we justify that domain->agaw is nothing like it was previously
> > here?  I spent some more time working on the failure to boot that I
> > thought was caused by 4ec066c7b147, but there are so many breakages and
> > fixes in Joerg's x86/vt-d branch that I think my bisect zero'd in on
> > the wrong one.  Instead I cherry-picked every commit from Joerg's tree
> > and matched Fixes patches to their original commit, which led me to
> > this patch, mainline commit 123b2ffc376e.  The issue I'm seeing is that
> > we call domain_context_mapping_one() and we are in this section:
> > 
> >          struct dma_pte *pgd = domain->pgd;
> >          int agaw;
> > 
> >          context_set_domain_id(context, did);
> > 
> >          if (translation != CONTEXT_TT_PASS_THROUGH) {
> >                  /*
> >                   * Skip top levels of page tables for iommu which has
> >                   * less agaw than default. Unnecessary for PT mode.
> >                   */
> >                  for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
> >                          ret = -ENOMEM;
> >                          pgd = phys_to_virt(dma_pte_addr(pgd));
> >                          if (!dma_pte_present(pgd))
> >                                  goto out_unlock;
> >                  }
> > 
> > Prior to this commit, we had domain->agaw=1 and iommu->agaw=1, so we
> > don't enter the loop.  With this commit, we have domain->agaw=3,
> > iommu->agaw=1 with pgd->val=0!
> >   
> 
> iommu->agaw presents the level of page table which is by default
> supported by the @iommu. domain->agaw presents the level of page
> table which is used by the @domain.
> 
> agaw = 1: 3-level page table
> agaw = 2: 4-level page table
> agaw = 3: 5-level page table
> 
> The case here is that @iommu only supports 3-level page table, but the
> @domain was set to use 5-level page table. So we must skip level 4 and 5
> page tables of the @domain.
> 
> This code in the loop looks odd to me. It will always goto to unlock and
> leave pgd->val==0. How about below change? (not tested yet!)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 412f18aba501..98d6878cd29d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2020,11 +2020,15 @@ static int domain_context_mapping_one(struct 
> dmar_domain *domain,
>                           * Skip top levels of page tables for iommu 
> which has
>                           * less agaw than default. Unnecessary for PT mode.
>                           */
> -                       for (agaw = domain->agaw; agaw > iommu->agaw; 
> agaw--) {
> -                               ret = -ENOMEM;
> -                               pgd = phys_to_virt(dma_pte_addr(pgd));
> -                               if (!dma_pte_present(pgd))
> -                                       goto out_unlock;
> +                       while (iommu->agaw < domain->agaw) {
> +                               struct dma_pte *pte;
> +
> +                               pte = domain->pgd;
> +                               if (dma_pte_present(pte)) {
> +                                       domain->pgd = 
> phys_to_virt(dma_pte_addr(pte));
> +                                       free_pgtable_page(pte);
> +                               }
> +                               domain->agaw--;
>                          }
> 
>                          info = iommu_support_dev_iotlb(domain, iommu, 
> bus, devfn);


The system still fails to boot, now with DMAR faults indicating invalid
context entry.
 
> > I don't really follow how the setting of these fields above is
> > equivalent to what they were previously or if they're supposed to be
> > updated lazily, but the current behavior is non-functional.  Commit
> > 123b2ffc376e can be reverted with only a bit of offset, which brings
> > Linus' tree back into working operation for me.  Should we revert or is
> > there an obvious fix here?  Thanks,  
> 
> This commit is not the root cause of this issue as far as I can see, it
> only triggers above loop to get entered.

The system fails to boot with this commit, it does boot without this
commit.  Whether this commit exposed a latent issue or introduced a bug
itself is academic.  Thanks,

Alex

Reply via email to