On Fri, 19 Jul 2019 16:27:04 +0800 Lu Baolu <baolu...@linux.intel.com> wrote:
> Hi Alex, > > On 7/19/19 7:16 AM, Alex Williamson wrote: > > On Wed, 12 Jun 2019 08:28:51 +0800 > > Lu Baolu <baolu...@linux.intel.com> wrote: > > > >> The domain_init() and md_domain_init() do almost the same job. > >> Consolidate them to avoid duplication. > >> > >> Signed-off-by: Lu Baolu <baolu...@linux.intel.com> > >> --- > >> drivers/iommu/intel-iommu.c | 123 +++++++++++------------------------- > >> 1 file changed, 36 insertions(+), 87 deletions(-) > >> > >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > >> index 5215dcd535a1..b8c6cf1d5f90 100644 > >> --- a/drivers/iommu/intel-iommu.c > >> +++ b/drivers/iommu/intel-iommu.c > >> @@ -1825,63 +1825,6 @@ static inline int guestwidth_to_adjustwidth(int gaw) > >> return agaw; > >> } > >> > >> -static int domain_init(struct dmar_domain *domain, struct intel_iommu > >> *iommu, > >> - int guest_width) > >> -{ > >> - int adjust_width, agaw; > >> - unsigned long sagaw; > >> - int err; > >> - > >> - init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN); > >> - > >> - err = init_iova_flush_queue(&domain->iovad, > >> - iommu_flush_iova, iova_entry_free); > >> - if (err) > >> - return err; > >> - > >> - domain_reserve_special_ranges(domain); > >> - > >> - /* calculate AGAW */ > >> - if (guest_width > cap_mgaw(iommu->cap)) > >> - guest_width = cap_mgaw(iommu->cap); > >> - domain->gaw = guest_width; > >> - adjust_width = guestwidth_to_adjustwidth(guest_width); > >> - agaw = width_to_agaw(adjust_width); > >> - sagaw = cap_sagaw(iommu->cap); > >> - if (!test_bit(agaw, &sagaw)) { > >> - /* hardware doesn't support it, choose a bigger one */ > >> - pr_debug("Hardware doesn't support agaw %d\n", agaw); > >> - agaw = find_next_bit(&sagaw, 5, agaw); > >> - if (agaw >= 5) > >> - return -ENODEV; > >> - } > >> - domain->agaw = agaw; > >> - > >> - if (ecap_coherent(iommu->ecap)) > >> - domain->iommu_coherency = 1; > >> - else > >> - domain->iommu_coherency = 0; > >> - > >> - if (ecap_sc_support(iommu->ecap)) > >> - domain->iommu_snooping = 1; > >> - else > >> - domain->iommu_snooping = 0; > >> - > >> - if (intel_iommu_superpage) > >> - domain->iommu_superpage = fls(cap_super_page_val(iommu->cap)); > >> - else > >> - domain->iommu_superpage = 0; > >> - > >> - domain->nid = iommu->node; > >> - > >> - /* always allocate the top pgd */ > >> - domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid); > >> - if (!domain->pgd) > >> - return -ENOMEM; > >> - __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE); > >> - return 0; > >> -} > >> - > >> static void domain_exit(struct dmar_domain *domain) > >> { > >> struct page *freelist; > >> @@ -2563,6 +2506,31 @@ static int get_last_alias(struct pci_dev *pdev, u16 > >> alias, void *opaque) > >> return 0; > >> } > >> > >> +static int domain_init(struct dmar_domain *domain, int guest_width) > >> +{ > >> + int adjust_width; > >> + > >> + init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN); > >> + domain_reserve_special_ranges(domain); > >> + > >> + /* calculate AGAW */ > >> + domain->gaw = guest_width; > >> + adjust_width = guestwidth_to_adjustwidth(guest_width); > >> + domain->agaw = width_to_agaw(adjust_width); > > > > > > How do we justify that domain->agaw is nothing like it was previously > > here? I spent some more time working on the failure to boot that I > > thought was caused by 4ec066c7b147, but there are so many breakages and > > fixes in Joerg's x86/vt-d branch that I think my bisect zero'd in on > > the wrong one. Instead I cherry-picked every commit from Joerg's tree > > and matched Fixes patches to their original commit, which led me to > > this patch, mainline commit 123b2ffc376e. The issue I'm seeing is that > > we call domain_context_mapping_one() and we are in this section: > > > > struct dma_pte *pgd = domain->pgd; > > int agaw; > > > > context_set_domain_id(context, did); > > > > if (translation != CONTEXT_TT_PASS_THROUGH) { > > /* > > * Skip top levels of page tables for iommu which has > > * less agaw than default. Unnecessary for PT mode. > > */ > > for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) { > > ret = -ENOMEM; > > pgd = phys_to_virt(dma_pte_addr(pgd)); > > if (!dma_pte_present(pgd)) > > goto out_unlock; > > } > > > > Prior to this commit, we had domain->agaw=1 and iommu->agaw=1, so we > > don't enter the loop. With this commit, we have domain->agaw=3, > > iommu->agaw=1 with pgd->val=0! > > > > iommu->agaw presents the level of page table which is by default > supported by the @iommu. domain->agaw presents the level of page > table which is used by the @domain. > > agaw = 1: 3-level page table > agaw = 2: 4-level page table > agaw = 3: 5-level page table > > The case here is that @iommu only supports 3-level page table, but the > @domain was set to use 5-level page table. So we must skip level 4 and 5 > page tables of the @domain. > > This code in the loop looks odd to me. It will always goto to unlock and > leave pgd->val==0. How about below change? (not tested yet!) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 412f18aba501..98d6878cd29d 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -2020,11 +2020,15 @@ static int domain_context_mapping_one(struct > dmar_domain *domain, > * Skip top levels of page tables for iommu > which has > * less agaw than default. Unnecessary for PT mode. > */ > - for (agaw = domain->agaw; agaw > iommu->agaw; > agaw--) { > - ret = -ENOMEM; > - pgd = phys_to_virt(dma_pte_addr(pgd)); > - if (!dma_pte_present(pgd)) > - goto out_unlock; > + while (iommu->agaw < domain->agaw) { > + struct dma_pte *pte; > + > + pte = domain->pgd; > + if (dma_pte_present(pte)) { > + domain->pgd = > phys_to_virt(dma_pte_addr(pte)); > + free_pgtable_page(pte); > + } > + domain->agaw--; > } > > info = iommu_support_dev_iotlb(domain, iommu, > bus, devfn); The system still fails to boot, now with DMAR faults indicating invalid context entry. > > I don't really follow how the setting of these fields above is > > equivalent to what they were previously or if they're supposed to be > > updated lazily, but the current behavior is non-functional. Commit > > 123b2ffc376e can be reverted with only a bit of offset, which brings > > Linus' tree back into working operation for me. Should we revert or is > > there an obvious fix here? Thanks, > > This commit is not the root cause of this issue as far as I can see, it > only triggers above loop to get entered. The system fails to boot with this commit, it does boot without this commit. Whether this commit exposed a latent issue or introduced a bug itself is academic. Thanks, Alex