x86 page_fault not succeeding when mapping write-only

2020-11-16 Thread James Sewart
Hey,

I’m looking into an issue after remapping some memory on 4.19, though from 
reading the code this may also affect master.

I have a driver that grabs some pages using alloc_pages; these pages are 
then remapped to userspace with calls to vm_insert_page inside a sysfs 
bin_attribute mmap handler. Userspace calls mmap64 on the sysfs file with 
MAP_SHARED and read/write permissions. Reading or writing the mapping at 
this point works fine. The issue occurs if the process calls mprotect 
with write-only permissions before reading or writing the mapping: on the 
first write the page fault handler doesn’t set up the page tables, and 
the process spins entering and exiting the fault handler forever.
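
A minimal userspace reproducer looks roughly like this (the sysfs path is 
hypothetical and error handling is omitted):

#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
	/* hypothetical path to the driver's bin_attribute */
	int fd = open("/sys/devices/my_dev/my_attr", O_RDWR);
	size_t len = 4096;

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);

	/* touching the mapping here would work fine; instead, drop
	 * read permission before the first access ... */
	mprotect(p, len, PROT_WRITE);

	p[0] = 1;	/* ... and this write now faults forever */
	return 0;
}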

I tracked the return of page_fault down to this section of 
do_page_mkwrite:

	ret = vmf->vma->vm_ops->page_mkwrite(vmf);
	/* Restore original flags so that caller is not surprised */
	vmf->flags = old_flags;
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
		return ret;
	if (unlikely(!(ret & VM_FAULT_LOCKED))) {
		lock_page(page);
		if (!page->mapping) {
			unlock_page(page);
			return 0; /* retry */  <-- we return here
		}

A 0 return here means wp_page_shared will return before setting up the pte.
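
For context, the caller side in wp_page_shared looks roughly like this 
(paraphrased from 4.19 mm/memory.c), so the 0 is returned straight to the 
fault handler and the pte is never made writable:

	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
		vm_fault_t tmp;

		pte_unmap_unlock(vmf->pte, vmf->ptl);
		tmp = do_page_mkwrite(vmf);
		if (unlikely(!tmp || (tmp &
				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
			put_page(vmf->page);
			return tmp;	/* 0 -> fault is simply retried */
		}
		...
	}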

This is a snippet of the call stack:

do_page_mkwrite at mm/memory.c:2404
wp_page_shared at mm/memory.c:2696
do_wp_page at mm/memory.c:2797
handle_pte_fault at mm/memory.c:4063
__handle_mm_fault at mm/memory.c:4171


We hit the (marked unlikely) condition in do_wp_page where the vma is 
VM_WRITE and VM_SHARED only, which is why I think I only see the issue 
when calling mprotect with write-only. Thinking about it now, I haven’t 
tried calling mmap with write-only to see what happens.

I think the issue is that this vma has vm_ops associated with the kernfs 
ops, but as the page was allocated outside of the filesystem it doesn’t 
have the kernfs address_space_operations (or any page->mapping) associated 
with it. kernfs_vma_page_mkwrite returns 0, indicating it didn’t lock the 
page, but in that case do_page_mkwrite requires the page to have a 
mapping.
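
For reference, kernfs_vma_page_mkwrite is roughly the following 
(paraphrased from fs/kernfs/file.c): when the underlying vm_ops has no 
page_mkwrite it returns 0, i.e. neither an error nor VM_FAULT_LOCKED:

static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
{
	struct file *file = vmf->vma->vm_file;
	struct kernfs_open_file *of = kernfs_of(file);
	vm_fault_t ret;

	if (!of->vm_ops)
		return VM_FAULT_SIGBUS;

	if (!kernfs_get_active(of->kn))
		return VM_FAULT_SIGBUS;

	ret = 0;
	if (of->vm_ops->page_mkwrite)
		ret = of->vm_ops->page_mkwrite(vmf);
	else
		file_update_time(file);

	kernfs_put_active(of->kn);
	return ret;
}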

I’m not sure what the solution is; I can’t figure out how to associate 
the page with kernfs so that this condition is satisfied. What is this 
check for?

Should kernfs_vma_page_mkwrite lock the page? Or maybe it should set 
page->mapping?

Is there something I can do in my driver to the pages or vma to avoid 
hitting this issue? I looked through some other kernel code and it seems 
to me that use of the vmalloc API or dma-iommu may hit the same issue.
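
One thing I may try in the driver, sketched below on the assumption that 
returning VM_FAULT_LOCKED with the page actually locked is enough to 
bypass the !page->mapping check (untested): supply our own page_mkwrite 
via the vm_ops our mmap handler installs, which kernfs wraps and whose 
return value it propagates.

/* Hypothetical driver-side sketch, untested. */
static vm_fault_t my_page_mkwrite(struct vm_fault *vmf)
{
	/* Our pages come straight from alloc_pages() and never go away
	 * under us, so just lock and report the page as ready. */
	lock_page(vmf->page);
	return VM_FAULT_LOCKED;
}

static const struct vm_operations_struct my_vm_ops = {
	.page_mkwrite	= my_page_mkwrite,
};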

Cheers,
James.

Re: [PATCH v2 0/7] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-22 Thread James Sewart
Hey Lu,

> On 20 Mar 2019, at 01:26, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/19/19 9:35 PM, James Sewart wrote:
>> Hey Lu,
>>> On 15 Mar 2019, at 03:13, Lu Baolu  wrote:
>>> 
>>> Hi James,
>>> 
>>> On 3/14/19 7:56 PM, James Sewart wrote:
>>>> Patches 1 and 2 are the same as v1.
>>>> v1-v2:
>>>>   Refactored ISA direct mappings to be returned by iommu_get_resv_regions.
>>>>   Integrated patch by Lu to defer turning on DMAR until iommu.c has mapped
>>>> reserved regions.
>>>>   Integrated patches by Lu to remove more unused code in cleanup.
>>>> Lu: I didn't integrate your patch to set the default domain type as it
>>>> isn't directly related to the aim of this patchset. Instead patch 4
>>> 
>>> Without those patches, user experience will be affected and some devices
>>> will not work on Intel platforms anymore.
>>> 
>>> For a long time, Intel IOMMU driver has its own logic to determine
>>> whether a device requires an identity domain. For example, when user
>>> specifies "iommu=pt" in kernel parameter, all devices will be attached
>>> with the identity domain. Furthermore, some quirky devices require
>>> an identity domain to be used before enabling DMA remapping, otherwise,
>>> it will not work. This was done by adding quirk bits in Intel IOMMU
>>> driver.
>>> 
>>> So from my point of view, one way is porting all those quirks and kernel
>>> parameters into IOMMU generic layer, or opening a door for vendor IOMMU
>>> driver to determine the default domain type by their own. I prefer the
>>> latter option since it will not impact any behaviors on other
>>> architectures.
>> I see your point. I’m not confident that using the proposed door to set a
>> group's default domain has the desired behaviour. As discussed before, the
>> default domain type will be set based on the desired type for only the
>> first device attached to a group. I think to change the default domain
>> type you would need a slightly different door that wasn’t conditioned on
>> device.
> 
> I think this is another problem. Just a summary for the ease of
> discussion. We saw two problems:
> 
> 1. When allocating a new group for a device, how should we determine the
> type of the default domain? This is what my proposal patches trying to
> address.

This will work as expected only if all devices within a group get the same 
result from is_identity_map. Otherwise we see issue 2.

> 
> 2. If we need to put a device into an existing group which uses a
> different type of domain from what the device desires to use, we might
> break the functionality of the device. For this problem I'd second your
> proposal below if I get your point correctly.
> 
>> For situations where individual devices require an identity domain because
>> of quirks, maybe calling is_identity_map per device in
>> iommu_group_get_for_dev is a better solution than the one I proposed.
> 
> Do you mean if we see a quirky device requires a different domain type
> other than the default domain type of the group, we will assign a new
> group to it? That looks good to me as far as I can see. I suppose this
> should be done in vt-d's ops callback.

I have thought about this a bit, and I think the cleanest approach is to 
put devices that require an identity domain into their own group. This can 
be done in the device_group callback, avoiding any situation where we have 
to create a new group based on domain type in iommu_group_get_for_dev. 
That way we should never end up with multiple different domain types per 
group, and your patches will work as expected. See below for a possible 
implementation.

> 
> Best regards,
> Lu Baolu

Cheers,
James.

Devices that require an identity map because of quirks or other reasons
should be put in their own IOMMU group so as to not end up with multiple
different domains per group.

Signed-off-by: James Sewart 

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3cb8b36abf50..0f5a121d31a0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5421,6 +5421,22 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
device *dev)
 }
 #endif /* CONFIG_INTEL_IOMMU_SVM */

+static struct iommu_group *intel_identity_group;
+
+struct iommu_group *intel_iommu_pci_device_group(struct device *dev)
+{
+   if (iommu_no_mapping(dev)) {
+   if (!intel_identity_group) {
+   intel_identity_group = iommu_group_alloc();
+   if (IS_ERR(intel_identity_group))
+   re

Re: [PATCH 3/4] iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated

2019-03-09 Thread James Sewart
Hey Dmitry,

> On 8 Mar 2019, at 01:20, Dmitry Safonov  wrote:
> 
> On 3/4/19 3:46 PM, James Sewart wrote:
>> +static inline int domain_is_initialised(struct dmar_domain *domain)
>> +{
>> +return domain->flags & DOMAIN_FLAG_INITIALISED;
>> +}
> 
> Maybe check it in intel_iommu_domain_free(), eh?

Good shout; freeing a non-attached IOMMU_DOMAIN_DMA domain is now 
different from an attached one.

> 
> Thanks,
>  Dmitry


Cheers,
James.


Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-09 Thread James Sewart
Hey Lu,

> On 9 Mar 2019, at 01:53, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/9/19 12:57 AM, James Sewart wrote:
>> Hey Lu,
>>> On 8 Mar 2019, at 03:09, Lu Baolu  wrote:
>>>>> 
>>>>> Do you mind if I work on top of your patches for further cleanups and
>>>>> sign off a v2 together with you?
>>>> Sure, sounds good. I’ll fixup patch 3 and have a go at integrating
>>>> iommu_prepare_isa into get_resv_regions. This should make the initial
>>>> domain logic here quite concise.
>>> Here attached three extra patches which I think should be added before
>>> PATCH 3/4, and some further cleanup changes which you can merge with
>>> PATCH 4/4.
>>> 
>>> 
>>> 
>>> 0006-iommu-Add-ops-entry-for-vendor-specific-default-doma.patch
>>> 0007-iommu-vt-d-Add-is_identity_map-ops-entry.patch
>>> 
>>> These two patches aim to add a generic method for vendor specific iommu
>>> drivers to specify the type of the default domain for a device. Intel
>>> iommu driver will register an ops for this since it already has its own
>>> identity map adjudicator for a long time.
>> This seems like a good idea, but as domain alloc is only called for the
>> default domain on first device attached to a group, we may miss checking
>> whether a device added later should have an identity domain. Should there
>> be paths to downgrade a group's domain if one of the devices requires one?
> 
> Good catch!
> 
> This is supposed to be handled in iommu_no_mapping(). But, obviously
> current code sticks to lazy domain allocation. I'm not sure whether
> there are any real such cases, but we should handle it in a clean way.
> My idea is that we could downgrade to multiple domains per group (just
> like what we have now) in this case and print a kernel message for this.

I think if a device requires an identity domain, then it should ignore 
attempts to attach it to anything else. A print to warn the user about 
this would be a good idea.

I figure during attach: if iommu_no_mapping() is true, attach to si_domain 
and print a warning; otherwise continue with the given domain.
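
Something like the following in the attach path, assuming iommu_no_mapping 
and domain_add_dev_info are usable at that point (a sketch, untested):

	/* Quirky devices keep the identity map regardless of the
	 * domain they are asked to attach to; warn so users know. */
	if (iommu_no_mapping(dev)) {
		dev_warn(dev, "requires identity map, ignoring requested domain\n");
		return domain_add_dev_info(si_domain, dev);
	}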

> 
> Or any other thoughts?
> 
> Best regards,
> Lu Baolu

Cheers,
James.

Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-08 Thread James Sewart
Hey Lu,

> On 8 Mar 2019, at 03:09, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/7/19 6:21 PM, James Sewart wrote:
>> Hey Lu,
>>> On 7 Mar 2019, at 06:31, Lu Baolu  wrote:
>>> 
>>> Hi James,
>>> 
>>> On 3/7/19 2:08 AM, James Sewart wrote:
>>>>>>>> -  /*
>>>>>>>> -   * For each rmrr
>>>>>>>> -   *   for each dev attached to rmrr
>>>>>>>> -   *   do
>>>>>>>> -   * locate drhd for dev, alloc domain for dev
>>>>>>>> -   * allocate free domain
>>>>>>>> -   * allocate page table entries for rmrr
>>>>>>>> -   * if context not allocated for bus
>>>>>>>> -   *   allocate and init context
>>>>>>>> -   *   set present in root table for this bus
>>>>>>>> -   * init context with domain, translation etc
>>>>>>>> -   *endfor
>>>>>>>> -   * endfor
>>>>>>>> -   */
>>>>>>>> -  pr_info("Setting RMRR:\n");
>>>>>>>> -  for_each_rmrr_units(rmrr) {
>>>>>>>> -  /* some BIOS lists non-exist devices in DMAR table. */
>>>>>>>> -  for_each_active_dev_scope(rmrr->devices, 
>>>>>>>> rmrr->devices_cnt,
>>>>>>>> -i, dev) {
>>>>>>>> -  ret = iommu_prepare_rmrr_dev(rmrr, dev);
>>>>>>>> -  if (ret)
>>>>>>>> -  pr_err("Mapping reserved region 
>>>>>>>> failed\n");
>>>>>>>> -  }
>>>>>>>> -  }
>>>>>>>> -
>>>>>>>> -  iommu_prepare_isa();
>>>>>>> Why do you want to remove this segment of code?
>>>>>> This will only work if the lazy allocation of domains exists, these
>>>>>> mappings will disappear once a default domain is attached to a device and
>>>>>> then remade by iommu_group_create_direct_mappings. This code is redundant
>>>>>> and removing it allows us to remove all the lazy allocation logic.
>>>>> Not exactly.
>>>>> 
>>>>> We need to setup the rmrr mapping before enabling dma remapping,
>>>>> otherwise, there will be a window after dma remapping enabling and
>>>>> rmrr getting mapped, during which people might see dma faults.
>>>> Do you think this patch instead should be a refactoring to simplify this 
>>>> initial domain setup before the default domain takes over? It seems like 
>>>> duplicated effort to have both lazy allocated domains and externally 
>>>> managed domains. We could allocate a domain here for any devices with RMRR 
>>>> and call get_resv_regions to avoid duplicating RMRR loop.
>>> 
>>> Agree. We should replace the lazy allocated domains with default domain
>>> in a clean way. Actually, your patches look great to me. But I do think
>>> we need further cleanups. The rmrr code is one example, and the identity
>>> domain (si_domain) is another.
>>> 
>>> Do you mind if I work on top of your patches for further cleanups and
>>> sign off a v2 together with you?
>> Sure, sounds good. I’ll fixup patch 3 and have a go at integrating
>> iommu_prepare_isa into get_resv_regions. This should make the initial
>> domain logic here quite concise.
> 
> Here attached three extra patches which I think should be added before
> PATCH 3/4, and some further cleanup changes which you can merge with
> PATCH 4/4.
> 
> 
> 
> 0006-iommu-Add-ops-entry-for-vendor-specific-default-doma.patch
> 0007-iommu-vt-d-Add-is_identity_map-ops-entry.patch
> 
> These two patches aim to add a generic method for vendor specific iommu
> drivers to specify the type of the default domain for a device. Intel
> iommu driver will register an ops for this since it already has its own
> identity map adjudicator for a long time.

This seems like a good idea, but as domain alloc is only called for the 
default domain on first device attached to a group, we may miss checking 
whether a device added later should have an identity domain. Should there 
be paths to downgrade a group's domain if one of the device

Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-07 Thread James Sewart
Hey Lu,

> On 7 Mar 2019, at 06:31, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/7/19 2:08 AM, James Sewart wrote:
>>>>>> -/*
>>>>>> - * For each rmrr
>>>>>> - *   for each dev attached to rmrr
>>>>>> - *   do
>>>>>> - * locate drhd for dev, alloc domain for dev
>>>>>> - * allocate free domain
>>>>>> - * allocate page table entries for rmrr
>>>>>> - * if context not allocated for bus
>>>>>> - *   allocate and init context
>>>>>> - *   set present in root table for this bus
>>>>>> - * init context with domain, translation etc
>>>>>> - *endfor
>>>>>> - * endfor
>>>>>> - */
>>>>>> -pr_info("Setting RMRR:\n");
>>>>>> -for_each_rmrr_units(rmrr) {
>>>>>> -/* some BIOS lists non-exist devices in DMAR table. */
>>>>>> -for_each_active_dev_scope(rmrr->devices, 
>>>>>> rmrr->devices_cnt,
>>>>>> -  i, dev) {
>>>>>> -ret = iommu_prepare_rmrr_dev(rmrr, dev);
>>>>>> -if (ret)
>>>>>> -pr_err("Mapping reserved region 
>>>>>> failed\n");
>>>>>> -}
>>>>>> -}
>>>>>> -
>>>>>> -iommu_prepare_isa();
>>>>> Why do you want to remove this segment of code?
>>>> This will only work if the lazy allocation of domains exists, these
>>>> mappings will disappear once a default domain is attached to a device and
>>>> then remade by iommu_group_create_direct_mappings. This code is redundant
>>>> and removing it allows us to remove all the lazy allocation logic.
>>> Not exactly.
>>> 
>>> We need to setup the rmrr mapping before enabling dma remapping,
>>> otherwise, there will be a window after dma remapping enabling and
>>> rmrr getting mapped, during which people might see dma faults.
>> Do you think this patch instead should be a refactoring to simplify this 
>> initial domain setup before the default domain takes over? It seems like 
>> duplicated effort to have both lazy allocated domains and externally managed 
>> domains. We could allocate a domain here for any devices with RMRR and call 
>> get_resv_regions to avoid duplicating RMRR loop.
> 
> Agree. We should replace the lazy allocated domains with default domain
> in a clean way. Actually, your patches look great to me. But I do think
> we need further cleanups. The rmrr code is one example, and the identity
> domain (si_domain) is another.
> 
> Do you mind if I work on top of your patches for further cleanups and
> sign off a v2 together with you?

Sure, sounds good. I’ll fixup patch 3 and have a go at integrating 
iommu_prepare_isa into get_resv_regions. This should make the initial 
domain logic here quite concise.

> 
> Best regards,
> Lu Baolu

Cheers,
James.

Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-06 Thread James Sewart
Hey,

> On 6 Mar 2019, at 07:00, Lu Baolu  wrote:
> 
> Hi,
> 
> On 3/5/19 7:46 PM, James Sewart wrote:
>> Hey Lu,
>>> On 5 Mar 2019, at 06:59, Lu Baolu  wrote:
>>> 
>>> Hi,
>>> 
>>> It's hard for me to understand why we remove the rmrr related
>>> code in this patch.
>> The RMRR code removed here requires the lazy allocation of domains to
>> exist, as it is run before iommu.c would assign IOMMU groups and attach a
>> domain. Before patch 3, removing this code would mean the RMRR regions are
>> never mapped for a domain: iommu.c will allocate a default domain for the
>> group that a device is about to be put in, it will attach the domain to
>> the device, then for each region returned by get_resv_regions it will
>> create an identity map, this is where the RMRRs are setup for the default
>> domain.
>>> 
>>> And, now we have two places to hold a domain for a device: group and
>>> dev->info. We can consider to optimize the use of per device iommu
>>> arch data. This should be later work anyway.
>>> 
>>> More comments inline.
>>> 
>>> On 3/4/19 11:47 PM, James Sewart wrote:
>>>> The generic IOMMU code will allocate and attach a dma ops domain to each
>>>> device that comes online, replacing any lazy allocated domain. Removes
>>>> RMRR application at iommu init time as we won't have a domain attached
>>>> to any device. iommu.c will do this after attaching a device using
>>>> create_direct_mappings.
>>>> Signed-off-by: James Sewart 
>>>> ---
>>>>  drivers/iommu/intel-iommu.c | 202 ++--
>>>>  1 file changed, 8 insertions(+), 194 deletions(-)
>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>> index 71cd6bbfec05..282257e2628d 100644
>>>> --- a/drivers/iommu/intel-iommu.c
>>>> +++ b/drivers/iommu/intel-iommu.c
>>>> @@ -2595,118 +2595,6 @@ static struct dmar_domain 
>>>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>>>return domain;
>>>>  }
>>>>  -static int get_last_alias(struct pci_dev *pdev, u16 alias, void *opaque)
>>>> -{
>>>> -  *(u16 *)opaque = alias;
>>>> -  return 0;
>>>> -}
>>>> -
>>>> -static struct dmar_domain *find_or_alloc_domain(struct device *dev, int 
>>>> gaw)
>>>> -{
>>>> -  struct device_domain_info *info = NULL;
>>>> -  struct dmar_domain *domain = NULL;
>>>> -  struct intel_iommu *iommu;
>>>> -  u16 dma_alias;
>>>> -  unsigned long flags;
>>>> -  u8 bus, devfn;
>>>> -
>>>> -  iommu = device_to_iommu(dev, &bus, &devfn);
>>>> -  if (!iommu)
>>>> -  return NULL;
>>>> -
>>>> -  if (dev_is_pci(dev)) {
>>>> -  struct pci_dev *pdev = to_pci_dev(dev);
>>>> -
>>>> -  pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
>>>> -
>>>> -  spin_lock_irqsave(&device_domain_lock, flags);
>>>> -  info = dmar_search_domain_by_dev_info(pci_domain_nr(pdev->bus),
>>>> -PCI_BUS_NUM(dma_alias),
>>>> -dma_alias & 0xff);
>>>> -  if (info) {
>>>> -  iommu = info->iommu;
>>>> -  domain = info->domain;
>>>> -  }
>>>> -  spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> -
>>>> -  /* DMA alias already has a domain, use it */
>>>> -  if (info)
>>>> -  goto out;
>>>> -  }
>>>> -
>>>> -  /* Allocate and initialize new domain for the device */
>>>> -  domain = alloc_domain(0);
>>>> -  if (!domain)
>>>> -  return NULL;
>>>> -  if (domain_init(domain, iommu, gaw)) {
>>>> -  domain_exit(domain);
>>>> -  return NULL;
>>>> -  }
>>>> -
>>>> -out:
>>>> -
>>>> -  return domain;
>>>> -}
>>>> -
>>>> -static struct dmar_domain *set_domain_for_dev(struct device *dev,
>>>> -struct dmar_domain *domain)
>>>> -{
>>>> -  struct intel_iommu *iommu;
>>>> -  struct dmar_domain *tmp;

Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-05 Thread James Sewart
Hey Lu,

> On 5 Mar 2019, at 06:59, Lu Baolu  wrote:
> 
> Hi,
> 
> It's hard for me to understand why we remove the rmrr related
> code in this patch.

The RMRR code removed here requires the lazy allocation of domains to 
exist, as it is run before iommu.c would assign IOMMU groups and attach a 
domain. Before patch 3, removing this code would mean the RMRR regions are 
never mapped for a domain. iommu.c will allocate a default domain for the 
group that a device is about to be put in, attach the domain to the 
device, and then create an identity map for each region returned by 
get_resv_regions; this is where the RMRRs are set up for the default 
domain.

> 
> And, now we have two places to hold a domain for a device: group and
> dev->info. We can consider to optimize the use of per device iommu
> arch data. This should be later work anyway.
> 
> More comments inline.
> 
> On 3/4/19 11:47 PM, James Sewart wrote:
>> The generic IOMMU code will allocate and attach a dma ops domain to each
>> device that comes online, replacing any lazy allocated domain. Removes
>> RMRR application at iommu init time as we won't have a domain attached
>> to any device. iommu.c will do this after attaching a device using
>> create_direct_mappings.
>> Signed-off-by: James Sewart 
>> ---
>>  drivers/iommu/intel-iommu.c | 202 ++--
>>  1 file changed, 8 insertions(+), 194 deletions(-)
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 71cd6bbfec05..282257e2628d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2595,118 +2595,6 @@ static struct dmar_domain 
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>  return domain;
>>  }
>>  -static int get_last_alias(struct pci_dev *pdev, u16 alias, void *opaque)
>> -{
>> -*(u16 *)opaque = alias;
>> -return 0;
>> -}
>> -
>> -static struct dmar_domain *find_or_alloc_domain(struct device *dev, int gaw)
>> -{
>> -struct device_domain_info *info = NULL;
>> -struct dmar_domain *domain = NULL;
>> -struct intel_iommu *iommu;
>> -u16 dma_alias;
>> -unsigned long flags;
>> -u8 bus, devfn;
>> -
>> -iommu = device_to_iommu(dev, &bus, &devfn);
>> -if (!iommu)
>> -return NULL;
>> -
>> -if (dev_is_pci(dev)) {
>> -struct pci_dev *pdev = to_pci_dev(dev);
>> -
>> -pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
>> -
>> -spin_lock_irqsave(&device_domain_lock, flags);
>> -info = dmar_search_domain_by_dev_info(pci_domain_nr(pdev->bus),
>> -  PCI_BUS_NUM(dma_alias),
>> -  dma_alias & 0xff);
>> -if (info) {
>> -iommu = info->iommu;
>> -domain = info->domain;
>> -}
>> -spin_unlock_irqrestore(&device_domain_lock, flags);
>> -
>> -/* DMA alias already has a domain, use it */
>> -if (info)
>> -goto out;
>> -}
>> -
>> -/* Allocate and initialize new domain for the device */
>> -domain = alloc_domain(0);
>> -if (!domain)
>> -return NULL;
>> -if (domain_init(domain, iommu, gaw)) {
>> -domain_exit(domain);
>> -return NULL;
>> -}
>> -
>> -out:
>> -
>> -return domain;
>> -}
>> -
>> -static struct dmar_domain *set_domain_for_dev(struct device *dev,
>> -  struct dmar_domain *domain)
>> -{
>> -struct intel_iommu *iommu;
>> -struct dmar_domain *tmp;
>> -u16 req_id, dma_alias;
>> -u8 bus, devfn;
>> -
>> -iommu = device_to_iommu(dev, &bus, &devfn);
>> -if (!iommu)
>> -return NULL;
>> -
>> -req_id = ((u16)bus << 8) | devfn;
>> -
>> -if (dev_is_pci(dev)) {
>> -struct pci_dev *pdev = to_pci_dev(dev);
>> -
>> -pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
>> -
>> -/* register PCI DMA alias device */
>> -if (req_id != dma_alias) {
>> -tmp = dmar_insert_one_dev_info(iommu, 
>> PCI_BUS_NUM(dma_alias),
>> -dma_alias & 0xff, NULL, domain);
>> -
>> -if (!tmp || tmp != domain)
>> -   

Re: [PATCH 3/4] iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated

2019-03-05 Thread James Sewart
Hey Lu,

> On 5 Mar 2019, at 06:46, Lu Baolu  wrote:
> 
> Hi,
> 
> On 3/4/19 11:46 PM, James Sewart wrote:
>> Allowing IOMMU_DOMAIN_DMA type IOMMU domain to be allocated allows the
>> default_domain of an iommu_group to be set. This delegates device-domain
>> relationships to the generic IOMMU code.
>> Signed-off-by: James Sewart 
>> ---
>>  drivers/iommu/intel-iommu.c | 113 +++-
>>  1 file changed, 84 insertions(+), 29 deletions(-)
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 8e0a4e2ff77f..71cd6bbfec05 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -309,6 +309,14 @@ static int hw_pass_through = 1;
>>  /* si_domain contains mulitple devices */
>>  #define DOMAIN_FLAG_STATIC_IDENTITY (1 << 1)
>>  +/* Domain managed externally, don't cleanup if it isn't attached
>> + * to any devices. */
>> +#define DOMAIN_FLAG_MANAGED_EXTERNALLY  (1 << 2)
>> +
>> +/* Set after domain initialisation. Used when allocating dma domains to
>> + * defer domain initialisation until it is attached to a device */
>> +#define DOMAIN_FLAG_INITIALISED (1 << 4)
> 
> Why do you skip bit 3?

This was an oversight; I will update it to use bit 3.

> 
>> +
>>  #define for_each_domain_iommu(idx, domain)  \
>>  for (idx = 0; idx < g_num_of_iommus; idx++) \
>>  if (domain->iommu_refcnt[idx])
>> @@ -558,6 +566,16 @@ static inline int domain_type_is_vm_or_si(struct 
>> dmar_domain *domain)
>>  DOMAIN_FLAG_STATIC_IDENTITY);
>>  }
>>  +static inline int domain_managed_externally(struct dmar_domain *domain)
>> +{
>> +return domain->flags & DOMAIN_FLAG_MANAGED_EXTERNALLY;
>> +}
>> +
>> +static inline int domain_is_initialised(struct dmar_domain *domain)
>> +{
>> +return domain->flags & DOMAIN_FLAG_INITIALISED;
>> +}
>> +
>>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>> unsigned long pfn)
>>  {
>> @@ -1662,7 +1680,7 @@ static void disable_dmar_iommu(struct intel_iommu 
>> *iommu)
>>  __dmar_remove_one_dev_info(info);
>>  -   if (!domain_type_is_vm_or_si(domain)) {
>> +if (!domain_managed_externally(domain)) {
>>  /*
>>   * The domain_exit() function  can't be called under
>>   * device_domain_lock, as it takes this lock itself.
>> @@ -1895,6 +1913,7 @@ static int domain_init(struct dmar_domain *domain, 
>> struct intel_iommu *iommu,
>>  domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
>>  if (!domain->pgd)
>>  return -ENOMEM;
>> +domain->flags |= DOMAIN_FLAG_INITIALISED;
>>  __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
>>  return 0;
>>  }
>> @@ -2807,14 +2826,10 @@ static inline void iommu_prepare_isa(void)
>>static int md_domain_init(struct dmar_domain *domain, int guest_width);
>>  -static int __init si_domain_init(int hw)
>> +static int __init si_domain_init(struct dmar_domain *si_domain, int hw)
>>  {
>>  int nid, ret = 0;
>>  -   si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
>> -if (!si_domain)
>> -return -EFAULT;
>> -
>>  if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
>>  domain_exit(si_domain);
>>  return -EFAULT;
>> @@ -3417,9 +3432,16 @@ static int __init init_dmars(void)
>>  check_tylersburg_isoch();
>>  if (iommu_identity_mapping) {
>> -ret = si_domain_init(hw_pass_through);
>> -if (ret)
>> +si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
> 
> Where will this si_domain be used? We still need to keep the global
> si_domain?

I am unsure of the best thing to do here. The si_domain can be initialised 
as a hardware passthrough, which means it doesn’t have any mappings 
applied. I think any user allocating a domain here will always want a 
software passthrough domain. I’m not sure if/how to consolidate these two.

> 
>> +if (!si_domain) {
>> +ret = -EFAULT;
>>  goto free_iommu;
>> +}
>> +ret = si_domain_init(si_domain, hw_pass_through);
>> +if (ret) {
>> +domain_exit(si_domain);
>> +goto free_iommu;

Re: [PATCH 0/4] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-05 Thread James Sewart
Hey Lu,

The motivation for this is buggy domain <-> IOMMU group relationship when 
using find_or_alloc_domain. From what I have read it should be the case 
that an IOMMU domain is shared by all devices in the same group, thus the 
same mappings. This is because the IOMMU group is determined by things 
like ACS settings on pci switches which determines whether devices can 
talk to each other.

In find_or_alloc_domain we only determine domain sharing based on devices 
returned by pci_for_each_dma_alias, whereas pci_device_group, which is 
used to determine which IOMMU group a device is in, performs many more 
checks (e.g. ACS settings of intermediary pci switches). This 
discontinuity means each device within an IOMMU group can end up with a 
different domain.

We see issues because it is supposed to be safe to assume that devices 
within a group share an address space, but if each device has its own 
domain then a mapping created for one device won’t be valid for the 
others, and the same address could even be mapped differently for each 
device. This could also cause issues with a device whose RMRRs are not 
applied to the domains of the other devices within its group: those 
regions could be remapped for the other devices, resulting in unexpected 
behaviour during peer-to-peer transfers.

Cheers,
James


> On 5 Mar 2019, at 06:05, Lu Baolu  wrote:
> 
> Hi James,
> 
> Very glad to see this. Thank you!
> 
> On 3/4/19 11:41 PM, James Sewart wrote:
>> Hey,
>> This patchset moves IOMMU_DOMAIN_DMA iommu domain management to iommu.c.
>> This avoids the use of find_or_alloc_domain whose domain assignment is
>> inconsistent with the iommu grouping as determined by pci_device_group.
> 
> Is this a bug fix or an improvement? What's the real issue will it cause
> if we go without this patch set?
> 
> Best regards,
> Lu Baolu
> 
>> Patch 3 permits domain type IOMMU_DOMAIN_DMA to be allocated via the
>> iommu_ops api, allowing the default_domain of an iommu group to be set in
>> iommu.c. This domain will be attached to every device that is brought up
>> with an iommu group, and the device's reserved regions will be mapped using
>> regions returned by get_resv_regions.
>> In intel_iommu_domain_alloc we don’t know the IOMMU this domain will be
>> associated with so we defer full initialisation until
>> intel_iommu_attach_device. Currently iommu.c:iommu_group_add_device will
>> try to map a devices reserved regions before attaching the domain which
>> would cause issue if the domain is not fully initialised. This is
>> addressed in patch 1 by moving the mapping to after attaching.
>> Patch 2 implements function apply_resv_region, used in
>> iommu_group_create_direct_mappings to mark the reserved regions as
>> non-mappable for the dma_map_ops API.
>> Patch 4 removes the domain lazy allocation logic. Before this patch the
>> lazy allocation logic would not be used as any domain allocated using
>> these paths would be replaced when attaching the group default domain.
>> Default domain allocation has been tested with and without this patch on
>> 4.19.
>> Cheers,
>> James.
>> James Sewart (4):
>>   iommu: Move iommu_group_create_direct_mappings to after device_attach
>>   iommu/vt-d: Implement apply_resv_region for reserving IOVA ranges
>>   iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be 
>> allocated
>>   iommu/vt-d: Remove lazy allocation of domains
>>  drivers/iommu/intel-iommu.c | 329 
>>  drivers/iommu/iommu.c   |   4 +-
>>  2 files changed, 108 insertions(+), 225 deletions(-)



[PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-04 Thread James Sewart
The generic IOMMU code will allocate and attach a dma ops domain to each
device that comes online, replacing any lazily allocated domain. Remove
RMRR application at iommu init time, as we won't have a domain attached
to any device at that point; iommu.c will do this after attaching a
device, using create_direct_mappings.

Signed-off-by: James Sewart 
---
 drivers/iommu/intel-iommu.c | 202 ++--
 1 file changed, 8 insertions(+), 194 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 71cd6bbfec05..282257e2628d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2595,118 +2595,6 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
return domain;
 }
 
-static int get_last_alias(struct pci_dev *pdev, u16 alias, void *opaque)
-{
-   *(u16 *)opaque = alias;
-   return 0;
-}
-
-static struct dmar_domain *find_or_alloc_domain(struct device *dev, int gaw)
-{
-   struct device_domain_info *info = NULL;
-   struct dmar_domain *domain = NULL;
-   struct intel_iommu *iommu;
-   u16 dma_alias;
-   unsigned long flags;
-   u8 bus, devfn;
-
-   iommu = device_to_iommu(dev, &bus, &devfn);
-   if (!iommu)
-   return NULL;
-
-   if (dev_is_pci(dev)) {
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   info = dmar_search_domain_by_dev_info(pci_domain_nr(pdev->bus),
- PCI_BUS_NUM(dma_alias),
- dma_alias & 0xff);
-   if (info) {
-   iommu = info->iommu;
-   domain = info->domain;
-   }
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   /* DMA alias already has a domain, use it */
-   if (info)
-   goto out;
-   }
-
-   /* Allocate and initialize new domain for the device */
-   domain = alloc_domain(0);
-   if (!domain)
-   return NULL;
-   if (domain_init(domain, iommu, gaw)) {
-   domain_exit(domain);
-   return NULL;
-   }
-
-out:
-
-   return domain;
-}
-
-static struct dmar_domain *set_domain_for_dev(struct device *dev,
- struct dmar_domain *domain)
-{
-   struct intel_iommu *iommu;
-   struct dmar_domain *tmp;
-   u16 req_id, dma_alias;
-   u8 bus, devfn;
-
-   iommu = device_to_iommu(dev, &bus, &devfn);
-   if (!iommu)
-   return NULL;
-
-   req_id = ((u16)bus << 8) | devfn;
-
-   if (dev_is_pci(dev)) {
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
-
-   /* register PCI DMA alias device */
-   if (req_id != dma_alias) {
-   tmp = dmar_insert_one_dev_info(iommu, 
PCI_BUS_NUM(dma_alias),
-   dma_alias & 0xff, NULL, domain);
-
-   if (!tmp || tmp != domain)
-   return tmp;
-   }
-   }
-
-   tmp = dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
-   if (!tmp || tmp != domain)
-   return tmp;
-
-   return domain;
-}
-
-static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
-{
-   struct dmar_domain *domain, *tmp;
-
-   domain = find_domain(dev);
-   if (domain)
-   goto out;
-
-   domain = find_or_alloc_domain(dev, gaw);
-   if (!domain)
-   goto out;
-
-   tmp = set_domain_for_dev(dev, domain);
-   if (!tmp || domain != tmp) {
-   domain_exit(domain);
-   domain = tmp;
-   }
-
-out:
-
-   return domain;
-}
-
 static int iommu_domain_identity_map(struct dmar_domain *domain,
 unsigned long long start,
 unsigned long long end)
@@ -2779,7 +2667,7 @@ static int iommu_prepare_identity_map(struct device *dev,
struct dmar_domain *domain;
int ret;
 
-   domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+   domain = find_domain(dev);
if (!domain)
return -ENOMEM;
 
@@ -3301,11 +3189,9 @@ static int copy_translation_tables(struct intel_iommu 
*iommu)
 static int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
-   struct dmar_rmrr_unit *rmrr;
bool copied_tables = false;
-   struct device *dev;
struct intel_iommu *iommu;
-   int i, ret;
+   int ret;
 
/*
 * for each drhd
@@ -3466,32 +3352,6 @@ static int __init init_dmars(void)

[PATCH 3/4] iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated

2019-03-04 Thread James Sewart
Allowing IOMMU_DOMAIN_DMA type IOMMU domain to be allocated allows the
default_domain of an iommu_group to be set. This delegates device-domain
relationships to the generic IOMMU code.

Signed-off-by: James Sewart 
---
 drivers/iommu/intel-iommu.c | 113 +++-
 1 file changed, 84 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8e0a4e2ff77f..71cd6bbfec05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -309,6 +309,14 @@ static int hw_pass_through = 1;
 /* si_domain contains mulitple devices */
 #define DOMAIN_FLAG_STATIC_IDENTITY(1 << 1)
 
+/* Domain managed externally, don't cleanup if it isn't attached
+ * to any devices. */
+#define DOMAIN_FLAG_MANAGED_EXTERNALLY (1 << 2)
+
+/* Set after domain initialisation. Used when allocating dma domains to
+ * defer domain initialisation until it is attached to a device */
+#define DOMAIN_FLAG_INITIALISED(1 << 4)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -558,6 +566,16 @@ static inline int domain_type_is_vm_or_si(struct 
dmar_domain *domain)
DOMAIN_FLAG_STATIC_IDENTITY);
 }
 
+static inline int domain_managed_externally(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_MANAGED_EXTERNALLY;
+}
+
+static inline int domain_is_initialised(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_INITIALISED;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
   unsigned long pfn)
 {
@@ -1662,7 +1680,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 
__dmar_remove_one_dev_info(info);
 
-   if (!domain_type_is_vm_or_si(domain)) {
+   if (!domain_managed_externally(domain)) {
/*
 * The domain_exit() function  can't be called under
 * device_domain_lock, as it takes this lock itself.
@@ -1895,6 +1913,7 @@ static int domain_init(struct dmar_domain *domain, struct 
intel_iommu *iommu,
domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
if (!domain->pgd)
return -ENOMEM;
+   domain->flags |= DOMAIN_FLAG_INITIALISED;
__iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
return 0;
 }
@@ -2807,14 +2826,10 @@ static inline void iommu_prepare_isa(void)
 
 static int md_domain_init(struct dmar_domain *domain, int guest_width);
 
-static int __init si_domain_init(int hw)
+static int __init si_domain_init(struct dmar_domain *si_domain, int hw)
 {
int nid, ret = 0;
 
-   si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
-   if (!si_domain)
-   return -EFAULT;
-
if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
domain_exit(si_domain);
return -EFAULT;
@@ -3417,9 +3432,16 @@ static int __init init_dmars(void)
check_tylersburg_isoch();
 
if (iommu_identity_mapping) {
-   ret = si_domain_init(hw_pass_through);
-   if (ret)
+   si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
+   if (!si_domain) {
+   ret = -EFAULT;
goto free_iommu;
+   }
+   ret = si_domain_init(si_domain, hw_pass_through);
+   if (ret) {
+   domain_exit(si_domain);
+   goto free_iommu;
+   }
}
 
 
@@ -4575,7 +4597,7 @@ static int device_notifier(struct notifier_block *nb,
return 0;
 
dmar_remove_one_dev_info(domain, dev);
-   if (!domain_type_is_vm_or_si(domain) && list_empty(&domain->devices))
+   if (!domain_managed_externally(domain) && list_empty(&domain->devices))
domain_exit(domain);
 
return 0;
@@ -5020,6 +5042,7 @@ static int md_domain_init(struct dmar_domain *domain, int 
guest_width)
domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
if (!domain->pgd)
return -ENOMEM;
+   domain->flags |= DOMAIN_FLAG_INITIALISED;
domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
return 0;
 }
@@ -5028,28 +5051,54 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
 {
struct dmar_domain *dmar_domain;
struct iommu_domain *domain;
+   int flags = DOMAIN_FLAG_MANAGED_EXTERNALLY;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
-   dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
-   if (!dmar_domain) {
-   pr_err("Can't allocate dmar_domain\n");
-  

[PATCH 2/4] iommu/vt-d: Implement apply_resv_region for reserving IOVA ranges

2019-03-04 Thread James Sewart
Used by iommu.c before creating identity mappings for reserved ranges to
ensure dma-map-ops won't ever remap these addresses.

Signed-off-by: James Sewart 
---
 drivers/iommu/intel-iommu.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 78188bf7e90d..8e0a4e2ff77f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5299,6 +5299,19 @@ static void intel_iommu_put_resv_regions(struct device 
*dev,
}
 }
 
+static void intel_iommu_apply_resv_region(struct device *dev,
+   struct iommu_domain *domain,
+   struct iommu_resv_region *region)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   unsigned long start, end;
+
+   start = IOVA_PFN(region->start);
+   end   = IOVA_PFN(region->start + region->length - 1);
+
+   WARN_ON_ONCE(reserve_iova(&dmar_domain->iovad, start, end) == NULL);
+}
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev 
*sdev)
 {
@@ -5392,6 +5405,7 @@ const struct iommu_ops intel_iommu_ops = {
.remove_device  = intel_iommu_remove_device,
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = intel_iommu_put_resv_regions,
+   .apply_resv_region  = intel_iommu_apply_resv_region,
.device_group   = pci_device_group,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
 };
-- 
2.17.1



[PATCH 1/4] iommu: Move iommu_group_create_direct_mappings to after device_attach

2019-03-04 Thread James Sewart
If an IOMMU driver requires to know which IOMMU a domain is associated
to initialise a domain then it will do so in device_attach, before which
it is not safe to call iommu_ops.

Signed-off-by: James Sewart 
---
 drivers/iommu/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ed4db334341..1c6ffbb2d20e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -652,8 +652,6 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
 
dev->iommu_group = group;
 
-   iommu_group_create_direct_mappings(group, dev);
-
mutex_lock(&group->mutex);
list_add_tail(&device->list, &group->devices);
if (group->domain)
@@ -662,6 +660,8 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
if (ret)
goto err_put_group;
 
+   iommu_group_create_direct_mappings(group, dev);
+
/* Notify any listeners about change to group. */
blocking_notifier_call_chain(&group->notifier,
 IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
-- 
2.17.1





[PATCH 0/4] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-04 Thread James Sewart
Hey,

This patchset moves IOMMU_DOMAIN_DMA iommu domain management to iommu.c. 
This avoids the use of find_or_alloc_domain whose domain assignment is 
inconsistent with the iommu grouping as determined by pci_device_group.

Patch 3 permits domain type IOMMU_DOMAIN_DMA to be allocated via the 
iommu_ops api, allowing the default_domain of an iommu group to be set in 
iommu.c. This domain will be attached to every device that is brought up 
with an iommu group, and the device's reserved regions will be mapped using 
regions returned by get_resv_regions.

In intel_iommu_domain_alloc we don’t know the IOMMU this domain will be 
associated with so we defer full initialisation until 
intel_iommu_attach_device. Currently iommu.c:iommu_group_add_device will 
try to map a devices reserved regions before attaching the domain which 
would cause issue if the domain is not fully initialised. This is 
addressed in patch 1 by moving the mapping to after attaching.

Patch 2 implements function apply_resv_region, used in 
iommu_group_create_direct_mappings to mark the reserved regions as 
non-mappable for the dma_map_ops API.

Patch 4 removes the domain lazy allocation logic. Before this patch the 
lazy allocation logic would not be used as any domain allocated using 
these paths would be replaced when attaching the group default domain. 
Default domain allocation has been tested with and without this patch on 
4.19.

Cheers,
James.

James Sewart (4):
  iommu: Move iommu_group_create_direct_mappings to after device_attach
  iommu/vt-d: Implement apply_resv_region for reserving IOVA ranges
  iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated
  iommu/vt-d: Remove lazy allocation of domains

 drivers/iommu/intel-iommu.c | 329 
 drivers/iommu/iommu.c   |   4 +-
 2 files changed, 108 insertions(+), 225 deletions(-)

-- 
2.17.1



Re: [RFC v2] iommu/vt-d: Allow iommu_domain_alloc to allocate IOMMU_DOMAIN_DMA

2019-01-15 Thread James Sewart
Hey Jacob,

> On 7 Jan 2019, at 20:04, Jacob Pan  wrote:
> 
> On Wed, 5 Dec 2018 17:19:35 +0000
> James Sewart  wrote:
> 
>> Hey,
>> 
>> There exists an issue in the logic used to determine domain
>> association with devices. Currently the driver uses
>> find_or_alloc_domain to either reuse an existing domain or allocate a
>> new one if one isn’t found. Domains should be shared between all
>> members of an IOMMU group as this is the minimum granularity that we
>> can guarantee address space isolation.
>> 
>> The intel IOMMU driver exposes pci_device_group in intel_iommu_ops as
>> the function to call to determine the group of a device, this is
>> implemented in the generic IOMMU code and checks: dma aliases,
>> upstream pcie switch ACS, pci aliases, and pci function aliases. The
>> find_or_alloc_domain code currently only uses dma aliases to
>> determine if a domain is shared. This causes a disconnect between
>> IOMMU groups and domains. We have observed devices under a pcie
>> switch each having their own domain but assigned the same group.
>> 
>> One solution would be to fix the logic in find_or_alloc_domain to add 
>> checks for the other conditions that a device may share a domain.
>> However, this duplicates code which the generic IOMMU code
>> implements. Instead this issue can be fixed by allowing the
>> allocation of default_domain on the IOMMU group. This is not
>> currently supported as the intel driver does not allow allocation of
>> domain type IOMMU_DOMAIN_DMA.
>> 
>> Allowing allocation of DMA domains has the effect that the
>> default_domain is non NULL and is attached to a device when
>> initialising. This delegates the handling of domains to the generic
>> IOMMU code. Once this is implemented it is possible to remove the
>> lazy allocation of domains entirely.
>> 
> This can also consolidate the domain storage, i.e. move domain from
> device_domain_info to iommu_group.

The plan was to have a patchset that first adds the functionality below, 
making use of the group domain storage but keeping the lazy allocation, 
followed by subsequent patches that remove the lazy allocation. To also 
remove device_domain_info, it looks like some of the information there 
might need moving into the domain?

>> This patch implements DMA and identity domains to be allocated for 
>> external management. As it isn’t known which device will be attached
>> to a domain, the dma domain is not initialised at alloc time. Instead
>> it is allocated when attached. As we may lose RMRR mappings when
>> attaching a device to a new domain, we also ensure these are mapped
>> at attach time.
>> 
>> This will likely conflict with the work done for auxiliary domains by 
>> Baolu but the code to accommodate won’t change much.
>> 
>> I had also started on a patch to remove find_or_alloc_domain and
>> various functions that call it but had issues with edge cases such as 
>> iommu_prepare_isa that is doing domain operations at IOMMU init time.
>> 
>> Cheers,
>> James.
>> 
>> 
>> ---
>> drivers/iommu/intel-iommu.c | 159
>> +--- 1 file changed, 110
>> insertions(+), 49 deletions(-)
>> 
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 41a4b8808802..6437cb2e9b22 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -351,6 +351,14 @@ static int hw_pass_through = 1;
>> /* si_domain contains mulitple devices */
>> #define DOMAIN_FLAG_STATIC_IDENTITY  (1 << 1)
>> 
>> +/* Domain managed externally, don't cleanup if it isn't attached
>> + * to any devices. */
>> +#define DOMAIN_FLAG_NO_CLEANUP  (1 << 2)
>> +
> the name NO_CLEANUP is a little counter-intuitive to me; should it be
> called UNINITIALISED?

I don’t think uninitialised is accurate; the domain may be initialised. 
The flag is used to distinguish between domains allocated via the external 
API, which we shouldn’t automatically clean up, and domains allocated 
internally, which we should. I agree a better name could be found.

>> +/* Set after domain initialisation. Used when allocating dma domains
>> to
>> + * defer domain initialisation until it is attached to a device */
>> +#define DOMAIN_FLAG_INITIALISED (1 << 4)
>> +
>> #define for_each_domain_iommu(idx, domain)   \
>>  for (idx = 0; idx < g_num_of_iommus; idx++) \
>>  if (domain->iommu_refcnt[idx])
>> @@ -624,6 +632,16 @@ static inline int domain_type_is_vm_or_si(struct
>> dmar_doma

Re: [RFC v2] iommu/vt-d: Allow iommu_domain_alloc to allocate IOMMU_DOMAIN_DMA

2019-01-02 Thread James Sewart
Bump

> On 5 Dec 2018, at 17:19, James Sewart  wrote:
> 
> Hey,
> 
> There exists an issue in the logic used to determine domain association 
> with devices. Currently the driver uses find_or_alloc_domain to either 
> reuse an existing domain or allocate a new one if one isn’t found. Domains 
> should be shared between all members of an IOMMU group as this is the 
> minimum granularity that we can guarantee address space isolation.
> 
> The intel IOMMU driver exposes pci_device_group in intel_iommu_ops as the 
> function to call to determine the group of a device, this is implemented 
> in the generic IOMMU code and checks: dma aliases, upstream pcie switch 
> ACS, pci aliases, and pci function aliases. The find_or_alloc_domain code 
> currently only uses dma aliases to determine if a domain is shared. This 
> causes a disconnect between IOMMU groups and domains. We have observed 
> devices under a pcie switch each having their own domain but assigned the 
> same group.
> 
> One solution would be to fix the logic in find_or_alloc_domain to add 
> checks for the other conditions that a device may share a domain. However, 
> this duplicates code which the generic IOMMU code implements. Instead this 
> issue can be fixed by allowing the allocation of default_domain on the 
> IOMMU group. This is not currently supported as the intel driver does not 
> allow allocation of domain type IOMMU_DOMAIN_DMA.
> 
> Allowing allocation of DMA domains has the effect that the default_domain 
> is non NULL and is attached to a device when initialising. This delegates 
> the handling of domains to the generic IOMMU code. Once this is 
> implemented it is possible to remove the lazy allocation of domains 
> entirely.
> 
> This patch implements DMA and identity domains to be allocated for 
> external management. As it isn’t known which device will be attached to a 
> domain, the dma domain is not initialised at alloc time. Instead it is 
> allocated when attached. As we may lose RMRR mappings when attaching a 
> device to a new domain, we also ensure these are mapped at attach time.
> 
> This will likely conflict with the work done for auxiliary domains by 
> Baolu but the code to accommodate won’t change much.
> 
> I had also started on a patch to remove find_or_alloc_domain and various 
> functions that call it but had issues with edge cases such as 
> iommu_prepare_isa that is doing domain operations at IOMMU init time.
> 
> Cheers,
> James.
> 
> 
> ---
> drivers/iommu/intel-iommu.c | 159 +---
> 1 file changed, 110 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 41a4b8808802..6437cb2e9b22 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -351,6 +351,14 @@ static int hw_pass_through = 1;
> /* si_domain contains mulitple devices */
> #define DOMAIN_FLAG_STATIC_IDENTITY   (1 << 1)
> 
> +/* Domain managed externally, don't cleanup if it isn't attached
> + * to any devices. */
> +#define DOMAIN_FLAG_NO_CLEANUP   (1 << 2)
> +
> +/* Set after domain initialisation. Used when allocating dma domains to
> + * defer domain initialisation until it is attached to a device */
> +#define DOMAIN_FLAG_INITIALISED  (1 << 4)
> +
> #define for_each_domain_iommu(idx, domain)\
>   for (idx = 0; idx < g_num_of_iommus; idx++) \
>   if (domain->iommu_refcnt[idx])
> @@ -624,6 +632,16 @@ static inline int domain_type_is_vm_or_si(struct 
> dmar_domain *domain)
>   DOMAIN_FLAG_STATIC_IDENTITY);
> }
> 
> +static inline int domain_managed_externally(struct dmar_domain *domain)
> +{
> + return domain->flags & DOMAIN_FLAG_NO_CLEANUP;
> +}
> +
> +static inline int domain_is_initialised(struct dmar_domain *domain)
> +{
> + return domain->flags & DOMAIN_FLAG_INITIALISED;
> +}
> +
> static inline int domain_pfn_supported(struct dmar_domain *domain,
>  unsigned long pfn)
> {
> @@ -1717,7 +1735,7 @@ static void disable_dmar_iommu(struct intel_iommu 
> *iommu)
> 
>   __dmar_remove_one_dev_info(info);
> 
> - if (!domain_type_is_vm_or_si(domain)) {
> + if (!domain_managed_externally(domain)) {
>   /*
>* The domain_exit() function  can't be called under
>* device_domain_lock, as it takes this lock itself.
> @@ -1951,6 +1969,7 @@ static int domain_init(struct dmar_domain *domain, 
> struct intel_iommu *iommu,
>   domain->pgd = (struct dma_pte