Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Bjorn Helgaas
Hi Ashok,

On Thu, Dec 22, 2016 at 03:45:08PM -0800, Raj, Ashok wrote:
> Hi Bjorn
> 
> No one in the platform group says they know about this, so I'm fairly
> sure we don't do that on Intel hardware (x86).

I'm pretty sure there was once an x86 prototype for which PCI bus
addresses were not identical to CPU physical addresses, but I have no
idea whether it shipped that way.

Even if such a system never shipped, the x86 arch code supports _TRA,
and there's no reason for this code to assume that _TRA is always
zero.

If we didn't want to use pcibios_resource_to_bus() here for some
reason, we should at least add a comment about why we think it's OK to
use a CPU physical address as an IOVA.

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Raj, Ashok
Hi Bjorn

No one in the platform group says they know about this, so I'm fairly
sure we don't do that on Intel hardware (x86).

I'm not sure about the usage; it may have been a pre-virtualization
hack for some direct access (just wild guessing).

On Thu, Dec 22, 2016 at 03:32:38PM -0800, Raj, Ashok wrote:
> Let me check and keep you posted on whether we have such platforms, to
> make sure whether we need these considerations for _TRA.

Cheers,
Ashok



Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Raj, Ashok
Hi Bjorn

On Thu, Dec 22, 2016 at 02:28:03PM -0600, Bjorn Helgaas wrote:
> On Thu, Dec 22, 2016 at 05:27:14PM +0100, Joerg Roedel wrote:
> > Hi Bjorn,
> > 
> > On Mon, Dec 19, 2016 at 03:20:44PM -0600, Bjorn Helgaas wrote:
> > > I have some questions about dmar_init_reserved_ranges().  On systems
> > > where CPU physical address space is not identity-mapped to PCI bus
> > > address space, e.g., where the PCI host bridge windows have _TRA
> > > offsets, I'm not sure we're doing the right thing.
> > > 
> > > Assume we have a PCI host bridge with _TRA that maps CPU addresses
> > > 0x80000000-0x9fffffff to PCI bus addresses 0x00000000-0x1fffffff,
> > > with two PCI devices below it:

This is the first time I'm hearing about it too! I tracked it to 2002,
to one of Bjorn's patches from a past life :-)

> > > 
> > >   PCI host bridge domain [bus 00-3f]
> > >   PCI host bridge window [mem 0x80000000-0x9fffffff] (bus
> > > 0x00000000-0x1fffffff)
> > >   00:00.0: BAR 0 [mem 0x80000000-0x8fffffff] (0x00000000-0x0fffffff
> > > on bus)
> > >   00:01.0: BAR 0 [mem 0x90000000-0x9fffffff] (0x10000000-0x1fffffff
> > > on bus)
> > > 
> > > The IOMMU init code in dmar_init_reserved_ranges() reserves the PCI
> > > MMIO space for all devices:
> > > 
> > >   pci_iommu_init()
> > > intel_iommu_init()
> > >   dmar_init_reserved_ranges()
> > > reserve_iova(0x80000000-0x8fffffff)
> > > reserve_iova(0x90000000-0x9fffffff)
> > > 
> > > This looks odd because we're reserving CPU physical addresses, but
> > > the IOVA space contains *PCI bus* addresses.  On most x86 systems they
> > > would be the same, but not on all.
> > 
> > Interesting, I wasn't aware of that. Looks like we are not doing the
> > right thing in dmar_init_reserved_ranges(). How is that handled without
> > an IOMMU, when the bus-addresses overlap with ram addresses?

I'm not aware of any platforms that do _TRA. I'm checking internally
whether others have come across something like that.

> 
> I don't know enough about these systems to answer that.  One way would
> be to avoid overlaps, e.g., by using bus addresses
> 0x80000000-0x9fffffff and not putting RAM at those addresses.  Or
> maybe the host bridge could apply a constant offset to bus addresses
> before forwarding transactions up to the system bus.
> 
> > > Assume the driver for 00:00.0 maps a page of main memory for DMA.  It
> > > may receive a dma_addr_t of 0x10000000:
> > > 
> > >   00:00.0: intel_map_page() returns dma_addr_t 0x10000000
> > >   00:00.0: issues DMA to 0x10000000
> > > 
> > > What happens here?  The DMA access should go to main memory.  In
> > > conventional PCI it would be a peer-to-peer access to device 00:01.0.
> > > Is there enough PCIe smarts (ACS or something?) to do otherwise?
> > 
> > If there is a bridge doing ACS between the devices, the IOMMU will see
> > the request and re-map it to its RAM address.

True, if it's all ACS-enabled we don't need this; that's probably true
for legacy as well. But in the big scheme of things it doesn't cost
much to reserve anyway.

> > 
> > > The dmar_init_reserved_ranges() comment says "Reserve all PCI MMIO to
> > > avoid peer-to-peer access."  Without _TRA, CPU addresses and PCI bus
> > > addresses would be identical, and I think these reserve_iova() calls
> > > *would* prevent this situation.  So maybe we're just missing a
> > > pcibios_resource_to_bus() here?
> > 
> > I'll have a look, the AMD IOMMU driver implements this too, so it needs
> > also be fixed there. Do you know which x86 systems are configured like
> > this?
> 
Let me check and keep you posted on whether we have such platforms, to
make sure whether we need these considerations for _TRA.

Cheers,
Ashok


Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Bjorn Helgaas
On Thu, Dec 22, 2016 at 05:27:14PM +0100, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Mon, Dec 19, 2016 at 03:20:44PM -0600, Bjorn Helgaas wrote:
> > I have some questions about dmar_init_reserved_ranges().  On systems
> > where CPU physical address space is not identity-mapped to PCI bus
> > address space, e.g., where the PCI host bridge windows have _TRA
> > offsets, I'm not sure we're doing the right thing.
> > 
> > Assume we have a PCI host bridge with _TRA that maps CPU addresses
> > 0x80000000-0x9fffffff to PCI bus addresses 0x00000000-0x1fffffff,
> > with two PCI devices below it:
> > 
> >   PCI host bridge domain [bus 00-3f]
> >   PCI host bridge window [mem 0x80000000-0x9fffffff] (bus
> > 0x00000000-0x1fffffff)
> >   00:00.0: BAR 0 [mem 0x80000000-0x8fffffff] (0x00000000-0x0fffffff on bus)
> >   00:01.0: BAR 0 [mem 0x90000000-0x9fffffff] (0x10000000-0x1fffffff on bus)
> > 
> > The IOMMU init code in dmar_init_reserved_ranges() reserves the PCI
> > MMIO space for all devices:
> > 
> >   pci_iommu_init()
> > intel_iommu_init()
> >   dmar_init_reserved_ranges()
> > reserve_iova(0x80000000-0x8fffffff)
> > reserve_iova(0x90000000-0x9fffffff)
> > 
> > This looks odd because we're reserving CPU physical addresses, but
> > the IOVA space contains *PCI bus* addresses.  On most x86 systems they
> > would be the same, but not on all.
> 
> Interesting, I wasn't aware of that. Looks like we are not doing the
> right thing in dmar_init_reserved_ranges(). How is that handled without
> an IOMMU, when the bus-addresses overlap with ram addresses?

I don't know enough about these systems to answer that.  One way would
be to avoid overlaps, e.g., by using bus addresses
0x80000000-0x9fffffff and not putting RAM at those addresses.  Or
maybe the host bridge could apply a constant offset to bus addresses
before forwarding transactions up to the system bus.

> > Assume the driver for 00:00.0 maps a page of main memory for DMA.  It
> > may receive a dma_addr_t of 0x10000000:
> > 
> >   00:00.0: intel_map_page() returns dma_addr_t 0x10000000
> >   00:00.0: issues DMA to 0x10000000
> > 
> > What happens here?  The DMA access should go to main memory.  In
> > conventional PCI it would be a peer-to-peer access to device 00:01.0.
> > Is there enough PCIe smarts (ACS or something?) to do otherwise?
> 
> If there is a bridge doing ACS between the devices, the IOMMU will see
> the request and re-map it to its RAM address.
> 
> > The dmar_init_reserved_ranges() comment says "Reserve all PCI MMIO to
> > avoid peer-to-peer access."  Without _TRA, CPU addresses and PCI bus
> > addresses would be identical, and I think these reserve_iova() calls
> > *would* prevent this situation.  So maybe we're just missing a
> > pcibios_resource_to_bus() here?
> 
> I'll have a look, the AMD IOMMU driver implements this too, so it needs
> also be fixed there. Do you know which x86 systems are configured like
> this?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4873931cc8c
added this support, and I'm pretty sure it was tested, but I don't
know what machines it was for.  I know many large ia64 systems use
this _TRA support, but I don't have first-hand knowledge of x86
systems that do.

The untested patch below is what I was thinking for the Intel IOMMU
driver.

Bjorn


commit 529a6db0b0b2ff37a0cdb49d11eee4eb6f960a48
Author: Bjorn Helgaas 
Date:   Tue Dec 20 11:08:09 2016 -0600

iommu/vt-d: Reserve IOVA space for bus address, not CPU address

IOVA space contains bus addresses, not CPU addresses.  On many systems they
are identical, but PCI host bridges in some systems do apply an address
offset when forwarding CPU MMIO transactions to PCI.  In ACPI, this is
expressed as a _TRA offset in the window descriptor.

Convert the PCI resource CPU addresses to PCI bus addresses before
reserving them in the IOVA space.

Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c66c273..be78ab7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1865,6 +1865,7 @@ static struct lock_class_key reserved_rbtree_key;
 static int dmar_init_reserved_ranges(void)
 {
struct pci_dev *pdev = NULL;
+   struct pci_bus_region region;
struct iova *iova;
int i;
 
@@ -1890,9 +1891,11 @@ static int dmar_init_reserved_ranges(void)
	r = &pdev->resource[i];
	if (!r->flags || !(r->flags & IORESOURCE_MEM))
		continue;
+
+	pcibios_resource_to_bus(pdev->bus, &region, r);
	iova = reserve_iova(&reserved_iova_list,
-		IOVA_PFN(r->start),
-		IOVA_PFN(r->end));
+		IOVA_PFN(region.start),
+		IOVA_PFN(region.end));

Re: [RFC v4 15/16] vfio/type1: Check MSI remapping at irq domain level

2016-12-22 Thread Auger Eric
Hi Diana,

On 22/12/2016 13:41, Diana Madalina Craciun wrote:
> Hi Eric,
> 
> On 12/13/2016 10:32 PM, Eric Auger wrote:
>> In case the IOMMU does not bypass MSI transactions (typical
>> case on ARM), we check all MSI controllers are IRQ remapping
>> capable. If not, the IRQ assignment may be unsafe.
>>
>> At this stage the arm-smmu-(v3) still advertise the
>> IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be
>> removed in subsequent patches.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index d07fe73..a05648b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <linux/irqdomain.h>
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -765,7 +766,7 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>  struct vfio_domain *domain, *d;
>>  struct bus_type *bus = NULL;
>>  int ret;
>> -bool resv_msi;
>> +bool resv_msi, msi_remap;
>>  phys_addr_t resv_msi_base;
>>  
>>  mutex_lock(&iommu->lock);
>> @@ -818,8 +819,10 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>  INIT_LIST_HEAD(&domain->group_list);
>>  list_add(&group->next, &domain->group_list);
>>  
>> -if (!allow_unsafe_interrupts &&
>> -!iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +msi_remap = resv_msi ? irq_domain_check_msi_remap() :
>> +   iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
>> +
>> +if (!allow_unsafe_interrupts && !msi_remap) {
>>  pr_warn("%s: No interrupt remapping support.  Use the module 
>> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
>> platform\n",
>> __func__);
>>  ret = -EPERM;
> 
> I tested your v4.9-reserved-v4 branch on ITS-capable hardware (NXP
> LS2080), so I did not set allow_unsafe_interrupts. It fails here,
> complaining that there is no interrupt remapping support. The
> irq_domain_check_msi_remap function returns false as none of the checked
> domains has the IRQ_DOMAIN_FLAG_MSI_REMAP flag set. I think the reason
> is that the flags are not propagated through the domain hierarchy when
> the domain is created.

Hmm, OK. Please accept my apologies for the inconvenience, all the more
since this is the second time you have reported the same issue for a
different cause :-( At the moment I can't test on a GICv3 ITS based
system, but I will try to fix that.

I would still like confirmation that introducing this flag is the right
direction, though.

Thanks

Eric
> 
> Thanks,
> 
> Diana
> 
> 
> 


Re: [RFC v4 15/16] vfio/type1: Check MSI remapping at irq domain level

2016-12-22 Thread Diana Madalina Craciun
Hi Eric,

On 12/13/2016 10:32 PM, Eric Auger wrote:
> In case the IOMMU does not bypass MSI transactions (typical
> case on ARM), we check all MSI controllers are IRQ remapping
> capable. If not, the IRQ assignment may be unsafe.
>
> At this stage the arm-smmu-(v3) still advertise the
> IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be
> removed in subsequent patches.
>
> Signed-off-by: Eric Auger 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d07fe73..a05648b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/irqdomain.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -765,7 +766,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   struct vfio_domain *domain, *d;
>   struct bus_type *bus = NULL;
>   int ret;
> - bool resv_msi;
> + bool resv_msi, msi_remap;
>   phys_addr_t resv_msi_base;
>  
> 	mutex_lock(&iommu->lock);
> @@ -818,8 +819,10 @@ static int vfio_iommu_type1_attach_group(void 
> *iommu_data,
> 	INIT_LIST_HEAD(&domain->group_list);
> 	list_add(&group->next, &domain->group_list);
>  
> - if (!allow_unsafe_interrupts &&
> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> + msi_remap = resv_msi ? irq_domain_check_msi_remap() :
> +iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
> +
> + if (!allow_unsafe_interrupts && !msi_remap) {
>   pr_warn("%s: No interrupt remapping support.  Use the module 
> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
> platform\n",
>  __func__);
>   ret = -EPERM;

I tested your v4.9-reserved-v4 branch on ITS-capable hardware (NXP
LS2080), so I did not set allow_unsafe_interrupts. It fails here,
complaining that there is no interrupt remapping support. The
irq_domain_check_msi_remap function returns false as none of the checked
domains has the IRQ_DOMAIN_FLAG_MSI_REMAP flag set. I think the reason
is that the flags are not propagated through the domain hierarchy when
the domain is created.

Thanks,

Diana


