On 8/11/2014 12:43 AM, Alex Williamson wrote:
> On Mon, 2014-08-11 at 10:54 +0800, Yijing Wang wrote:
>> We found some strange devices in HP C7000 and Huawei servers. These devices
>> cannot be enumerated by the OS, but they still issue DMA reads and writes
>> outside of OS management. Because the IOMMU driver does not create DMA
>> mappings for these devices, their DMA is blocked by the IOMMU hardware.
>>
>> E.g.
>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>              +-01.0-[11]--
>>              +-01.1-[02]--
>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
>>              |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>              |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>              +-02.1-[12]--
>>
>> The kernel found only four devices on bus 0x04, but we saw the following DMA
>> errors in dmesg.
>>
>> [ 1438.477262] DRHD: handling fault status reg 402
>> [ 1438.498278] DMAR:[DMA Write] Request device [04:00.4] fault addr bdf70000 
>> [ 1438.498280] DMAR:[fault reason 02] Present bit in context entry is clear
>> [ 1438.566458] DMAR:[DMA Write] Request device [04:00.5] fault addr bdf70000 
>> [ 1438.566460] DMAR:[fault reason 02] Present bit in context entry is clear
>> [ 1438.635211] DMAR:[DMA Write] Request device [04:00.6] fault addr bdf70000 
>> [ 1438.635213] DMAR:[fault reason 02] Present bit in context entry is clear
>> [ 1438.703849] DMAR:[DMA Write] Request device [04:00.7] fault addr bdf70000 
>> [ 1438.703851] DMAR:[fault reason 02] Present bit in context entry is clear
>>
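
For anyone decoding the fault lines above: "[04:00.4]" is bus:device.function,
and the kernel packs device and function into a single 8-bit devfn value. The
sketch below is userspace C, not kernel code. It redefines the PCI_DEVFN,
PCI_SLOT and PCI_FUNC encoding locally, and parse_bdf() is a hypothetical
helper introduced only for illustration. It shows that the four faulting
requesters are devfn 4 through 7 on bus 0x04, next to the enumerated 00.0
through 00.3, and that a bus has 256 devfn values in total, which is the range
the patch's "for (i = 0; i < 256; i++)" loop covers.

```c
#include <assert.h>
#include <stdio.h>

/* Same bit layout as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros,
 * redefined here so the example builds in userspace. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical helper: parse a hex "bb:dd.f" string, as printed in the
 * DMAR fault lines, into a bus number and a packed devfn.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_bdf(const char *s, unsigned *bus, unsigned *devfn)
{
	unsigned b, d, f;

	assert(s && bus && devfn);

	if (sscanf(s, "%x:%x.%x", &b, &d, &f) != 3)
		return -1;
	/* 8-bit bus, 5-bit device, 3-bit function. */
	if (b > 0xff || d > 0x1f || f > 0x07)
		return -1;

	*bus = b;
	*devfn = PCI_DEVFN(d, f);
	return 0;
}
```

Calling parse_bdf("04:00.4", &bus, &devfn) yields bus 4 and devfn 4, and the
largest encodable value, PCI_DEVFN(0x1f, 7), is 255.
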
>> Signed-off-by: Yijing Wang <wangyij...@huawei.com>
>> ---
>>  arch/x86/include/asm/iommu.h |    2 ++
>>  arch/x86/kernel/pci-dma.c    |    8 ++++++++
>>  drivers/iommu/intel-iommu.c  |   41 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 51 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
>> index 345c99c..5e3a2d8 100644
>> --- a/arch/x86/include/asm/iommu.h
>> +++ b/arch/x86/include/asm/iommu.h
>> @@ -5,6 +5,8 @@ extern struct dma_map_ops nommu_dma_ops;
>>  extern int force_iommu, no_iommu;
>>  extern int iommu_detected;
>>  extern int iommu_pass_through;
>> +extern int iommu_pt_force_bus;
>> +extern int iommu_pt_force_domain;
>>  
>>  /* 10 seconds */
>>  #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index a25e202..bf21d97 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -44,6 +44,8 @@ int iommu_detected __read_mostly = 0;
>>   * guests and not for driver dma translation.
>>   */
>>  int iommu_pass_through __read_mostly;
>> +int iommu_pt_force_bus = -1;
>> +int iommu_pt_force_domain = -1;
>>  
>>  extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
>>  
>> @@ -146,6 +148,7 @@ void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
>>   */
>>  static __init int iommu_setup(char *p)
>>  {
>> +    char *end;
>>      iommu_merge = 1;
>>  
>>      if (!p)
>> @@ -192,6 +195,11 @@ static __init int iommu_setup(char *p)
>>  #endif
>>              if (!strncmp(p, "pt", 2))
>>                      iommu_pass_through = 1;
>> +            if (!strncmp(p, "pt_force=", 9)) {
>> +                    iommu_pass_through = 1;
>> +                    iommu_pt_force_domain = simple_strtol(p+9, &end, 0);
>> +                    iommu_pt_force_bus = simple_strtol(end+1, NULL, 0);
> 
> Documentation/kernel-parameters.txt?
> 
>> +            }
>>  
>>              gart_parse_options(p);
>>  
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index d1f5caa..49757f1 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2705,6 +2705,47 @@ static int __init iommu_prepare_static_identity_mapping(int hw)
>>                              return ret;
>>              }
>>  
>> +    /* We found some strange devices in HP c7000 and other platforms that
>> +     * cannot be enumerated by the OS, but still issue DMA reads and writes
>> +     * without driver management, so we create the pt mapping for these
>> +     * devices to avoid DMA errors. Add iommu=pt_force=segment:busnum to
>> +     * force pt context mapping for every devfn on that bus number.
>> +     */
> 
> So best case with this patch is that the user needs to discover that
> this option exists, figure out the undocumented parameters, be running
> on VT-d, permanently add a kernel commandline option, and never have any
> intention of assigning the device to userspace or a VM...
> 
> Can't we handle this with the DMA alias quirks that are now in 3.17?  Or
> can the vendor fix this with a firmware update?  This device behavior is
> really quite broken for this kind of server class product.  

Yeah, something doesn't sound right here.

I would like to hear more about this configuration, off list if you prefer.
What servers?  What firmware revisions?

Thanks,

-- ljk

> Thanks,
> 
> Alex
> 
>> +    if (iommu_pt_force_domain >= 0 && iommu_pt_force_bus >= 0) {
>> +            int found = 0;
>> +
>> +            iommu = NULL;
>> +            for_each_active_iommu(iommu, drhd) {
>> +                    if (iommu_pt_force_domain != drhd->segment)
>> +                            continue;
>> +
>> +                    for_each_active_dev_scope(drhd->devices, drhd->devices_cnt, i, dev) {
>> +                            if (!dev_is_pci(dev))
>> +                                    continue;
>> +
>> +                            pdev = to_pci_dev(dev);
>> +                            if (pdev->bus->number == iommu_pt_force_bus ||
>> +                                            (pdev->subordinate
>> +                                             && pdev->subordinate->number <= iommu_pt_force_bus
>> +                                             && pdev->subordinate->busn_res.end >= iommu_pt_force_bus)) {
>> +                                    found = 1;
>> +                                    break;
>> +                            }
>> +                    }
>> +
>> +                    if (drhd->include_all) {
>> +                            found = 1;
>> +                            break;
>> +                    }
>> +            }
>> +
>> +            if (found && iommu)
>> +                    for (i = 0; i < 256; i++)
>> +                            domain_context_mapping_one(si_domain, iommu, iommu_pt_force_bus,
>> +                                            i,  hw ? CONTEXT_TT_PASS_THROUGH :
>> +                                            CONTEXT_TT_MULTI_LEVEL);
>> +    }
>> +
>>      return 0;
>>  }
>>  
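
A note on the "pt_force=" hunk in iommu_setup(): as posted, it calls
simple_strtol() twice and reads from end+1 without checking that a ':' was
actually found or that either number is in range. The following is only a
userspace sketch of stricter parsing, using strtol() in place of the kernel's
simple_strtol(). The function name parse_pt_force() and the 16-bit segment and
8-bit bus limits are assumptions made for illustration, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the "pt_force=" parsing.
 * Expects "segment:bus", with each number in any base strtol() accepts
 * for base 0. Returns 0 and fills *segment and *bus on success,
 * -1 on malformed or out-of-range input. */
static int parse_pt_force(const char *arg, int *segment, int *bus)
{
	char *end;
	long seg, b;

	assert(arg && segment && bus);

	errno = 0;
	seg = strtol(arg, &end, 0);
	/* Require at least one digit followed by the ':' separator. */
	if (end == arg || *end != ':' || errno || seg < 0 || seg > 0xffff)
		return -1;

	arg = end + 1;
	errno = 0;
	b = strtol(arg, &end, 0);
	/* iommu= options are comma separated, so ',' may follow the bus. */
	if (end == arg || (*end != '\0' && *end != ',') ||
	    errno || b < 0 || b > 0xff)
		return -1;

	*segment = (int)seg;
	*bus = (int)b;
	return 0;
}
```

With this shape, "0:0x04" is accepted as segment 0, bus 4, while "0", ":4" and
"0:256" are rejected instead of silently producing a bogus bus number.
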
> 
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 
