RE: [PATCH 4/4] iommu: Add domain window handling functions

2013-01-31 Thread Sethi Varun-B16395


 -----Original Message-----
 From: Joerg Roedel [mailto:j...@8bytes.org]
 Sent: Thursday, January 31, 2013 3:14 AM
 To: Sethi Varun-B16395
 Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Joerg Roedel
 Subject: [PATCH 4/4] iommu: Add domain window handling functions
 
 Add the iommu_domain_wnd_enable() and iommu_domain_wnd_disable()
 functions to the IOMMU-API. These functions will be used to set up domains
 that are based on subwindows and not on paging.
 
 Signed-off-by: Joerg Roedel <j...@8bytes.org>
 ---
  drivers/iommu/iommu.c |   20 
  include/linux/iommu.h |   18 ++
  2 files changed, 38 insertions(+)
 
 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index ab9dafd..55ae3bf 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -852,6 +852,26 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
  }
  EXPORT_SYMBOL_GPL(iommu_unmap);
  
 +
 +int iommu_domain_wnd_enable(struct iommu_domain *domain, u32 window,
 +			    unsigned long offset, size_t size)
 +{
 +	if (unlikely(domain->ops->domain_wnd_enable == NULL))
 +		return -ENODEV;
 +
 +	return domain->ops->domain_wnd_enable(domain, window, offset, size);
 +}
 +EXPORT_SYMBOL_GPL(iommu_domain_wnd_enable);
 +
 +void iommu_domain_wnd_disable(struct iommu_domain *domain, u32 window)
 +{
 +	if (unlikely(domain->ops->domain_wnd_disable == NULL))
 +		return;
 +
 +	return domain->ops->domain_wnd_disable(domain, window);
 +}
 +EXPORT_SYMBOL_GPL(iommu_domain_wnd_disable);
 +
  static int __init iommu_init(void)
  {
   iommu_group_kset = kset_create_and_add(iommu_groups,
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index 26066f5..f01657e 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -90,6 +90,9 @@ struct iommu_ops {
  		   phys_addr_t paddr, size_t size, int prot);
  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
  		     size_t size);
 +	int (*domain_wnd_enable)(struct iommu_domain *domain, u32 window,
 +				 unsigned long offset, size_t size);
 +	void (*domain_wnd_disable)(struct iommu_domain *domain, u32 window);
  	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
  				    unsigned long iova);
  	int (*domain_has_cap)(struct iommu_domain *domain,
 @@ -123,6 +126,9 @@ extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
  		     phys_addr_t paddr, size_t size, int prot);
  extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
  		       size_t size);
 +extern int iommu_domain_wnd_enable(struct iommu_domain *domain, u32 window,
 +				   unsigned long offset, size_t size);
 +extern void iommu_domain_wnd_disable(struct iommu_domain *domain, u32 window);
  extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
  				      unsigned long iova);
  extern int iommu_domain_has_cap(struct iommu_domain *domain,
 @@ -240,6 +246,18 @@ static inline int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
  	return -ENODEV;
  }
  
 +static inline int iommu_domain_wnd_enable(struct iommu_domain *domain,
 +					  u32 window, unsigned long offset,
 +					  size_t size)
 +{
 +	return -ENODEV;
 +}
 +
 +static inline void iommu_domain_wnd_disable(struct iommu_domain *domain,
 +					    u32 window)
 +{
 +}
 +
  static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
  					     unsigned long iova)
  {
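The wrappers in the patch follow the usual IOMMU-API pattern. The core looks up a per-driver hook in domain->ops and falls back when the driver does not provide one. The following userspace sketch mimics that dispatch so it can be compiled and exercised outside the kernel. mock_iommu_ops, mock_domain, mock_wnd_enable(), mock_wnd_disable() and fake_wnd_enable() are hypothetical stand-ins, and the kernel-only unlikely() hint is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

struct mock_domain;

/* Hypothetical stand-in for struct iommu_ops, trimmed to the two new hooks. */
struct mock_iommu_ops {
	int (*domain_wnd_enable)(struct mock_domain *domain, u32 window,
				 unsigned long offset, size_t size);
	void (*domain_wnd_disable)(struct mock_domain *domain, u32 window);
};

/* Hypothetical stand-in for struct iommu_domain: only the ops pointer. */
struct mock_domain {
	const struct mock_iommu_ops *ops;
};

/* Mirrors iommu_domain_wnd_enable(): -ENODEV when the driver has no hook. */
static int mock_wnd_enable(struct mock_domain *domain, u32 window,
			   unsigned long offset, size_t size)
{
	if (domain->ops->domain_wnd_enable == NULL)
		return -ENODEV;
	return domain->ops->domain_wnd_enable(domain, window, offset, size);
}

/* Mirrors iommu_domain_wnd_disable(): silently does nothing without a hook. */
static void mock_wnd_disable(struct mock_domain *domain, u32 window)
{
	if (domain->ops->domain_wnd_disable == NULL)
		return;
	domain->ops->domain_wnd_disable(domain, window);
}

/* Hypothetical window-capable driver hook: records the last enabled window. */
static u32 last_enabled = UINT32_MAX;

static int fake_wnd_enable(struct mock_domain *domain, u32 window,
			   unsigned long offset, size_t size)
{
	(void)domain;
	(void)offset;
	(void)size;
	last_enabled = window;
	return 0;
}
```

If a domain's ops table leaves both hooks NULL, mock_wnd_enable() returns -ENODEV and mock_wnd_disable() simply returns. The sketch calls the disable hook without `return`. ISO C does not allow a return statement with an expression in a void function, although GCC accepts that form as an extension.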
We would need a corresponding physical address in the iommu_domain_wnd_enable()
call, because the subwindows can point to physically discontiguous locations.
Also, although we support partial mappings where the subwindow size is smaller
than geometry_size/max_sub_windows, the mapping would always start at the
subwindow base (the subwindow base address is aligned to
geometry_size/max_sub_windows). The user of the API would have to ensure that
the iova is aligned to the maximum subwindow size. So the offset is not relevant
in our case, as it would always be zero. The size should be u64 in order to
accommodate the window sizes supported by PAMU.
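Here is a small userspace sketch of the subwindow arithmetic described above. It assumes the aperture is split evenly into max_sub_windows subwindows and that every mapping starts at offset zero within its subwindow. The helper names subwin_size(), subwin_base() and subwin_request_ok() are hypothetical and are not part of the IOMMU API or the PAMU driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Size of one subwindow: the geometry aperture split evenly (assumption). */
static u64 subwin_size(u64 aperture_size, u32 max_sub_windows)
{
	assert(max_sub_windows > 0);
	return aperture_size / max_sub_windows;
}

/* IOVA at which subwindow 'window' starts; every mapping begins here (offset 0). */
static u64 subwin_base(u64 aperture_start, u64 aperture_size,
		       u32 max_sub_windows, u32 window)
{
	return aperture_start +
	       (u64)window * subwin_size(aperture_size, max_sub_windows);
}

/* A full or partial mapping is acceptable if it fits inside one subwindow. */
static bool subwin_request_ok(u64 aperture_size, u32 max_sub_windows,
			      u32 window, u64 size)
{
	return window < max_sub_windows && size != 0 &&
	       size <= subwin_size(aperture_size, max_sub_windows);
}
```

Take a 4 GiB aperture with 16 subwindows. Subwindow 3 starts at 0x30000000, and a 4 KiB partial mapping there is accepted. A single whole-aperture window (max_sub_windows of 1) would need a size of 1ULL << 32, which a 32-bit size_t cannot hold. That is one way to read the request for a u64 size.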


int iommu_domain_wnd_enable(struct iommu_domain *domain, u32 window,
			    phys_addr_t paddr, u64 size);

We need a mechanism to determine the maximum number of subwindows supported by 
PAMU. How about representing it in the iommu_domain structure:
struct iommu_domain {
	struct iommu_ops *ops;
	void *priv;
	iommu_fault_handler_t handler;
	void *handler_token;
	struct iommu_domain_geometry geometry;
	u32 max_sub_windows;	/* maximum number of subwindows supported by the hardware */
};
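Suppose the proposed max_sub_windows field sits next to the existing geometry, whose aperture_end is the last mappable address. A caller could then derive the subwindow index for an IOVA and reject any IOVA that is not at a subwindow base. The mock structs and iova_to_subwin() below are hypothetical. They mirror only the fields this check needs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical mirror of the aperture fields of struct iommu_domain_geometry. */
struct mock_geometry {
	u64 aperture_start;
	u64 aperture_end;	/* last mappable IOVA (inclusive) */
};

/* Hypothetical mirror of struct iommu_domain with the proposed field. */
struct mock_domain {
	struct mock_geometry geometry;
	u32 max_sub_windows;
};

/*
 * Return the subwindow index for an IOVA that sits exactly on a subwindow
 * base, or -1 if the IOVA is outside the aperture or misaligned.
 * Assumes the aperture does not cover the whole 64-bit space.
 */
static int64_t iova_to_subwin(const struct mock_domain *d, u64 iova)
{
	const struct mock_geometry *g = &d->geometry;
	u64 span, win_size;

	assert(d->max_sub_windows > 0);
	span = g->aperture_end - g->aperture_start + 1;
	win_size = span / d->max_sub_windows;
	assert(win_size > 0);

	if (iova < g->aperture_start || iova > g->aperture_end)
		return -1;
	if ((iova - g->aperture_start) % win_size != 0)
		return -1;	/* the API user must align the IOVA to a subwindow base */
	return (int64_t)((iova - g->aperture_start) / win_size);
}
```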

Also, we would need to set the number 

IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

2013-01-31 Thread Shuah Khan
Joerg,

I am seeing IO_PAGE_FAULTs on an AMD system running releases prior to 3.7.
I focused my debugging and testing on 3.4, and I am hoping to find a solution
for this problem in 3.4. I don't see any IO_PAGE_FAULTs with 3.7 and
later releases on this system.

On this system, the BIOS specifies unity-mapped (direct-mapped) exclusion
ranges in IVMDs for several devices. These regions are in use during the
BIOS hand-off to the kernel and continue to be used during kernel boot and
at run time.

Access to these ranges continues to work without errors until the AMD IOMMU
driver disables and re-enables the IOMMU in enable_iommus(). The faults
don't persist: they appear only between the enable_iommus() call and the
point where amd_iommu_init() prints the "AMD-Vi: Lazy IO/TLB flushing
enabled" message.

Read requests from device 02:00.2 and a write request from device 03:00.0
to these unity-mapped regions fail. The cause appears to be that the
domain ID is 0.

The domain gets assigned and the unity maps are handled in
amd_iommu_init_dma_ops(). I don't see enable_iommus() doing anything to these
unity-mapped exclusion ranges, so I am assuming that is not the issue. However,
could the domain IDs get flushed? More to the point, why do these faults show
up in this window at all? These ranges are direct mapped, so no translation
should be needed.

Please see below for the IVMD dump and the IO_PAGE_FAULT analysis.

Dump of these ranges from dmesg:

[5.322280] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: 000f range_end: 0010 flags: 7
[5.322367] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bff7 range_end: bfff flags: 7
[5.322454] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: 000e8000 range_end: 000e9000 flags: 7
[5.322540] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdffe000 range_end: be00 flags: 7
[5.322627] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdff9000 range_end: bdffd000 flags: 7
[5.322714] AMD-Vi: IVMD_TYPE_ALL devid_start: 00:00.0 devid_end: 04:00.3 range_start: bdfe9000 range_end: bdff9000 flags: 7
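The sketch below cross-checks the faulting addresses in the analysis against these IVMD entries. It treats each range as half-open, [range_start, range_end). That is an assumption, although the adjacent entries bdfe9000-bdff9000 and bdff9000-bdffd000 fit that reading. It includes only the three entries whose bounds are intact in the dump. The others have truncated values, and that includes the bdffe000 entry that would cover 0xbdffe000. unity_range and in_unity_range() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One unity-mapped IVMD range, treated as half-open: [start, end). */
struct unity_range {
	uint64_t start;
	uint64_t end;
};

/* Only the IVMD entries whose bounds are fully legible in the dump. */
static const struct unity_range ivmd_ranges[] = {
	{ 0x000e8000, 0x000e9000 },
	{ 0xbdff9000, 0xbdffd000 },
	{ 0xbdfe9000, 0xbdff9000 },
};

/* True if addr falls inside any of the listed unity-mapped ranges. */
static bool in_unity_range(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(ivmd_ranges) / sizeof(ivmd_ranges[0]); i++) {
		if (addr >= ivmd_ranges[i].start && addr < ivmd_ranges[i].end)
			return true;
	}
	return false;
}
```

The fault addresses 0xbdff9080, 0xbdff9100 and 0xbdff9160 all fall in the bdff9000-bdffd000 entry. That is consistent with the report that the faults hit unity-mapped regions.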


Now to IO_PAGE_FAULT analysis: My observations in 

[   15.281594] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdffe000 flags=0x0050]
[   15.281861] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdff9080 flags=0x0050]
[   15.281990] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.2 domain=0x address=0xbdff9100 flags=0x0050]

Domain ID is zero - PASID not valid
flags=0x0050 - Bits PE and PR are set in the event.
TR: translation TR=0
  TR is 0, which means it is a transaction request (not a translation request).
RZ: reserved bit RZ=0
  RZ is meaningful since PR is set; RZ=0 means the I/O page fault is not due
  to a reserved bit or an invalid level encoding.
PE: permission indicator PE=1
  The device doesn't have permission for this transaction.
RW: read-write RW=0
  RW is meaningful since PR=1, TR=0, and I=0. It is a read transaction.
PR: present PR=1
  PR=1 means the transaction is to a page marked present.
I: interrupt I=0
  The transaction is a memory request.
US: user-supervisor US=0
  Supervisor privileges were asserted.
NX: no execute NX=0
  NX is 0: the upstream transaction lacks a PASID TLP prefix. Domain ID is zero.
GN: guest/nested GN=0
  The transaction used a guest physical address (GPA), not a guest virtual address.
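The flag decoding above can be automated. This sketch assumes the flags-field layout of the IO_PAGE_FAULT event in the AMD IOMMU specification: GN at bit 0 up through TR at bit 8 of the 12-bit field. Please check it against the spec. With that layout, 0x0050 decodes to PE and PR, and 0x0070 decodes to PE, RW and PR, which matches the analysis. The EV_* names and format_fault_flags() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions inside the 12-bit flags field of an IO_PAGE_FAULT event. */
#define EV_GN 0x001u
#define EV_NX 0x002u
#define EV_US 0x004u
#define EV_I  0x008u
#define EV_PR 0x010u
#define EV_RW 0x020u
#define EV_PE 0x040u
#define EV_RZ 0x080u
#define EV_TR 0x100u

static const struct {
	unsigned int bit;
	const char *name;
} fault_flag_names[] = {
	{ EV_TR, "TR" }, { EV_RZ, "RZ" }, { EV_PE, "PE" },
	{ EV_RW, "RW" }, { EV_PR, "PR" }, { EV_I, "I" },
	{ EV_US, "US" }, { EV_NX, "NX" }, { EV_GN, "GN" },
};

/* Write the names of the set flags, space separated, into buf (len > 0). */
static void format_fault_flags(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(fault_flag_names) / sizeof(fault_flag_names[0]); i++) {
		int n;

		if (!(flags & fault_flag_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", fault_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated: stop rather than overrun */
		used += (size_t)n;
	}
}
```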

[   15.281733] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x address=0xbdff9160 flags=0x0070]

Domain ID is zero - PASID is not valid
flags=0x0070 - Bits PE, RW, and PR are set in the event.
TR: translation TR=0
  TR is 0, which means it is a transaction request (not a translation request).
RZ: reserved bit RZ=0
  RZ is meaningful since PR is set; RZ=0 means the I/O page fault is not due
  to a reserved bit or an invalid level encoding.
PE: permission indicator PE=1
  The device doesn't have permission for this transaction.
RW: read-write RW=1
  RW is meaningful since PR=1, TR=0, and I=0. It is a write transaction.
PR: present PR=1
  PR=1 means the transaction is to a page marked present.
I: interrupt I=0
  The transaction is a memory request.
US: user-supervisor US=0
  Supervisor privileges were asserted.
NX: no execute NX=0
  NX is 0: the upstream transaction lacks a PASID TLP prefix. Domain ID is zero.
GN: guest/nested GN=0
  The transaction used a guest physical address (GPA), not a guest virtual address.
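This sketch shows where the device, domain, flags and address fields in the log lines come from in a raw 4-dword event log entry. The layout is an assumption taken from the AMD IOMMU specification and from how the 3.x driver prints events, so please verify it before relying on it:

- device ID in bits 15:0 of dword 0;
- domain ID in bits 15:0, flags in bits 27:16 and event code in bits 31:28 of dword 1;
- address in dwords 2 and 3;
- event code 0x2 for IO_PAGE_FAULT.

fault_event, decode_event() and format_bdf() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EV_TYPE_IO_PAGE_FAULT 0x2u	/* assumed event code */

struct fault_event {
	unsigned int type;	/* event code, dword 1 bits 31:28 */
	unsigned int devid;	/* requester ID, dword 0 bits 15:0 */
	unsigned int domid;	/* domain ID when GN=0, dword 1 bits 15:0 */
	unsigned int flags;	/* dword 1 bits 27:16 */
	uint64_t address;	/* dword 3 (high) : dword 2 (low) */
};

static struct fault_event decode_event(const uint32_t ev[4])
{
	struct fault_event e;

	e.type = (ev[1] >> 28) & 0xfu;
	e.devid = ev[0] & 0xffffu;
	e.domid = ev[1] & 0xffffu;
	e.flags = (ev[1] >> 16) & 0xfffu;
	e.address = ((uint64_t)ev[3] << 32) | ev[2];
	return e;
}

/* Format a 16-bit requester ID as bus:device.function, e.g. "02:00.2". */
static void format_bdf(unsigned int devid, char *buf, size_t len)
{
	snprintf(buf, len, "%02x:%02x.%x",
		 (devid >> 8) & 0xffu, (devid >> 3) & 0x1fu, devid & 0x7u);
}
```

A hand-built entry with device 0x0202, domain 0, flags 0x050 and address 0xbdffe000 decodes to the same fields as the first logged fault above. Note that a zero domain ID comes straight from the low 16 bits of dword 1.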

Thanks,
-- Shuah

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu