Re: [RFC PATCH] iommu/vt-d: Fix IOMMU field not populated on device hot re-plug

2019-08-28 Thread Lu Baolu

Hi Janusz,

On 8/28/19 10:17 PM, Janusz Krzysztofik wrote:

We should avoid a kernel panic when intel_unmap() is called against
a non-existent domain.

Does that mean you suggest replacing
	BUG_ON(!domain);
with something like
	if (WARN_ON(!domain))
		return;
and to not care about orphaned mappings left allocated?  Is there a way to inform
users that their active DMA mappings are no longer valid and they shouldn't
call dma_unmap_*()?
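
(For illustration, a minimal sketch of what that replacement could look like
in intel_unmap(); the find_domain() lookup is paraphrased from the driver of
that era, not quoted from this thread:)

	static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
	{
		struct dmar_domain *domain = find_domain(dev);

		/* Device was hot-unplugged and its domain already torn down:
		 * the mapping is orphaned, but crashing the box helps nobody. */
		if (WARN_ON(!domain))
			return;

		/* ... existing IOVA lookup and page-table teardown ... */
	}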


But we shouldn't expect the IOMMU driver to skip cleaning up the domain
info when a device remove notification comes and to wait until all file
descriptors have been closed, right?

Shouldn't then the IOMMU driver take care of cleaning up resources still
allocated on device remove before it invalidates and forgets their pointers?



You are right. We need to wait until all allocated resources (IOVAs and
mappings) are released.

How about registering a callback for BUS_NOTIFY_UNBOUND_DRIVER, and
removing the domain info when the driver detachment completes?
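
(A rough sketch of that suggestion, assuming it lives inside the vt-d driver,
where dmar_remove_one_dev_info() is defined; the handler name and placement
are illustrative:)

	static int dev_unbound_notify(struct notifier_block *nb,
				      unsigned long action, void *data)
	{
		struct device *dev = data;

		/* Fires once driver detachment from the device has completed,
		 * rather than at device-remove time. */
		if (action == BUS_NOTIFY_UNBOUND_DRIVER)
			dmar_remove_one_dev_info(dev);

		return NOTIFY_OK;
	}

	static struct notifier_block dev_unbound_nb = {
		.notifier_call = dev_unbound_notify,
	};

	/* e.g. during IOMMU initialization: */
	bus_register_notifier(&pci_bus_type, &dev_unbound_nb);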


Thanks,
Janusz


Best regards,
Baolu


Re: /proc/vmcore and wrong PAGE_OFFSET

2019-08-28 Thread Donald Buczek

Dear Bhupesh,

On 28.08.19 21:54, Bhupesh Sharma wrote:

Hi Donald,

On Wed, Aug 28, 2019 at 8:38 PM Donald Buczek  wrote:


On 8/20/19 11:21 PM, Donald Buczek wrote:

Dear Linux folks,

I'm investigating a problem where the crash utility fails to work with our
crash dumps:

  buczek@kreios:/mnt$ crash vmlinux crash.vmcore
  crash 7.2.6
  Copyright (C) 2002-2019  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 

  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...
  crash: read error: kernel virtual address: 89807ff77000  type: "memory section root table"

The crash file is a copy of /dev/vmcore taken by a crashkernel after a 
sysctl-forced panic.

It looks to me that 0x89807ff77000 is not readable because the virtual
addresses stored in the ELF header of the dump file are off by
0x0080:

  buczek@kreios:/mnt$ readelf -a crash.vmcore | grep LOAD | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
LOAD   0xd000 0x8100 0x01007d00 (feff0400)
LOAD   0x01c33000 0x88001000 0x1000 (8800)
LOAD   0x01cc1000 0x8809 0x0009 (8800)
LOAD   0x01cd1000 0x8810 0x0010 (8800)
LOAD   0x01cd2070 0x88100070 0x00100070 (8800)
LOAD   0x19bd2000 0x88003800 0x3800 (8800)
LOAD   0x4e6a1000 0x88006000 0x6000 (8800)
LOAD   0x4e6a2000 0x8801 0x0001 (8800)
LOAD   0x001fcda22000 0x88208000 0x00208000 (8800)
LOAD   0x003fcd9a2000 0x88408000 0x00408000 (8800)
LOAD   0x005fcd922000 0x88608000 0x00608000 (8800)
LOAD   0x007fcd8a2000 0x88808000 0x00808000 (8800)
LOAD   0x009fcd822000 0x88a08000 0x00a08000 (8800)
LOAD   0x00bfcd7a2000 0x88c08000 0x00c08000 (8800)
LOAD   0x00dfcd722000 0x88e08000 0x00e08000 (8800)
LOAD   0x00fc4d722000 0x88fe 0x00fe (8800)

(Columns are File offset, Virtual Address, Physical Address and computed offset.)

I would expect the offset between the virtual and the physical address to be 
PAGE_OFFSET, which is 0x888 on x86_64, not 0x8800. 
Unlike /proc/vmcore, /proc/kcore shows the same physical memory (of the last 
memory section above) with a correct offset:

  buczek@kreios:/mnt$ sudo readelf -a /proc/kcore | grep 0x00fe | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
LOAD   0x097e4000 0x897e 0x00fe (8880)

The failing address 0x89807ff77000 happens to be at the end of the last 
memory section. It is the mem_section array, which crash wants to load and 
which is visible in the running system:

  buczek@kreios:/mnt$ sudo gdb vmlinux /proc/kcore
  [...]
  (gdb) print mem_section
  $1 = (struct mem_section **) 0x89807ff77000
  (gdb) print *mem_section
  $2 = (struct mem_section *) 0x88a07f37b000
  (gdb) print **mem_section
  $3 = {section_mem_map = 18446719884453740551, pageblock_flags = 0x88a07f36f040}

I can read the same information from the crash dump if I account for the
0x0080 error:

  buczek@kreios:/mnt$ gdb vmlinux crash.vmcore
  [...]
  (gdb) print mem_section
  $1 = (struct m

Re: [GIT PULL] iommu/arm-smmu: Big batch of updates for 5.4

2019-08-28 Thread Will Deacon
Hi Joerg,

On Fri, Aug 23, 2019 at 03:54:40PM +0100, Will Deacon wrote:
> Please pull these ARM SMMU updates for 5.4. The branch is based on the
> for-joerg/batched-unmap branch that you pulled into iommu/core already
> because I didn't want to rebase everything onto -rc3. The pull request
> was generated against iommu/core.

Just a gentle nudge on this pull request, since it would be nice to have
it sit in -next for a bit before the merge window opens.

Please let me know if you need anything more from me.

Cheers,

Will


[PATCH] iommu/amd: silence warnings under memory pressure

2019-08-28 Thread Qian Cai
When running heavy memory pressure workloads, the system is throwing
endless warnings,

smartpqi :23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages: 5 reason: -12)
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4
Call Trace:
 
 dump_stack+0x62/0x9a
 warn_alloc.cold.43+0x8a/0x148
 __alloc_pages_nodemask+0x1a5c/0x1bb0
 get_zeroed_page+0x16/0x20
 iommu_map_page+0x477/0x540
 map_sg+0x1ce/0x2f0
 scsi_dma_map+0xc6/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
 do_IRQ+0x81/0x170
 common_interrupt+0xf/0xf
 

because the allocation can fail in iommu_map_page(), and the volume of
these calls can be huge, which may generate a lot of serial console
output and consume all CPUs.

Fix it by silencing the warning at this call site; there is still a
dev_err() later to report the failure.

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..19eef1edf8ed 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2547,7 +2547,9 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 
 			bus_addr  = address + s->dma_address + (j << PAGE_SHIFT);
 			phys_addr = (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT);
-			ret = iommu_map_page(domain, bus_addr, phys_addr, PAGE_SIZE, prot, GFP_ATOMIC);
+			ret = iommu_map_page(domain, bus_addr, phys_addr,
+					     PAGE_SIZE, prot,
+					     GFP_ATOMIC | __GFP_NOWARN);
if (ret)
goto out_unmap;
 
-- 
1.8.3.1
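
(For context, a generic illustration of the flag combination: __GFP_NOWARN
only suppresses the allocation-failure splat, so the caller still has to
handle the failure itself, which map_sg() does via its out_unmap error path:)

	/* GFP_ATOMIC: map_sg() may run in IRQ context, so no sleeping.
	 * __GFP_NOWARN: failure is expected under memory pressure; skip the
	 * warn_alloc() backtrace and let the caller report the error once. */
	unsigned long page = get_zeroed_page(GFP_ATOMIC | __GFP_NOWARN);
	if (!page)
		return -ENOMEM;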



Re: /proc/vmcore and wrong PAGE_OFFSET

2019-08-28 Thread Bhupesh Sharma
Hi Donald,

On Wed, Aug 28, 2019 at 8:38 PM Donald Buczek  wrote:
>
> On 8/20/19 11:21 PM, Donald Buczek wrote:
> > Dear Linux folks,
> >
> > I'm investigating a problem where the crash utility fails to work with our 
> > crash dumps:
> >
> >  buczek@kreios:/mnt$ crash vmlinux crash.vmcore
> >  crash 7.2.6
> >  Copyright (C) 2002-2019  Red Hat, Inc.
> >  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >  Copyright (C) 1999-2006  Hewlett-Packard Co
> >  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >  Copyright (C) 2005, 2011  NEC Corporation
> >  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >  This program is free software, covered by the GNU General Public 
> > License,
> >  and you are welcome to change it and/or distribute copies of it under
> >  certain conditions.  Enter "help copying" to see the conditions.
> >  This program has absolutely no warranty.  Enter "help warranty" for 
> > details.
> >  GNU gdb (GDB) 7.6
> >  Copyright (C) 2013 Free Software Foundation, Inc.
> >  License GPLv3+: GNU GPL version 3 or later 
> > 
> >  This is free software: you are free to change and redistribute it.
> >  There is NO WARRANTY, to the extent permitted by law.  Type "show 
> > copying"
> >  and "show warranty" for details.
> >  This GDB was configured as "x86_64-unknown-linux-gnu"...
> >  crash: read error: kernel virtual address: 89807ff77000  type: "memory section root table"
> >
> > The crash file is a copy of /dev/vmcore taken by a crashkernel after a 
> > sysctl-forced panic.
> >
> > It looks to me that 0x89807ff77000 is not readable because the 
> > virtual addresses stored in the elf header of the dump file are off by 
> > 0x0080:
> >
> >  buczek@kreios:/mnt$ readelf -a crash.vmcore | grep LOAD | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
> >    LOAD   0xd000 0x8100 0x01007d00 (feff0400)
> >    LOAD   0x01c33000 0x88001000 0x1000 (8800)
> >    LOAD   0x01cc1000 0x8809 0x0009 (8800)
> >    LOAD   0x01cd1000 0x8810 0x0010 (8800)
> >    LOAD   0x01cd2070 0x88100070 0x00100070 (8800)
> >    LOAD   0x19bd2000 0x88003800 0x3800 (8800)
> >    LOAD   0x4e6a1000 0x88006000 0x6000 (8800)
> >    LOAD   0x4e6a2000 0x8801 0x0001 (8800)
> >    LOAD   0x001fcda22000 0x88208000 0x00208000 (8800)
> >    LOAD   0x003fcd9a2000 0x88408000 0x00408000 (8800)
> >    LOAD   0x005fcd922000 0x88608000 0x00608000 (8800)
> >    LOAD   0x007fcd8a2000 0x88808000 0x00808000 (8800)
> >    LOAD   0x009fcd822000 0x88a08000 0x00a08000 (8800)
> >    LOAD   0x00bfcd7a2000 0x88c08000 0x00c08000 (8800)
> >    LOAD   0x00dfcd722000 0x88e08000 0x00e08000 (8800)
> >    LOAD   0x00fc4d722000 0x88fe 0x00fe (8800)
> >
> > (Columns are File offset, Virtual Address, Physical Address and computed offset.)
> >
> > I would expect the offset between the virtual and the physical address to 
> > be PAGE_OFFSET, which is 0x888 on x86_64, not 
> > 0x8800. Unlike /proc/vmcore, /proc/kcore shows the same 
> > physical memory (of the last memory section above) with a correct offset:
> >
> >  buczek@kreios:/mnt$ sudo readelf -a /proc/kcore | grep 0x00fe | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
> >    LOAD   0x097e4000 0x897e 0x00fe (8880)
> >
> > The failing address 0x89807ff77000 happens to be at the end of the last 
> > memory section. It is the mem_section array, which crash wants to load and 
> > which is visible in the running system:
> >
> >  buczek@kreios:/mnt$ sudo gdb vmlinux /proc/kcore
> >  [...]
> >  (gdb) print mem_section
> >  $1 = (struct mem_section **) 0x89807ff77000
> >  (gdb) print *mem_section
> >  $2 = (struct mem_section *) 0x88a07f37b000
> >  (gdb) print **mem_section
> >  $3 = {sect

Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-28 Thread Bjorn Helgaas
On Wed, Aug 28, 2019 at 11:21:53AM -0700, Kuppuswamy Sathyanarayanan wrote:
> On Mon, Aug 19, 2019 at 06:19:25PM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan 
> > > > wrote:
> > > > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > > > 
> > > > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, 
> > > > > > sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > > > > > > From: Kuppuswamy Sathyanarayanan 
> > > > > > > 
> > > > > > > 
> > > > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF 
> > > > > > > device
> > > > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support 
> > > > > > > for
> > > > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > > > enabling the PRI support. But this assumption is incorrect. As 
> > > > > > > per PCIe
> > > > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use 
> > > > > > > the
> > > > > > > PRI of the PF and not implement it. Hence we need to create 
> > > > > > > exception
> > > > > > > for handling the PRI support for PCIe VF device.
> > > > > > > 
> > > > > > > Also, since PRI is a shared resource between PF/VF, following 
> > > > > > > rules
> > > > > > > should apply.
> > > > > > > 
> > > > > > > 1. Use proper locking before accessing/modifying PF resources in 
> > > > > > > VF
> > > > > > > PRI enable/disable call.
> > > > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is 
> > > > > > > zero.
> > > > 
> > > > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > > > PASID of the PF.
> > > > > > 
> > > > > > But why do we need to support pci_enable_pri() and 
> > > > > > pci_enable_pasid()
> > > > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > > > passing it off to the PF adds all this locking mess.  For VFs, can 
> > > > > > we
> > > > > > just make them do nothing or return -EINVAL?  What functionality 
> > > > > > would
> > > > > > we be missing if we did that?
> > > > > 
> > > > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > > > enable PRI/PASID for VF first (and not enable it for PF). In this 
> > > > > case,
> > > > > doing nothing for VF device will break the functionality.
> > > > 
> > > > What is the path where we can enable PRI/PASID for VF but not for the
> > > > PF?  The call chains leading to pci_enable_pri() go through the
> > > > iommu_ops.add_device interface, which makes me think this is part of
> > > > the device enumeration done by the PCI core, and in that case I would
> > > > think this it should be done for the PF before VFs.  But maybe this
> > > > path isn't exercised until a driver does a DMA map or something
> > > > similar?
> > 
> > > AFAIK, this path will only get exercised when the device does DMA and
> > > hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> > > In fact, my v2 version of this patch set had a check to ensure the PF
> > > PRI/PASID enable happened before a VF attempts PRI/PASID
> > > enable/disable. But I had to remove it in a later version of this series
> > > due to a failure case reported by one of the testers of this code.
> > 
> > What's the path?  And does that path make sense?
> > 
> > I got this far before giving up:
> > 
> > iommu_go_to_state   # AMD
> >   state_next
> > amd_iommu_init_pci
> >   amd_iommu_init_api
> > bus_set_iommu
> >   iommu_bus_init
> > bus_for_each_dev(..., add_iommu_group)
> >   add_iommu_group
> > iommu_probe_device
> >   amd_iommu_add_device  # amd_iommu_ops.add_device
> > init_iommu_group
> >   iommu_group_get_for_dev
> > iommu_group_add_device
> >   __iommu_attach_device
> > amd_iommu_attach_device # amd_iommu_ops.attach_dev
> >   attach_device # amd_iommu
> > pdev_iommuv2_enable
> >   pci_enable_pri
> > 
> > 
> > iommu_probe_device
> >   intel_iommu_add_device  # intel_iommu_ops.add_device
> > domain_add_dev_info
> >   dmar_insert_o

Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-28 Thread Kuppuswamy Sathyanarayanan
On Mon, Aug 19, 2019 at 06:19:25PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan 
> > > wrote:
> > > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > > 
> > > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, 
> > > > > sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > > > > > From: Kuppuswamy Sathyanarayanan 
> > > > > > 
> > > > > > 
> > > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF 
> > > > > > device
> > > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > > enabling the PRI support. But this assumption is incorrect. As per 
> > > > > > PCIe
> > > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > > > PRI of the PF and not implement it. Hence we need to create 
> > > > > > exception
> > > > > > for handling the PRI support for PCIe VF device.
> > > > > > 
> > > > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > > > should apply.
> > > > > > 
> > > > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > > > PRI enable/disable call.
> > > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is 
> > > > > > zero.
> > > 
> > > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > > PASID of the PF.
> > > > > 
> > > > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > > > just make them do nothing or return -EINVAL?  What functionality would
> > > > > we be missing if we did that?
> > > > 
> > > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > > > doing nothing for VF device will break the functionality.
> > > 
> > > What is the path where we can enable PRI/PASID for VF but not for the
> > > PF?  The call chains leading to pci_enable_pri() go through the
> > > iommu_ops.add_device interface, which makes me think this is part of
> > > the device enumeration done by the PCI core, and in that case I would
> > > think this it should be done for the PF before VFs.  But maybe this
> > > path isn't exercised until a driver does a DMA map or something
> > > similar?
> 
> > AFAIK, this path will only get exercised when the device does DMA and
> > hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> > In fact, my v2 version of this patch set had a check to ensure the PF
> > PRI/PASID enable happened before a VF attempts PRI/PASID
> > enable/disable. But I had to remove it in a later version of this series
> > due to a failure case reported by one of the testers of this code.
> 
> What's the path?  And does that path make sense?
> 
> I got this far before giving up:
> 
> iommu_go_to_state   # AMD
>   state_next
> amd_iommu_init_pci
>   amd_iommu_init_api
> bus_set_iommu
>   iommu_bus_init
> bus_for_each_dev(..., add_iommu_group)
>   add_iommu_group
> iommu_probe_device
>   amd_iommu_add_device  # amd_iommu_ops.add_device
> init_iommu_group
>   iommu_group_get_for_dev
> iommu_group_add_device
>   __iommu_attach_device
> amd_iommu_attach_device # amd_iommu_ops.attach_dev
>   attach_device # amd_iommu
> pdev_iommuv2_enable
>   pci_enable_pri
> 
> 
> iommu_probe_device
>   intel_iommu_add_device  # intel_iommu_ops.add_device
> domain_add_dev_info
>   dmar_insert_one_dev_info
> domain_context_mapping
>   domain_context_mapping_one
> iommu_enable_dev_iotlb
>   pci_enable_pri
> 
> 
> These *look* like enumeration paths, not DMA setup paths.  But I could
> be wrong, since I gave up before getting to the source.
> 
> I don't want to add all this complexity because we *think* we ne

Re: [PATCH 2/5] x86/pci: Add a to_pci_sysdata helper

2019-08-28 Thread Derrick, Jonathan
On Wed, 2019-08-28 at 16:14 +0200, Christoph Hellwig wrote:
> Various helpers need the pci_sysdata just to dereference a single field
> in it.  Add a little helper that returns the properly typed sysdata
> pointer to require a little less boilerplate code.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/include/asm/pci.h | 28 +---
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index 6fa846920f5f..75fe28492290 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -35,12 +35,15 @@ extern int noioapicreroute;
>  
>  #ifdef CONFIG_PCI
>  
> +static inline struct pci_sysdata *to_pci_sysdata(struct pci_bus *bus)
Can you make the argument const, to avoid all the warnings from callers
passing a const struct pci_bus *?
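
(For concreteness, a sketch of the const-qualified variant being asked for;
it mirrors the helper added by this patch, with only the parameter qualifier
changed:)

	static inline struct pci_sysdata *to_pci_sysdata(const struct pci_bus *bus)
	{
		/* const in, non-const sysdata out: callers holding only a
		 * const struct pci_bus *, such as __pcibus_to_node(), no
		 * longer trigger qualifier warnings. */
		return bus->sysdata;
	}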

snip


Re: /proc/vmcore and wrong PAGE_OFFSET

2019-08-28 Thread Donald Buczek

On 8/20/19 11:21 PM, Donald Buczek wrote:

Dear Linux folks,

I'm investigating a problem where the crash utility fails to work with our
crash dumps:

     buczek@kreios:/mnt$ crash vmlinux crash.vmcore
     crash 7.2.6
     Copyright (C) 2002-2019  Red Hat, Inc.
     Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
     Copyright (C) 1999-2006  Hewlett-Packard Co
     Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
     Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
     Copyright (C) 2005, 2011  NEC Corporation
     Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
     Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
     This program is free software, covered by the GNU General Public License,
     and you are welcome to change it and/or distribute copies of it under
     certain conditions.  Enter "help copying" to see the conditions.
     This program has absolutely no warranty.  Enter "help warranty" for details.
     GNU gdb (GDB) 7.6
     Copyright (C) 2013 Free Software Foundation, Inc.
     License GPLv3+: GNU GPL version 3 or later 

     This is free software: you are free to change and redistribute it.
     There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
     and "show warranty" for details.
     This GDB was configured as "x86_64-unknown-linux-gnu"...
     crash: read error: kernel virtual address: 89807ff77000  type: "memory section root table"

The crash file is a copy of /dev/vmcore taken by a crashkernel after a 
sysctl-forced panic.

It looks to me that 0x89807ff77000 is not readable because the virtual
addresses stored in the ELF header of the dump file are off by
0x0080:

     buczek@kreios:/mnt$ readelf -a crash.vmcore | grep LOAD | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
   LOAD   0xd000 0x8100 0x01007d00 (feff0400)
   LOAD   0x01c33000 0x88001000 0x1000 (8800)
   LOAD   0x01cc1000 0x8809 0x0009 (8800)
   LOAD   0x01cd1000 0x8810 0x0010 (8800)
   LOAD   0x01cd2070 0x88100070 0x00100070 (8800)
   LOAD   0x19bd2000 0x88003800 0x3800 (8800)
   LOAD   0x4e6a1000 0x88006000 0x6000 (8800)
   LOAD   0x4e6a2000 0x8801 0x0001 (8800)
   LOAD   0x001fcda22000 0x88208000 0x00208000 (8800)
   LOAD   0x003fcd9a2000 0x88408000 0x00408000 (8800)
   LOAD   0x005fcd922000 0x88608000 0x00608000 (8800)
   LOAD   0x007fcd8a2000 0x88808000 0x00808000 (8800)
   LOAD   0x009fcd822000 0x88a08000 0x00a08000 (8800)
   LOAD   0x00bfcd7a2000 0x88c08000 0x00c08000 (8800)
   LOAD   0x00dfcd722000 0x88e08000 0x00e08000 (8800)
   LOAD   0x00fc4d722000 0x88fe 0x00fe (8800)

(Columns are File offset, Virtual Address, Physical Address and computed offset.)

I would expect the offset between the virtual and the physical address to be 
PAGE_OFFSET, which is 0x888 on x86_64, not 0x8800. 
Unlike /proc/vmcore, /proc/kcore shows the same physical memory (of the last 
memory section above) with a correct offset:

     buczek@kreios:/mnt$ sudo readelf -a /proc/kcore | grep 0x00fe | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
   LOAD   0x097e4000 0x897e 0x00fe (8880)

The failing address 0x89807ff77000 happens to be at the end of the last 
memory section. It is the mem_section array, which crash wants to load and 
which is visible in the running system:

     buczek@kreios:/mnt$ sudo gdb vmlinux /proc/kcore
     [...]
     (gdb) print mem_section
     $1 = (struct mem_section **) 0x89807ff77000
     (gdb) print *mem_section
     $2 = (struct mem_section *) 0x88a07f37b000
     (gdb) print **mem_section
     $3 = {section_mem_map = 18446719884453740551, pageblock_flags = 0x88a07f36f040}

I can read the same information from the crash dump if I account for the
0x0080 error:

     buczek@kreios:/mnt$ gdb vmlinux crash.vmcore
     [...]
     (gdb) print mem_section
     $1 = (struct mem_section **) 0x89807ff77000
     (gdb) print *mem_section
     Cannot access memory at address 0x89807ff77000
     (gdb) set $t=(struct mem_section **) ((char *)mem_sec

Re: [PATCH 4/5] PCI/vmd: Stop overriding dma_map_ops

2019-08-28 Thread Keith Busch
On Wed, Aug 28, 2019 at 07:14:42AM -0700, Christoph Hellwig wrote:
> With a little tweak to the intel-iommu code we should be able to work
> around the VMD mess for the requester IDs without having to create giant
> amounts of boilerplate DMA ops wrapping code.  The other advantage of
> this scheme is that we can respect the real DMA masks for the actual
> devices, and I bet it will only be a matter of time until we see the
> first DMA-challenged NVMe devices.

This tests out fine on VMD hardware, but it's quite different from the
previous patch. In v1, the original dev was used in iommu_need_mapping(),
but this time it's the vmd device. Is this still using the actual device's
DMA mask then?
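
(For reference, the sequence in question as it appears in the patch quoted
below; after the substitution it is the VMD parent's dma_mask that gets
dereferenced:)

	dev = vmd_real_dev(dev);	/* may now be the VMD parent device */
	if (iommu_need_mapping(dev))
		return __intel_map_single(dev, page_to_phys(page) + offset,
				size, dir, *dev->dma_mask);	/* whose mask? */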


> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/intel-iommu.c|  25 ++
>  drivers/pci/controller/Kconfig |   1 -
>  drivers/pci/controller/vmd.c   | 150 -
>  3 files changed, 25 insertions(+), 151 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 12d094d08c0a..aaa35ac73956 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -373,6 +373,23 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
>  static DEFINE_SPINLOCK(device_domain_lock);
>  static LIST_HEAD(device_domain_list);
>  
> +/*
> + * For VMD we need to use the VMD devices for mapping requests instead of the
> + * actual device to get the proper PCIe requester ID.
> + */
> +static inline struct device *vmd_real_dev(struct device *dev)
> +{
> +#if IS_ENABLED(CONFIG_VMD)
> + if (dev_is_pci(dev)) {
> + struct pci_sysdata *sd = to_pci_dev(dev)->bus->sysdata;
> +
> + if (sd->vmd_dev)
> + return sd->vmd_dev;
> + }
> +#endif
> + return dev;
> +}
> +
>  /*
>   * Iterate over elements in device_domain_list and call the specified
>   * callback @fn against each element.
> @@ -3520,6 +3537,7 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
>enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   return __intel_map_single(dev, page_to_phys(page) + offset,
>   size, dir, *dev->dma_mask);
> @@ -3530,6 +3548,7 @@ static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
>size_t size, enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   return __intel_map_single(dev, phys_addr, size, dir,
>   *dev->dma_mask);
> @@ -3585,6 +3604,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
>size_t size, enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   intel_unmap(dev, dev_addr, size);
>   else
> @@ -3594,6 +3614,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
>  static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
>   size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   intel_unmap(dev, dev_addr, size);
>  }
> @@ -3605,6 +3626,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
>   struct page *page = NULL;
>   int order;
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_alloc(dev, size, dma_handle, flags, attrs);
>  
> @@ -3641,6 +3663,7 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
>   int order;
>   struct page *page = virt_to_page(vaddr);
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
>  
> @@ -3661,6 +3684,7 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
>   struct scatterlist *sg;
>   int i;
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs);
>  
> @@ -3685,6 +3709,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nelems,
>   struct intel_iommu *iommu;
>  
>   BUG_ON(dir == DMA_NONE);
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
>  
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index fe9f9f13ce11..920546cb84e2 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -267,7

Re: [RFC PATCH] iommu/vt-d: Fix IOMMU field not populated on device hot re-plug

2019-08-28 Thread Janusz Krzysztofik
On Wednesday, August 28, 2019 2:56:18 AM CEST Lu Baolu wrote:
> Hi Janusz,
> 
> On 8/27/19 5:35 PM, Janusz Krzysztofik wrote:
> > Hi Lu,
> > 
> > On Monday, August 26, 2019 10:29:12 AM CEST Lu Baolu wrote:
> >> Hi Janusz,
> >>
> >> On 8/26/19 4:15 PM, Janusz Krzysztofik wrote:
> >>> Hi Lu,
> >>>
> >>> On Friday, August 23, 2019 3:51:11 AM CEST Lu Baolu wrote:
>  Hi,
> 
>  On 8/22/19 10:29 PM, Janusz Krzysztofik wrote:
> > When a perfectly working i915 device is hot unplugged (via sysfs) and
> > hot re-plugged again, its dev->archdata.iommu field is not populated
> > again with an IOMMU pointer.  As a result, the device probe fails on
> > DMA mapping error during scratch page setup.
> >
> > It looks like that happens because devices are not detached from their
> > MMUIO bus before they are removed on device unplug.  Then, when an
> > already registered device/IOMMU association is identified by the
> > reinstantiated device's bus and function IDs on IOMMU bus re-attach
> > attempt, the device's archdata is not populated with IOMMU information
> > and the bad happens.
> >
> > I'm not sure if this is a proper fix but it works for me so at least it
> > confirms correctness of my analysis results, I believe.  So far I
> > haven't been able to identify a good place where the possibly missing
> > IOMMU bus detach on device unplug operation could be added.
> 
>  Which kernel version are you testing with? Does it contain below commit?
> 
>  commit 458b7c8e0dde12d140e3472b80919cbb9ae793f4
>  Author: Lu Baolu 
>  Date:   Thu Aug 1 11:14:58 2019 +0800
> >>>
> >>> I was using an internal branch based on drm-tip which didn't contain this
> >>> commit yet.  Fortunately it has been already merged into drm-tip over last
> >>> weekend and has effectively fixed the issue.
> >>
> >> Thanks for testing this.
> > 
> > My testing appeared not sufficiently exhaustive. The fix indeed resolved my
> > initially discovered issue of not being able to rebind the i915 driver to a
> > re-plugged device, however it brought another, probably more serious problem
> > to light.
> > 
> > When an open i915 device is hot unplugged, IOMMU bus notifier now cleans up
> > IOMMU info for the device on PCI device remove while the i915 driver is still
> > not released, kept by open file descriptors.  Then, on last device close,
> > cleanup attempts lead to kernel panic raised from intel_unmap() on unresolved
> > IOMMU domain.
> 
> We should avoid a kernel panic when intel_unmap() is called against
> a non-existent domain.

Does that mean you suggest replacing
	BUG_ON(!domain);
with something like
	if (WARN_ON(!domain))
		return;
and to not care about orphaned mappings left allocated?  Is there a way to inform
users that their active DMA mappings are no longer valid and they shouldn't
call dma_unmap_*()?

> But we shouldn't expect the IOMMU driver to skip cleaning up the domain
> info when a device remove notification comes and to wait until all file
> descriptors have been closed, right?

Shouldn't then the IOMMU driver take care of cleaning up resources still 
allocated on device remove before it invalidates and forgets their pointers?

Thanks,
Janusz


> Best regards,
> Baolu
> 
> > 
> > With commit 458b7c8e0dde reverted and my fix applied, both late device close
> > and device re-plug work for me.  However, I realize that's probably still
> > not a complete solution, possibly missing some protection against reuse of a
> > removed device other than for cleanup.  If you think that's the right way to
> > go, I can work more on that.
> > 
> > I've had a look at other drivers and found AMD is using a somewhat similar
> > approach.  On the other hand, looking at the IOMMU common code I couldn't
> > identify any arrangement that would support deferred device cleanup.
> > 
> > If that approach is not acceptable for Intel IOMMU, please suggest a way you'd
> > like to have it resolved and I can try to implement it.
> > 
> > Thanks,
> > Janusz
> > 
> >> Best regards,
> >> Lu Baolu
> >>
> 






[PATCH 3/5] x86/pci: Replace the vmd_domain field with a vmd_dev pointer

2019-08-28 Thread Christoph Hellwig
Store the actual VMD device in struct pci_sysdata, so that we can later
use it directly for DMA mappings.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pci.h   | 5 ++---
 drivers/pci/controller/vmd.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 75fe28492290..a9bb4cdb66d4 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -25,7 +25,7 @@ struct pci_sysdata {
 	void		*fwnode;	/* IRQ domain for MSI assignment */
 #endif
 #if IS_ENABLED(CONFIG_VMD)
-	bool		vmd_domain;	/* True if in Intel VMD domain */
+	struct device	*vmd_dev;	/* Main device if in Intel VMD domain */
 #endif
 };
 
@@ -64,12 +64,11 @@ static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
 #if IS_ENABLED(CONFIG_VMD)
 static inline bool is_vmd(struct pci_bus *bus)
 {
-   return to_pci_sysdata(bus)->vmd_domain;
+   return to_pci_sysdata(bus)->vmd_dev != NULL;
 }
 #else
 #define is_vmd(bus)		false
 #endif /* CONFIG_VMD */
-}
 
 /* Can be used to override the logic in pci_scan_bus for skipping
already-configured bus numbers - to be used for buggy BIOSes
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 4575e0c6dc4b..785cb657c8c2 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -660,7 +660,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
.parent = res,
};
 
-   sd->vmd_domain = true;
+   sd->vmd_dev = &vmd->dev->dev;
sd->domain = vmd_find_free_domain();
if (sd->domain < 0)
return sd->domain;
-- 
2.20.1



[PATCH 2/5] x86/pci: Add a to_pci_sysdata helper

2019-08-28 Thread Christoph Hellwig
Various helpers need the pci_sysdata just to dereference a single field
in it.  Add a little helper that returns the properly typed sysdata
pointer to require a little less boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pci.h | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 6fa846920f5f..75fe28492290 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -35,12 +35,15 @@ extern int noioapicreroute;
 
 #ifdef CONFIG_PCI
 
+static inline struct pci_sysdata *to_pci_sysdata(struct pci_bus *bus)
+{
+   return bus->sysdata;
+}
+
 #ifdef CONFIG_PCI_DOMAINS
 static inline int pci_domain_nr(struct pci_bus *bus)
 {
-   struct pci_sysdata *sd = bus->sysdata;
-
-   return sd->domain;
+   return to_pci_sysdata(bus)->domain;
 }
 
 static inline int pci_proc_domain(struct pci_bus *bus)
@@ -52,23 +55,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
 {
-   struct pci_sysdata *sd = bus->sysdata;
-
-   return sd->fwnode;
+   return to_pci_sysdata(bus)->fwnode;
 }
 
 #define pci_root_bus_fwnode	_pci_root_bus_fwnode
 #endif
 
+#if IS_ENABLED(CONFIG_VMD)
 static inline bool is_vmd(struct pci_bus *bus)
 {
-#if IS_ENABLED(CONFIG_VMD)
-   struct pci_sysdata *sd = bus->sysdata;
-
-   return sd->vmd_domain;
+   return to_pci_sysdata(bus)->vmd_domain;
+}
 #else
-   return false;
-#endif
+#define is_vmd(bus)		false
+#endif /* CONFIG_VMD */
 }
 
 /* Can be used to override the logic in pci_scan_bus for skipping
@@ -128,9 +128,7 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 /* Returns the node based on pci bus */
 static inline int __pcibus_to_node(const struct pci_bus *bus)
 {
-   const struct pci_sysdata *sd = bus->sysdata;
-
-   return sd->node;
+   return to_pci_sysdata(bus)->node;
 }
 
 static inline const struct cpumask *
-- 
2.20.1



[PATCH 4/5] PCI/vmd: Stop overriding dma_map_ops

2019-08-28 Thread Christoph Hellwig
With a little tweak to the intel-iommu code we should be able to work
around the VMD mess for the requester IDs without having to create giant
amounts of boilerplate DMA ops wrapping code.  The other advantage of
this scheme is that we can respect the real DMA masks for the actual
devices, and I bet it will only be a matter of time until we see the
first DMA-challenged NVMe devices.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/intel-iommu.c|  25 ++
 drivers/pci/controller/Kconfig |   1 -
 drivers/pci/controller/vmd.c   | 150 -
 3 files changed, 25 insertions(+), 151 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 12d094d08c0a..aaa35ac73956 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -373,6 +373,23 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
+/*
+ * For VMD we need to use the VMD devices for mapping requests instead of the
+ * actual device to get the proper PCIe requester ID.
+ */
+static inline struct device *vmd_real_dev(struct device *dev)
+{
+#if IS_ENABLED(CONFIG_VMD)
+   if (dev_is_pci(dev)) {
+   struct pci_sysdata *sd = to_pci_dev(dev)->bus->sysdata;
+
+   if (sd->vmd_dev)
+   return sd->vmd_dev;
+   }
+#endif
+   return dev;
+}
+
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -3520,6 +3537,7 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
 enum dma_data_direction dir,
 unsigned long attrs)
 {
+   dev = vmd_real_dev(dev);
if (iommu_need_mapping(dev))
return __intel_map_single(dev, page_to_phys(page) + offset,
size, dir, *dev->dma_mask);
@@ -3530,6 +3548,7 @@ static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
 size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
+   dev = vmd_real_dev(dev);
if (iommu_need_mapping(dev))
return __intel_map_single(dev, phys_addr, size, dir,
*dev->dma_mask);
@@ -3585,6 +3604,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
+   dev = vmd_real_dev(dev);
if (iommu_need_mapping(dev))
intel_unmap(dev, dev_addr, size);
else
@@ -3594,6 +3614,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
+   dev = vmd_real_dev(dev);
if (iommu_need_mapping(dev))
intel_unmap(dev, dev_addr, size);
 }
@@ -3605,6 +3626,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
struct page *page = NULL;
int order;
 
+   dev = vmd_real_dev(dev);
if (!iommu_need_mapping(dev))
return dma_direct_alloc(dev, size, dma_handle, flags, attrs);
 
@@ -3641,6 +3663,7 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
int order;
struct page *page = virt_to_page(vaddr);
 
+   dev = vmd_real_dev(dev);
if (!iommu_need_mapping(dev))
return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
 
@@ -3661,6 +3684,7 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
struct scatterlist *sg;
int i;
 
+   dev = vmd_real_dev(dev);
if (!iommu_need_mapping(dev))
return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs);
 
@@ -3685,6 +3709,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nelems,
struct intel_iommu *iommu;
 
BUG_ON(dir == DMA_NONE);
+   dev = vmd_real_dev(dev);
if (!iommu_need_mapping(dev))
return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
 
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index fe9f9f13ce11..920546cb84e2 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -267,7 +267,6 @@ config PCIE_TANGO_SMP8759
 
 config VMD
depends on PCI_MSI && X86_64 && SRCU
-   select X86_DEV_DMA_OPS
tristate "Intel Volume Management Device Driver"
---help---
  Adds support for the Intel Volume Management Device (VMD). VMD is a
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 785cb657c8c2..ba017ebba6a7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/p

[PATCH 5/5] x86/pci: Remove X86_DEV_DMA_OPS

2019-08-28 Thread Christoph Hellwig
There are no users of X86_DEV_DMA_OPS left, so remove the code.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/Kconfig  |  3 ---
 arch/x86/include/asm/device.h | 10 -
 arch/x86/pci/common.c | 38 ---
 3 files changed, 51 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..35597dae38b7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2905,9 +2905,6 @@ config HAVE_ATOMIC_IOMAP
def_bool y
depends on X86_32
 
-config X86_DEV_DMA_OPS
-   bool
-
 source "drivers/firmware/Kconfig"
 
 source "arch/x86/kvm/Kconfig"
diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h
index a8f6c809d9b1..3e6c75a6d070 100644
--- a/arch/x86/include/asm/device.h
+++ b/arch/x86/include/asm/device.h
@@ -11,16 +11,6 @@ struct dev_archdata {
 #endif
 };
 
-#if defined(CONFIG_X86_DEV_DMA_OPS) && defined(CONFIG_PCI_DOMAINS)
-struct dma_domain {
-   struct list_head node;
-   const struct dma_map_ops *dma_ops;
-   int domain_nr;
-};
-void add_dma_domain(struct dma_domain *domain);
-void del_dma_domain(struct dma_domain *domain);
-#endif
-
 struct pdev_archdata {
 };
 
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 9acab6ac28f5..d2ac803b6c00 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -625,43 +625,6 @@ unsigned int pcibios_assign_all_busses(void)
return (pci_probe & PCI_ASSIGN_ALL_BUSSES) ? 1 : 0;
 }
 
-#if defined(CONFIG_X86_DEV_DMA_OPS) && defined(CONFIG_PCI_DOMAINS)
-static LIST_HEAD(dma_domain_list);
-static DEFINE_SPINLOCK(dma_domain_list_lock);
-
-void add_dma_domain(struct dma_domain *domain)
-{
-   spin_lock(&dma_domain_list_lock);
-   list_add(&domain->node, &dma_domain_list);
-   spin_unlock(&dma_domain_list_lock);
-}
-EXPORT_SYMBOL_GPL(add_dma_domain);
-
-void del_dma_domain(struct dma_domain *domain)
-{
-   spin_lock(&dma_domain_list_lock);
-   list_del(&domain->node);
-   spin_unlock(&dma_domain_list_lock);
-}
-EXPORT_SYMBOL_GPL(del_dma_domain);
-
-static void set_dma_domain_ops(struct pci_dev *pdev)
-{
-   struct dma_domain *domain;
-
-   spin_lock(&dma_domain_list_lock);
-   list_for_each_entry(domain, &dma_domain_list, node) {
-   if (pci_domain_nr(pdev->bus) == domain->domain_nr) {
-   pdev->dev.dma_ops = domain->dma_ops;
-   break;
-   }
-   }
-   spin_unlock(&dma_domain_list_lock);
-}
-#else
-static void set_dma_domain_ops(struct pci_dev *pdev) {}
-#endif
-
 static void set_dev_domain_options(struct pci_dev *pdev)
 {
if (is_vmd(pdev->bus))
@@ -697,7 +660,6 @@ int pcibios_add_device(struct pci_dev *dev)
pa_data = data->next;
memunmap(data);
}
-   set_dma_domain_ops(dev);
set_dev_domain_options(dev);
return 0;
 }
-- 
2.20.1



stop overriding dma_ops in vmd v2

2019-08-28 Thread Christoph Hellwig
Hi all,

this is a new version of the vmd dma_map_ops removal, which does not
require vmd to be built in.  Instead we slightly expand the vmd-specific
field in the x86 pci_sysdata to cover that information.

Note that I do not have a VMD-enabled system, so some testing by the
maintainers would be welcome.


[PATCH 1/5] x86/pci: Remove an ifdef __KERNEL__ from pci.h

2019-08-28 Thread Christoph Hellwig
Non-UAPI headers can't be included by userspace, so remove the
pointless ifdef.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pci.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index e662f987dfa2..6fa846920f5f 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -12,8 +12,6 @@
 #include 
 #include 
 
-#ifdef __KERNEL__
-
 struct pci_sysdata {
int domain; /* PCI domain */
int node;   /* NUMA node */
@@ -118,7 +116,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #define native_setup_msi_irqs  NULL
 #define native_teardown_msi_irqNULL
 #endif
-#endif  /* __KERNEL__ */
 
 #ifdef CONFIG_X86_64
 #include 
-- 
2.20.1



[PATCH] iommu/iova: avoid false sharing on fq_timer_on

2019-08-28 Thread Eric Dumazet
In commit 14bd9a607f90 ("iommu/iova: Separate atomic variables
to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
in queue_iova() was causing a performance loss and moved critical fields
so that the false sharing would not impact them.

However, avoiding the false sharing in the first place seems easy.
We should attempt the atomic_cmpxchg() no more than 100 times
per second. Adding an atomic_read() will keep the cache
line mostly shared.

This false sharing came with commit 9a005a800ae8
("iommu/iova: Add flush timer").

Signed-off-by: Eric Dumazet 
Cc: Jinyu Qi 
Cc: Joerg Roedel 
---
 drivers/iommu/iova.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 3e1a8a6755723a927a7942a7429ab7e6c19a0027..41c605b0058f9615c2dbdd83f1de2404a9b1d255 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -577,7 +577,9 @@ void queue_iova(struct iova_domain *iovad,
 
spin_unlock_irqrestore(&fq->lock, flags);
 
-   if (atomic_cmpxchg(&iovad->fq_timer_on, 0, 1) == 0)
+   /* Avoid false sharing as much as possible. */
+   if (!atomic_read(&iovad->fq_timer_on) &&
+   !atomic_cmpxchg(&iovad->fq_timer_on, 0, 1))
mod_timer(&iovad->fq_timer,
  jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));
 }
-- 
2.23.0.187.g17f5b7556c-goog
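
(The generic shape of the pattern, as an illustration rather than the patch
itself: the read-only check leaves the cache line in shared state on every
CPU, and only the rare arming transition pays for exclusive ownership:)

	static bool arm_once(atomic_t *armed)
	{
		if (atomic_read(armed))		/* cheap; line stays Shared */
			return false;
		/* rare; line goes Exclusive/Modified */
		return atomic_cmpxchg(armed, 0, 1) == 0;
	}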



[PATCH v10 4/4] mmc: queue: Use bigger segments if DMA MAP layer can merge the segments

2019-08-28 Thread Yoshihiro Shimoda
When the max_segs of a mmc host is smaller than 512, the mmc
subsystem tries to use 512 segments if DMA MAP layer can merge
the segments, and then the mmc subsystem exposes such information
to the block layer by using blk_queue_can_use_dma_map_merging().

Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Ulf Hansson 
Reviewed-by: Simon Horman 
---
 drivers/mmc/core/queue.c | 35 ---
 include/linux/mmc/host.h |  1 +
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 7102e2e..1e29b30 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -21,6 +21,8 @@
 #include "card.h"
 #include "host.h"
 
+#define MMC_DMA_MAP_MERGE_SEGMENTS 512
+
 static inline bool mmc_cqe_dcmd_busy(struct mmc_queue *mq)
 {
/* Allow only 1 DCMD at a time */
@@ -193,6 +195,12 @@ static void mmc_queue_setup_discard(struct request_queue *q,
blk_queue_flag_set(QUEUE_FLAG_SECERASE, q);
 }
 
+static unsigned int mmc_get_max_segments(struct mmc_host *host)
+{
+   return host->can_dma_map_merge ? MMC_DMA_MAP_MERGE_SEGMENTS :
+host->max_segs;
+}
+
 /**
  * mmc_init_request() - initialize the MMC-specific per-request data
  * @q: the request queue
@@ -206,7 +214,7 @@ static int __mmc_init_request(struct mmc_queue *mq, struct request *req,
struct mmc_card *card = mq->card;
struct mmc_host *host = card->host;
 
-   mq_rq->sg = mmc_alloc_sg(host->max_segs, gfp);
+   mq_rq->sg = mmc_alloc_sg(mmc_get_max_segments(host), gfp);
if (!mq_rq->sg)
return -ENOMEM;
 
@@ -362,13 +370,23 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
-   blk_queue_max_segments(mq->queue, host->max_segs);
+   if (host->can_dma_map_merge)
+   WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
+   mmc_dev(host)),
+"merging was advertised but not possible");
+   blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));
 
if (mmc_card_mmc(card))
block_size = card->ext_csd.data_sector_size;
 
blk_queue_logical_block_size(mq->queue, block_size);
-   blk_queue_max_segment_size(mq->queue,
+   /*
+* After blk_queue_can_use_dma_map_merging() was called with succeed,
+* since it calls blk_queue_virt_boundary(), the mmc should not call
+* both blk_queue_max_segment_size().
+*/
+   if (!host->can_dma_map_merge)
+   blk_queue_max_segment_size(mq->queue,
round_down(host->max_seg_size, block_size));
 
dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
@@ -418,6 +436,17 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
mq->tag_set.cmd_size = sizeof(struct mmc_queue_req);
mq->tag_set.driver_data = mq;
 
+   /*
+* Since blk_mq_alloc_tag_set() calls .init_request() of mmc_mq_ops,
+* the host->can_dma_map_merge should be set before to get max_segs
+* from mmc_get_max_segments().
+*/
+   if (host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
+   dma_get_merge_boundary(mmc_dev(host)))
+   host->can_dma_map_merge = 1;
+   else
+   host->can_dma_map_merge = 0;
+
ret = blk_mq_alloc_tag_set(&mq->tag_set);
if (ret)
return ret;
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 4a351cb..c5662b3 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -396,6 +396,7 @@ struct mmc_host {
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
 	unsigned int		use_blk_mq:1;	/* use blk-mq */
 	unsigned int		retune_crc_disable:1; /* don't trigger retune upon crc */
+	unsigned int		can_dma_map_merge:1; /* merging can be used */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */
-- 
2.7.4



[PATCH v10 2/4] iommu/dma: Add a new dma_map_ops of get_merge_boundary()

2019-08-28 Thread Yoshihiro Shimoda
This patch adds a new dma_map_ops of get_merge_boundary() to
expose the DMA merge boundary if the domain type is IOMMU_DOMAIN_DMA.

Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Simon Horman 
Acked-by: Joerg Roedel 
---
 drivers/iommu/dma-iommu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index de68b4a..ad861bd 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1091,6 +1091,13 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
return ret;
 }
 
+static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
+{
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+
+   return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc  = iommu_dma_alloc,
.free   = iommu_dma_free,
@@ -1106,6 +1113,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.sync_sg_for_device = iommu_dma_sync_sg_for_device,
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
+   .get_merge_boundary = iommu_dma_get_merge_boundary,
 };
 
 /*
-- 
2.7.4



[PATCH v10 1/4] dma: Introduce dma_get_merge_boundary()

2019-08-28 Thread Yoshihiro Shimoda
This patch adds a new DMA API "dma_get_merge_boundary". This function
returns the DMA merge boundary if the DMA layer can merge the segments.
This patch also adds the implementation for a new dma_map_ops pointer.

Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Simon Horman 
---
 Documentation/DMA-API.txt   |  8 
 include/linux/dma-mapping.h |  6 ++
 kernel/dma/mapping.c| 11 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index e47c63b..9c4dd3d 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -204,6 +204,14 @@ Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
 others should not be larger than the returned value.
 
+::
+
+   unsigned long
+   dma_get_merge_boundary(struct device *dev);
+
+Returns the DMA merge boundary. If the device cannot merge any the DMA address
+segments, the function returns 0.
+
 Part Id - Streaming DMA mappings
 
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 14702e2..7072b78 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -131,6 +131,7 @@ struct dma_map_ops {
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
+   unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
 #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
@@ -462,6 +463,7 @@ int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
+unsigned long dma_get_merge_boundary(struct device *dev);
 #else /* CONFIG_HAS_DMA */
 static inline dma_addr_t dma_map_page_attrs(struct device *dev,
struct page *page, size_t offset, size_t size,
@@ -567,6 +569,10 @@ static inline size_t dma_max_mapping_size(struct device *dev)
 {
return 0;
 }
+static inline unsigned long dma_get_merge_boundary(struct device *dev)
+{
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b0038ca..b3077b5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -405,3 +405,14 @@ size_t dma_max_mapping_size(struct device *dev)
return size;
 }
 EXPORT_SYMBOL_GPL(dma_max_mapping_size);
+
+unsigned long dma_get_merge_boundary(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   if (!ops || !ops->get_merge_boundary)
+   return 0;   /* can't merge */
+
+   return ops->get_merge_boundary(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
-- 
2.7.4
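
(Usage sketch: a non-zero boundary feeds straight into the block layer's virt
boundary, which is what blk_queue_can_use_dma_map_merging() in patch 3/4 of
this series does:)

	unsigned long boundary = dma_get_merge_boundary(dev);

	if (boundary)	/* non-zero: the DMA layer (e.g. an IOMMU) can merge */
		blk_queue_virt_boundary(q, boundary);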



[PATCH v10 3/4] block: add a helper function to merge the segments

2019-08-28 Thread Yoshihiro Shimoda
This patch adds a helper function that tells whether a queue can merge
the segments by the DMA MAP layer (e.g. via IOMMU).

Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Simon Horman 
 #include <linux/jiffies.h>
 #include <linux/gfp.h>
+#include <linux/dma-mapping.h>
 
 #include "blk.h"
 #include "blk-wbt.h"
@@ -832,6 +833,28 @@ void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
 }
 EXPORT_SYMBOL_GPL(blk_queue_write_cache);
 
+/**
+ * blk_queue_can_use_dma_map_merging - configure queue for merging segments.
+ * @q: the request queue for the device
+ * @dev:   the device pointer for dma
+ *
+ * Tell the block layer that the segments of @q can be merged by the DMA map layer.
+ */
+bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
+  struct device *dev)
+{
+   unsigned long boundary = dma_get_merge_boundary(dev);
+
+   if (!boundary)
+   return false;
+
+   /* No need to update max_segment_size. see blk_queue_virt_boundary() */
+   blk_queue_virt_boundary(q, boundary);
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(blk_queue_can_use_dma_map_merging);
+
 static int __init blk_settings_init(void)
 {
blk_max_low_pfn = max_low_pfn - 1;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1ac7901..d62d6e2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1086,6 +1086,8 @@ extern void blk_queue_dma_alignment(struct request_queue *, int);
 extern void blk_queue_update_dma_alignment(struct request_queue *, int);
 extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
 extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
+extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
+ struct device *dev);
 
 /*
  * Number of physical segments as sent to the device.
-- 
2.7.4
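
A hedged sketch of a caller, loosely modeled on the MMC patch later in this
series (the function name and the exact segment-count choice are
illustrative, not the actual patch 4/4):

#include <linux/blkdev.h>
#include <linux/device.h>

/*
 * Hypothetical setup: if the DMA map layer can merge segments, allow
 * more segments per request, since they will collapse into fewer
 * hardware descriptors anyway.
 */
static void foo_setup_queue(struct request_queue *q, struct device *dev,
                            unsigned short *max_segs)
{
        if (blk_queue_can_use_dma_map_merging(q, dev))
                *max_segs = BLK_MAX_SEGMENTS;   /* 128 */
}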



[PATCH v10 0/4] treewide: improve R-Car SDHI performance

2019-08-28 Thread Yoshihiro Shimoda
This patch series is based on linux-next.git / next-20190828 tag.

Since the SDHI host's internal DMAC on R-Car Gen3 cannot handle two or
more segments, the performance (especially for eMMC HS400 reads) is not
good. However, if an IOMMU is enabled on the DMAC, the IOMMU maps
multiple scatter-gather buffers to one contiguous IOVA, the DMAC can
handle that IOVA as a single segment, and the performance can improve.
In fact, I measured the performance with bonnie++: the "Sequential
Input - block" rate improved on r8a7795.

To achieve this, the patch series first modifies the IOMMU and block
subsystems. Since the patches strictly depend on changes in each
subsystem, I submit the series as treewide.

Changes from v9:
 - Rebase on next-20190828.
 - Drop a blk-setting.c patch to sort headers.
 - Remove an unnecessary condition on patch 2/4.
 - Add Joerg-san's Reviewed-by into patch 2/4.
 - Fix a condition on patch 4/4 so that blk_queue_max_segment_size() is
   not called, as the comment in patch 3/4 notes it is unnecessary.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=151067

Changes from v8:
 - Rebase on next-20190726.
 - Use "1UL" instead of just "1" for iommu_dma_get_merge_boundary().
 - Add Simon-san's Reviewed-by into all patches.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=149023

Changes from v7:
 - Rebase on next-20190722 (v5.3-rc1 + next branches of subsystems)
 - Add some Reviewed-by.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=135391

Changes from v6:
 - [1/5 for DMA MAP] A new patch.
 - [2/5 for IOMMU] A new patch.
 - [3/5 for BLOCK] Add Reviewed-by.
 - [4/5 for BLOCK] Use a new DMA MAP API instead of device_iommu_mapped().
 - [5/5 for MMC] Likewise, and some minor fix.
 - Remove patch 4/5 of v6 from this v7 patch series.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=131769

Changes from v5:
 - Almost all patches are new code.
 - [4/5 for MMC] This is a refactoring patch, so I don't add any
   {Tested,Reviewed}-by tags.
 - [5/5 for MMC] Modify MMC subsystem to use bigger segments instead of
   the renesas_sdhi driver.
 - [5/5 for MMC] Use BLK_MAX_SEGMENTS (128) instead of local value
   SDHI_MAX_SEGS_IN_IOMMU (512). Even if we use BLK_MAX_SEGMENTS,
   the performance is still good.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=127511

Changes from v4:
 - [DMA MAPPING] Add a new device_dma_parameters for iova contiguous.
 - [IOMMU] Add a new capable for "merging" segments.
 - [IOMMU] Add a capable ops into the ipmmu-vmsa driver.
 - [MMC] Sort headers in renesas_sdhi_core.c.
 - [MMC] Remove the following code added in v3, which can now be achieved
 by the DMA mapping and IOMMU subsystems:
 -- Check if R-Car Gen3 IPMMU is used or not on patch 3.
 -- Check if all multiple segment buffers are aligned to PAGE_SIZE on patch 3.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=125593

Changes from v3:
 - Use a helper function device_iommu_mapped on patch 1 and 3.
 - Check if R-Car Gen3 IPMMU is used or not on patch 3.

Yoshihiro Shimoda (4):
  dma: Introduce dma_get_merge_boundary()
  iommu/dma: Add a new dma_map_ops of get_merge_boundary()
  block: add a helper function to merge the segments
  mmc: queue: Use bigger segments if DMA MAP layer can merge the
segments

 Documentation/DMA-API.txt   |  8 
 block/blk-settings.c| 23 +++
 drivers/iommu/dma-iommu.c   |  8 
 drivers/mmc/core/queue.c| 35 ---
 include/linux/blkdev.h  |  2 ++
 include/linux/dma-mapping.h |  6 ++
 include/linux/mmc/host.h|  1 +
 kernel/dma/mapping.c| 11 +++
 8 files changed, 91 insertions(+), 3 deletions(-)

-- 
2.7.4



Re: [PATCH v2 2/2] dma-contiguous: Use fallback alloc_pages for single pages

2019-08-28 Thread Masahiro Yamada
On Wed, Aug 28, 2019 at 7:53 PM Masahiro Yamada  wrote:
>
> Hi Christoph,
>
> On Tue, Aug 27, 2019 at 8:55 PM Christoph Hellwig  wrote:
> >
> > On Tue, Aug 27, 2019 at 06:03:14PM +0900, Masahiro Yamada wrote:
> > > Yes, this makes my driver work again
> > > when CONFIG_DMA_CMA=y.
> > >
> > >
> > > If I apply the following, my driver gets back working
> > > irrespective of CONFIG_DMA_CMA.
> >
> > That sounds a lot like the device simply isn't 64-bit DMA capable, and
> > previously always got CMA allocations under the limit it actually
> > supported.  I suggest that you submit this quirk to the mmc maintainers.
>
>
> I tested v5.2 and my MMC host controller works with
> dma_address that exceeds 32-bit physical address.
>
> So, I believe my MMC device is 64-bit DMA capable.
>
> I am still looking into the code
> to find out what was changed.


I retract this comment.

Prior to commit bd2e75633c8012fc8a7431c82fda66237133bf7e,
the descriptor table for ADMA was placed within the
32-bit physical address range and never exceeded the 32-bit limit.

My device is probably not 64-bit DMA capable.

I will talk to the hardware engineer,
and check the hardware spec just in case.

Thanks.

-- 
Best Regards
Masahiro Yamada
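
If the hardware check confirms the limitation, a minimal sketch of the kind
of quirk Christoph suggests (hypothetical driver and probe function, not a
patch from this thread):

#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

/*
 * Hypothetical probe-time quirk: cap the device to 32-bit DMA so the
 * ADMA descriptor table and data buffers stay below 4 GiB.
 */
static int foo_mmc_probe(struct platform_device *pdev)
{
        int ret;

        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret)
                return ret;

        /* remaining host setup would follow here */
        return 0;
}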


Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-08-28 Thread Neil Armstrong
On 28/08/2019 13:49, Robin Murphy wrote:
> Hi Neil,
> 
> On 28/08/2019 12:28, Neil Armstrong wrote:
>> Hi Robin,
>>

[...]
>>>
>>> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test gave 
>>> me the impression that T720 works fine, but on closer inspection some parts 
>>> of glmark2 do seem to go a bit wonky (although I suspect at least some of 
>>> it is just down to the FPGA setup being both very slow and lacking in 
>>> memory bandwidth), and the "nv12-1img" mode of kmscube turns out to render 
>>> in some delightfully wrong colours.
>>>
>>> I'll try to get a 'proper' version of the io-pgtable patch posted soon.
>>
>> I'm trying to collect all the fixes needed to make T820 work again, and
>> I was wondering if you finally have a proper patch for this and "cfg->ias > 48"
>> hack ? Or one I can test ?
> 
> I do have a handful of io-pgtable patches written up and ready to go, I'm 
> just treading carefully and waiting for the internal approval box to be 
> ticked before I share anything :(

Great !

No problem, it can totally wait until approval,

Thanks,
Neil

> 
> Robin.
> 
>>
>> Thanks,
>> Neil
>>
>>>
>>> Thanks,
>>> Robin.
>>>

>>>> Cheers,
>>>>
>>>> Tomeu
>>>>
>>>>> Robin.
>>>>>
>>>>>
>>>>> ->8-
>>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>>> index 546968d8a349..f29da6e8dc08 100644
>>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>>>>>   iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>>>   if (iop) {
>>>>>   u64 mair, ttbr;
>>>>> +   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>>>
>>>>> +   data->levels = 4;
>>>>>   /* Copy values as union fields overlap */
>>>>>   mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>>>   ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];


Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-08-28 Thread Robin Murphy

Hi Neil,

On 28/08/2019 12:28, Neil Armstrong wrote:
> Hi Robin,
>
> On 31/05/2019 15:47, Robin Murphy wrote:
>> On 31/05/2019 13:04, Tomeu Vizoso wrote:
>>> On Wed, 29 May 2019 at 19:38, Robin Murphy  wrote:
>>>>
>>>> On 29/05/2019 16:09, Tomeu Vizoso wrote:
>>>>> On Tue, 21 May 2019 at 18:11, Clément Péron  wrote:
>>>>>>
>>>>> [snip]
>>>>>> [  345.204813] panfrost 180.gpu: mmu irq status=1
>>>>>> [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
>>>>>> 0x02400400
>>>>>
>>>>>    From what I can see here, 0x02400400 points to the first byte
>>>>> of the first submitted job descriptor.
>>>>>
>>>>> So mapping buffers for the GPU doesn't seem to be working at all on
>>>>> 64-bit T-760.
>>>>>
>>>>> Steven, Robin, do you have any idea of why this could be?
>>>>
>>>> I tried rolling back to the old panfrost/nondrm shim, and it works fine
>>>> with kbase, and I also found that T-820 falls over in the exact same
>>>> manner, so the fact that it seemed to be common to the smaller 33-bit
>>>> designs rather than anything to do with the other
>>>> job_descriptor_size/v4/v5 complication turned out to be telling.
>>>
>>> Is this complication something you can explain? I don't know what v4
>>> and v5 are meant here.
>>
>> I was alluding to BASE_HW_FEATURE_V4, which I believe refers to the
>> Midgard architecture version - the older versions implemented by T6xx
>> and T720 seem to be collectively treated as "v4", while T760 and T8xx
>> would effectively be "v5".
>>
>>>> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
>>>> is it just that nobody's yet observed a 64-bit blob driving one? ]
>>>
>>> I'm looking right now at getting Panfrost working on T720 with 64-bit
>>> descriptors, with the ultimate goal of making Panfrost
>>> 64-bit-descriptor only so we can have a single build of Mesa in
>>> distros.
>>
>> Cool, I'll keep an eye out, and hope that it might be enough for T620
>> on Juno, too :)
>>
>>>> Long story short, it appears that 'Mali LPAE' is also lacking the start
>>>> level notion of VMSA, and expects a full 4-level table even for <40 bits
>>>> when level 0 effectively redundant. Thus walking the 3-level table that
>>>> io-pgtable comes back with ends up going wildly wrong. The hack below
>>>> seems to do the job for me; if Clément can confirm (on T-720 you'll
>>>> still need the userspace hack to force 32-bit jobs as well) then I think
>>>> I'll cook up a proper refactoring of the allocator to put things right.
>>>
>>> Mmaps seem to work with this patch, thanks.
>>>
>>> The main complication I'm facing right now seems to be that the SFBD
>>> descriptor on T720 seems to be different from the one we already had
>>> (tested on T6xx?).
>>
>> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test
>> gave me the impression that T720 works fine, but on closer inspection
>> some parts of glmark2 do seem to go a bit wonky (although I suspect at
>> least some of it is just down to the FPGA setup being both very slow
>> and lacking in memory bandwidth), and the "nv12-1img" mode of kmscube
>> turns out to render in some delightfully wrong colours.
>>
>> I'll try to get a 'proper' version of the io-pgtable patch posted soon.
>
> I'm trying to collect all the fixes needed to make T820 work again, and
> I was wondering if you finally have a proper patch for this and
> "cfg->ias > 48" hack ? Or one I can test ?

I do have a handful of io-pgtable patches written up and ready to go,
I'm just treading carefully and waiting for the internal approval box to
be ticked before I share anything :(

Robin.

> Thanks,
> Neil
>
>> Thanks,
>> Robin.
>>
>>> Cheers,
>>>
>>> Tomeu
>>>
>>>> Robin.
>>>>
>>>> ->8-
>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>> index 546968d8a349..f29da6e8dc08 100644
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>>>>   iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>>   if (iop) {
>>>>   u64 mair, ttbr;
>>>> +   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>>
>>>> +   data->levels = 4;
>>>>   /* Copy values as union fields overlap */
>>>>   mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>>   ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];





Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-08-28 Thread Neil Armstrong
Hi Robin,

On 31/05/2019 15:47, Robin Murphy wrote:
> On 31/05/2019 13:04, Tomeu Vizoso wrote:
>> On Wed, 29 May 2019 at 19:38, Robin Murphy  wrote:
>>>
>>> On 29/05/2019 16:09, Tomeu Vizoso wrote:
>>>> On Tue, 21 May 2019 at 18:11, Clément Péron  wrote:
>>>>>
>>>> [snip]
>>>>> [  345.204813] panfrost 180.gpu: mmu irq status=1
>>>>> [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
>>>>> 0x02400400
>>>>
>>>>    From what I can see here, 0x02400400 points to the first byte
>>>> of the first submitted job descriptor.
>>>>
>>>> So mapping buffers for the GPU doesn't seem to be working at all on
>>>> 64-bit T-760.
>>>>
>>>> Steven, Robin, do you have any idea of why this could be?
>>>
>>> I tried rolling back to the old panfrost/nondrm shim, and it works fine
>>> with kbase, and I also found that T-820 falls over in the exact same
>>> manner, so the fact that it seemed to be common to the smaller 33-bit
>>> designs rather than anything to do with the other
>>> job_descriptor_size/v4/v5 complication turned out to be telling.
>>
>> Is this complication something you can explain? I don't know what v4
>> and v5 are meant here.
> 
> I was alluding to BASE_HW_FEATURE_V4, which I believe refers to the Midgard 
> architecture version - the older versions implemented by T6xx and T720 seem 
> to be collectively treated as "v4", while T760 and T8xx would effectively be 
> "v5".
> 
>>> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
>>> is it just that nobody's yet observed a 64-bit blob driving one? ]
>>
>> I'm looking right now at getting Panfrost working on T720 with 64-bit
>> descriptors, with the ultimate goal of making Panfrost
>> 64-bit-descriptor only so we can have a single build of Mesa in
>> distros.
> 
> Cool, I'll keep an eye out, and hope that it might be enough for T620 on 
> Juno, too :)
> 
>>> Long story short, it appears that 'Mali LPAE' is also lacking the start
>>> level notion of VMSA, and expects a full 4-level table even for <40 bits
>>> when level 0 effectively redundant. Thus walking the 3-level table that
>>> io-pgtable comes back with ends up going wildly wrong. The hack below
>>> seems to do the job for me; if Clément can confirm (on T-720 you'll
>>> still need the userspace hack to force 32-bit jobs as well) then I think
>>> I'll cook up a proper refactoring of the allocator to put things right.
>>
>> Mmaps seem to work with this patch, thanks.
>>
>> The main complication I'm facing right now seems to be that the SFBD
>> descriptor on T720 seems to be different from the one we already had
>> (tested on T6xx?).
> 
> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test gave me 
> the impression that T720 works fine, but on closer inspection some parts of 
> glmark2 do seem to go a bit wonky (although I suspect at least some of it is 
> just down to the FPGA setup being both very slow and lacking in memory 
> bandwidth), and the "nv12-1img" mode of kmscube turns out to render in some 
> delightfully wrong colours.
> 
> I'll try to get a 'proper' version of the io-pgtable patch posted soon.

I'm trying to collect all the fixes needed to make T820 work again, and
I was wondering if you finally have a proper patch for this and "cfg->ias > 48"
hack ? Or one I can test ?

Thanks,
Neil

> 
> Thanks,
> Robin.
> 
>>
>> Cheers,
>>
>> Tomeu
>>
>>> Robin.
>>>
>>>
>>> ->8-
>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>> index 546968d8a349..f29da6e8dc08 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>>>  iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>  if (iop) {
>>>  u64 mair, ttbr;
>>> +   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>
>>> +   data->levels = 4;
>>>  /* Copy values as union fields overlap */
>>>  mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>  ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>>>


Re: [PATCH v2 2/2] dma-contiguous: Use fallback alloc_pages for single pages

2019-08-28 Thread Masahiro Yamada
Hi Christoph,

On Tue, Aug 27, 2019 at 8:55 PM Christoph Hellwig  wrote:
>
> On Tue, Aug 27, 2019 at 06:03:14PM +0900, Masahiro Yamada wrote:
> > Yes, this makes my driver work again
> > when CONFIG_DMA_CMA=y.
> >
> >
> > If I apply the following, my driver gets back working
> > irrespective of CONFIG_DMA_CMA.
>
> That sounds a lot like the device simply isn't 64-bit DMA capable, and
> previously always got CMA allocations under the limit it actually
> supported.  I suggest that you submit this quirk to the mmc maintainers.


I tested v5.2 and my MMC host controller works with
dma_address that exceeds 32-bit physical address.

So, I believe my MMC device is 64-bit DMA capable.

I am still looking into the code
to find out what was changed.




--
Best Regards
Masahiro Yamada


Re: [PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-28 Thread Nicolas Saenz Julienne
On Mon, 2019-08-26 at 15:46 +0200, Nicolas Saenz Julienne wrote:
> On Mon, 2019-08-26 at 09:09 +0200, Christoph Hellwig wrote:
> > On Tue, Aug 20, 2019 at 04:58:09PM +0200, Nicolas Saenz Julienne wrote:
> > > Some architectures have platform specific DMA addressing limitations.
> > > This will allow for hardware description code to provide the constraints
> > > in a generic manner, so that arch code can properly set up its memory
> > > zones and DMA mask.
> > 
> > I know this just spreads the arm code, but I still kinda hate it.
> 
> Rob's main concern was finding a way to pass the constraint from HW definition
> to arch without widening fdt's architecture specific function surface. I'd say
> it's fair to argue that having a generic mechanism makes sense as it'll now
> traverse multiple archs and subsystems.
> 
> I get adding globals like this is not very appealing, yet I went with it as it
> was the easiest to integrate with arm's code. Any alternative suggestions?
> 
> > MAX_DMA_ADDRESS is such an oddly defined concept.  We have the mm
> > code that uses it to start allocating after the dma zones, but
> > I think that would better be done using a function returning
> > 1 << max(zone_dma_bits, 32) or so.  Then we have about a handful
> > of drivers using it that all seem rather bogus, and one of which
> > I think are usable on arm64.
> 
> Is it safe to assume DMA limitations will always be a power of 2? I ask as
> RPi4 kinda isn't: ZONE_DMA is 0x3c000000 bytes big, I'm approximating the
> zone mask to 30 as [0x3c000000, 0x3fffffff] isn't defined as memory so it's
> unlikely that we'll encounter buffers there. But I don't know how it could
> affect mm initialization code.
> 
> This also rules out 'zone_dma_bits' as a mechanism to pass ZONE_DMA's size
> from HW definition code to arch's.

Hi Christoph,
I gave it some thought and think this whole MAX_DMA_ADDRESS topic falls
outside the scope of the series. I agree it's something we should get rid
of, but fixing it isn't going to affect the overall enhancement intended
here. I'd rather focus on how we are going to pass the DMA zone data into
the arch code, and fix MAX_DMA_ADDRESS in another series.

Regards,
Nicolas
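
For illustration, a minimal sketch of the kind of helper Christoph suggests
in place of MAX_DMA_ADDRESS (the names and the zone_dma_bits variable are
assumptions for this sketch, not existing code):

#include <linux/kernel.h>
#include <linux/types.h>

/* assumed to hold the width, in bits, of the platform's ZONE_DMA */
extern unsigned int zone_dma_bits;

/*
 * Hypothetical replacement for MAX_DMA_ADDRESS-style uses: allocations
 * outside the DMA zones would start at 1 << max(zone_dma_bits, 32).
 */
static inline u64 zone_dma_limit(void)
{
        return 1ULL << max(zone_dma_bits, 32u);
}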


