Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

2022-05-02 Thread Qian Cai
On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> This is a resend version of v8 posted here:
> https://lore.kernel.org/linux-iommu/20220308054421.847385-1-baolu...@linux.intel.com/
> as we discussed in this thread:
> https://lore.kernel.org/linux-iommu/yk%2fq1bgn8pc5h...@8bytes.org/
> 
> All patches can be applied perfectly except this one:
>  - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
> It conflicts with below refactoring commit:
>  - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
> The conflict has been fixed in this post.
> 
> No functional changes in this series. I suppress cc-ing this series to
> all v8 reviewers in order to avoid spam.
> 
> Please consider it for your iommu tree.

Reverting this series fixed a use-after-free while doing SR-IOV.

 BUG: KASAN: use-after-free in __lock_acquire
 Read of size 8 at addr 080279825d78 by task qemu-system-aar/22429
 CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 5.18.0-rc5-next-20220502 
#69
 Call trace:
  dump_backtrace
  show_stack
  dump_stack_lvl
  print_address_description.constprop.0
  print_report
  kasan_report
  __asan_report_load8_noabort
  __lock_acquire
  lock_acquire.part.0
  lock_acquire
  _raw_spin_lock_irqsave
  arm_smmu_detach_dev
  arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
  arm_smmu_attach_dev
  __iommu_attach_group
  __iommu_attach_device at drivers/iommu/iommu.c:1942
  (inlined by) iommu_group_do_attach_device at drivers/iommu/iommu.c:2058
  (inlined by) __iommu_group_for_each_dev at drivers/iommu/iommu.c:989
  (inlined by) __iommu_attach_group at drivers/iommu/iommu.c:2069
  iommu_group_release_dma_owner
  __vfio_group_unset_container
  vfio_group_try_dissolve_container
  vfio_group_put_external_user
  kvm_vfio_destroy
  kvm_destroy_vm
  kvm_vm_release
  __fput
  fput
  task_work_run
  do_exit
  do_group_exit
  get_signal
  do_signal
  do_notify_resume
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Allocated by task 22427:
  kasan_save_stack
  __kasan_kmalloc
  kmem_cache_alloc_trace
  arm_smmu_domain_alloc
  iommu_domain_alloc
  vfio_iommu_type1_attach_group
  vfio_ioctl_set_iommu
  vfio_fops_unl_ioctl
  __arm64_sys_ioctl
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Freed by task 22429:
  kasan_save_stack
  kasan_set_track
  kasan_set_free_info
  kasan_slab_free
  __kasan_slab_free
  slab_free_freelist_hook
  kfree
  arm_smmu_domain_free
  arm_smmu_domain_free at iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2067
  iommu_domain_free
  vfio_iommu_type1_detach_group
  __vfio_group_unset_container
  vfio_group_try_dissolve_container
  vfio_group_put_external_user
  kvm_vfio_destroy
  kvm_destroy_vm
  kvm_vm_release
  __fput
  fput
  task_work_run
  do_exit
  do_group_exit
  get_signal
  do_signal
  do_notify_resume
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync
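Stripped of the driver specifics, the report reduces to a domain being freed on one path while another path still takes a lock embedded in it. A minimal lifetime sketch of that pattern (illustrative names, not the arm-smmu-v3 API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The attach path above takes a spinlock embedded in the SMMU domain,
 * but the detach/teardown path has already freed that domain.  These
 * names are hypothetical stand-ins, not the kernel structures. */
struct smmu_domain {
	int lock;			/* stands in for the spinlock */
};

static struct smmu_domain *domain_alloc(void)
{
	return calloc(1, sizeof(struct smmu_domain));
}

static void domain_free(struct smmu_domain **dp)
{
	free(*dp);
	*dp = NULL;			/* lets later users detect the free */
}

/* A safe attach must only run while the domain is still live. */
static int attach(struct smmu_domain *d)
{
	if (!d)
		return -1;		/* refusing beats a use-after-free */
	d->lock = 1;			/* "take" the embedded lock */
	return 0;
}
```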
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-24 Thread Qian Cai



On 6/24/2021 7:48 AM, Will Deacon wrote:
> Ok, diff below which attempts to tackle the offset issue I mentioned as
> well. Qian Cai -- please can you try with these changes?

This works fine.

> 
> Will
> 
> --->8
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 175b6c113ed8..39284ff2a6cd 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -116,7 +116,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  
>  static inline bool is_swiotlb_force_bounce(struct device *dev)
>  {
> -   return dev->dma_io_tlb_mem->force_bounce;
> +   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> +
> +   return mem && mem->force_bounce;
>  }
>  
>  void __init swiotlb_exit(void);
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 44be8258e27b..0ffbaae9fba2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -449,6 +449,7 @@ static int swiotlb_find_slots(struct device *dev, 
> phys_addr_t orig_addr,
> dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> unsigned int nslots = nr_slots(alloc_size), stride;
> unsigned int index, wrap, count = 0, i;
> +   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned long flags;
>  
> BUG_ON(!nslots);
> @@ -497,7 +498,7 @@ static int swiotlb_find_slots(struct device *dev, 
> phys_addr_t orig_addr,
> for (i = index; i < index + nslots; i++) {
> mem->slots[i].list = 0;
> mem->slots[i].alloc_size =
> -   alloc_size - ((i - index) << IO_TLB_SHIFT);
> +   alloc_size - (offset + ((i - index) << IO_TLB_SHIFT));
> }
> for (i = index - 1;
>  io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
> 
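The NULL guard in the hunk above can be sketched standalone; the structure and field names below merely mirror the kernel's `struct io_tlb_mem` / `struct device`, and the whole thing is an illustration rather than the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (illustrative only). */
struct io_tlb_mem {
	bool force_bounce;
};

struct device {
	struct io_tlb_mem *dma_io_tlb_mem;
};

/* The fixed helper: a device with no software IO TLB attached
 * (dma_io_tlb_mem == NULL) simply never forces bouncing, instead of
 * faulting on a NULL dereference as the pre-fix version did. */
static bool is_swiotlb_force_bounce(const struct device *dev)
{
	const struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->force_bounce;
}
```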


Re: [PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-23 Thread Qian Cai



On 6/23/2021 2:37 PM, Will Deacon wrote:
> On Wed, Jun 23, 2021 at 12:39:29PM -0400, Qian Cai wrote:
>>
>>
>> On 6/18/2021 11:40 PM, Claire Chang wrote:
>>> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
>>> use it to determine whether to bounce the data or not. This will be
>>> useful later to allow for different pools.
>>>
>>> Signed-off-by: Claire Chang 
>>> Reviewed-by: Christoph Hellwig 
>>> Tested-by: Stefano Stabellini 
>>> Tested-by: Will Deacon 
>>> Acked-by: Stefano Stabellini 
>>
>> Reverting the rest of the series up to this patch fixed a boot crash with 
>> NVMe on today's linux-next.
> 
> Hmm, so that makes patch 7 the suspicious one, right?

Will, no. It is rather patch #6 (this patch): only patches #6 to #12 
were reverted to fix the issue. Also, looking at the offset of the crash,

pc : dma_direct_map_sg+0x304/0x8f0
is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119

is_swiotlb_force_bounce() is the new function introduced by this patch:

+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_bounce;
+}

> 
> Looking at that one more closely, it looks like swiotlb_find_slots() takes
> 'alloc_size + offset' as its 'alloc_size' parameter from
> swiotlb_tbl_map_single() and initialises 'mem->slots[i].alloc_size' based
> on 'alloc_size + offset', which looks like a change in behaviour from the
> old code, which didn't include the offset there.
> 
> swiotlb_release_slots() then adds the offset back on afaict, so we end up
> accounting for it twice and possibly unmap more than we're supposed to?
> 
> Will
> 
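Will's double-accounting hypothesis can be sketched numerically. The helpers below are hypothetical models of the bookkeeping, not the kernel functions; `mapped` stands for the `alloc_size + offset` that swiotlb_find_slots() receives:

```c
#include <assert.h>
#include <stddef.h>

#define IO_TLB_SHIFT 11	/* 2 KiB swiotlb slots, as in the kernel */

/* With the fix, recording per-slot sizes subtracts the offset back
 * out of the total handed down by swiotlb_tbl_map_single(). */
static size_t slot_size_fixed(size_t mapped, size_t offset, unsigned int i)
{
	return mapped - (offset + ((size_t)i << IO_TLB_SHIFT));
}

/* Without the fix, the offset stays folded into the recorded size. */
static size_t slot_size_buggy(size_t mapped, size_t offset, unsigned int i)
{
	(void)offset;
	return mapped - ((size_t)i << IO_TLB_SHIFT);
}

/* The release path adds the offset back on, so the buggy recording
 * accounts for it twice and the unmapped length comes out too large. */
static size_t release_len(size_t recorded, size_t offset)
{
	return recorded + offset;
}
```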


Re: [PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-23 Thread Qian Cai



On 6/18/2021 11:40 PM, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 
> Acked-by: Stefano Stabellini 

Reverting the rest of the series up to this patch fixed a boot crash with NVMe 
on today's linux-next.

[   22.286574][T7] Unable to handle kernel paging request at virtual 
address dfff800e
[   22.295225][T7] Mem abort info:
[   22.298743][T7]   ESR = 0x9604
[   22.302496][T7]   EC = 0x25: DABT (current EL), IL = 32 bits
[   22.308525][T7]   SET = 0, FnV = 0
[   22.312274][T7]   EA = 0, S1PTW = 0
[   22.316131][T7]   FSC = 0x04: level 0 translation fault
[   22.321704][T7] Data abort info:
[   22.325278][T7]   ISV = 0, ISS = 0x0004
[   22.329840][T7]   CM = 0, WnR = 0
[   22.333503][T7] [dfff800e] address between user and kernel 
address ranges
[   22.338543][  T256] igb 0006:01:00.0: Intel(R) Gigabit Ethernet Network 
Connection
[   22.341400][T7] Internal error: Oops: 9604 [#1] SMP
[   22.348915][  T256] igb 0006:01:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 
4c:38:d5:09:c8:83
[   22.354458][T7] Modules linked in: igb(+) i2c_algo_bit nvme mlx5_core(+) 
i2c_core nvme_core firmware_class
[   22.362512][  T256] igb 0006:01:00.0: eth0: PBA No: G69016-004
[   22.372287][T7] CPU: 13 PID: 7 Comm: kworker/u64:0 Not tainted 
5.13.0-rc7-next-20210623+ #47
[   22.372293][T7] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, 
BIOS 1.6 06/28/2020
[   22.372298][T7] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[   22.378145][  T256] igb 0006:01:00.0: Using MSI-X interrupts. 4 rx queue(s), 
4 tx queue(s)
[   22.386901][T7] 
[   22.386905][T7] pstate: 1005 (nzcV daif -PAN -UAO -TCO BTYPE=--)
[   22.386910][T7] pc : dma_direct_map_sg+0x304/0x8f0

is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119
(inlined by) dma_direct_map_page at /usr/src/linux-next/kernel/dma/direct.h:90
(inlined by) dma_direct_map_sg at /usr/src/linux-next/kernel/dma/direct.c:428

[   22.386919][T7] lr : dma_map_sg_attrs+0x6c/0x118
[   22.386924][T7] sp : 80001dc8eac0
[   22.386926][T7] x29: 80001dc8eac0 x28: 199e70b0 x27: 

[   22.386935][T7] x26: 000847ee7000 x25: 80001158e570 x24: 
0002
[   22.386943][T7] x23: dfff8000 x22: 0100 x21: 
199e7460
[   22.386951][T7] x20: 199e7488 x19: 0001 x18: 
10062670
[   22.386955][  T253] Unable to handle kernel paging request at virtual 
address dfff800e
[   22.386958][T7] x17: 8000109f6a90 x16: 8000109e1b4c x15: 
89303420
[   22.386965][  T253] Mem abort info:
[   22.386967][T7] x14: 0001 x13: 80001158e000
[   22.386970][  T253]   ESR = 0x9604
[   22.386972][T7]  x12: 1fffe00108fdce01
[   22.386975][  T253]   EC = 0x25: DABT (current EL), IL = 32 bits
[   22.386976][T7] x11: 1fffe00108fdce03 x10: 000847ee700c x9 : 
0004
[   22.386981][  T253]   SET = 0, FnV = 0
[   22.386983][T7] 
[   22.386985][T7] x8 : 73b91d72
[   22.386986][  T253]   EA = 0, S1PTW = 0
[   22.386987][T7]  x7 :  x6 : 000e
[   22.386990][  T253]   FSC = 0x04: level 0 translation fault
[   22.386992][T7] 
[   22.386994][T7] x5 : dfff8000
[   22.386995][  T253] Data abort info:
[   22.386997][T7]  x4 : 0008c7ede000
[   22.386999][  T253]   ISV = 0, ISS = 0x0004
[   22.386999][T7]  x3 : 0008c7ede000
[   22.387003][T7] x2 : 1000
[   22.387003][  T253]   CM = 0, WnR = 0
[   22.387006][T7]  x1 :  x0 : 0071
[   22.387008][  T253] [dfff800e] address between user and kernel 
address ranges
[   22.387011][T7] 
[   22.387013][T7] Call trace:
[   22.387016][T7]  dma_direct_map_sg+0x304/0x8f0
[   22.387022][T7]  dma_map_sg_attrs+0x6c/0x118
[   22.387026][T7]  nvme_map_data+0x2ec/0x21d8 [nvme]
[   22.387040][T7]  nvme_queue_rq+0x274/0x3f0 [nvme]
[   22.387052][T7]  blk_mq_dispatch_rq_list+0x2ec/0x18a0
[   22.387060][T7]  __blk_mq_sched_dispatch_requests+0x2a0/0x3e8
[   22.387065][T7]  blk_mq_sched_dispatch_requests+0xa4/0x100
[   22.387070][T7]  __blk_mq_run_hw_queue+0x148/0x1d8
[   22.387075][T7]  __blk_mq_delay_run_hw_queue+0x3f8/0x730
[   22.414539][  T269] igb 0006:01:00.0 enP6p1s0: renamed from eth0
[   22.418957][T7]  blk_mq_run_hw_queue+0x148/0x248
[   22.418969][T7]  blk_mq_sched_insert_request+0x2a4/0x330
[   22.418975][T7]  blk_execute_rq_nowait+0xc8/0x118
[   22.418981][T7]  blk_execute_rq+0xd4/0x188
[   22.453203][  T255] udeva

Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers

2020-11-09 Thread Qian Cai
On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> From: Thomas Gleixner 
> 
> 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> the trigger type (edge/level) and the active low/high configuration. While
> there are defines for initializing these variables and struct members, they
> are not used consequently and the meaning of 'trigger' and 'polarity' is
> opaque and confusing at best.
> 
> Rename them to 'is_level' and 'active_low' and make them boolean in various
> structs so it's entirely clear what the meaning is.
> 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: David Woodhouse 
> ---
>  arch/x86/include/asm/hw_irq.h   |   6 +-
>  arch/x86/kernel/apic/io_apic.c  | 244 +---
>  arch/x86/pci/intel_mid_pci.c|   8 +-
>  drivers/iommu/amd/iommu.c   |  10 +-
>  drivers/iommu/intel/irq_remapping.c |   9 +-
>  5 files changed, 130 insertions(+), 147 deletions(-)

Reverting the rest of the patchset up to this commit on next-20201109 fixed an
endless soft-lockup issue booting the AMD server below. I noticed that the
failed boots always have this IOMMU IO_PAGE_FAULT before those soft-lockups:

[ 3404.093354][T1] AMD-Vi: Interrupt remapping enabled
[ 3404.098593][T1] AMD-Vi: Virtual APIC enabled
[ 3404.107783][  T340] pci :00:14.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x address=0xfffdf8020200 flags=0x0008]
[ 3404.120644][T1] AMD-Vi: Lazy IO/TLB flushing enabled
[ 3404.126011][T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 3404.133173][T1] software IO TLB: mapped [mem 
0x68dcf000-0x6cdcf000] (64MB)

.config (if ever matters):
https://cailca.coding.net/public/linux/mm/git/files/master/x86.config

good boot dmesg (with the commits reverted):
http://people.redhat.com/qcai/dmesg.txt

== system info ==
Dell Poweredge R6415
AMD EPYC 7401P 24-Core Processor
24576 MB memory, 239 GB disk space

[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Monitoring of LVM2 mirrors,…sing dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
 Mounting /boot...
[  OK  ] Created slice system-lvm2\x2dpvscan.slice.
[ 3740.376500][ T1058] XFS (sda1): Mounting V5 Filesystem
[ 3740.438474][ T1058] XFS (sda1): Ending clean mount
[ 3765.159433][C0] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[systemd:1]
[ 3765.166929][C0] Modules linked in: acpi_cpufreq(+) ip_tables x_tables 
sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy 
dm_mirror dm_region_hash dm_log dm_mod
[ 3765.183576][C0] irq event stamp: 26230104
[ 3765.187954][C0] hardirqs last  enabled at (26230103): 
[] asm_common_interrupt+0x1e/0x40
[ 3765.197873][C0] hardirqs last disabled at (26230104): 
[] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3765.208303][C0] softirqs last  enabled at (26202664): 
[] __do_softirq+0x61b/0x95d
[ 3765.217699][C0] softirqs last disabled at (26202591): 
[] asm_call_irq_on_stack+0x12/0x20
[ 3765.227702][C0] CPU: 0 PID: 1 Comm: systemd Not tainted 
5.10.0-rc2-next-20201109+ #2
[ 3765.235793][C0] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 
1.9.3 06/25/2019
[ 3765.244065][C0] RIP: 0010:lock_acquire+0x1f4/0x820
lock_acquire at kernel/locking/lockdep.c:5404
[ 3765.249211][C0] Code: ff ff ff 48 83 c4 20 65 0f c1 05 a7 ba 9e 7e 83 f8 
01 4c 8b 54 24 08 0f 85 60 04 00 00 41 52 9d 48 b8 00 00 00 00 00 fc ff df <48> 
01 c3 c7 03 00 00 00 00 c7 43 08 00 00 00 00 48 8b0
[ 3765.268657][C0] RSP: 0018:c906f9f8 EFLAGS: 0246
[ 3765.274587][C0] RAX: dc00 RBX: 1920df42 RCX: 
1920df28
[ 3765.282420][C0] RDX: 111020645769 RSI:  RDI: 
0001
[ 3765.290256][C0] RBP: 0001 R08: fbfff164cb10 R09: 
fbfff164cb10
[ 3765.298090][C0] R10: 0246 R11: fbfff164cb0f R12: 
88812be555b0
[ 3765.305922][C0] R13:  R14:  R15: 

[ 3765.313750][C0] FS:  7f12bb8c59c0() GS:8881b700() 
knlGS:
[ 3765.322537][C0] CS:  0010 DS:  ES:  CR0: 80050033
[ 3765.328985][C0] CR2: 7f0c2d828fd0 CR3: 00011868a000 CR4: 
003506f0
[ 3765.336820][C0] Call Trace:
[ 3765.339979][C0]  ? rcu_read_unlock+0x40/0x40
[ 3765.344609][C0]  __d_move+0x2a2/0x16f0
__seqprop_spinlock_assert at include/linux/seqlock.h:277
(inlined by) __d_move at fs/dcache.c:2861
[ 3765.348711][C0]  ? d_move+0x47/0x70
[ 3765.352560][C0]  ? _raw_spin_unlock+0x1a/0x30
[ 3765.357275][C0]  d_move+0x47/0x70
write_seqcount_t_end at include/linux/seqlock.h:560
(inlined by) write_sequnlock at include/linux/seqlock.h:901
(inlined by) d_move at fs/dcache.c:2916
[ 3765.360951][C0]  ? vfs_rename+0x9ac/0x1270

Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.
> 
> The first version can be found here:
> 
> https://lore.kernel.org/r/20200821002424.119492...@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.

Reverting the part of this patchset on top of today's linux-next fixed a boot
issue on an HPE ProLiant DL560 Gen10, i.e.,

$ git revert --no-edit 13b90cadfc29..bc95fd0d7c42

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/x86.config

It looks like the crashes happen in the interrupt remapping code, where it is
only possible to generate partial call traces.

[1.912386][T0] ACPI: X2APIC_NMI (uid[0xf5] high level 9983][T0] ... 
MAX_LOCK_DEPTH:  48
[7.914876][T0] ... MAX_LOCKDEP_KEYS:8192
[7.919942][T0] ... CLASSHASH_SIZE:  4096
[7.925009][T0] ... MAX_LOCKDEP_ENTRIES: 32768
[7.930163][T0] ... MAX_LOCKDEP_CHAINS:  65536
[7.935318][T0] ... CHAINHASH_SIZE:  32768
[7.940473][T0]  memory used by lock dependency info: 6301 kB
[7.946586][T0]  memory used for stack traces: 4224 kB
[7.952088][T0]  per task-struct memory footprint: 1920 bytes
[7.968312][T0] mempolicy: Enabling automatic NUMA balancing. Configure 
with numa_balancing= or the kernel.numa_balancing sysctl
[7.980281][T0] ACPI: Core revision 20200717
[7.993343][T0] clocksource: hpet: mask: 0x max_cycles: 
0x, max_idle_ns: 79635855245 ns
[8.003270][T0] APIC: Switch to symmetric I/O mode setup
[8.008951][T0] DMAR: Host address width 46
[8.013512][T0] DMAR: DRHD base: 0x00e5ffc000 flags: 0x0
[8.019680][T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 
8d2078c106f0466 [T0] DMAR-IR: IOAPIC id 15 under DRHD base  0xe5ffc000 
IOMMU 0
[8.420990][T0] DMAR-IR: IOAPIC id 8 under DRHD base  0xddffc000 IOMMU 15
[8.428166][T0] DMAR-IR: IOAPIC id 9 under DRHD base  0xddffc000 IOMMU 15
[8.435341][T0] DMAR-IR: HPET id 0 under DRHD base 0xddffc000
[8.441456][T0] DMAR-IR: Queued invalidation will be enabled to support 
x2apic and Intr-remapping.
[8.457911][T0] DMAR-IR: Enabled IRQ remapping in x2apic mode
[8.466614][T0] BUG: kernel NULL pointer dereference, address: 

[8.474295][T0] #PF: supervisor instruction fetch in kernel mode
[8.480669][T0] #PF: error_code(0x0010) - not-present page
[8.486518][T0] PGD 0 P4D 0 
[8.489757][T0] Oops: 0010 [#1] SMP KASAN PTI
[8.494476][T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I  
 5.9.0-rc6-next-20200925 #2
[8.503987][T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 
Gen10, BIOS U34 11/13/2019
[8.513238][T0] RIP: 0010:0x0
[8.516562][T0] Code: Bad RIP v

or

[2.906744][T0] ACPI: X2API32, address 0xfec68000, GSI 128-135
[2.907063][T0] IOAPIC[15]: apic_id 29, version 32, address 0xfec7, 
GSI 136-143
[2.907071][T0] IOAPIC[16]: apic_id 30, version 32, address 0xfec78000, 
GSI 144-151
[2.907079][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[2.907084][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high 
level)
[2.907100][T0] Using ACPI (MADT) for SMP configuration information
[2.907105][T0] ACPI: HPET id: 0x8086a701 base: 0xfed0
[2.907116][T0] ACPI: SPCR: console: uart,mmio,0x0,115200
[2.907121][T0] TSC deadline timer available
[2.907126][T0] smpboot: Allowing 144 CPUs, 0 hotplug CPUs
[2.907163][T0] [mem 0xd000-0xfdff] available for PCI devices
[2.907175][T0] clocksource: refined-jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 1911260446275 ns
[2.914541][T0] setup_percpu: NR_CPUS:256 nr_cpumask_bits:144 
nr_cpu_ids:144 nr_node_ids:4
[2.926109][   466 ecap f020df
[9.134709][T0] DMAR: DRHD base: 0x00f5ffc000 flags: 0x0
[9.140867][T0] DMAR: dmar8: reg_base_addr f5ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.149610][T0] DMAR: DRHD base: 0x00f7ffc000 flags: 0x0
[9.155762][T0] DMAR: dmar9: reg_base_addr f7ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.164491][T0] DMAR: DRHD base: 0x00f9ffc000 flags: 0x0
[9.170645][T0] DMAR: dmar10: reg_base_addr f9ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.179476][T0] DMAR: DRHD base: 0x00fbffc000 flags: 0x0
[9.185626][T0] DMAR: dmar11: reg_base_addr fbffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.194442][T0] DMAR: DRHD base: 0x00dfffc000 flags: 0x0
[9.200587][T0] DMAR: dmar12: 

Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner 

Today's linux-next will have some warnings on s390x:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]
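The unmet dependency comes from S390 selecting PCI_MSI_ARCH_FALLBACKS unconditionally while the new symbol depends on PCI. One plausible fix (a sketch, not necessarily the final commit) is to gate the select the same way the neighbouring PCI selects in the patched hunk already are:

```
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -185,7 +185,7 @@ config S390
 	select PCI_DOMAINS	if PCI
 	select PCI_MSI		if PCI
-	select PCI_MSI_ARCH_FALLBACKS
+	select PCI_MSI_ARCH_FALLBACKS	if PCI
```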

> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
> and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig  |1 +
>  arch/mips/Kconfig  |1 +
>  arch/powerpc/Kconfig   |1 +
>  arch/s390/Kconfig  |1 +
>  arch/sparc/Kconfig |1 +
>  arch/x86/Kconfig   |1 +
>  drivers/pci/Kconfig|3 +++
>  drivers/pci/controller/Kconfig |3 +++
>  drivers/pci/msi.c  |3 ++-
>  include/linux/msi.h|   31 ++-
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -56,6 +56,7 @@ config IA64
>   select NEED_DMA_MAP_STATE
>   select NEED_SG_DMA_LENGTH
>   select NUMA if !FLATMEM
> + select PCI_MSI_ARCH_FALLBACKS
>   default y
>   help
> The Itanium Processor Family is Intel's 64-bit successor to
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -86,6 +86,7 @@ config MIPS
>   select MODULES_USE_ELF_REL if MODULES
>   select MODULES_USE_ELF_RELA if MODULES && 64BIT
>   select PERF_USE_VMALLOC
> + select PCI_MSI_ARCH_FALLBACKS
>   select RTC_LIB
>   select SYSCTL_EXCEPTION_TRACE
>   select VIRT_TO_BUS
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -246,6 +246,7 @@ config PPC
>   select OLD_SIGACTIONif PPC32
>   select OLD_SIGSUSPEND
>   select PCI_DOMAINS  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PCI_SYSCALL  if PCI
>   select PPC_DAWR if PPC64
>   select RTC_LIB
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -185,6 +185,7 @@ config S390
>   select OLD_SIGSUSPEND3
>   select PCI_DOMAINS  if PCI
>   select PCI_MSI  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select SPARSE_IRQ
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -43,6 +43,7 @@ config SPARC
>   select GENERIC_STRNLEN_USER
>   select MODULES_USE_ELF_RELA
>   select PCI_SYSCALL if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select ODD_RT_SIGACTION
>   select OLD_SIGSUSPEND
>   select CPU_NO_EFFICIENT_FFS
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -225,6 +225,7 @@ config X86
>   select NEED_SG_DMA_LENGTH
>   select PCI_DOMAINS  if PCI
>   select PCI_LOCKLESS_CONFIG  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PERF_EVENTS
>   select RTC_LIB
>   select RTC_MC146818_LIB
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_ARCH_FALLBACKS
> + bool
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>   bool "NVIDIA Tegra PCIe controller"
>   depends on ARCH_TEGRA || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want support for the PCIe host controller found
> on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>   bool "Renesas R-Car PCIe host controller"
>   depends on ARCH_RENESAS || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want PCIe controller support on R-Car SoCs in host
> mode.
> @@ -103,6 +105,7 @@ config PCIE_X

Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-07-03 Thread Qian Cai
On Tue, Jun 30, 2020 at 08:40:28PM -0400, Qian Cai wrote:
> On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> > Hi,
> > 
> > here is the third version of this patch-set. Older versions can be found
> > here:
> > 
> > v1: https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
> > (Has some more introductory text)
> > 
> > v2: https://lore.kernel.org/lkml/20200414131542.25608-1-j...@8bytes.org/
> > 
> > Changes v2 -> v3:
> > 
> > * Rebased v5.7-rc3
> > 
> > * Added a missing iommu_group_put() as reported by Lu Baolu.
> > 
> > * Added a patch to consolidate more initialization work in
> >   __iommu_probe_device(), fixing a bug where no 'struct
> >   device_iommu' was allocated in the hotplug path.
> > 
> > There is also a git-branch available with these patches applied:
> > 
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3
> > 
> > Please review. If there are no objections I plan to put these patches
> > into the IOMMU tree early next week.
> 
> Looks like this patchset introduced a use-after-free on arm-smmu-v3.
> 
> Reproduced using mlx5,
> 
> # echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
> # echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs 
> 
> The .config,
> https://github.com/cailca/linux-mm/blob/master/arm64.config
> 
> Looking at the free stack,
> 
> iommu_release_device->iommu_group_remove_device
> 
> was introduced in 07/34 ("iommu: Add probe_device() and release_device()
> call-backs").

FYI, I have just sent a patch to fix this,

https://lore.kernel.org/linux-iommu/20200704001003.2303-1-...@lca.pw/

> 
> [ 9426.724641][ T3356] pci :0b:01.2: Removing from iommu group 3
> [ 9426.731347][ T3356] 
> ==
> [ 9426.739263][ T3356] BUG: KASAN: use-after-free in 
> __lock_acquire+0x3458/0x4440
> __lock_acquire at kernel/locking/lockdep.c:4250
> [ 9426.746477][ T3356] Read of size 8 at addr 0089df1a6f68 by task 
> bash/3356
> [ 9426.753601][ T3356]
> [ 9426.755782][ T3356] CPU: 5 PID: 3356 Comm: bash Not tainted 
> 5.8.0-rc3-next-20200630 #2
> [ 9426.763687][ T3356] Hardware name: HPE Apollo 70 
> /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
> [ 9426.774111][ T3356] Call trace:
> [ 9426.777245][ T3356]  dump_backtrace+0x0/0x398
> [ 9426.781593][ T3356]  show_stack+0x14/0x20
> [ 9426.785596][ T3356]  dump_stack+0x140/0x1b8
> [ 9426.789772][ T3356]  print_address_description.isra.12+0x54/0x4a8
> [ 9426.795855][ T3356]  kasan_report+0x134/0x1b8
> [ 9426.800203][ T3356]  __asan_report_load8_noabort+0x2c/0x50
> [ 9426.805679][ T3356]  __lock_acquire+0x3458/0x4440
> [ 9426.810373][ T3356]  lock_acquire+0x204/0xf10
> [ 9426.814722][ T3356]  _raw_spin_lock_irqsave+0xf8/0x180
> [ 9426.819853][ T3356]  arm_smmu_detach_dev+0xd8/0x4a0
> arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
> [ 9426.824721][ T3356]  arm_smmu_release_device+0xb4/0x1c8
> arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
> (inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
> [ 9426.829937][ T3356]  iommu_release_device+0xc0/0x178
> iommu_release_device at drivers/iommu/iommu.c:302
> [ 9426.834892][ T3356]  iommu_bus_notifier+0x118/0x160
> [ 9426.839762][ T3356]  notifier_call_chain+0xa4/0x128
> [ 9426.844630][ T3356]  __blocking_notifier_call_chain+0x70/0xa8
> [ 9426.850367][ T3356]  blocking_notifier_call_chain+0x14/0x20
> [ 9426.855929][ T3356]  device_del+0x618/0xa00
> [ 9426.860105][ T3356]  pci_remove_bus_device+0x108/0x2d8
> [ 9426.865233][ T3356]  pci_stop_and_remove_bus_device+0x1c/0x28
> [ 9426.870972][ T3356]  pci_iov_remove_virtfn+0x228/0x368
> [ 9426.876100][ T3356]  sriov_disable+0x8c/0x348
> [ 9426.880447][ T3356]  pci_disable_sriov+0x5c/0x70
> [ 9426.885117][ T3356]  mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
> [ 9426.891549][ T3356]  sriov_numvfs_store+0x240/0x318
> [ 9426.896417][ T3356]  dev_attr_store+0x38/0x68
> [ 9426.900766][ T3356]  sysfs_kf_write+0xdc/0x128
> [ 9426.905200][ T3356]  kernfs_fop_write+0x23c/0x448
> [ 9426.909897][ T3356]  __vfs_write+0x54/0xe8
> [ 9426.913984][ T3356]  vfs_write+0x124/0x3f0
> [ 9426.918070][ T3356]  ksys_write+0xe8/0x1b8
> [ 9426.922157][ T3356]  __arm64_sys_write+0x68/0x98
> [ 9426.926766][ T3356]  do_el0_svc+0x124/0x220
> [ 9426.930941][ T3356]  el0_sync_handler+0x260/0x408
> [ 9426.935634][ T3356]  el0_sync+0x140/0x180
> [ 9426.939633][ T3356]
> [ 9426.941810][ T3356] Allocated by task 3356:
> [ 9426.9

[PATCH] iommu: fix use-after-free in iommu_release_device

2020-07-03 Thread Qian Cai
b fb fb fb fb
  0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ^
  0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: a6a4c7e2c5b8 ("iommu: Add probe_device() and release_device() 
call-backs")
Signed-off-by: Qian Cai 
---
 drivers/iommu/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1ed1e14a1f0c..0eeb3251266a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -295,10 +295,10 @@ void iommu_release_device(struct device *dev)
return;
 
iommu_device_unlink(dev->iommu->iommu_dev, dev);
-   iommu_group_remove_device(dev);
 
ops->release_device(dev);
 
+   iommu_group_remove_device(dev);
module_put(ops->owner);
dev_iommu_free(dev);
 }
-- 
2.21.0 (Apple Git-122.2)
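The ordering the patch restores can be modelled in a few lines: the driver's release hook (arm_smmu_release_device in the trace) still needs group state, so removing the device from the group must come last. All names below are illustrative, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the teardown-ordering bug this patch fixes. */
struct group { bool live; };

struct device { struct group *grp; };

static void driver_release(struct device *dev)
{
	/* Legitimately touches group state; this would be a
	 * use-after-free if the group were already torn down. */
	assert(dev->grp->live);
}

static void group_remove(struct device *dev)
{
	dev->grp->live = false;		/* invalidate group bookkeeping */
}

/* The patched ordering: run the release hook first, remove from the
 * group only afterwards. */
static void release_device(struct device *dev)
{
	driver_release(dev);
	group_remove(dev);
}
```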



Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-06-30 Thread Qian Cai
On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> Hi,
> 
> here is the third version of this patch-set. Older versions can be found
> here:
> 
>   v1: https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
>   (Has some more introductory text)
> 
>   v2: https://lore.kernel.org/lkml/20200414131542.25608-1-j...@8bytes.org/
> 
> Changes v2 -> v3:
> 
>   * Rebased v5.7-rc3
> 
>   * Added a missing iommu_group_put() as reported by Lu Baolu.
> 
>   * Added a patch to consolidate more initialization work in
> __iommu_probe_device(), fixing a bug where no 'struct
> device_iommu' was allocated in the hotplug path.
> 
> There is also a git-branch available with these patches applied:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3
> 
> Please review. If there are no objections I plan to put these patches
> into the IOMMU tree early next week.

Looks like this patchset introduced a use-after-free on arm-smmu-v3.

Reproduced using mlx5,

# echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs 

The .config,
https://github.com/cailca/linux-mm/blob/master/arm64.config

Looking at the free stack,

iommu_release_device->iommu_group_remove_device

was introduced in 07/34 ("iommu: Add probe_device() and release_device()
call-backs").

[ 9426.724641][ T3356] pci :0b:01.2: Removing from iommu group 3
[ 9426.731347][ T3356] 
==
[ 9426.739263][ T3356] BUG: KASAN: use-after-free in 
__lock_acquire+0x3458/0x4440
__lock_acquire at kernel/locking/lockdep.c:4250
[ 9426.746477][ T3356] Read of size 8 at addr 0089df1a6f68 by task bash/3356
[ 9426.753601][ T3356]
[ 9426.755782][ T3356] CPU: 5 PID: 3356 Comm: bash Not tainted 
5.8.0-rc3-next-20200630 #2
[ 9426.763687][ T3356] Hardware name: HPE Apollo 70 /C01_APACHE_MB  
   , BIOS L50_5.13_1.11 06/18/2019
[ 9426.774111][ T3356] Call trace:
[ 9426.777245][ T3356]  dump_backtrace+0x0/0x398
[ 9426.781593][ T3356]  show_stack+0x14/0x20
[ 9426.785596][ T3356]  dump_stack+0x140/0x1b8
[ 9426.789772][ T3356]  print_address_description.isra.12+0x54/0x4a8
[ 9426.795855][ T3356]  kasan_report+0x134/0x1b8
[ 9426.800203][ T3356]  __asan_report_load8_noabort+0x2c/0x50
[ 9426.805679][ T3356]  __lock_acquire+0x3458/0x4440
[ 9426.810373][ T3356]  lock_acquire+0x204/0xf10
[ 9426.814722][ T3356]  _raw_spin_lock_irqsave+0xf8/0x180
[ 9426.819853][ T3356]  arm_smmu_detach_dev+0xd8/0x4a0
arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
[ 9426.824721][ T3356]  arm_smmu_release_device+0xb4/0x1c8
arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
(inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
[ 9426.829937][ T3356]  iommu_release_device+0xc0/0x178
iommu_release_device at drivers/iommu/iommu.c:302
[ 9426.834892][ T3356]  iommu_bus_notifier+0x118/0x160
[ 9426.839762][ T3356]  notifier_call_chain+0xa4/0x128
[ 9426.844630][ T3356]  __blocking_notifier_call_chain+0x70/0xa8
[ 9426.850367][ T3356]  blocking_notifier_call_chain+0x14/0x20
[ 9426.855929][ T3356]  device_del+0x618/0xa00
[ 9426.860105][ T3356]  pci_remove_bus_device+0x108/0x2d8
[ 9426.865233][ T3356]  pci_stop_and_remove_bus_device+0x1c/0x28
[ 9426.870972][ T3356]  pci_iov_remove_virtfn+0x228/0x368
[ 9426.876100][ T3356]  sriov_disable+0x8c/0x348
[ 9426.880447][ T3356]  pci_disable_sriov+0x5c/0x70
[ 9426.885117][ T3356]  mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
[ 9426.891549][ T3356]  sriov_numvfs_store+0x240/0x318
[ 9426.896417][ T3356]  dev_attr_store+0x38/0x68
[ 9426.900766][ T3356]  sysfs_kf_write+0xdc/0x128
[ 9426.905200][ T3356]  kernfs_fop_write+0x23c/0x448
[ 9426.909897][ T3356]  __vfs_write+0x54/0xe8
[ 9426.913984][ T3356]  vfs_write+0x124/0x3f0
[ 9426.918070][ T3356]  ksys_write+0xe8/0x1b8
[ 9426.922157][ T3356]  __arm64_sys_write+0x68/0x98
[ 9426.926766][ T3356]  do_el0_svc+0x124/0x220
[ 9426.930941][ T3356]  el0_sync_handler+0x260/0x408
[ 9426.935634][ T3356]  el0_sync+0x140/0x180
[ 9426.939633][ T3356]
[ 9426.941810][ T3356] Allocated by task 3356:
[ 9426.945985][ T3356]  save_stack+0x24/0x50
[ 9426.949986][ T3356]  __kasan_kmalloc.isra.13+0xc4/0xe0
[ 9426.955114][ T3356]  kasan_kmalloc+0xc/0x18
[ 9426.959288][ T3356]  kmem_cache_alloc_trace+0x1ec/0x318
[ 9426.964503][ T3356]  arm_smmu_domain_alloc+0x54/0x148
[ 9426.969545][ T3356]  iommu_group_alloc_default_domain+0xc0/0x440
[ 9426.975541][ T3356]  iommu_probe_device+0x1c0/0x308
[ 9426.980409][ T3356]  iort_iommu_configure+0x434/0x518
[ 9426.985452][ T3356]  acpi_dma_configure+0xf0/0x128
[ 9426.990235][ T3356]  pci_dma_configure+0x114/0x160
[ 9426.995017][ T3356]  really_probe+0x124/0x6d8
[ 9426.999364][ T3356]  driver_probe_device+0xc4/0x180
[ 9427.004232][ T3356]  __device_attach_driver+0x184/0x1e8
[ 9427.009447][ T3356]  bus_for_each_drv+0x114/0x1a0
[ 9427.014142][ T

Re: [PATCH v2 0/2] iommu/amd: Don't use atomic64_t for domain->pt_root

2020-06-26 Thread Qian Cai


> On Jun 26, 2020, at 4:05 AM, Joerg Roedel  wrote:
> 
> a previous discussion pointed out that using atomic64_t for that
> purpose is a bit of overkill. This patch-set replaces it with unsigned
> long and introduces some helpers first to make the change more easy.

BTW, from the previous discussion, Linus mentioned,
 
“
The thing is, the 64-bit atomic reads/writes are very expensive on
32-bit x86. If it was just a native pointer, it would be much cheaper
than an "atomic64_t".
“

However, here we have AMD_IOMMU depend on x86_64, so I am wondering if it makes 
any sense to run this code on 32-bit x86 at all?

Re: [PATCH 2/2] iommu/amd: Use 'unsigned long' for domain->pt_root

2020-06-25 Thread Qian Cai
On Thu, Jun 25, 2020 at 04:52:27PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Using atomic64_t can be quite expensive, so use unsigned long instead.
> This is safe because the write becomes visible atomically.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/amd_iommu_types.h |  2 +-
>  drivers/iommu/amd/iommu.c   | 10 --
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h 
> b/drivers/iommu/amd/amd_iommu_types.h
> index 30a5d412255a..f6f102282dda 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -468,7 +468,7 @@ struct protection_domain {
>  iommu core code */
>   spinlock_t lock;/* mostly used to lock the page table*/
>   u16 id; /* the domain id written to the device table */
> - atomic64_t pt_root; /* pgtable root and pgtable mode */
> + unsigned long pt_root;  /* pgtable root and pgtable mode */
>   int glx;/* Number of levels for GCR3 table */
>   u64 *gcr3_tbl;  /* Guest CR3 table */
>   unsigned long flags;/* flags to find out type of domain */
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 5286ddcfc2f9..b0e1dc58244e 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -156,7 +156,7 @@ static struct protection_domain *to_pdomain(struct 
> iommu_domain *dom)
>  static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
>struct domain_pgtable *pgtable)
>  {
> - u64 pt_root = atomic64_read(&domain->pt_root);
> + unsigned long pt_root = domain->pt_root;

The pt_root might be reloaded later in case of register pressure, where the
compiler decides not to keep it in a stack variable, so it needs an smp_rmb()
here to match the smp_wmb() in amd_iommu_domain_set_pt_root() and make the
load visible to all CPUs.

Then, smp_rmb()/smp_wmb() alone would not be able to deal with data races on
the plain access itself, so it needs,

unsigned long pt_root = READ_ONCE(domain->pt_root);

>  
>   pgtable->root = (u64 *)(pt_root & PAGE_MASK);
>   pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
> @@ -164,7 +164,13 @@ static void amd_iommu_domain_get_pgtable(struct 
> protection_domain *domain,
>  
>  static void amd_iommu_domain_set_pt_root(struct protection_domain *domain, 
> u64 root)
>  {
> - atomic64_set(&domain->pt_root, root);
> + domain->pt_root = root;

WRITE_ONCE(domain->pt_root, root);

> +
> + /*
> +  * The new value needs to be gobally visible in case pt_root gets
> +  * cleared, so that the page-table can be safely freed.
> +  */
> + smp_wmb();
>  }
>  
>  static void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
> -- 
> 2.27.0
> 


[PATCH -next] iommu/vt-d: fix a GCC warning

2020-05-21 Thread Qian Cai
The commit 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
introduced a GCC warning,

drivers/iommu/intel-iommu.c:5330:1: warning: 'static' is not at beginning of
declaration [-Wold-style-declaration]
 const static int
 ^~~~~

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f75d7d9c231f..ff5a30a94679 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5327,7 +5327,7 @@ static void intel_iommu_aux_detach_device(struct 
iommu_domain *domain,
  * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
  */
 
-const static int
+static const int
 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
/*
 * PASID based IOTLB invalidation: PASID selective (per PASID),
-- 
2.17.2 (Apple Git-113)



[PATCH -next] iommu/amd: fix variable "iommu" set but not used

2020-05-08 Thread Qian Cai
The commit dce8d6964ebd ("iommu/amd: Convert to probe/release_device()
call-backs") introduced an unused variable,

drivers/iommu/amd_iommu.c: In function 'amd_iommu_uninit_device':
drivers/iommu/amd_iommu.c:422:20: warning: variable 'iommu' set but not
used [-Wunused-but-set-variable]
  struct amd_iommu *iommu;
    ^

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index fef3689ee535..2b8eb58d2bea 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -419,15 +419,12 @@ static void iommu_ignore_device(struct device *dev)
 static void amd_iommu_uninit_device(struct device *dev)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu;
int devid;
 
devid = get_device_id(dev);
if (devid < 0)
return;
 
-   iommu = amd_iommu_rlookup_table[devid];
-
dev_data = search_dev_data(devid);
if (!dev_data)
return;
-- 
2.21.0 (Apple Git-122.2)



Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-05-03 Thread Qian Cai



> On May 3, 2020, at 2:39 PM, Joerg Roedel  wrote:
> 
> Can I add your Tested-by when I
> send them to the mailing list tomorrow?

Sure. Feel free to add,

Tested-by: Qian Cai 


Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-05-03 Thread Qian Cai



> On Apr 29, 2020, at 7:20 AM, Joerg Roedel  wrote:
> 
> On Mon, Apr 20, 2020 at 09:26:12AM -0400, Qian Cai wrote:
>> No dice. There could be some other races. For example,
> 
> Can you please test this branch:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=amd-iommu-fixes
> 
> It has the previous fix in it and a couple more to make sure the
> device-table is updated and flushed before increase_address_space()
> updates domain->pt_root.

I believe this closes the existing races, as it has survived for many days.
Great work!


Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-29 Thread Qian Cai


> On Apr 29, 2020, at 7:20 AM, Joerg Roedel  wrote:
> 
> On Mon, Apr 20, 2020 at 09:26:12AM -0400, Qian Cai wrote:
>> No dice. There could be some other races. For example,
> 
> Can you please test this branch:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=amd-iommu-fixes
> 
> It has the previous fix in it and a couple more to make sure the
> device-table is updated and flushed before increase_address_space()
> updates domain->pt_root.

It looks promising, as it has survived a day of stress testing. I’ll keep it 
running for a few days to be sure.

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-20 Thread Qian Cai


> On Apr 18, 2020, at 2:34 PM, Joerg Roedel  wrote:
> 
> On Sat, Apr 18, 2020 at 09:01:35AM -0400, Qian Cai wrote:
>> Hard to tell without testing further. I’ll leave that optimization for
>> the future, and focus on fixing those races first.
> 
> Yeah right, we should fix the existing races first before introducing
> new ones ;)
> 
> Btw, THANKS A LOT for tracking down all these race condition bugs, I am
> not even remotely able to trigger them with the hardware I have around.
> 
> I did some hacking and the attached diff shows how I think this race
> condition needs to be fixed. I boot-tested this fix on-top of v5.7-rc1,
> but did no further testing. Can you test it please?

No dice. There could be some other races. For example,

> @@ -1536,16 +1571,19 @@ static u64 *fetch_pte(struct protection_domain 
> *domain,
> unsigned long address,
> unsigned long *page_size)
...
>   amd_iommu_domain_get_pgtable(domain, &pgtable);
> 
>   if (address > PM_LEVEL_SIZE(pgtable.mode))
>   return NULL;
> 
>   level  =  pgtable.mode - 1;
>   pte= &pgtable.root[PM_LEVEL_INDEX(level, address)];

<— increase_address_space()

>   *page_size =  PTE_LEVEL_PAGE_SIZE(level);
> 

while (level > 0) {
*page_size = PTE_LEVEL_PAGE_SIZE(level);

Then in iommu_unmap_page(),

while (unmapped < page_size) {
pte = fetch_pte(dom, bus_addr, &unmap_size);
…
bus_addr  = (bus_addr & ~(unmap_size - 1)) + unmap_size;

bus_addr would be out of sync with dom->mode when it enters fetch_pte() again.
Could that be a problem?


[ 5159.274829][ T4057] LTP: starting oom02
[ 5160.382787][   C52] perf: interrupt took too long (7443 > 6208), lowering 
kernel.perf_event_max_sample_rate to 26800
[ 5167.951785][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc flags=0x0010]
[ 5167.964540][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc1000 flags=0x0010]
[ 5167.977442][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc1900 flags=0x0010]
[ 5167.989901][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc1d00 flags=0x0010]
[ 5168.002291][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc2000 flags=0x0010]
[ 5168.014665][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc2400 flags=0x0010]
[ 5168.027132][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc2800 flags=0x0010]
[ 5168.039566][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc2c00 flags=0x0010]
[ 5168.051956][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc3000 flags=0x0010]
[ 5168.064310][  T812] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xfffc3400 flags=0x0010]
[ 5168.076652][  T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xfffc3800 flags=0x0010]
[ 5168.088290][  T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xfffc3c00 flags=0x0010]
[ 5183.695829][C8] smartpqi :23:00.0: controller is offline: status 
code 0x14803
[ 5183.704390][C8] smartpqi :23:00.0: controller offline
[ 5183.756594][  C101] blk_update_request: I/O error, dev sda, sector 22306304 
op 0x1:(WRITE) flags 0x800 phys_seg 4 prio class 0
[ 5183.756628][   C34] sd 0:1:0:0: [sda] tag#655 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756759][   C56] blk_update_request: I/O error, dev sda, sector 58480128 
op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756810][   C79] sd 0:1:0:0: [sda] tag#234 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756816][  C121] sd 0:1:0:0: [sda] tag#104 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756837][   T53] sd 0:1:0:0: [sda] tag#4 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756882][  C121] sd 0:1:0:0: [sda] tag#104 CDB: opcode=0x2a 2a 00 00 4d 
d4 00 00 02 00 00
[ 5183.756892][   C79] sd 0:1:0:0: [sda] tag#234 CDB: opcode=0x2a 2a 00 02 03 
e4 00 00 02 00 00
[ 5183.756909][  C121] blk_update_request: I/O error, dev sda, sector 5100544 
op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756920][   C79] blk_update_request: I/O error, dev sda, sector 33809408 
op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756939][   T53] sd 0:1:0:0: [sda] tag#4 CDB: o

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-19 Thread Qian Cai


> On Apr 18, 2020, at 2:34 PM, Joerg Roedel  wrote:
> 
> On Sat, Apr 18, 2020 at 09:01:35AM -0400, Qian Cai wrote:
>> Hard to tell without testing further. I’ll leave that optimization for
>> the future, and focus on fixing those races first.
> 
> Yeah right, we should fix the existing races first before introducing
> new ones ;)
> 
> Btw, THANKS A LOT for tracking down all these race condition bugs, I am
> not even remotely able to trigger them with the hardware I have around.
> 
> I did some hacking and the attached diff shows how I think this race
> condition needs to be fixed. I boot-tested this fix on-top of v5.7-rc1,
> but did no further testing. Can you test it please?

Sure, give it a few days to see if it could survive.

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-18 Thread Qian Cai


> On Apr 18, 2020, at 8:10 AM, Joerg Roedel  wrote:
> 
> Yes, your patch still looks racy. You need to atomically read
> domain->pt_root to a stack variable and derive the pt_root pointer and
> the mode from that variable instead of domain->pt_root directly. If you
> read the domain->pt_root twice there could still be an update between
> the two reads.
> Probably the lock in increase_address_space() can also be avoided if
> pt_root is updated using cmpxchg()?

Hard to tell without testing further. I’ll leave that optimization for the
future, and focus on fixing those races first.

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-16 Thread Qian Cai


> On Apr 13, 2020, at 9:36 PM, Qian Cai  wrote:
> 
> 
> 
>> On Apr 8, 2020, at 10:19 AM, Joerg Roedel  wrote:
>> 
>> Hi Qian,
>> 
>> On Tue, Apr 07, 2020 at 11:36:05AM -0400, Qian Cai wrote:
>>> After further testing, the change alone is insufficient. What I am chasing
>>> right now is that the swap device will go offline after heavy memory
>>> pressure below. The symptom is similar to what we have in the commit,
>>> 
>>> 754265bcab78 (“iommu/amd: Fix race in increase_address_space()”)
>>> 
>>> Apparently, it is not possible to take the domain->lock in fetch_pte()
>>> because it could sleep.
>> 
>> Thanks a lot for finding and tracking down another race in the AMD IOMMU
>> page-table code.  The domain->lock is a spin-lock and taking it can't
>> sleep. But fetch_pte() is a fast-path and must not take any locks.
>> 
>> I think the best fix is to update the pt_root and mode of the domain
>> atomically by storing the mode in the lower 12 bits of pt_root. This way
>> they are stored together and can be read/write atomically.
> 
> Like this?

So, this is still not enough; it would still trigger the storage driver going
offline under memory pressure, just after a bit longer. It looks to me that
fetch_pte() could still be racy:

level  =  domain->mode - 1;
pte= &domain->pt_root[PM_LEVEL_INDEX(level, address)];
<— increase_address_space();
*page_size =  PTE_LEVEL_PAGE_SIZE(level);

while (level > 0) {
*page_size = PTE_LEVEL_PAGE_SIZE(level);

Then in iommu_unmap_page(),

while (unmapped < page_size) {
pte = fetch_pte(dom, bus_addr, &unmap_size);
…
bus_addr  = (bus_addr & ~(unmap_size - 1)) + unmap_size;

bus_addr would be out of sync with dom->mode when it enters fetch_pte() again.
Could that be a problem?

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 20cce366e951..b36c6b07cbfd 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1397,13 +1397,13 @@ static struct page *free_sub_pt(unsigned long root, 
> int mode,
> 
> static void free_pagetable(struct protection_domain *domain)
> {
> - unsigned long root = (unsigned long)domain->pt_root;
> + int level = iommu_get_mode(domain->pt_root);
> + unsigned long root = iommu_get_root(domain->pt_root);
>   struct page *freelist = NULL;
> 
> - BUG_ON(domain->mode < PAGE_MODE_NONE ||
> -domain->mode > PAGE_MODE_6_LEVEL);
> + BUG_ON(level < PAGE_MODE_NONE || level > PAGE_MODE_6_LEVEL);
> 
> - freelist = free_sub_pt(root, domain->mode, freelist);
> + freelist = free_sub_pt(root, level, freelist);
> 
>   free_page_list(freelist);
> }
> @@ -1417,24 +1417,27 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>  unsigned long address,
>  gfp_t gfp)
> {
> + int level;
>   unsigned long flags;
>   bool ret = false;
>   u64 *pte;
> 
>   spin_lock_irqsave(&domain->lock, flags);
> 
> - if (address <= PM_LEVEL_SIZE(domain->mode) ||
> - WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
> + level = iommu_get_mode(domain->pt_root);
> +
> + if (address <= PM_LEVEL_SIZE(level) ||
> + WARN_ON_ONCE(level == PAGE_MODE_6_LEVEL))
>   goto out;
> 
>   pte = (void *)get_zeroed_page(gfp);
>   if (!pte)
>   goto out;
> 
> - *pte = PM_LEVEL_PDE(domain->mode,
> - iommu_virt_to_phys(domain->pt_root));
> - domain->pt_root  = pte;
> - domain->mode+= 1;
> + *pte = PM_LEVEL_PDE(level,
> + iommu_virt_to_phys((void *)iommu_get_root(domain->pt_root)));
> +
> + WRITE_ONCE(domain->pt_root, (unsigned long)pte + level + 1);
> 
>   ret = true;
> 
> @@ -1452,15 +1455,17 @@ static u64 *alloc_pte(struct protection_domain 
> *domain,
> bool *updated)
> {
>   int level, end_lvl;
> - u64 *pte, *page;
> + u64 *pte, *page, *pt_root, *root;
> 
>   BUG_ON(!is_power_of_2(page_size));
> 
> - while (address > PM_LEVEL_SIZE(domain->mode))
> + while (address > PM_LEVEL_SIZE(iommu_get_mode(domain->pt_root)))
>   *updated = increase_address_space(domain, address, gfp) || 
> *updated;
>

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-13 Thread Qian Cai


> On Apr 8, 2020, at 10:19 AM, Joerg Roedel  wrote:
> 
> Hi Qian,
> 
> On Tue, Apr 07, 2020 at 11:36:05AM -0400, Qian Cai wrote:
>> After further testing, the change alone is insufficient. What I am chasing
>> right now is that the swap device will go offline after heavy memory
>> pressure below. The symptom is similar to what we have in the commit,
>> 
>> 754265bcab78 (“iommu/amd: Fix race in increase_address_space()”)
>> 
>> Apparently, it is not possible to take the domain->lock in fetch_pte()
>> because it could sleep.
> 
> Thanks a lot for finding and tracking down another race in the AMD IOMMU
> page-table code.  The domain->lock is a spin-lock and taking it can't
> sleep. But fetch_pte() is a fast-path and must not take any locks.
> 
> I think the best fix is to update the pt_root and mode of the domain
> atomically by storing the mode in the lower 12 bits of pt_root. This way
> they are stored together and can be read/write atomically.

Like this?

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 20cce366e951..b36c6b07cbfd 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1397,13 +1397,13 @@ static struct page *free_sub_pt(unsigned long root, int 
mode,
 
 static void free_pagetable(struct protection_domain *domain)
 {
-   unsigned long root = (unsigned long)domain->pt_root;
+   int level = iommu_get_mode(domain->pt_root);
+   unsigned long root = iommu_get_root(domain->pt_root);
struct page *freelist = NULL;
 
-   BUG_ON(domain->mode < PAGE_MODE_NONE ||
-  domain->mode > PAGE_MODE_6_LEVEL);
+   BUG_ON(level < PAGE_MODE_NONE || level > PAGE_MODE_6_LEVEL);
 
-   freelist = free_sub_pt(root, domain->mode, freelist);
+   freelist = free_sub_pt(root, level, freelist);
 
free_page_list(freelist);
 }
@@ -1417,24 +1417,27 @@ static bool increase_address_space(struct 
protection_domain *domain,
   unsigned long address,
   gfp_t gfp)
 {
+   int level;
unsigned long flags;
bool ret = false;
u64 *pte;
 
spin_lock_irqsave(&domain->lock, flags);
 
-   if (address <= PM_LEVEL_SIZE(domain->mode) ||
-   WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
+   level = iommu_get_mode(domain->pt_root);
+
+   if (address <= PM_LEVEL_SIZE(level) ||
+   WARN_ON_ONCE(level == PAGE_MODE_6_LEVEL))
goto out;
 
pte = (void *)get_zeroed_page(gfp);
if (!pte)
goto out;
 
-   *pte = PM_LEVEL_PDE(domain->mode,
-   iommu_virt_to_phys(domain->pt_root));
-   domain->pt_root  = pte;
-   domain->mode+= 1;
+   *pte = PM_LEVEL_PDE(level,
+   iommu_virt_to_phys((void *)iommu_get_root(domain->pt_root)));
+
+   WRITE_ONCE(domain->pt_root, (unsigned long)pte + level + 1);
 
ret = true;
 
@@ -1452,15 +1455,17 @@ static u64 *alloc_pte(struct protection_domain *domain,
  bool *updated)
 {
int level, end_lvl;
-   u64 *pte, *page;
+   u64 *pte, *page, *pt_root, *root;
 
BUG_ON(!is_power_of_2(page_size));
 
-   while (address > PM_LEVEL_SIZE(domain->mode))
+   while (address > PM_LEVEL_SIZE(iommu_get_mode(domain->pt_root)))
*updated = increase_address_space(domain, address, gfp) || 
*updated;
 
-   level   = domain->mode - 1;
-   pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
+   pt_root = READ_ONCE(domain->pt_root);
+   root= (void *)iommu_get_root(pt_root);
+   level   = iommu_get_mode(pt_root) - 1;
+   pte = &root[PM_LEVEL_INDEX(level, address)];
address = PAGE_SIZE_ALIGN(address, page_size);
end_lvl = PAGE_SIZE_LEVEL(page_size);
 
@@ -1536,16 +1541,18 @@ static u64 *fetch_pte(struct protection_domain *domain,
  unsigned long address,
  unsigned long *page_size)
 {
-   int level;
u64 *pte;
+   u64 *pt_root = READ_ONCE(domain->pt_root);
+   u64 *root= (void *)iommu_get_root(pt_root);
+   int level= iommu_get_mode(pt_root);
 
*page_size = 0;
 
-   if (address > PM_LEVEL_SIZE(domain->mode))
+   if (address > PM_LEVEL_SIZE(level))
return NULL;
 
-   level  =  domain->mode - 1;
-   pte= &domain->pt_root[PM_LEVEL_INDEX(level, address)];
+   level--;
+   pte= &root[PM_LEVEL_INDEX(level, address)];
*page_size =  PTE_LEVEL_PAGE_SIZE(level);
 
while (level > 0) {
@@ -1814,12 +1821,13 @@ static struct protection_domain 
*dma_ops_domain_alloc(void)
if (prot

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-07 Thread Qian Cai


> On Apr 6, 2020, at 10:12 PM, Qian Cai  wrote:
> 
> fetch_pte() could race with increase_address_space() because it holds no
> lock when called from iommu_unmap_page(). On the CPU that runs fetch_pte(),
> it could see a stale domain->pt_root and a new, increased domain->mode from
> increase_address_space(). As a result, it could trigger invalid accesses
> later on. Fix it by using a pair of smp_[w|r]mb() in those places.

After further testing, the change alone is insufficient. What I am chasing right
now is that the swap device will go offline after heavy memory pressure below.
The symptom is similar to what we have in the commit,

754265bcab78 (“iommu/amd: Fix race in increase_address_space()”)

Apparently, it is not possible to take the domain->lock in fetch_pte() because it
could sleep.

Thoughts?

[ 7292.727377][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd8 flags=0x0010]
[ 7292.740571][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81000 flags=0x0010]
[ 7292.753268][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81900 flags=0x0010]
[ 7292.766078][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81d00 flags=0x0010]
[ 7292.778447][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82000 flags=0x0010]
[ 7292.790724][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82400 flags=0x0010]
[ 7292.803195][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82800 flags=0x0010]
[ 7292.815664][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82c00 flags=0x0010]
[ 7292.828089][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd83000 flags=0x0010]
[ 7292.840468][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd83400 flags=0x0010]
[ 7292.852795][  T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xffd83800 flags=0x0010]
[ 7292.864566][  T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xffd83c00 flags=0x0010]
[ 7326.583908][C8] smartpqi :23:00.0: controller is offline: status 
code 0x14803
[ 7326.592386][C8] smartpqi :23:00.0: controller offline
[ 7326.663728][   C66] sd 0:1:0:0: [sda] tag#467 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=6s
[ 7326.664321][ T2738] smartpqi :23:00.0: resetting scsi 0:1:0:0
[ 7326.673849][   C66] sd 0:1:0:0: [sda] tag#467 CDB: opcode=0x28 28 00 02 ee 
2e 60 00 00 08 00
[ 7326.680118][ T2738] smartpqi :23:00.0: reset of scsi 0:1:0:0: FAILED
[ 7326.688612][   C66] blk_update_request: I/O error, dev sda, sector 49163872 
op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7326.695456][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.706632][   C66] Read-error on swap-device (254:1:45833824)
[ 7326.714030][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.723382][T45523] sd 0:1:0:0: rejecting I/O to offline device
[ 7326.727352][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.727379][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.727442][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery

> 
> kernel BUG at drivers/iommu/amd_iommu.c:1704!
> BUG_ON(unmapped && !is_power_of_2(unmapped));
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 
> 07/10/2019
> RIP: 0010:amd_iommu_unmap+0x1b2/0x1d0
> Call Trace:
>  
>  __iommu_unmap+0x106/0x320
>  iommu_unmap_fast+0xe/0x10
>  __iommu_dma_unmap+0xdc/0x1a0
>  iommu_dma_unmap_sg+0xae/0xd0
>  scsi_dma_unmap+0xe7/0x150
>  pqi_raid_io_complete+0x37/0x60 [smartpqi]
>  pqi_irq_handler+0x1fc/0x13f0 [smartpqi]
>  __handle_irq_event_percpu+0x78/0x4f0
>  handle_irq_event_percpu+0x70/0x100
>  handle_irq_event+0x5a/0x8b
>  handle_edge_irq+0x10c/0x370
>  do_IRQ+0x9e/0x1e0
>  common_interrupt+0xf/0xf
>  
> 
> Signed-off-by: Qian Cai 
> ---
> drivers/iommu/amd_iommu.c | 9 +
> 1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 20cce366e951..22328a23335f 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1434,6 +1434,11 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>   *pte = PM_LEVEL_PDE(domain->mode,
>   iommu_virt_to_phys(domain->pt_root)

[RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-06 Thread Qian Cai
fetch_pte() could race with increase_address_space() because it holds no
lock when called from iommu_unmap_page(). On the CPU that runs fetch_pte(),
it could see a stale domain->pt_root and a new, increased domain->mode from
increase_address_space(). As a result, it could trigger invalid accesses
later on. Fix it by using a pair of smp_[w|r]mb() in those places.

 kernel BUG at drivers/iommu/amd_iommu.c:1704!
 BUG_ON(unmapped && !is_power_of_2(unmapped));
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 
07/10/2019
 RIP: 0010:amd_iommu_unmap+0x1b2/0x1d0
 Call Trace:
  
  __iommu_unmap+0x106/0x320
  iommu_unmap_fast+0xe/0x10
  __iommu_dma_unmap+0xdc/0x1a0
  iommu_dma_unmap_sg+0xae/0xd0
  scsi_dma_unmap+0xe7/0x150
  pqi_raid_io_complete+0x37/0x60 [smartpqi]
  pqi_irq_handler+0x1fc/0x13f0 [smartpqi]
  __handle_irq_event_percpu+0x78/0x4f0
  handle_irq_event_percpu+0x70/0x100
  handle_irq_event+0x5a/0x8b
  handle_edge_irq+0x10c/0x370
  do_IRQ+0x9e/0x1e0
  common_interrupt+0xf/0xf
  

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 20cce366e951..22328a23335f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1434,6 +1434,11 @@ static bool increase_address_space(struct protection_domain *domain,
*pte = PM_LEVEL_PDE(domain->mode,
iommu_virt_to_phys(domain->pt_root));
domain->pt_root  = pte;
+   /*
+* Make sure fetch_pte() will see the new domain->pt_root before it
+* snapshots domain->mode.
+*/
+   smp_wmb();
domain->mode+= 1;
 
ret = true;
@@ -1460,6 +1465,8 @@ static u64 *alloc_pte(struct protection_domain *domain,
 *updated = increase_address_space(domain, address, gfp) || *updated;
 
level   = domain->mode - 1;
+   /* To pair with smp_wmb() in increase_address_space(). */
+   smp_rmb();
pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
address = PAGE_SIZE_ALIGN(address, page_size);
end_lvl = PAGE_SIZE_LEVEL(page_size);
@@ -1545,6 +1552,8 @@ static u64 *fetch_pte(struct protection_domain *domain,
return NULL;
 
level  =  domain->mode - 1;
+   /* To pair with smp_wmb() in increase_address_space(). */
+   smp_rmb();
pte= &domain->pt_root[PM_LEVEL_INDEX(level, address)];
*page_size =  PTE_LEVEL_PAGE_SIZE(level);
 
-- 
2.21.0 (Apple Git-122.2)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: silence a RCU-list debugging warning

2020-03-18 Thread Qian Cai



> On Mar 18, 2020, at 1:27 AM, Lu Baolu  wrote:
> 
> How about changing the commit subject to
> "iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()"?

It's just a bit long, which should be a non-issue.


[PATCH] iommu/vt-d: silence a RCU-list debugging warning

2020-03-17 Thread Qian Cai
dmar_find_atsr() calls list_for_each_entry_rcu() outside of an RCU read
side critical section but with dmar_global_lock held. Silence this
false positive.

 drivers/iommu/intel-iommu.c:4504 RCU-list traversed in non-reader section!!
 1 lock held by swapper/0/1:
 #0: 9755bee8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1a6/0xe19

 Call Trace:
  dump_stack+0xa4/0xfe
  lockdep_rcu_suspicious+0xeb/0xf5
  dmar_find_atsr+0x1ab/0x1c0
  dmar_parse_one_atsr+0x64/0x220
  dmar_walk_remapping_entries+0x130/0x380
  dmar_table_init+0x166/0x243
  intel_iommu_init+0x1ab/0xe19
  pci_iommu_init+0x1a/0x44
  do_one_initcall+0xae/0x4d0
  kernel_init_freeable+0x412/0x4c5
  kernel_init+0x19/0x193

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4be549478691..ef0a5246700e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4501,7 +4501,8 @@ static struct dmar_atsr_unit *dmar_find_atsr(struct acpi_dmar_atsr *atsr)
struct dmar_atsr_unit *atsru;
struct acpi_dmar_atsr *tmp;
 
-   list_for_each_entry_rcu(atsru, &dmar_atsr_units, list) {
+   list_for_each_entry_rcu(atsru, &dmar_atsr_units, list,
+   dmar_rcu_check()) {
tmp = (struct acpi_dmar_atsr *)atsru->hdr;
if (atsr->segment != tmp->segment)
continue;
-- 
2.21.0 (Apple Git-122.2)



[PATCH] iommu/vt-d: fix RCU-list bugs in intel_iommu_init

2020-03-05 Thread Qian Cai
There are several places in intel_iommu_init() that traverse
RCU-protected lists without holding any lock. Fix them by acquiring
dmar_global_lock.

 WARNING: suspicious RCU usage
 -
 drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by swapper/0/1.

 Call Trace:
  dump_stack+0xa0/0xea
  lockdep_rcu_suspicious+0x102/0x10b
  intel_iommu_init+0x947/0xb13
  pci_iommu_init+0x26/0x62
  do_one_initcall+0xfe/0x500
  kernel_init_freeable+0x45a/0x4f8
  kernel_init+0x11/0x139
  ret_from_fork+0x3a/0x50
 DMAR: Intel(R) Virtualization Technology for Directed I/O

Fixes: d8190dc63886 ("iommu/vt-d: Enable DMA remapping after rmrr mapped")
Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6fa6de2b6ad5..bc138ceb07bc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5193,6 +5193,7 @@ int __init intel_iommu_init(void)
 
init_iommu_pm_ops();
 
+   down_read(&dmar_global_lock);
for_each_active_iommu(iommu, drhd) {
iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -5200,6 +5201,7 @@ int __init intel_iommu_init(void)
iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
iommu_device_register(&iommu->iommu);
}
+   up_read(&dmar_global_lock);
 
bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
if (si_domain && !hw_pass_through)
@@ -5210,7 +5212,6 @@ int __init intel_iommu_init(void)
down_read(&dmar_global_lock);
if (probe_acpi_namespace_devices())
pr_warn("ACPI name space devices didn't probe correctly\n");
-   up_read(&dmar_global_lock);
 
/* Finally, we enable the DMA remapping hardware. */
for_each_iommu(iommu, drhd) {
@@ -5219,6 +5220,8 @@ int __init intel_iommu_init(void)
 
iommu_disable_protect_mem_regions(iommu);
}
+   up_read(&dmar_global_lock);
+
pr_info("Intel(R) Virtualization Technology for Directed I/O\n");
 
intel_iommu_enabled = 1;
-- 
1.8.3.1



[PATCH -next] iommu/dmar: silence RCU-list debugging warnings

2020-03-05 Thread Qian Cai
Similar to the commit 02d715b4a818 ("iommu/vt-d: Fix RCU list debugging
warnings"), there are several other places that call
list_for_each_entry_rcu() outside of an RCU read side critical section
but with dmar_global_lock held. Silence those false positives as well.
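
The mechanism being relied on here can be mocked in plain C (hypothetical
helpers, not the kernel implementation): since the RCU list API grew an
optional lockdep condition, the traversal is accepted when either an RCU
read-side critical section or the stated lock condition holds, which is
what passing dmar_rcu_check() encodes.

```c
#include <assert.h>

/* Stand-ins for rcu_read_lock_held() and the dmar_global_lock state. */
static int in_rcu_read_section;
static int global_lock_held;

/* Mock of dmar_rcu_check(): "it is also legal if the global lock is held". */
static int dmar_rcu_check_mock(void)
{
    return global_lock_held;
}

/* Returns 1 when the lockdep-style check would consider the traversal
 * safe: an RCU reader OR the caller-supplied extra condition. */
static int rcu_traversal_ok(int extra_cond)
{
    return in_rcu_read_section || extra_cond;
}
```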

 drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!!
 1 lock held by swapper/0/1:
  #0: 935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97

 drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!!
 1 lock held by swapper/0/1:
  #0: 935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97

 drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!!
 1 lock held by swapper/0/1:
  #0: a71892c8 (dmar_global_lock){}, at: intel_iommu_init+0x61a/0xb13

Signed-off-by: Qian Cai 
---
 drivers/iommu/dmar.c | 3 ++-
 include/linux/dmar.h | 6 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 071bb42bbbc5..7b16c4db40b4 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -363,7 +363,8 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
 {
struct dmar_drhd_unit *dmaru;
 
-   list_for_each_entry_rcu(dmaru, &dmar_drhd_units, list)
+   list_for_each_entry_rcu(dmaru, &dmar_drhd_units, list,
+   dmar_rcu_check())
if (dmaru->segment == drhd->segment &&
dmaru->reg_base_addr == drhd->address)
return dmaru;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 712be8bc6a7c..d7bf029df737 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -74,11 +74,13 @@ struct dmar_pci_notify_info {
dmar_rcu_check())
 
 #define for_each_active_drhd_unit(drhd) \
-   list_for_each_entry_rcu(drhd, &dmar_drhd_units, list)   \
+   list_for_each_entry_rcu(drhd, &dmar_drhd_units, list,   \
+   dmar_rcu_check())   \
if (drhd->ignored) {} else
 
 #define for_each_active_iommu(i, drhd) \
-   list_for_each_entry_rcu(drhd, &dmar_drhd_units, list)   \
+   list_for_each_entry_rcu(drhd, &dmar_drhd_units, list,   \
+   dmar_rcu_check())   \
if (i=drhd->iommu, drhd->ignored) {} else
 
 #define for_each_iommu(i, drhd) \
-- 
1.8.3.1



Re: [PATCH] iommu/dma: fix variable 'cookie' set but not used

2020-01-06 Thread Qian Cai



> On Jan 6, 2020, at 1:19 PM, Robin Murphy  wrote:
> 
> Fair enough... I guess this is a W=1 thing?

Yes.


[PATCH] iommu/dma: fix variable 'cookie' set but not used

2020-01-06 Thread Qian Cai
The commit c18647900ec8 ("iommu/dma: Relax locking in
iommu_dma_prepare_msi()") introduced a compilation warning,

drivers/iommu/dma-iommu.c: In function 'iommu_dma_prepare_msi':
drivers/iommu/dma-iommu.c:1206:27: warning: variable 'cookie' set but
not used [-Wunused-but-set-variable]
  struct iommu_dma_cookie *cookie;
   ^~

Fixes: c18647900ec8 ("iommu/dma: Relax locking in iommu_dma_prepare_msi()")
Signed-off-by: Qian Cai 
---
 drivers/iommu/dma-iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c363294b3bb9..a2e96a5fd9a7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1203,7 +1203,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 {
struct device *dev = msi_desc_to_dev(desc);
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct iommu_dma_cookie *cookie;
struct iommu_dma_msi_page *msi_page;
static DEFINE_MUTEX(msi_prepare_lock); /* see below */
 
@@ -1212,8 +1211,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
return 0;
}
 
-   cookie = domain->iova_cookie;
-
/*
 * In fact the whole prepare operation should already be serialised by
 * irq_domain_mutex further up the callchain, but that's pretty subtle
-- 
2.21.0 (Apple Git-122.2)



Re: [PATCH v3] iommu: fix KASAN use-after-free in iommu_insert_resv_region

2019-12-16 Thread Qian Cai



> On Nov 26, 2019, at 5:27 AM, Eric Auger  wrote:
> 
> In case the new region gets merged into another one, the nr
> list node is freed. Checking its type while completing the
> merge algorithm leads to a use-after-free. Use new->type
> instead.
> 
> Fixes: 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region()
> implementation")
> Signed-off-by: Eric Auger 
> Reported-by: Qian Cai 
> Cc: Stable  #v5.3+


Looks like Joerg is away for a few weeks. Could Andrew or Linus pick up this
use-after-free fix?

> 
> ---
> 
> v2 -> v3:
> - directly use new->type
> 
> v1 -> v2:
> - remove spurious new line
> ---
> drivers/iommu/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d658c7c6a2ab..285ad4a4c7f2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -313,7 +313,7 @@ int iommu_insert_resv_region(struct iommu_resv_region *new,
>   phys_addr_t top_end, iter_end = iter->start + iter->length - 1;
> 
>   /* no merge needed on elements of different types than @nr */
> - if (iter->type != nr->type) {
> + if (iter->type != new->type) {
>   list_move_tail(&iter->list, &stack);
>   continue;
>   }
> -- 
> 2.20.1
> 
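
A minimal sketch of the fix pattern in the patch above (hypothetical
types and names, not the code from drivers/iommu/iommu.c): once the
merge may have freed the node, compare against a value snapshotted from
memory the caller still owns, never through the possibly-freed pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct region { int type; };

/* Safe shape: read the field before the node can be freed by the merge,
 * then compare against the snapshot — no dereference of freed memory.
 * (The buggy shape read nr->type after the merge path freed nr.) */
static int merge_and_compare(struct region *nr, int iter_type)
{
    int saved_type = nr->type;   /* equivalent of using new->type */
    free(nr);                    /* the merge may free the node */
    return iter_type == saved_type;
}
```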



Re: [Patch v3 0/3] iommu: reduce spinlock contention on fast path

2019-12-06 Thread Qian Cai



> On Dec 6, 2019, at 4:38 PM, Cong Wang  wrote:
> 
> This patchset contains three small optimizations for the global spinlock
> contention in IOVA cache. Our memcache perf test shows this reduced its
> p999 latency down by 45% on AMD when IOMMU is enabled.

Can you at least include a changelog compared to the previous versions?

> 
> Cong Wang (3):
>  iommu: avoid unnecessary magazine allocations
>  iommu: optimize iova_magazine_free_pfns()
>  iommu: avoid taking iova_rbtree_lock twice
> ---
> drivers/iommu/iova.c | 75 ++--
> 1 file changed, 45 insertions(+), 30 deletions(-)
> 
> -- 
> 2.21.0
> 


Re: [PATCH 1/3] iommu: match the original algorithm

2019-11-27 Thread Qian Cai



> On Nov 27, 2019, at 1:01 PM, John Garry  wrote:
> 
> I haven't gone into the details, but this patch alone is giving this:
> 
> root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> root@(none)$ cat /sys/kernel/debug/kmemleak
> unreferenced object 0x002339843000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
>  hex dump (first 32 bytes):
>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>  backtrace:
>[<1d2710bf>] kmem_cache_alloc+0x188/0x260
>[] init_iova_domain+0x1e8/0x2a8
>[<2646fc92>] iommu_setup_dma_ops+0x200/0x710
>[] arch_setup_dma_ops+0x80/0x128
>[<994e1e43>] acpi_dma_configure+0x11c/0x140
>[] pci_dma_configure+0xe0/0x108
>[] really_probe+0x210/0x548
>[<87884b1b>] driver_probe_device+0x7c/0x148
>[<10af2936>] device_driver_attach+0x94/0xa0
>[] __driver_attach+0xa4/0x110
>[] bus_for_each_dev+0xe8/0x158
>[] driver_attach+0x30/0x40
>[<3cf39ba8>] bus_add_driver+0x234/0x2f0
>[<43830a45>] driver_register+0xbc/0x1d0
>[] __pci_register_driver+0xb0/0xc8
>[] sas_v3_pci_driver_init+0x20/0x28
> unreferenced object 0x002339844000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
> 
> [snip]
> 
> And I don't feel like continuing until it's resolved

Thanks for taking a hit from this before me. It is frustrating that people tend
not to test their patches properly with things like kmemleak.


[PATCH v4] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
:34403
  active_file:2285 inactive_file:1838 isolated_file:0
  unevictable:0 dirty:1 writeback:5 unstable:0
  slab_reclaimable:13972 slab_unreclaimable:453879
  mapped:2380 shmem:154 pagetables:6948 bounce:0
  free:19133 free_pcp:7363 free_cma:0

Signed-off-by: Qian Cai 
---

v4: use dev_err_once() instead as mentioned above.
v3: insert a "\n" in dev_err_ratelimited() per Joe.
v2: use dev_err_ratelimited() and improve the commit messages.

 drivers/iommu/intel-iommu.c | 3 ++-
 drivers/iommu/iova.c| 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0c8d81f56a30..8c944bbbed8a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3406,7 +3406,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(dma_mask), true);
if (unlikely(!iova_pfn)) {
-   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
+   dev_err_once(dev, "Allocating %ld-page iova failed\n",
+nrpages);
return 0;
}
 
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..aa1a56aaa5ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -233,7 +233,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 
 struct iova *alloc_iova_mem(void)
 {
-   return kmem_cache_alloc(iova_cache, GFP_ATOMIC);
+   return kmem_cache_alloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
 }
 EXPORT_SYMBOL(alloc_iova_mem);
 
-- 
1.8.3.1



[PATCH v3] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
  active_file:2285 inactive_file:1838 isolated_file:0
  unevictable:0 dirty:1 writeback:5 unstable:0
  slab_reclaimable:13972 slab_unreclaimable:453879
  mapped:2380 shmem:154 pagetables:6948 bounce:0
  free:19133 free_pcp:7363 free_cma:0

Signed-off-by: Qian Cai 
---

v3: insert a "\n" in dev_err_ratelimited() per Joe.
v2: use dev_err_ratelimited() and improve the commit messages.

 drivers/iommu/intel-iommu.c | 3 ++-
 drivers/iommu/iova.c| 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0c8d81f56a30..855a7bb272d8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3406,7 +3406,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(dma_mask), true);
if (unlikely(!iova_pfn)) {
-   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
+   dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
+   nrpages);
return 0;
}
 
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..aa1a56aaa5ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -233,7 +233,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 
 struct iova *alloc_iova_mem(void)
 {
-   return kmem_cache_alloc(iova_cache, GFP_ATOMIC);
+   return kmem_cache_alloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
 }
 EXPORT_SYMBOL(alloc_iova_mem);
 
-- 
1.8.3.1



Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > When running heavy memory pressure workloads, this 5+ years old system is
> > > > throwing endless warnings below because disk IO is too slow to recover
> > > > from swapping. Since the volume from alloc_iova_fast() could be large,
> > > > once it calls printk(), it will trigger disk IO (writing to the log
> > > > files) and pending softirqs which could cause an infinite loop and make
> > > > no progress for days by the ongoing memory reclaim. This is the
> > > > counterpart for Intel where the AMD part has already been merged. See the
> > > > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > > > pressure"). Since the allocation failure will be reported in
> > > > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > > > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> > > 
> > > []
> > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > 
> > > []
> > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > 
> > > []
> > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > > > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > >IOVA_PFN(dma_mask), true);
> > > > if (unlikely(!iova_pfn)) {
> > > > -   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > > +   dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > > +   nrpages);
> > > 
> > > Trivia:
> > > 
> > > This should really have a \n termination on the format string
> > > 
> > >   dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > > 
> > > 
> > 
> > Why do you say so? It is right now printing with a newline added anyway.
> > 
> >  hpsa :03:00.0: DMAR: Allocating 1-page iova failed
> 
> If another process uses pr_cont at the same time,
> it can be interleaved.

I lean towards fixing that in a separate patch if ever needed, as the original
dev_err() has no "\n" either.


"Revisit iommu_insert_resv_region() implementation" causes use-after-free

2019-11-22 Thread Qian Cai
Reading files under /sys/kernel/iommu_groups/ triggers a use-after-free.
Reverting the commit 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region()
implementation") fixed the issue.

/* no merge needed on elements of different types than @nr */
if (iter->type != nr->type) {
list_move_tail(&iter->list, &stack);
continue;

[  160.156964][ T3100] BUG: KASAN: use-after-free in iommu_insert_resv_region+0x34b/0x520
[  160.197758][ T3100] Read of size 4 at addr 8887aba78464 by task cat/3100
[  160.230645][ T3100] 
[  160.230645][ T3100] CPU: 14 PID: 3100 Comm: cat Not tainted 5.4.0-rc8-next-20191122+ #11
[  160.278671][ T3100] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
[  160.320589][ T3100] Call Trace:
[  160.335229][ T3100]  dump_stack+0xa0/0xea
[  160.354011][ T3100]  print_address_description.constprop.5.cold.7+0x9/0x384
[  160.386569][ T3100]  __kasan_report.cold.8+0x7a/0xc0
[  160.409811][ T3100]  ? iommu_insert_resv_region+0x34b/0x520
[  160.435668][ T3100]  kasan_report+0x12/0x20
[  160.455387][ T3100]  __asan_load4+0x95/0xa0
[  160.474808][ T3100]  iommu_insert_resv_region+0x34b/0x520
[  160.500228][ T3100]  ? iommu_bus_notifier+0xe0/0xe0
[  160.522904][ T3100]  ? intel_iommu_get_resv_regions+0x348/0x400
[  160.550461][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  160.577611][ T3100]  ? iommu_insert_resv_region+0x520/0x520
[  160.603756][ T3100]  ? register_lock_class+0x940/0x940
[  160.628265][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  160.655370][ T3100]  ? iommu_get_group_resv_regions+0x2f0/0x2f0
[  160.684168][ T3100]  iommu_group_attr_show+0x34/0x50
[  160.708395][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  160.731758][ T3100]  ? iommu_default_passthrough+0x20/0x20
[  160.756898][ T3100]  kernfs_seq_show+0xa4/0xb0
[  160.777097][ T3100]  seq_read+0x27e/0x710
[  160.795195][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  160.815349][ T3100]  __vfs_read+0x50/0xa0
[  160.834154][ T3100]  vfs_read+0xcb/0x1e0
[  160.852332][ T3100]  ksys_read+0xc6/0x160
[  160.871028][ T3100]  ? kernel_write+0xc0/0xc0
[  160.891307][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.912446][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.933640][ T3100]  __x64_sys_read+0x43/0x50
[  160.953957][ T3100]  do_syscall_64+0xcc/0xaec
[  160.974322][ T3100]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  160.999130][ T3100]  ? syscall_return_slowpath+0x580/0x580
[  161.024753][ T3100]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  161.052416][ T3100]  ? trace_hardirqs_off_caller+0x3a/0x150
[  161.078400][ T3100]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  161.103711][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.130793][ T3100] RIP: 0033:0x7f33e0d89d75
[  161.150732][ T3100] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f
1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48>
3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
[  161.245503][ T3100] RSP: 002b:7fff88f0db88 EFLAGS: 0246 ORIG_RAX:

[  161.284547][ T3100] RAX: ffda RBX: 0002 RCX:
7f33e0d89d75
[  161.321123][ T3100] RDX: 0002 RSI: 7f33e1201000 RDI:
0003
[  161.357617][ T3100] RBP: 7f33e1201000 R08:  R09:

[  161.394173][ T3100] R10: 0022 R11: 0246 R12:
7f33e1201000
[  161.430736][ T3100] R13: 0003 R14: 0fff R15:
0002
[  161.467337][ T3100] 
[  161.477529][ T3100] Allocated by task 3100:
[  161.497133][ T3100]  save_stack+0x21/0x90
[  161.515777][ T3100]  __kasan_kmalloc.constprop.13+0xc1/0xd0
[  161.541743][ T3100]  kasan_kmalloc+0x9/0x10
[  161.561330][ T3100]  kmem_cache_alloc_trace+0x1f8/0x470
[  161.585949][ T3100]  iommu_insert_resv_region+0xeb/0x520
[  161.610876][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  161.638318][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  161.665322][ T3100]  iommu_group_attr_show+0x34/0x50
[  161.688526][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  161.711992][ T3100]  kernfs_seq_show+0xa4/0xb0
[  161.734252][ T3100]  seq_read+0x27e/0x710
[  161.754412][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  161.775493][ T3100]  __vfs_read+0x50/0xa0
[  161.794328][ T3100]  vfs_read+0xcb/0x1e0
[  161.812559][ T3100]  ksys_read+0xc6/0x160
[  161.831554][ T3100]  __x64_sys_read+0x43/0x50
[  161.851772][ T3100]  do_syscall_64+0xcc/0xaec
[  161.872098][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.898919][ T3100] 
[  161.909113][ T3100] Freed by task 3100:
[  161.927070][ T3100]  save_stack+0x21/0x90
[  161.945711][ T3100]  __kasan_slab_free+0x11c/0x170
[  161.968112][ T3100]  kasan_slab_free+0xe/0x10
[  161.988601][ T3100]  slab_free_freelist_hook+0x5f/0x1d0
[  162.012918][ T3100]  kfree+0xe9/0x410
[  162.029454][ T3100]  iommu_insert_resv_region+0x47d/0x520
[  162.053701][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  162.079671][ T3100]  iommu_group_show

Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > When running heavy memory pressure workloads, this 5+ years old system is
> > throwing endless warnings below because disk IO is too slow to recover
> > from swapping. Since the volume from alloc_iova_fast() could be large,
> > once it calls printk(), it will trigger disk IO (writing to the log
> > files) and pending softirqs which could cause an infinite loop and make
> > no progress for days by the ongoing memory reclaim. This is the
> > counterpart for Intel where the AMD part has already been merged. See the
> > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > pressure"). Since the allocation failure will be reported in
> > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> 
> []
> > v2: use dev_err_ratelimited() and improve the commit messages.
> 
> []
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> 
> []
> > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> >IOVA_PFN(dma_mask), true);
> > if (unlikely(!iova_pfn)) {
> > -   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > +   dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > +   nrpages);
> 
> Trivia:
> 
> This should really have a \n termination on the format string
> 
>   dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> 
> 

Why do you say so? It is right now printing with a newline added anyway.

 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
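
The interval/burst idea behind dev_err_ratelimited() in this discussion can
be sketched as follows (a deliberate simplification with hypothetical names,
not the kernel's struct ratelimit_state): allow a burst of messages per time
window and silently drop the rest, so a failure storm cannot wedge the
machine in log I/O.

```c
#include <assert.h>

struct ratelimit {
    long interval;      /* window length, in the same units as 'now' */
    int  burst;         /* messages allowed per window */
    long window_start;
    int  printed;
};

/* Returns 1 if a message should be emitted at time 'now', 0 if dropped. */
static int ratelimit_ok(struct ratelimit *rl, long now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;   /* open a fresh window */
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;                     /* suppressed */
}
```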


[PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-21 Thread Qian Cai
  active_file:2285 inactive_file:1838 isolated_file:0
  unevictable:0 dirty:1 writeback:5 unstable:0
  slab_reclaimable:13972 slab_unreclaimable:453879
  mapped:2380 shmem:154 pagetables:6948 bounce:0
  free:19133 free_pcp:7363 free_cma:0

Signed-off-by: Qian Cai 
---

v2: use dev_err_ratelimited() and improve the commit messages.

 drivers/iommu/intel-iommu.c | 3 ++-
 drivers/iommu/iova.c| 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6db6d969e31c..c01a7bc99385 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(dma_mask), true);
if (unlikely(!iova_pfn)) {
-   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
+   dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
+   nrpages);
return 0;
}
 
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..aa1a56aaa5ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -233,7 +233,7 @@ static DEFINE_MUTEX(iova_cache_mutex);
 
 struct iova *alloc_iova_mem(void)
 {
-   return kmem_cache_alloc(iova_cache, GFP_ATOMIC);
+   return kmem_cache_alloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
 }
 EXPORT_SYMBOL(alloc_iova_mem);
 
-- 
2.21.0 (Apple Git-122.2)



Re: [PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-11 Thread Qian Cai



> On Nov 11, 2019, at 12:23 AM, Lu Baolu  wrote:
> 
> The scalable mode is defined in VT-d 3.0. The scalable mode capability
> could be checked by reading /sys/devices/virtual/iommu/dmar*/intel-
> iommu/ecap. It's currently not friendly for reading. You need to decode
> it according to the spec.

This looks like perfect information to put in the Kconfig description.


Re: [PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-10 Thread Qian Cai



> On Nov 8, 2019, at 10:43 PM, Lu Baolu  wrote:
> 
> +config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> +prompt "Enable Intel IOMMU scalable mode by default"
> +depends on INTEL_IOMMU
> +help
> +  Selecting this option will enable the scalable mode if
> +  hardware presents the capability. If this option is not
> +  selected, scalable mode support could also be enabled
> +  by passing intel_iommu=sm_on to the kernel.
> +

Does it also make sense to mention which hardware presents this capability or 
how to check it?


Re: [PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-10 Thread Qian Cai


> On Nov 10, 2019, at 8:30 PM, Lu Baolu  wrote:
> 
> How about "pasid based multiple stages DMA translation"?

It is better, but I am still not sure how developers should decide whether to
select it. Ideally, should it mention the pros and cons of this? At a minimum,
there should be a line saying "if not sure what this is, select N".

Re: [PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-08 Thread Qian Cai


> On Nov 8, 2019, at 10:40 PM, Lu Baolu  wrote:
> 
> This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> to make it easier for distributions to enable or disable the
> Intel IOMMU scalable mode by default during kernel build.
> 
> Signed-off-by: Lu Baolu 
> ---
> drivers/iommu/Kconfig   | 9 +
> drivers/iommu/intel-iommu.c | 7 ++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e3842eabcfdd..fbdf3fd291d9 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -242,6 +242,15 @@ config INTEL_IOMMU_FLOPPY_WA
> workaround will setup a 1:1 mapping for the first
> 16MiB to make floppy (an ISA device) work.
> 
> +config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> + prompt "Enable Intel IOMMU scalable mode by default"
> + depends on INTEL_IOMMU
> + help
> +   Selecting this option will enable the scalable mode if
> +   hardware presents the capability. If this option is not
> +   selected, scalable mode support could also be enabled
> +   by passing intel_iommu=sm_on to the kernel.


Maybe a sentence or two to describe what the scalable mode is in layman's
terms could be useful, so developers don’t need to search around for the
Kconfig selection?

> +
> config IRQ_REMAP
>   bool "Support for Interrupt Remapping"
>   depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6db6d969e31c..6051fe790c61 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -355,9 +355,14 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
> iommu_domain *domain,
> int dmar_disabled = 0;
> #else
> int dmar_disabled = 1;
> -#endif /*CONFIG_INTEL_IOMMU_DEFAULT_ON*/
> +#endif /* CONFIG_INTEL_IOMMU_DEFAULT_ON */
> 
> +#ifdef CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> +int intel_iommu_sm = 1;
> +#else
> int intel_iommu_sm;
> +#endif /* CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON */
> +
> int intel_iommu_enabled = 0;
> EXPORT_SYMBOL_GPL(intel_iommu_enabled);
> 
> -- 
> 2.17.1
> 


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-17 Thread Qian Cai
On Wed, 2019-10-16 at 17:44 +0200, Joerg Roedel wrote:
> On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
> > BTW, the previous x86 warning was from only reverted one patch "iommu: Add 
> > gfp
> > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting 
> > the
> > correct warning.
> 
> Can you please test this small fix:

This works fine so far.

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 78a2cca3ac5c..e7a4464e8594 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2562,7 +2562,7 @@ static int amd_iommu_map(struct iommu_domain *dom, 
> unsigned long iova,
>   if (iommu_prot & IOMMU_WRITE)
>   prot |= IOMMU_PROT_IW;
>  
> - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
> + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
>  
>   domain_flush_np_cache(domain, iova, page_size);
>  
> 


Re: [PATCH -next] iommu/amd: fix a warning in increase_address_space

2019-10-17 Thread Qian Cai



> On Oct 16, 2019, at 6:59 PM, Jerry Snitselaar  wrote:
> 
> I guess the mode level 6 check is really for other potential callers
> increase_address_space, none exist at the moment, and the condition
> of the while loop in alloc_pte should fail if the mode level is 6.

Since there is no locking around iommu_map_page(), if there are several 
concurrent callers of it for the same domain, could it silently corrupt 
data due to invalid accesses?

[PATCH -next] iommu/amd: fix a warning in increase_address_space

2019-10-16 Thread Qian Cai
After commit 754265bcab78 ("iommu/amd: Fix race in
increase_address_space()"), it is still possible to trigger a race
condition under heavy memory pressure, as shown below. The race that
triggers the warning is:

CPU0:                                           CPU1:
in alloc_pte():
while (address > PM_LEVEL_SIZE(domain->mode)) [1]
                                                in increase_address_space():
                                                spin_lock_irqsave(&domain->lock)
                                                domain->mode += 1;
                                                spin_unlock_irqrestore(&domain->lock)
in increase_address_space():
spin_lock_irqsave(&domain->lock)
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))

[1] domain->mode = 5

It is unclear yet whether triggering the warning is the root cause of the
smartpqi offline, but let's fix it first by lifting the locking into the
caller.

WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474
iommu_map_page+0x718/0x7e0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec flags=0x0010]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1000 flags=0x0010]
CPU: 57 PID: 124314 Comm: oom01 Tainted: G   O
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
RIP: 0010:iommu_map_page+0x718/0x7e0
Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8
08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48
8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
RSP: 0018:888da4816cb8 EFLAGS: 00010046
RAX:  RBX: 8885fe689000 RCX: 96f4a6c4
RDX: 0007 RSI: dc00 RDI: 8885fe689124
RBP: 888da4816da8 R08: ed10bfcd120e R09: ed10bfcd120e
R10: ed10bfcd120d R11: 8885fe68906b R12: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1a00 flags=0x0010]
R13: 8885fe689068 R14: 8885fe689124 R15: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1e00 flags=0x0010]
FS:  7f29722ba700() GS:88902f88()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f27f82d8000 CR3: 00102ed9c000 CR4: 003406e0
Call Trace:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2000 flags=0x0010]
 map_sg+0x1ce/0x2f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2400 flags=0x0010]
 scsi_dma_map+0xd7/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2800 flags=0x0010]
 pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2c00 flags=0x0010]
 scsi_queue_rq+0xd19/0x1360
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3000 flags=0x0010]
 __blk_mq_try_issue_directly+0x295/0x3f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3400 flags=0x0010]
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3800 flags=0x0010]
 blk_mq_request_issue_directly+0xb5/0x100
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3c00 flags=0x0010]
 blk_mq_try_issue_list_directly+0xa9/0x160
 blk_mq_sched_insert_requests+0x228/0x380
 blk_mq_flush_plug_list+0x448/0x7e0
 blk_flush_plug_list+0x1eb/0x230
 blk_finish_plug+0x43/0x5d
 shrink_node_memcg+0x9c5/0x1550
smartpqi :23:00.0: controller is offline: status code 0x14803
smartpqi :23:00.0: controller offline

Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Qian Cai 
---

BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks
suspicious; I am not sure what its purpose is.

*updated = increase_address_space(domain, gfp) || *updated;

 drivers/iommu/amd_iommu.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2369b8af81f3..a5754068aa29 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain 
*domain)
 static bool increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
 {
-   unsigned long flags;
bool ret = false;
u64 *pte;
 
-   spin_lock_irqsave(&domain->lock, flags);
-
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
/* address space already 64 bit large */
goto out;
@@ -1487,8 +1484,6 @@ static bool increase_address_space(struct 
protection_domai

Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote:
> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
> > On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> > The x86 one might just be a mistake.
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index ad05484d0c80..63c4b894751d 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
> > unsigned long iova,
> > if (iommu_prot & IOMMU_WRITE)
> > prot |= IOMMU_PROT_IW;
> >  
> > -   ret = iommu_map_page(domain, iova, paddr, page_size, prot, 
> > GFP_KERNEL);
> > +   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
> 
> Yeah, that is a bug, I spotted that too.
> 
> > @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
> > *iommu_dma_get_msi_page(struct device *dev,
> > if (!iova)
> > goto out_free_page;
> >  
> > -   if (iommu_map(domain, iova, msi_addr, size, prot))
> > +   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
> > goto out_free_iova;
> 
> Not so sure this is a bug, this code is only about setting up MSIs on
> ARM. It probably doesn't need to be atomic.

The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It could be
called from an atomic context as showed in the arm64 call traces,

+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+   might_sleep();
+   return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+}


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> Hi Qian,
> 
> thanks for the report!
> 
> On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
> > On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
> > > Today's linux-next generates a lot of warnings on multiple servers during 
> > > boot
> > > due to the series "iommu/amd: Convert the AMD iommu driver to the 
> > > dma-iommu api"
> > > [1]. Reverted the whole things fixed them.
> > > 
> > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
> > > 
> > 
> > BTW, the previous x86 warning was from only reverted one patch "iommu: Add 
> > gfp
> > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting 
> > the
> > correct warning.

The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
 
domain_flush_np_cache(domain, iova, page_size);

For arm64 -- was the following forgotten?

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ecc08aef9b58..8dd0ef0656f4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
*iommu_dma_get_msi_page(struct device *dev,
if (!iova)
goto out_free_page;
 
-   if (iommu_map(domain, iova, msi_addr, size, prot))
+   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
goto out_free_iova;
 
INIT_LIST_HEAD(&msi_page->list);

Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
> Today's linux-next generates a lot of warnings on multiple servers during boot
> due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu 
> api"
> [1]. Reverted the whole things fixed them.
> 
> [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
> 

BTW, the previous x86 warning was from reverting only one patch, "iommu: Add
gfp parameter to iommu_ops::map", which proved to be insufficient. Now pasting
the correct warning.

[  564.365768][ T6222] BUG: sleeping function called from invalid context at
mm/page_alloc.c:4692
[  564.374447][ T6222] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
6222, name: git
[  564.382969][ T6222] INFO: lockdep is turned off.
[  564.387644][ T6222] CPU: 25 PID: 6222 Comm: git Tainted:
GW 5.4.0-rc3-next-20191016 #6
[  564.397011][ T6222] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[  564.406291][ T6222] Call Trace:
[  564.409470][ T6222]  dump_stack+0x86/0xca
[  564.413517][ T6222]  ___might_sleep.cold.92+0xd2/0x122
[  564.418694][ T6222]  __might_sleep+0x73/0xe0
[  564.422999][ T6222]  __alloc_pages_nodemask+0x442/0x720
[  564.428265][ T6222]  ? __alloc_pages_slowpath+0x18d0/0x18d0
[  564.433883][ T6222]  ? arch_stack_walk+0x7f/0xf0
[  564.438534][ T6222]  ? create_object+0x4a2/0x540
[  564.443188][ T6222]  alloc_pages_current+0x9c/0x110
[  564.448098][ T6222]  __get_free_pages+0x12/0x60
[  564.452659][ T6222]  get_zeroed_page+0x16/0x20
[  564.457137][ T6222]  amd_iommu_map+0x504/0x850
[  564.461612][ T6222]  ? amd_iommu_domain_direct_map+0x60/0x60
[  564.467312][ T6222]  ? lockdep_hardirqs_on+0x16/0x2a0
[  564.472400][ T6222]  ? alloc_iova+0x189/0x210
[  564.476790][ T6222]  __iommu_map+0x1c1/0x4e0
[  564.481090][ T6222]  ? iommu_get_dma_domain+0x40/0x40
[  564.486181][ T6222]  ? alloc_iova_fast+0x258/0x3d1
[  564.491009][ T6222]  ? create_object+0x4a2/0x540
[  564.495656][ T6222]  __iommu_map_sg+0xa5/0x130
[  564.500135][ T6222]  iommu_map_sg_atomic+0x14/0x20
[  564.504958][ T6222]  iommu_dma_map_sg+0x2c3/0x450
[  564.509699][ T6222]  scsi_dma_map+0xd7/0x160
[  564.514010][ T6222]  pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420
[smartpqi]
[  564.521811][ T6222]  ? pqi_alloc_io_request+0x127/0x140 [smartpqi]
[  564.528037][ T6222]  pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
[  564.534264][ T6222]  ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi]
[  564.541100][ T6222]  ? sd_init_command+0xa25/0x1346 [sd_mod]
[  564.546802][ T6222]  scsi_queue_rq+0xd19/0x1360
[  564.551372][ T6222]  __blk_mq_try_issue_directly+0x295/0x3f0
[  564.557071][ T6222]  ? blk_mq_request_bypass_insert+0xd0/0xd0
[  564.562860][ T6222]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  564.568384][ T6222]  blk_mq_try_issue_directly+0xad/0x130
[  564.573821][ T6222]  ? __blk_mq_try_issue_directly+0x3f0/0x3f0
[  564.579693][ T6222]  ? blk_add_rq_to_plug+0xcd/0x110
[  564.584693][ T6222]  blk_mq_make_request+0xcee/0x1120
[  564.589777][ T6222]  ? lock_downgrade+0x3c0/0x3c0
[  564.594517][ T6222]  ? blk_mq_try_issue_directly+0x130/0x130
[  564.600218][ T6222]  ? blk_queue_enter+0x78d/0x810
[  564.605041][ T6222]  ? generic_make_request_checks+0xf30/0xf30
[  564.610915][ T6222]  ? lock_downgrade+0x3c0/0x3c0
[  564.615655][ T6222]  ? __srcu_read_unlock+0x24/0x50
[  564.620565][ T6222]  ? generic_make_request+0x150/0x650
[  564.625833][ T6222]  generic_make_request+0x196/0x650
[  564.630921][ T6222]  ? blk_queue_enter+0x810/0x810
[  564.635747][ T6222]  submit_bio+0xaa/0x270
[  564.639873][ T6222]  ? submit_bio+0xaa/0x270
[  564.644172][ T6222]  ? generic_make_request+0x650/0x650
[  564.649437][ T6222]  ? iomap_readpage+0x260/0x260
[  564.654173][ T6222]  iomap_readpages+0x154/0x3d0
[  564.658823][ T6222]  ? iomap_zero_range_actor+0x330/0x330
[  564.664257][ T6222]  ? __might_sleep+0x73/0xe0
[  564.668836][ T6222]  xfs_vm_readpages+0xaf/0x1f0 [xfs]
[  564.674016][ T6222]  read_pages+0xe2/0x3b0
[  564.678142][ T6222]  ? read_cache_pages+0x350/0x350
[  564.683057][ T6222]  ? __page_cache_alloc+0x12c/0x230
[  564.688148][ T6222]  __do_page_cache_readahead+0x346/0x3a0
[  564.693670][ T6222]  ? read_pages+0x3b0/0x3b0
[  564.698059][ T6222]  ? lockdep_hardirqs_on+0x16/0x2a0
[  564.703247][ T6222]  ? __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  564.708947][ T6222]  filemap_fault+0xa13/0xe70
[  564.713528][ T6222]  __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  564.719059][ T6222]  ? kmemleak_alloc+0x57/0x90
[  564.723724][ T6222]  ? xfs_file_read_iter+0x3c0/0x3c0 [xfs]
[  564.729337][ T6222]  ? debug_check_no_locks_freed+0x2c/0xe0
[  564.734946][ T6222]  ? lockdep_init_map+0x8b/0x2b0
[  564.739872][ T6222]  xfs_filemap_fault+0x68/0x70 [xfs]
[  564.745046][ T6222]  __do_fault+0x83/0x220
[  564.749172][ T6222]  __handle_mm_fault+0xd76/0x1f40
[  564.754084][ T6222]  ? __pmd_alloc+0x280/0x280
[  564.758559][ T6222]  ? debug_loc

"Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
Today's linux-next generates a lot of warnings on multiple servers during boot
due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api"
[1]. Reverting the whole series fixed them.

[1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/

[  257.785749][ T6184] BUG: sleeping function called from invalid context at
mm/page_alloc.c:4692
[  257.794886][ T6184] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
6184, name: readelf
[  257.803574][ T6184] INFO: lockdep is turned off.
[  257.808233][ T6184] CPU: 86 PID: 6184 Comm: readelf Tainted:
GW 5.4.0-rc3-next-20191016+ #7
[  257.818035][ T6184] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[  257.827313][ T6184] Call Trace:
[  257.830487][ T6184]  dump_stack+0x86/0xca
[  257.834530][ T6184]  ___might_sleep.cold.92+0xd2/0x122
[  257.839708][ T6184]  __might_sleep+0x73/0xe0
[  257.844011][ T6184]  __alloc_pages_nodemask+0x442/0x720
[  257.849274][ T6184]  ? __alloc_pages_slowpath+0x18d0/0x18d0
[  257.854886][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  257.860415][ T6184]  ? lock_downgrade+0x3c0/0x3c0
[  257.865156][ T6184]  alloc_pages_current+0x9c/0x110
[  257.870071][ T6184]  __get_free_pages+0x12/0x60
[  257.874634][ T6184]  get_zeroed_page+0x16/0x20
[  257.879112][ T6184]  amd_iommu_map+0x504/0x850
[  257.883588][ T6184]  ? amd_iommu_domain_direct_map+0x60/0x60
[  257.889286][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  257.894373][ T6184]  ? alloc_iova+0x189/0x210
[  257.898765][ T6184]  ? trace_hardirqs_on+0x3a/0x160
[  257.903677][ T6184]  iommu_map+0x1b3/0x4d0
[  257.907802][ T6184]  ? iommu_unmap+0xf0/0xf0
[  257.912104][ T6184]  ? alloc_iova_fast+0x258/0x3d1
[  257.916929][ T6184]  ? create_object+0x4a2/0x540
[  257.921579][ T6184]  iommu_map_sg+0x9d/0x120
[  257.925882][ T6184]  iommu_dma_map_sg+0x2c3/0x450
[  257.930627][ T6184]  scsi_dma_map+0xd7/0x160
[  257.934936][ T6184]  pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420
[smartpqi]
[  257.942735][ T6184]  ? pqi_alloc_io_request+0x127/0x140 [smartpqi]
[  257.948962][ T6184]  pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
[  257.955192][ T6184]  ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi]
[  257.962029][ T6184]  ? sd_init_command+0xa25/0x1346 [sd_mod]
[  257.967730][ T6184]  scsi_queue_rq+0xd19/0x1360
[  257.972298][ T6184]  __blk_mq_try_issue_directly+0x295/0x3f0
[  257.977999][ T6184]  ? blk_mq_request_bypass_insert+0xd0/0xd0
[  257.983787][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  257.989312][ T6184]  blk_mq_request_issue_directly+0xb5/0x100
[  257.995098][ T6184]  ? blk_mq_flush_plug_list+0x7e0/0x7e0
[  258.000537][ T6184]  ? blk_mq_sched_insert_requests+0xd6/0x380
[  258.006409][ T6184]  ? lock_downgrade+0x3c0/0x3c0
[  258.011147][ T6184]  blk_mq_try_issue_list_directly+0xa9/0x160
[  258.017023][ T6184]  blk_mq_sched_insert_requests+0x228/0x380
[  258.022810][ T6184]  blk_mq_flush_plug_list+0x448/0x7e0
[  258.028073][ T6184]  ? blk_mq_insert_requests+0x380/0x380
[  258.033516][ T6184]  blk_flush_plug_list+0x1eb/0x230
[  258.038515][ T6184]  ? blk_insert_cloned_request+0x1b0/0x1b0
[  258.044215][ T6184]  blk_finish_plug+0x43/0x5d
[  258.048695][ T6184]  read_pages+0xf6/0x3b0
[  258.052823][ T6184]  ? read_cache_pages+0x350/0x350
[  258.057737][ T6184]  ? __page_cache_alloc+0x12c/0x230
[  258.062826][ T6184]  __do_page_cache_readahead+0x346/0x3a0
[  258.068348][ T6184]  ? read_pages+0x3b0/0x3b0
[  258.072738][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  258.077928][ T6184]  ? __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  258.083625][ T6184]  filemap_fault+0xa13/0xe70
[  258.088201][ T6184]  __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  258.093731][ T6184]  ? kmemleak_alloc+0x57/0x90
[  258.098397][ T6184]  ? xfs_file_read_iter+0x3c0/0x3c0 [xfs]
[  258.104009][ T6184]  ? debug_check_no_locks_freed+0x2c/0xe0
[  258.109618][ T6184]  ? lockdep_init_map+0x8b/0x2b0
[  258.114543][ T6184]  xfs_filemap_fault+0x68/0x70 [xfs]
[  258.119720][ T6184]  __do_fault+0x83/0x220
[  258.123847][ T6184]  __handle_mm_fault+0xd76/0x1f40
[  258.128757][ T6184]  ? __pmd_alloc+0x280/0x280
[  258.133231][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  258.138755][ T6184]  ? handle_mm_fault+0x178/0x4c0
[  258.143581][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  258.148674][ T6184]  ? __do_page_fault+0x29c/0x640
[  258.153501][ T6184]  handle_mm_fault+0x205/0x4c0
[  258.158151][ T6184]  __do_page_fault+0x29c/0x640
[  258.162800][ T6184]  do_page_fault+0x50/0x37f
[  258.167189][ T6184]  page_fault+0x34/0x40
[  258.171228][ T6184] RIP: 0010:__clear_user+0x3b/0x70

[  183.553150] BUG: sleeping function called from invalid context at
drivers/iommu/iommu.c:1950
[  183.562306] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1486,
name: kworker/0:3
[  183.571450] 5 locks held by kworker/0:3/1486:
[  183.576510]  #0: 44ff0008000ce128 ((wq_completion)events){+.+.}, at:
process_one_work+0x25c/0x948
[  183.5861

Re: [PATCH] dma/coherent: remove unused dma_get_device_base()

2019-09-25 Thread Qian Cai
Ping. Please take a look at this trivial patch.

On Tue, 2019-09-17 at 09:00 -0400, Qian Cai wrote:
> dma_get_device_base() was first introduced in the commit c41f9ea998f3
> ("drivers: dma-coherent: Account dma_pfn_offset when used with device
> tree"). Later, it was removed by the commit 43fc509c3efb ("dma-coherent:
> introduce interface for default DMA pool")
> 
> Found by the -Wunused-function compilation warning.
> 
> Signed-off-by: Qian Cai 
> ---
>  kernel/dma/coherent.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
> index 29fd6590dc1e..909b38e1c29b 100644
> --- a/kernel/dma/coherent.c
> +++ b/kernel/dma/coherent.c
> @@ -28,15 +28,6 @@ static inline struct dma_coherent_mem 
> *dev_get_coherent_memory(struct device *de
>   return NULL;
>  }
>  
> -static inline dma_addr_t dma_get_device_base(struct device *dev,
> -  struct dma_coherent_mem * mem)
> -{
> - if (mem->use_dev_dma_pfn_offset)
> - return (mem->pfn_base - dev->dma_pfn_offset) << PAGE_SHIFT;
> - else
> - return mem->device_base;
> -}
> -
>  static int dma_init_coherent_memory(phys_addr_t phys_addr,
>   dma_addr_t device_addr, size_t size,
>   struct dma_coherent_mem **mem)


[PATCH] dma/coherent: remove unused dma_get_device_base()

2019-09-17 Thread Qian Cai
dma_get_device_base() was first introduced in commit c41f9ea998f3
("drivers: dma-coherent: Account dma_pfn_offset when used with device
tree"). Later, its last user was removed by commit 43fc509c3efb
("dma-coherent: introduce interface for default DMA pool"), leaving the
function unused.

Found by the -Wunused-function compilation warning.

Signed-off-by: Qian Cai 
---
 kernel/dma/coherent.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 29fd6590dc1e..909b38e1c29b 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -28,15 +28,6 @@ static inline struct dma_coherent_mem 
*dev_get_coherent_memory(struct device *de
return NULL;
 }
 
-static inline dma_addr_t dma_get_device_base(struct device *dev,
-struct dma_coherent_mem * mem)
-{
-   if (mem->use_dev_dma_pfn_offset)
-   return (mem->pfn_base - dev->dma_pfn_offset) << PAGE_SHIFT;
-   else
-   return mem->device_base;
-}
-
 static int dma_init_coherent_memory(phys_addr_t phys_addr,
dma_addr_t device_addr, size_t size,
struct dma_coherent_mem **mem)
-- 
1.8.3.1



Re: [PATCH] iommu/arm-smmu: Report USF more clearly

2019-09-13 Thread Qian Cai
On Fri, 2019-09-13 at 12:48 +0100, Robin Murphy wrote:
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.
> 
> CC: Douglas Anderson 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu.c | 5 +
>  drivers/iommu/arm-smmu.h | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..76ac8c180695 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void 
> *dev)
>   dev_err_ratelimited(smmu->dev,
>   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
> 0x%08x\n",
>   gfsr, gfsynr0, gfsynr1, gfsynr2);
> + if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> + (gfsr & sGFSR_USF))
> + dev_err_ratelimited(smmu->dev,
> + "Stream ID %hu may not be described by firmware, try 
> booting with \"arm-smmu.disable_bypass=0\"\n",
> + (u16)gfsynr1);

Use dev_err_once() instead, i.e., there is no need to remind people to set
"arm-smmu.disable_bypass=0" multiple times.

>  
>   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>   return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index c9c13b5785f2..46f7e161e83e 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -79,6 +79,8 @@
>  #define ID7_MINORGENMASK(3, 0)
>  
>  #define ARM_SMMU_GR0_sGFSR   0x48
> +#define sGFSR_USFBIT(2)
> +
>  #define ARM_SMMU_GR0_sGFSYNR00x50
>  #define ARM_SMMU_GR0_sGFSYNR10x54
>  #define ARM_SMMU_GR0_sGFSYNR20x58


Re: [PATCH] iommu/vt-d: Make function signatures static

2019-09-10 Thread Qian Cai
On Tue, 2019-09-10 at 10:15 +0200, Joerg Roedel wrote:
> On Sat, Sep 07, 2019 at 04:49:33PM +1000, Adam Zerella wrote:
> >  drivers/iommu/intel-iommu.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Applied, thanks.

Joerg, not sure if you saw the reply from Lu,

https://lore.kernel.org/linux-iommu/ba8d4792-3b62-98a8-31d8-74a08be2f983@linux.intel.com/

This patch does not even compile for me, either.

WARNING: "intel_iommu_gfx_mapped" [vmlinux] is a static EXPORT_SYMBOL_GPL
drivers/iommu/dmar.o: In function `iommu_device_set_ops':
/home/linux-mm/linux/./include/linux/iommu.h:382: undefined reference to
`intel_iommu_ops'
make: *** [Makefile:1096: vmlinux] Error 1


Re: [RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-05 Thread Qian Cai
On Thu, 2019-09-05 at 13:43 +0200, Joerg Roedel wrote:
> Hi Qian,
> 
> On Wed, Sep 04, 2019 at 05:24:22PM -0400, Qian Cai wrote:
> > if (domain->mode == PAGE_MODE_6_LEVEL)
> > /* address space already 64 bit large */
> > return false;
> > 
> > This gives a clue that there must be a race between multiple concurrent
> > threads in increase_address_space().
> 
> Thanks for tracking this down, there is a race indeed.
> 
> > +   mutex_lock(&domain->api_lock);
> > *dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
> >  size, DMA_BIDIRECTIONAL, dma_mask);
> > +   mutex_unlock(&domain->api_lock);
> >  
> > if (*dma_addr == DMA_MAPPING_ERROR)
> > goto out_free;
> > @@ -2696,7 +2698,9 @@ static void free_coherent(struct device *dev, size_t 
> > size,
> >  
> > dma_dom = to_dma_ops_domain(domain);
> >  
> > +   mutex_lock(&domain->api_lock);
> > __unmap_single(dma_dom, dma_addr, size, DMA_BIDIRECTIONAL);
> > +   mutex_unlock(&domain->api_lock);
> 
> But I think the right fix is to lock the operation in
> increase_address_space() directly, and not the calls around it, like in
> the diff below. It is untested, so can you please try it and report back
> if it fixes your issue?

Yes, it works great so far.

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index b607a92791d3..1ff705f16239 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1424,18 +1424,21 @@ static void free_pagetable(struct protection_domain 
> *domain)
>   * another level increases the size of the address space by 9 bits to a size 
> up
>   * to 64 bits.
>   */
> -static bool increase_address_space(struct protection_domain *domain,
> +static void increase_address_space(struct protection_domain *domain,
>  gfp_t gfp)
>  {
> + unsigned long flags;
>   u64 *pte;
>  
> - if (domain->mode == PAGE_MODE_6_LEVEL)
> + spin_lock_irqsave(&domain->lock, flags);
> +
> + if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
>   /* address space already 64 bit large */
> - return false;
> + goto out;
>  
>   pte = (void *)get_zeroed_page(gfp);
>   if (!pte)
> - return false;
> + goto out;
>  
>   *pte = PM_LEVEL_PDE(domain->mode,
>   iommu_virt_to_phys(domain->pt_root));
> @@ -1443,7 +1446,10 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>   domain->mode+= 1;
>   domain->updated  = true;
>  
> - return true;
> +out:
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> + return;
>  }
>  
>  static u64 *alloc_pte(struct protection_domain *domain,


[RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-04 Thread Qian Cai
When the system is under memory pressure, it can cause disks on the
"smartpqi" driver to go offline, as shown below. The UBSAN report
indicates that "domain->mode" becomes 7. Further investigation shows
that the only place that increases "domain->mode" is
increase_address_space(), but it has this check before actually
increasing it,

if (domain->mode == PAGE_MODE_6_LEVEL)
/* address space already 64 bit large */
return false;

This gives a clue that there must be a race between multiple concurrent
threads in increase_address_space().

In amd_iommu_map() and amd_iommu_unmap(), there are mutex_lock() and
mutex_unlock() calls around iommu_map_page(). However, there are several
other places that can call iommu_map_page() with interrupts enabled. One
for sure is alloc_coherent(), which calls __map_single(); the latter has
a comment saying that a lock needs to be held, but none is. I am less
sure about map_page() and map_sg(), where my investigation so far
indicates that many call sites are in atomic context.

smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xff702a00 flags=0x0010]
UBSAN: Undefined behaviour in drivers/iommu/amd_iommu.c:1464:29
shift exponent 66 is too large for 64-bit type 'long unsigned int'
CPU: 39 PID: 821 Comm: kswapd4 Tainted: G   O
5.3.0-rc7-next-20190903+ #4
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
Call Trace:
 dump_stack+0x62/0x9a
 ubsan_epilogue+0xd/0x3a
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xff702c80 flags=0x0010]
 __ubsan_handle_shift_out_of_bounds.cold.12+0x21/0x68
 iommu_map_page.cold.39+0x7f/0x84
 map_sg+0x1ce/0x2f0
 scsi_dma_map+0xc6/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xff702900 flags=0x0010]
 pqi_scsi_queue_command+0x7d6/0xeb0 [smartpqi]
 scsi_queue_rq+0x7ee/0x12b0
 __blk_mq_try_issue_directly+0x295/0x3f0
 blk_mq_try_issue_directly+0xad/0x130
 blk_mq_make_request+0xb12/0xee0
 generic_make_request+0x179/0x4a0
 submit_bio+0xaa/0x270
 __swap_writepage+0x8f5/0xba0
 swap_writepage+0x65/0xb0
 pageout.isra.3+0x3e5/0xa00
 shrink_page_list+0x15a0/0x2660
 shrink_inactive_list+0x373/0x770
 shrink_node_memcg+0x4ff/0x1550
 shrink_node+0x123/0x800
 balance_pgdat+0x493/0x9b0
 kswapd+0x39b/0x7f0
 kthread+0x1df/0x200
 ret_from_fork+0x22/0x40

smartpqi :23:00.0: resetting scsi 0:1:0:0
INFO: task khugepaged:797 blocked for more than 122 seconds.
  Tainted: G   O  5.3.0-rc7-next-20190903+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
khugepaged  D22656   797  2 0x80004000
Call Trace:
 __schedule+0x4bb/0xc20
 schedule+0x6c/0x140
 io_schedule+0x21/0x50
 rq_qos_wait+0x18e/0x2b0
 wbt_wait+0x12b/0x1b0
 __rq_qos_throttle+0x3b/0x60
 blk_mq_make_request+0x217/0xee0
 generic_make_request+0x179/0x4a0
 submit_bio+0xaa/0x270
 __swap_writepage+0x8f5/0xba0
 swap_writepage+0x65/0xb0
 pageout.isra.3+0x3e5/0xa00
 shrink_page_list+0x15a0/0x2660
 shrink_inactive_list+0x373/0x770
 shrink_node_memcg+0x4ff/0x1550
 shrink_node+0x123/0x800
 do_try_to_free_pages+0x22f/0x820
 try_to_free_pages+0x242/0x4d0
 __alloc_pages_nodemask+0x9f6/0x1bb0
 khugepaged_alloc_page+0x6f/0xf0
 collapse_huge_page+0x103/0x1060
 khugepaged_scan_pmd+0x840/0xa70
 khugepaged+0x1571/0x18d0
 kthread+0x1df/0x200
 ret_from_fork+0x22/0x40
INFO: task systemd-journal:3112 blocked for more than 123 seconds.
  Tainted: G   O  5.3.0-rc7-next-20190903+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
systemd-journal D20744  3112  1 0x4120
Call Trace:
 __do_page_cache_readahead+0x149/0x3a0
 filemap_fault+0x9da/0x1050
 __xfs_filemap_fault+0x167/0x450 [xfs]
 xfs_filemap_fault+0x68/0x70 [xfs]
 __do_fault+0x83/0x220
 __handle_mm_fault+0x1034/0x1a50
 handle_mm_fault+0x17f/0x37e
 __do_page_fault+0x369/0x630
 do_page_fault+0x50/0x2d3
 page_fault+0x2f/0x40
RIP: 0033:0x56088e81faa0
Code: Bad RIP value.
RSP: 002b:7ffdc2158ba8 EFLAGS: 00010...
[remaining hung-task backtraces interleaved and truncated in the captured
 log: further reclaim (shrink_node, try_to_free_pages) and writeback
 throttling (rq_qos_wait, wbt_wait, blk_mq_make_request) stacks repeat as
 above]

"Rework enabling/disabling of ATS for PCI masters" failed to compile on arm64

2019-09-02 Thread Qian Cai
The linux-next commit “iommu/arm-smmu-v3: Rework enabling/disabling of ATS for 
PCI masters” [1] causes a compilation error when PCI_ATS=n on arm64.

[1] https://lore.kernel.org/linux-iommu/20190820154549.17018-3-w...@kernel.org/

drivers/iommu/arm-smmu-v3.c:2325:35: error: no member named 'ats_cap' in 
'struct pci_dev'
return !pdev->untrusted && pdev->ats_cap;
     ^

For example,

Symbol: PCI_ATS [=n]
  │ Type  : bool
  │   Defined at drivers/pci/Kconfig:118
  │   Depends on: PCI [=y] 
  │   Selected by [n]: 
  │   - PCI_IOV [=n] && PCI [=y] 
  │   - PCI_PRI [=n] && PCI [=y] 
  │   - PCI_PASID [=n] && PCI [=y] 
  │   - AMD_IOMMU [=n] && IOMMU_SUPPORT [=y] && X86_64 && PCI [=y] && ACPI [=y]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] iommu/amd: silence warnings under memory pressure

2019-08-28 Thread Qian Cai
When running heavy memory pressure workloads, the system is throwing
endless warnings,

smartpqi :23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages:
5 reason: -12)
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0,4
Call Trace:
 
 dump_stack+0x62/0x9a
 warn_alloc.cold.43+0x8a/0x148
 __alloc_pages_nodemask+0x1a5c/0x1bb0
 get_zeroed_page+0x16/0x20
 iommu_map_page+0x477/0x540
 map_sg+0x1ce/0x2f0
 scsi_dma_map+0xc6/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
 do_IRQ+0x81/0x170
 common_interrupt+0xf/0xf
 

because the allocation inside iommu_map_page() can fail, and the call
volume can be huge, generating a flood of serial console output that
consumes all CPUs.

Fix it by silencing the warning at this call site; a later dev_err()
still reports the failure.

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..19eef1edf8ed 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2547,7 +2547,9 @@ static int map_sg(struct device *dev, struct scatterlist 
*sglist,
 
bus_addr  = address + s->dma_address + (j << 
PAGE_SHIFT);
phys_addr = (sg_phys(s) & PAGE_MASK) + (j << 
PAGE_SHIFT);
-   ret = iommu_map_page(domain, bus_addr, phys_addr, 
PAGE_SIZE, prot, GFP_ATOMIC);
+   ret = iommu_map_page(domain, bus_addr, phys_addr,
+PAGE_SIZE, prot,
+GFP_ATOMIC | __GFP_NOWARN);
if (ret)
goto out_unmap;
 
-- 
1.8.3.1



[PATCH v2] iommu/amd: fix a crash in iova_magazine_free_pfns

2019-07-11 Thread Qian Cai
The commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops
method") incorrectly changed the error checking for dma_ops_alloc_iova()
in map_sg(), causing a crash under memory pressure: dma_ops_alloc_iova()
returns 0 on failure, never DMA_MAPPING_ERROR, so the error handling is
wrong.

   kernel BUG at drivers/iommu/iova.c:801!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
 free_cpu_cached_iovas+0xbd/0x150
 alloc_iova_fast+0x8c/0xba
 dma_ops_alloc_iova.isra.6+0x65/0xa0
 map_sg+0x8c/0x2a0
 scsi_dma_map+0xc6/0x160
 pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
 pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x249/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x522/0xa10
 worker_thread+0x63/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40

Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method")
Signed-off-by: Qian Cai 
---

v2: Fix the offensive commit directly.

 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73740b969e62..b607a92791d3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2533,7 +2533,7 @@ static int map_sg(struct device *dev, struct scatterlist 
*sglist,
npages = sg_num_pages(dev, sglist, nelems);
 
address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
-   if (address == DMA_MAPPING_ERROR)
+   if (!address)
goto out_err;
 
prot = dir2prot(direction);
-- 
1.8.3.1



[PATCH] iommu/amd: fix a crash in iova_magazine_free_pfns

2019-07-10 Thread Qian Cai
When a system is under heavy memory pressure, the allocation in
alloc_iova_fast() can still fail even with flush_rcache=true, and
dma_ops_alloc_iova() then returns 0.

pqi_scsi_queue_command
  pqi_raid_submit_scsi_cmd_with_io_request
scsi_dma_map
  map_sg
dma_ops_alloc_iova
 alloc_iova_fast

Later, map_sg()->iommu_map_page() would likely fail due to the invalid
PFN 0, and call free_iova_fast()->iova_rcache_insert() to insert it into
the rcache. That eventually triggers the BUG_ON(!iova) below.

kernel BUG at drivers/iommu/iova.c:801!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
 free_cpu_cached_iovas+0xbd/0x150
 alloc_iova_fast+0x8c/0xba
 dma_ops_alloc_iova.isra.6+0x65/0xa0
 map_sg+0x8c/0x2a0
 scsi_dma_map+0xc6/0x160
 pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
 pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x249/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x522/0xa10
 worker_thread+0x63/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40

Fix it by validating the return from the 2nd alloc_iova_fast() in
dma_ops_alloc_iova(), so map_sg() could handle the error condition
immediately.

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73740b969e62..f24c689b4e01 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1697,6 +1697,8 @@ static unsigned long dma_ops_alloc_iova(struct device 
*dev,
if (!pfn)
pfn = alloc_iova_fast(&dma_dom->iovad, pages,
  IOVA_PFN(dma_mask), true);
+   if (!pfn)
+   return DMA_MAPPING_ERROR;
 
return (pfn << PAGE_SHIFT);
 }
-- 
1.8.3.1



[PATCH] iommu/intel: remove an unused variable "length"

2019-06-17 Thread Qian Cai
The linux-next commit "iommu/vt-d: Duplicate iommu_resv_region objects
per device list" [1] left behind an unused variable,

drivers/iommu/intel-iommu.c: In function 'dmar_parse_one_rmrr':
drivers/iommu/intel-iommu.c:4014:9: warning: variable 'length' set but
not used [-Wunused-but-set-variable]

[1] https://lore.kernel.org/patchwork/patch/1083073/

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 478ac186570b..d86d4ee5cc78 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4011,7 +4011,6 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header 
*header, void *arg)
 {
struct acpi_dmar_reserved_memory *rmrr;
struct dmar_rmrr_unit *rmrru;
-   size_t length;
 
rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
if (!rmrru)
@@ -4022,8 +4021,6 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header 
*header, void *arg)
rmrru->base_address = rmrr->base_address;
rmrru->end_address = rmrr->end_address;
 
-   length = rmrr->end_address - rmrr->base_address + 1;
-
rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
((void *)rmrr) + rmrr->header.length,
&rmrru->devices_cnt);
-- 
2.20.1 (Apple Git-117)



Re: [PATCH 0/6] iommu/vt-d: Fixes and cleanups for linux-next

2019-06-11 Thread Qian Cai
On Sun, 2019-06-09 at 10:37 +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> This series includes several fixes and cleanups after delegating
> DMA domain to generic iommu. Please review and consider them for
> linux-next.
> 
> Best regards,
> Baolu
> 
> Lu Baolu (5):
>   iommu/vt-d: Don't return error when device gets right domain
>   iommu/vt-d: Set domain type for a private domain
>   iommu/vt-d: Don't enable iommu's which have been ignored
>   iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
>   iommu/vt-d: Consolidate domain_init() to avoid duplication
> 
> Sai Praneeth Prakhya (1):
>   iommu/vt-d: Cleanup after delegating DMA domain to generic iommu
> 
>  drivers/iommu/intel-iommu.c | 210 +---
>  1 file changed, 53 insertions(+), 157 deletions(-)
> 

BTW, the linux-next commit "iommu/vt-d: Expose ISA direct mapping region via
iommu_get_resv_regions" [1] also introduced the memory leak below: it forgets
to make intel_iommu_put_resv_regions() call kfree() when
CONFIG_INTEL_IOMMU_FLOPPY_WA=y.

[1] https://lore.kernel.org/patchwork/patch/1078963/

unreferenced object 0x88912ef789c8 (size 64):
  comm "swapper/0", pid 1, jiffies 4294946232 (age 5399.530s)
  hex dump (first 32 bytes):
48 83 f7 2e 91 88 ff ff 30 fa e3 00 82 88 ff ff  H...0...
00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00  
  backtrace:
[] kmem_cache_alloc_trace+0x266/0x380
[] iommu_alloc_resv_region+0x40/0xb0
[] intel_iommu_get_resv_regions+0x25e/0x2d0
[<21fbc6c3>] iommu_group_create_direct_mappings+0x159/0x3d0
[<22259268>] iommu_group_add_device+0x17b/0x4f0
[<28b91093>] iommu_group_get_for_dev+0x153/0x460
[<577c33b4>] intel_iommu_add_device+0xc4/0x210
[<587b7492>] iommu_probe_device+0x63/0x80
[<4aa997d1>] add_iommu_group+0xe/0x20
[] bus_for_each_dev+0xf0/0x150
[] bus_set_iommu+0xc6/0x100
[] intel_iommu_init+0x682/0xb0a
[<226f7444>] pci_iommu_init+0x26/0x62
[<2d8694f5>] do_one_initcall+0xe5/0x3ea
[<4bc60101>] kernel_init_freeable+0x5ad/0x640
[<91b0bad6>] kernel_init+0x11/0x138



Re: "iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-11 Thread Qian Cai



> On Jun 10, 2019, at 9:41 PM, Lu Baolu  wrote:
> 
> Ah, good catch!
> 
> The device failed to be attached by a DMA domain. Can you please try the
> attached fix patch?

It works fine.

> 
> [  101.885468] pci :06:00.0: DMAR: Device is ineligible for IOMMU
> domain attach due to platform RMRR requirement.  Contact your platform
> vendor.
> [  101.900801] pci :06:00.0: Failed to add to iommu group 23: -1
> 
> Best regards,
> Baolu
> 
> On 6/10/19 10:54 PM, Qian Cai wrote:
>> On Mon, 2019-06-10 at 09:44 -0400, Qian Cai wrote:
>>> On Sun, 2019-06-09 at 10:43 +0800, Lu Baolu wrote:
>>>> Hi Qian,
>>>> 
>>>> I just posted some fix patches. I cc'ed them in your email inbox as
>>>> well. Can you please check whether they happen to fix your issue?
>>>> If not, do you mind posting more debug messages?
>>> 
>>> Unfortunately, it does not work. Here is the dmesg.
>>> 
>>> https://raw.githubusercontent.com/cailca/tmp/master/dmesg?token=AMC35QKPIZBYUM
>>> FUQKLW4ZC47ZPIK
>> This one should be good to view.
>> https://cailca.github.io/files/dmesg.txt
> <0001-iommu-vt-d-Allow-DMA-domain-attaching-to-rmrr-locked.patch>



Re: "iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-10 Thread Qian Cai
On Mon, 2019-06-10 at 09:44 -0400, Qian Cai wrote:
> On Sun, 2019-06-09 at 10:43 +0800, Lu Baolu wrote:
> > Hi Qian,
> > 
> > I just posted some fix patches. I cc'ed them in your email inbox as
> > well. Can you please check whether they happen to fix your issue?
> > If not, do you mind posting more debug messages?
> 
> Unfortunately, it does not work. Here is the dmesg.
> 
> https://raw.githubusercontent.com/cailca/tmp/master/dmesg?token=AMC35QKPIZBYUM
> FUQKLW4ZC47ZPIK

This one should be good to view.

https://cailca.github.io/files/dmesg.txt


Re: "iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-10 Thread Qian Cai
On Sun, 2019-06-09 at 10:43 +0800, Lu Baolu wrote:
> Hi Qian,
> 
> I just posted some fix patches. I cc'ed them in your email inbox as
> well. Can you please check whether they happen to fix your issue?
> If not, do you mind posting more debug messages?

Unfortunately, it does not work. Here is the dmesg.

https://raw.githubusercontent.com/cailca/tmp/master/dmesg?token=AMC35QKPIZBYUMFUQKLW4ZC47ZPIK



"iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-07 Thread Qian Cai
The linux-next series "iommu/vt-d: Delegate DMA domain to generic iommu" [1]
leaves a system with the rootfs on a megaraid_sas card unable to boot.

Reverting the whole series on top of linux-next (next-20190607) fixed
the issue.

The information regarding this storage card is,

[  116.466810][  T324] megaraid_sas :06:00.0: FW provided supportMaxExtLDs:
0   max_lds: 32
[  116.476052][  T324] megaraid_sas :06:00.0: controller type   :
iMR(0MB)
[  116.483646][  T324] megaraid_sas :06:00.0: Online Controller Reset(OCR)  
: Enabled
[  116.492403][  T324] megaraid_sas :06:00.0: Secure JBOD support   :
Yes
[  116.499887][  T324] megaraid_sas :06:00.0: NVMe passthru support :
No
[  116.507480][  T324] megaraid_sas :06:00.0: FW provided
[  116.612523][  T324] megaraid_sas :06:00.0: NVME page size: (0)
[  116.629991][  T324] megaraid_sas :06:00.0: INIT adapter done
[  116.714789][  T324] megaraid_sas :06:00.0: pci id:
(0x1000)/(0x0017)/(0x1d49)/(0x0500)
[  116.724228][  T324] megaraid_sas :06:00.0: unevenspan support: no
[  116.731518][  T324] megaraid_sas :06:00.0: firmware crash dump   :
no
[  116.738981][  T324] megaraid_sas :06:00.0: jbod sync map :
yes
[  116.787433][  T324] scsi host0: Avago SAS based MegaRAID driver
[  117.081088][  T324] scsi 0:0:0:0: Direct-
Access LENOVO   ST900MM0168  L587 PQ: 0 ANSI: 6

[1] https://lore.kernel.org/patchwork/cover/1078960/


[PATCH -next] iommu/intel: silence a variable set but not used

2019-06-03 Thread Qian Cai
The commit "iommu/vt-d: Probe DMA-capable ACPI name space devices"
introduced a compilation warning: for_each_active_iommu() assigns the
"iommu" variable (i.e., "drhd->iommu") on each iteration, but the loop
body never uses it.

drivers/iommu/intel-iommu.c: In function 'probe_acpi_namespace_devices':
drivers/iommu/intel-iommu.c:4639:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
  struct intel_iommu *iommu;

Silence the warning the same way as in the commit d3ed71e5cc50
("drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used")

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b431cc6f6ba4..2897354a341a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4636,7 +4636,8 @@ static int __init platform_optin_force_iommu(void)
 static int __init probe_acpi_namespace_devices(void)
 {
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
+   /* To avoid a -Wunused-but-set-variable warning. */
+   struct intel_iommu *iommu __maybe_unused;
struct device *dev;
int i, ret = 0;
 
-- 
1.8.3.1



Re: [PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Qian Cai
On Mon, 2019-06-03 at 14:07 +0100, Robin Murphy wrote:
> On 03/06/2019 13:59, Qian Cai wrote:
> > There are a few macros in IOMMU have single-char identifiers make the
> > code hard to read and debug. Replace them with meaningful names.
> > 
> > Suggested-by: Andrew Morton 
> > Signed-off-by: Qian Cai 
> > ---
> >   include/linux/dmar.h | 14 --
> >   1 file changed, 8 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> > index f8af1d770520..eb634912f475 100644
> > --- a/include/linux/dmar.h
> > +++ b/include/linux/dmar.h
> > @@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
> >   
> >   #define   dmar_rcu_dereference(p) rcu_dereference_check((p),
> > dmar_rcu_check())
> >   
> > -#definefor_each_dev_scope(a, c, p, d)  \
> > -   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev)
> > : \
> > -   NULL, (p) < (c)); (p)++)
> > -
> > -#definefor_each_active_dev_scope(a, c, p, d)   \
> > -   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue;
> > } else
> > +#define for_each_dev_scope(devs, cnt, i, tmp)  
> > \
> > +   for ((i) = 0; ((tmp) = (i) < (cnt) ?
> > \
> 
> Given that "tmp" actually appears to be some sort of device cursor, I'm 
> not sure that that naming really achieves the stated goal of clarity :/

"tmp" is used in the callers everywhere, though I suppose something like
"tmp_dev" could be used if you prefer.

[PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Qian Cai
A few IOMMU macros use single-char identifiers that make the
code hard to read and debug. Replace them with meaningful names.

Suggested-by: Andrew Morton 
Signed-off-by: Qian Cai 
---
 include/linux/dmar.h | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
 
 #definedmar_rcu_dereference(p) rcu_dereference_check((p), 
dmar_rcu_check())
 
-#definefor_each_dev_scope(a, c, p, d)  \
-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev) : \
-   NULL, (p) < (c)); (p)++)
-
-#definefor_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue; } else
+#define for_each_dev_scope(devs, cnt, i, tmp)  \
+   for ((i) = 0; ((tmp) = (i) < (cnt) ?\
+   dmar_rcu_dereference((devs)[(i)].dev) : NULL, (i) < (cnt)); \
+   (i)++)
+
+#define for_each_active_dev_scope(devs, cnt, i, tmp)   \
+   for_each_dev_scope((devs), (cnt), (i), (tmp))   \
+   if (!(tmp)) { continue; } else
 
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
-- 
1.8.3.1



[PATCH -next] intel-iommu: fix a variable set but not used

2019-05-31 Thread Qian Cai
The commit "iommu/vt-d: Delegate the dma domain to upper layer" left an
unused variable,

drivers/iommu/intel-iommu.c: In function 'disable_dmar_iommu':
drivers/iommu/intel-iommu.c:1652:23: warning: variable 'domain' set but
not used [-Wunused-but-set-variable]

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b431cc6f6ba4..073c547f247a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1649,16 +1649,12 @@ static void disable_dmar_iommu(struct intel_iommu 
*iommu)
 
spin_lock_irqsave(&device_domain_lock, flags);
list_for_each_entry_safe(info, tmp, &device_domain_list, global) {
-   struct dmar_domain *domain;
-
if (info->iommu != iommu)
continue;
 
if (!info->dev || !info->domain)
continue;
 
-   domain = info->domain;
-
__dmar_remove_one_dev_info(info);
}
spin_unlock_irqrestore(&device_domain_lock, flags);
-- 
1.8.3.1



[PATCH] dma: replace single-char identifiers in macros

2019-05-28 Thread Qian Cai
A few DMA macros use single-char identifiers that make the code
hard to read. Replace them with meaningful names.

Signed-off-by: Qian Cai 
---
 include/linux/dma-mapping.h | 24 
 include/linux/dmar.h| 14 --
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a721394b..2f0151dcfa4e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -602,14 +602,22 @@ static inline void 
dma_sync_single_range_for_device(struct device *dev,
return dma_sync_single_for_device(dev, addr + offset, size, dir);
 }
 
-#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
-#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
-#define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
-#define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
-#define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
-#define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
-#define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
-#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
+#define dma_map_single(dev, ptr, size, direction)  \
+   dma_map_single_attrs(dev, ptr, size, direction, 0)
+#define dma_unmap_single(dev, addr, size, direction)   \
+   dma_unmap_single_attrs(dev, addr, size, direction, 0)
+#define dma_map_sg(dev, sg, mapped_ents, direction)\
+   dma_map_sg_attrs(dev, sg, mapped_ents, direction, 0)
+#define dma_unmap_sg(dev, sg, mapped_ents, direction)  \
+   dma_unmap_sg_attrs(dev, sg, mapped_ents, direction, 0)
+#define dma_map_page(dev, page, offset, size, direction)   \
+   dma_map_page_attrs(dev, page, offset, size, direction, 0)
+#define dma_unmap_page(dev, addr, size, direction) \
+   dma_unmap_page_attrs(dev, addr, size, direction, 0)
+#define dma_get_sgtable(dev, sgt, cpu_addr, dma_addr, size)\
+   dma_get_sgtable_attrs(dev, sgt, cpu_addr, dma_addr, size, 0)
+#define dma_mmap_coherent(dev, vma, cpu_addr, dma_addr, size)  \
+   dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, 0)
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
 
 #definedmar_rcu_dereference(p) rcu_dereference_check((p), 
dmar_rcu_check())
 
-#definefor_each_dev_scope(a, c, p, d)  \
-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev) : \
-   NULL, (p) < (c)); (p)++)
-
-#definefor_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue; } else
+#define for_each_dev_scope(devs, cnt, i, tmp)  \
+   for ((i) = 0; ((tmp) = (i) < (cnt) ?\
+   dmar_rcu_dereference((devs)[(i)].dev) : NULL, (i) < (cnt)); \
+   (i)++)
+
+#define for_each_active_dev_scope(devs, cnt, i, tmp)   \
+   for_each_dev_scope((devs), (cnt), (i), (tmp))   \
+   if (!(tmp)) { continue; } else
 
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
-- 
1.8.3.1



[PATCH v2] iommu/intel: fix variable 'iommu' set but not used

2019-05-22 Thread Qian Cai
The commit cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never uses the iterated element, i.e., "drhd->iommu".

drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
 struct intel_iommu *iommu;

Fix the warning by annotating the variable with the compiler attribute
__maybe_unused.

Suggested-by: Andrew Morton 
Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..09b8ff0d856a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3034,7 +3034,8 @@ static int __init 
iommu_prepare_static_identity_mapping(int hw)
 {
struct pci_dev *pdev = NULL;
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
+   /* To avoid a -Wunused-but-set-variable warning. */
+   struct intel_iommu *iommu __maybe_unused;
struct device *dev;
int i;
int ret = 0;
-- 
2.20.1 (Apple Git-117)



[RESEND PATCH] iommu/intel: fix variable 'iommu' set but not used

2019-05-22 Thread Qian Cai
The commit cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never uses the iterated element, i.e., "drhd->iommu".

drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
  struct intel_iommu *iommu;

Fix the warning by passing "drhd->iommu" directly to
for_each_active_iommu(); the resulting self-assignments should be
optimized away by the compiler anyway.

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..86e1ddcb4a8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3034,7 +3034,6 @@ static int __init 
iommu_prepare_static_identity_mapping(int hw)
 {
struct pci_dev *pdev = NULL;
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
struct device *dev;
int i;
int ret = 0;
@@ -3045,7 +3044,7 @@ static int __init 
iommu_prepare_static_identity_mapping(int hw)
return ret;
}
 
-   for_each_active_iommu(iommu, drhd)
+   for_each_active_iommu(drhd->iommu, drhd)
for_each_active_dev_scope(drhd->devices, drhd->devices_cnt, i, 
dev) {
struct acpi_device_physical_node *pn;
struct acpi_device *adev;
-- 
1.8.3.1



Re: [PATCH] iommu/amd: print out "tag" in INVALID_PPR_REQUEST

2019-05-14 Thread Qian Cai
On Tue, 2019-05-07 at 13:47 +, Gary R Hook wrote:
> On 5/5/19 11:11 PM, Qian Cai wrote:
> > [CAUTION: External Email]
> > 
> > The commit e7f63ffc1bf1 ("iommu/amd: Update logging information for new
> > event type") introduced a variable "tag" but had never used it which
> > generates a warning below,
> > 
> > drivers/iommu/amd_iommu.c: In function 'iommu_print_event':
> > drivers/iommu/amd_iommu.c:567:33: warning: variable 'tag' set but not
> > used [-Wunused-but-set-variable]
> >    int type, devid, pasid, flags, tag;
> >   ^~~
> > so just use it during the logging.
> > 
> > Signed-off-by: Qian Cai 
> > ---
> >   drivers/iommu/amd_iommu.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index f7cdd2ab7f11..52f41369c5b3 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -631,9 +631,9 @@ static void iommu_print_event(struct amd_iommu *iommu,
> > void *__evt)
> >  pasid = ((event[0] >> 16) & 0x)
> >  | ((event[1] << 6) & 0xF);
> >  tag = event[1] & 0x03FF;
> > -   dev_err(dev, "Event logged [INVALID_PPR_REQUEST
> > device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
> > +   dev_err(dev, "Event logged [INVALID_PPR_REQUEST
> > device=%02x:%02x.%x pasid=0x%05x tag=0x%04x address=0x%llx flags=0x%04x]\n",
> >  PCI_BUS_NUM(devid), PCI_SLOT(devid),
> > PCI_FUNC(devid),
> > -   pasid, address, flags);
> > +   pasid, tag, address, flags);
> >  break;
> >  default:
> >  dev_err(dev, "Event logged [UNKNOWN event[0]=0x%08x
> > event[1]=0x%08x event[2]=0x%08x event[3]=0x%08x\n",
> 
> I did manage to overlook that variable when I posted the original patch. 
> But it looks to me like 41e59a41fc5d1 (iommu tree) already fixed this... 
> I'm not sure why it never got pushed to the main tree.

Jroedel, I am wondering what the plan is for getting 41e59a41fc5d1 (iommu
tree) or this patch into linux-next or mainline...

[PATCH] iommu/intel: fix variable 'iommu' set but not used

2019-05-10 Thread Qian Cai
The commit cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never uses the iterated element, i.e., "drhd->iommu".

drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but not
used [-Wunused-but-set-variable]
  struct intel_iommu *iommu;

Fix the warning by passing "drhd->iommu" directly to
for_each_active_iommu(); the resulting self-assignments should be
optimized away by the compiler anyway.

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..86e1ddcb4a8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3034,7 +3034,6 @@ static int __init 
iommu_prepare_static_identity_mapping(int hw)
 {
struct pci_dev *pdev = NULL;
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
struct device *dev;
int i;
int ret = 0;
@@ -3045,7 +3044,7 @@ static int __init 
iommu_prepare_static_identity_mapping(int hw)
return ret;
}
 
-   for_each_active_iommu(iommu, drhd)
+   for_each_active_iommu(drhd->iommu, drhd)
for_each_active_dev_scope(drhd->devices, drhd->devices_cnt, i, 
dev) {
struct acpi_device_physical_node *pn;
struct acpi_device *adev;
-- 
1.8.3.1



[PATCH -next v2] iommu/amd: fix a null-ptr-deref in map_sg()

2019-05-06 Thread Qian Cai
The commit 1a1079011da3 ("iommu/amd: Flush not present cache in
iommu_map_page") added domain_flush_np_cache() in map_sg() which
triggered a crash below during boot. sg_next() returns NULL once
sg_is_last() is true, so after for_each_sg(sglist, s, nelems, i)
finishes, "s" is NULL, and we end up dereferencing a NULL pointer here,

domain_flush_np_cache(domain, s->dma_address, s->dma_length);

so move the domain_flush_np_cache() call inside for_each_sg() so it runs
for each sg element.

BUG: kernel NULL pointer dereference, address: 0018
PGD 0 P4D 0
Oops:  [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 8 PID: 659 Comm: kworker/8:1 Tainted: GB
5.1.0-rc7-next-20190506+ #20
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
01/25/2019
Workqueue: events work_for_cpu_fn
RIP: 0010:map_sg+0x297/0x2e0
Call Trace:
 scsi_dma_map+0xc6/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x3b4/0x470 [smartpqi]
 pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x2e1/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 __blk_mq_delay_run_hw_queue+0x2b7/0x2d0
 blk_mq_run_hw_queue+0x127/0x1d0
 blk_mq_sched_insert_request+0x25c/0x320
 __scsi_scan_target+0x14d/0x790
 scsi_scan_target+0x115/0x120
 sas_rphy_add+0x1d1/0x280 [scsi_transport_sas]
 pqi_add_sas_device+0x187/0x1e0 [smartpqi]
 pqi_update_device_list+0x1227/0x1460 [smartpqi]
 pqi_update_scsi_devices+0x755/0x1980 [smartpqi]
 pqi_scan_scsi_devices+0x57/0xf0 [smartpqi]
 pqi_ctrl_init+0x149e/0x14df [smartpqi]
 pqi_pci_probe.cold.49+0x808/0x818 [smartpqi]
 local_pci_probe+0x7a/0xc0
 work_for_cpu_fn+0x2e/0x50
 process_one_work+0x522/0xa10
 worker_thread+0x363/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40

Signed-off-by: Qian Cai 
---

v2: Call domain_flush_np_cache() inside for_each_sg().

 drivers/iommu/amd_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 867f8b155000..b7132812ce59 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2576,9 +2576,9 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 */
s->dma_address += address + (s->offset & ~PAGE_MASK);
s->dma_length   = s->length;
-   }
 
-   domain_flush_np_cache(domain, s->dma_address, s->dma_length);
+   domain_flush_np_cache(domain, s->dma_address, s->dma_length);
+   }
 
return nelems;
 
-- 
2.20.1 (Apple Git-117)
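The failure mode is easy to reproduce in miniature: for_each_sg() advances its cursor with sg_next(), which returns NULL past the last entry, so after the loop the cursor no longer points at the final element. A userspace sketch under hypothetical names (fake_sg standing in for struct scatterlist):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical miniature of a scatterlist: next is NULL on the last
 * entry, just as sg_next() returns NULL when sg_is_last() is true.
 */
struct fake_sg {
	struct fake_sg *next;
	int dma_length;
};

static struct fake_sg *fake_sg_next(struct fake_sg *s)
{
	return s->next;	/* NULL on the last entry */
}

#define for_each_fake_sg(start, s, n, i)			\
	for ((i) = 0, (s) = (start); (i) < (n);			\
	     (i)++, (s) = fake_sg_next(s))

/*
 * Return the cursor after iterating an n-entry list: it is always
 * NULL, which is why dereferencing it after for_each_sg() -- as the
 * flush call placed after the loop did -- oopses.
 */
static struct fake_sg *cursor_after_loop(struct fake_sg *start, int n)
{
	struct fake_sg *s;
	int i;

	for_each_fake_sg(start, s, n, i)
		;	/* per-element work, e.g. the flush, belongs here */
	return s;
}
```

Moving the per-element work inside the loop body, as v2 of the patch does, sidesteps the problem entirely.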



[PATCH -next] iommu/amd: fix a null-ptr-deref in map_sg()

2019-05-06 Thread Qian Cai
The commit 1a1079011da3 ("iommu/amd: Flush not present cache in
iommu_map_page") added domain_flush_np_cache() in map_sg() which
triggered a crash below during boot. sg_next() could return NULL if
sg_is_last() is true, so after for_each_sg(sglist, s, nelems, i), "s"
could be NULL, which ends up dereferencing a NULL pointer later here,

domain_flush_np_cache(domain, s->dma_address, s->dma_length);

BUG: kernel NULL pointer dereference, address: 0018
PGD 0 P4D 0
Oops:  [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 8 PID: 659 Comm: kworker/8:1 Tainted: GB
5.1.0-rc7-next-20190506+ #20
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
01/25/2019
Workqueue: events work_for_cpu_fn
RIP: 0010:map_sg+0x297/0x2e0
Call Trace:
 scsi_dma_map+0xc6/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x3b4/0x470 [smartpqi]
 pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x2e1/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 __blk_mq_delay_run_hw_queue+0x2b7/0x2d0
 blk_mq_run_hw_queue+0x127/0x1d0
 blk_mq_sched_insert_request+0x25c/0x320
 __scsi_scan_target+0x14d/0x790
 scsi_scan_target+0x115/0x120
 sas_rphy_add+0x1d1/0x280 [scsi_transport_sas]
 pqi_add_sas_device+0x187/0x1e0 [smartpqi]
 pqi_update_device_list+0x1227/0x1460 [smartpqi]
 pqi_update_scsi_devices+0x755/0x1980 [smartpqi]
 pqi_scan_scsi_devices+0x57/0xf0 [smartpqi]
 pqi_ctrl_init+0x149e/0x14df [smartpqi]
 pqi_pci_probe.cold.49+0x808/0x818 [smartpqi]
 local_pci_probe+0x7a/0xc0
 work_for_cpu_fn+0x2e/0x50
 process_one_work+0x522/0xa10
 worker_thread+0x363/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 867f8b155000..908f5618fb5c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2578,7 +2578,8 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
s->dma_length   = s->length;
}
 
-   domain_flush_np_cache(domain, s->dma_address, s->dma_length);
+   if (s)
+   domain_flush_np_cache(domain, s->dma_address, s->dma_length);
 
return nelems;
 
-- 
2.20.1 (Apple Git-117)



[PATCH] iommu/amd: print out "tag" in INVALID_PPR_REQUEST

2019-05-05 Thread Qian Cai
The commit e7f63ffc1bf1 ("iommu/amd: Update logging information for new
event type") introduced a variable "tag" but had never used it which
generates a warning below,

drivers/iommu/amd_iommu.c: In function 'iommu_print_event':
drivers/iommu/amd_iommu.c:567:33: warning: variable 'tag' set but not
used [-Wunused-but-set-variable]
  int type, devid, pasid, flags, tag;
 ^~~
so just use it during the logging.

Signed-off-by: Qian Cai 
---
 drivers/iommu/amd_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f7cdd2ab7f11..52f41369c5b3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -631,9 +631,9 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
pasid = ((event[0] >> 16) & 0x)
| ((event[1] << 6) & 0xF);
tag = event[1] & 0x03FF;
-   dev_err(dev, "Event logged [INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+   dev_err(dev, "Event logged [INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x tag=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-   pasid, address, flags);
+   pasid, tag, address, flags);
break;
default:
dev_err(dev, "Event logged [UNKNOWN event[0]=0x%08x 
event[1]=0x%08x event[2]=0x%08x event[3]=0x%08x\n",
-- 
2.20.1 (Apple Git-117)
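The value being logged is a plain bitfield extraction: the INVALID_PPR_REQUEST event carries a 10-bit PPR tag in the low bits of the second event word, masked with 0x03FF as in the code above. A small sketch of the extraction and of the tag=0x%04x rendering the new dev_err() format uses (helper names are made up):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract the 10-bit PPR tag from the second event word (mask from the patch). */
static uint32_t ppr_tag(uint32_t event1)
{
	return event1 & 0x03FF;
}

/* Render the tag the way the patched dev_err() format does: "tag=0x%04x". */
static void format_tag(char *buf, size_t len, uint32_t event1)
{
	snprintf(buf, len, "tag=0x%04x", ppr_tag(event1));
}
```

With the mask applied, the printed tag can never exceed 0x03ff, so the %04x field width always suffices.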



Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline

2019-05-05 Thread Qian Cai
On 4/26/19 10:52 AM, Qian Cai wrote:
> Applying some memory pressure would cause smartpqi to go offline even in
> today's linux-next. This can always be reproduced by an LTP test case [1]
> or sometimes just by compiling kernels.
> 
> Reverting the commit "iommu/amd: Set exclusion range correctly" fixed the 
> issue.
> 
> [  213.437112] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x address=0x1000 flags=0x]
> [  213.447659] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x address=0x1800 flags=0x]
> [  233.362013] smartpqi :23:00.0: controller is offline: status code 
> 0x14803
> [  233.369359] smartpqi :23:00.0: controller offline
> [  233.388915] print_req_error: I/O error, dev sdb, sector 3317352 flags 
> 201
> [  233.388921] sd 0:0:0:0: [sdb] tag#95 UNKNOWN(0x2003) Result: hostbyte=0x01
> driverbyte=0x00
> [  233.388931] sd 0:0:0:0: [sdb] tag#95 CDB: opcode=0x2a 2a 00 00 55 89 00 00 
> 01
> 08 00
> [  233.389003] Write-error on swap-device (254:1:4474640)
> [  233.389015] Write-error on swap-device (254:1:2190776)
> [  233.389023] Write-error on swap-device (254:1:8351936)
> 
> [1] /opt/ltp/testcases/bin/mtest01 -p80 -w

It turned out another linux-next commit is needed to reproduce this, i.e.,
7a5dbf3ab2f0 ("iommu/amd: Remove the leftover of bypass support"). Specifically,
the chunks for map_sg() and unmap_sg(). This has been reproduced on 3 different
HPE ProLiant DL385 Gen10 systems so far.

Reverting the chunks (map_sg() and unmap_sg()) on top of the latest
linux-next fixed the issue, while applying them on top of mainline v5.1
reproduced it immediately.

A lot of the time it triggered the BUG_ON(!iova) in
iova_magazine_free_pfns() instead of the smartpqi offline.

kernel BUG at drivers/iommu/iova.c:813!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
 free_cpu_cached_iovas+0xbd/0x150
 alloc_iova_fast+0x8c/0xba
 dma_ops_alloc_iova.isra.6+0x65/0xa0
 map_sg+0x8c/0x2a0
 scsi_dma_map+0xc6/0x160
 pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
 pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x249/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x522/0xa10
 worker_thread+0x63/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40

Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline

2019-05-03 Thread Qian Cai
On Mon, 2019-04-29 at 16:23 +0200, Joerg Roedel wrote:
> On Fri, Apr 26, 2019 at 11:55:12AM -0400, Qian Cai wrote:
> > https://git.sr.ht/~cai/linux-debug/blob/master/dmesg
> 
> Thanks, I can't see any definitions for unity ranges or exclusion ranges
> in the IVRS table dump, which makes it even more weird.
> 
> Can you please send me the output of
> 
>   for f in `ls -1 /sys/kernel/iommu_groups/*/reserved_regions`; do echo "-
> --$f"; cat $f;done
> 
> to double-check?

https://git.sr.ht/~cai/linux-debug/blob/master/iommu


Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline

2019-04-30 Thread Qian Cai



On 4/29/19 10:23 AM, Joerg Roedel wrote:
> On Fri, Apr 26, 2019 at 11:55:12AM -0400, Qian Cai wrote:
>> https://git.sr.ht/~cai/linux-debug/blob/master/dmesg
> 
> Thanks, I can't see any definitions for unity ranges or exclusion ranges
> in the IVRS table dump, which makes it even more weird.
> 
> Can you please send me the output of
> 
>   for f in `ls -1 /sys/kernel/iommu_groups/*/reserved_regions`; do echo 
> "---$f"; cat $f;done
> 
> to double-check?

It is going to take a while to reserve that system again to gather the
information. BTW, this is only reproducible on linux-next but not mainline.


Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline

2019-04-26 Thread Qian Cai
On Fri, 2019-04-26 at 17:26 +0200, Joerg Roedel wrote:
> On Fri, Apr 26, 2019 at 10:52:28AM -0400, Qian Cai wrote:
> > Applying some memory pressure would cause smartpqi to go offline even in
> > today's linux-next. This can always be reproduced by an LTP test case [1]
> > or sometimes just by compiling kernels.
> > 
> > Reverting the commit "iommu/amd: Set exclusion range correctly" fixed the
> > issue.
> > 
> > [  213.437112] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> > domain=0x address=0x1000 flags=0x]
> > [  213.447659] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> > domain=0x address=0x1800 flags=0x]
> > [  233.362013] smartpqi :23:00.0: controller is offline: status code
> > 0x14803
> > [  233.369359] smartpqi :23:00.0: controller offline
> > [  233.388915] print_req_error: I/O error, dev sdb, sector 3317352 flags
> > 201
> > [  233.388921] sd 0:0:0:0: [sdb] tag#95 UNKNOWN(0x2003) Result:
> > hostbyte=0x01
> > driverbyte=0x00
> > [  233.388931] sd 0:0:0:0: [sdb] tag#95 CDB: opcode=0x2a 2a 00 00 55 89 00
> > 00 01
> > 08 00
> > [  233.389003] Write-error on swap-device (254:1:4474640)
> > [  233.389015] Write-error on swap-device (254:1:2190776)
> > [  233.389023] Write-error on swap-device (254:1:8351936)
> > 
> > [1] /opt/ltp/testcases/bin/mtest01 -p80 -w
> 
> I can't explain that, can you please boot with 'amd_iommu_dump' on the
> kernel command line and send me dmesg after boot?

https://git.sr.ht/~cai/linux-debug/blob/master/dmesg

"iommu/amd: Set exclusion range correctly" causes smartpqi offline

2019-04-26 Thread Qian Cai
Applying some memory pressure would cause smartpqi to go offline even in
today's linux-next. This can always be reproduced by an LTP test case [1] or
sometimes just by compiling kernels.

Reverting the commit "iommu/amd: Set exclusion range correctly" fixed the issue.

[  213.437112] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x address=0x1000 flags=0x]
[  213.447659] smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x address=0x1800 flags=0x]
[  233.362013] smartpqi :23:00.0: controller is offline: status code 0x14803
[  233.369359] smartpqi :23:00.0: controller offline
[  233.388915] print_req_error: I/O error, dev sdb, sector 3317352 flags 201
[  233.388921] sd 0:0:0:0: [sdb] tag#95 UNKNOWN(0x2003) Result: hostbyte=0x01
driverbyte=0x00
[  233.388931] sd 0:0:0:0: [sdb] tag#95 CDB: opcode=0x2a 2a 00 00 55 89 00 00 01
08 00
[  233.389003] Write-error on swap-device (254:1:4474640)
[  233.389015] Write-error on swap-device (254:1:2190776)
[  233.389023] Write-error on swap-device (254:1:8351936)

[1] /opt/ltp/testcases/bin/mtest01 -p80 -w

Re: [PATCH v3 0/7] dma-debug cleanup and dynamic allocation

2018-12-10 Thread Qian Cai
On 12/10/18 9:00 AM, Robin Murphy wrote:
> Hi all,
> 
> Here's some assorted cleanup and improvements to dma-debug which grew
> out of the problem that certain drivers use very large numbers of DMA
> mappings, and knowing when to override "dma_debug_entries=..." and what
> value to override it with can be a less-than-obvious task for users.
> 
> The main part is patch #4, wherein we make dma-debug clever enough
> to allocate more entries dynamically if needed, such that the
> preallocation value becomes more of a quality-of-life option than a
> necessity. Patches #5 and #6 do some cruft-removal to allow patch #7
> to make the allocation behaviour more efficient in general.
> 
> Patches #1, #2 and #4 are some other cleanup and handy features which
> fell out of the discussion/development.
> 
> Robin.
> 
> 
> Robin Murphy (7):
>   dma-debug: Use pr_fmt()
>   dma-debug: Expose nr_total_entries in debugfs
>   dma-debug: Dynamically expand the dma_debug_entry pool
>   dma-debug: Make leak-like behaviour apparent
>   x86/dma/amd-gart: Stop resizing dma_debug_entry pool
>   dma/debug: Remove dma_debug_resize_entries()
>   dma-debug: Batch dma_debug_entry allocation
> 
>  Documentation/DMA-API.txt |  20 +-
>  Documentation/x86/x86_64/boot-options.txt |   5 +-
>  arch/x86/kernel/amd_gart_64.c |  23 ---
>  include/linux/dma-debug.h |   7 -
>  kernel/dma/debug.c        | 217 ++
>  5 files changed, 109 insertions(+), 163 deletions(-)
> 

Tested-by: Qian Cai 


[PATCH v2] dma-debug: Kconfig for PREALLOC_DMA_DEBUG_ENTRIES

2018-11-30 Thread Qian Cai
The number of DMA mappings from Hisilicon HNS ethernet devices is huge,
so it can trigger "DMA-API: debugging out of memory - disabling".

hnae_get_handle [1]
  hnae_init_queue
hnae_init_ring
  hnae_alloc_buffers [2]
debug_dma_map_page
  dma_entry_alloc

[1] for (i = 0; i < handle->q_num; i++)
[2] for (i = 0; i < ring->desc_num; i++)

Also, "#define HNS_DSAF_MAX_DESC_CNT 1024"

On this Huawei TaiShan 2280 aarch64 server, it has reached the limit
already,

4 (NICs) x 16 (queues) x 1024 (port description numbers) = 65536

Add a Kconfig entry for PREALLOC_DMA_DEBUG_ENTRIES to make it easier
for users to deal with special cases like this.

Signed-off-by: Qian Cai 
---

Changes since v1:
* Increased the default value if has HNS_ENET suggested by Robin.

 kernel/dma/debug.c |  9 ++---
 lib/Kconfig.debug  | 10 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..3752fb23f72f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -41,11 +41,6 @@
 #define HASH_FN_SHIFT   13
 #define HASH_FN_MASK(HASH_SIZE - 1)
 
-/* allow architectures to override this if absolutely required */
-#ifndef PREALLOC_DMA_DEBUG_ENTRIES
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-#endif
-
 enum {
dma_debug_single,
dma_debug_page,
@@ -132,7 +127,7 @@ static u32 min_free_entries;
 static u32 nr_total_entries;
 
 /* number of preallocated entries requested by kernel cmdline */
-static u32 nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+static u32 nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
 
 /* debugfs dentry's for the stuff above */
 static struct dentry *dma_debug_dent __read_mostly;
@@ -1063,7 +1058,7 @@ static __init int dma_debug_entries_cmdline(char *str)
if (!str)
return -EINVAL;
if (!get_option(&str, &nr_prealloc_entries))
-   nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+   nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
return 0;
 }
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af29b8224fd..9f85a7a13647 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1659,6 +1659,16 @@ config DMA_API_DEBUG
 
  If unsure, say N.
 
+config PREALLOC_DMA_DEBUG_ENTRIES
+   int "Preallocated DMA-API debugging entries"
+   depends on DMA_API_DEBUG
+   default 131072 if HNS_ENET
+   default 65536
+   help
+ The number of preallocated entries for DMA-API debugging code. One
+ entry is required per DMA-API allocation. Increase this if the DMA-API
+ debugging code disables itself because the default is too low.
+
 config DMA_API_DEBUG_SG
bool "Debug DMA scatter-gather usage"
default y
-- 
2.17.2 (Apple Git-113)
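The arithmetic in the changelog is exactly why the default pool overflows: the long-lived mappings pinned at driver init already equal the whole default pool of 1 << 16 entries, so the first additional mapping trips "debugging out of memory". A sketch of the accounting (the helper name is made up):

```c
#include <assert.h>

/* dma-debug's default preallocation, which this patch moves into Kconfig. */
#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)

/*
 * Long-lived DMA mappings held after init for the topology in the
 * changelog: every descriptor of every queue of every NIC keeps a
 * mapping, and each mapping consumes one dma_debug_entry.
 */
static int hns_boot_mappings(int nics, int queues, int desc_per_ring)
{
	return nics * queues * desc_per_ring;
}
```

With 4 x 16 x 1024 mappings consuming the entire 65536-entry pool at boot, any runtime mapping at all exhausts it, hence the doubled default (131072) when HNS_ENET is enabled.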



[PATCH] dma-debug: Kconfig for PREALLOC_DMA_DEBUG_ENTRIES

2018-11-30 Thread Qian Cai
The number of DMA mappings from Hisilicon HNS ethernet devices is huge,
so it can trigger "DMA-API: debugging out of memory - disabling".

hnae_get_handle [1]
  hnae_init_queue
hnae_init_ring
  hnae_alloc_buffers [2]
debug_dma_map_page
  dma_entry_alloc

[1] for (i = 0; i < handle->q_num; i++)
[2] for (i = 0; i < ring->desc_num; i++)

On this Huawei TaiShan 2280 aarch64 server, it has reached the limit
already,

4 (ports) x 16 (handles) x 1024 (rings) = 65536

Add a Kconfig entry for PREALLOC_DMA_DEBUG_ENTRIES to make it easier
for users to deal with special cases like this.

Signed-off-by: Qian Cai 
---
 kernel/dma/debug.c | 9 ++---
 lib/Kconfig.debug  | 9 +
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..3752fb23f72f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -41,11 +41,6 @@
 #define HASH_FN_SHIFT   13
 #define HASH_FN_MASK(HASH_SIZE - 1)
 
-/* allow architectures to override this if absolutely required */
-#ifndef PREALLOC_DMA_DEBUG_ENTRIES
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-#endif
-
 enum {
dma_debug_single,
dma_debug_page,
@@ -132,7 +127,7 @@ static u32 min_free_entries;
 static u32 nr_total_entries;
 
 /* number of preallocated entries requested by kernel cmdline */
-static u32 nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+static u32 nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
 
 /* debugfs dentry's for the stuff above */
 static struct dentry *dma_debug_dent __read_mostly;
@@ -1063,7 +1058,7 @@ static __init int dma_debug_entries_cmdline(char *str)
if (!str)
return -EINVAL;
if (!get_option(&str, &nr_prealloc_entries))
-   nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+   nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
return 0;
 }
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af29b8224fd..2c281edcb5ad 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1659,6 +1659,15 @@ config DMA_API_DEBUG
 
  If unsure, say N.
 
+config PREALLOC_DMA_DEBUG_ENTRIES
+   int "Preallocated DMA-API debugging entries"
+   depends on DMA_API_DEBUG
+   default 65536
+   help
+ The number of preallocated entries for DMA-API debugging code. One
+ entry is required per DMA-API allocation. Increase this if the DMA-API
+ debugging code disables itself because the default is too low.
+
 config DMA_API_DEBUG_SG
bool "Debug DMA scatter-gather usage"
default y
-- 
2.17.2 (Apple Git-113)



[PATCH] dma-debug: hns_enet_drv could use more DMA entries

2018-11-29 Thread Qian Cai
The number of DMA mappings from Hisilicon HNS ethernet devices is huge,
so it can trigger "DMA-API: debugging out of memory - disabling".

hnae_get_handle [1]
  hnae_init_queue
hnae_init_ring
  hnae_alloc_buffers [2]
debug_dma_map_page
  dma_entry_alloc

[1] for (i = 0; i < handle->q_num; i++)
[2] for (i = 0; i < ring->desc_num; i++)

On this Huawei TaiShan 2280 aarch64 server, it has reached the limit
already,

4 (ports) x 16 (handles) x 1024 (rings) = 65536

Signed-off-by: Qian Cai 
---
 kernel/dma/debug.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..ae91689cc9ad 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -43,8 +43,13 @@
 
 /* allow architectures to override this if absolutely required */
 #ifndef PREALLOC_DMA_DEBUG_ENTRIES
+/* amount of DMA mappings on this driver is huge. */
+#ifdef HNS_ENET
+#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 17)
+#else
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 #endif
+#endif
 
 enum {
dma_debug_single,
-- 
2.17.2 (Apple Git-113)
