Hi Adrian,

On 2/11/22 10:45 AM, Adrian Huang wrote:
From: Adrian Huang<ahuan...@lenovo.com>

With VMD and IOMMU scalable mode enabled, the following kernel panic
call trace and kernel log appear on the Eagle Stream platform (Sapphire
Rapids CPU) during boot:

pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
list_add double add: new=ff4d61160b74b8a0, prev=ff4d611d8e245c10, next=ff4d61160b74b8a0.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1 fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9 9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
  <TASK>
  intel_pasid_alloc_table+0x9c/0x1d0
  dmar_insert_one_dev_info+0x423/0x540
  ? device_to_iommu+0x12d/0x2f0
  intel_iommu_attach_device+0x116/0x290
  __iommu_attach_device+0x1a/0x90
  iommu_group_add_device+0x190/0x2c0
  __iommu_probe_device+0x13e/0x250
  iommu_probe_device+0x24/0x150
  iommu_bus_notifier+0x69/0x90
  blocking_notifier_call_chain+0x5a/0x80
  device_add+0x3db/0x7b0
  ? arch_memremap_can_ram_remap+0x19/0x50
  ? memremap+0x75/0x140
  pci_device_add+0x193/0x1d0
  pci_scan_single_device+0xb9/0xf0
  pci_scan_slot+0x4c/0x110
  pci_scan_child_bus_extend+0x3a/0x290
  vmd_enable_domain.constprop.0+0x63e/0x820
  vmd_probe+0x163/0x190
  local_pci_probe+0x42/0x80
  work_for_cpu_fn+0x13/0x20
  process_one_work+0x1e2/0x3b0
  worker_thread+0x1c4/0x3a0
  ? rescuer_thread+0x370/0x370
  kthread+0xc7/0xf0
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The following 'lspci' output shows that devices '10000:80:*' are
subdevices of the VMD device 0000:59:00.5:

   $ lspci
   ...
   0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
   ...
   10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
   10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
   10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
   10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
   10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
   10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

The 'list_add double add' symptom follows from the failure reported in
these messages:

   pci 10000:80:01.0: DMAR: Setup RID2PASID failed
   pci 10000:80:01.0: Failed to add to iommu group 42: -16
   pci 10000:80:03.0: [8086:352b] type 01 class 0x060400

Device 10000:80:01.0 is a subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is the call path:

   intel_pasid_alloc_table
     pci_for_each_dma_alias
       get_alias_pasid_table
         search_pasid_table

pci_real_dma_dev() in pci_for_each_dma_alias() returns the real DMA
device, which is the VMD device 0000:59:00.5. However, the pte (PASID
entry) of the VMD device 0000:59:00.5 was already configured when the
message "pci 0000:59:00.5: Adding to iommu group 42" was logged. As a
result, -EBUSY is returned when configuring the PASID entry for device
10000:80:01.0.
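
To make the -EBUSY path concrete, here is a minimal userspace sketch of
the situation described above. It is not the driver code: the struct
names, setup_rid2pasid() and the single "present" flag are hypothetical
simplifications. The only thing taken from the report is the behaviour:
the subdevice resolves to the VMD device's entry, which is already set
up, so the second setup fails with -EBUSY (the -16 seen in the log).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PASID entry: only a "present" flag matters here. */
struct pasid_entry_model {
    bool present;
    int owner_id;
};

/* Hypothetical model of a device. real_dma_dev is NULL when the device
 * performs its own DMA, otherwise it points at the device that does. */
struct device_model {
    int id;
    struct device_model *real_dma_dev;
    struct pasid_entry_model rid2pasid;
};

/* A subdevice resolves to the entry owned by its real DMA device. */
static struct pasid_entry_model *lookup_rid2pasid(struct device_model *dev)
{
    struct device_model *real = dev->real_dma_dev ? dev->real_dma_dev : dev;

    return &real->rid2pasid;
}

/* Refuse to overwrite an entry that is already present. */
static int setup_rid2pasid(struct device_model *dev)
{
    struct pasid_entry_model *pte = lookup_rid2pasid(dev);

    if (pte->present)
        return -EBUSY;

    pte->present = true;
    pte->owner_id = dev->id;
    return 0;
}

/* The VMD device is set up first, then one of its subdevices. */
static int demo_rid2pasid_conflict(void)
{
    struct device_model vmd = { .id = 1 };
    struct device_model sub = { .id = 2, .real_dma_dev = &vmd };
    int ret = setup_rid2pasid(&vmd);

    assert(ret == 0);
    return setup_rid2pasid(&sub); /* entry already present */
}
```

Calling demo_rid2pasid_conflict() from a small main() returns -EBUSY
for the subdevice, while the VMD device itself is set up without error.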

The error path then invokes dmar_remove_one_dev_info() to release the
'struct device_domain_info *' back to iommu_devinfo_cache. However, the
pasid table is not released because of the following statement in
__dmar_remove_one_dev_info():

        if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
                ...
                intel_pasid_free_table(info->dev);
        }

The subsequent dmar_insert_one_dev_info() operation for device
10000:80:03.0 allocates a 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
was released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.
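
The address reuse can be illustrated with a small userspace sketch. It
assumes a simplified circular list and a check modeled on the kernel's
list debugging; list_add_checked(), struct dev_info_model and the demo
function are hypothetical names, and the check returns false where the
kernel would hit BUG(). One stack object stands in for the cache object
that is handed out twice at the same address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Simplified validity check: reject corrupted neighbours and double adds. */
static bool list_add_valid(struct list_node *new, struct list_node *prev,
                           struct list_node *next)
{
    if (next->prev != prev || prev->next != next)
        return false;
    if (new == prev || new == next)
        return false; /* "list_add double add" */
    return true;
}

/* Insert new right after head; report failure instead of calling BUG(). */
static bool list_add_checked(struct list_node *new, struct list_node *head)
{
    struct list_node *prev = head, *next = head->next;

    if (!list_add_valid(new, prev, next))
        return false;

    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return true;
}

/* Hypothetical stand-in for the per-device info linked into the table. */
struct dev_info_model {
    int devfn;
    struct list_node table;
};

static bool demo_double_add(void)
{
    struct list_node pasid_table_devs;
    struct dev_info_model slot; /* same address handed out twice */
    bool first;

    list_init(&pasid_table_devs);

    slot.devfn = 0x08; /* 01.0 */
    first = list_add_checked(&slot.table, &pasid_table_devs);
    assert(first);

    /* The info is "freed" without being unlinked, then the allocator
     * returns the same address for the next device. */
    slot.devfn = 0x18; /* 03.0 */
    return list_add_checked(&slot.table, &pasid_table_devs);
}
```

In this model the second add sees new == next with prev being the list
head, which is the same shape as the values printed in the
"list_add double add" line of the log.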

The solution is to avoid allocating a pasid table for devices that
are subdevices of the VMD device.
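
As a rough illustration of that idea only (this is not the patch, and
every name below is hypothetical), the allocation step can be guarded by
a subdevice check in a userspace model like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model. real_dma_dev is NULL when the device does
 * its own DMA, otherwise it points at the device that does. */
struct device_model {
    const char *name;
    struct device_model *real_dma_dev;
    bool linked_to_pasid_table;
};

/* Assumed helper mirroring the idea of "is a real DMA subdevice". */
static bool is_real_dma_subdevice(const struct device_model *dev)
{
    return dev->real_dma_dev != NULL && dev->real_dma_dev != dev;
}

/* Only devices that perform their own DMA get linked to a PASID table. */
static bool maybe_alloc_pasid_table(struct device_model *dev)
{
    if (is_real_dma_subdevice(dev))
        return false; /* skip: the real DMA device owns the table */

    dev->linked_to_pasid_table = true;
    return true;
}

/* Returns how many of the three devices got a table. */
static int demo_skip_subdevices(void)
{
    struct device_model vmd = { .name = "0000:59:00.5" };
    struct device_model br1 = { .name = "10000:80:01.0", .real_dma_dev = &vmd };
    struct device_model br2 = { .name = "10000:80:03.0", .real_dma_dev = &vmd };
    int allocated = 0;

    allocated += maybe_alloc_pasid_table(&vmd);
    allocated += maybe_alloc_pasid_table(&br1);
    allocated += maybe_alloc_pasid_table(&br2);
    return allocated;
}
```

With the guard in place, only the VMD device is linked to a table in
this model, so the subdevices never reach the list insertion that
triggered the double add.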

Thanks for your patch!

Is this the only patch that is needed to make VMD devices work in VT-d
scalable mode?

Best regards,
baolu