Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-02-02 Thread Robin Murphy

On 02/02/2024 7:11 am, Tasmiya Nalatwad wrote:

Greetings,

I tried reverting some of the latest commits and retested. Reverting the
commit below exposes a different problem, which was reported earlier and
for which a fix is under review.

1. Reverted commit:

  commit 17de3f5fdd35676b0e3d41c7c9bf4e3032eb3673
  iommu: Retire bus ops

2. Below are the traces of the other issue seen after reverting the above
commit, and the patch that fixes it, which is currently under review.


Patch:
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg225210.html


Yes, it's the same fundamental issue (failing to manage the IOMMU state 
for dynamic addition/removal) that's been present since the commit cited 
in the fix patch; the bus ops change just makes us more sensitive to the 
lack of unregistration on remove, vs. the lack of registration on add. 
The fix should solve both aspects (although I'd be inclined to agree with 
factoring out the registration between both paths).


Thanks,
Robin.


--- Traces ---

[  981.124047] Kernel attempted to read user page (30) - exploit attempt? (uid: 0)
[  981.124053] BUG: Kernel NULL pointer dereference on read at 0x0030
[  981.124056] Faulting instruction address: 0xc0689864
[  981.124060] Oops: Kernel access of bad area, sig: 11 [#1]
[  981.124063] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[  981.124067] Modules linked in: sit tunnel4 ip_tunnel rpadlpar_io rpaphp xsk_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables libcrc32c nfnetlink pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 dm_service_time sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth mlx5_core mlxfw psample dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[  981.124111] CPU: 24 PID: 78294 Comm: drmgr Kdump: loaded Not tainted 6.5.0-rc6-next-20230817-auto #1
[  981.124115] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.30 (NH1030_062) hv:phyp pSeries
[  981.124118] NIP:  c0689864 LR: c09bd05c CTR: c005fb90
[  981.124121] REGS: c000a878b1e0 TRAP: 0300   Not tainted (6.5.0-rc6-next-20230817-auto)
[  981.124125] MSR:  80009033   CR: 44822422  XER: 20040006
[  981.124132] CFAR: c09bd058 DAR: 0030 DSISR: 4000 IRQMASK: 0
[  981.124132] GPR00: c09bd05c c000a878b480 c1451400
[  981.124132] GPR04: c128d510  ceeccf50 c000a878b420
[  981.124132] GPR08: 0001 ceed76e0 c2c24c28 0220
[  981.124132] GPR12: c005fb90 c01837969300
[  981.124132] GPR16:
[  981.124132] GPR20: c125cef0  c125cf08 c2bce500
[  981.124132] GPR24: c000573e90c0 f000 c000573e93c0 c000a877d2a0
[  981.124132] GPR28: c128d510 ceeccf50 c000a877d2a0 c000573e90c0
[  981.124171] NIP [c0689864] sysfs_add_link_to_group+0x34/0x90
[  981.124178] LR [c09bd05c] iommu_device_link+0x5c/0x110
[  981.124184] Call Trace:
[  981.124186] [c000a878b480] [c048d630] kmalloc_trace+0x50/0x140 (unreliable)
[  981.124193] [c000a878b4c0] [c09bd05c] iommu_device_link+0x5c/0x110
[  981.124198] [c000a878b500] [c09ba050] __iommu_probe_device+0x250/0x5c0
[  981.124203] [c000a878b570] [c09ba9e0] iommu_probe_device_locked+0x30/0x90
[  981.124207] [c000a878b5a0] [c09baa80] iommu_probe_device+0x40/0x70
[  981.124212] [c000a878b5d0] [c09baaf0] iommu_bus_notifier+0x40/0x80
[  981.124217] [c000a878b5f0] [c019aad0] notifier_call_chain+0xc0/0x1b0
[  981.124221] [c000a878b650] [c019b604] blocking_notifier_call_chain+0x64/0xa0
[  981.124226] [c000a878b690] [c09cd870] bus_notify+0x50/0x80
[  981.124230] [c000a878b6d0] [c09c8f04] device_add+0x744/0x9b0
[  981.124235] [c000a878b790] [c089f2ec] pci_device_add+0x2fc/0x880
[  981.124240] [c000a878b840] [c007ef90] of_create_pci_dev+0x390/0xa10
[  981.124245] [c000a878b920] [c007f858] __of_scan_bus+0x248/0x320
[  981.124249] [c000a878ba00] [c007c1f0] pcibios_scan_phb+0x2d0/0x3c0
[  981.124254] [c000a878bad0] [c0107f08] init_phb_dynamic+0xb8/0x110
[  981.124259] [c000a878bb40] [c00802cc03b4] dlpar_add_slot+0x18c/0x380 [rpadlpar_io]
[  981.124265] [c000a878bbe0] [c00802cc0bec] add_slot_store+0xa4/0x150 [rpadlpar_io]
[  981.124270] [c000a878bc70] [c0f2f800] kobj_attr_store+0x30/0x50
[  981.124274] [c000a878bc90] [c0687368] sysfs_kf_write+0x68/0x80
[  981.124278] 

Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-02-01 Thread Tasmiya Nalatwad

Greetings,

I tried reverting some of the latest commits and retested. Reverting the
commit below exposes a different problem, which was reported earlier and
for which a fix is under review.

1. Reverted commit:

 commit 17de3f5fdd35676b0e3d41c7c9bf4e3032eb3673
 iommu: Retire bus ops

2. Below are the traces of the other issue seen after reverting the above
commit, and the patch that fixes it, which is currently under review.


Patch:
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg225210.html

--- Traces ---

[  981.124047] Kernel attempted to read user page (30) - exploit attempt? (uid: 0)
[  981.124053] BUG: Kernel NULL pointer dereference on read at 0x0030
[  981.124056] Faulting instruction address: 0xc0689864
[  981.124060] Oops: Kernel access of bad area, sig: 11 [#1]
[  981.124063] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[  981.124067] Modules linked in: sit tunnel4 ip_tunnel rpadlpar_io rpaphp xsk_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables libcrc32c nfnetlink pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 dm_service_time sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth mlx5_core mlxfw psample dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[  981.124111] CPU: 24 PID: 78294 Comm: drmgr Kdump: loaded Not tainted 6.5.0-rc6-next-20230817-auto #1
[  981.124115] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.30 (NH1030_062) hv:phyp pSeries
[  981.124118] NIP:  c0689864 LR: c09bd05c CTR: c005fb90
[  981.124121] REGS: c000a878b1e0 TRAP: 0300   Not tainted (6.5.0-rc6-next-20230817-auto)
[  981.124125] MSR:  80009033   CR: 44822422  XER: 20040006
[  981.124132] CFAR: c09bd058 DAR: 0030 DSISR: 4000 IRQMASK: 0
[  981.124132] GPR00: c09bd05c c000a878b480 c1451400
[  981.124132] GPR04: c128d510  ceeccf50 c000a878b420
[  981.124132] GPR08: 0001 ceed76e0 c2c24c28 0220
[  981.124132] GPR12: c005fb90 c01837969300
[  981.124132] GPR16:
[  981.124132] GPR20: c125cef0  c125cf08 c2bce500
[  981.124132] GPR24: c000573e90c0 f000 c000573e93c0 c000a877d2a0
[  981.124132] GPR28: c128d510 ceeccf50 c000a877d2a0 c000573e90c0
[  981.124171] NIP [c0689864] sysfs_add_link_to_group+0x34/0x90
[  981.124178] LR [c09bd05c] iommu_device_link+0x5c/0x110
[  981.124184] Call Trace:
[  981.124186] [c000a878b480] [c048d630] kmalloc_trace+0x50/0x140 (unreliable)
[  981.124193] [c000a878b4c0] [c09bd05c] iommu_device_link+0x5c/0x110
[  981.124198] [c000a878b500] [c09ba050] __iommu_probe_device+0x250/0x5c0
[  981.124203] [c000a878b570] [c09ba9e0] iommu_probe_device_locked+0x30/0x90
[  981.124207] [c000a878b5a0] [c09baa80] iommu_probe_device+0x40/0x70
[  981.124212] [c000a878b5d0] [c09baaf0] iommu_bus_notifier+0x40/0x80
[  981.124217] [c000a878b5f0] [c019aad0] notifier_call_chain+0xc0/0x1b0
[  981.124221] [c000a878b650] [c019b604] blocking_notifier_call_chain+0x64/0xa0
[  981.124226] [c000a878b690] [c09cd870] bus_notify+0x50/0x80
[  981.124230] [c000a878b6d0] [c09c8f04] device_add+0x744/0x9b0
[  981.124235] [c000a878b790] [c089f2ec] pci_device_add+0x2fc/0x880
[  981.124240] [c000a878b840] [c007ef90] of_create_pci_dev+0x390/0xa10
[  981.124245] [c000a878b920] [c007f858] __of_scan_bus+0x248/0x320
[  981.124249] [c000a878ba00] [c007c1f0] pcibios_scan_phb+0x2d0/0x3c0
[  981.124254] [c000a878bad0] [c0107f08] init_phb_dynamic+0xb8/0x110
[  981.124259] [c000a878bb40] [c00802cc03b4] dlpar_add_slot+0x18c/0x380 [rpadlpar_io]
[  981.124265] [c000a878bbe0] [c00802cc0bec] add_slot_store+0xa4/0x150 [rpadlpar_io]
[  981.124270] [c000a878bc70] [c0f2f800] kobj_attr_store+0x30/0x50
[  981.124274] [c000a878bc90] [c0687368] sysfs_kf_write+0x68/0x80
[  981.124278] [c000a878bcb0] [c0685d3c] kernfs_fop_write_iter+0x1cc/0x280
[  981.124283] [c000a878bd00] [c05909c8] vfs_write+0x358/0x4b0
[  981.124288] [c000a878bdc0] [c0590cfc] ksys_write+0x7c/0x140
[  981.124293] [c000a878be10] [c0036554] system_call_exception+0x134/0x330
[  981.124298] [c000a878be50] [c000d6a0] system_call_common+0x160/0x2e4
[  981.124303] --- interrupt: c00 at 0x200013f21594
[  981.124306] NIP:  200013f21594 LR:

Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-01-31 Thread Robin Murphy

On 2024-01-31 9:19 am, Tasmiya Nalatwad wrote:

Greetings,

[mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after 
performing dlpar remove test


--- Traces ---

[58563.146236] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b83
[58563.146242] Faulting instruction address: 0xc09c0e60
[58563.146248] Oops: Kernel access of bad area, sig: 11 [#1]
[58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic xor raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io rpaphp nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc nvmet nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 scsi_transport_fc fuse
[58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not tainted 6.8.0-rc1-auto-gecb1b8288dc7 #1
[58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[58563.146337] NIP:  c09c0e60 LR: c09c0e28 CTR: c09c1584
[58563.146342] REGS: c0007960f260 TRAP: 0380   Not tainted (6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146347] MSR:  80009033   CR: 24822424  XER: 20040006
[58563.146360] CFAR: c09c0e74 IRQMASK: 0
[58563.146360] GPR00: c09c0e28 c0007960f500 c1482600 c3050540
[58563.146360] GPR04:  c0089a6870c0 0001 fffe
[58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0220
[58563.146360] GPR12: 2000 c308
[58563.146360] GPR16:    0001
[58563.146360] GPR20: c1281478  c1281490 c2bfed80
[58563.146360] GPR24: c0089a6870c0   c2b9ffb8
[58563.146360] GPR28:  c2bac0e8

[58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118
[58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118


This implies that iommu_device_list has become corrupted. It looks like 
spapr_tce_setup_phb_iommus_initcall() registers an iommu_device which 
pcibios_free_controller() could free if a PCI controller is removed, but 
there's no path anywhere to ever unregister any of those IOMMUs. 
Presumably this also means that if a PCI controller is dynamically added 
after init, its IOMMU won't be set up properly either.


Thanks,
Robin.


[58563.146437] Call Trace:
[58563.146439] [c0007960f500] [c0007960f560] 0xc0007960f560 (unreliable)
[58563.146446] [c0007960f530] [c09c0fd0] __iommu_probe_device+0xc0/0x5c0
[58563.146454] [c0007960f5a0] [c09c151c] iommu_probe_device+0x4c/0xb4
[58563.146462] [c0007960f5e0] [c09c15d0] iommu_bus_notifier+0x4c/0x8c
[58563.146469] [c0007960f600] [c019e3d0] notifier_call_chain+0xb8/0x1a0
[58563.146476] [c0007960f660] [c019eea0] blocking_notifier_call_chain+0x64/0x94
[58563.146483] [c0007960f6a0] [c09d3c5c] bus_notify+0x50/0x7c
[58563.146491] [c0007960f6e0] [c09cfba4] device_add+0x774/0x9bc
[58563.146498] [c0007960f7a0] [c08abe9c] pci_device_add+0x2f4/0x864
[58563.146506] [c0007960f850] [c007d5a0] of_create_pci_dev+0x390/0xa08
[58563.146514] [c0007960f930] [c007de68] __of_scan_bus+0x250/0x328
[58563.146520] [c0007960fa10] [c007a680] pcibios_scan_phb+0x274/0x3c0
[58563.146527] [c0007960fae0] [c0105d58] init_phb_dynamic+0xb8/0x110
[58563.146535] [c0007960fb50] [c008217b0380] dlpar_add_slot+0x170/0x3b4 [rpadlpar_io]
[58563.146544] [c0007960fbf0] [c008217b0ca0] add_slot_store+0xa4/0x140 [rpadlpar_io]
[58563.146551] [c0007960fc80] [c0f3dbec] kobj_attr_store+0x30/0x4c
[58563.146559] [c0007960fca0] [c06931fc] sysfs_kf_write+0x68/0x7c
[58563.146566] [c0007960fcc0] [c0691b2c] kernfs_fop_write_iter+0x1c8/0x278
[58563.146573] [c0007960fd10] [c0599f54] vfs_write+0x340/0x4cc
[58563.146580] [c0007960fdc0] [c059a2bc] ksys_write+0x7c/0x140
[58563.146587] [c0007960fe10] [c0035d74] system_call_exception+0x134/0x330
[58563.146595] [c0007960fe50] [c000d6a0] system_call_common+0x160/0x2e4
[58563.146602] --- interrupt: c00 at 0x24470cb4
[58563.146606] NIP:  24470cb4 LR: 243e7d04 CTR:
[58563.146611] REGS: c0007960fe80 TRAP: 0c00   Not tainted (6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146616] MSR:  8280f033   CR: 24000282  XER:
[58563.146632] IRQMASK: 0
[58563.146632] GPR00:

[mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-01-31 Thread Tasmiya Nalatwad

Greetings,

[mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after 
performing dlpar remove test


--- Traces ---

[58563.146236] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b83
[58563.146242] Faulting instruction address: 0xc09c0e60
[58563.146248] Oops: Kernel access of bad area, sig: 11 [#1]
[58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic xor raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io rpaphp nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc nvmet nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 scsi_transport_fc fuse
[58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not tainted 6.8.0-rc1-auto-gecb1b8288dc7 #1
[58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[58563.146337] NIP:  c09c0e60 LR: c09c0e28 CTR: c09c1584
[58563.146342] REGS: c0007960f260 TRAP: 0380   Not tainted (6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146347] MSR:  80009033   CR: 24822424  XER: 20040006
[58563.146360] CFAR: c09c0e74 IRQMASK: 0
[58563.146360] GPR00: c09c0e28 c0007960f500 c1482600 c3050540
[58563.146360] GPR04:  c0089a6870c0 0001 fffe
[58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0220
[58563.146360] GPR12: 2000 c308
[58563.146360] GPR16:    0001
[58563.146360] GPR20: c1281478  c1281490 c2bfed80
[58563.146360] GPR24: c0089a6870c0   c2b9ffb8
[58563.146360] GPR28:  c2bac0e8
[58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118
[58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118
[58563.146437] Call Trace:
[58563.146439] [c0007960f500] [c0007960f560] 0xc0007960f560 (unreliable)
[58563.146446] [c0007960f530] [c09c0fd0] __iommu_probe_device+0xc0/0x5c0
[58563.146454] [c0007960f5a0] [c09c151c] iommu_probe_device+0x4c/0xb4
[58563.146462] [c0007960f5e0] [c09c15d0] iommu_bus_notifier+0x4c/0x8c
[58563.146469] [c0007960f600] [c019e3d0] notifier_call_chain+0xb8/0x1a0
[58563.146476] [c0007960f660] [c019eea0] blocking_notifier_call_chain+0x64/0x94
[58563.146483] [c0007960f6a0] [c09d3c5c] bus_notify+0x50/0x7c
[58563.146491] [c0007960f6e0] [c09cfba4] device_add+0x774/0x9bc
[58563.146498] [c0007960f7a0] [c08abe9c] pci_device_add+0x2f4/0x864
[58563.146506] [c0007960f850] [c007d5a0] of_create_pci_dev+0x390/0xa08
[58563.146514] [c0007960f930] [c007de68] __of_scan_bus+0x250/0x328
[58563.146520] [c0007960fa10] [c007a680] pcibios_scan_phb+0x274/0x3c0
[58563.146527] [c0007960fae0] [c0105d58] init_phb_dynamic+0xb8/0x110
[58563.146535] [c0007960fb50] [c008217b0380] dlpar_add_slot+0x170/0x3b4 [rpadlpar_io]
[58563.146544] [c0007960fbf0] [c008217b0ca0] add_slot_store+0xa4/0x140 [rpadlpar_io]
[58563.146551] [c0007960fc80] [c0f3dbec] kobj_attr_store+0x30/0x4c
[58563.146559] [c0007960fca0] [c06931fc] sysfs_kf_write+0x68/0x7c
[58563.146566] [c0007960fcc0] [c0691b2c] kernfs_fop_write_iter+0x1c8/0x278
[58563.146573] [c0007960fd10] [c0599f54] vfs_write+0x340/0x4cc
[58563.146580] [c0007960fdc0] [c059a2bc] ksys_write+0x7c/0x140
[58563.146587] [c0007960fe10] [c0035d74] system_call_exception+0x134/0x330
[58563.146595] [c0007960fe50] [c000d6a0] system_call_common+0x160/0x2e4
[58563.146602] --- interrupt: c00 at 0x24470cb4
[58563.146606] NIP:  24470cb4 LR: 243e7d04 CTR:
[58563.146611] REGS: c0007960fe80 TRAP: 0c00   Not tainted (6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146616] MSR:  8280f033   CR: 24000282  XER:
[58563.146632] IRQMASK: 0
[58563.146632] GPR00: 0004 7fffd3993420 24557300 0007
[58563.146632] GPR04: 01000d8a5270 0006 fbad2c80 01000d8a02a0
[58563.146632] GPR08: 0001
[58563.146632] GPR12:  2422bb50
[58563.146632] GPR16:
[58563.146632] GPR20: