Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> Gaurav Batra <gba...@linux.ibm.com> writes:
> > At the time of LPAR reboot, partition firmware provides Open Firmware
> > property ibm,dma-window for the PE. This property is provided on the PCI
> > bus the PE is attached to.
>
> AFAICS you're actually describing a bug that happens during boot *up*?
>
> Describing it as "reboot" makes me think you're talking about the
> shutdown path. I think that will confuse people, me at least :)

There is probably an assumption that it must have been running previously
for the errors to happen in the first place, but given that the error state
persists for a day, it may be a very long 'reboot'.

Thanks

Michal

>
> cheers
>
> > There are exceptions where the partition firmware might not provide this
> > property for the PE at the time of LPAR reboot. One of the scenarios is
> > where the firmware has frozen the PE due to some error conditions. This
> > PE is frozen for 24 hours or unless the whole system is reinitialized.
> >
> > Within this time frame, if the LPAR is rebooted, the frozen PE will be
> > presented to the LPAR but ibm,dma-window property could be missing.
> >
> > Today, under these circumstances, the LPAR oopses with NULL pointer
> > dereference, when configuring the PCI bus the PE is attached to.
> >
> > BUG: Kernel NULL pointer dereference on read at 0x000000c8
> > Faulting instruction address: 0xc0000000001024c0
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > Modules linked in:
> > Supported: Yes
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> > NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
> > REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default)
> > MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000
> > CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
> > ...
> > NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> > LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> > Call Trace:
> > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
> > pcibios_setup_bus_self+0x1c0/0x370
> > __of_scan_bus+0x2f8/0x330
> > pcibios_scan_phb+0x280/0x3d0
> > pcibios_init+0x88/0x12c
> > do_one_initcall+0x60/0x320
> > kernel_init_freeable+0x344/0x3e4
> > kernel_init+0x34/0x1d0
> > ret_from_kernel_user_thread+0x14/0x1c
> >
> > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
> > Signed-off-by: Gaurav Batra <gba...@linux.ibm.com>
> > ---
> > arch/powerpc/platforms/pseries/iommu.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index e8c4129697b1..e808d5b1fa49 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> >  	 * parent bus. During reboot, there will be ibm,dma-window property to
> >  	 * define DMA window. For kdump, there will at least be default window or DDW
> >  	 * or both.
> > +	 * There is an exception to the above. In case the PE goes into frozen
> > +	 * state, firmware may not provide ibm,dma-window property at the time
> > +	 * of LPAR reboot.
> >  	 */
> >
> > +	if (!pdn) {
> > +		pr_debug(" no ibm,dma-window property !\n");
> > +		return;
> > +	}
> > +
> >  	ppci = PCI_DN(pdn);
> >
> >  	pr_debug(" parent is %pOF, iommu_table: 0x%p\n",
> >
> > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> > --
> > 2.39.3 (Apple Git-146)
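
For anyone following the thread without the kernel tree at hand, the failure mode and the guard can be modelled in plain user-space C. This is only a rough sketch under stated assumptions: struct node, find_dma_window_node() and bus_setup() are hypothetical stand-ins, not the kernel's struct device_node or the real pci_dma_bus_setup_pSeriesLP(). The only thing taken from the patch is the early return when no node carrying ibm,dma-window is found.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device-tree node; NOT the kernel's struct device_node. */
struct node {
	struct node *parent;
	const char *dma_window;	/* NULL models a missing ibm,dma-window property */
	void *data;		/* models the per-node data the setup code reads */
};

/* Walk up towards the root; return the first node with a DMA window, or NULL. */
static struct node *find_dma_window_node(struct node *dn)
{
	for (; dn != NULL; dn = dn->parent)
		if (dn->dma_window != NULL)
			return dn;
	return NULL;
}

/* Returns 0 if the bus was set up, -1 if setup was skipped. */
static int bus_setup(struct node *bus_node)
{
	struct node *pdn = find_dma_window_node(bus_node);

	/* Same shape as the guard in the patch: bail out instead of dereferencing NULL. */
	if (!pdn)
		return -1;

	(void)pdn->data;	/* safe only because pdn was checked above */
	return 0;
}
```

In this model, a bus whose ancestors all lack the property makes bus_setup() return -1 rather than fault, which corresponds to the frozen-PE case described in the commit message.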