On Thu, May 13, 2021 at 2:22 PM David Gibson <da...@gibson.dropbear.id.au> wrote:
>
> On Wed, May 05, 2021 at 08:18:27PM +0530, Mahesh Salgaonkar wrote:
> > With the upstream kernel, especially after commit 98ba956f6a389
> > ("powerpc/pseries/eeh: Rework device EEH PE determination"), we see that a
> > KVM guest isn't able to enable the EEH option for PCI pass-through devices
> > anymore.
> >
> > [root@atest-guest ~]# dmesg | grep EEH
> > [ 0.032337] EEH: pSeries platform initialized
> > [ 0.298207] EEH: No capable adapters found: recovery disabled.
> > [root@atest-guest ~]#
> >
> > So far the Linux kernel was assuming pe_config_addr was equal to the
> > device's config_addr and using it to enable EEH on the PE through the
> > ibm,set-eeh-option RTAS call, which wasn't the correct way as per PAPR.
> > Linux kernel commit 98ba956f6a389 fixed this flow. With that fix, Linux
> > now uses the PE config address returned by the ibm,get-config-addr-info2
> > RTAS call to enable the EEH option on a per-PE basis instead of a
> > per-device basis. However, this has uncovered a bug in QEMU where
> > ibm,set-eeh-option treats the PE config address as a per-device config
> > address.
>
> Huh. To be fair, the stuff about this in PAPR is nearly
> incomprehensible, so we probably used what the kernel was doing as a
> guide instead.
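To make the mismatch concrete, here is a minimal C model of it (not QEMU's actual code). It assumes the usual RTAS config address layout, with the bus number in bits 23:16 and devfn in bits 15:8. The helper names (config_addr(), buggy_set_eeh_option(), old_kernel_enable(), new_kernel_enable()), the two-device PHB and the PE_ADDR value are all hypothetical; PE_ADDR is just a placeholder that matches no device. The point is only that a handler which looks its argument up as a device address works for per-device calls but fails for a single per-PE call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed RTAS-style config address: bus in bits 23:16, devfn in bits 15:8. */
static uint32_t config_addr(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 16) | ((uint32_t)devfn << 8);
}

/* Hypothetical guest PHB with two pass-through devices on bus 0. */
struct dev { uint8_t bus, devfn; };
static const struct dev phb_devs[] = { { 0, 0x08 }, { 0, 0x10 } };
#define NDEVS (sizeof(phb_devs) / sizeof(phb_devs[0]))

/* Hypothetical PE address, standing in for whatever
 * ibm,get-config-addr-info2 hands back; it matches no device. */
static const uint32_t PE_ADDR = 0x000001;

/* Buggy handler: interprets the argument as a per-device config address. */
static bool buggy_set_eeh_option(uint32_t addr)
{
    for (size_t i = 0; i < NDEVS; i++)
        if (config_addr(phb_devs[i].bus, phb_devs[i].devfn) == addr)
            return true;
    return false; /* the guest sees an error and gives up on EEH */
}

/* Older kernels: one call per device, passing each device's config_addr. */
static bool old_kernel_enable(void)
{
    for (size_t i = 0; i < NDEVS; i++)
        if (!buggy_set_eeh_option(config_addr(phb_devs[i].bus,
                                              phb_devs[i].devfn)))
            return false;
    return true;
}

/* Kernels with 98ba956f6a389: one call per PE, passing the PE address. */
static bool new_kernel_enable(void)
{
    return buggy_set_eeh_option(PE_ADDR);
}
```

In this model every per-device call from an older kernel succeeds, while the single PE-address call from a newer kernel fails, which lines up with the "No capable adapters found" message in the log above.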
I found the PAPR documentation made some sense after I learned how EEH was
handled on PCI(-X) systems. What's in Linux never made sense, unfortunately.

> Hmm.. shouldn't we at least check that the supplied config_addr
> matches the one it should be for this PHB, rather than just ignoring
> it?

I think that'd cause issues with older kernels. Prior to the rework
mentioned by Mahesh (Linux commit 98ba956f6a389 ("powerpc/pseries/eeh:
Rework device EEH PE determination")), the kernel would call
ibm,set-eeh-option for each device in the PE, using the device's
config_address as the argument rather than the PE address. If we return
an error from ibm,set-eeh-option when the argument isn't a valid PE
address, then older kernels will interpret that as EEH not being
supported. That really needs to be called out in a comment, though,
preferably with kernel version numbers, etc.

> ..and, looking back at rtas_ibm_get_config_addr_info2(), I think
> that looks wrong in the case of PCI bridges. AFAICT it gives an
> address that depends on the bus, but in other places we assume that
> the entire PHB is a single PE on the guest side, so it really
> shouldn't.

Yep, get_config_addr_info2 should map every device inside that PE to the
same PE address, even when they're on child busses. That said, I'm not
sure how well EEH works when there's a mix of real (VFIO) and emulated
(QEMU bridges) devices in the same PHB. Can VFIO pass through a bridge?
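One way to reconcile both points, sketched as a small C model rather than a patch against QEMU: treat the whole PHB as one PE whose address does not depend on the bus, and have the set-option handler accept either that PE address or any device config address in the PHB, so kernels without 98ba956f6a389 keep working. As before, the names (get_pe_addr(), set_eeh_option(), dev_in_phb()), the device layout (one device on the root bus, one behind a bridge on bus 1) and the PHB_PE_ADDR value are hypothetical. Only the PE-address query of ibm,get-config-addr-info2 is modelled, and the BUID lookup that selects the PHB is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed RTAS-style config address: bus in bits 23:16, devfn in bits 15:8. */
static uint32_t config_addr(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 16) | ((uint32_t)devfn << 8);
}

/* Hypothetical PHB: one device on bus 0, one behind a bridge on bus 1. */
struct dev { uint8_t bus, devfn; };
static const struct dev phb_devs[] = { { 0, 0x08 }, { 1, 0x00 } };
#define NDEVS (sizeof(phb_devs) / sizeof(phb_devs[0]))

/* The whole PHB is a single PE, so there is one per-PHB PE address.
 * The value is a hypothetical placeholder; its low byte is non-zero,
 * so it cannot collide with any config address built above. */
static const uint32_t PHB_PE_ADDR = 0x000001;

static bool dev_in_phb(uint32_t addr)
{
    for (size_t i = 0; i < NDEVS; i++)
        if (config_addr(phb_devs[i].bus, phb_devs[i].devfn) == addr)
            return true;
    return false;
}

/* PE-address query: the same answer for every device in the PHB,
 * including devices on child buses. Returns false for unknown devices. */
static bool get_pe_addr(uint32_t dev_addr, uint32_t *pe_addr)
{
    assert(pe_addr != NULL);
    if (!dev_in_phb(dev_addr))
        return false;
    *pe_addr = PHB_PE_ADDR; /* deliberately independent of the bus number */
    return true;
}

/* Set-option handler: accept the PE address (newer kernels) or any device
 * config address in this PHB (kernels before 98ba956f6a389, which call it
 * once per device). Rejecting the latter would make those kernels conclude
 * that EEH is unsupported. */
static bool set_eeh_option(uint32_t addr)
{
    return addr == PHB_PE_ADDR || dev_in_phb(addr);
}
```

The validation is still stricter than ignoring the argument entirely: an address that belongs neither to the PE nor to any device in the PHB is rejected, while both calling conventions keep working.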