On Wed, 27 Mar 2019 12:18:31 +0530 Aravinda Prasad <aravi...@linux.vnet.ibm.com> wrote:
> On Tuesday 26 March 2019 03:35 PM, Greg Kurz wrote: > > On Tue, 26 Mar 2019 14:45:57 +0530 > > Aravinda Prasad <aravi...@linux.vnet.ibm.com> wrote: > > > >> On Tuesday 26 March 2019 02:00 PM, Greg Kurz wrote: > >>> On Tue, 26 Mar 2019 10:32:35 +1100 > >>> David Gibson <da...@gibson.dropbear.id.au> wrote: > >>> > >>>> On Mon, Mar 25, 2019 at 01:56:50PM +0530, Aravinda Prasad wrote: > >>>>> > >>>>> > >>>>> On Monday 25 March 2019 12:00 PM, David Gibson wrote: > >>>>>> On Fri, Mar 22, 2019 at 12:04:07PM +0530, Aravinda Prasad wrote: > >>>>>>> This patch builds the rtas error log, copies it to the > >>>>>>> rtas_addr and then invokes the guest registered machine > >>>>>>> check handler. > >>>>>> > >>>>>> This commit message needs more context. When is this occurring, why > >>>>>> do we need this? > >>>>>> > >>>>>> [I can answer those questions now, but whether I - or anyone else - > >>>>>> will be able to looking back at this commit from years in the future > >>>>>> is a different question] > >>>>> > >>>>> will add more info. > >>>> > >>>> Thanks. > >>>> > >>>> [snip] > >>>>>>> +static uint64_t spapr_get_rtas_addr(void) > >>>>>>> +{ > >>>>>>> + SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); > >>>>>>> + int rtas_node; > >>>>>>> + const struct fdt_property *rtas_addr_prop; > >>>>>>> + void *fdt = spapr->fdt_blob; > >>>>>>> + uint32_t rtas_addr; > >>>>>>> + > >>>>>>> + /* fetch rtas addr from fdt */ > >>>>>>> + rtas_node = fdt_path_offset(fdt, "/rtas"); > >>>>>>> + g_assert(rtas_node >= 0); > >>>>>>> + > >>>>>>> + rtas_addr_prop = fdt_get_property(fdt, rtas_node, > >>>>>>> "linux,rtas-base", NULL); > >>>>>>> + g_assert(rtas_addr_prop); > >>>>>>> + > >>>>>>> + rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data); > >>>>>>> + return (uint64_t)rtas_addr; > >>>>>> > >>>>>> It seems a bit roundabout to pull the rtas address out of the device > >>>>>> tree, since it was us that put it in there in the first place. > >>>>> > >>>>> Slof can change the rtas address. So we need to get the updated rtas > >>>>> address. > >>>> > >>>> Ah, ok. > >>>> > >>> > >>> Yeah, and knowing that the DT is guest originated makes me a bit > >>> nervous when I see the g_assert()... a misbehaving guest could > >>> possibly abort QEMU. Either there should be some sanity checks > >>> performed earlier or an non-fatal error path should be added in > >>> this function IMHO. > >> > >> Is it not the QEMU that builds the DT and provides it to the guest? > >> > > > > Yes, but then the guest can push a new DT with the KVMPPC_H_UPDATE_DT hcall. > > We only do some minimalist sanity checks in h_update_dt(). I don't think > > we want to abort QEMU because the guest sent a DT where "linux,rtas-base" > > is missing for example. > > > >> Also, spapr_get_rtas_addr() is called during physical memory corruption > >> which is a fatal error. > > BTW... IIUC this can also occur if the guest created duplicate SLB entries, which isn't fatal from an hypervisor perspective. > > Not that fatal since we care to report it to the guest. > > True, but if guest does not provide rtas_addr then I am not getting the > point on why terminating the QEMU instance (which actually terminates > the guest) is a problem. Am I missing something? > This isn't terminating QEMU, this is aborting QEMU, which should only ever happen in case of a QEMU bug or g_malloc() failure. LoPAPR v1.1 7.3.14 Firmware Assisted Non-Maskable Interrupts Option (FWNMI) doesn't describe this specific case of course. However, it gives a hint on what to do in case of a fatal condition where it isn't possible to notify the guest: "...the firmware has no choice but to declare the condition fatal, log the result and execute the partition’s reboot policy." ie, either triggering a system_reset or exiting QEMU. > > > >> So, if we cannot fetch rtas_addr (required to > >> build and pass the error info to the guest) then I think we should abort. > >> > > > > Maybe we cannot do anything better at this point, but then we should > > do some earlier checks and switch to the old machine check behaviour > > if what we need is missing from the updated DT for example. > > We can do some checks earlier, may be during fwnmi registration to see > if rtas entry is missing. Again, the guest can possibly update DT after > fwnmi registration. > Hmm... you're right. Maybe also add these checks to h_update_dt() if the guest registered fwnmi ? Or even forbid DT updates since we really expect this to occur only once from SLOF between machine resets ? In which case, it would be appropriate to keep the asserts in spapr_get_rtas_addr(). > But I think we cannot switch to old machine check behavior if we cannot > fetch the rtas_addr, because according to PAPR the guest would have > relinquished 0x200 vector to the firmware when fwnmi is registered. So > we cannot expect the guest to handle 0x200 interrupt. > I wonder how kexec fits in the picture... I mean what if the guest switches from an FWNMI-capable kernel to an FWNMI-not-capable one ? There's no machine reset in this case... > > > > >>> > >> > > >