On Wednesday 27 March 2019 08:16 PM, Greg Kurz wrote:
> On Wed, 27 Mar 2019 12:18:31 +0530
> Aravinda Prasad <aravi...@linux.vnet.ibm.com> wrote:
>
>> On Tuesday 26 March 2019 03:35 PM, Greg Kurz wrote:
>>> On Tue, 26 Mar 2019 14:45:57 +0530
>>> Aravinda Prasad <aravi...@linux.vnet.ibm.com> wrote:
>>>
>>>> On Tuesday 26 March 2019 02:00 PM, Greg Kurz wrote:
>>>>> On Tue, 26 Mar 2019 10:32:35 +1100
>>>>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>>>>
>>>>>> On Mon, Mar 25, 2019 at 01:56:50PM +0530, Aravinda Prasad wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Monday 25 March 2019 12:00 PM, David Gibson wrote:
>>>>>>>> On Fri, Mar 22, 2019 at 12:04:07PM +0530, Aravinda Prasad wrote:
>>>>>>>>> This patch builds the rtas error log, copies it to the
>>>>>>>>> rtas_addr and then invokes the guest registered machine
>>>>>>>>> check handler.
>>>>>>>>
>>>>>>>> This commit message needs more context. When is this occurring, why
>>>>>>>> do we need this?
>>>>>>>>
>>>>>>>> [I can answer those questions now, but whether I - or anyone else -
>>>>>>>> will be able to looking back at this commit from years in the future
>>>>>>>> is a different question]
>>>>>>>
>>>>>>> will add more info.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> [snip]
>>>>>>>>> +static uint64_t spapr_get_rtas_addr(void)
>>>>>>>>> +{
>>>>>>>>> + SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>>>>>> + int rtas_node;
>>>>>>>>> + const struct fdt_property *rtas_addr_prop;
>>>>>>>>> + void *fdt = spapr->fdt_blob;
>>>>>>>>> + uint32_t rtas_addr;
>>>>>>>>> +
>>>>>>>>> + /* fetch rtas addr from fdt */
>>>>>>>>> + rtas_node = fdt_path_offset(fdt, "/rtas");
>>>>>>>>> + g_assert(rtas_node >= 0);
>>>>>>>>> +
>>>>>>>>> + rtas_addr_prop = fdt_get_property(fdt, rtas_node,
>>>>>>>>> "linux,rtas-base", NULL);
>>>>>>>>> + g_assert(rtas_addr_prop);
>>>>>>>>> +
>>>>>>>>> + rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>>>>>>>>> + return (uint64_t)rtas_addr;
>>>>>>>>
>>>>>>>> It seems a bit roundabout to pull the rtas address out of the device
>>>>>>>> tree, since it was us that put it in there in the first place.
>>>>>>>
>>>>>>> Slof can change the rtas address. So we need to get the updated rtas
>>>>>>> address.
>>>>>>
>>>>>> Ah, ok.
>>>>>>
>>>>>
>>>>> Yeah, and knowing that the DT is guest originated makes me a bit
>>>>> nervous when I see the g_assert()... a misbehaving guest could
>>>>> possibly abort QEMU. Either there should be some sanity checks
>>>>> performed earlier or an non-fatal error path should be added in
>>>>> this function IMHO.
>>>>
>>>> Is it not the QEMU that builds the DT and provides it to the guest?
>>>>
>>>
>>> Yes, but then the guest can push a new DT with the KVMPPC_H_UPDATE_DT hcall.
>>> We only do some minimalist sanity checks in h_update_dt(). I don't think
>>> we want to abort QEMU because the guest sent a DT where "linux,rtas-base"
>>> is missing for example.
>>>
>>>> Also, spapr_get_rtas_addr() is called during physical memory corruption
>>>> which is a fatal error.
>>>
>
> BTW... IIUC this can also occur if the guest created duplicate SLB entries,
> which isn't fatal from an hypervisor perspective.
yes.. right..
>
>>> Not that fatal since we care to report it to the guest.
>>
>> True, but if guest does not provide rtas_addr then I am not getting the
>> point on why terminating the QEMU instance (which actually terminates
>> the guest) is a problem. Am I missing something?
>>
>
> This isn't terminating QEMU, this is aborting QEMU, which should only
> ever happen in case of a QEMU bug or g_malloc() failure.
>
> LoPAPR v1.1 7.3.14 Firmware Assisted Non-Maskable Interrupts Option (FWNMI)
> doesn't describe this specific case of course. However, it gives a hint on
> what to do in case of a fatal condition where it isn't possible to notify
> the guest:
>
> "...the firmware has no choice but to declare the condition
> fatal, log the result and execute the partition’s reboot policy."
>
> ie, either triggering a system_reset or exiting QEMU.
Now I see the point. I will modify it to trigger system_reset instead of
aborting.
>
>>>
>>>> So, if we cannot fetch rtas_addr (required to
>>>> build and pass the error info to the guest) then I think we should abort.
>>>>
>>>
>>> Maybe we cannot do anything better at this point, but then we should
>>> do some earlier checks and switch to the old machine check behaviour
>>> if what we need is missing from the updated DT for example.
>>
>> We can do some checks earlier, may be during fwnmi registration to see
>> if rtas entry is missing. Again, the guest can possibly update DT after
>> fwnmi registration.
>>
>
> Hmm... you're right. Maybe also add these checks to h_update_dt() if the
> guest registered fwnmi ? Or even forbid DT updates since we really expect
> this to occur only once from SLOF between machine resets ? In which case,
> it would be appropriate to keep the asserts in spapr_get_rtas_addr().
As you pointed out above, we can trigger system_reset if we fail to
fetch rtas_addr. I think this approach is simple and straight forward
and will not forbid guests from updating the DT.
>
>> But I think we cannot switch to old machine check behavior if we cannot
>> fetch the rtas_addr, because according to PAPR the guest would have
>> relinquished 0x200 vector to the firmware when fwnmi is registered. So
>> we cannot expect the guest to handle 0x200 interrupt.
>>
>
> I wonder how kexec fits in the picture... I mean what if the guest switches
> from an FWNMI-capable kernel to an FWNMI-not-capable one ? There's no machine
> reset in this case...
Ah.. not sure I need to check. So in this case QEMU will never know that
the guest rebooted into an FWNMI-not-capable kexec kernel.
>
>>
>>>
>>>>>
>>>>
>>>
>>
>
--
Regards,
Aravinda