On Thursday 28 March 2019 06:20 AM, David Gibson wrote:
> On Wed, Mar 27, 2019 at 12:18:31PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 26 March 2019 03:35 PM, Greg Kurz wrote:
>>> On Tue, 26 Mar 2019 14:45:57 +0530
>>> Aravinda Prasad <aravi...@linux.vnet.ibm.com> wrote:
>>>
>>>> On Tuesday 26 March 2019 02:00 PM, Greg Kurz wrote:
>>>>> On Tue, 26 Mar 2019 10:32:35 +1100
>>>>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>>>>
>>>>>> On Mon, Mar 25, 2019 at 01:56:50PM +0530, Aravinda Prasad wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Monday 25 March 2019 12:00 PM, David Gibson wrote:
>>>>>>>> On Fri, Mar 22, 2019 at 12:04:07PM +0530, Aravinda Prasad wrote:
>>>>>>>>> This patch builds the rtas error log, copies it to the
>>>>>>>>> rtas_addr and then invokes the guest registered machine
>>>>>>>>> check handler.
>>>>>>>>
>>>>>>>> This commit message needs more context. When is this occurring, why
>>>>>>>> do we need this?
>>>>>>>>
>>>>>>>> [I can answer those questions now, but whether I - or anyone else -
>>>>>>>> will be able to looking back at this commit from years in the future
>>>>>>>> is a different question]
>>>>>>>
>>>>>>> will add more info.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> [snip]
>>>>>>>>> +static uint64_t spapr_get_rtas_addr(void)
>>>>>>>>> +{
>>>>>>>>> + SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>>>>>> + int rtas_node;
>>>>>>>>> + const struct fdt_property *rtas_addr_prop;
>>>>>>>>> + void *fdt = spapr->fdt_blob;
>>>>>>>>> + uint32_t rtas_addr;
>>>>>>>>> +
>>>>>>>>> + /* fetch rtas addr from fdt */
>>>>>>>>> + rtas_node = fdt_path_offset(fdt, "/rtas");
>>>>>>>>> + g_assert(rtas_node >= 0);
>>>>>>>>> +
>>>>>>>>> + rtas_addr_prop = fdt_get_property(fdt, rtas_node,
>>>>>>>>> "linux,rtas-base", NULL);
>>>>>>>>> + g_assert(rtas_addr_prop);
>>>>>>>>> +
>>>>>>>>> + rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>>>>>>>>> + return (uint64_t)rtas_addr;
>>>>>>>>
>>>>>>>> It seems a bit roundabout to pull the rtas address out of the device
>>>>>>>> tree, since it was us that put it in there in the first place.
>>>>>>>
>>>>>>> Slof can change the rtas address. So we need to get the updated rtas
>>>>>>> address.
>>>>>>
>>>>>> Ah, ok.
>>>>>>
>>>>>
>>>>> Yeah, and knowing that the DT is guest originated makes me a bit
>>>>> nervous when I see the g_assert()... a misbehaving guest could
>>>>> possibly abort QEMU. Either there should be some sanity checks
>>>>> performed earlier or an non-fatal error path should be added in
>>>>> this function IMHO.
>>>>
>>>> Is it not the QEMU that builds the DT and provides it to the guest?
>>>>
>>>
>>> Yes, but then the guest can push a new DT with the KVMPPC_H_UPDATE_DT hcall.
>>> We only do some minimalist sanity checks in h_update_dt(). I don't think
>>> we want to abort QEMU because the guest sent a DT where "linux,rtas-base"
>>> is missing for example.
>>>
>>>> Also, spapr_get_rtas_addr() is called during physical memory corruption
>>>> which is a fatal error.
>>>
>>> Not that fatal since we care to report it to the guest.
>>
>> True, but if guest does not provide rtas_addr then I am not getting the
>> point on why terminating the QEMU instance (which actually terminates
>> the guest) is a problem. Am I missing something?
>>
>>>
>>>> So, if we cannot fetch rtas_addr (required to
>>>> build and pass the error info to the guest) then I think we should abort.
>>>>
>>>
>>> Maybe we cannot do anything better at this point, but then we should
>>> do some earlier checks and switch to the old machine check behaviour
>>> if what we need is missing from the updated DT for example.
>>
>> We can do some checks earlier, may be during fwnmi registration to see
>> if rtas entry is missing. Again, the guest can possibly update DT after
>> fwnmi registration.
>>
>> But I think we cannot switch to old machine check behavior if we cannot
>> fetch the rtas_addr, because according to PAPR the guest would have
>> relinquished 0x200 vector to the firmware when fwnmi is registered. So
>> we cannot expect the guest to handle 0x200 interrupt.
>
> I think that's fair.
>
> I think we want to verify that we can get the rtas_addr and store it
> at nmi-register time. If we can't we can fail the RTAS call. If the
> guest changes the RTAS address after that point (which we don't
> expect), then it has shot itself in the foot and that's fine.
Yes we can do that during nmi-register (a fail-early approach). Further
we can also try to re-fetch rtas_addr during machine check (in case the
guest has updated the DT) and if we fail to re-fetch we can trigger a
system_reset.
>
--
Regards,
Aravinda