On Wed, 12 Dec 2018 11:29:55 +1100 David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Tue, Dec 11, 2018 at 10:55:59AM +0100, Greg Kurz wrote:
> > On Tue, 11 Dec 2018 14:53:32 +1100
> > Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> >
> > > On 10/12/2018 20:30, Greg Kurz wrote:
> > > > On Mon, 10 Dec 2018 17:20:43 +1100
> > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > >
> > > >> On Mon, Nov 12, 2018 at 03:12:26PM +1100, Alexey Kardashevskiy wrote:
> > > >>
> > > >>>
> > > >>>
> > > >>> On 12/11/2018 05:10, Greg Kurz wrote:
> > > >>>> Hi Alexey,
> > > >>>>
> > > >>>> Just a few remarks. See below.
> > > >>>>
> > > >>>> On Thu, 8 Nov 2018 12:44:06 +1100
> > > >>>> Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> > > >>>>
> > > >>>>> SLOF receives a device tree and updates it with various properties
> > > >>>>> before switching to the guest kernel and QEMU is not aware of any changes
> > > >>>>> made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
> > > >>>>> sense to pass the SLOF final device tree to QEMU to let it implement
> > > >>>>> RTAS related tasks better, such as PCI host bus adapter hotplug.
> > > >>>>>
> > > >>>>> Specifically, now QEMU can find out the actual XICS phandle (for PHB
> > > >>>>> hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
> > > >>>>> assisted NMI - FWNMI).
> > > >>>>>
> > > >>>>> This stores the initial DT blob in the sPAPR machine and replaces it
> > > >>>>> in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.
> > > >>>>>
> > > >>>>> This adds an @update_dt_enabled machine property to allow backward
> > > >>>>> migration.
> > > >>>>>
> > > >>>>> SLOF already has a hypercall since
> > > >>>>> https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183
> > > >>>>>
> > > >>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> > > >>>>> ---
> > > >>>>>  include/hw/ppc/spapr.h |  7 ++++++-
> > > >>>>>  hw/ppc/spapr.c         | 29 ++++++++++++++++++++++++++++-
> > > >>>>>  hw/ppc/spapr_hcall.c   | 32 ++++++++++++++++++++++++++++++++
> > > >>>>>  hw/ppc/trace-events    |  2 ++
> > > >>>>>  4 files changed, 68 insertions(+), 2 deletions(-)
> > > >>>>>
> > > >>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > >>>>> index ad4d7cfd97..f5dcaf44cb 100644
> > > >>>>> --- a/include/hw/ppc/spapr.h
> > > >>>>> +++ b/include/hw/ppc/spapr.h
> > > >>>>> @@ -100,6 +100,7 @@ struct sPAPRMachineClass {
> > > >>>>>
> > > >>>>>      /*< public >*/
> > > >>>>>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
> > > >>>>> +    bool update_dt_enabled;    /* enable KVMPPC_H_UPDATE_DT */
> > > >>>>>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> > > >>>>>      bool pre_2_10_has_unused_icps;
> > > >>>>>      bool legacy_irq_allocation;
> > > >>>>> @@ -136,6 +137,9 @@ struct sPAPRMachineState {
> > > >>>>>      int vrma_adjust;
> > > >>>>>      ssize_t rtas_size;
> > > >>>>>      void *rtas_blob;
> > > >>>>> +    uint32_t fdt_size;
> > > >>>>> +    uint32_t fdt_initial_size;
> > > >>>>
> > > >>>> I don't quite see the purpose of fdt_initial_size... it seems to be only
> > > >>>> used to print a trace.
> > > >>>
> > > >>>
> > > >>> Ah, lost in rebase. The purpose was to test if the new device tree has
> > > >>> not grown too much.
> > > >>>
> > > >>>
> > > >>>
> > > >>>>
> > > >>>>> +    void *fdt_blob;
> > > >>>>>      long kernel_size;
> > > >>>>>      bool kernel_le;
> > > >>>>>      uint32_t initrd_base;
> > > >>>>> @@ -462,7 +466,8 @@ struct sPAPRMachineState {
> > > >>>>>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
> > > >>>>>  /* Client Architecture support */
> > > >>>>>  #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
> > > >>>>> -#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
> > > >>>>> +#define KVMPPC_H_UPDATE_DT      (KVMPPC_HCALL_BASE + 0x3)
> > > >>>>> +#define KVMPPC_HCALL_MAX        KVMPPC_H_UPDATE_DT
> > > >>>>>
> > > >>>>>  typedef struct sPAPRDeviceTreeUpdateHeader {
> > > >>>>>      uint32_t version_id;
> > > >>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > >>>>> index c08130facb..5e2d4d211c 100644
> > > >>>>> --- a/hw/ppc/spapr.c
> > > >>>>> +++ b/hw/ppc/spapr.c
> > > >>>>> @@ -1633,7 +1633,10 @@ static void spapr_machine_reset(void)
> > > >>>>>      /* Load the fdt */
> > > >>>>>      qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
> > > >>>>>      cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
> > > >>>>> -    g_free(fdt);
> > > >>>>> +    g_free(spapr->fdt_blob);
> > > >>>>> +    spapr->fdt_size = fdt_totalsize(fdt);
> > > >>>>> +    spapr->fdt_initial_size = spapr->fdt_size;
> > > >>>>> +    spapr->fdt_blob = fdt;
> > > >>>>
> > > >>>> Hmm... It looks weird to store state in a reset handler. I'd rather zero
> > > >>>> both fdt_blob and fdt_size here.
> > > >>>
> > > >>> The device tree is built from the reset handler and the idea is that we
> > > >>> want to always have some tree in the machine.
> > > >>
> > > >> Yes, I think the approach here is fine. Otherwise when we want to
> > > >> look up the current fdt state in RTAS calls or whatever we'd always
> > > >> have to do
> > > >>     if (fdt_blob)
> > > >>         look up that
> > > >>     else
> > > >>         look up qemu created fdt.
> > > >>
> > > >
> > > > No. We only have one fdt blob: the initial one, I'd rather
> > > > call reset time one, or the updated one.
> > >
> > > There is one fdt in the machine, always. Either initial or from cas.
> >
> > Yeah, reset time fdt is either the initial one or cas... and I'm now
> > wondering what happens if migration occurs between cas that sets cas_reboot
> > and the corresponding reset. With the current code base, I have the impression
> > that the destination will redo the full cas+cas_reboot cycle after restart or
> > am I missing something ?
>
> Yes, I believe that's correct. It's kind of an edge case and that CAS
> cycle should still complete ok, it'll just take a little longer to
> boot, so I thought that was preferable to the complexity of migrating
> the CAS state.
>

You're probably right.

> > > >> Incidentally 'fdt' and 'fdt_blob' names do a terrible job of
> > > >> distinguishing what the difference is. Renaming fdt to fdt_initial
> > > >> (to match fdt_initial_size) and fdt_blob to fdt should make that
> > > >> clearer.
> > > >>
> > > >
> > > > As mentioned earlier in this thread, spapr->fdt_initial_size is only used
> > > > for tracing if the received fdt blob fails fdt_check_full()...
> > > >
> > > > $ git grep -H fdt_initial_size
> > > > hw/ppc/spapr.c:    spapr->fdt_initial_size = spapr->fdt_size;
> > > > hw/ppc/spapr.c:        VMSTATE_UINT32(fdt_initial_size, sPAPRMachineState),
> > > > hw/ppc/spapr_hcall.c:        trace_spapr_update_dt_failed(spapr->fdt_initial_size, cb,
> > > > include/hw/ppc/spapr.h:    uint32_t fdt_initial_size;
> > > >
> > > > Not sure it is helpful, and anyway, it is expected to be the same in source
> > > > and destination, so why put it in the migration stream ?
> > >
> > >
> > > Well, we do build the fdt anyway even when receive migration but we do
> > > not have to and yes we can expect the fdt on the destination to be of
> > > the same size since it is the same command line, it is just guessing and
> > > expecting vs. knowing and I prefer the latter as the reset time fdt and
> > > migration source fdt might have different size because of
> > > host-model/host-serial/slot-label/similar properties.
> >
> > Right but I still don't see the usefulness of fdt_initial_size...
>
> So, it's there to address exactly the problem you pointed out elsewhere
> in the thread: the idea was to disallow the guest resubmitting an fdt
> which is "too much" bigger than the original one, thereby consuming a
> bunch of qemu memory. The thought was that this is a bit more robust
> than just checking against a fixed max size, especially if we need to
> increase that fixed size in future to handle really big partitions.
>

Yeah, I saw that with Alexey's new patch. Thanks for the detailed clarification !

> > > > The only case where we want to migrate something is when h_update_dt() has
> > > > succeeded, ie, the guest passed a valid DT blob. This implies that its
> > > > size isn't 0, otherwise fdt_check_full() would return
> > > > -FDT_ERR_TRUNCATED.
> > > >
> > > > I would suggest rather to:
> > > >
> > > > - completely drop spapr->fdt_initial_size
> > > > - clear spapr->fdt_size at machine reset
> > > > - migrate if spapr->fdt_size is not zero
> > > >
> > > > Also, I've just realized another problem... nothing prevents a malicious
> > > > guest to pass an insanely great size to h_update_dt, which would cause
> > > > g_malloc0() to abort... The passed size should be checked against
> > > > FDT_MAX_SIZE.
> > >
> > > Good point. Just noticed - as posted, the checker actually checks the
> > > reset time tree, not the updated one, my bad :)
> > >
> > >
> >
>
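
To make the two size checks discussed above concrete (a fixed cap versus "not too much bigger than the reset time tree"), here is a rough, standalone sketch. It is not the code from the patch: the helper name, the cap value and the growth factor are all made up for illustration, and the real FDT_MAX_SIZE in the QEMU tree may well differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the QEMU tree. */
#define SKETCH_FDT_MAX_SIZE   0x100000u  /* assumed absolute cap (1 MiB) */
#define SKETCH_GROWTH_FACTOR  2u         /* assumed "too much bigger" threshold */

/*
 * Hypothetical helper: decide whether a guest-supplied fdt size is acceptable
 * before allocating anything for it.
 *
 * initial_size: size of the fdt built at machine reset
 * new_size:     size the guest passed to the update hypercall
 */
static bool sketch_fdt_size_ok(uint32_t initial_size, uint32_t new_size)
{
    /* Widen to 64 bits so the multiplication cannot wrap around. */
    uint64_t relative_limit = (uint64_t)initial_size * SKETCH_GROWTH_FACTOR;

    if (new_size == 0) {
        return false;   /* an empty blob can never be a valid tree */
    }
    if (new_size > SKETCH_FDT_MAX_SIZE) {
        return false;   /* fixed upper bound, whatever the initial size was */
    }
    return new_size <= relative_limit;  /* bound relative to the reset time tree */
}
```

With something along these lines, the hcall handler could bail out early (presumably with an error such as H_PARAMETER) instead of handing a guest-controlled size straight to g_malloc0().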