On 04/21/2016 09:28 PM, David Gibson wrote:
> On Thu, Apr 21, 2016 at 10:22:10AM -0700, Jianjun Duan wrote:
>>
>>
>> On 04/19/2016 10:14 PM, David Gibson wrote:
>>> On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
>>>> ccs_list in spapr state maintains the device tree related
>>>> information on the rtas side for hotplugged devices. In racing
>>>> situations between hotplug events and migration operation, a rtas
>>>> hotplug event could be migrated from the source guest to target
>>>> guest, or the source guest could have not yet finished fetching
>>>> the device tree when migration is started, the target will try
>>>> to finish fetching the device tree. By migrating ccs_list, the
>>>> target can fetch the device tree properly.
>>>>
>>>> We tracked the size of ccs_list queue, and introduced a dynamic
>>>> cache for ccs_list to get around having to create VMSD for the
>>>> queue. There also existence tests in place for the newly added
>>>> fields in the spapr state VMSD to make sure forward migration
>>>> is not broken.
>>>>
>>>> Signed-off-by: Jianjun Duan <du...@linux.vnet.ibm.com>
>>> So there's problems here at a couple of levels.
>>>
>>> 1. Even more so that the indicators, this is transitional state, which
>>> it would be really nice if we can avoid migrating. At the very least
>>> the new state should go into a subsection with an is_needed function
>>> which will skip it if we don't have a configure connector in progress.
>>> That means we'll be able to backwards migrate as long as we're not in
>>> the middle of a hotplug, which won't be possible with this version.
>>>
>>> Again, if there's some way we can defer or retry the operation instead
>>> of migrating the interim state, that would be even better.
>> I am not sure why transitional state should not be migrated.
>
> Because every extra piece of state to migrate is something which can
> go wrong. Getting migration working properly is a difficult and
> fragile process as it is - every extra bit of state we add makes it
> harder.
>
Migrating this and pending_events does fix the racing. The alternative
such as delay or throttling involves timing issues, which IMHO are even
trickier to get it right.
>> I think the
>> fact that it changes is the reason why it should be migrated. As for
>> backward migration, I thought about it, but decided to leave it later
>> to beat the time. We can do it later, or do it this time and delayed the
>> patches more. I agree that subsection seems to be the solution here.
>
> Leaving backwards migration until later - after you've already
> introduced a new stream version - will make implementing it much, much
> more difficult.
>
I will use subsection to fix the backward migration.
>>> 2. Having to copy the list of elements out into an array just for
>>> migration is pretty horrible. I'm almost include to suggest we should
>>> add list migration into the savevm core rather than this.
>> I thought about a general approach of adding something to savevm to
>> handle the queue. But the queue used here uses macro and doesn't really
>> support polymorphism. And we need to use QTAILQ_FOREACH(var, head, field)
>> to access it. It is not easy as we may need to modify the macro definition
>> to carry
>> type name of the elements of the queue.
>>
>> Also I am following Alexey's code here ([PATCH qemu v15 07/17] spapr_iommu:
>> Migrate full state).
>> It was reviewed by you.
>
> Yeah. It's not incorrect, but it's ugly and every extra time I see
> it, the impetus to find a better way increases.
>
I agree it is not elegant. I will look into it to see if I can create
something similar to QTAILQ_FOREACH to get around the type issue.
>>>> ---
>>>> hw/ppc/spapr.c | 60
>>>> ++++++++++++++++++++++++++++++++++++++++++++-
>>>> hw/ppc/spapr_rtas.c | 2 ++
>>>> include/hw/ppc/spapr.h | 11 +++++++++
>>>> include/migration/vmstate.h | 8 +++++-
>>>> 4 files changed, 79 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index af4745c..eab95f0 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error
>>>> **errp)
>>>> }
>>>> }
>>>> +static void spapr_pre_save(void *opaque)
>>>> +{
>>>> + sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>> + sPAPRConfigureConnectorState *ccs;
>>>> + sPAPRConfigureConnectorStateCache *ccs_cache;
>>>> +
>>>> + /* Copy ccs_list to ccs_list_cache */
>>>> + spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
>>>> + spapr->ccs_list_num);
>>>> + ccs_cache = spapr->ccs_list_cache;
>>>> + QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
>>>> + ccs_cache->drc_index = ccs->drc_index;
>>>> + ccs_cache->fdt_offset = ccs->fdt_offset;
>>>> + ccs_cache->fdt_depth = ccs->fdt_depth;
>>>> + ccs_cache++;
>>>> + }
>>>> +}
>>>> +
>>>> static int spapr_post_load(void *opaque, int version_id)
>>>> {
>>>> sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>> int err = 0;
>>>> + sPAPRConfigureConnectorState *ccs;
>>>> + sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
>>>> + int index = 0;
>>>> /* In earlier versions, there was no separate qdev for the PAPR
>>>> * RTC, so the RTC offset was stored directly in sPAPREnvironment.
>>>> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int
>>>> version_id)
>>>> err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>>>> }
>>>> + if (version_id < 4) {
>>>> + return err;
>>>> + }
>>>> + /* Copy ccs_list_cache to ccs_list */
>>>> + for (index = 0; index < spapr->ccs_list_num; index ++) {
>>>> + ccs = g_new0(sPAPRConfigureConnectorState, 1);
>>>> + ccs->drc_index = (ccs_cache + index)->drc_index;
>>>> + ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
>>>> + ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
>>>> + QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
>>>> + }
>>>> + g_free(spapr->ccs_list_cache);
>>>> +
>>>> return err;
>>>> }
>>>> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int
>>>> version_id)
>>>> return version_id < 3;
>>>> }
>>>> +static bool version_ge_4(void *opaque, int version_id)
>>>> +{
>>>> + return version_id >= 4;
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_spapr_ccs_cache = {
>>>> + .name = "spaprconfigureconnectorstate",
>>>> + .version_id = 1,
>>>> + .minimum_version_id = 1,
>>>> + .fields = (VMStateField[]) {
>>>> + VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
>>>> + VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
>>>> + VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
>>>> + VMSTATE_END_OF_LIST()
>>>> + },
>>>> +};
>>>> +
>>>> static const VMStateDescription vmstate_spapr = {
>>>> .name = "spapr",
>>>> - .version_id = 3,
>>>> + .version_id = 4,
>>>> .minimum_version_id = 1,
>>>> + .pre_save = spapr_pre_save,
>>>> .post_load = spapr_post_load,
>>>> .fields = (VMStateField[]) {
>>>> /* used to be @next_irq */
>>>> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
>>>> VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState,
>>>> version_before_3),
>>>> VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>>>> + /* RTAS state */
>>>> + VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
>>> You don't generally need to write your own version test functions for
>>> specific-version. Instead you can just use VMSTATE_INT32_V.
>> I agree. I realized that after the code was posted here.
>
> Ok.
>
>>>> + VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache,
>>>> sPAPRMachineState,
>>>> + version_ge_4, ccs_list_num, 1,
>>>> + vmstate_spapr_ccs_cache,
>>>> +
>>>> sPAPRConfigureConnectorStateCache),
>>>> VMSTATE_END_OF_LIST()
>>>> },
>>>> };
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index f073258..9cfd559 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
>>>> {
>>>> g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
>>>> QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
>>>> + spapr->ccs_list_num++;
>>>> }
>>>> static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>> {
>>>> QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
>>>> g_free(ccs);
>>>> + spapr->ccs_list_num--;
>>>> }
>>>> void spapr_ccs_reset_hook(void *opaque)
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 815d5ee..c8be926 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -11,6 +11,8 @@ struct VIOsPAPRBus;
>>>> struct sPAPRPHBState;
>>>> struct sPAPRNVRAM;
>>>> typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
>>>> +typedef struct sPAPRConfigureConnectorStateCache
>>>> + sPAPRConfigureConnectorStateCache;
>>>> typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>>> #define HPTE64_V_HPTE_DIRTY 0x0000000000000040ULL
>>>> @@ -75,6 +77,9 @@ struct sPAPRMachineState {
>>>> /* RTAS state */
>>>> QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>>>> + /* Temporary cache for migration purposes */
>>>> + int32_t ccs_list_num;
>>>> + sPAPRConfigureConnectorStateCache *ccs_list_cache;
>>>> /*< public >*/
>>>> char *kvm_type;
>>>> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
>>>> QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
>>>> };
>>>> +struct sPAPRConfigureConnectorStateCache {
>>>> + uint32_t drc_index;
>>>> + int fdt_offset;
>>>> + int fdt_depth;
>>>> +};
>>>> +
>>>> void spapr_ccs_reset_hook(void *opaque);
>>>> #define TYPE_SPAPR_RTC "spapr-rtc"
>>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>>>> index 1622638..7966979 100644
>>>> --- a/include/migration/vmstate.h
>>>> +++ b/include/migration/vmstate.h
>>>> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>> .offset = offsetof(_state, _field), \
>>>> }
>>>> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version,
>>>> _vmsd, _type) {\
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test,
>>>> _field_num, _version, _vmsd, _type) { \
>>>> .name = (stringify(_field)), \
>>>> .version_id = (_version), \
>>>> + .field_exists = (_test), \
>>>> .vmsd = &(_vmsd), \
>>>> .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
>>>> .size = sizeof(_type), \
>>>> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>> VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version, \
>>>> _vmsd, _type)
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version,
>>>> \
>>>> + _vmsd, _type)
>>>> \
>>>> + VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,
>>>> \
>>>> + _version, _vmsd, _type)
>>>> +
>>>> #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info,
>>>> _size) \
>>>> VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version,
>>>> _info, \
>>>> _size)
>>
>