On 04/21/2016 09:28 PM, David Gibson wrote:
> On Thu, Apr 21, 2016 at 10:22:10AM -0700, Jianjun Duan wrote:
>>
>>
>> On 04/19/2016 10:14 PM, David Gibson wrote:
>>> On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
>>>> ccs_list in spapr state maintains the device tree related
>>>> information on the rtas side for hotplugged devices. In racing
>>>> situations between hotplug events and migration operation, a rtas
>>>> hotplug event could be migrated from the source guest to target
>>>> guest, or the source guest could have not yet finished fetching
>>>> the device tree when migration is started, the target will try
>>>> to finish fetching the device tree. By migrating ccs_list, the
>>>> target can fetch the device tree properly.
>>>>
>>>> We tracked the size of ccs_list queue, and introduced a dynamic
>>>> cache for ccs_list to get around having to create VMSD for the
>>>> queue. There also existence tests in place for the newly added
>>>> fields in the spapr state VMSD to make sure forward migration
>>>> is not broken.
>>>>
>>>> Signed-off-by: Jianjun Duan <du...@linux.vnet.ibm.com>
>>> So there's problems here at a couple of levels.
>>>
>>> 1. Even more so that the indicators, this is transitional state, which
>>> it would be really nice if we can avoid migrating.  At the very least
>>> the new state should go into a subsection with an is_needed function
>>> which will skip it if we don't have a configure connector in progress.
>>> That means we'll be able to backwards migrate as long as we're not in
>>> the middle of a hotplug, which won't be possible with this version.
>>>
>>> Again, if there's some way we can defer or retry the operation instead
>>> of migrating the interim state, that would be even better.
>> I am not sure why transitional state should not be migrated.
> 
> Because every extra piece of state to migrate is something which can
> go wrong.  Getting migration working properly is a difficult and
> fragile process as it is - every extra bit of state we add makes it
> harder.
> 
Migrating this and pending_events does fix the racing. The alternative
such as delay or throttling involves timing issues, which IMHO are even
trickier to get it right.
>> I think the
>> fact that it changes is the reason why it should be migrated. As for
>> backward migration, I thought about it, but decided to leave it later
>> to beat the time. We can do it later, or do it this time and delayed the
>> patches more. I agree that subsection seems to be the solution here.
> 
> Leaving backwards migration until later - after you've already
> introduced a new stream version - will make implementing it much, much
> more difficult.
> 
I will use subsection to fix the backward migration.
>>> 2. Having to copy the list of elements out into an array just for
>>> migration is pretty horrible.  I'm almost include to suggest we should
>>> add list migration into the savevm core rather than this.
>> I thought about a general approach of adding something to savevm to
>> handle the queue. But the queue used here uses macro and doesn't really
>> support polymorphism.  And we need to use  QTAILQ_FOREACH(var, head, field)
>> to access it. It is not easy as we may need to modify the macro definition
>> to carry
>> type name of the elements of the queue.
>>
>> Also I am following Alexey's code here ([PATCH qemu v15 07/17] spapr_iommu:
>> Migrate full state).
>> It was reviewed by you.
> 
> Yeah.  It's not incorrect, but it's ugly and every extra time I see
> it, the impetus to find a better way increases.
> 
I agree it is not elegant. I will look into it to see if I can create
something similar to QTAILQ_FOREACH to get around the type issue.
>>>> ---
>>>>  hw/ppc/spapr.c              | 60 
>>>> ++++++++++++++++++++++++++++++++++++++++++++-
>>>>  hw/ppc/spapr_rtas.c         |  2 ++
>>>>  include/hw/ppc/spapr.h      | 11 +++++++++
>>>>  include/migration/vmstate.h |  8 +++++-
>>>>  4 files changed, 79 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index af4745c..eab95f0 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error 
>>>> **errp)
>>>>      }
>>>>  }
>>>> +static void spapr_pre_save(void *opaque)
>>>> +{
>>>> +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>> +    sPAPRConfigureConnectorState *ccs;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_cache;
>>>> +
>>>> +    /* Copy ccs_list to ccs_list_cache */
>>>> +    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
>>>> +                                   spapr->ccs_list_num);
>>>> +    ccs_cache = spapr->ccs_list_cache;
>>>> +    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
>>>> +        ccs_cache->drc_index = ccs->drc_index;
>>>> +        ccs_cache->fdt_offset = ccs->fdt_offset;
>>>> +        ccs_cache->fdt_depth = ccs->fdt_depth;
>>>> +        ccs_cache++;
>>>> +    }
>>>> +}
>>>> +
>>>>  static int spapr_post_load(void *opaque, int version_id)
>>>>  {
>>>>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>>      int err = 0;
>>>> +    sPAPRConfigureConnectorState *ccs;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
>>>> +    int index = 0;
>>>>      /* In earlier versions, there was no separate qdev for the PAPR
>>>>       * RTC, so the RTC offset was stored directly in sPAPREnvironment.
>>>> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int 
>>>> version_id)
>>>>          err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>>>>      }
>>>> +    if (version_id < 4) {
>>>> +        return err;
>>>> +    }
>>>> +    /* Copy ccs_list_cache to ccs_list */
>>>> +    for (index = 0; index < spapr->ccs_list_num; index ++) {
>>>> +        ccs = g_new0(sPAPRConfigureConnectorState, 1);
>>>> +        ccs->drc_index = (ccs_cache + index)->drc_index;
>>>> +        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
>>>> +        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
>>>> +        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
>>>> +    }
>>>> +    g_free(spapr->ccs_list_cache);
>>>> +
>>>>      return err;
>>>>  }
>>>> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int 
>>>> version_id)
>>>>      return version_id < 3;
>>>>  }
>>>> +static bool version_ge_4(void *opaque, int version_id)
>>>> +{
>>>> +    return version_id >= 4;
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_spapr_ccs_cache = {
>>>> +    .name = "spaprconfigureconnectorstate",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +};
>>>> +
>>>>  static const VMStateDescription vmstate_spapr = {
>>>>      .name = "spapr",
>>>> -    .version_id = 3,
>>>> +    .version_id = 4,
>>>>      .minimum_version_id = 1,
>>>> +    .pre_save = spapr_pre_save,
>>>>      .post_load = spapr_post_load,
>>>>      .fields = (VMStateField[]) {
>>>>          /* used to be @next_irq */
>>>> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
>>>>          VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, 
>>>> version_before_3),
>>>>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>>>> +        /* RTAS state */
>>>> +        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
>>> You don't generally need to write your own version test functions for 
>>> specific-version.  Instead you can just use VMSTATE_INT32_V.
>> I agree. I realized that after the code was posted here.
> 
> Ok.
> 
>>>> +        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, 
>>>> sPAPRMachineState,
>>>> +                                         version_ge_4, ccs_list_num, 1,
>>>> +                                         vmstate_spapr_ccs_cache,
>>>> +                                         
>>>> sPAPRConfigureConnectorStateCache),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index f073258..9cfd559 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
>>>>  {
>>>>      g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
>>>>      QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
>>>> +    spapr->ccs_list_num++;
>>>>  }
>>>>  static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>>  {
>>>>      QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
>>>>      g_free(ccs);
>>>> +    spapr->ccs_list_num--;
>>>>  }
>>>>  void spapr_ccs_reset_hook(void *opaque)
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 815d5ee..c8be926 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -11,6 +11,8 @@ struct VIOsPAPRBus;
>>>>  struct sPAPRPHBState;
>>>>  struct sPAPRNVRAM;
>>>>  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
>>>> +typedef struct sPAPRConfigureConnectorStateCache
>>>> +        sPAPRConfigureConnectorStateCache;
>>>>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>>>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>>>> @@ -75,6 +77,9 @@ struct sPAPRMachineState {
>>>>      /* RTAS state */
>>>>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>>>> +    /* Temporary cache for migration purposes */
>>>> +    int32_t ccs_list_num;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_list_cache;
>>>>      /*< public >*/
>>>>      char *kvm_type;
>>>> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
>>>>      QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
>>>>  };
>>>> +struct sPAPRConfigureConnectorStateCache {
>>>> +    uint32_t drc_index;
>>>> +    int fdt_offset;
>>>> +    int fdt_depth;
>>>> +};
>>>> +
>>>>  void spapr_ccs_reset_hook(void *opaque);
>>>>  #define TYPE_SPAPR_RTC "spapr-rtc"
>>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>>>> index 1622638..7966979 100644
>>>> --- a/include/migration/vmstate.h
>>>> +++ b/include/migration/vmstate.h
>>>> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>>      .offset     = offsetof(_state, _field),                          \
>>>>  }
>>>> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, 
>>>> _vmsd, _type) {\
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, 
>>>> _field_num, _version, _vmsd, _type) { \
>>>>      .name       = (stringify(_field)),                               \
>>>>      .version_id = (_version),                                        \
>>>> +    .field_exists = (_test),                                         \
>>>>      .vmsd       = &(_vmsd),                                          \
>>>>      .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
>>>>      .size       = sizeof(_type),                                     \
>>>> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>>      VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
>>>>              _vmsd, _type)
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, 
>>>> \
>>>> +                                    _vmsd, _type)                         
>>>> \
>>>> +    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    
>>>> \
>>>> +            _version, _vmsd, _type)
>>>> +
>>>>  #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, 
>>>> _size) \
>>>>      VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, 
>>>> _info, \
>>>>              _size)
>>
> 


Reply via email to