Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol

Markus Armbruster Mon, 11 Jan 2016 04:48:23 -0800

Hailiang Zhang <zhang.zhanghaili...@huawei.com> writes:

> Hi Markus,
>
> On 2015/12/19 16:54, Markus Armbruster wrote:
>> Jumping in at v12 for a bit of QAPI review (and whatever else catched my
>> eye nearby), please pardon my ignorance of COLO in general, and previous
>> review of this series in particular.
>>
>
> Thanks all the same :)
[...]
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index 0ab9618..0ce2a6e 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -10,10 +10,12 @@
>>>    * later.  See the COPYING file in the top-level directory.
>>>    */
>>>
>>> +#include <unistd.h>
>>>   #include "sysemu/sysemu.h"
>>>   #include "migration/colo.h"
>>>   #include "trace.h"
>>>   #include "qemu/error-report.h"
>>> +#include "qemu/sockets.h"
>>>
>>>   bool colo_supported(void)
>>>   {
>>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>   }
>>>
>>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (cmd >= COLO_COMMAND_MAX) {
>>
>> Needs a trivial rebase due to commit 7fb1cf1.
>>
>
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>
>> Can this run in a context with different error handling needs?
>>
>> Or asked differently: who may ultimately handle this error?  Whoever
>> that may be, how does it need to report errors?
>>
>> Peeking ahead: the immediate callers don't handle this error, they just
>> pass it on their callers.
>>
>> I'm asking because I'm trying to understand whether error_report() is
>> appropriate here, or whether you need to use error_setg(), and leave the
>> actual reporting to the spot that ultimately handles this error.
>>
>
> Hmm, i know what you mean, we handled them all together after exit
> from the colo process loop,
> Use error_setg() seems to be a good idea, with this modification, we
> can also drop the return
> value. I will fix it in next version.
>
>
>>> +    }
>>> +    qemu_put_be32(f, cmd);
>>> +    qemu_fflush(f);
>>> +
>>> +    ret = qemu_file_get_error(f);
>>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>>> +
>>> +    return ret;
>>> +}
>>
>> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>>
>
> OK, i will change it to use enum COLOCommand.
>
>>> +
>>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    *cmd = qemu_get_be32(f);
>>> +    ret = qemu_file_get_error(f);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    if (*cmd >= COLO_COMMAND_MAX) {
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>> +    }
>>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>>> +    return 0;
>>> +}
>>
>> Same question.
>>
>> The "get" in the name suggests the function returns the value gotten,
>> like similarly named function elsewhere in migration/ do.
>>
> Do you mean it should return the cmd value directly, not though parameter way 
> ?
> After we convert it to use error_setg() to indicate success or not, we
> can do like that.
> I will fix it.


Sounds good to me.

[...]
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index c9ff34e..85f7800 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -720,6 +720,31 @@
>>>   { 'command': 'migrate-start-postcopy' }
>>>
>>>   ##
>>> +# @COLOCommand
>>> +#
>>> +# The commands for COLO fault tolerance
>>> +#
>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>> +#
>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>> +#
>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>> +#
>>> +# @vmstate-send: VM's state will be sent by PVM.
>>> +#
>>> +# @vmstate-size: The total size of VMstate.
>>> +#
>>> +# @vmstate-received: VM's state has been received by SVM.
>>> +#
>>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOCommand',
>>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>>> +            'vmstate-loaded' ] }
>>> +
>>
>> Space after 'vmstate-size', please.
>>
>
>> 'vmstate-size' is not used in this patch.  You may want to add it with
>> its first use instead.
>>
>
> OK, i will move it to the corresponding patch.
>
>> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
>> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
>> others look like replies.
>>
>
> Yes, COLOCommand is not so exact. what about name it COLOProtocol?

A protocol specifies valid sequences of messages, and what they mean.
This isn't a protocol, it's a message within a protocol.  COLOMessage?

>>
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/trace-events b/trace-events
>>> index 5565e79..39fdd8d 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>>
>>>   # migration/colo.c
>>>   colo_vm_state_change(const char *old, const char *new) "Change
>>> '%s' => '%s'"
>>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>>
>>>   # kvm-all.c
>>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>
>> I like how this commit creates just the two state machines, and leaves
>> filling in their actions to later commits.  Helps ignorant rewiewers
>> like me :)
>>
>>
>
> Do you mean i should split this patch ? Leave this patch with the
> simplest colo process,
> maybe just 'ready, request, reply', and add the other states in later patch?

No, I *like* how you split up the work.

Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol

Reply via email to