Hailiang Zhang <zhang.zhanghaili...@huawei.com> writes: > On 2015/12/19 17:27, Markus Armbruster wrote: >> zhanghailiang <zhang.zhanghaili...@huawei.com> writes: >> >>> Guest will enter this state when paused to save/restore VM state >>> under colo checkpoint. >>> >>> Cc: Eric Blake <ebl...@redhat.com> >>> Cc: Markus Armbruster <arm...@redhat.com> >>> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> >>> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> >>> Signed-off-by: Gonglei <arei.gong...@huawei.com> >>> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> >>> Reviewed-by: Eric Blake <ebl...@redhat.com> >>> --- >>> qapi-schema.json | 5 ++++- >>> vl.c | 8 ++++++++ >>> 2 files changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/qapi-schema.json b/qapi-schema.json >>> index 85f7800..0423b47 100644 >>> --- a/qapi-schema.json >>> +++ b/qapi-schema.json >>> @@ -154,12 +154,15 @@ >>> # @watchdog: the watchdog action is configured to pause and has been >>> triggered >>> # >>> # @guest-panicked: guest has been panicked as a result of guest OS panic >>> +# >>> +# @colo: guest is paused to save/restore VM state under colo checkpoint >>> (since >>> +# 2.6) >>> ## >>> { 'enum': 'RunState', >>> 'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused', >>> 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm', >>> 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog', >>> - 'guest-panicked' ] } >>> + 'guest-panicked', 'colo' ] } >>> >>> ## >>> # @StatusInfo: >>> diff --git a/vl.c b/vl.c >>> index f84fde8..fca630b 100644 >>> --- a/vl.c >>> +++ b/vl.c >>> @@ -594,6 +594,7 @@ static const RunStateTransition >>> runstate_transitions_def[] = { >>> { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG }, >>> { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED }, >>> { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE }, >>> + { RUN_STATE_INMIGRATE, RUN_STATE_COLO }, >>> >>> { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, >>> { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE }, >>> @@ -603,6 +604,7 @@ static const RunStateTransition >>> runstate_transitions_def[] = { >>> >>> { RUN_STATE_PAUSED, RUN_STATE_RUNNING }, >>> { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE }, >>> + { RUN_STATE_PAUSED, RUN_STATE_COLO}, >>> >>> { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING }, >>> { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE }, >>> @@ -613,9 +615,12 @@ static const RunStateTransition >>> runstate_transitions_def[] = { >>> >>> { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING }, >>> { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE }, >>> + { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO}, >>> >>> { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING }, >>> >>> + { RUN_STATE_COLO, RUN_STATE_RUNNING }, >>> + >>> { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, >>> { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, >>> { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR }, >>> @@ -626,6 +631,7 @@ static const RunStateTransition >>> runstate_transitions_def[] = { >>> { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN }, >>> { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG }, >>> { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED }, >>> + { RUN_STATE_RUNNING, RUN_STATE_COLO}, >>> >>> { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING }, >>> >>> @@ -636,9 +642,11 @@ static const RunStateTransition >>> runstate_transitions_def[] = { >>> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, >>> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, >>> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, >>> + { RUN_STATE_SUSPENDED, RUN_STATE_COLO}, >>> >>> { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, >>> { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, >>> + { RUN_STATE_WATCHDOG, RUN_STATE_COLO}, >>> >>> { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, >>> { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, >> >> Pardon my ignorance, but could you explain the new run state in a bit >> more detail for me? >> > > OK, in normally, we only need switch between COLO and RUNNING state. > But we can't forbid users to issue other command while VM is COLO state. > > In every checkpoint, we have to pause to send VM's state to SVM, and before we > pause VM, users may issue 'stop' command, which will change state to > 'RUN_STATE_PAUSE', > we don't want to abort VM because of this command. (Actually, we will > support 'stop' VM > during VM is in COLO state). So we need the state machine > 'RUN_STATE_PAUSED -> RUN_STATE_COLO'.
What's the next state then? > We enter COLO state just after a full migration process which the last > state will be > 'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE', before we enter > COLO loop, we may get > 'x-colo-lost-heartbeat', and will run into 'RUN_STATE_COLO' pause, so we need > state machines 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO'and > 'RUN_STATE_INMIGRATE, RUN_STATE_COLO'. > The reason we need RUN_STATE_SUSPENDED -> RUN_STATE_COLO is, guest or > users may issue standby command. > We need to ensure VM not be crashed. > > Actually, we may need more states which can go to 'colo' state, maybe > just follow the cases of > 'MIGRATE' state. I believe we should fully work out the state transitions added by COLO. I like to write that down in this form: (state, trigger) -> (action, state') Example: (running, checkpoint) -> (begin-checkpointing, colo) with a suitable explanation of 'checkpoint' and 'begin-checkpointing'. For brevity, multiple (state1, trigger) -> (action, state') (state2, trigger) -> (action, state') ... (stateN, trigger) -> (action, state') can be abbreviated to ({state1, state2, stateN}, trigger) -> (action, state') Example: ({running, paused, ...}, checkpoint) -> (begin-checkpointing, colo) For clarity, chains of state transitions should be described in the order they happen. Pictures showing the states connected with transition arrows labelled with the trigger can help. Two properties to check: 1. Correctness: every state transition thus written down does the right thing. 2. Completeness: for every pair (state, trigger), we got a state transition, or an explanation why it cannot happen. > Thanks, > zhanghailiang > >> Your additions to runstate_transitions_def[] show we can go *from* state >> 'colo' only to state 'running', but we can go *to* state 'colo' from >> various other states. This may well be sane, but it's not *obviously* >> sane :)