On 07.11.2018 17:29, John Ferlan wrote:
>
>
> On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote:
>> Let's introduce shutdown reason "daemon" which means we have to
>> kill running domain ourselves as the best action we can take at
>> that moment. Failure to pick up domain on daemon restart is
>> one example of such case. Using reason "crashed" is a bit misleading
>> as it is used when qemu is actually crashed or qemu destroyed on
>> guest panic as result of user choice that is the problem was
>> in qemu/guest itself. So I propose to use "crashed" only for
>> qemu side issues and introduce "daemon" when we have to kill the qemu
>> for good.
>>
>> There is another example where "daemon" will be useful. If we can
>> not reboot domain we kill it and got "crashed" reason now. Looks
>> like good candidate for "daemon" reason.
>>
>> Signed-off-by: Nikolay Shirokovskiy <nshirokovs...@virtuozzo.com>
>> ---
>> include/libvirt/libvirt-domain.h | 1 +
>> src/conf/domain_conf.c | 3 ++-
>> src/qemu/qemu_process.c | 11 ++++-------
>> tools/virsh-domain-monitor.c | 3 ++-
>> 4 files changed, 9 insertions(+), 9 deletions(-)
>>
>
> [...]
>
> Now that I've pushed commits 296e05b54 and 8f0f8425d, let's revisit
> this... Merging in that set of changes with this patch means that
> instead of:
>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index e9c7618..c4bc9ca 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque)
>> /* We can't get the monitor back, so must kill the VM
>> * to remove danger of it ending up running twice if
>> * user tries to start it again later
>> - * If we couldn't get the monitor since QEMU supports
>> - * no-shutdown, we can safely say that the domain
>> - * crashed ... */
>> - state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> - /* If BeginJob failed, we jumped here without a job, let's hope
>> another
>> + * If BeginJob failed, we jumped here without a job, let's hope
>> another
>> * thread didn't have a chance to start playing with the domain yet
>> * (it's all we can do anyway).
>> */
>> - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags);
>> + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>> + QEMU_ASYNC_JOB_NONE, stopFlags);
>> }
>> goto cleanup;
>> }
>> @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
>> * is no thread that could be doing anything else with the same
>> domain
>> * object.
>> */
>> - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
>> + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>> QEMU_ASYNC_JOB_NONE, 0);
>> qemuDomainRemoveInactiveJobLocked(src->driver, obj);
>>
>
> We'd have:
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 3bf84834ec..023e187729 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -7995,10 +7995,14 @@ qemuProcessReconnect(void *opaque)
> *
> * If we cannot get to the monitor when the QEMU command
> * line used -no-shutdown, then we can safely say that the
> - * domain crashed; otherwise, we don't really know. */
> + * domain crashed; otherwise, if the monitor was started,
> + * then we can blame ourselves, else we failed before the
> + * monitor started so we don't really know. */
> if (!priv->mon && tryMonReconn &&
> qemuDomainIsUsingNoShutdown(priv))
> state = VIR_DOMAIN_SHUTOFF_CRASHED;
> + else if (priv->mon)
> + state = VIR_DOMAIN_SHUTOFF_DAEMON;
> else
> state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
>
> @@ -8045,7 +8049,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
> * is no thread that could be doing anything else with the same
> domain
> * object.
> */
> - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
> + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
> QEMU_ASYNC_JOB_NONE, 0);
I think we need to report VIR_DOMAIN_SHUTOFF_UNKNOWN here. This is similar to
qemuProcessReconnect before we try to reconnect. May be this hunk need to be
placed in distinct patch therefore.
> qemuDomainRemoveInactiveJobLocked(src->driver, obj);
>
> [...]
>
> So does this essentially meet/handle the condition you were trying to
> alter? That is - once we get beyond the monitor reconnection, then if
> we end up in error that means the daemon has decided to stop things.
> Leaving the UNKNOWN to be the period prior to attempting to reconnect
> and CRASHED to the period during which we try to connect to the monitor.
>
> This also captures failures after connection is successful (when
> priv->mon in qemuConnectMonitor).
>
> Getting more specific failures of why the connection failed is perhaps a
> future task if it really matters.
This looks good enough for me too.
>
> If this looks good to you let me know, I can merge and push with the
> previously noted changes and a commit message of:
>
> "This patch introduces a new shutdown reason "daemon" in order
> to indicate that the daemon needed to force shutdown the domain
> as the best course of action to take at the moment.
>
> This action would occur during reconnection when processing encounters
> an error once the monitor reconnection is successful."
>
>
Besides the first comment I'm fine with this change. I'm going
to send a patch to change reason to VIR_DOMAIN_SHUTOFF_DAEMON in
appropriate reboot failure cases too. Also one day let's fix the code not to
kill
unrelated process.
Nikolay
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list