On 16.10.2018 15:48, John Ferlan wrote:
>
>
> On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote:
>> Let's introduce shutdown reason "daemon" which means we have to
>> kill running domain ourselves as the best action we can take at
>> that moment. Failure to pick up domain on daemon restart is
>> one example of such case. Using reason "crashed" is a bit misleading
>> as it is used when qemu is actually crashed or qemu destroyed on
>> guest panic as result of user choice that is the problem was
>> in qemu/guest itself. So I propose to use "crashed" only for
>> qemu side issues and introduce "daemon" when we have to kill the qemu
>> for good.
>
> How about (although reading below may shed some more context):
>
> This patch introduces a new shutdown reason "daemon" in order
> to indicate that the daemon needed to force shutdown the domain
> as the best course of action to take at the moment.
>
> This action would occur during Reconnect processing when it's
> determined that either the QEMU monitor no longer exists for
> some unknown reason or creation of a thread to reconnect the
> domain failed.
Mostly I'm fine with this one except "monitor no longer exists"
is condition for reason "crashed" or "unknown" (Martin's contribution)
>
>>
>> There is another example where "daemon" will be useful. If we can
>> not reboot domain we kill it and got "crashed" reason now. Looks
>> like good candidate for "daemon" reason.
>
> If you feel this way, then a followup patch could/should be posted;
> otherwise, this'll be lost to commit message heaven where all good ideas
> go to die ;-).
Sure!
>
>>
>> Signed-off-by: Nikolay Shirokovskiy <nshirokovs...@virtuozzo.com>
>> ---
>> include/libvirt/libvirt-domain.h | 1 +
>> src/conf/domain_conf.c | 3 ++-
>> src/qemu/qemu_process.c | 11 ++++-------
>> tools/virsh-domain-monitor.c | 3 ++-
>> 4 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/libvirt/libvirt-domain.h
>> b/include/libvirt/libvirt-domain.h
>> index fdd2d6b..11fdab5 100644
>> --- a/include/libvirt/libvirt-domain.h
>> +++ b/include/libvirt/libvirt-domain.h
>> @@ -145,6 +145,7 @@ typedef enum {
>> VIR_DOMAIN_SHUTOFF_FAILED = 6, /* domain failed to start */
>> VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT = 7, /* restored from a snapshot which
>> was
>> * taken while domain was shutoff
>> */
>> + VIR_DOMAIN_SHUTOFF_DAEMON = 8, /* daemon have to kill domain */
>
> s/have to/decides to/
>
>> # ifdef VIR_ENUM_SENTINELS
>> VIR_DOMAIN_SHUTOFF_LAST
>> # endif
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index 9911d56..e441723 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -779,7 +779,8 @@ VIR_ENUM_IMPL(virDomainShutoffReason,
>> VIR_DOMAIN_SHUTOFF_LAST,
>> "migrated",
>> "saved",
>> "failed",
>> - "from snapshot")
>> + "from snapshot",
>> + "daemon")
>>
>> VIR_ENUM_IMPL(virDomainCrashedReason, VIR_DOMAIN_CRASHED_LAST,
>> "unknown",
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index e9c7618..c4bc9ca 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque)
>> /* We can't get the monitor back, so must kill the VM
>> * to remove danger of it ending up running twice if
>> * user tries to start it again later
>
> Since we're changing anyway, let's put the stop there, e.g.
> "s/later/later./"
>
>> - * If we couldn't get the monitor since QEMU supports
>> - * no-shutdown, we can safely say that the domain
>> - * crashed ... */
>> - state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> - /* If BeginJob failed, we jumped here without a job, let's hope
>> another
>> + * If BeginJob failed, we jumped here without a job, let's hope
>> another
>> * thread didn't have a chance to start playing with the domain yet
>> * (it's all we can do anyway).
>> */
>> - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags);
>> + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>> + QEMU_ASYNC_JOB_NONE, stopFlags);
>> }
>> goto cleanup;
>> }
>> @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
>> * is no thread that could be doing anything else with the same
>> domain
>> * object.
>> */
>> - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
>> + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>> QEMU_ASYNC_JOB_NONE, 0);
>> qemuDomainRemoveInactiveJobLocked(src->driver, obj);
>>
>
> When qemuProcessReconnectHelper was introduced (commit d38897a5d)
> reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that was
> changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED or
> VIR_DOMAIN_SHUTOFF_UNKNOWN. When QEMU_CAPS_NO_SHUTDOWN checking was
> removed in commit fe35b1ad6 the conditional state was just left at
> VIR_DOMAIN_SHUTOFF_CRASHED.
>
> Still I agree w/ Martin's thoughts, which unfortunately wasn't commented
> on in the code leading to the latter revert commit losing that unknown
> reason. The reason for the "-no-shutdown" *did* change and that probably
> should have been reflected in the qemuProcessReconnect logic.
>
> So I think perhaps we should have an initial patch which references or
> uses that first paragraph in order to change the code to:
>
> /* We cannot get the monitor back, so kill the VM to
> * remove possibility of it ending up running twice if
> * user tries to start it again later.
> *
> * If we cannot get to the monitor when the QEMU command
> * line used -no-shutdown, then we can safely say that the
> * domain crashed; otherwise we don't really know. */
> if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
> state = VIR_DOMAIN_SHUTOFF_CRASHED;
> else
> state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
>
> and uses the short commit msg of "qemu: Restore lost shutdown reason".
>
> Then this followup would change the "UNKNOWN" to "DAEMON" keeping the
> CRASHED otherwise.
>
Then we will get "crashed" reason always for modern qemu and domains
that can reboot which is not true.
However I see now that the issue is more complicated then I thought. I agree
with Martin too but we have to additionally check that connect(2) for domain
monitor socket was called and error was ECONNREFUSED. So finally state is
resolved like this:
if (connect(2) was not called) {
state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
else if (connect(2) result is NOT ECONNREFUSED) {
state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
} else if (connect(2) result is ECONNREFUSED) {
if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
state = VIR_DOMAIN_SHUTOFF_CRASHED;
else
state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
} else {
state = VIR_DOMAIN_SHUTOFF_DAEMON;
}
Moreover we should not later kill process by remembered pid in all cases except
last one in the above code because we can kill some other process.
Looks like a bit complicated, not sure should we pursuit this one or just left
VIR_DOMAIN_SHUTOFF_UNKNOWN only (in this case we can introduce
VIR_DOMAIN_SHUTOFF_DAEMON for reboot failures where reason is obvisous).
Nikolay
>
>> diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c
>> index 3a26363..f0ad558 100644
>> --- a/tools/virsh-domain-monitor.c
>> +++ b/tools/virsh-domain-monitor.c
>> @@ -212,7 +212,8 @@ VIR_ENUM_IMPL(virshDomainShutoffReason,
>> N_("migrated"),
>> N_("saved"),
>> N_("failed"),
>> - N_("from snapshot"))
>> + N_("from snapshot"),
>> + N_("daemon"))
>>
>> VIR_ENUM_DECL(virshDomainCrashedReason)
>> VIR_ENUM_IMPL(virshDomainCrashedReason,
>>
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list