Hey, sorry for the delay on following up on this. I picked it up
again. Hopefully we can finalise quickly.

> On Dec 5, 2019, at 2:22 PM, Felipe Franciosi <fel...@nutanix.com> wrote:
> 
> Heya,
> 
>> On Nov 26, 2019, at 12:18 PM, Marc-André Lureau <marcandre.lur...@gmail.com> 
>> wrote:
>> 
>> On Mon, Nov 25, 2019 at 8:14 PM Felipe Franciosi <fel...@nutanix.com> wrote:
>>> 
>>> This introduces a self-fence mechanism to Qemu, causing it to die if a
>>> heartbeat condition is not met. Currently, a file-based heartbeat is
>>> available and can be configured as follows:
>>> 
>>> -object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill
>>> 
>>> Qemu will watch 'file' for attribute changes. Touching the file works as
>>> a heartbeat. This parameter is mandatory.
>>> 
>>> Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
>>> heartbeat. At least one of these must be specified. Both may be used.
>>> 
>>> When using 'qtimeout', an internal Qemu timer is used. Fencing with this
>>> method gives Qemu a chance to write a log message indicating which file
>>> caused the event. If Qemu's main loop is hung for whatever reason, this
>>> method won't successfully kill Qemu.
>>> 
>>> When using 'ktimeout', a kernel timer is used. In this case, 'signal'
>>> can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
>>> SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
>>> (eg. uninterruptable sleep), this method won't successfully kill Qemu.
>>> 
>>> It is worth noting that even successfully killing Qemu may not be
>>> sufficient to completely fence a VM as certain operations like network
>>> packets or block commands may be pending in the kernel. If that is a
>>> concern, systems should consider using further fencing mechanisms like
>>> hardware watchdogs either in addition or in conjunction with this for
>>> additional protection.
>>> 
>>> Signed-off-by: Felipe Franciosi <fel...@nutanix.com>
>>> ---
>>> Based-on: <20191125153619.39893-2-fel...@nutanix.com>
>>> 
>>> Makefile.objs       |   1 +
>>> fence/Makefile.objs |   1 +
>>> fence/file_fence.c  | 381 ++++++++++++++++++++++++++++++++++++++++++++
>> 
>> I think it could be under backends/
> 
> I thought about it and couldn't make up my mind. My decision was based on:
> - Doesn't really feel like a "backend".
> - I envision other types of self-fencing heartbeats (eg. network-based),
>  in which case this would probably be split in a "common" file.
> 
> Arguably other objects in backends/ also fall within these categories,
> so I'm happy to move if you think it's better. Let me know.

I changed it to backends/ for v2.

> 
> 
>> And a slight preference for - seperated words in filenames over qemu 
>> codebase.
> 
> Sure, will change.

Done for v2.

> 
>> 
>>> qemu-options.hx     |  27 +++-
>>> 4 files changed, 409 insertions(+), 1 deletion(-)
>>> create mode 100644 fence/Makefile.objs
>>> create mode 100644 fence/file_fence.c
>>> 
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 11ba1a36bd..998eed4796 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
>>> 
>>> common-obj-y += backends/
>>> common-obj-y += chardev/
>>> +common-obj-y += fence/
>>> 
>>> common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
>>> qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
>>> diff --git a/fence/Makefile.objs b/fence/Makefile.objs
>>> new file mode 100644
>>> index 0000000000..2ed2092568
>>> --- /dev/null
>>> +++ b/fence/Makefile.objs
>>> @@ -0,0 +1 @@
>>> +common-obj-y += file_fence.o
>>> diff --git a/fence/file_fence.c b/fence/file_fence.c
>>> new file mode 100644
>>> index 0000000000..5b743e69d2
>>> --- /dev/null
>>> +++ b/fence/file_fence.c
>>> @@ -0,0 +1,381 @@
>>> +/*
>>> + * QEMU file-based self-fence mechanism
>>> + *
>>> + * Copyright (c) 2019 Nutanix Inc. All rights reserved.
>>> + *
>>> + * Authors:
>>> + *    Felipe Franciosi <fel...@nutanix.com>
>>> + *
>>> + * This library is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2 of the License, or (at your option) any later version.
>>> + *
>>> + * This library is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with this library; if not, see 
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gnu.org_licenses_&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=CCrJKVC5zGot8PrnI-iYe00MdX4pgdQfMRigp14Ptmk&m=edzXQ5P0GCmmzZ4-20uvsmgqIreV3BKYC8JYikgHVH4&s=q3GMprakVoRJw8zkbbjXPqAbExv93DzJ-HgI2z00408&e=
>>>  >.
>>> + *
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qapi/error.h"
>>> +#include "qom/object_interfaces.h"
>>> +#include "qemu/filemonitor.h"
>>> +#include "qemu/timer.h"
>>> +
>>> +#include <time.h>
>>> +
>>> +#define TYPE_FILE_FENCE "file-fence"
>>> +
>>> +typedef struct FileFence {
>>> +    Object parent_obj;
>>> +
>>> +    gchar *dir;
>>> +    gchar *file;
>>> +    uint32_t qtimeout;
>>> +    uint32_t ktimeout;
>>> +    int signal;
>>> +
>>> +    timer_t ktimer;
>>> +    QEMUTimer *qtimer;
>>> +
>>> +    QFileMonitor *fm;
>>> +    uint64_t id;
>>> +} FileFence;
>>> +
>>> +#define FILE_FENCE(obj) \
>>> +    OBJECT_CHECK(FileFence, (obj), TYPE_FILE_FENCE)
>>> +
>>> +static void
>>> +timer_update(FileFence *ff)
>>> +{
>>> +    struct itimerspec its = {
>>> +        .it_value.tv_sec = ff->ktimeout,
>>> +    };
>>> +    int err;
>>> +
>>> +    if (ff->qtimeout) {
>>> +        timer_mod(ff->qtimer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +                              ff->qtimeout * 1000);
>>> +    }
>>> +
>>> +    if (ff->ktimeout) {
>>> +        err = timer_settime(ff->ktimer, 0, &its, NULL);
>>> +        g_assert(err == 0);
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +file_fence_abort_cb(void *opaque)
>>> +{
>>> +    FileFence *ff = opaque;
>>> +    printf("Fencing after %u seconds on '%s'\n",
>>> +           ff->qtimeout, g_strconcat(ff->dir, "/", ff->file, NULL));
>> 
>> May be error_printf() instead.
> 
> Makes sense.

Done for v2.

> 
>> 
>>> +    abort();
>>> +}
>>> +
>>> +static void
>>> +file_fence_watch_cb(int64_t id, QFileMonitorEvent ev, const char *file,
>>> +                    void *opaque)
>>> +{
>>> +    FileFence *ff = opaque;
>>> +
>>> +    if (ev != QFILE_MONITOR_EVENT_ATTRIBUTES) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (g_strcmp0(file, ff->file) != 0) {
>> 
>> Does it actually happen? Apparently the code in
>> util/filemonitor-inotify.c already checks for g_str_equal(filename)
> 
> You are right, it does. I think I saw it, but for some reason decided
> to be over protective. I will remove this check.

I replaced it with an assert for v2. It's fine to rely on the other
component, but it's easy to see how someone can change the behaviour
there and miss some of its users. Asserts are helpful in that sense as
it can catch these programming errors (including changes to other
parts of the codebase we are making assumptions about).

> 
>> 
>>> +        return;
>>> +    }
>>> +
>>> +    timer_update(ff);
>>> +}
>>> +
>>> +static void
>>> +ktimer_tear(FileFence *ff)
>>> +{
>>> +    int err;
>>> +
>>> +    if (ff->ktimer) {
>>> +        err = timer_delete(ff->ktimer);
>>> +        g_assert(err == 0);
>>> +        ff->ktimer = NULL;
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +ktimer_setup(FileFence *ff, Error **errp)
>>> +{
>>> +    int err;
>>> +
>>> +    struct sigevent sev = {
>>> +        .sigev_notify = SIGEV_SIGNAL,
>>> +        .sigev_signo = ff->signal ? ff->signal : SIGKILL,
>>> +    };
>>> +
>>> +    if (ff->ktimeout == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    err = timer_create(CLOCK_MONOTONIC, &sev, &ff->ktimer);
>>> +    if (err == -1) {
>>> +        error_setg(errp, "Error creating kernel timer: %m");
>>> +        return;
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +qtimer_tear(FileFence *ff)
>>> +{
>>> +    if (ff->qtimer) {
>>> +        timer_del(ff->qtimer);
>>> +        timer_free(ff->qtimer);
>>> +    }
>>> +    ff->qtimer = NULL;
>>> +}
>>> +
>>> +static void
>>> +qtimer_setup(FileFence *ff, Error **errp)
>>> +{
>>> +    QEMUTimer *qtimer;
>>> +
>>> +    if (ff->qtimeout == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    qtimer = timer_new_ms(QEMU_CLOCK_REALTIME, file_fence_abort_cb, ff);
>>> +    if (qtimer == NULL) {
>>> +        error_setg(errp, "Error creating Qemu timer");
>>> +        return;
>>> +    }
>>> +
>>> +    ff->qtimer = qtimer;
>>> +}
>>> +
>>> +static void
>>> +watch_tear(FileFence *ff)
>>> +{
>>> +    if (ff->fm) {
>>> +        qemu_file_monitor_remove_watch(ff->fm, ff->dir, ff->id);
>>> +        qemu_file_monitor_free(ff->fm);
>>> +        ff->fm = NULL;
>>> +        ff->id = 0;
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +watch_setup(FileFence *ff, Error **errp)
>>> +{
>>> +    QFileMonitor *fm;
>>> +    int64_t id;
>>> +
>>> +    fm = qemu_file_monitor_new(errp);
>>> +    if (!fm) {
>>> +        return;
>>> +    }
>>> +
>>> +    id = qemu_file_monitor_add_watch(fm, ff->dir, ff->file,
>>> +                                     file_fence_watch_cb, ff, errp);
>>> +    if (id < 0) {
>>> +        qemu_file_monitor_free(fm);
>>> +        return;
>>> +    }
>>> +
>>> +    ff->fm = fm;
>>> +    ff->id = id;
>>> +}
>>> +
>>> +static void
>>> +file_fence_complete(UserCreatable *obj, Error **errp)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +
>>> +    if (ff->dir == NULL) {
>>> +        error_setg(errp, "A 'file' must be set");
>>> +        return;
>>> +    }
>>> +
>>> +    if (ff->signal != 0 && ff->ktimeout == 0) {
>>> +        error_setg(errp, "Using 'signal' requires 'ktimeout' to be set");
>>> +        return;
>>> +    }
>>> +
>>> +    if (ff->ktimeout == 0 && ff->qtimeout == 0) {
>>> +        error_setg(errp, "One or both of 'ktimeout' or 'qtimeout' must be 
>>> set");
>>> +        return;
>>> +    }
>>> +
>>> +    if (ff->qtimeout >= ff->ktimeout) {
>>> +        error_setg(errp, "Using 'qtimeout' >= 'ktimeout' doesn't make 
>>> sense");
>>> +        return;
>>> +    }
>>> +
>>> +    watch_setup(ff, errp);
>>> +    if (*errp != NULL) {
>>> +        return;
>>> +    }
>>> +
>>> +    qtimer_setup(ff, errp);
>>> +    if (*errp != NULL) {
>>> +        goto err_watch;
>>> +    }
>>> +
>>> +    ktimer_setup(ff, errp);
>>> +    if (*errp != NULL) {
>>> +        goto err_qtimer;
>>> +    }
>> 
>> 
>> I would rather return success on the setup functions and write:
>> 
>> if (!watch_setup(ff, errp) ||
>>     !qtimer_setup(...) ||
>>     !...) {
>>     return; (no cleanup)
>>  }
>> 
>> Object creation will fail, and finalize function will be called.

Makes sense. Done for v2.

>> 
>>> +
>>> +    timer_update(ff);
>>> +
>>> +    return;
>>> +
>>> +err_qtimer:
>>> +    qtimer_tear(ff);
>>> +err_watch:
>>> +    watch_tear(ff);
>>> +}
>>> +
>>> +static void
>>> +file_fence_set_signal(Object *obj, const char *value, Error **errp)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +    gchar *gsig;
>>> +
>>> +    if (ff->signal) {
>>> +        error_setg(errp, "Signal property already set");
>>> +        return;
>>> +    }
>>> +
>>> +    gsig = g_ascii_strup(value, -1);
>>> +
>>> +    if (g_strcmp0(gsig, "QUIT") == 0) {
>>> +        ff->signal = SIGQUIT;
>>> +    } else
>>> +    if (g_strcmp0(gsig, "KILL") == 0) {
>>> +        ff->signal = SIGKILL;
>>> +    } else {
>>> +        error_setg(errp, "Invalid signal. Must be 'quit' or 'kill'");
>>> +    }
>> 
>> Or bail out early for NULL case and use g_ascii_strcasecmp()
> 
> Sounds better. Let me look into it.

I followed your suggestion but had to add a goto. I have the feeling
you don't like them, so let me know what you think. :)

> 
>> 
>>> +
>>> +    g_free(gsig);
>>> +}
>>> +
>>> +static char *
>>> +file_fence_get_signal(Object *obj, Error **errp)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +
>>> +    switch (ff->signal) {
>>> +    case SIGKILL:
>>> +        return g_strdup("kill");
>>> +    case SIGQUIT:
>>> +        return g_strdup("quit");
>>> +    }
>>> +
>>> +    /* Unreachable */
>>> +    abort();
>>> +}
>>> +
>>> +static void
>>> +file_fence_set_file(Object *obj, const char *value, Error **errp)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +    gchar *dir, *file;
>> 
>> g_autofree
> 
> Sweet.

Done for v2.

> 
>> 
>>> +
>>> +    if (ff->dir) {
>>> +        error_setg(errp, "File property already set");
>>> +        return;
>>> +    }
>>> +
>>> +    dir = g_path_get_dirname(value);
>>> +    if (g_str_equal(dir, ".")) {
>>> +        error_setg(errp, "Path for file-fence must be absolute");
>>> +        return;
>>> +    }
>>> +
>>> +    file = g_path_get_basename(value);
>>> +    if (g_str_equal(file, ".")) {
>>> +        error_setg(errp, "Path for file-fence must be a file");
>>> +        g_free(dir);
>>> +        return;
>>> +    }
>>> +
>>> +    ff->dir = dir;
>>> +    ff->file = file;
>> 
>> g_steal_pointer()
> 
> Cool! Clearly I could spend more time in the glib manual. :)

Done for v2.

> 
>> 
>>> +}
>>> +
>>> +static char *
>>> +file_fence_get_file(Object *obj, Error **errp)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +
>>> +    if (ff->file) {
>>> +        return g_strconcat(ff->dir, "/", ff->file, NULL);
>> 
>> Or g_build_filename()
> 
> Makes sense.

Done for v2.

I would like to add that I'm following your guidance in how Qemu uses
glib, but personally I'm not a big fan of these features. I don't know
how well compilers can optimise this code, but these (specific)
functions are rather simple and I'd like to think we can correctly
cover error cases, eventually producing more optimal code than what I
can see being produced when using the memory (de)allocation helpers.

> 
>> 
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static void
>>> +file_fence_instance_finalize(Object *obj)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +
>>> +    ktimer_tear(ff);
>>> +    qtimer_tear(ff);
>>> +    watch_tear(ff);
>>> +
>>> +    g_free(ff->file);
>>> +    g_free(ff->dir);
>>> +}
>>> +
>>> +static void
>>> +file_fence_instance_init(Object *obj)
>>> +{
>>> +    FileFence *ff = FILE_FENCE(obj);
>>> +
>>> +    object_property_add_str(obj, "file",
>>> +                            file_fence_get_file,
>>> +                            file_fence_set_file,
>>> +                            &error_abort);
>>> +    object_property_add_str(obj, "signal",
>>> +                            file_fence_get_signal,
>>> +                            file_fence_set_signal,
>>> +                            &error_abort);
>>> +    object_property_add_uint32_ptr(obj, "qtimeout", &ff->qtimeout,
>>> +                                   false, &error_abort);
>>> +    object_property_add_uint32_ptr(obj, "ktimeout", &ff->ktimeout,
>>> +                                   false, &error_abort);
>> 
>> Make them class properties (this will help with -object
>> file-fence,help and such). (fwiw, I have pending patches to support
>> class property description & default values..)
> 
> I tried to fit some of these in the class, as well as justify a split
> of the file-based fencing with a more generic self-fencer right off
> the bat. But it didn't make sense in the end. I envisioned scenarios
> where you may have different heartbeats for one Qemu with different
> timeouts. In that case, it wouldn't work as a class property, right?

I didn't hear back from you on this so I didn't change it. Let me know
your thoughts.

> 
>> 
>>> +}
>>> +
>>> +static void
>>> +file_fence_class_init(ObjectClass *klass, void *class_data)
>>> +{
>>> +    UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
>>> +    ucc->complete = file_fence_complete;
>>> +}
>>> +
>>> +static const TypeInfo file_fence_info = {
>>> +    .name = TYPE_FILE_FENCE,
>>> +    .parent = TYPE_OBJECT,
>>> +    .class_init = file_fence_class_init,
>>> +    .instance_size = sizeof(FileFence),
>>> +    .instance_init = file_fence_instance_init,
>>> +    .instance_finalize = file_fence_instance_finalize,
>>> +    .interfaces = (InterfaceInfo[]) {
>>> +        { TYPE_USER_CREATABLE },
>>> +        { }
>>> +    }
>>> +};
>>> +
>>> +static void
>>> +register_types(void)
>>> +{
>>> +    type_register_static(&file_fence_info);
>>> +}
>>> +
>>> +type_init(register_types);
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index 65c9473b73..995d3d6abf 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -4929,8 +4929,33 @@ CN=laptop.example.com,O=Example 
>>> Home,L=London,ST=London,C=GB
>>> 
>>> @end table
>>> 
>>> -ETEXI
>>> +@item -object 
>>> file-fence,id=@var{id},file=@var{file},qtimeout=@var{qtimeout},ktimeout=@var{ktimeout},signal=@{signal}
>>> +
>>> +Self-fence Qemu if @var{file} is not modified within a given timeout.
>>> +
>>> +Qemu will watch @var{file} for attribute changes. Touching the file works 
>>> as a
>>> +heartbeat. This parameter is mandatory.
>>> +
>>> +Fencing happens after @var{qtimeout} or @var{ktimeout} seconds elapse
>>> +without a heartbeat. At least one of these must be specified. Both may be 
>>> used.
>>> 
>>> +When using @var{qtimeout}, an internal Qemu timer is used. Fencing with
>>> +this method gives Qemu a chance to write a log message indicating which 
>>> file
>>> +caused the event. If Qemu's main loop is hung for whatever reason, this 
>>> method
>>> +won't successfully kill Qemu.
>>> +
>>> +When using @var{ktimeout}, a kernel timer is used. In this case, 
>>> @var{signal}
>>> +can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using 
>>> SIGQUIT may
>>> +be preferred for obtaining core dumps. If Qemu is hung (eg. uninterruptable
>>> +sleep), this method won't successfully kill Qemu.
>>> +
>>> +It is worth noting that even successfully killing Qemu may not be 
>>> sufficient to
>>> +completely fence a VM as certain operations like network packets or block
>>> +commands may be pending in the kernel. If that is a concern, systems should
>>> +consider using further fencing mechanisms like hardware watchdogs either in
>>> +addition or in conjunction with this feature for additional protection.
>>> +
>>> +ETEXI
>>> 
>>> HXCOMM This is the last statement. Insert new options before this line!
>>> STEXI
>>> --
>>> 2.20.1
>>> 
>> 
>> Code looks fine to me otherwise
> 
> Let me work on a v2 next week.

"next week" is finally here. :)

F.

> 
> F.
> 
>> 
>> -- 
>> Marc-André Lureau
> 

Reply via email to