Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
> Introduce an optional hardware maximum timeout in the watchdog core.
> The hardware maximum timeout can be lower than the maximum timeout.
> 
> Drivers can set the maximum hardware timeout value in the watchdog data
> structure. If the configured timeout exceeds the maximum hardware timeout,
> the watchdog core enables a timer function to assist sending keepalive
> requests to the watchdog driver.
> 
> Cc: Timo Kokkonen <timo.kokko...@offcode.fi>
> Cc: Uwe Kleine-König <u.kleine-koe...@pengutronix.de>
> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> ---
> v2:
> - Improved and hopefully clarified documentation.
> - Rearranged variables in struct watchdog_device such that internal variables
>   come last.
> - The code now ensures that the watchdog times out <timeout> seconds after
>   the most recent keepalive sent from user space.
> - The internal keepalive now stops silently and no longer generates a
>   warning message. Reason is that it will now stop early, while there
>   may still be a substantial amount of time for keepalives from user space
>   to arrive. If such keepalives arrive late (for example if user space
>   is configured to send keepalives just a few seconds before the watchdog
>   times out), the message would just be noise and not provide any value.
> ---
>  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
>  drivers/watchdog/watchdog_dev.c                | 140 
> ++++++++++++++++++++++---
>  include/linux/watchdog.h                       |  26 +++--
>  3 files changed, 163 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
> b/Documentation/watchdog/watchdog-kernel-api.txt
> index d8b0d3367706..25b00b878a7b 100644
> --- a/Documentation/watchdog/watchdog-kernel-api.txt
> +++ b/Documentation/watchdog/watchdog-kernel-api.txt
> @@ -53,9 +53,12 @@ struct watchdog_device {
>       unsigned int timeout;
>       unsigned int min_timeout;
>       unsigned int max_timeout;
> +     unsigned int max_hw_timeout_ms;
>       void *driver_data;
> -     struct mutex lock;
>       unsigned long status;
> +     struct mutex lock;
> +     unsigned long last_keepalive;
> +     struct delayed_work work;
>       struct list_head deferred;
>  };
>  
> @@ -73,18 +76,28 @@ It contains following fields:
>    additional information about the watchdog timer itself. (Like it's unique 
> name)
>  * ops: a pointer to the list of watchdog operations that the watchdog 
> supports.
>  * timeout: the watchdog timer's timeout value (in seconds).
> +  This is the time after which the system will reboot if user space does
> +  not send a heartbeat request if WDOG_ACTIVE is set.
>  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
>  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
> +* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
> +  from max_timeout. If set to a value larger than max_timeout, the
> +  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
> +  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
> +  space failed to send a heartbeat for at least 'timeout' seconds.
To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?

>  * bootstatus: status of the device after booting (reported with watchdog
>    WDIOF_* status bits).
>  * driver_data: a pointer to the drivers private data of a watchdog device.
>    This data should only be accessed via the watchdog_set_drvdata and
>    watchdog_get_drvdata routines.
> -* lock: Mutex for WatchDog Timer Driver Core internal use only.
>  * status: this field contains a number of status bits that give extra
>    information about the status of the device (Like: is the watchdog timer
>    running/active, is the nowayout bit set, is the device opened via
>    the /dev/watchdog interface or not, ...).
> +* lock: Mutex for WatchDog Timer Driver Core internal use only.
> +* last_keepalive: Time of most recent keepalive triggered from user space,
> +  in jiffies.
> +* work: Worker data structure for WatchDog Timer Driver Core internal use 
> only.
>  * deferred: entry in wtd_deferred_reg_list which is used to
>    register early initialized watchdogs.
>  
> @@ -160,7 +173,11 @@ they are supported. These optional routines/operations 
> are:
>    and -EIO for "could not write value to the watchdog". On success this
>    routine should set the timeout value of the watchdog_device to the
>    achieved timeout value (which may be different from the requested one
> -  because the watchdog does not necessarily has a 1 second resolution).
> +  because the watchdog does not necessarily have a 1 second resolution).
> +  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
> +  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
> +  timeout value of the watchdog_device either to the requested timeout value
> +  (if it is larger than hw_max_timeout_ms), or to the achieved timeout value.
>    (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
>    watchdog's info structure).
>  * get_timeleft: this routines returns the time that's left before a reset.
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 06171c73daf5..c04ba1a98cc8 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -37,7 +37,9 @@
>  #include <linux/errno.h>     /* For the -ENODEV/... values */
>  #include <linux/kernel.h>    /* For printk/panic/... */
>  #include <linux/fs.h>                /* For file operations */
> +#include <linux/jiffies.h>   /* For timeout functions */
>  #include <linux/watchdog.h>  /* For watchdog specific items */
> +#include <linux/workqueue.h> /* For workqueue */
>  #include <linux/miscdevice.h>        /* For handling misc devices */
>  #include <linux/init.h>              /* For __init/__exit/... */
>  #include <linux/uaccess.h>   /* For copy_to_user/put_user/... */
> @@ -49,6 +51,78 @@ static dev_t watchdog_devt;
>  /* the watchdog device behind /dev/watchdog */
>  static struct watchdog_device *old_wdd;
>  
> +static struct workqueue_struct *watchdog_wq;
> +
> +static inline bool watchdog_need_worker(struct watchdog_device *wdd)
> +{

Maybe add:
        /* all variables in milli seconds */

> +     unsigned int hm = wdd->max_hw_timeout_ms;
> +     unsigned int m = wdd->max_timeout * 1000;
> +     unsigned int t = wdd->timeout * 1000;
> +
> +     return watchdog_active(wdd) && hm && (!m || hm < m) && t > hm;

ok, this means:

        watchdog_active(wdd): Userspace thinks the hardware is running
        hm: the driver is aware of framework pinging
        !m: no artificial limit
        hm < m: some artificial limit (does this make sense to have
                max_timeout > DIV_ROUND_UP(max_hw_timeout_ms, 1000), or
                should we better enforce max_timeout = 0 for "good"
                drivers?)
        t > hm: userspace requests longer timeout than hardware can
                handle.

looks good.

> +}
> +
> +static unsigned long watchdog_next_keepalive(struct watchdog_device *wdd)
> +{
> +     unsigned int t = wdd->timeout * 1000;
> +     unsigned long to = wdd->last_keepalive + msecs_to_jiffies(t);
> +     unsigned long last, max_t, tj;
> +
> +     if (wdd->max_hw_timeout_ms && t > wdd->max_hw_timeout_ms)
> +             t = wdd->max_hw_timeout_ms;

watchdog_next_keepalive is only called after watchdog_need_worker
returned true, so there shouldn't be a need to repeat this test, right?

> +     tj = msecs_to_jiffies(t / 2);
> +
> +     /*
> +      * Ensure that the watchdog times out wdd->timeout seconds
> +      * after the most recent keepalive sent from user space.
> +      */
> +     last = to - msecs_to_jiffies(t);
> +     if (time_is_after_jiffies(last))
> +             max_t = last - jiffies;
> +     else
> +             max_t = 0;
> +
> +     if (max_t < tj)
> +             tj = max_t;
> +
> +     return tj;
I find this hard to understand. Maybe the variable names could be
improved, something like:

        t -> hw_timeout_ms
        tj -> keep_alive_interval
        to -> virt_timeout
        last -> last_hw_ping

And I'd do:

        /*
         * To ensure that the watchdog times out wdd->timeout seconds
         * after the most recent ping from userspace, the last
         * worker ping has to come in hw_timeout_ms before this timeout.
         */
        last_hw_ping = virt_timeout - msecs_to_jiffies(hw_timeout_ms);

        /*
         * Worker pings usually come at intervals of
         * max_hw_timeout_ms / 2. This interval is shortend if
         * virt_timeout is near (or already over).
         */
        return min_t(long, last_hw_ping - jiffies, keep_alive_interval);

then you have to change the return type to long and ...

> +}
> +
> +static inline void watchdog_update_worker(struct watchdog_device *wdd,
> +                                       bool cancel, bool sync)
> +{
> +     if (watchdog_need_worker(wdd)) {
> +             unsigned long t = watchdog_next_keepalive(wdd);
> +
> +             if (t)

... test for t > 0 here.

> +                     mod_delayed_work(watchdog_wq, &wdd->work, t);
> +     } else if (cancel) {
> +             if (sync)

Up to now sync is always false. Does this change later? Should this
parameter be introduced only later, too?

> +                     cancel_delayed_work_sync(&wdd->work);
> +             else
> +                     cancel_delayed_work(&wdd->work);
> +     }
> +}

This function looks artificial, i.e. I don't understand why you would
want to modify the timer or stop it. Probably that is because the timer
does more than increasing the virtual timeout later?

> +static int _watchdog_ping(struct watchdog_device *wdd)
> +{
> +     int err;
> +
> +     if (test_bit(WDOG_UNREGISTERED, &wdd->status))
> +             return -ENODEV;
> +
> +     if (!watchdog_active(wdd))
> +             return 0;
> +
> +     if (wdd->ops->ping)
> +             err = wdd->ops->ping(wdd);  /* ping the watchdog */
> +     else
> +             err = wdd->ops->start(wdd); /* restart watchdog */
> +
> +     return err;
> +}
> +
>  /*
>   *   watchdog_ping: ping the watchdog.
>   *   @wdd: the watchdog device to ping
> @@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;
>  
>  static int watchdog_ping(struct watchdog_device *wdd)
>  {
> -     int err = 0;
> +     int err;
>  
>       mutex_lock(&wdd->lock);
> +     err = _watchdog_ping(wdd);
> +     wdd->last_keepalive = jiffies;
I'd switch these two lines. Consider wdd->ops->ping takes some time then
the next ping should timed relative to before the ping, not after it.

> +     watchdog_update_worker(wdd, false, false);
> +     mutex_unlock(&wdd->lock);
>  
> -     if (test_bit(WDOG_UNREGISTERED, &wdd->status)) {
> -             err = -ENODEV;
> -             goto out_ping;
> -     }
> +     return err;
> +}
>  
> -     if (!watchdog_active(wdd))
> -             goto out_ping;
> +static void watchdog_ping_work(struct work_struct *work)
> +{
> +     struct watchdog_device *wdd;
>  
> -     if (wdd->ops->ping)
> -             err = wdd->ops->ping(wdd);      /* ping the watchdog */
> -     else
> -             err = wdd->ops->start(wdd);     /* restart watchdog */
> +     wdd = container_of(to_delayed_work(work), struct watchdog_device, work);
>  
> -out_ping:
> +     mutex_lock(&wdd->lock);
> +     _watchdog_ping(wdd);
> +     watchdog_update_worker(wdd, false, false);
>       mutex_unlock(&wdd->lock);
> -     return err;
>  }
>  
>  /*
> @@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
>               goto out_start;
>  
>       err = wdd->ops->start(wdd);
> -     if (err == 0)
> +     if (err == 0) {
>               set_bit(WDOG_ACTIVE, &wdd->status);
> +             wdd->last_keepalive = jiffies;
> +             watchdog_update_worker(wdd, false, false);
> +     }
>  
>  out_start:
>       mutex_unlock(&wdd->lock);
> @@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
>       }
>  
>       err = wdd->ops->stop(wdd);
> -     if (err == 0)
> +     if (err == 0) {
>               clear_bit(WDOG_ACTIVE, &wdd->status);
> +             watchdog_update_worker(wdd, true, false);

Regarding my concern about watchdog_update_worker above: After
clear_bit(WDOG_ACTIVE, ...) it's an unconditional cancel_delayed_work
and I'd prefer to not have that disguised. Maybe it would be easier to
understand if there were two different functions, one with the semantic
of watchdog_update_worker(cancel=true) and one with
watchdog_update_worker(cancel=false)?

It's to late here now to review the rest of your patch. Will follow-up
after sleeping.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to