Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter
On 2017-01-13 10:11, Rasmus Villemoes wrote: > On 2017-01-11 12:02, Guenter Roeck wrote: >> This will require input from others on the semantics. > > I agree, it would be nice to have others chime in on whether they'd even > find this useful, and what semantics they'd like. Ping. -- Rasmus Villemoes Software Developer Prevas A/S Hedeager 1 DK-8200 Aarhus N +45 51210274 rasmus.villem...@prevas.dk www.prevas.dk -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter
On 2017-01-11 12:02, Guenter Roeck wrote: > On 01/11/2017 12:11 AM, Rasmus Villemoes wrote: >> On 2017-01-10 19:08, Guenter Roeck wrote: >>> On Mon, Jan 09, 2017 at 04:02:32PM +0100, Rasmus Villemoes wrote: +static unsigned open_timeout; +module_param(open_timeout, uint, 0644); + +static bool watchdog_past_open_deadline(struct watchdog_core_data *data) +{ +if (!open_timeout) +return false; +return time_is_before_jiffies(data->open_deadline); >>> >>> Doesn't this return true if the time is _before_ the open deadline ? >>> Should it be time_is_after_jiffies() ? >> >> Yes, time_is_before_jiffies(foo) means foo < jiffies, and that is what >> we want when we're querying whether we've passed the deadline. >> > So you want the function to return true if the timeout has _not_ expired ? No, I want it to return true if the timeout has expired, just as its name hopefully suggests ("are we now past the deadline for opening"). time_is_before_jiffies(foo) expands to time_after(jiffies, foo), which in turn expands to (aside from some type checking) (long)((foo) - (jiffies)) < 0 which is true precisely when jiffies > foo, i.e., when the current time is later than foo. [Yes, just as every other user of the time_* helpers this only works if the times being compared are within LONG_MAX jiffies of each other.] > > [ Can you try to work with line wraps ? ] Sorry about that, I hope I've now managed to make Thunderbird not mess it up. > Uuh, no. I didn't realize that you apply that case to the "userspace app > gracefully > closes the watchdog device" case as well. This is clearly a separate use > case. > > Looks like there are now three use cases for 'open timeout'. > - time after boot > - timer after loading the watchdog device > - time after closing watchdog device and before reopening it > > I would have thought the first use case is the important one, and the > other ones are, > at best, secondary. Given that, we are clearly not there yet. This will > require input > from others on the semantics. I agree, it would be nice to have others chime in on whether they'd even find this useful, and what semantics they'd like. But for the record, for us, both the "time after boot" and "time after closing" are important use cases. In practice, approximating boot time with time of first registration effectively just pushes the deadline a little further out, which I think is acceptable (boot time anyway means "when timekeeping began", and even when the deadline is passed, it takes up to whatever the hardware is configured to before the board actually resets, so there's already some slack involved). If one really wants to measure with respect to boot time, I think one can just make set_open_deadline take an additional "from" parameter, passing INITIAL_JIFFIES from watchdog_cdev_register and jiffies from watchdog_release. (That won't work for the hotplug case, but please ignore that; it was a bad example, and certainly not a use case we actually have in mind). Thanks, Rasmus -- Rasmus Villemoes Software Developer Prevas A/S Hedeager 1 DK-8200 Aarhus N +45 51210274 rasmus.villem...@prevas.dk www.prevas.dk -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter
On 01/11/2017 12:11 AM, Rasmus Villemoes wrote: On 2017-01-10 19:08, Guenter Roeck wrote: On Mon, Jan 09, 2017 at 04:02:32PM +0100, Rasmus Villemoes wrote: +static unsigned open_timeout; +module_param(open_timeout, uint, 0644); + +static bool watchdog_past_open_deadline(struct watchdog_core_data *data) +{ +if (!open_timeout) +return false; +return time_is_before_jiffies(data->open_deadline); Doesn't this return true if the time is _before_ the open deadline ? Should it be time_is_after_jiffies() ? Yes, time_is_before_jiffies(foo) means foo < jiffies, and that is what we want when we're querying whether we've passed the deadline. So you want the function to return true if the timeout has _not_ expired ? +} + +static void watchdog_set_open_deadline(struct watchdog_core_data *data) +{ +data->open_deadline = jiffies + msecs_to_jiffies(open_timeout); The open deadline as defined applies to the time after the device was instantiated, not to the time since boot. Would it be better to make it "time since boot" ? I don't have a strong opinion on that, but two small things made me do it this way: (1) In case a hardware watchdog is somehow hotplugged long after boot and userspace is supposed to detect that and start feeding it, it wouldn't make sense for the framework not to take care of it for a while. (2) The open_timeout also applies to the case where the userspace app gracefully closes the watchdog device (e.g. because it's been instructed to restart to load a new configuration or whatnot) but the hardware cannot be stopped. In that case, the framework also takes over, and the same logic as after boot should apply - if the app fails to come up again, the framework should not feed the dog indefinitely, but OTOH it clearly doesn't make sense to have a boot-time based deadline. [ Can you try to work with line wraps ? ] Uuh, no. I didn't realize that you apply that case to the "userspace app gracefully closes the watchdog device" case as well. This is clearly a separate use case. Looks like there are now three use cases for 'open timeout'. - time after boot - timer after loading the watchdog device - time after closing watchdog device and before reopening it I would have thought the first use case is the important one, and the other ones are, at best, secondary. Given that, we are clearly not there yet. This will require input from others on the semantics. Thanks, Guenter Also, are you sure about using milli-seconds (instead of seconds) ? I can not really imagine a situation where this would be needed (especially and even more so in the context of using "time after instantiating"). I went back and forth on this. I did milli-seconds because that should cover more use cases. Yes, wanting an open timeout of .7 seconds with 1.0 seconds being unacceptable is unusual, but I know of some customers with very strict requirements. Also, even if one cannot make userspace start that fast, one can boot with a somewhat generous open_timeout and then write 700 to /sys/module/watchdog/parameters/open_timeout for use in case (2) above. Thanks, Rasmus -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter
On 2017-01-10 19:08, Guenter Roeck wrote: On Mon, Jan 09, 2017 at 04:02:32PM +0100, Rasmus Villemoes wrote: +static unsigned open_timeout; +module_param(open_timeout, uint, 0644); + +static bool watchdog_past_open_deadline(struct watchdog_core_data *data) +{ + if (!open_timeout) + return false; + return time_is_before_jiffies(data->open_deadline); Doesn't this return true if the time is _before_ the open deadline ? Should it be time_is_after_jiffies() ? Yes, time_is_before_jiffies(foo) means foo < jiffies, and that is what we want when we're querying whether we've passed the deadline. +} + +static void watchdog_set_open_deadline(struct watchdog_core_data *data) +{ + data->open_deadline = jiffies + msecs_to_jiffies(open_timeout); The open deadline as defined applies to the time after the device was instantiated, not to the time since boot. Would it be better to make it "time since boot" ? I don't have a strong opinion on that, but two small things made me do it this way: (1) In case a hardware watchdog is somehow hotplugged long after boot and userspace is supposed to detect that and start feeding it, it wouldn't make sense for the framework not to take care of it for a while. (2) The open_timeout also applies to the case where the userspace app gracefully closes the watchdog device (e.g. because it's been instructed to restart to load a new configuration or whatnot) but the hardware cannot be stopped. In that case, the framework also takes over, and the same logic as after boot should apply - if the app fails to come up again, the framework should not feed the dog indefinitely, but OTOH it clearly doesn't make sense to have a boot-time based deadline. Also, are you sure about using milli-seconds (instead of seconds) ? I can not really imagine a situation where this would be needed (especially and even more so in the context of using "time after instantiating"). I went back and forth on this. I did milli-seconds because that should cover more use cases. Yes, wanting an open timeout of .7 seconds with 1.0 seconds being unacceptable is unusual, but I know of some customers with very strict requirements. Also, even if one cannot make userspace start that fast, one can boot with a somewhat generous open_timeout and then write 700 to /sys/module/watchdog/parameters/open_timeout for use in case (2) above. Thanks, Rasmus -- Rasmus Villemoes Software Developer Prevas A/S Hedeager 1 DK-8200 Aarhus N +45 51210274 rasmus.villem...@prevas.dk www.prevas.dk -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter
On Mon, Jan 09, 2017 at 04:02:32PM +0100, Rasmus Villemoes wrote: > The watchdog framework takes care of feeding a hardware watchdog until > userspace opens /dev/watchdogN. If that never happens for some reason > (buggy init script, corrupt root filesystem or whatnot) but the kernel > itself is fine, the machine stays up indefinitely. This patch allows > setting an upper limit for how long the kernel will take care of the > watchdog, thus ensuring that the watchdog will eventually reset the > machine. > > This is particularly useful for embedded devices where some fallback > logic is implemented in the bootloader (e.g., use a different root > partition, boot from network, ...). > > The open timeout is also used as a maximum time for an application to > re-open /dev/watchdogN after closing it. > > A value of 0 (the default) means infinite timeout, preserving the > current behaviour. > > Signed-off-by: Rasmus Villemoes > --- > Documentation/watchdog/watchdog-parameters.txt | 9 + > drivers/watchdog/watchdog_dev.c| 26 > +- > 2 files changed, 34 insertions(+), 1 deletion(-) > > diff --git a/Documentation/watchdog/watchdog-parameters.txt > b/Documentation/watchdog/watchdog-parameters.txt > index e21850e..2ae0fdf 100644 > --- a/Documentation/watchdog/watchdog-parameters.txt > +++ b/Documentation/watchdog/watchdog-parameters.txt > @@ -8,6 +8,15 @@ See Documentation/admin-guide/kernel-parameters.rst for > information on > providing kernel parameters for builtin drivers versus loadable > modules. > > +The watchdog core currently understands one parameter, > +watchdog.open_timeout. This is the maximum time, in milliseconds, for > +which the watchdog framework will take care of pinging a hardware > +watchdog until userspace opens the corresponding /dev/watchdogN > +device. A value of 0 (the default) means an infinite timeout. Setting > +this to a non-zero value can be useful to ensure that either userspace > +comes up properly, or the board gets reset and allows fallback logic > +in the bootloader to try something else. > + > > - > acquirewdt: > diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c > index 21809e4..8bc9f24 100644 > --- a/drivers/watchdog/watchdog_dev.c > +++ b/drivers/watchdog/watchdog_dev.c > @@ -66,6 +66,7 @@ struct watchdog_core_data { > struct mutex lock; > unsigned long last_keepalive; > unsigned long last_hw_keepalive; > + unsigned long open_deadline; > struct delayed_work work; > unsigned long status; /* Internal status bits */ > #define _WDOG_DEV_OPEN 0 /* Opened ? */ > @@ -80,6 +81,21 @@ static struct watchdog_core_data *old_wd_data; > > static struct workqueue_struct *watchdog_wq; > > +static unsigned open_timeout; > +module_param(open_timeout, uint, 0644); > + > +static bool watchdog_past_open_deadline(struct watchdog_core_data *data) > +{ > + if (!open_timeout) > + return false; > + return time_is_before_jiffies(data->open_deadline); Doesn't this return true if the time is _before_ the open deadline ? Should it be time_is_after_jiffies() ? > +} > + > +static void watchdog_set_open_deadline(struct watchdog_core_data *data) > +{ > + data->open_deadline = jiffies + msecs_to_jiffies(open_timeout); The open deadline as defined applies to the time after the device was instantiated, not to the time since boot. Would it be better to make it "time since boot" ? Also, are you sure about using milli-seconds (instead of seconds) ? I can not really imagine a situation where this would be needed (especially and even more so in the context of using "time after instantiating"). Thanks, Guenter > +} > + > static inline bool watchdog_need_worker(struct watchdog_device *wdd) > { > /* All variables in milli-seconds */ > @@ -196,7 +212,13 @@ static bool watchdog_worker_should_ping(struct > watchdog_core_data *wd_data) > { > struct watchdog_device *wdd = wd_data->wdd; > > - return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd)); > + if (!wdd) > + return false; > + > + if (watchdog_active(wdd)) > + return true; > + > + return watchdog_hw_running(wdd) && > !watchdog_past_open_deadline(wd_data); > } > > static void watchdog_ping_work(struct work_struct *work) > @@ -857,6 +879,7 @@ static int watchdog_release(struct inode *inode, struct > file *file) > watchdog_ping(wdd); > } > > + watchdog_set_open_deadline(wd_data); > watchdog_update_worker(wdd); > > /* make sure that /dev/watchdog can be re-opened */ > @@ -955,6 +978,7 @@ static int watchdog_cdev_register(struct watchdog_device > *wdd, dev_t devno) > > /* Record time of most recent heartbeat as 'just before now'. */ > wd_data->last_hw_keepalive = jiffies - 1; > + watchdog_set_open_deadline(wd_d