Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

Petr Mladek Wed, 09 Dec 2020 08:38:44 -0800

On Thu 2020-11-26 01:30:26, Paul Gortmaker wrote:
> The existing clear_warn_once functionality is currently a manually
> issued state reset via the file /sys/kernel/debug/clear_warn_once when
> debugfs is mounted.  The idea being that a developer would be running
> some tests, like LTP or similar, and want to check reproducibility
> without having to reboot.
> 
> But you currently can't make use of clear_warn_once unless you've got
> debugfs enabled and mounted - which may not be desired by some people
> in some deployment situations.
> 
> The functionality added here allows for periodic resets in addition to
> the one-shot reset it already had.  Then we allow for a boot-time setting
> of the periodic resets so it can be used even when debugfs isn't mounted.
> 
> By having a periodic reset, we also open the door for having the various
> "once" functions act as long period ratelimited messages, where a sysadmin
> can pick an hour or a day reset if they are facing an issue and are
> wondering "did this just happen once, or am I only being informed once?"


OK, I though more about it and I NACK this patchset.

My reason:

1. The primary purpose was to provide a way to reset warn_once() without
   debugfs. From this POV, the solution is rather complicated: timers
   and another kernel parameter.

2. I am not aware of any convincing argument why debugfs could not be
   mounted on the debugged system.

3. Debugfs provides many more debugging facilities. It is designed for
   this purpose. It does not look like a good strategy to provide
   alternative interfaces just to avoid it.

4. There were mentioned several other use cases for this feature,
   like RT systems. But it was not clear that it was really needed
   or that people would really use it.

5. Some code might even rely on that it is called only once, see commit
   dfbf2897d00499f94cd ("bug: set warn variable before calling
   WARN()") or the recent
   https://lore.kernel.org/r/[email protected]

   It should better stay as debugging feature that should be used with
   care.


6. It creates system wide ratelimited printk().

   We have printk_ratelimited() for this. And it is quite problematic.
   It is supposed to prevent flood of printk() messages. But it does
   not work well because the limits depend on more factors, like:
   system size, conditions, console speed.

   Yes, the proposed feature is supposed to solve another problem
   (lack of messages). But it is a global action that would
    re-enable >1000 messages that were limited to be printed
    only once because they could be too frequent. As a result:

        + it might cause flood of printk() messages

        + it is hard to define a good system wide time limit;
          it was even unclear what should be the lower limit.

        + it will restart the messages at some "random" point,
          so that the relation of the reported events would
          be unclear.

  From the API point of view:

        + printk_ratelimited() is used when we want to see that a
          problem is still there. It is per-message setting.

        + printk_once() is used when even printk_ratelimited() would
          be too much. It is per-message setting.

        + The new printk_repeated_once() is a strange mix of this two
          with the global setting. It does not fit much.


Best Regards,
Petr

PS: I did not answer your last mail because it looked like an endless
    fight over words or point of views. I decided to make a summary
    of my view instead. These are reason why I nacked it.

    I know that there might be different views but so far no arguments
    changed mine. And I do not know how to explain it better.

Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

Reply via email to