Re: [whimsy.git] [1/1] Commit 9768038: warnings should trump info

Sam Ruby Sun, 31 Jan 2016 08:09:05 -0800

On Sun, Jan 31, 2016 at 10:13 AM, sebb <seb...@gmail.com> wrote:
> On 31 January 2016 at 14:09, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Sun, Jan 31, 2016 at 8:29 AM, sebb <seb...@gmail.com> wrote:
>>> On 31 January 2016 at 13:20, Sam Ruby <ru...@intertwingly.net> wrote:
>>>> On Sun, Jan 31, 2016 at 7:56 AM, sebb <seb...@gmail.com> wrote:
>>>>> On 31 January 2016 at 12:24, Sam Ruby <ru...@intertwingly.net> wrote:
>>>>>> On Sun, Jan 31, 2016 at 6:45 AM, sebb <seb...@gmail.com> wrote:
>>>>>>> On 30 January 2016 at 00:17, Sam Ruby <ru...@apache.org> wrote:
>>>>>>>> Commit 97680383779cd485498d6bb8fa8288af49cc2145:
>>>>>>>>     warnings should trump info
>>>>>>>>     also implement parsing todo
>>>>>>>>
>>>>>>>>
>>>>>>>> Branch: refs/heads/master
>>>>>>>> Author: Sam Ruby <ru...@intertwingly.net>
>>>>>>>> Committer: Sam Ruby <ru...@intertwingly.net>
>>>>>>>> Pusher: rubys <ru...@apache.org>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>> www/status/monitors/public_json.rb                           | ++++++ 
>>>>>>>> ---
>>>>>>>> ------------------------------------------------------------
>>>>>>>> 16 changes: 11 additions, 5 deletions.
>>>>>>>> ------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> diff --git a/www/status/monitors/public_json.rb 
>>>>>>>> b/www/status/monitors/public_json.rb
>>>>>>>> index 39eb31b..7ff71b1 100644
>>>>>>>> --- a/www/status/monitors/public_json.rb
>>>>>>>> +++ b/www/status/monitors/public_json.rb
>>>>>>>> @@ -21,16 +21,22 @@ def Monitor.public_json(previous_status)
>>>>>>>>        # Ignore Wunderbar logging for normal messages (may occur 
>>>>>>>> multiple times)
>>>>>>>>        contents.gsub! /^(_INFO|_DEBUG) .*\n+/, ''
>>>>>>>>
>>>>>>>> -      # Wunderbar warning (TODO - extract the text and return it?)
>>>>>>>> -      if contents.gsub! /^_WARN .*?\n+/, ''
>>>>>>>> -        status[name].merge! level: 'warning', title: 'warning'
>>>>>>>> -      end
>>>>>>>> -
>>>>>>>>        # diff -u output:
>>>>>>>>        if contents.sub! /^--- .*?\n(\n|\Z)/m, ''
>>>>>>>>          status[name].merge! level: 'info', title: 'updated'
>>>>>>>>        end
>>>>>>>>
>>>>>>>> +      # Wunderbar warning
>>>>>>>> +      warnings = contents.scan(/^_WARN (.*?)\n+/)
>>>>>>>> +      if warnings.length == 1
>>>>>>>> +        contents.sub! /^_WARN (.*?)\n+/, ''
>>>>>>>> +        status[name].merge! level: 'warning', data: $1
>>>>>>>> +      elsif warnings.length > 0
>>>>>>>> +        contents.gsub! /^_WARN (.*?)\n+/, ''
>>>>>>>> +        status[name].merge! level: 'warning', data: warnings.flatten,
>>>>>>>> +          title: "#{warnings.length} warnings"
>>>>>>>> +      end
>>>>>>>> +
>>>>>>>
>>>>>>> Is it intended that the title is omitted when there is only one _WARN 
>>>>>>> line?
>>>>>>> If not, the else clause would be sufficient.
>>>>>>
>>>>>> Normally titles aren't intended to set by monitors:
>>>>>>
>>>>>> https://github.com/apache/whimsy/tree/master/www/status#titles
>>>>>>
>>>>>> The intent of the above is to surface the individual warning should
>>>>>> there only be one.  Otherwise provide a count of the number of
>>>>>> warnings.
>>>>>>
>>>>>> If you note, the warnings that we are getting via email are fairly
>>>>>> informative (e.g., they tell you what LDAP server is down), instead of
>>>>>> simply saying "1 warnings".
>>>>>
>>>>> OK
>>>>>
>>>>>>> Also I see that the Ping My Box service is set up to treat warning
>>>>>>> level as a failure.
>>>>>>> I assume we cannot change that just for Whimsy monitoring.
>>>>>>
>>>>>> Actually, we can:
>>>>>>
>>>>>> https://github.com/apache/whimsy/tree/master/www/status#alerts
>>>>>> https://github.com/apache/whimsy/blob/master/www/status/index.cgi#L24
>>>>>>
>>>>>>> I don't think we want PMB 'Down' mails just for warnings, so perhaps
>>>>>>> we should convert the _WARN to informational level?
>>>>>>>
>>>>>>>>        unless contents.empty?
>>>>>>>>          status[name].merge! level: 'danger', data: 
>>>>>>>> contents.split("\n")
>>>>>>>>        end
>>>>>>
>>>>>> Let's take an individual LDAP server being unavailable as an example.
>>>>>>
>>>>>> The question is: given that we have LDAP failover, is one server being
>>>>>> down only informational or something that you actually want to warn
>>>>>> somebody about?  When a warning is issued, do you want the warning to
>>>>>> only show up in the status page should somebody happen to look for it,
>>>>>> or actually trigger an email?
>>>>>>
>>>>>> I took an initial stab at it (assuming warnings would only be used
>>>>>> when you actually wanted a human to see the message).  But this is
>>>>>> open for discussion.  Thoughts?
>>>>>
>>>>> I used WARN for LDAP failover because it's more than just
>>>>> informational, but not an error.
>>>>>
>>>>> The intention was that the WARN would be seen, but not to trigger a DOWN 
>>>>> alert.
>>>>> So for example if the same LDAP host keeps failing, we could remove it
>>>>> from the list.
>>>>
>>>> To me, you have just described an informational message.
>>>
>>> Huh?
>>>
>>> An informational message is one such as git_info.
>>> It does not matter if it is never noticed.
>>
>> I would consider that a success message.
>
> Perhaps.
>
>>> However the LDAP failure is potentially important, and needs to be
>>> kept beyond the lifetime of the log file.
>>> At present AFAICT info messages are lost once the new log file is created.
>>>
>>>>> Similarly for other WARN messages (not that there are any yet).
>>>>>
>>>>> I had hoped that the system would allow for different types of e-mails
>>>>> rather than just DOWN/UP but if that is not possible then it would be
>>>>> helpful to have the warnings listed somewhere which can be reviewed
>>>>> from time to time.
>>>>
>>>> I believe that a lot more is possible, but haven't dug in to find out.
>>>> Without knowing how we were going to ultimately use it, Daniel Gruno
>>>> enabled the simplest check possible so that we could explore what we
>>>> ultimately wanted.  Once we have a concrete proposal, perhaps more
>>>> checks can be added.
>>>>
>>>> Meanwhile, what we have is a binary option (pass/fail).  And the
>>>> ability to provide a text message on failure.
>>>>
>>>> Planning ahead, I split both pass and fail into two categories;
>>>> (success, info), (warn, danger).  The web page is designed to show the
>>>> highest level, so so if there is an info message mixed in with success
>>>>  messages, that's what will show up.  If there is a danger message
>>>> mixed in with a number of warnings, that's what will show up. With
>>>> this scheme, the difference between informational and warning being
>>>> informational is something you might want to be aware of, but don't
>>>> trigger an alert.
>>>>
>>>> Perhaps we need more levels if you really want something between info and 
>>>> warn?
>>>
>>> I don't think we need more levels.
>>>
>>> We just need to be able to treat warn messages differently from info and 
>>> error.
>>
>> We seem to have a semantic problem here.  Let's momentarily put aside
>> all labels as they seem to be distracting from the discussion.
>>
>> Looking at the "generate an alert and email" boolean: true or false.
>
> I think this may be where the issue is: I think it would be useful to
> generate two kinds of notifications:
> Alerts that need to be dealt with promptly.
> Alerts that need to be dealt with at some point.
>
>> The first question is: how many levels do we need for things that do
>> not generate an alert?  I will assert that this needs to be at least
>> two.
>
> I guess so.


At the moment, we have exactly two: success and info.

> But the question is how are these recorded?
> Are the ephemeral, until the next run, or do they last longer than that?

At the moment, they are entirely ephemeral, but that is entirely up to us.

>> The next question is: how many levels do we need for things that do
>> generate an alert?  Again, I will argue that this needs to be at least
>> two.
>>
>> Levels are useful in the case where there are multiple messages, with
>> some being more important than others.
>>
>> Once we figure out how many levels we need, we can figure out what the
>> names for the various levels should be.
>
> It's not a naming issue, it's what is done with the different types of
> notifications.
> I don't think the current alternatives are sufficient to properly
> handle the different types.

If we want the infrastructure team to do something different, we need
to figure out what we want to ask for.  And it would be incumbent on
us to make sure that the data that they need to perform that service
is easy to parse.  Anticipating that step, the status is already made
available in JSON form:

https://whimsy-test.apache.org/status/status.json

If we want to do something else (for example, retain status), then we
need to figure out what we want, and to implement it.

If we need more status levels than four (two good, two bad), then
there are plenty of names available for new status levels.  A quick
search of existing logging functions comes up with debug, trace, note,
and fatal.  Any or all of these could be added to the current four
levels.

- Sam Ruby

Re: [whimsy.git] [1/1] Commit 9768038: warnings should trump info

Reply via email to