Hi Yujun,

You started an interesting discussion. I think that the distinction between an 
operational error and a programmer error is correct and we should always keep 
that in mind.

I agree that having an overall design for error handling in Vitrage is a good 
idea, but I disagree that until then we should just let it crash.

I think that Vitrage is made out of many pieces that don’t necessarily depend 
on one another. For example, if one datasource fails, everything else can work 
as usual – so why crash? Similarly, if one template fails to load, all other 
templates can still be activated.
Another aspect is that the main purpose of Vitrage is to provide insights. In 
case of a failure in one datasource/template, some of the insights might be 
missing. But this will not lead to inaccurate behavior or to wrong actions 
being executed in the system. IMO, we should give the user as much information 
as possible given that we have only part of the input.
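
A minimal sketch of the isolation I have in mind, in Python. The function name 
`collect_all` and the idea of modelling each datasource as a plain callable are 
assumptions made for illustration, not actual Vitrage code. Each datasource is 
queried inside its own try block, so a failure is logged and recorded while the 
others continue:

```python
import logging

LOG = logging.getLogger(__name__)


def collect_all(datasources):
    """Query every datasource; one failure does not stop the others.

    `datasources` (hypothetical) maps a datasource name to a zero-argument
    callable that returns a list of entities.
    Returns a tuple (entities, failed_names).
    """
    entities = []
    failed = []
    for name, get_all in datasources.items():
        try:
            result = get_all()
        except Exception:
            # Isolation boundary: log with traceback and keep going.
            LOG.exception("Datasource %s failed; continuing without it", name)
            failed.append(name)
        else:
            entities.extend(result)
    return entities, failed
```

Returning the failed names keeps the missing input visible to the caller 
instead of leaving it only in the log.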

Regarding the use cases that you mentioned:


  1.  invalid configuration file
[Ifat] This should depend on the specific configuration. If keystone is 
misconfigured, nothing will work of course. But if for example Zabbix is 
misconfigured, Vitrage should work and show the topology and the non-Zabbix 
alarms.


  2.  failed to communicate with data source
[Ifat] I think that the error should be logged, and all other datasources 
should work as usual.


  3.  malformed data from data source
[Ifat] I think that the error should be logged, and all other datasources 
should work as usual. This problem means we must modify the code in the 
datasource itself, but until then Vitrage should work, right?


  4.  failed to execute an action
[Ifat] Again, that’s a problem that requires code changes; but why fail other 
actions?


  5.  ...
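
For the invalid configuration case, the split between fatal and non-fatal 
sections could look roughly like the sketch below. Everything here is an 
assumption for illustration: `CRITICAL_SECTIONS`, `FatalConfigError`, 
`check_config` and the validator callables are hypothetical names, not part of 
Vitrage or oslo.config.

```python
import logging

LOG = logging.getLogger(__name__)

# Assumption: sections without which nothing can work.
CRITICAL_SECTIONS = frozenset({"keystone"})


class FatalConfigError(Exception):
    """Raised when a critical configuration section is invalid."""


def check_config(config, validators):
    """Validate each section; return the names of optional sections to disable.

    `config` maps section name -> dict of options.
    `validators` maps section name -> callable that raises ValueError
    when the section is misconfigured. Both shapes are hypothetical.
    """
    disabled = []
    for name, validate in validators.items():
        try:
            validate(config.get(name, {}))
        except ValueError as exc:
            if name in CRITICAL_SECTIONS:
                # Nothing can work without this section: stop here.
                raise FatalConfigError(
                    "invalid [%s] section: %s" % (name, exc)) from exc
            # Optional section: log it and run without that datasource.
            LOG.error("Disabling %s, invalid configuration: %s", name, exc)
            disabled.append(name)
    return disabled
```

With this shape, a broken Zabbix section only disables Zabbix, while a broken 
keystone section still stops the service at startup.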

BTW, it might be a good idea to add API/UI for showing the configuration and 
the status of the datasources. We all know that errors in the log files are 
often ignored…
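
As a rough idea of what such a status API could be backed by, here is a small 
in-memory registry. The class and method names are hypothetical, and how it 
would be wired into the Vitrage API or UI is left open:

```python
import json


class DatasourceStatusRegistry:
    """Keeps the last known state of each datasource (hypothetical helper)."""

    def __init__(self):
        self._status = {}

    def report_ok(self, name):
        self._status[name] = {"state": "OK", "last_error": None}

    def report_error(self, name, exc):
        # Store a short description; the full traceback stays in the log.
        self._status[name] = {
            "state": "ERROR",
            "last_error": "%s: %s" % (type(exc).__name__, exc),
        }

    def as_json(self):
        """Return the status of all datasources as a JSON string."""
        return json.dumps(self._status, sort_keys=True)
```

A datasource driver would call `report_ok` or `report_error` after each 
collection cycle, and an API handler would return `as_json()`.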

Best Regards,
Ifat.


From: "Yujun Zhang (ZTE)" <zhangyujun+...@gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Monday, 29 May 2017 at 16:13
To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [vitrage] error handling

Prompted by a recent code review, I think the error handling rules are worth a 
thorough discussion.

I once read an article[1] from Joyent, and I was impressed by its distinction 
between operational errors and programmer errors. The article is written for 
Node.js, but the principle also applies to other programming languages.

The basic rules recommended by Joyent are:

  1.  Handling operational errors
  2.  (Not) handling programmer errors

There is also one rule in the OpenStack style guidelines[2] that is close to 
this idea.

[H201] Do not write except:, use except Exception: at the very least. When 
catching an exception you should be as specific so you don’t mistakenly catch 
unexpected exceptions.
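
As an illustration of H201 (my own sketch, not taken from the style guide; 
`parse_event` is a hypothetical function), catching only the expected 
operational error lets a programmer error surface instead of being hidden:

```python
import json


def parse_event(raw):
    """Return the decoded event, or None if `raw` is not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:
        # Operational error: malformed input from outside.
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    # A TypeError (e.g. raw is None) is a programmer error and is
    # deliberately not caught here, so it propagates to the caller.
```

A bare `except:` in the same place would also swallow the `TypeError`, and 
even `KeyboardInterrupt`, which is what the rule warns against.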

I do think that until we have well-designed error handling, it is better to let 
it crash. It is dangerous to hide errors and keep the system running in an 
undetermined state.

So the question is: what kinds of operational errors are we facing in Vitrage? 
I can think of the following:

  1.  invalid configuration file
  2.  failed to communicate with data source
  3.  malformed data from data source
  4.  failed to execute an action
  5.  ...
Maybe this could be the first step for the error handling design.

[1]: https://www.joyent.com/node-js/production/design/errors
[2]: https://docs.openstack.org/developer/hacking/

--
Yujun Zhang
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
