On Thu, 24 Jul 2014, Doug Hellmann wrote:

I don’t claim any special status except that I was there and am
trying to provide background on why things are as they are. :-)

I think that counts and I very much appreciate the responses.

Having a hard-fail error handler is useful in situations where
continuing operation would make the problem worse *and* the deployer
can fix the problem. Being unable to connect to a database might be an
example of this. However, we don’t want the ceilometer collector to
shutdown if it receives a notification it doesn’t understand because
discarding a malformed message isn’t going to make other ceilometer
operations any worse, and seeing such a message isn’t a situation
the deployer can fix. Catching KeyError, IndexError, AttributeError,
etc. for those cases would be useful if we were going to treat those
exception types differently somehow, but we don’t.

I guess what I'm asking is "shouldn't we treat them differently?" If
I've got a dict coming in over the bus and it is missing a key, big
deal, the bus is still working. I _do_ want to know about it but it
isn't a disaster so I can (and should) catch KeyError and log a
short message (without traceback) that is specially encoded to say
"how about that the notification payload was messed up".

Maybe such a thing is elsewhere in the stack, if so, great. In that
case the thing code I pointed out as a bit of a compromise is in
place, just not in the same place.

What I don't want is a thing logged as _exceptional_ when it isn't.

That said, it’s possible we could tighten up some of the error
handling outside of event processing loops, so as I said, if you have
a more specific proposal for places you think we can predict the
exception type reliably, we should talk about those directly instead
of the general case. You mention distinguishing between “the noise
and the nasty” — do you have some logs you can share?

Only vaguely at this point, based on my experiences in the past few days
trying to chase down failures in the gate. There's just so much logged,
a lot of which doesn't help, but at the same time a fair bit which looks
like it ought to be a traceback and handled more aggressively. That
experience drove me into the Ceilometer code in an effort to check the
hygiene there and see if there was something I could do in that small
environment (rather than the overwhelming context of the The Entire
Project™).

I'll pay a bit closer attention to the specific relationship
between the ceilometer exceptions (on the loops) and the logs and
when I find something that particularly annoys me, I'll submit a
patch for review and we'll see how it goes.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to