On 1/2/14, 11:36 AM, "Gordon Sim" <[email protected]> wrote:
>On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
>>
>> On Dec 20, 2013, at 12:13 PM, Gordon Sim <[email protected]> wrote:
>>
>>> On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
>>>>
>>>> Other protocols may support bulk consumption. My one concern with
>>>> this approach is error handling. Currently the executors treat
>>>> each notification individually. So let's say the broker hands
>>>> 100 messages at a time. When the client is done processing the
>>>> messages, the broker needs to know if message 25 had an error or
>>>> not. We would somehow need to communicate back to the broker
>>>> which messages failed. I think this may take some refactoring of
>>>> executors/dispatchers. What do you think?
>[...]
>>> (2) What would you want the broker to do with the failed messages?
>>> What sort of things might fail? Is it related to the message
>>> content itself? Or is it failures suspected to be of a temporal
>>> nature?
>
>
>> There will be situations where the message can't be parsed, and those
>> messages can't just be thrown away. My current thought is that
>> ceilometer could provide some sort of mechanism for sending messages
>> that are invalid to an external data store (like a file, or a
>> different topic on the amqp server) where a living, breathing human
>> can look at them and try to parse out any meaningful information.
>
>Right, in those cases simply requeueing probably is not the right thing
>and you really want it dead-lettered in some way. I guess the first
>question is whether that is part of the notification system's function,
>or if it is done by the application itself (e.g. by storing it or
>republishing it). If it is the latter you may not need any explicit
>negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and
not oslo, since ceilometer is where the event parsing knowledge lives.
From an oslo point of view, the message would be 'acked'.
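
A rough sketch of the application-side dead-lettering described above, to make the ack semantics concrete. Everything here is hypothetical and not existing ceilometer or oslo code: `DeadLetterFile`, `handle_notification`, and the JSON-lines file format are assumptions made for illustration, and JSON parsing stands in for whatever event parsing ceilometer actually does.

```python
import json


class DeadLetterFile:
    """Hypothetical sink: appends unparseable payloads to a file for a human to read."""

    def __init__(self, path):
        self.path = path

    def store(self, raw, reason):
        # One JSON object per line keeps the file easy to grep and to replay.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"reason": reason, "raw": raw}) + "\n")


def handle_notification(raw, process, dead_letters):
    """Return True when the message can be acked to the broker.

    A parse failure is parked in the dead-letter sink and still acked, so the
    messaging layer never needs a negative acknowledgement for bad content.
    Errors raised by process() propagate, leaving the requeue decision to the
    caller.
    """
    try:
        event = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        dead_letters.store(raw, str(exc))
        return True
    process(event)
    return True
```

With this split, the broker only ever sees an ack for a malformed message, and the bad payload ends up somewhere a person can inspect it.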
>
>> Other errors might be "database not available", in which case
>> re-queuing the message is probably the right way to go.
>
>That does mean however that the backlog of messages starts to grow on
>the broker, so some scheme for dealing with this if the database outage
>goes on for a bit is probably important. It also means that the messages
>will keep being retried without any 'backoff' waiting for the database
>to be restored which could increase the load.

This is a problem we already have :(

https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158

Since notifications cannot be lost, overflow needs to be detected and the
messages need to be saved. I'm thinking the database being down is a rare
occurrence that will be worthy of waking someone up in the middle of the
night. One possible solution: flip the collector into an emergency mode
and save notifications to disk until the issue is resolved. Once the db is
up and running, the collector inserts all of these saved messages (as one
big batch!). Thoughts?

I'm not sure I understand what you are saying about retrying without a
backoff. Can you explain?

-john
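
For illustration, the "emergency mode" idea above could be sketched roughly as follows. This is only a sketch under stated assumptions: `SpillingCollector`, `DatabaseUnavailable`, and the `insert_batch` callable are hypothetical names, not part of ceilometer, and the spill file is a plain JSON-lines file.

```python
import json
import os


class DatabaseUnavailable(Exception):
    """Hypothetical error raised by the storage layer when the database is down."""


class SpillingCollector:
    """Saves notifications to disk while the database is down, then replays them."""

    def __init__(self, insert_batch, spill_path):
        self.insert_batch = insert_batch  # callable taking a list of notifications
        self.spill_path = spill_path

    def record(self, notification):
        """Store one notification, spilling it to disk if the database is down."""
        try:
            self.insert_batch([notification])
        except DatabaseUnavailable:
            with open(self.spill_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(notification) + "\n")

    def replay(self):
        """Insert everything saved on disk as one batch; return the count replayed."""
        if not os.path.exists(self.spill_path):
            return 0
        with open(self.spill_path, encoding="utf-8") as f:
            saved = [json.loads(line) for line in f if line.strip()]
        if saved:
            # If the database is still down this raises and the file is kept.
            self.insert_batch(saved)
        os.remove(self.spill_path)
        return len(saved)
```

Because the notification is written to disk before the handler returns, it can be acked to the broker instead of requeued, so the broker backlog does not grow during the outage. The sketch does not preserve ordering between spilled and newly arriving notifications, and it does not bound the size of the spill file.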
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
