I imagine the equivalent of the Oracle DB query "select * from DUAL", i.e. something that exercises the server.
A combination of queue produce and consume, on some existing queue or on a temp queue created for that purpose. I guess an existing queue may be better because on production systems queue creation may be locked down.

This covers any potential unexpected blocking. The caveat, though, is that blocking can be a reasonable response for a queuing system that has reached some limits. A system that cannot produce may still be healthy if it can browse.

To that end, maybe we need a pre-configured queue that has one message on it. We verify we can browse it, then *if* after some small timeout we can produce to it, we consume the old head, essentially replacing the single entry on the queue. Periodic monitoring would cycle the head of the queue; a blocked producer together with a successful browse would indicate healthy but blocked since the message-in time of the head of the queue.

It is some sort of multi-value return, for example: -1 cannot browse; 0 all good (replaced the head); > 0 the time of the head of the queue.

I guess it could be red, green, amber also, but that is more vague. It could be turned into that!

What counts as good health is very context specific, but I think a framework like this could be generally useful and provide an example of how some more context-specific health checks could be achieved.

Maybe some food for thought.

/gary

On Tue, 28 Apr 2020 at 10:52, Domenico Francesco Bruscino <bruscin...@gmail.com> wrote:
>
> I'm implementing a tool to determine whether the broker is in a healthy
> state. There is a series of health checks that can be performed, starting
> with the most basic and very rarely producing false positives, to
> increasingly more comprehensive, intrusive, and opinionated that have a
> higher probability of false positives.
>
> In the following list there are some health checks grouped by target:
> - node
>   - up - check if a client can connect to the node
>   - disk - check if the disk hits the `max-disk-usage` limit
>   - memory - check if the memory available to the JVM
>   - backup - check if the backup node is announced
>   - queues - check if all queues with a positive rate have a consumer
> - queue
>   - up - check if the queue exists
>   - browser - check if the queue is browsable
>   - consumer - check if a consumer can connect to the queue and/or receive
>     messages
>   - producer - check if a producer can connect to the queue and/or send
>     messages
>
> I would start with some of the previous checks, exposing them through the
> MBeans interfaces and/or the Command Line utility.
>
> What are your thoughts?
>
> Domenico
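
To make the browse / produce / consume cycle from the reply a bit more concrete, here is a rough sketch in Python. It runs against an in-memory stand-in, not a real broker client: `ProbeQueue`, `check_health`, the `capacity` limit used to model a blocked producer, and the choice of epoch milliseconds for the "time of the head" are all assumptions made for illustration. Treating an empty probe queue as "cannot browse" is also an assumption.

```python
import time
from collections import deque

CANNOT_BROWSE = -1  # the probe queue could not be browsed
ALL_GOOD = 0        # the head was replaced with a fresh message


class ProbeQueue:
    """Hypothetical in-memory stand-in for a broker queue (not a real client API)."""

    def __init__(self, capacity=2, browsable=True):
        self.capacity = capacity    # a full queue models a blocked producer
        self.browsable = browsable  # False models a failed browse
        self.messages = deque()     # each entry is an enqueue time in epoch millis

    def browse_head(self):
        """Return the head's enqueue time without removing it, or None."""
        if not self.browsable or not self.messages:
            return None
        return self.messages[0]

    def produce(self, timestamp_ms, timeout_s):
        """Try to enqueue within timeout_s; return False if still blocked."""
        deadline = time.monotonic() + timeout_s
        while len(self.messages) >= self.capacity:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.01)
        self.messages.append(timestamp_ms)
        return True

    def consume(self):
        """Remove and return the current head."""
        return self.messages.popleft()


def check_health(queue, produce_timeout_s=0.05, now_ms=None):
    """Return -1 (cannot browse), 0 (head replaced), or the head's enqueue time."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    head_ms = queue.browse_head()
    if head_ms is None:
        return CANNOT_BROWSE
    if not queue.produce(now_ms, produce_timeout_s):
        # Browsable but blocked: report the time of the unchanged head.
        return head_ms
    queue.consume()  # drop the old head so exactly one message remains
    return ALL_GOOD
```

A monitor would call `check_health` periodically on the pre-configured queue. Mapping the result to red / amber / green could then be a thin layer on top, for example amber when the returned head time is older than some context-specific threshold.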