Victor Xu created AMBARI-18624:
----------------------------------

             Summary: Deeper Alerting for Kafka
                 Key: AMBARI-18624
                 URL: https://issues.apache.org/jira/browse/AMBARI-18624
             Project: Ambari
          Issue Type: Improvement
          Components: alerts
    Affects Versions: 2.2.2
            Reporter: Victor Xu


Kafka brokers can become unhealthy while still appearing 'available': the 
process is up and the service is running, but the broker is unusable. Current 
alerting only checks whether the process is up and running. We want alerting 
that tests for the situations in which the component is up but 
non-functional. 

Example:
Sometimes the brokers are not working but Ambari still registers them as green. 
I have seen this in at least one case, where the broker logs showed an 
OutOfMemory error. This could easily be reproduced by lowering the broker heap 
and creating many topics until the heap is exhausted. Ambari will still 
register Kafka as healthy.

We need the Kafka team to identify a script that Ambari can run to probe the 
Kafka process to achieve this deeper health alerting capability.
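
As a starting point for discussion, here is a minimal sketch of what such a 
probe might look like. It is not a script from the Kafka team. It assumes that 
Ambari SCRIPT alerts call an execute() function returning a (state, [label]) 
tuple, that the broker listens on port 6667, and that answering a Kafka 
Metadata request (api_key 3, version 0) is a reasonable sign of a functional 
broker. The parameter names "kafka.port" and "warn.seconds" and the 2 second 
threshold are hypothetical. Whether a broker that has hit OutOfMemory actually 
fails this probe would still need to be confirmed.

```python
import socket
import struct
import time

KAFKA_PORT = 6667  # assumption: broker port commonly used in Ambari-managed stacks


def build_metadata_request(correlation_id, client_id=b"ambari-alert"):
    """Encode a Kafka Metadata request (api_key=3, version 0) for all topics."""
    header = struct.pack(">hhih", 3, 0, correlation_id, len(client_id)) + client_id
    body = struct.pack(">i", 0)  # empty topic array: ask about all topics
    payload = header + body
    return struct.pack(">i", len(payload)) + payload  # 4-byte size prefix


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the broker closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise IOError("broker closed the connection mid-response")
        buf += chunk
    return buf


def probe_broker(host, port, timeout=5.0, correlation_id=1):
    """Send one Metadata request; return the round-trip time in seconds."""
    start = time.time()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.settimeout(timeout)
        sock.sendall(build_metadata_request(correlation_id))
        _recv_exact(sock, 4)  # response size; the body itself is not parsed here
        (echoed,) = struct.unpack(">i", _recv_exact(sock, 4))
        if echoed != correlation_id:
            raise IOError("unexpected correlation id %d" % echoed)
    finally:
        sock.close()
    return time.time() - start


def execute(configurations=None, parameters=None, host_name=None):
    """Hypothetical alert entry point: map the probe result to an alert state."""
    parameters = parameters or {}
    port = int(parameters.get("kafka.port", KAFKA_PORT))        # hypothetical name
    warn_after = float(parameters.get("warn.seconds", 2.0))     # hypothetical name
    try:
        elapsed = probe_broker(host_name or "localhost", port)
    except (socket.error, IOError, struct.error) as e:
        return ("CRITICAL", ["Broker did not answer a Metadata request: %s" % e])
    state = "WARNING" if elapsed > warn_after else "OK"
    return (state, ["Metadata request answered in %.3f s" % elapsed])
```

The probe goes one step beyond a port check: the broker has to accept the 
connection, parse a request, and echo the correlation id within the timeout. 
A fuller check could also produce and consume a test message, at the cost of 
needing a Kafka client on the agent host.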



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
