Aled Sage created BROOKLYN-176:
----------------------------------

             Summary: Expensive polling of driver.isRunning is too frequent
                 Key: BROOKLYN-176
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-176
             Project: Brooklyn
          Issue Type: Improvement
    Affects Versions: 0.9.0-SNAPSHOT
            Reporter: Aled Sage


Many of our entities poll driver.isRunning by default, calling it every 5 
seconds. This often involves an ssh execution, which is CPU-intensive, so it 
doesn't scale as the number of entities grows.

This polling should only be done if there is no other/better way to determine 
if an entity is unhealthy. We should use other mechanisms (e.g. checking if a 
web-server is reachable over http), and only resort to calling driver.isRunning 
to provide additional information if that other check fails.
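The "cheap check first, expensive check only on failure" idea above could be sketched roughly as follows. This is a standalone illustration, not Brooklyn code: `TieredHealthCheck`, `Result` and `poll` are hypothetical names, and the two suppliers stand in for an HTTP reachability probe and a call to driver.isRunning respectively.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Brooklyn): runs a cheap check on every poll
// and only runs the expensive check when the cheap one reports a problem.
final class TieredHealthCheck {

    /** Outcome of one poll: whether the entity looks up, plus diagnostic detail. */
    static final class Result {
        final boolean up;
        final String detail;

        Result(boolean up, String detail) {
            this.up = up;
            this.detail = detail;
        }
    }

    private final BooleanSupplier cheapCheck;     // e.g. "is the web-server reachable over http?"
    private final BooleanSupplier expensiveCheck; // e.g. driver.isRunning over ssh

    TieredHealthCheck(BooleanSupplier cheapCheck, BooleanSupplier expensiveCheck) {
        this.cheapCheck = cheapCheck;
        this.expensiveCheck = expensiveCheck;
    }

    Result poll() {
        if (cheapCheck.getAsBoolean()) {
            // Healthy according to the cheap check: skip the expensive one entirely.
            return new Result(true, "cheap check passed");
        }
        // The entity is treated as not up; the expensive check only adds diagnostics.
        boolean processRunning = expensiveCheck.getAsBoolean();
        return new Result(false, processRunning
                ? "cheap check failed, but the process is running"
                : "cheap check failed and the process is not running");
    }
}
```

With this shape, a healthy entity never triggers the ssh-based check; it only runs once something already looks wrong.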

However, turning it off is a little tricky in the current code. 
`SoftwareProcess.initEnrichers` does:

{noformat}
ServiceNotUpLogic.updateNotUpIndicator(this, SERVICE_PROCESS_IS_RUNNING,
        "No information yet on whether this service is running");
{noformat}

The `SERVICE_PROCESS_IS_RUNNING` value is cleared by `connectServiceUpIsRunning`, 
which polls `driver.isRunning`. If you don't call 
`connectServiceUpIsRunning` then you'd need to do something yourself to ensure 
`SERVICE_PROCESS_IS_RUNNING` is cleared.
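To illustrate why the uncleared indicator matters, here is a simplified stand-in for the not-up indicators map. It is a toy model under stated assumptions, not Brooklyn's real classes: `NotUpIndicators` and its methods are hypothetical, and the only rule it encodes is that serviceUp is false while any entry remains.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an entity's not-up indicators (hypothetical, not Brooklyn's API).
final class NotUpIndicators {
    private final Map<String, String> indicators = new LinkedHashMap<>();

    /** Records a reason why the entity should be considered not up. */
    void update(String key, String reason) {
        indicators.put(key, reason);
    }

    /** Removes an indicator once some check has confirmed that aspect is healthy. */
    void clear(String key) {
        indicators.remove(key);
    }

    /** Assumed rule: the entity is up only when no indicator is present. */
    boolean isServiceUp() {
        return indicators.isEmpty();
    }

    /** Read-only view of the current indicators, for diagnostics. */
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(indicators);
    }
}
```

In this model, an entry added at initialisation under a key such as `SERVICE_PROCESS_IS_RUNNING` keeps `isServiceUp()` false until something calls `clear` with that key. That is the role the driver.isRunning poll currently plays, and what a replacement check would have to do instead.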

We also need to better define the best practices for checking serviceUp in a 
pure-YAML entity. We need better examples (and a simpler way) to hook up sensor 
feeds, such as HTTP feeds, for polling an entity's health.

There are a few areas of related code:

* The attribute `service.notUp.indicators` allows multiple ways of determining 
if an entity is healthy. If any indicator has put an entry into the 
`service.notUp.indicators` map, then the entity is marked as serviceUp=false.

* The attribute `service.notUp.diagnostics` is populated with additional info 
when an entity fails. See 
`SoftwareProcessImpl.ServiceNotUpDiagnosticsCollector`, which is executed when 
serviceUp is set to false or when serviceState changes. The defaults are to 
check if the machine is ssh'able, and check driver.isRunning.

* The `HttpRequestSensor` is usable in YAML, via an `EntityInitializer`, to add 
an HTTP-based sensor feed.

* The config key `SoftwareProcess.RETRIEVE_USAGE_METRICS` disables (some) polling 
for usage/performance metrics, but health checks such as service-up are still 
polled.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
