[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846328#comment-13846328 ]
Georg Henzler edited comment on SLING-3278 at 12/12/13 2:29 PM:
----------------------------------------------------------------

The property for async execution can make sense when you want to make sure a check is called less often than the health checks as a whole are run (e.g. only twice a day).

I'm pretty much done; only No. 2 of Bertrand's list and the unit tests are missing. If you like, you can have a look at the patches and give feedback before I submit a final one.

Impl Notes:
* The main entry method is org.apache.sling.hc.core.executor.HealthCheckExecutor.runAllForTags(String...)
* Results are cached for 2 sec by default (configurable)
* Results now have a HealthCheckDescriptor that contains meta info for the check (also used in the executor as the cache key etc.)
* Async is supported via the service property hc.async.cronExpression; a service listener is in place for registering/unregistering the jobs (org.apache.sling.hc.core.executor.AsyncHealthCheckExecutor)
* I added a natural order to results (failed checks first, then alphabetically by name); without it, the order would be arbitrary (depending on execution time)
* The result has an additional finishDate and elapsedTime (I think the finish date is more interesting for caching than the start date!)

Other thoughts (not in patch):
* I'm not sure the CompositeHealthCheck makes sense: isn't it a grouping that competes with the tags?
  It is easy to configure it in a way that some checks are executed twice, especially if you run all checks without specifying a tag (and the HealthCheckExecutor cannot prevent this, since the CompositeHealthCheck looks like any other check to it).
* Exceptions: The result should be able to carry an exception. I would even go as far as adding "throws Exception" to the execute() signature (this would not break any existing implementation classes) and generically adding a final critical log entry if the HC happens to throw an exception.

> Provide a HealthCheckExecutor service
> -------------------------------------
>
>                 Key: SLING-3278
>                 URL: https://issues.apache.org/jira/browse/SLING-3278
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Georg Henzler
>            Assignee: Georg Henzler
>         Attachments: SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-v0.5.patch
>
>
> Goals:
> * Be able to get an overall (aggregated) result as quickly as possible (ideally <2sec)
> * Whenever possible, return the most current results (e.g. for a memory check)
> * Provide a declarative way for async checks (async checks should be the exception, though)
> Approach:
> * Run checks in parallel
> * Make sure long-running (or even stuck) checks are timed out
> * If a health check must run asynchronously (because its execution time cannot be optimized), it should be enough to just specify a service property (e.g. "hc.async").
> See also:
> http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
> http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
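The natural result order from the impl notes in the comment (failed checks first, then alphabetically by name) could look roughly like the following minimal Java sketch. CheckStatus, CheckResult and ResultOrdering are hypothetical, simplified stand-ins for illustration, not the Sling Health Check API classes, and "failed" is assumed here to mean any status other than OK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; NOT the Sling HC API types.
enum CheckStatus { OK, WARN, CRITICAL }

final class CheckResult {
    final String name;
    final CheckStatus status;

    CheckResult(String name, CheckStatus status) {
        this.name = name;
        this.status = status;
    }

    @Override
    public String toString() {
        return name + "=" + status;
    }
}

final class ResultOrdering {
    // Boolean keys sort false before true, so results whose status is not OK
    // ("failed", by this sketch's assumption) come first; ties are broken by name.
    static final Comparator<CheckResult> FAILED_FIRST_THEN_NAME =
            Comparator.comparing((CheckResult r) -> r.status == CheckStatus.OK)
                      .thenComparing(r -> r.name);

    // Returns a sorted copy and leaves the input list untouched.
    static List<CheckResult> sorted(List<CheckResult> results) {
        List<CheckResult> copy = new ArrayList<>(results);
        copy.sort(FAILED_FIRST_THEN_NAME);
        return copy;
    }
}
```

Since the sort key depends only on status and name, the output order no longer depends on which check happened to finish first.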
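The result caching described in the comment (2 sec by default, keyed by the check's descriptor, with the finish date as the more interesting reference time) can be sketched as a small time-to-live cache. ResultCache, getOrRun and the injectable clock are hypothetical names chosen for this sketch; the patch's actual cache may work differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a time-to-live result cache keyed by a check descriptor.
// K stands in for something like HealthCheckDescriptor (it only needs equals/hashCode).
final class ResultCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long finishedAtMillis; // the finish time, not the start time, drives expiry

        Entry(T value, long finishedAtMillis) {
            this.value = value;
            this.finishedAtMillis = finishedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;     // e.g. 2000 for a 2 sec default
    private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis

    ResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value while it is younger than ttlMillis; otherwise runs
    // the check, stores the fresh value with its finish time, and returns it.
    // The whole method is synchronized for simplicity, which also serializes the
    // checks themselves; a real executor would not hold a lock while a check runs.
    synchronized V getOrRun(K key, Supplier<V> check) {
        Entry<V> cached = entries.get(key);
        if (cached != null && clock.getAsLong() - cached.finishedAtMillis < ttlMillis) {
            return cached.value;
        }
        V fresh = check.get();
        entries.put(key, new Entry<>(fresh, clock.getAsLong()));
        return fresh;
    }
}
```

Injecting the clock keeps the expiry logic testable without sleeping; in production it would simply be System::currentTimeMillis.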
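The issue's approach (run checks in parallel, make sure long-running or stuck checks are timed out), together with the comment's suggestion to let execute() throw exceptions, can be sketched with java.util.concurrent. The timed ExecutorService.invokeAll waits at most the given time and cancels every task that has not completed by then. ParallelRunner, runAll and the "CRITICAL: ..." status strings are hypothetical; each check is modeled as a Callable<String> (whose call() already declares throws Exception), not as the Sling HealthCheck interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: checks are modeled as Callable<String> returning a status text.
// The names and the "CRITICAL: ..." strings are illustrative, not the Sling HC API.
final class ParallelRunner {

    // Runs all checks in parallel and waits at most timeoutMillis in total.
    // Returns check name -> status, in the iteration order of the input map.
    static Map<String, String> runAll(Map<String, Callable<String>> checks, long timeoutMillis) {
        List<String> names = new ArrayList<>(checks.keySet());
        List<Callable<String>> tasks = new ArrayList<>();
        for (String name : names) {
            tasks.add(checks.get(name));
        }

        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<Future<String>> futures;
            try {
                // The timed invokeAll cancels every task that has not completed in time
                // and returns the futures in the same order as the task list.
                futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                throw new IllegalStateException("interrupted while waiting for checks", e);
            }

            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                Future<String> future = futures.get(i);
                String status;
                if (future.isCancelled()) {
                    status = "CRITICAL: timed out";
                } else {
                    try {
                        status = future.get(); // already done, so this does not block
                    } catch (ExecutionException e) {
                        // A check that throws is reported as critical instead of failing the whole run.
                        status = "CRITICAL: " + e.getCause();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        status = "CRITICAL: interrupted";
                    }
                }
                results.put(names.get(i), status);
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupts any check that is still running
        }
    }
}
```

The timed invokeAll is used here because it already implements "wait at most N ms, then cancel the rest". A production executor would more likely reuse a shared thread pool and consult a result cache before submitting checks, rather than creating a pool per call.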