[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bertrand Delacretaz updated SLING-3278:
---------------------------------------
Attachment: SLING-3278-bertrand.patch
Thanks for your patch! Lots of good ideas in there.
I have created a different variant (SLING-3278-bertrand.patch attached) that
requires fewer changes to the existing core code and, IMO, isolates the "hard"
parts better: HealthCheckExecutorImpl just takes care of creating a
HealthCheckExecutionWrapper for each Health Check, and that
HealthCheckExecutionWrapper class handles all the execution and timeout logic
for a single HC, making the overall logic simpler.
I think this makes writing tests easier, and there are no lists to manage
besides the Map of HealthCheckExecutionWrapper.
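To illustrate the split described above, here is a minimal sketch of what a
per-HC wrapper could look like. It is not the code from the attached patch:
HcWrapperSketch, Wrapper, Result, Status, the local HealthCheck interface and
the getResult(timeoutMsec) method are all hypothetical names chosen for this
example, and the thread pool is an assumption. The only point is that one
object owns the Future, the timeout and the last known Result for one HC.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: none of these types are taken from the patch.
class HcWrapperSketch {

    enum Status { OK, HEALTH_CHECK_ERROR }

    static final class Result {
        final Status status;
        final String message;

        Result(Status status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    /** Hypothetical stand-in for the real HealthCheck service interface. */
    interface HealthCheck {
        Result execute();
    }

    /** Owns execution, timeout and caching for exactly one health check. */
    static final class Wrapper {
        private final HealthCheck hc;
        private final ExecutorService pool;
        private Future<Result> running; // non-null while an execution is in flight
        private Result last = new Result(Status.HEALTH_CHECK_ERROR, "no data yet");

        Wrapper(HealthCheck hc, ExecutorService pool) {
            this.hc = hc;
            this.pool = pool;
        }

        /** Waits at most timeoutMsec, then returns the most recent Result. */
        synchronized Result getResult(long timeoutMsec) {
            if (running == null) {
                Callable<Result> task = hc::execute;
                running = pool.submit(task);
            }
            try {
                last = running.get(timeoutMsec, TimeUnit.MILLISECONDS);
                running = null; // finished: the next call starts a new execution
            } catch (TimeoutException e) {
                // Still running: keep the Future and return the cached Result.
            } catch (ExecutionException e) {
                last = new Result(Status.HEALTH_CHECK_ERROR,
                        "execution failed: " + e.getCause());
                running = null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return last;
        }
    }

    public static void main(String[] args) {
        // Daemon threads so a stuck check cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Wrapper slow = new Wrapper(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new Result(Status.OK, "slow check done");
        }, pool);
        System.out.println(slow.getResult(10).message);   // returns before the check finishes
        System.out.println(slow.getResult(2000).message); // waits for the in-flight execution
        pool.shutdown();
    }
}
```

Note that this sketch starts a new execution on every call once the previous
one has finished; it does not model the minimum delay between executions that
the SlowHealthCheck example is configured with.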
To test my variant (manually for now) I have added a SlowHealthCheck example.
Executing it via the following URL provides an instant response, with an actual
execution of the HC every 4.6 seconds max., which is its configured delay:
http://localhost:8080/system/console/healthcheck?tags=slow&debug=true&timeout=1
Open Issues:
* Write tests for the two classes mentioned above
* For some reason the wrapper is not created for the CompositeHealthCheck
sample at startup: I get "createWrapper: no service provided by
org.apache.sling.hc.api.HealthCheck". Restarting the component fixes that.
* Scheduled execution of HCs is not implemented yet - I'd prefer handling that
in a separate patch. Maybe read optional HC service properties prefixed by
"hc.schedule." which map to existing Sling scheduler properties, so that we
keep the full flexibility of the scheduler.
* Currently we get HEALTH_CHECK_ERROR "no data yet" when no Result is available
yet, instead of the NODATA status that I was planning. That might be good
enough.
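As a rough illustration of the "hc.schedule." idea in the list above, the
mapping could be as simple as a prefix rewrite. This is only a sketch under
assumptions: HcScheduleProps and toSchedulerProps are hypothetical names, and
rewriting "hc.schedule.X" to "scheduler.X" (e.g. "scheduler.expression") is one
possible mapping, not something the patch implements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any patch attached to this issue.
class HcScheduleProps {

    static final String HC_PREFIX = "hc.schedule.";
    // Assumption: the target names follow the Sling scheduler "scheduler." prefix.
    static final String SCHEDULER_PREFIX = "scheduler.";

    /** Copies every "hc.schedule.X" service property to "scheduler.X". */
    static Map<String, Object> toSchedulerProps(Map<String, Object> serviceProps) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : serviceProps.entrySet()) {
            String key = e.getKey();
            // Skip unrelated properties and a bare prefix with no name after it.
            if (key.startsWith(HC_PREFIX) && key.length() > HC_PREFIX.length()) {
                out.put(SCHEDULER_PREFIX + key.substring(HC_PREFIX.length()),
                        e.getValue());
            }
        }
        return out;
    }
}
```

Passing values through untouched would keep whatever options the scheduler
supports available, without the HC code having to know about each of them.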
What do you think, is there something else from your patch that's not covered?
> Provide a HealthCheckExecutor service
> -------------------------------------
>
> Key: SLING-3278
> URL: https://issues.apache.org/jira/browse/SLING-3278
> Project: Sling
> Issue Type: New Feature
> Components: Health Check
> Reporter: Georg Henzler
> Assignee: Georg Henzler
> Attachments: SLING-3278-bertrand.patch,
> SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch,
> SLING-3278-hc.webconsole-v0.5.patch
>
>
> Goals:
> * Be able to get an overall (aggregated) result as quickly as possible
> (ideally <2sec)
> * Whenever possible, return most current results (e.g. for a memory check)
> * Provide a declarative way for async checks (async checks should be the
> exception though)
> Approach
> * Run checks in parallel
> * Make sure long running (or even stuck) checks are timed out
> * If a health check must run asynchronously (because its execution time
> cannot be optimized), it should be enough to just specify a service property
> (e.g. "hc.async").
> See also
> http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
> http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html