[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bertrand Delacretaz updated SLING-3278:
---------------------------------------

    Attachment: SLING-3278-bertrand.patch

Thanks for your patch! Lots of good ideas in there.

I have created a different variant (SLING-3278-bertrand.patch, attached), which 
requires fewer changes to the existing core code and, IMO, isolates the "hard" 
parts better: HealthCheckExecutorImpl just takes care of creating a 
HealthCheckExecutionWrapper for each Health Check, and that 
HealthCheckExecutionWrapper class handles all the execution and timeout logic 
for a single HC, which makes the overall logic simpler.
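A minimal sketch of the per-check wrapper idea, in plain Java with no OSGi or Sling dependencies. The class HealthCheckExecutionWrapperSketch, the SimpleResult record and the Supplier-based check are hypothetical stand-ins, not the code in SLING-3278-bertrand.patch. It assumes that a call starts a new execution only if none is in flight, waits up to the caller's timeout, and otherwise falls back to the last completed result or a "no data yet" error:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

/** Hypothetical per-check wrapper: runs one health check, applies a timeout, caches the last result. */
class HealthCheckExecutionWrapperSketch {
    /** Hypothetical stand-in for a health check result (not the Sling Result class). */
    record SimpleResult(String status, String message) {}

    private final Supplier<SimpleResult> check;
    private final ExecutorService executor;
    private Future<SimpleResult> running;      // current or most recent execution, guarded by "this"
    private volatile SimpleResult lastResult;  // most recent successfully completed result

    HealthCheckExecutionWrapperSketch(Supplier<SimpleResult> check, ExecutorService executor) {
        this.check = check;
        this.executor = executor;
    }

    /** Returns a fresh result if the check finishes within timeoutMs, else the cached one. */
    SimpleResult getResult(long timeoutMs) {
        Future<SimpleResult> current;
        synchronized (this) {
            // Start a new execution only when none is in flight, so a slow or
            // stuck check never runs more than once at a time.
            if (running == null || running.isDone()) {
                running = executor.submit(() -> {
                    SimpleResult r = check.get();
                    lastResult = r;
                    return r;
                });
            }
            current = running;
        }
        try {
            return current.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Timed out: answer immediately with the previous result, if there is one.
            SimpleResult previous = lastResult;
            return previous != null ? previous
                    : new SimpleResult("HEALTH_CHECK_ERROR", "no data yet");
        } catch (ExecutionException e) {
            return new SimpleResult("HEALTH_CHECK_ERROR", "check failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new SimpleResult("HEALTH_CHECK_ERROR", "interrupted");
        }
    }
}
```

With a check that is slower than the requested timeout, the first call returns "no data yet", later calls return the previous result without waiting for the full execution, and at most one execution of that check runs at a time.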

I think this makes writing tests easier, and there are no lists to manage 
besides the Map of HealthCheckExecutionWrapper instances.
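A sketch of the executor side when its only state is a Map of wrappers, keyed here by a hypothetical service id string. The Wrapper class is a reduced, hypothetical stand-in that only starts one execution bounded by a timeout (CompletableFuture.completeOnTimeout, Java 9+); in OSGi, register and unregister would be driven by service tracking, which is left out here:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Supplier;

/** Hypothetical executor: the only state is a Map of per-check wrappers. */
class HealthCheckExecutorSketch {
    /** Reduced, hypothetical wrapper: starts one execution bounded by a timeout. */
    static final class Wrapper {
        private final Supplier<String> check;
        Wrapper(Supplier<String> check) { this.check = check; }

        CompletableFuture<String> start(Executor pool, long timeoutMs) {
            return CompletableFuture.supplyAsync(check, pool)
                    .exceptionally(t -> "HEALTH_CHECK_ERROR: " + t.getMessage())
                    .completeOnTimeout("TIMEOUT", timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    private final Map<String, Wrapper> wrappers = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** A check service appeared: create its wrapper once and keep reusing it. */
    void register(String serviceId, Supplier<String> check) {
        wrappers.computeIfAbsent(serviceId, id -> new Wrapper(check));
    }

    /** A check service went away: drop its wrapper. */
    void unregister(String serviceId) {
        wrappers.remove(serviceId);
    }

    /** Starts all checks in parallel, then collects their (possibly timed-out) results. */
    Map<String, String> executeAll(long timeoutMs) {
        Map<String, CompletableFuture<String>> futures = new HashMap<>();
        wrappers.forEach((id, w) -> futures.put(id, w.start(pool, timeoutMs)));
        Map<String, String> results = new TreeMap<>();
        futures.forEach((id, f) -> results.put(id, f.join()));
        return results;
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Because every check is started before any result is awaited, the total wait is bounded by roughly one timeout rather than the sum of all check durations.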

To test my variant (manually for now), I have added a SlowHealthCheck example. 
Executing it via the following URL gives an instant response, while the HC 
itself is actually executed at most once every 4.6 seconds, which is its 
configured delay:

http://localhost:8080/system/console/healthcheck?tags=slow&debug=true&timeout=1

Open Issues:
* Write tests for the two classes mentioned above
* For some reason the wrapper is not created for the CompositeHealthCheck 
sample at startup; I get "createWrapper: no service provided by 
org.apache.sling.hc.api.HealthCheck". Restarting the component fixes that.
* Scheduled execution of HCs is not implemented yet; I'd prefer handling that 
in a separate patch. We could read optional HC service properties prefixed by 
"hc.schedule." that map to the existing Sling scheduler properties, so that we 
keep their full flexibility.
* Currently we get HEALTH_CHECK_ERROR "no data yet" when no Result is 
available, instead of the NODATA status that I was planning. That might be 
good enough.
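A sketch of the "hc.schedule." idea from the open issues, assuming the prefix is simply replaced by "scheduler." so that a property such as hc.schedule.expression becomes scheduler.expression for the Sling scheduler whiteboard. The HcScheduleProps class and this exact mapping are hypothetical; nothing like it is in the patch yet:

```java
import java.util.*;

/** Hypothetical helper: maps "hc.schedule.*" HC service properties to "scheduler.*" properties. */
final class HcScheduleProps {
    static final String HC_PREFIX = "hc.schedule.";      // proposed prefix, not implemented yet
    static final String SCHEDULER_PREFIX = "scheduler."; // prefix of Sling scheduler whiteboard properties

    private HcScheduleProps() {}

    /** Returns the scheduler properties for a check, or an empty map if it declares none. */
    static Map<String, Object> toSchedulerProps(Map<String, ?> serviceProps) {
        Map<String, Object> result = new TreeMap<>();
        for (Map.Entry<String, ?> e : serviceProps.entrySet()) {
            String key = e.getKey();
            // Copy only prefixed keys with a non-empty suffix; other HC properties are ignored.
            if (key.startsWith(HC_PREFIX) && key.length() > HC_PREFIX.length()) {
                result.put(SCHEDULER_PREFIX + key.substring(HC_PREFIX.length()), e.getValue());
            }
        }
        return result;
    }
}
```

Passing the suffix through unchanged, instead of keeping a fixed whitelist, is what would preserve the full flexibility of the scheduler properties.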

What do you think? Is there anything else from your patch that's not covered?

> Provide a HealthCheckExecutor service
> -------------------------------------
>
>                 Key: SLING-3278
>                 URL: https://issues.apache.org/jira/browse/SLING-3278
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Georg Henzler
>            Assignee: Georg Henzler
>         Attachments: SLING-3278-bertrand.patch, 
> SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, 
> SLING-3278-hc.webconsole-v0.5.patch
>
>
> Goals:
> * Be able to get an overall (aggregated) result as quickly as possible 
> (ideally <2sec)
> * Whenever possible, return most current results (e.g. for a memory check)
> * Provide a declarative way for async checks (async checks should be the 
> exception though) 
> Approach:
> * Run checks in parallel
> * Make sure long running (or even stuck) checks are timed out
> * If a health check must run asynchronously (because its execution time 
> cannot be optimized), it should be enough to just specify a service property 
> (e.g. "hc.async").
> See also:
> http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
> http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html
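A sketch of the "hc.async" service property proposed in the issue description, assuming async checks are executed by some background job (not shown) and a request only reads the last stored result, while all other checks run inline. The AsyncDispatchSketch class and its methods are hypothetical, and results are plain strings for brevity:

```java
import java.util.*;
import java.util.function.Supplier;

/** Hypothetical dispatch on the proposed "hc.async" service property. */
final class AsyncDispatchSketch {
    static final String ASYNC_PROP = "hc.async"; // property name as proposed in the issue

    private AsyncDispatchSketch() {}

    /** Accepts Boolean true or the string "true" (any case); anything else means synchronous. */
    static boolean isAsync(Map<String, ?> serviceProps) {
        Object v = serviceProps.get(ASYNC_PROP);
        return v instanceof Boolean b ? b : v != null && Boolean.parseBoolean(v.toString());
    }

    /**
     * Async checks never run on the request path: the caller gets whatever the
     * background job last stored, or "no data yet". Other checks run inline.
     */
    static String resultFor(Map<String, ?> serviceProps, Supplier<String> check,
                            String lastBackgroundResult) {
        if (isAsync(serviceProps)) {
            return lastBackgroundResult != null ? lastBackgroundResult : "no data yet";
        }
        return check.get();
    }
}
```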



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
