[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010189#comment-16010189
 ] 

Bertrand Delacretaz commented on SLING-6855:
--------------------------------------------

bq. Could it be easier/better to just record metrics?

I like the idea, especially if it means no changes to the health check APIs.

Let me try to reformulate to see if I understand your idea, with some 
additional tentative details:

* An HC service which has a {{hc.registermeters}} or maybe 
{{hc.generatemetrics}} property causes metrics on its results to be generated 
via [1]
* Such HCs are executed at regular intervals to compute these metrics. The 
interval in seconds might be the value of that {{hc.generatemetrics}} property.
* HCs can be configured to watch these metrics and complain if values are out 
of range.
* When out-of-range values are flagged, the alarm remains valid for a 
configurable amount of time. Exactly how that happens is still to be defined.

[1] https://sling.apache.org/documentation/bundles/metrics.html
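
To make the last two bullets more concrete, here is a minimal sketch of a range watcher whose alarm stays active for a configurable hold time. It uses only the JDK; the class name {{StickyRangeAlarm}}, its constructor parameters and the {{isAlarmed}} method are hypothetical and are not part of the Sling Health Check or Metrics APIs. The metric is read through a plain {{LongSupplier}}, which stands in for whatever [1] would actually provide.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not an existing Sling class: watches one numeric
 * metric and keeps an alarm active for holdMillis after the last
 * out-of-range reading.
 */
class StickyRangeAlarm {
    private final LongSupplier metric; // stand-in for a real metric source
    private final long min;
    private final long max;
    private final long holdMillis;

    private boolean violationSeen = false;
    private long lastViolationAt = 0L;

    StickyRangeAlarm(LongSupplier metric, long min, long max, long holdMillis) {
        this.metric = metric;
        this.min = min;
        this.max = max;
        this.holdMillis = holdMillis;
    }

    /**
     * Reads the metric once and reports whether the alarm is active at
     * the given time. The caller passes the time in explicitly so the
     * behavior is easy to test.
     */
    synchronized boolean isAlarmed(long nowMillis) {
        long value = metric.getAsLong();
        if (value < min || value > max) {
            // Each out-of-range reading restarts the hold period.
            violationSeen = true;
            lastViolationAt = nowMillis;
        }
        return violationSeen && (nowMillis - lastViolationAt) < holdMillis;
    }
}
```

A watching HC could call {{isAlarmed(System.currentTimeMillis())}} from its execution and map {{true}} to a WARN or CRITICAL result. Restarting the hold period on every violation is one possible choice; whether the alarm should instead expire a fixed time after the first violation is part of what is still to be defined.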

> Create ResultRegistry to provide health check behavior for executing code 
> that does not want a HealthCheck
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-6855
>                 URL: https://issues.apache.org/jira/browse/SLING-6855
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Clinton H Goudie-Nice
>
> I want to provide a Registry service that can be leveraged to provide health 
> check results.
> These results can be for a period of time through an expiration, until the 
> JVM is restarted, or added and later removed.
> This can be useful when code observes a specific (possibly bad) state, and 
> wants to alert through the health check API that this state has taken place.
>  Some examples: 
>  An event pool has filled, and some events will be thrown away.
>   This is a failure case that requires a restart of the instance.
>   It would be appropriate to trigger a permanent failure.
>    
>  A quota has been tripped. This quota may immediately recover, but it is 
> sensible to alert for 30 minutes that the quota has been tripped.
>  If you expect the failure will clear itself within a certain window, setting 
> the expiration to that window can be ideal.
> GHPR to follow
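
For comparison, the registry described in the issue could look roughly like the sketch below. It covers the three lifetimes mentioned there: results that expire, results that last until the JVM restarts, and results that are added and later removed. The names {{SimpleResultRegistry}}, {{put}}, {{remove}}, {{activeMessages}} and {{NEVER}} are assumptions made for illustration; the actual proposal is in the pull request mentioned in the issue and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a result registry; not the API proposed in the issue. */
class SimpleResultRegistry {
    /** Expiry value for results that should last until the JVM restarts. */
    static final long NEVER = Long.MAX_VALUE;

    private static final class Entry {
        final String message;
        final long expiresAtMillis;

        Entry(String message, long expiresAtMillis) {
            this.message = message;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Registers or replaces a result under the given id. */
    void put(String id, String message, long expiresAtMillis) {
        entries.put(id, new Entry(message, expiresAtMillis));
    }

    /** Removes a result explicitly, for the "added and later removed" case. */
    void remove(String id) {
        entries.remove(id);
    }

    /** Drops expired entries, then returns the messages still in effect. */
    List<String> activeMessages(long nowMillis) {
        entries.values().removeIf(e -> e.expiresAtMillis <= nowMillis);
        List<String> messages = new ArrayList<>();
        for (Entry e : entries.values()) {
            messages.add(e.message);
        }
        return messages;
    }
}
```

With this shape, the event pool example would call {{put("pool", "event pool full", SimpleResultRegistry.NEVER)}}, and the quota example would pass the current time plus 30 minutes as the expiry.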



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)