[ https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010189#comment-16010189 ]
Bertrand Delacretaz commented on SLING-6855:
--------------------------------------------

bq. Could it be easier/better to just record metrics?

I like the idea, especially if it means no changes to the health check APIs.

Let me try to reformulate to see if I understand your idea, with some additional tentative details:

* An HC service which has a {{hc.registermeters}} or maybe {{hc.generatemetrics}} property causes metrics on its results to be generated via [1]
* Such HCs are executed at regular intervals to compute these metrics. The interval in seconds might be the value of that {{hc.generatemetrics}} property.
* HCs can be configured to watch these metrics and complain if values are out of range.
* When out of range values are flagged, the alarm remains valid for a configurable amount of time. How that exactly happens is to be defined

[1] https://sling.apache.org/documentation/bundles/metrics.html

> Create ResultRegistry to provide health check behavior for executing code
> that does not want a HealthCheck
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-6855
>                 URL: https://issues.apache.org/jira/browse/SLING-6855
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Clinton H Goudie-Nice
>
> I want to provide a Registry service that can be leveraged to provide health
> check results.
> These results can be for a period of time through an expiration, until the
> JVM is restarted, or added and later removed.
> This can be useful when code observes a specific (possibly bad) state, and
> wants to alert through the health check API that this state has taken place.
> Some examples:
> An event pool has filled, and some events will be thrown away.
> This is a failure case that requires a restart of the instance.
> It would be appropriate to trigger a permanent failure.
>
> A quota has been tripped. This quota may immediately recover, but it is
> sensible to alert for 30 minutes that the quota has been tripped.
> If you expect the failure will clear itself within a certain window, setting
> the expiration to that window can be ideal.
> GHPR to follow

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)