[jira] [Commented] (SLING-6855) Sticky Results Support

2017-06-09 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044469#comment-16044469
 ] 

Bertrand Delacretaz commented on SLING-6855:


Thanks [~henzlerg] - I added a sticky config to the async HC in the samples 
bundle. Initially that did not work, because the HC results cache was not 
updated for async execution.

http://svn.apache.org/r1798222 fixes that for me, do you agree with this fix? 
The tests didn't detect that issue. If you have an idea for improving them to 
cover it, that's great, otherwise I can have a look.

With this fix, installing the {{org.apache.sling.hc.samples}} bundle and 
requesting http://localhost:8080/system/console/healthcheck?tags=async shows 
output like the following:

{code}
Sticky Asynchronous Health Check sample
Tags: [async, sticky] Finished: 2017-06-09 59:29 after 1ms
Result: HEALTH_CHECK_ERROR
INFO *** Current Result ***
INFO AsyncHealthCheckSample@3d846efa - counter value set to 788 at Fri Jun 09 
15:59:29 CEST 2017

WARN *** Sticky Result CRITICAL from 15:59:24.002 ***
INFO AsyncHealthCheckSample@31a3fb2e - counter value set to 783 at Fri Jun 09 
15:59:24 CEST 2017
CRITICAL Counter value (783) is not a multiple of 3 (critical) at Fri Jun 09 
15:59:24 CEST 2017

WARN *** Sticky Result HEALTH_CHECK_ERROR from 15:58:46.002 ***
INFO AsyncHealthCheckSample@1b2b0a85 - counter value set to 745 at Fri Jun 09 
15:58:46 CEST 2017
HEALTH_CHECK_ERROR Counter value (745) is not a multiple of 5 
(healthCheckError) at Fri Jun 09 15:58:46 CEST 2017
{code}

The cache keeps one result of each type, by design. I think that's ok: it 
prevents the cache from growing indefinitely and gives useful information about 
recent non-ok results. The results are not ordered by time, which can be a bit 
surprising but is ok IMO.
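
To illustrate the "one result of each type" behaviour, here is a minimal 
sketch. It is not the actual {{HealthCheckResultCache}} code: the class 
{{StickyResultSketch}}, its nested types and its methods are hypothetical, and 
it assumes that the overall status is the most severe of the current result and 
any sticky results that have not yet expired.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only, NOT the Sling HealthCheckResultCache implementation. */
class StickyResultSketch {

    /** Hypothetical status enum, ordered from least to most severe. */
    enum Status { OK, WARN, CRITICAL, HEALTH_CHECK_ERROR }

    /** A result together with the time (epoch millis) at which it finished. */
    static final class Result {
        final Status status;
        final long finishedAt;
        final String message;

        Result(Status status, long finishedAt, String message) {
            this.status = status;
            this.finishedAt = finishedAt;
            this.message = message;
        }
    }

    // One slot per non-OK status, so the map never holds more than three entries.
    private final Map<Status, Result> lastNonOk = new EnumMap<>(Status.class);
    private final long stickForMillis;

    StickyResultSketch(long stickForMinutes) {
        this.stickForMillis = stickForMinutes * 60_000L;
    }

    /** Records a result; a non-OK result overwrites the previous one of the same status. */
    void record(Result result) {
        if (result.status != Status.OK) {
            lastNonOk.put(result.status, result);
        }
    }

    /** Returns the unexpired non-OK results, in status order rather than time order. */
    List<Result> stickyResults(long now) {
        List<Result> sticky = new ArrayList<>();
        for (Result r : lastNonOk.values()) {
            if (now - r.finishedAt <= stickForMillis) {
                sticky.add(r);
            }
        }
        return sticky;
    }

    /** Assumed rule: the most severe of the current status and all sticky statuses wins. */
    Status effectiveStatus(Result current, long now) {
        Status worst = current.status;
        for (Result r : stickyResults(now)) {
            if (r.status.compareTo(worst) > 0) {
                worst = r.status;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        StickyResultSketch cache = new StickyResultSketch(30);
        cache.record(new Result(Status.HEALTH_CHECK_ERROR, 0L, "not a multiple of 5"));
        cache.record(new Result(Status.CRITICAL, 38_000L, "not a multiple of 3"));
        Result current = new Result(Status.OK, 43_000L, "counter value set");
        System.out.println("Result: " + cache.effectiveStatus(current, 43_000L));
        for (Result r : cache.stickyResults(43_000L)) {
            System.out.println("Sticky " + r.status + " from " + r.finishedAt + ": " + r.message);
        }
    }
}
```

Because the map is keyed by status, iterating it yields the sticky results in 
status order, which is one way the "not ordered by time" effect can arise.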

If we agree on how this feature works we should document it at 
https://sling.apache.org/documentation/bundles/sling-health-check-tool.html

> Sticky Results Support
> --
>
> Key: SLING-6855
> URL: https://issues.apache.org/jira/browse/SLING-6855
> Project: Sling
> Issue Type: New Feature
> Components: Health Check
> Reporter: Clinton H Goudie-Nice
> Assignee: Georg Henzler
> Fix For: Health Check Annotations 1.0.6, Health Check Core 1.2.10, Health Check API 1.0.2
>
>
> Introduce HC service property {{hc.warningsStickForMinutes}} to allow old 
> WARN/CRITICAL/HEALTH_CHECK_ERROR results to be sticky (see also 
> http://sling.markmail.org/thread/tawikgt7bqxvnlj5#query:+page:1+mid:57hhg55hekr7ib33+state:results)
> --- Original Request 
> *Create ResultRegistry to provide health check behavior for executing code 
> that does not want a HealthCheck* 
> I want to provide a Registry service that can be leveraged to provide health 
> check results.
> These results can be for a period of time through an expiration, until the 
> JVM is restarted, or added and later removed.
> This can be useful when code observes a specific (possibly bad) state, and 
> wants to alert through the health check API that this state has taken place.
>  Some examples: 
>  An event pool has filled, and some events will be thrown away.
>   This is a failure case that requires a restart of the instance.
>   It would be appropriate to trigger a permanent failure.
>
>  A quota has been tripped. This quota may immediately recover, but it is 
> sensible to alert for 30 minutes that the quota has been tripped.
>  If you expect the failure will clear itself within a certain window, setting 
> the expiration to that window can be ideal.
> GHPR to follow



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SLING-6855) Sticky Results Support

2017-06-10 Thread Georg Henzler (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045440#comment-16045440
 ] 

Georg Henzler commented on SLING-6855:
--

[~bdelacretaz] Thanks for fixing this - I carelessly ran only synchronous 
tests. I think the fix is good (except that I would maybe use 
{{@Reference HealthCheckResultCache cache}} in {{AsyncHealthCheckExecutor}} 
instead of passing it in as a parameter).

> The cache keeps one result of each type, by design
Yes, this is intended. If someone wants to keep the full history, that is 
better done in a monitoring tool that can store the historic results. Here it 
is "only" about changing the result status if something went wrong in the past, 
so one result per type is sufficient.

> If we agree on how this feature works we should document it
I agree, will do so on Monday! (I will also look into how to create a release, 
have not done that yet). 



[jira] [Commented] (SLING-6855) Sticky Results Support

2017-11-15 Thread Karl Pauls (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254352#comment-16254352
 ] 

Karl Pauls commented on SLING-6855:
---

[~henzlerg], can we resolve this issue now (i.e., did you create the 
documentation)? I would like to do a HC Core release...




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SLING-6855) Sticky Results Support

2017-11-16 Thread Karl Pauls (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255151#comment-16255151
 ] 

Karl Pauls commented on SLING-6855:
---

[~henzlerg], I'm resolving this issue as the code has been committed. Please 
follow up on the documentation using SLING-7246.
