Hi,

I think this looks pretty good now! Thanks for adding HealthCheckMetadata :)

1. "b) merge the jmx implementation into the core"
I'm happy with that and the signature execute(String ... tags) is my preferred

2. Status NO_RESULT
I think it's probably cleaner to add a property hasTimedOut to the ExecutionResult - this is also better for the case that we can return an old result from cache. The main disadvantage of NO_RESULT is that it is not immediately clear from a caller's perspective, "how bad" the status NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I think it really is WARN (as the timeout is configured in a way that should be sufficient for all checks). Therefore explicitly returning WARN plus hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a consumer's point of view.

3. Execution Timeouts defined by caller / "executing the HCs tagged with external ... with longer timeout" The globally defined timeout is sufficient, I agree with Carsten. The "external HCs" example is valid and I would define the global timeout to fit to the external checks, the point is that for the "internal HCs" you don't need to define a separate timeout because * they are usually quick - and if they aren't, they should probably be rewritten! ;-) * from a conceptual point of view, the maximum time you are prepared to wait is still the same as for the long running ones (why would you set the timeout for internal checks to 500ms? If the checks are quick it makes no difference, if a check requires 800ms I believe you are still better off getting a proper result after 800ms instead of a timeout one after 500ms). Anyway, as suggested let's leave it for later if we find a good use case that justifies the more complex API.

4. HealthCheckDescriptor is obsolete?
The current version in SVN uses HealthCheckDescriptor and HealthCheckMetadata in utils - you probably created it that way to not have the property serviceReference in HealthCheckMetadata. On the other hand HealthCheckMetadata is taking a service reference as constructor argument anyway - why not just merge the two classes in HealthCheckMetadata and get rid of HealthCheckDescriptor?

5. Have you had a look at https://issues.apache.org/jira/browse/SLING-3302 ? The tabular result could also include the property hasTimedOut from 2.

Best Regards
Georg

Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
Yes, I would prefer a separate state, parsing the log is too error prone.

Carsten


2014/1/3 Bertrand Delacretaz <[email protected]>

Hi,

On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <[email protected]>
wrote:
> Bertrand wrote:
>> As is you can compute the age of a result, that should already help.
>>
> Hmm and what about the case, where there is no result yet and the HC
takes
> longer than the timeout?

In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
a Result that includes a log that explains the problem. Haven't
checked what the current implementation does.

We could also return a specific NO_RESULT state in this case.

-Bertrand


Reply via email to