Hi,
I think this looks pretty good now! Thanks for adding
HealthCheckMetadata :)
1. "b) merge the jmx implementation into the core"
I'm happy with that and the signature execute(String ... tags) is my
preferred
2. Status NO_RESULT
I think it's probably cleaner to add a property hasTimedOut to the
ExecutionResult - this is also better for the case that we can return an
old result from cache. The main disadvantage of NO_RESULT is that it is
not immediately clear from a caller's perspective, "how bad" the status
NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but
I think it really is WARN (as the timeout is configured in a way that
should be sufficient for all checks). Therefore explicitly returning
WARN plus hasTimedOut=true in the ExecutionResult is IMHO the clearer
option from a consumer's point of view.
3. Execution Timeouts defined by caller / "executing the HCs tagged
with external ... with longer timeout"
The globally defined timeout is sufficient, I agree with Carsten. The
"external HCs" example is valid and I would define the global timeout to
fit to the external checks, the point is that for the "internal HCs" you
don't need to define a separate timeout because
* they are usually quick - and if they aren't, they should probably be
rewritten! ;-)
* from a conceptual point of view, the maximum time you are prepared
to wait is still the same as for the long running ones (why would you
set the timeout for internal checks to 500ms? If the checks are quick it
makes no difference, if a check requires 800ms I believe you are still
better off getting a proper result after 800ms instead of a timeout one
after 500ms).
Anyway, as suggested let's leave it for later if we find a good use
case that justifies the more complex API.
4. HealthCheckDescriptor is obsolete?
The current version in SVN uses HealthCheckDescriptor and
HealthCheckMetadata in utils - you probably created it that way to not
have the property serviceReference in HealthCheckMetadata. On the other
hand HealthCheckMetadata is taking a service reference as constructor
argument anyway - why not just merge the two classes in
HealthCheckMetadata and get rid of HealthCheckDescriptor?
5. Have you had a look at
https://issues.apache.org/jira/browse/SLING-3302 ? The tabular result
could also include the property hasTimedOut from 2.
Best Regards
Georg
Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
Yes, I would prefer a separate state, parsing the log is too error
prone.
Carsten
2014/1/3 Bertrand Delacretaz <[email protected]>
Hi,
On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler
<[email protected]>
wrote:
> Bertrand wrote:
>> As is you can compute the age of a result, that should already
help.
>>
> Hmm and what about the case, where there is no result yet and the
HC
takes
> longer than the timeout?
In my prototype I returned an HEALTH_CHECK_ERROR status for that,
with
a Result that includes a log that explains the problem. Haven't
checked what the current implementation does.
We could also return a specific NO_RESULT state in this case.
-Bertrand