Re: HealthCheckExecutor.execute(ServiceReference) ??

Carsten Ziegeler Tue, 14 Jan 2014 06:07:58 -0800

Answering my own question, I think its good if we always return the elapsed
time, but return null for getFinishedDate.


I'll do the change

Carsten


2014/1/10 Carsten Ziegeler <[email protected]>

> I'Ve done the suggested changes:
> - moved jmx stuff into the core
> - changed the signature to execute(String... tags)
> - Added service reference to metadata
>
> The only thing I'm unclear with are the different result possibilites. I
> think there are these cases when execute is called:
> a) no result in the cache/cached result is obsolete, hc executes within
> the timeout -> hc is executed, real result is cached and returned
> b) no result in the cache/cached result is obsolete, hc executes not
> within the timeout -> hc is executed, fake result is returned
> c) valid result in the cache -> hc is not executed, cached result is
> returned
>
> I think there is no need for a client to find out whether it's case a) or
> c) - if interested client code can inspect the getFinishedAt() method.
> Now, the interesting case still is b) - I see the point that we return
> WARN in that case - I would assume that getElapsedTimeInMs returns -1 and
> getFinishedAt() returns null. So both could be used as an indicator for
> this situation.
>
> WDYT?
>
> Regards
> Carsten
>
>
> 2014/1/5 Georg Henzler <[email protected]>
>
>> Hi,
>>
>> I think this looks pretty good now! Thanks for adding HealthCheckMetadata
>> :)
>>
>> 1. "b) merge the jmx implementation into the core"
>> I'm happy with that and the signature execute(String ... tags) is my
>> preferred
>>
>> 2. Status NO_RESULT
>> I think it's probably cleaner to add a property hasTimedOut to the
>> ExecutionResult - this is also better for the case that we can return an
>> old result from cache. The main disadvantage of NO_RESULT is that it is not
>> immediately clear from a caller's perspective, "how bad" the status
>> NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I
>> think it really is WARN (as the timeout is configured in a way that should
>> be sufficient for all checks). Therefore explicitly returning WARN plus
>> hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a
>> consumer's point of view.
>>
>> 3. Execution Timeouts defined by caller / "executing the HCs tagged with
>> external ... with longer timeout"
>> The globally defined timeout is sufficient, I agree with Carsten. The
>> "external HCs" example is valid and I would define the global timeout to
>> fit to the external checks, the point is that for the "internal HCs" you
>> don't need to define a separate timeout because
>>  * they are usually quick - and if they aren't, they should probably be
>> rewritten! ;-)
>>  * from a conceptual point of view, the maximum time you are prepared to
>> wait is still the same as for the long running ones (why would you set the
>> timeout for internal checks to 500ms? If the checks are quick it makes no
>> difference, if a check requires 800ms I believe you are still better off
>> getting a proper result after 800ms instead of a timeout one after 500ms).
>> Anyway, as suggested let's leave it for later if we find a good use case
>> that justifies the more complex API.
>>
>> 4. HealthCheckDescriptor is obsolete?
>> The current version in SVN uses HealthCheckDescriptor and
>> HealthCheckMetadata in utils - you probably created it that way to not have
>> the property serviceReference in HealthCheckMetadata. On the other hand
>> HealthCheckMetadata is taking a service reference as constructor argument
>> anyway - why not just merge the two classes in HealthCheckMetadata and get
>> rid of HealthCheckDescriptor?
>>
>> 5. Have you had a look at https://issues.apache.org/
>> jira/browse/SLING-3302 ? The tabular result could also include the
>> property hasTimedOut from 2.
>>
>> Best Regards
>> Georg
>>
>> Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
>>
>>  Yes, I would prefer a separate state, parsing the log is too error prone.
>>>
>>> Carsten
>>>
>>>
>>> 2014/1/3 Bertrand Delacretaz <[email protected]>
>>>
>>>  Hi,
>>>>
>>>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <[email protected]>
>>>> wrote:
>>>> > Bertrand wrote:
>>>> >> As is you can compute the age of a result, that should already help.
>>>> >>
>>>> > Hmm and what about the case, where there is no result yet and the HC
>>>> takes
>>>> > longer than the timeout?
>>>>
>>>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
>>>> a Result that includes a log that explains the problem. Haven't
>>>> checked what the current implementation does.
>>>>
>>>> We could also return a specific NO_RESULT state in this case.
>>>>
>>>> -Bertrand
>>>>
>>>>
>>
>
>
> --
> Carsten Ziegeler
> [email protected]
>



-- 
Carsten Ziegeler
[email protected]

Re: HealthCheckExecutor.execute(ServiceReference) ??

Reply via email to