[ https://issues.apache.org/jira/browse/FELIX-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joerg Hoh closed FELIX-6663.
----------------------------
> Warn if healthcheck execution takes too long
> --------------------------------------------
>
> Key: FELIX-6663
> URL: https://issues.apache.org/jira/browse/FELIX-6663
> Project: Felix
> Issue Type: Task
> Components: Health Checks
> Affects Versions: healthcheck.core 2.2.0
> Reporter: Joerg Hoh
> Priority: Major
> Fix For: healthcheck.core 2.3.0
>
>
> We monitor our system using Felix Healthchecks and require that some
> healthchecks are reported OK at least every 5 seconds. For this we configured
> the timeout in the HealthCheckOptions to 5 seconds.
> Sometimes we face the situation that the system goes unhealthy without a
> healthcheck being executed. It even seems that none of the required
> healthchecks are executed during that time at all.
> I already ruled out a few obvious cases (full GC, maxed out CPU), but I still
> have a few cases which I cannot explain yet. Also, while checking the code, I
> found that on every invocation of HealthcheckExecutor.execute() all
> metadata for the healthchecks is collected, which requires access to the OSGi
> service registry. My application also has situations where the service
> registry is accessed heavily, and these accesses can suffer from lock
> contention under load. The time spent there is not included in the timeout
> calculation of the healthchecks.
> As a first step I would like to add some more logging in case the overall
> execution of the healthchecks exceeds the configured timeout.
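
The logging idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change that went into healthcheck.core: the class OverallTimeoutWarning, its Timed result type and the runAndWarn method are hypothetical names, the Supplier stands in for the complete executor call (metadata lookup plus the checks themselves), and java.util.logging is used only to keep the example self-contained.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper, not part of the Felix Health Check API.
public class OverallTimeoutWarning {

    private static final Logger LOG =
            Logger.getLogger(OverallTimeoutWarning.class.getName());

    /** Result of a timed call: the returned value plus the wall-clock time it took. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        private Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }

        /** True if the measured time is strictly above the given timeout. */
        public boolean exceeded(long timeoutMillis) {
            return elapsedMillis > timeoutMillis;
        }
    }

    /**
     * Runs the whole execution (assumed to include the service registry
     * lookups as well as the checks) and logs a warning if the total
     * wall-clock time is above the configured timeout.
     */
    public static <T> Timed<T> runAndWarn(Supplier<T> overallExecution, long timeoutMillis) {
        long start = System.nanoTime();
        T value = overallExecution.get();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        Timed<T> timed = new Timed<>(value, elapsedMillis);
        if (timed.exceeded(timeoutMillis)) {
            LOG.warning("Overall healthcheck execution took " + elapsedMillis
                    + " ms, configured timeout is " + timeoutMillis + " ms");
        }
        return timed;
    }
}
```

Measuring around the complete call rather than around each individual check is the point of the sketch: time lost before any check starts, for example while waiting on the service registry, then shows up in the log even if every single check stays within its own timeout.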