[ https://issues.apache.org/jira/browse/SLING-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carsten Ziegeler updated SLING-5965:
------------------------------------
    Fix Version/s:     (was: Commons Scheduler 2.6.2)
                   Commons Scheduler 2.6.4

> Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs
> ---------------------------------------------------------------------------
>
>                 Key: SLING-5965
>                 URL: https://issues.apache.org/jira/browse/SLING-5965
>             Project: Sling
>          Issue Type: New Feature
>          Components: Commons
>    Affects Versions: Commons Scheduler 2.5.0
>            Reporter: Stefan Egli
>             Fix For: Commons Scheduler 2.6.4
>
>         Attachments: SLING-5965.patch, SLING-5965.v2.patch.txt
>
>
> Sling Scheduler jobs (aka Quartz-Jobs) should typically be fast-running jobs.
> They are served from a thread-pool and should occupy a thread only for a
> short amount of time.
> If there are 'misbehaving' quartz-jobs that run for a very long time, they
> keep threads from that thread-pool occupied and thereby affect the
> performance of other scheduled/quartz-jobs.
> We should have metrics (using
> [sling.commons.metrics|https://sling.apache.org/documentation/bundles/metrics.html])
> that provide information about the internals of the Sling Scheduler, such as
> the average and maximum duration of scheduled jobs, how many jobs are
> currently running, and how long the oldest running job has been running.
> Based on this, a Health-Check can monitor the 'oldest job running' metric and
> flag {{critical}} when, e.g., the oldest job has been running for more than
> {{60'000ms}} (the default, configurable).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
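
The health-check idea in the issue description can be illustrated with a minimal, self-contained Java sketch. It is not the code from the attached patches and uses neither the sling.commons.metrics API nor the Sling Health Check API; the class RunningJobTracker, its methods, and the injected clock are hypothetical names chosen for illustration. The only value taken from the issue is the 60'000 ms default threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the Sling Commons Scheduler implementation.
class RunningJobTracker {

    enum Status { OK, CRITICAL }

    // Default threshold from the issue description (60'000 ms).
    static final long DEFAULT_MAX_AGE_MILLIS = 60_000L;

    // Start timestamp (ms) of every job that is currently running.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    // Injected clock (assumption) so the logic can be tested without sleeping.
    private final LongSupplier clock;
    private final long maxAgeMillis;

    RunningJobTracker(LongSupplier clock, long maxAgeMillis) {
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Call when a job begins executing on a pool thread.
    void jobStarted(String jobId) {
        startTimes.put(jobId, clock.getAsLong());
    }

    // Call when the job returns, whether it succeeded or failed.
    void jobFinished(String jobId) {
        startTimes.remove(jobId);
    }

    // Metric: number of jobs currently running.
    int runningCount() {
        return startTimes.size();
    }

    // Metric: how long the oldest running job has been running, 0 if idle.
    long oldestRunningMillis() {
        long now = clock.getAsLong();
        return startTimes.values().stream()
                .mapToLong(start -> now - start)
                .max()
                .orElse(0L);
    }

    // Health check: CRITICAL once the oldest job exceeds the threshold.
    Status check() {
        return oldestRunningMillis() > maxAgeMillis ? Status.CRITICAL : Status.OK;
    }
}
```

In a real scheduler, jobStarted and jobFinished would presumably be called from a wrapper around each job's execution (with jobFinished in a finally block), and the two metric methods would be exposed as gauges.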