[ https://issues.apache.org/jira/browse/SLING-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Egli updated SLING-5965:
-------------------------------
    Attachment:     (was: oldestRunningJob.tiff)

> Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs
> ---------------------------------------------------------------------------
>
>                 Key: SLING-5965
>                 URL: https://issues.apache.org/jira/browse/SLING-5965
>             Project: Sling
>          Issue Type: New Feature
>          Components: Commons
>    Affects Versions: Commons Scheduler 2.5.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Commons Scheduler 2.6.4
>
>         Attachments: numRunningJobs.jpg, oldestRunningJob.jpg,
> SchedulerHealthCheck.jpg, SLING-5965.patch, SLING-5965.v2.patch.txt,
> SLING-5965.v3.patch.txt, timers.jpg
>
>
> Sling Scheduler jobs (aka Quartz-Jobs) should typically be fast-running jobs.
> They are served from a thread-pool and should occupy a thread only for a
> short amount of time.
> If there are 'misbehaving' quartz-jobs that run for a very long time, they
> occupy threads from that thread-pool and thus degrade the performance of
> other scheduled/quartz-jobs.
> We should have metrics (using
> [sling.commons.metrics|https://sling.apache.org/documentation/bundles/metrics.html])
> that provide information about the internals of the Sling Scheduler, such as
> the average and maximum duration of scheduled jobs, how many jobs are
> currently running, and how long the oldest running job has been running.
> Based on this, a Health-Check can monitor the 'oldest job running' metric and
> flag {{critical}} when, e.g., the oldest job is older than {{60'000ms}}
> (the default, configurable).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
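
The rule described in the issue (flag critical once the oldest running job exceeds a configurable age, 60'000ms by default) could be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from the attached patches: the class OldestRunningJobCheck, its methods and the Status enum are hypothetical names, and it deliberately uses neither the Sling Commons Metrics nor the Quartz API. Timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the Sling Commons Scheduler implementation. */
final class OldestRunningJobCheck {

    enum Status { OK, CRITICAL }

    /** Default threshold mentioned in the issue text: 60'000 ms. */
    static final long DEFAULT_MAX_AGE_MILLIS = 60_000L;

    /** Start time (epoch millis) of every job that is currently running. */
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    private final long maxAgeMillis;

    OldestRunningJobCheck(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Record that a job began executing at the given time. */
    void jobStarted(String jobId, long nowMillis) {
        startTimes.put(jobId, nowMillis);
    }

    /** Record that a job finished and no longer occupies a thread. */
    void jobFinished(String jobId) {
        startTimes.remove(jobId);
    }

    /** The "number of running jobs" metric. */
    int runningJobs() {
        return startTimes.size();
    }

    /** The "oldest running job" metric: its age in millis, or empty if idle. */
    OptionalLong oldestRunningJobAgeMillis(long nowMillis) {
        OptionalLong oldestStart = startTimes.values().stream()
                .mapToLong(Long::longValue)
                .min();
        return oldestStart.isPresent()
                ? OptionalLong.of(nowMillis - oldestStart.getAsLong())
                : OptionalLong.empty();
    }

    /** CRITICAL only when the oldest running job is older than the threshold. */
    Status check(long nowMillis) {
        OptionalLong age = oldestRunningJobAgeMillis(nowMillis);
        return age.isPresent() && age.getAsLong() > maxAgeMillis
                ? Status.CRITICAL
                : Status.OK;
    }
}
```

In this sketch a scheduler wrapper would call jobStarted() before executing a job and jobFinished() in a finally block afterwards, while a periodic health check would call check(System.currentTimeMillis()).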