[ https://issues.apache.org/jira/browse/FLINK-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628644#comment-15628644 ]
ASF GitHub Bot commented on FLINK-4888:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/2683#discussion_r86119642
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -1828,6 +1828,33 @@ class JobManager(
     jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numRunningJobs", new Gauge[Long] {
       override def getValue: Long = JobManager.this.currentJobs.size
     })
+    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFailedJobs", new Gauge[Long] {
+      override def getValue: Long = {
+        var failedJobs = 0
+        val ourJobs = createJobStatusOverview()
+        val future = (archive ? RequestJobsOverview.getInstance())(timeout)
+        val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
+        failedJobs += ourJobs.getNumJobsFailed() + archivedJobs.getNumJobsFailed()
+        failedJobs
+      }})
+    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numCancelledJobs", new Gauge[Long] {
+      override def getValue: Long = {
+        var cancelledJobs = 0
+        val ourJobs = createJobStatusOverview()
+        val future = (archive ? RequestJobsOverview.getInstance())(timeout)
+        val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
+        cancelledJobs += ourJobs.getNumJobsCancelled() + archivedJobs.getNumJobsCancelled()
+        cancelledJobs
+      }})
+    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFinishedJobs", new Gauge[Long] {
+      override def getValue: Long = {
+        var finishedJobs = 0
+        val ourJobs = createJobStatusOverview()
+        val future = (archive ? RequestJobsOverview.getInstance())(timeout)
+        val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
+        finishedJobs += ourJobs.getNumJobsFinished() + archivedJobs.getNumJobsFinished()
+        finishedJobs
+      }})
--- End diff --
I think fetching all three counts with a single request to the archive, and setting a short timeout (1 second?) on that request, is a good solution.
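One possible reading of the suggestion above, sketched in plain Scala so it runs without Flink or Akka: a single archive request per refresh, bounded by a 1-second timeout, shared by all three gauges through a short-lived cache. `JobCounts`, `JobCountsSnapshot` and its `gauge` method are hypothetical names, and the `Gauge` trait here only mirrors the single `getValue` method of Flink's `Gauge`. In the JobManager, `localCounts` would roughly correspond to `createJobStatusOverview()` and `askArchive` to `archive ? RequestJobsOverview.getInstance()`. The cache (`maxAge`) and the fall-back to the last archived counts on timeout are assumptions, not part of the PR.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for JobsOverview: only the three counts the gauges need.
final case class JobCounts(failed: Int, cancelled: Int, finished: Int) {
  def +(other: JobCounts): JobCounts =
    JobCounts(failed + other.failed, cancelled + other.cancelled, finished + other.finished)
}

// Mirrors the shape of Flink's Gauge interface (a single getValue method).
trait Gauge[T] { def getValue: T }

// Hypothetical helper: at most one archive request per `maxAge`, each bounded by `timeout`.
final class JobCountsSnapshot(
    localCounts: () => JobCounts,        // assumed: counts for jobs still held by the JobManager
    askArchive: () => Future[JobCounts], // assumed: ONE archive request returning all three counts
    timeout: FiniteDuration = 1.second,
    maxAge: FiniteDuration = 1.second,
    clock: () => Long = () => System.nanoTime()) {

  private var lastArchived = JobCounts(0, 0, 0)
  private var lastFetch: Option[Long] = None
  private var requests = 0

  // Number of archive requests issued so far (for illustration only).
  def requestCount: Int = synchronized { requests }

  def current(): JobCounts = synchronized {
    val now = clock()
    val stale = lastFetch.forall(t => now - t >= maxAge.toNanos)
    if (stale) {
      requests += 1
      lastFetch = Some(now)
      // Block for at most `timeout`; on timeout keep the previously archived counts.
      try lastArchived = Await.result(askArchive(), timeout)
      catch { case _: TimeoutException => () }
    }
    localCounts() + lastArchived
  }

  // One gauge per count; all gauges created here share this snapshot.
  def gauge(pick: JobCounts => Int): Gauge[Long] = new Gauge[Long] {
    override def getValue: Long = pick(current()).toLong
  }
}
```

With this shape, `numFailedJobs`, `numCancelledJobs` and `numFinishedJobs` would be registered as `snapshot.gauge(_.failed)`, `snapshot.gauge(_.cancelled)` and `snapshot.gauge(_.finished)`, so one metrics report triggers at most one archive request, and a slow archive delays that report by at most `timeout`.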
> instantiated job manager metrics missing important job statistics
> ------------------------------------------------------------------
>
> Key: FLINK-4888
> URL: https://issues.apache.org/jira/browse/FLINK-4888
> Project: Flink
> Issue Type: Improvement
> Components: Metrics
> Affects Versions: 1.1.2
> Reporter: Philipp von dem Bussche
> Assignee: Philipp von dem Bussche
> Priority: Minor
>
> A JobManager is currently instantiated with only the following metrics:
> taskSlotsAvailable, taskSlotsTotal, numRegisteredTaskManagers and
> numRunningJobs. Other important metrics would be numFailedJobs,
> numCancelledJobs and numFinishedJobs. Adding these would also bring the
> JobManager metrics to parity with what is available via the REST API.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)