[ https://issues.apache.org/jira/browse/FLINK-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602014#comment-15602014 ]

ASF GitHub Bot commented on FLINK-4888:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2683#discussion_r84689406
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1828,6 +1828,33 @@ class JobManager(
         jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numRunningJobs", new Gauge[Long] {
           override def getValue: Long = JobManager.this.currentJobs.size
         })
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFailedJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var failedJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         failedJobs += ourJobs.getNumJobsFailed() + archivedJobs.getNumJobsFailed()
    +         failedJobs
    +    }})
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numCancelledJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var cancelledJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         cancelledJobs += ourJobs.getNumJobsCancelled() + archivedJobs.getNumJobsCancelled()
    +         cancelledJobs
    +    }})
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFinishedJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var finishedJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         finishedJobs += ourJobs.getNumJobsFinished() + archivedJobs.getNumJobsFinished()
    +         finishedJobs
    +    }})
    --- End diff --
    
    @zentol What is your take on this change?
    I'm uncertain whether making blocking RPC calls inside gauges is a good idea.
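
One way to address the concern about RPC calls in gauges is to have the gauge return a cached value that is refreshed asynchronously, so that reading the metric never waits on a remote reply. The following is a minimal sketch, not the approach taken in the pull request. The `Gauge` trait here is a stand-in for Flink's interface so that the snippet compiles on its own, and `CachedLongGauge`, `refresh`, and `fetch` are hypothetical names introduced for illustration.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{ExecutionContext, Future}

// Stand-in for Flink's Gauge interface (assumption: a single getValue method).
trait Gauge[T] { def getValue: T }

// Hypothetical helper: serves the last known value and never blocks the reader.
final class CachedLongGauge(initial: Long = 0L) extends Gauge[Long] {
  private val cached = new AtomicLong(initial)

  // Called by the metric reporter; reads the cache only.
  override def getValue: Long = cached.get()

  // Starts an asynchronous fetch (for example, an ask to the archive actor).
  // The cache is updated only if the fetch succeeds; on failure the previous
  // value is kept and the returned Future fails.
  def refresh(fetch: () => Future[Long])(implicit ec: ExecutionContext): Future[Unit] =
    fetch().map(value => cached.set(value))
}
```

With this shape, `refresh` would be triggered by something other than the reporter thread, such as a periodic timer or a job status change. The trade-off is that the reported value can be stale by up to one refresh interval.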


> instantiated job manager metrics missing important job statistics 
> ------------------------------------------------------------------
>
>                 Key: FLINK-4888
>                 URL: https://issues.apache.org/jira/browse/FLINK-4888
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>    Affects Versions: 1.1.2
>            Reporter: Philipp von dem Bussche
>            Assignee: Philipp von dem Bussche
>            Priority: Minor
>
> A JobManager is currently instantiated with only the following metrics:
> taskSlotsAvailable, taskSlotsTotal, numRegisteredTaskManagers and
> numRunningJobs. Other important metrics would be numFailedJobs,
> numCancelledJobs and numFinishedJobs. Adding these would also bring the
> JobManager metrics to parity with what is available via the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
