wenlong88 opened a new pull request #11679: [FLINK-16864][metrics] Add IdleTime 
metric for task.
URL: https://github.com/apache/flink/pull/11679
 
 
   ## What is the purpose of the change
   This PR adds an IdleTime metric that measures how long a task is idle. It 
covers the time the mail processor spends waiting for new mail and the time 
the record writer spends waiting for a new buffer.
   1. When a job cannot keep up with the rate at which data is generated, the 
vertex whose idle time is close to zero is the bottleneck of the job.
   2. When a job is not busy, the idle time indicates how far the user can 
scale the job down.
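
   The two uses above can be illustrated with a small sketch. It assumes the 
metric reports idle milliseconds per second of wall-clock time, so dividing by 
1000 gives an idle fraction. The class name, the threshold parameter, and the 
parallelism rule of thumb (which assumes load spreads evenly across subtasks) 
are hypothetical and not part of this PR.

```java
/** Hypothetical helper for reading an idleTimeMsPerSecond value; not Flink code. */
public class IdleTimeReading {

    /** Fraction of wall-clock time the task was idle, clamped to [0, 1]. */
    static double idleFraction(double idleTimeMsPerSecond) {
        return Math.max(0.0, Math.min(1.0, idleTimeMsPerSecond / 1000.0));
    }

    /** Assumed rule of thumb: an idle time at or below the threshold marks a likely bottleneck. */
    static boolean looksLikeBottleneck(double idleTimeMsPerSecond, double thresholdMs) {
        return idleTimeMsPerSecond <= thresholdMs;
    }

    /** Rough lower bound on parallelism, assuming load is spread evenly (an assumption). */
    static int roughMinParallelism(int currentParallelism, double idleTimeMsPerSecond) {
        double busyFraction = 1.0 - idleFraction(idleTimeMsPerSecond);
        return Math.max(1, (int) Math.ceil(currentParallelism * busyFraction));
    }

    public static void main(String[] args) {
        System.out.println(idleFraction(750.0));            // 0.75
        System.out.println(looksLikeBottleneck(5.0, 10.0)); // true
        System.out.println(roughMinParallelism(8, 750.0));  // 2
    }
}
```

   In practice a real scale-down decision would also need headroom for load 
spikes and data skew, which this sketch ignores.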
   
   ## Brief change log
   A meter metric named idleTimeMsPerSecond is added to TaskIOMetricGroup:
     - *Idle time consists of the time spent waiting for mail and the time 
spent waiting for an output buffer.*
     - *Mail waiting time is recorded when the mail processor tries to process 
a new mail but none has been submitted.*
     - *Output buffer waiting time is recorded when the record writer requests 
a new buffer from the result partition but none is available.*
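
   The measurement pattern described above can be sketched as follows. This is 
a minimal illustration using only `java.util.concurrent`, not the code in this 
PR: `IdleTimeSketch`, `IdleTimeMeter`, `takeAndMeasure`, and `idleMsPerSecond` 
are hypothetical names, and a `BlockingQueue` stands in for both the mailbox 
and the buffer pool. Time is recorded only when the caller actually has to 
block.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for idle-time accounting; not Flink's actual classes. */
public class IdleTimeSketch {

    /** Accumulates the total number of milliseconds spent waiting. */
    static final class IdleTimeMeter {
        private final AtomicLong totalIdleMs = new AtomicLong();

        void markIdle(long millis) {
            totalIdleMs.addAndGet(millis);
        }

        long getCount() {
            return totalIdleMs.get();
        }
    }

    /** Converts idle milliseconds observed over an interval into a per-second rate. */
    static double idleMsPerSecond(long idleMs, long elapsedMs) {
        return elapsedMs <= 0 ? 0.0 : idleMs * 1000.0 / elapsedMs;
    }

    /** Takes the next item, recording wait time only if nothing is available yet. */
    static <T> T takeAndMeasure(BlockingQueue<T> queue, IdleTimeMeter meter)
            throws InterruptedException {
        T item = queue.poll();
        if (item != null) {
            return item; // fast path: work was available, so no idle time
        }
        long start = System.nanoTime();
        try {
            return queue.take(); // blocks until a producer supplies an item
        } finally {
            meter.markIdle(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> mailbox = new ArrayBlockingQueue<>(1);
        IdleTimeMeter meter = new IdleTimeMeter();

        // Producer delivers one item after a short delay, forcing the consumer to wait.
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(50);
                mailbox.put("mail");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        String mail = takeAndMeasure(mailbox, meter);
        producer.join();
        System.out.println(mail + ", idle time recorded: " + (meter.getCount() > 0));
    }
}
```

   The exact number of idle milliseconds depends on thread scheduling, which is 
why a check of the form `idleTime > 0` is more robust than comparing against an 
exact value.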
   
   In addition to the changes in https://github.com/apache/flink/pull/11564, 
the following changes fix the unstable tests reported in 
https://issues.apache.org/jira/browse/FLINK-17053 and 
https://issues.apache.org/jira/browse/FLINK-17054:
   - testClearBuffersAfterInterruptDuringBlockingBufferRequest is refactored so 
that it no longer depends on a mock class.
   - The unit tests only check that a valid idle time was collected 
(idleTime > 0) instead of an exact idle time, so that race conditions no longer 
affect the test result.
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
     - *TaskIOMetricGroupTest*
     - *RecordWriterTest*
     - *TaskMailboxProcessorTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   
