[ https://issues.apache.org/jira/browse/FLINK-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278826#comment-17278826 ]
jiguodai commented on FLINK-11742:
----------------------------------

In fact, it has nothing to do with "instance". The real reason why metrics in Pushgateway sometimes disappear is that

> Push metrics to Pushgateway without "instance"
> ----------------------------------------------
>
>                 Key: FLINK-11742
>                 URL: https://issues.apache.org/jira/browse/FLINK-11742
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>            Reporter: Tom Goong
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2019-02-25-17-16-28-618.png,
> image-2019-02-25-17-16-59-034.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to the official documentation,
> [https://prometheus.io/docs/concepts/jobs_instances/]
> [https://github.com/prometheus/pushgateway]
> when sending a metric to the Prometheus Pushgateway, you need to provide an
> "instance" label.
> In practice, when no "instance" is given, Prometheus stores the metrics
> incorrectly: the series are not continuous and a lot of data is lost. After
> adding "instance", it returns to normal.
>
> no "instance"
> !image-2019-02-25-17-16-28-618.png!
>
> with "instance"
> !image-2019-02-25-17-16-59-034.png!
>
>
> {quote}In Prometheus terms, an endpoint you can scrape is called an instance,
> usually corresponding to a single process. A collection of instances with the
> same purpose, a process replicated for scalability or reliability for
> example, is called a job.
> {quote}
> {quote}For example, an API server job with four replicated instances:
> job: api-server
> -- instance 1: 1.2.3.4:5670
> -- instance 2: 1.2.3.4:5671
> -- instance 3: 5.6.7.8:5670
> -- instance 4: 5.6.7.8:5671
> {quote}
> [https://prometheus.io/docs/concepts/jobs_instances/#jobs-and-instances]
> I think a Flink job corresponds to a Prometheus job, and the TaskManagers and
> the JobManager correspond to different instances.
> If the jobName is used as the instance label, the same metrics from
> different TaskManagers will conflict, and operations such as sum will fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
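
To illustrate the conflict described in the quoted issue, here is a minimal sketch. It assumes the documented Pushgateway behaviour that a PUT replaces all metrics stored under the same grouping key. FakePushgateway, the label values, and the sample numbers are hypothetical stand-ins for illustration; this is neither the real Pushgateway nor Flink's reporter code.

```python
from typing import Dict, FrozenSet, Tuple

# A grouping key is the set of label pairs a push is filed under (e.g. job, instance).
GroupingKey = FrozenSet[Tuple[str, str]]


class FakePushgateway:
    """Hypothetical in-memory stand-in for a Pushgateway (illustration only)."""

    def __init__(self) -> None:
        self.groups: Dict[GroupingKey, Dict[str, float]] = {}

    def push(self, grouping_key: Dict[str, str], metrics: Dict[str, float]) -> None:
        # Modelled on PUT semantics: everything previously stored under
        # this grouping key is replaced by the new push.
        self.groups[frozenset(grouping_key.items())] = dict(metrics)

    def total(self, name: str) -> float:
        # Sum one metric across all groups, like a sum() over the scraped series.
        return sum(group.get(name, 0.0) for group in self.groups.values())


def demo() -> Tuple[float, float]:
    # Case 1: both TaskManagers push under the same grouping key (job only).
    shared = FakePushgateway()
    shared.push({"job": "flink-job"}, {"numRecordsIn": 100.0})  # TaskManager A
    shared.push({"job": "flink-job"}, {"numRecordsIn": 250.0})  # TaskManager B replaces A

    # Case 2: each TaskManager adds its own (hypothetical) instance label.
    distinct = FakePushgateway()
    distinct.push({"job": "flink-job", "instance": "taskmanager-a"}, {"numRecordsIn": 100.0})
    distinct.push({"job": "flink-job", "instance": "taskmanager-b"}, {"numRecordsIn": 250.0})

    return shared.total("numRecordsIn"), distinct.total("numRecordsIn")


if __name__ == "__main__":
    print(demo())
```

In this sketch the shared grouping key keeps only the last push, so the sum reflects a single TaskManager, while distinct grouping keys keep both series and the sum covers both.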