[https://issues.apache.org/jira/browse/FLINK-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Oscar Westra van Holthe - Kind updated FLINK-11457:
---------------------------------------------------
    Description: 
When cancelling a job running on a YARN-based cluster and then shutting down the cluster, the metrics on the push gateway are not deleted.

My yarn-conf.yaml settings:
{code:yaml}
metrics.reporters: promgateway
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.gcpstg.bolcom.net
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: PSMF
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
metrics.reporter.promgateway.interval: 30 SECONDS
{code}

What I expect to happen:
* while running, the metrics are pushed to the push gateway under a separate label per node (jobmanager/taskmanager)
* when shutting down, the metrics are deleted from the push gateway

This last bit does not happen.
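One way to confirm that metrics were left behind is to list the job groups still present on the Pushgateway after the session has stopped. This is a minimal sketch. It assumes the Pushgateway's own text exposition is piped in (for example from `curl -s http://pushgateway.gcpstg.bolcom.net:9091/metrics`) and that each reporter pushes under a `job` label that starts with the configured `jobName` (`PSMF` here, followed by the random suffix). The function name `leftover_jobs` is hypothetical.

{code:bash}
# Hypothetical helper: print the distinct Pushgateway job names that start with
# a given prefix. Reads the Pushgateway text exposition format on stdin.
# Assumes the prefix contains no regex metacharacters (true for "PSMF").
leftover_jobs() {
  local prefix="$1"
  # Only match job="<prefix>..." directly after "{" or ",", so labels that
  # merely end in "job" (such as exported_job) are skipped.
  grep -oE "[{,]job=\"${prefix}[^\"]*\"" \
    | sed -e 's/^[{,]job="//' -e 's/"$//' \
    | sort -u
}
{code}

Usage: {{curl -s http://pushgateway.gcpstg.bolcom.net:9091/metrics | leftover_jobs PSMF}}. With {{deleteOnShutdown: true}}, any output once the YARN session is gone means the reporters did not clean up.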

How the job is run:
{code}
flink run -m yarn-cluster -yn 5 -ys 2 -yst "$INSTALL_DIRECTORY/app/psmf.jar"
{code}

How the job is stopped:
{code}
YARN_APP_ID=$(yarn application -list | grep "PSMF" | awk '{print $1}')
FLINK_JOB_ID=$(flink list -r -yid "${YARN_APP_ID}" | grep "PSMF" | awk '{print $4}')
flink cancel -s "${SAVEPOINT_DIR%/}/" -yid "${YARN_APP_ID}" "${FLINK_JOB_ID}"
echo "stop" | yarn-session.sh -id "${YARN_APP_ID}"
{code} 
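Until the reporter cleans up reliably, leftover groups could be removed by hand as an extra step at the end of the stop script. This is a minimal sketch of a workaround, not a fix for the reporter. It relies on the Pushgateway API call {{DELETE /metrics/job/<job>}}, which deletes the metric group identified by that job name, and it assumes the Pushgateway is reachable over plain HTTP at the host and port from the configuration above. The variable {{PUSHGATEWAY}} and both function names are hypothetical.

{code:bash}
# Hypothetical workaround, not a fix in Flink: remove a leftover metric group
# from the Pushgateway after the YARN session has stopped.
# Assumes plain HTTP; set PUSHGATEWAY beforehand if the address differs.
PUSHGATEWAY="${PUSHGATEWAY:-http://pushgateway.gcpstg.bolcom.net:9091}"

# Print the Pushgateway URL of the metric group for one job name.
pushgateway_job_url() {
  printf '%s/metrics/job/%s\n' "${PUSHGATEWAY%/}" "$1"
}

# Delete that metric group with an HTTP DELETE; -f makes HTTP errors fail.
delete_pushgateway_job() {
  curl -fsS -X DELETE "$(pushgateway_job_url "$1")"
}
{code}

Because {{randomJobNameSuffix}} is enabled, each reporter pushes under {{PSMF}} plus a random suffix, so the full job names have to be looked up first (for example in the Pushgateway web UI) and {{delete_pushgateway_job}} called once per name.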

Is there anything I'm doing wrong? Is there anything I can do to help fix this?

  was:
When cancelling a job running on a YARN-based cluster and then shutting down the cluster, the metrics on the push gateway are not deleted.

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the best solution would be.


> PrometheusPushGatewayReporter does not cleanup its metrics
> ----------------------------------------------------------
>
>                 Key: FLINK-11457
>                 URL: https://issues.apache.org/jira/browse/FLINK-11457
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Oscar Westra van Holthe - Kind
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
