[ https://issues.apache.org/jira/browse/FLINK-31372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krzysztof Dziolak updated FLINK-31372:
--------------------------------------
    Description: 
We've identified a memory leak that occurs when any of the metric reporters 
fails with an exception. In such cases, HTTPExchanges are not closed 
properly in io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler.
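
For illustration, a minimal sketch of the pattern that avoids leaking exchanges 
when the handler body throws. This is not the Prometheus client's actual handler: 
the class name and the scrapeMetrics() helper are hypothetical, and only the 
JDK's com.sun.net.httpserver API is used.

{code:java}
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: close the exchange in a finally block so it is
// released even when metric collection throws (e.g. a NoSuchMethodError).
public class SafeMetricsHandler implements HttpHandler {

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        try {
            byte[] body;
            int status;
            try {
                body = scrapeMetrics(); // hypothetical collection step; may throw
                status = 200;
            } catch (RuntimeException | Error e) {
                body = "metric collection failed".getBytes(StandardCharsets.UTF_8);
                status = 500;
            }
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        } finally {
            exchange.close(); // always release the exchange, success or failure
        }
    }

    // Hypothetical placeholder for the actual scrape logic.
    private byte[] scrapeMetrics() {
        return "example_metric 0.0\n".getBytes(StandardCharsets.UTF_8);
    }
}
{code}

Without the finally block (or an equivalent try-with-resources on the exchange), 
an exception thrown during collection leaves the exchange open, which matches the 
accumulation described above.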

In our case, the failure was triggered by an incompatible Kafka client, which 
caused metric collection to fail with:

{{Exception in thread "prometheus-http-1-72873" java.lang.NoSuchMethodError: 'double org.apache.kafka.common.Metric.value()'}}

Should the Prometheus reporter handle metric collection defensively (by suppressing 
exceptions, as sketched below) to guarantee metric delivery and avoid similar memory leaks?
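
For reference, a minimal sketch of that defensive approach, assuming the 
simpleclient {{io.prometheus.client.Collector}} API; the wrapper class name is 
hypothetical and not an existing Flink or Prometheus class.

{code:java}
import io.prometheus.client.Collector;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper showing the "suppress instead of propagate" idea: a
// failing delegate simply contributes no samples for this scrape, so metrics
// from the remaining collectors are still served.
public class SuppressingCollector extends Collector {

    private static final Logger LOG =
            Logger.getLogger(SuppressingCollector.class.getName());

    private final Collector delegate;

    public SuppressingCollector(Collector delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<MetricFamilySamples> collect() {
        try {
            return delegate.collect();
        } catch (RuntimeException | Error e) {
            // e.g. the NoSuchMethodError thrown by an incompatible Kafka client
            LOG.log(Level.WARNING, "Suppressed failure while collecting metrics", e);
            return Collections.emptyList();
        }
    }
}
{code}

A reporter could register its collectors through such a wrapper so that a single 
broken metric source does not fail the whole scrape.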

  was:
Basically I'm running Flink 1.15.1 with Docker, and the application often 
starts to slow down because of OOM errors.
From the metrics data collected by Prometheus, I observed that memory usage and 
the number of threads kept increasing.
I tried removing the Kafka sink code and it looked normal, so I switched Flink 
to 1.14.5 and it works fine.
Is this a bug?


> Memory Leak in HTTPMetricHandler when reporting fails
> -----------------------------------------------------
>
>                 Key: FLINK-31372
>                 URL: https://issues.apache.org/jira/browse/FLINK-31372
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka, kafka
>    Affects Versions: 1.16.1, 1.15.4, 1.17.1
>            Reporter: Krzysztof Dziolak
>            Priority: Minor
>
> We've identified a memory leak that occurs when any of the metric reporters 
> fails with an exception. In such cases, HTTPExchanges are not closed 
> properly in io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler.
> In our case, the failure was triggered by an incompatible Kafka client, which 
> caused metric collection to fail with:
> {{Exception in thread "prometheus-http-1-72873" java.lang.NoSuchMethodError: 'double org.apache.kafka.common.Metric.value()'}}
> Should the Prometheus reporter handle metric collection defensively (by 
> suppressing exceptions) to guarantee metric delivery and avoid similar memory 
> leaks?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
